Essays in the Economics of Education: Evidence from Choice Programs in Canada

by

Daniel Shack

BComm. (with High Distinction), University of Toronto, 2006
M.A., University of Toronto, 2008

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
in
The Faculty of Graduate and Postdoctoral Studies (Economics)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

July 2016

© Daniel Shack 2016

Abstract

This dissertation uses elementary school-choice programs in Canada in order to examine issues relating to the economics of education. Chapter 2 examines the role of parents' uncertainty and learning in a setting where parents make dynamic education choices for their children and learn over time about unknown, child-specific returns to schooling. Using administrative data from the province of British Columbia and the French Immersion program, I estimate a dynamic model that incorporates imperfect information and parental learning into a school-choice framework. I find that new information parents receive after the initial enrolment decision accounts for a large fraction of program attrition, particularly in earlier grades, and also raises student achievement. In chapter 3, using the same data as in chapter 2, I estimate the causal impact that the French Immersion program has on short and medium-run student outcomes using a control function and instrumental variables approach that exploits variation in the distance to the nearest immersion and non-immersion schools within a given neighbourhood. I find that initial entry into the French Immersion program has large negative and significant effects on student outcomes in grade 4 in each of math, reading and writing. Over time, these effects decline such that by grade 10, I find no effect on English scores, but there remains a negative effect on math scores. Chapter 4 examines how changes in peer composition from school-choice policies impact students' own achievement. Using administrative data from the Canadian provinces of Ontario and British Columbia and exploiting the Late French Immersion program, I estimate these peer effects using a two-stage residual inclusion approach along with a school-fixed effects model. I find that as more children leave for the choice program, there is a negative and significant effect on the remaining students; however, this result masks substantial heterogeneity. I find that an increase in the fraction of low-performing students who enter the choice program leads to increases in achievement for the remaining students; conversely, I find that high-performing leavers cause large reductions in the achievement of the remaining students.

Preface

This dissertation is an original, unpublished, independent work by the author, Daniel Shack. Chapters 2, 3 and 4 were all approved by the UBC Human Ethics Research Board under the name "The Impact of French Immersion on Student Ability" and project number H12-02813. The views expressed in this thesis are those of the author, and do not necessarily reflect the opinions or views of the British Columbia Ministry of Education or the Educational Quality and Accountability Office of Ontario.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgements
Chapter 1: Introduction
Chapter 2: What do Parents Know and When do They Know it
  2.1 Introduction
  2.2 French Immersion Program
  2.3 Data and Descriptive Statistics
  2.4 Model
  2.5 Results
  2.6 Additional Counterfactuals
  2.7 Extensions and Robustness Checks
  2.8 Conclusion
  2.9 Figures
  2.10 Tables
Chapter 3: The Impact of Dual Language Learning on Student Outcomes
  3.1 Introduction
  3.2 The French Immersion Program
  3.3 Literature Review
  3.4 Data
  3.5 Empirical Strategy
  3.6 Results and Discussion
  3.7 Conclusion
  3.8 Tables
Chapter 4: Peer Effects, Heterogeneity and School Choice
  4.1 Introduction
  4.2 Setting
  4.3 Data
  4.4 Empirical Strategy
  4.5 Results and Discussion
  4.6 Conclusion
  4.7 Tables
Chapter 5: Conclusion
Bibliography
Appendix A: Data Appendix
Appendix B: Appendix to Chapter 2
  B.1 Additional Model Details
  B.2 Analysis of Change in Administration of the Standardized Tests
  B.3 Supplemental Figures and Tables
Appendix C: Appendix tables to Chapter 3
Appendix D: Appendix tables to Chapter 4

List of Tables

2.1 Summary Statistics
2.2 Correlation of Test Scores with Program Exit
2.3 Correlation of Test Scores with Program Exit Across Grades
2.4 Residual Change in Test Scores
2.5 Model Results
2.6 Change in Composition of Initial Enrolees from Pre-Enrolment Information
3.1 Summary Statistics
3.2 The Impact of French Immersion on Student Outcomes
3.3 Distance Balancing Tests
3.4 Impact of Distance on Early FI Enrolment
3.5 Impact on Student Outcomes by Gender
3.6 Impact of FI After Excluding Children Enrolled in Private School
4.1 Summary Statistics: School Level Averages
4.2 Distribution of Achievement of LFI Students
4.3 Within School Variation of LFI Leavers
4.4 Marginal Effects from a Logistic Regression of Late Immersion Enrolment
4.5 Impact of LFI Leavers on Student Achievement
4.6 Heterogeneity of LFI Leavers – Ontario
4.7 Heterogeneity of LFI Leavers – British Columbia
4.8 TSRI – Impact of LFI Leavers on Student Achievement
4.9 TSRI – Heterogeneity of LFI Leavers – Ontario
4.10 TSRI – Heterogeneity of LFI Leavers – British Columbia
4.11 TSLS Estimates of Change in Peer Composition – Ontario
4.12 TSLS Estimates of Change in Peer Composition – BC
B.1 Correlation of Test Scores with Program Exit by Subject
B.2 Model Results – Additional Model Parameters
B.3 Cumulative Composition Changes from Baseline to Simulated Counterfactual
B.4 Baseline Model Extensions – Program Choice Parameters
B.5 Baseline Model Extensions – Test Score Parameters
B.6 Correlation of Test Scores with Program Exit Pre and Post 2007
C.1 Difference in Means Across Standardized Test Scores and Secondary School Exams
C.2 Impact of FI on Student Outcomes – TSLS Results
C.3 The Impact of French Immersion on Student Outcomes – With Catchment Area Fixed Effects
C.4 The Impact of French Immersion on Student Outcomes – Adjusted Bias Coefficients
C.5 Impact on Student Outcomes by Current Enrolment
D.1 Total Imputed Number of MFI and LFI Schools
D.2 Transition Matrix of Within-School Heterogeneity of LFI Leavers
D.3 School Characteristics by LFI Leavers
D.4 Additional Summary Statistics: Student Level
D.5 Distance Balancing Tests
D.6 Impact of Non-LFI Leavers on Student Achievement
D.7 TSRI – Impact of LFI Leavers on Student Achievement – All Scores as Controls
D.8 TSRI – Heterogeneity of LFI Leavers – Ontario – All Scores as Controls
D.9 TSRI – Heterogeneity of LFI Leavers – BC – All Scores as Controls
D.10 TSRI – Ordered Probit Second Stage
D.11 TSRI – Ontario – Alternate LFI Definition
D.12 TSRI – Heterogeneity in Ontario – Alternate LFI Definition

List of Figures

2.1 Distribution of Achievement by Program of Enrolment
2.2 Fraction of Children Remaining Enrolled in FI (i)
2.3 Fraction of Children Remaining Enrolled in FI (ii)
2.4 Correlation Between Program Exit and Test Score
2.5 Model Fit
2.6 Model Fit: Correlation of Program Exit and Test Scores
2.7 Model Fit: Out of Sample
2.8 Weight Placed on New Information Over Time
2.9 Number of Years Remaining in FI by Signal Decile
2.10 Simulated Attrition Under No Updating Scenarios
2.11 Difference in Predicted Test Scores From Baseline
2.12 Simulated Attrition Under Ex-post Scenarios
2.13 Simulated Test Scores Under Ex-post Scenarios
2.14 Simulated Attrition Under Additional Ex-ante Information Scenarios
2.15 Predicted Test Scores Under Additional Information Scenarios
2.16 Robustness Check: Weight Placed on New Information Over Time
2.17 Robustness Check: Average Program Exit Grade by Grade K Signal
2.18 Robustness Check: Simulated Attrition With No Information
2.19 Robustness Check: Simulated Achievement with No Information
B.1 Distribution of Achievement by Program Enrolment
B.2 Quantile Regression of Test Score on FI Enrolment
B.3 Average Exit Grade by Ability
B.4 Predicted Test Scores under No Updating Scenarios
B.5 Simulated Achievement with No Information – Initial FI Students Only
B.6 Simulated Attrition with Separate High School Switching Costs

Acknowledgements

This dissertation would not have been possible without the support and assistance I received from those around me. In this section, I try my best to express my deepest gratitude to all of these people, although it is more likely I am understating everyone's contribution.

I want to start off by thanking my supervisor, Kevin Milligan. From the very beginning of this project, Kevin was extremely supportive. Kevin devoted a substantial amount of time, and also provided some financial support, in helping me apply for and obtain the data used throughout this dissertation. After I obtained the data, Kevin continued to devote much of his time in order to provide me with excellent guidance and feedback as my chapters evolved. Kevin is an excellent teacher and researcher and I am extremely grateful to have had the opportunity to learn from him — and I have learned a lot. Next, I want to thank Marit Rehavi and Florian Hoffmann. Marit and Florian were extremely generous with their time, provided me with valuable feedback and also taught me how to be a better economist. I also want to thank David Green who, even though he was not on my committee, still devoted a lot of time towards helping me with structural equation modelling techniques and empirical methods.

I am grateful for the many conversations I have had with faculty and colleagues throughout this project. I would like to thank in particular Matilde Bombardini, Alix Duhaime-Ross, Nicole Fortin, Josh Gottlieb, Hugo Jales, Hiro Kasahara, Thomas Lemieux, Timea Laura Molnar, A. Abigail Payne, Paul Schrimpf, Lori Timmins and all the participants in the UBC Empirical Workshop and UBC Public Finance Reading Group for their insightful comments and valuable discussions.

I am indebted to Colleen Hawkey, Victor Glickman and Florie Varga of Edudata for all of their assistance in accessing the data, as well as to the BC Ministry of Education for making the data available. I would also like to thank the Educational Quality and Accountability Office of Ontario for providing the Ontario data used in this research and Olesya Vovk for answering my data-related questions. Financial support for this thesis was graciously provided by The Canadian Labour Skills and Research Network and the UBC Faculty of Arts.

Finally, I want to thank my very best friend and partner, Amy Ringel, for her patience, support and love throughout my time in the PhD program. Amy, I look forward to us spending this next phase of our lives together.

Dedication

To my parents, Martin and Helen, and my sister, Melissa

Chapter 1: Introduction

From an economics point of view, schooling is one of the most important investments society makes in our children's human capital. Countries with higher levels of education tend to have longer life expectancy, higher incomes, and lower levels of poverty. For these reasons and more, economists have long been interested in studying policies related to education. One set of policies in particular that has received a lot of attention is school-choice policies, which are growing in popularity all over the world. Under school-choice policies, parents are provided with multiple options with regard to their child's schooling, as opposed to their child being required to attend one particular school based on the neighbourhood in which they live.
Furthermore, this increased choice can operate either across schools or within schools, in terms of the different programs being offered. The former relates to the potential of school-choice policies to increase the efficiency of schools through competitive forces, while the latter allows a diverse student body to choose the programs which best suit their needs. Given the important role that education plays in society and the growing importance of school choice, it is vitally important to understand precisely what the costs and benefits of these policies are.

This dissertation uses variations of a large choice program in Canada known as French Immersion in order to examine issues that relate to school-choice policies more broadly. First, in chapter two, I use the French Immersion program to examine the role of uncertainty and learning in a setting where parents make dynamic education choices for their children and learn over time about unknown, child-specific returns. Using administrative data from the province of British Columbia, I estimate a dynamic model that incorporates imperfect information and parental learning into a school-choice framework. I find that new information parents receive after the initial enrolment decision accounts for a large fraction of program attrition, particularly in earlier grades, and also raises student achievement. These results are driven by the value parents place on their child's performance and the large impact that new information has on parents' beliefs about match quality. Using simulations, I find that providing parents with perfect information after the initial enrolment decision leads to an overall decrease in attrition — suggesting parents are too responsive to the information they receive — but has no effect on student achievement. This leads to the surprising result that providing parents with more accurate information (post-enrolment) does not necessarily lead to increases in predicted test scores. In contrast, providing parents with additional information prior to initial enrolment leads to large reductions in program attrition along with large increases in student achievement. Thus, even with the ability to learn over time, parents still strongly prefer to have information before rather than after the initial enrolment period. Finally, making it easier for parents to change schooling options leads to an increase in attrition, but the impact on student achievement is small, and in some cases achievement even declines. These results suggest that switching costs are not a large hindrance to the impact of learning. I conclude this chapter by discussing several policy implications of these results.

In chapter 3, I use the French Immersion program to analyze the impact that immersion language programs have on short- and medium-run student outcomes. I estimate this causal impact using confidential administrative data on students in the Canadian province of British Columbia along with a control function and instrumental variables approach that exploits variation in the distance to the nearest immersion and non-immersion schools within a given neighbourhood. After checking the validity of the instrument with several specification and balance tests, I estimate the causal impact on student achievement in grades 4, 7 and 10.
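As a rough illustration of the control-function idea just described, the sketch below simulates data in which an unobserved ability term drives both French Immersion entry and later test scores, and a distance-based instrument restores a consistent estimate. The variable names, the single distance-gap instrument, and the assumed causal effect of -0.25 standard deviations are all hypothetical simplifications; the actual specification in the thesis is richer (separate distances to the nearest immersion and non-immersion schools within a neighbourhood, plus controls).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Hypothetical data: dist_gap is the extra distance (km) to the nearest FI
# school relative to the nearest non-FI school; ability is unobserved.
dist_gap = rng.exponential(scale=2.0, size=n)
ability = rng.normal(size=n)

# FI entry falls with the distance gap and rises with unobserved ability.
fi = (0.5 - 0.15 * dist_gap + 0.4 * ability + rng.normal(size=n) > 0).astype(float)

# Grade-4 score with an assumed causal FI effect of -0.25 sd.
score = -0.25 * fi + 0.6 * ability + rng.normal(size=n)

def ols(y, X):
    """OLS coefficients, prepending an intercept column."""
    X = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X, y, rcond=None)[0]

naive = ols(score, fi)[1]        # biased: high-ability children select into FI

# Control function: regress FI entry on the instrument, then include the
# first-stage residual alongside FI in the outcome equation.
a, b = ols(fi, dist_gap)
resid = fi - (a + b * dist_gap)
cf = ols(score, np.column_stack([fi, resid]))[1]

print(f"naive OLS {naive:+.3f} | control function {cf:+.3f} | truth -0.250")
```

With a linear first stage, including the first-stage residual in the outcome equation reproduces the two-stage least squares estimate of the FI coefficient, which is why the control-function estimate lands near the assumed truth while the naive comparison does not.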
I find that initial entry into the French Immersion program has large negative and significant effects on student outcomes in grade 4 in each of math, reading and writing. Over time, these effects decline, such that by grade 10 I find no effect on English scores, but there remains a negative effect on math and science scores. Additionally, I find that the results do not differ by gender and that part of the estimated negative effect is the result of parents enrolling their child in private school instead of French Immersion.

Both chapters 2 and 3 use a version of the French Immersion program known as Early French Immersion, in which the intake grades are either kindergarten or grade 1. In contrast, chapter 4 uses a type of French Immersion program known as Middle or Late French Immersion, in which enrolment occurs in grades 4 or 6. Chapter 4 uses these programs to examine how changes in peer composition induced by school-choice policies impact students' own achievement. Using administrative data from the Canadian provinces of Ontario and British Columbia, I estimate these peer effects using a two-stage residual inclusion approach along with a school fixed-effects model. On average, the children who enter these programs perform better academically (on pre-enrolment measures of achievement) than those who remain, although children at all performance levels can and do enrol as well. I find that as more students leave for the immersion program, there is a negative and significant effect on the remaining students. Breaking down the analysis further shows that these overall effects mask substantial heterogeneity in the impact that different types of leavers have on the remaining students. For example, I find that an increase in the fraction of low-performing students leaving to enter the choice program leads to increases in achievement for the remaining students. These effects are generally largest for the remaining students in the upper end of the ability distribution. Conversely, I find that high-performing leavers cause large reductions in the achievement of the remaining students. In the final section of this chapter, I use the choice program as an instrument for looking at total changes in peer composition. I find significant gains in achievement from replacing low-performing peers with high-performing peers. Finally, in chapter 5 I conclude with a review of each of the three previous chapters and discuss important implications that each of them has for school-choice policies in general.

Chapter 2: What do Parents Know and When do They Know it: How Learning Matters for Parents' Dynamic Education Choices

2.1 Introduction

As the number of school districts implementing school-choice policies rapidly increases, so too does the importance of parents' ability to learn about the fit between their child and each schooling option.1 Many papers have documented the imperfect information that parents possess about both their children and the schooling options available, as well as their ability to learn.2 However, parents' ability to learn about the match quality between their child and a given schooling option is central to many open questions surrounding parents' schooling decisions.3 For example, how much weight do parents place on new information when updating their beliefs about match quality, and how important is this information for parents' schooling decisions?
And does this learning lead to better matches and improvements in children’s academicachievement?The answers to these questions are essential for understanding how parents make schooling choicesin a dynamic setting. Furthermore, these questions also have policy implications. Examples of suchpolicies include designing school-choice policies, the timing and nature of information provided to par-ents and parental involvement more generally. Parental learning is important because in many settingsthere is a lot of heterogeneity in the match quality between a child and a given schooling option, of whichparents are unlikely to have perfect information ex-ante. If parents are uncertain about the match qualityat the initial point of enrolment, then it is only after enrolment — when one can observe performance— that parents have the opportunity to learn the true match quality. Learning is what allows parentsto discover the schooling choice that provides the greatest value to their child and potentially improveupon the optimal ex-ante choice. For example, parents can minimize the negative effects of a poor match1In the 2013-2014 school year, 55% of school districts in the United States had choice-policies in place for all students.This is more than double the 24% figure in the 2000-2001 school year (Whitehurst and Klein, 2015)2See, for example, Hanushek et al. (2007), Hastings and Weinstein (2008), Jensen (2010), Friesen et al. (2012), Andrabi etal. (2014), Bergman (2014) and Dizon-Ross (2014).3By “match quality”, I am referring to the difference in terms of how a child performs in a given schooling environmentwhen compared to some standard baseline alternative.32.1 Introductionthat was unknown at the initial point of enrolment. In many other areas, research has documented largewelfare gains that come from the ability to learn over time about unknown variables.4 However, muchless is known about the role parental learning plays in the decisions of parents of primary school-agedchildren.This paper is the first to model and estimate the impact of uncertainty and learning in a setting whereparents learn about child-specific returns and make dynamic educational choices. To study this issue,I exploit a unique institutional feature in Canadian public education whereby most parents have theoption of enrolling their child in a French Immersion language program (hereafter “FI”). This choiceprogram possesses several features that make it a good setting for studying parental learning. First,enrolment in the program primarily occurs when children are four or five years old. Thus, parents mustmake their initial enrolment decision at a time when they are unlikely to have perfect information. Anadditional advantage of the early start time is that it allows one to observe parents’ choices from the verybeginning of their child’s primary schooling years.5 Second, since the program is offered by the publicschool system, there are no financial entry or exit costs such as tuition or school fees and many of theinstitutional rules (for example, the curriculum, policies related to parental involvement, how studentsare graded, etc.) are identical in both settings. 
This potentially further reduces the cost of switchingbecause parents will be enrolling their child in a familiar setting.6 Finally, qualitative evidence suggeststhat a large fraction of FI program attrition is caused by parents responding to their child struggling inschool.7 Intuitively, in French Immersion, parents are forced to make an enrolment decision withoutany knowledge about how their child will cope in a foreign language environment. Only after observingtheir child’s performance do parents begin to believe their child might be better off out of the program.Since this pattern is not limited to only immersion language programs, the conclusions I draw from FIcarry over more generally to other school choice settings.Using administrative panel data from several large school districts in the province of British Columbia(BC), I estimate a structural equation model that models the joint dynamics of both the program en-try and exit decisions of parents and children’s academic achievement. The model explicitly capturesparental learning and flexibly allows for multiple dimensions of ability, systematic quality differencesacross the programs, and heterogeneity in terms of program preferences and parents’ private informa-tion about their children. The key source of parents’ uncertainty in the model is their child’s (relative)performance in the FI program; however, parents are able to learn about this match quality throughnoisy signals received during each school year. These signals can be interpreted as the culmination ofall information parents receive in a given year; for example, parents observing their child’s school workand interactions with their child’s teacher. The parameters related to parental learning are estimated offthe variation in how parents’ program exit decisions vary and respond to information at different points4Examples include learning about labour market outcomes (Nagypál, 2007), health care (Handel and Kolstad, forthcoming,Chernew et al,˙ 2008) and post-secondary education (Stange, 2012, Stinebrickner and Stinebrickner, 2012, 2013, 2014).5This contrasts with the current school-choice literature which tends to focus on students entering middle or high-school.A large early-childhood development literature emphasizes that the key formative years of a child’s development are aroundages 6–10 (Cunha and Heckman, 2008).6In some cases, parents can switch their child out of the FI program while keeping their child in the same school.7See, for example, Croll and Lee (2008) and Obadia and Theriault (1997).42.1 Introductionin time. This estimation is aided by the fact that I am able to follow the same child over time for theirentire primary schooling years and the fact that I possess data on student achievement in the form ofstandardized test scores.This very parsimonious model fits the data extremely well. This is true for both the model’s in-sample and out-of-sample predictions. I find that the information parents receive after initial enrolmentis an important component of parents’ decisions to either keep their child enrolled in the program orswitch him or her into an alternative program. This result follows from two key estimates of the model.One is that parents derive a high utility from having their child perform well in school. The other is thatthe new information parents receive after the initial enrolment decision has a large impact on parents’beliefs about the match quality between their child and the FI program. 
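One simple way to make the learning process described here concrete is a normal-normal Bayesian updating sketch: parents hold a normal prior over the child's FI match quality, receive one noisy signal per school year, and place less weight on each new signal as their beliefs become more precise. This is a minimal sketch under standard assumptions, not the thesis's exact parameterization, and all numbers below are hypothetical.

```python
import numpy as np

def update(prior_mean, prior_var, signal, signal_var):
    """Normal-normal update of beliefs about match quality.

    Returns the posterior mean, posterior variance and the weight (gain)
    placed on the new signal.
    """
    gain = prior_var / (prior_var + signal_var)
    return prior_mean + gain * (signal - prior_mean), (1 - gain) * prior_var, gain

rng = np.random.default_rng(1)
true_match = -0.4      # hypothetical: the child does 0.4 sd worse in FI
mean, var = 0.0, 1.0   # prior held before kindergarten
signal_var = 0.5       # noise in what parents observe in a given school year

for grade in ["K", 1, 2, 3, 4, 5, 6, 7]:
    signal = true_match + rng.normal(scale=signal_var ** 0.5)
    mean, var, gain = update(mean, var, signal, signal_var)
    print(f"grade {grade}: belief {mean:+.2f}, weight on this year's signal {gain:.2f}")
```

The declining gain in this benchmark is the same mechanism behind the pattern discussed below, in which information received in the earliest grades carries far more weight than information received later on.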
These updated beliefs are whatmotivate parents to choose alternative schooling options for their child. For example, I find that parentswho receive information that their child is better off exiting the program will, on average, remove theirchild up to five years earlier than parents who receive information that their child is better off in theprogram. How responsive parents are to new information varies substantially by grade. When updatingtheir beliefs, parents weight new information in earlier grades up to 7 times higher than informationreceived later on; however, the results also suggest parents are over weighting the information theyreceive in the earlier grades by as much as a factor of 3. In order to further explore the impact of parentallearning and uncertainty, I simulate the model assuming that the state of the world is such that parentsreceive no new information after the initial enrolment decision. This leads to several interesting results.First, I find that initial enrolment drops by 36%. Initial enrolment declines because the simulationeliminates the value of the program to parents that comes from the opportunity to learn in the futureabout their child’s fit. Secondly, I find large decreases in program attrition especially in earlier grades.In particular, program attrition rates between kindergarten and grade five decline by approximately 70%.Finally, I find that student achievement decreases with the predicted test scores of children enrolled inthe program declining by an average of 0.06 standard deviations (σ ) and a high of 0.09σ by grade eight.These findings are robust to a variety of extensions to the baseline model.As mentioned above, there is evidence that parents are over weighting information they receive inthe earlier grades. This leads to the possibility that many parents are making program choices thatthey would not otherwise make with more accurate information. Furthermore, even if parents weightedinformation “correctly”, it would still be the case that having more accurate information causes someparents to alter their schooling choices. In order to explore these issues, I simulate the model underthe assumption that parents receive perfect information after the initial point of enrolment. I find thatthis simulation leads to large changes in terms of which children remain or exit the FI program. Forexample, 40% of parents who had removed their child from the program by the beginning of gradesix now choose to keep their child in the program as a result of the extra information received. Onesurprising result is that the extra information does not lead to an increase in predicted test scores. Manyof the children who now remain in FI are slightly better off academically out of the FI program, but dueto preferences unrelated to performance, parents still prefer to have their child stay in the program. Thus,to the extent that initial enrolment is a function of preferences, additional information on performance52.1 Introductionpost-enrolment will not necessarily lead to higher levels of achievement.In general, it is much harder for parents to transfer their child between schooling choices in themiddle of their child’s primary schooling education than it is to make an initial enrolment choice. Thiseffect is seen in the model through large estimated psychic costs to parents of switching their child out ofthe immersion program. 
Therefore, even if parents are learning, it could be that with the exception of themost extreme mismatches, many parents are prevented from leaving (or choose to leave at a later grade).In order to quantify this impact, I run several counterfactuals where I make it easier for parents to leavethe FI program. I find that the children induced to leave FI from the lower (psychic) switching costsare those children whose achievement gains from leaving are small (and in some cases even negative).These results suggest that — in this context — the impact of learning is not greatly diminished by thepresence of switching costs.The simulations discussed above all involve interventions taking place after the point of initial enrol-ment. But, the main reason that learning is important is because it alleviates the “cost” of the uncertaintyparents face when making their initial schooling decisions. This cost comes in the form of parents mak-ing initial choices that would not be optimal if parents had access to perfect (or additional) informationex-ante. I explore these issues by running simulations giving parents additional information ex-ante upto and including full information. I find that these simulations lead to better matching of children toprograms at the point of initial enrolment which in turn causes large decreases in program attrition andlarge increases in student achievement. Therefore, even with parental learning, parents are much betteroff learning information about match quality before rather than after the initial enrolment decision.This paper makes several contributions. First, it contributes to the literature examining the role ofinformation on schooling decisions. Most of what we know in this area comes from parental responsesto average quality measures at a given time period (c.f. Jensen, 2010, Friesen et al., 2012, Hastingsand Weinstein, 2008, Andrabi et al., 2014).8 In contrast, this paper focuses on parents’ responses tochild-specific information while making choices in a dynamic setting. These distinctions are importantbecause in practice: (i) parents are making choices every school year and (ii) children are not identicaland average quality will not perfectly convey to parents the specific value that a choice will provide totheir child. These two points imply a setting in which each school year parents receive new informa-tion, update their beliefs, and decide whether or not to continue in their current choice. More recently,several papers have examined the implications of randomly providing parents with information abouttheir child’s performance and how this can affect schooling choices (c.f. Andrabi et al., 2014) andparental investments (c.f. Bergman, 2014, Dizon-Ross, 2014).9 Many of these papers are randomized8A separate strand of literature shows that when given a choice, being accepted into more preferred schools leads to betteroutcomes (Deming et al., 2014). This suggests that parents are somewhat informed when making schooling choices. Thisclaim is backed up by a large literature showing that school performance impacts housing prices in the area (see, for example,Black, 1999 and Black and Machin, 2011).9Dizon-Ross (2014) conducts a field experiment in Malawi where parents were given information about their child’s per-formance in school. She find that parents that received the information updated their beliefs and were more willing to makecorresponding investments on behalf of their child. Andrabi et al. 
(2014) examines an intervention in Pakistan where informa-tion on school and child quality was provided to parents in randomly selected villages. Bergman (2014) looks at the imperfectinformation of parents in the context of a principal-agent problem between parents and their children. Children know their62.1 Introductioncontrol trials looking at specific interventions. While they all find that new information affects parents’behaviour, it remains an open question how parents’ dynamic schooling choices are impacted by in-formation received at different points in time.10 Since my paper models parents’ choices throughouttheir child’s entire primary schooling years, I can examine the effect and value of information acrossdifferent grades. Furthermore, the setting in this paper also differs since in this context parents makechoices knowing they will receive information in the future.Learning about match quality in an education and dynamic setting has been studied in more detail inpost-secondary education (c.f. Arcidiacono, 2004, Stange, 2012, Stinebrickner and Stinebrickner, 2012,2013, 2014). Examples of sources of uncertainty studied in a post-secondary environment include thereturns to 4-year or 2-year colleges and the returns to different majors or fields of specialization.11 Ageneral finding in these papers is the divergence between the optimal ex-ante and ex-post decisions.12However, because these papers focus on college students learning about themselves, it is unlikely onecan simply apply the estimates of this literature to parents learning about their (primary school-aged)children.Another contribution of this paper is to the literature on the impact of student mobility and switchingschools (c.f. Hanushek et al., 2004; Cullen et al., 2006). This study differs from these papers in severalcritical ways. First, by framing this analysis around immersion language programs, I focus on a veryspecific subset of parents’ schooling decisions rather than every instance in which a child attends a newschool. It is not even the case that the former is a subset of the latter. In my data, a child can changeschools but remain in FI while another child can leave FI, but stay in the same school.13 As discussedabove, an advantage of focussing on the FI program is that there is likely more heterogeneity of the im-pact the program has on a child, and thus larger potential gains from switching. A second way this paperdiffers from this literature is by imposing much more structure on the analysis by explicitly modellingthe parents’ schooling decisions. This allows me to focus on the impact from parents responding to newinformation about their child’s match quality with the immersion program.Finally, this paper contributes to the literature on specialty schools. “Specialty Schools” encompassa variety of school types including Magnet Schools, Immersion Language Programs, STEM focussedschools and Career Academies. Specialty Schools are an important aspect of any school-choice policy.performance and how much effort they are exerting in school, but parents do not. 
By randomly selecting parents to receive additional information about their child's performance in school and level of effort, Bergman (2014) finds that providing parents with this information leads to an increase in parental involvement, student effort and student achievement.

[10] Another important difference is that many of these papers look at parents in developing countries.
[11] In many of these models, there is some parameter that is unknown to the student at the time he or she enters college (e.g. college aptitude). Individuals have an initial belief about this parameter and update their beliefs as they progress through school; as a result, some decide to switch majors (Arcidiacono, 2004, Stinebrickner and Stinebrickner, 2012, 2013) while others drop out of school entirely (Stange, 2012, Stinebrickner and Stinebrickner, 2012, 2014). The model used in this paper is closest to the one used by Stange (2012).
[12] For example, Cunha, Heckman and Navarro (2004) examine the role of uncertainty about future wages when individuals are making their college entry decision. They find that if individuals knew at age 19 about their specific returns to education, then 25% of high-school graduates would have chosen to enrol in college and 30% of college graduates would never have enrolled in college.
[13] Approximately just less than half of parents who have the option of remaining in the same school after taking their child out of FI choose to do so.

Specialty schools are likely to be where the issue of heterogeneity within students is most acute. The overall welfare gains from matching students with tastes for science and drama to schools that specialize in STEM and theatre, respectively, are likely far greater than the other way around. Similarly, it would be inefficient to enrol an average-performing student in a gifted or remedial program. All of these examples serve to emphasize the importance of match quality when making schooling decisions.

The remainder of this chapter is organized as follows. Section 2.2 gives an overview of the French Immersion program. Section 2.3 describes summary statistics and descriptive results with regard to exit from and entry into the program. Section 2.4 describes the dynamic structural model. Section 2.5 presents the results of the model. In section 2.6 I present counterfactual simulations using the model estimates. In section 2.7, I show that the model's results are robust to several robustness checks and extensions of the main model. Section 2.8 concludes.

2.2 French Immersion Program

French Immersion refers to a (fully) publicly funded immersion language program offered throughout most of Canada. The goal of the program is to promote bilingualism. French is an official language of Canada, and many occupations (such as those in the Federal Government or select service jobs) require fluency in both English and French as a condition of employment.
Since the data for this chapter consist solely of students in the province of British Columbia (BC), the remainder of this section describes the program as it pertains to BC (although the program generally has a similar structure throughout Canada). The program is designed for parents and students who speak little to no French.14 FI is primarily offered through local public school boards, and any child is free to initially attend an FI program as they would attend any other public school.15 While the FI enrolment rate in BC is approximately 10%, this number understates the true demand for FI because enrolment is often constrained by a limited number of classrooms and provincial rules on class sizes.16 The curriculum taught in FI mirrors that taught in the regular school program; the major difference is the language of instruction. For elementary school children, the most common type of French Immersion program is the Early French Immersion program, which children can enter in kindergarten (K) or grade one.17 In other types of French Immersion programs, such as Late Immersion, entry occurs in grade six. The focus of this chapter is purely on the Early French Immersion program. The exact distribution of classroom time between English and French varies by school district. For example, the Vancouver School Board requires that the program be 100% French in grades K-3 and introduces English in grade 4 at an 80/20 split favouring French; for the remaining years until grade 7, approximately 50–80% of the classroom instruction time remains in French.18 In practice, most school districts have a language distribution that closely follows the Vancouver example. Students successfully completing the FI program through grade 12 are awarded a special diploma known as a "Bilingual Graduation Certificate."19

[14] Separate schools exist for children of Francophone parents. These are schools in which French is not simply used for classroom instruction, but also for communication amongst staff and with parents.
[15] Private FI schools do exist, but are extremely rare. In the dataset used in this paper, private schools make up less than 2% of the overall FI population and less than 0.2% of the total student population.
[16] I discuss this issue more in section 5, where the model will incorporate these capacity constraints.
[17] Children are allowed to enter in the later grades if they are transferring from an FI program in a separate school board or can more generally demonstrate some proficiency in French. In my data, fewer than 0.1% of all children enrol in the program in grades 2 or 3.

The remainder of this section deals with aspects of the FI program which are important in the context of this chapter. First, since it is the variation in parents' decisions to exit the FI program that is the primary measure of parental learning, I begin with a discussion of what qualitative studies have found surrounding the issue of FI attrition. Next, I include a discussion of the literature focussing on predicting individual success in the FI program.
This literature finds that while many characteristics are able to predict success in both FI and non-FI programs (implying that the two are highly correlated), what is less well understood is why some children perform relatively better or worse in an FI program.

2.2.1 Program Attrition

Throughout school boards across Canada, FI attrition rates are quite high, particularly in the early grades. For example, in the sample used in this chapter, 17% of children initially enrolled in the FI program exit by the beginning of grade 3.20 Current evidence suggests that one of the key factors behind parents' decision to remove their child from the program is their child's performance in school. These studies are qualitative in nature and based on surveying various stakeholders (e.g. parents, teachers and administrators). For example, in a research report written at the request of the New Brunswick government, Croll and Lee (2008) claim that

    The first point of significant attrition, which pertains only to the Early Immersion program, is at the end of Grade 1, the first year of entry into the FI program in New Brunswick, driven principally by the belief that the children are not capable of adjusting to the choice of an immersion program.

Obadia and Theriault (1997) surveyed principals, teachers and program coordinators in British Columbia school districts about their thoughts on the causes of attrition in the French Immersion program. The authors focus on students enrolled in FI in middle school and at the beginning of secondary school (grades 7-10). They find that across respondents, academic difficulty is most often cited as the main cause of FI attrition.21

[18] Source: http://www.vsb.bc.ca/programs/early-french-immersion. A common practice is to gradually increase the amount of instruction time in English in each grade.
[19] This is in addition to the regular diploma received by all students upon graduation.
[20] Using data from the NLSCY – a survey in Canada analogous to the NLSY in the USA – I find that approximately 14% of children exit by the beginning of grade 2.
[21] Specifically, 87% of teachers, 62% of coordinators and 59% of principals cited academic difficulty.

2.2.2 Performance in FI

What do we know about a child's performance in FI, both in absolute terms and when compared to not being enrolled in the FI program? The answer to this question has important implications for how one should model FI performance. A large education and linguistics literature exists examining these very issues. One of the main areas of interest in this literature is trying to identify screening tests (in the child's native language, "L1") that will predict success in an immersion program, or more generally success in second language ("L2") acquisition. Jared et al. (2011) provide a review of this literature along with their own analysis. Tests shown to be correlated with success in FI include tests of phonological awareness (which measures how well the child can distinguish between different sounds and syllables), grammatical ability (where children were asked to match pictures to given sentences) and "Rapid Automatized Naming" (which measures the speed at which children can read off a list of letters and numbers on a given grid). Furthermore, research looking into predicting the success of English-speaking students in a standard monolingual program shows that many of the tests mentioned above are also correlated with success in this setting as well (Bowey, 2005, National Reading Panel, 2000).
Thissuggests that expected performance in an FI program is positively correlated with expected performanceoutside of FI.22Despite the high correlation between performance in and out of the FI program, perhaps it is thecase that only gifted students or those with high cognitive skills are able to succeed in an FI programwhile children with low cognitive skills are better off not enrolling in FI altogether. As summarized in areview by Genesee (2015), this belief is not supported by the current literature. There is little evidenceto suggest that only high ability children will succeed in the FI program.23 Furthermore, in section2.4, I present additional evidence using the data used in this paper that also counters the idea of a highcomplimentarity between FI performance and ability. However, this does not preclude the possibility ofother dimensions through which some children benefit from FI while others do not. It could be that achild with a low baseline ability finds FI a more stimulating environment and actually performs better inthe program while a child with a high baseline ability dislikes the French program and is better off in thestandard curriculum. Predicting which children will perform better or worse in an FI program remainsa very open question in this literature (Paradis et al., 2011).While there is a large literature that attempts to evaluate the average impact of the French Immer-sion program, fewer papers look at the impact of children once they switch out of the program. Genesee(2007) and Genesee and Jared (2011) both provide reviews of this literature. Overall, there is no con-sensus on whether children who switch out of FI are unambiguously better off than similar childrenremaining in FI. Some papers find that switching out of FI had either a negative or no effect, whileother papers find that students showed improvement once exiting the program. However, all of thesepapers generally suffer from small sample sizes which makes it difficult to obtain precise estimates.2422Newer research expands on these results and tests to see if interventions designed to improve a child’s phonologicalawareness skills leads to better performance in an FI program (Wise and Chen, 2014, Wise, 2014). It is interesting to notethat even though these papers focussed on developing the child’s English phonological skills, it still translated to success in animmersion environment.23In fact, some researchers will argue that even for children who might be at risk to struggle in reading, “their learningdifficulties do not impair their language abilities beyond that seen in monolingual children with the same learning challenges”(Genesee, 2015, pg. 12).24For example, Bruck (1985a and b) finds that students who switched out of FI performed no better than students whoremained. However, a closer look at the results suggest many tests may have suffered from a lack of power. In total thereare 74 children in the sample, 30 of whom end up switching out of FI. In six of the eight cognitive tests administered, the102.3 Data and Descriptive StatisticsA potential confounder in this research is the likely endogeneity of the decision to exit FI and thereforesimply comparing children who switched out of FI to those that remained — even after controlling forbaseline measures — could lead to biased results. 
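To make the endogeneity concern concrete, here is a small simulated example (hypothetical numbers, not the thesis's data or model): when parents are more likely to pull a child out of FI precisely when an unobserved match-quality draw is poor, the raw switcher-versus-stayer comparison misstates the true effect of switching, and conditioning on a noisy baseline score need not remove the bias.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000

ability = rng.normal(size=n)          # affects scores in or out of FI
match = rng.normal(size=n)            # unobserved FI-specific match quality
base = ability + 0.8 * match + rng.normal(scale=0.5, size=n)   # score while in FI

# Parents are more likely to switch the child out when the match is poor.
switch = (-1.5 * match + rng.normal(size=n)) > 1.0

true_gain = 0.10                      # assumed causal gain from switching out
later = ability + 0.8 * match * (~switch) + true_gain * switch \
    + rng.normal(scale=0.5, size=n)

naive = later[switch].mean() - later[~switch].mean()

# Adding the baseline score as a control only partially proxies for the
# unobserved match quality, so the switching coefficient can stay biased.
X = np.column_stack([np.ones(n), switch.astype(float), base])
adjusted = np.linalg.lstsq(X, later, rcond=None)[0][1]

print(f"raw gap {naive:+.2f} | baseline-adjusted {adjusted:+.2f} | truth {true_gain:+.2f}")
```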
In this paper, I address both of these concerns by obtaining administrative data on the universe of students in several BC school districts and by explicitly modelling the program choices of parents.

2.3 Data and Descriptive Statistics

2.3.1 Data

I use confidential administrative data on entering kindergarten cohorts in school districts in the Canadian province of British Columbia for the time period 1999–2009. The data covers 95% of students in the Metro Vancouver Area as well as the school district of Greater Victoria — the two largest urban centres in the province.25 The data is defined at the cohort level; a student observed in the data in kindergarten is followed over time and remains in the sample so long as he or she attends an elementary, middle, or secondary school anywhere in British Columbia. The data goes up to the 2012/2013 school year; therefore, there will be students whom I do not observe all of the way through their primary schooling years. A nice feature of this dataset is that both private and public school students are observed, allowing me to reduce the censoring that occurs in other datasets when children leave the public school system. For each child, I have information on gender, special-education status, English-as-a-Second-Language (hereafter ESL) status26, language spoken at home, school attended, FI enrolment, standardized test scores and home postal code.27 In order to obtain additional characteristics, I link each child's postal code to Canadian Census data by merging on data at the Dissemination Area level (a Dissemination Area is a geographic area with approximately 400-700 people). Dissemination Areas are the smallest unit for which aggregate Census data is made available. Census variables extracted include average household income, average home prices, unemployment, immigration and education (the percentage of residents with a bachelor degree or higher). The final variable I construct is the driving distance between a postal code and the nearest school of each type (see Appendix A).

The 2.5% of children who ever repeat, skip or go back a grade are dropped from the sample. A further 3% of children are dropped by the exclusion of all special-education children with "severe" conditions.28 Appendix A contains more information on the raw data and the construction of the final sample.

[24, cont.] switching students outperformed the non-switching students (but not significantly). In addition, 83% of the switching parents claimed their child was doing better than last year, versus 58% for the non-switching parents.
[25] The exact school districts are: Vancouver, Surrey, Burnaby, Richmond, Coquitlam, Abbotsford, Langley, Delta, North Vancouver, West Vancouver and Greater Victoria. Ten of the 11 largest school districts in the province are in this dataset. In total, these school districts make up just over half of the province of British Columbia.
[26] Note that starting in January 2012, the English-as-a-Second-Language program was renamed the English Language Learning program.
[27] A postal code is a designated geographic area analogous to zip codes in the US (but much smaller) or postcodes in the UK. The average number of households in a postal code is approximately 19 (Statistics Canada).
[28] These are defined as those conditions for which more than 50% of children were excused from writing the standardized tests.
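The data build described above follows a fairly mechanical recipe, sketched below in pandas. Every file name, column name and flag here is a placeholder rather than the actual Ministry extract, and a straight-line distance stands in for the driving distance the thesis actually computes (see Appendix A).

```python
import numpy as np
import pandas as pd

# Placeholder inputs: child records with home coordinates and a Dissemination
# Area id, DA-level census aggregates, and school locations with an FI flag.
students = pd.read_csv("students.csv")
census = pd.read_csv("da_census.csv")     # income, education, unemployment, ...
schools = pd.read_csv("schools.csv")      # columns: lat, lon, offers_fi (bool)

# 1. Attach neighbourhood (Dissemination Area) census characteristics.
students = students.merge(census, on="dissemination_area", how="left")

# 2. Distance to the nearest FI and non-FI school. Rough equirectangular
#    approximation in km at BC latitudes; the thesis uses driving distance.
def nearest_km(lat, lon, pool):
    return np.hypot(111.0 * (pool["lat"] - lat), 73.0 * (pool["lon"] - lon)).min()

fi_pool = schools[schools["offers_fi"]]
non_fi_pool = schools[~schools["offers_fi"]]
students["dist_fi"] = [nearest_km(la, lo, fi_pool)
                       for la, lo in zip(students["lat"], students["lon"])]
students["dist_non_fi"] = [nearest_km(la, lo, non_fi_pool)
                           for la, lo in zip(students["lat"], students["lon"])]

# 3. Sample restrictions noted in the text: drop children who ever repeat or
#    skip a grade and severe special-education categories (flags assumed).
sample = students[~students["ever_repeat_or_skip"] & ~students["severe_special_ed"]]
```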
The final sample contains 248,517 students, 26,063 of whom initially enrol in the FI program.Data on student achievement comes in the form of standardized test scores known in BC as the Foun-dation Skills Assessment. These are a series of tests in numeracy (hereafter referred to as the “math”test), reading comprehension (hereafter referred to as the “reading” test) and writing administered to allBC students in grades four and seven. These are low-stakes tests. Their results have no direct implica-tions for student advancement, teacher evaluations or school funding. Most important in the context ofthis paper is the fact that all students write the exact same test in the exact same language — English.29In order to increase precision, for most of this paper I use the average over the standardized test scoresin math, reading and writing as the primary measure of achievement. All scores are standardized tobe mean 0 with variance 1 at the year-grade level.30 Appendix A contains additional details about theFoundation Skills Assessment.2.3.2 Descriptive ResultsTable 2.1 compares the characteristics of children who initially enrol in FI (that is, children enrolled inFI in either kindergarten or grade 1) with those who do not. All of the statistics in table 2.1 are basedon each child’s characteristics in kindergarten. The largest differences between FI and non-FI childrenare seen in variables that relate to the child’s first or home language. Children in FI are far less likely tobe designated an ESL student (4% vs 31%) and much more likely to come from a household in whichEnglish is spoken at home (87% vs 65%). These differences suggest that the results which follow will bemost applicable towards native language speakers. Other differences include that FI children are morelikely to be female, live in areas with slightly higher average levels of income and education and lowerunemployment, and live 1 km closer to the nearest FI school.31Figure 2.1 presents the distribution of achievement of the standardized tests in grades four and seven.The top panel shows the distribution of achievement for children enrolled in FI in grades four and sevenand children not enrolled in FI. For children in grade four, the FI students slightly outperform the non-FI students. The average scores of the FI and non-FI students are 0.08σ and 0.04σ , respectively. Forchildren in grade seven, the FI children continue to outperform the non-FI children with even largerdifferences. The average scores of the FI and non-FI students are now 0.15σ and 0.01σ , respectively.3229This fact is confirmed by both information on the BC Ministry of Education website and the data itself which contains avariable indicating language of the test. The only children who write the Foundation Skills Assessment in French are childrenin schools designed for Francophone children, who are not included in the sample.30These standardizations were primarily performed using the values of the mean and variance for the entire population ofstudents in British Columbia (which are publically available). Exceptions include the standardization of the writing score andthe variances of the combined scores which were calculated using the children in this dataset.31Many of these results are consistent with Worswick (2003) who compares the characteristics of FI and non-FI studentsusing data from the National Longitudinal Survey of Children and Youth (NLSCY). 
In table 2.1, the percentage of a dissem-ination area with a bachelors degree or higher is defined using a denominator of the total population 15 years or older in theDA. This was done in order to make this variable comparable across the aggregated census data years. More generally, theDA variables should be viewed as proxies for household differences as opposed to the actual differences in variables such asincome and education.32This result of the FI students performing better in later grades is a common stylized fact in many jurisdictions (See Hartet al., 2003, 2006 for evidence from Ontario). One common explanation for this result is that since the tests are administered122.3 Data and Descriptive StatisticsBoth of these differences are significant at the one percent level. However, some of these differencescould be the result of survivorship bias within the program. I explore this possibility in the bottompanel of figure 2.1 which shows the distribution of test scores for grades four and seven after splittingthe sample by initial enrolment into the FI program. The distributions are now much closer togetherthan those seen in the top two graphs. The differences in standardized test scores between students whoinitially enrol in FI and those who do not are now -0.01σ (standard error of 0.01) and 0.07σ (0.01)in grades four and seven, respectively. These results suggest that higher performing students are morelikely to remain in the FI program. Similar results can also be seen in appendix figure B.1 which showsthe distribution for the math and a combined reading/writing scores (hereafter “English language arts”or “ELA”).Program AttritionNext, I present correlations related to program attrition for all children who initially enrol in the FIprogram. Attrition is important because it is variation in the exit rates of FI students that is used to backout estimates of parental learning. Figure 2.2 shows the unconditional Kaplan Meier survival graphwhich calculates the percentage of children remaining in the FI program at the beginning of each grade.Approximately 20% of initial enrolees exit the program by the beginning of grade four and 30% ofinitial enrolees leave by the beginning of grade seven. The vertical height of the step function is thehazard rate between two consecutive grades. Focusing for now on grades 1–7, the hazard rate is highestin the first two years averaging approximately 6% per year. The hazard rate then drops to approximately3–4% per year over the next three grades and then reaches 2% between grades six and seven. This isconsistent with the qualitative evidence discussed above which notes relatively higher attrition rates inthe first few grades. Between grades seven and eight, approximately 11% of initial enrolees exit theprogram. This change is largely driven by the fact that in most school districts in British Columbia,grade eight marks the start of secondary school.33 In order to see this, approximately 20% of my sampleresides in school districts where secondary school begins in grade nine.34 The survival graph limited tostudents in these districts is shown in figure 2.3. We now observe a large exit rate between grades eightand nine and the exit rate between grades seven and eight is now where the hazard rate is closer to itslowest point.35The patterns of attrition observed in the previous graphs are consistent with the notion that parentsare learning about the match quality between their child and the program. 
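For reference, a survival curve of the kind shown in figures 2.2 and 2.3 can be computed directly from a student-grade panel. The minimal sketch below uses hypothetical column names and ignores right-censoring, so it is cruder than a full Kaplan Meier estimator:

    import pandas as pd

    # Hypothetical long panel of initial FI enrolees: one row per student-grade,
    # with in_fi = 1 while the child is still enrolled in French Immersion.
    panel = pd.read_csv("fi_panel.csv")   # columns: student_id, grade, in_fi

    # Grade of exit for each initial enrolee (absent if never observed leaving).
    exit_grade = (panel[panel["in_fi"] == 0]
                  .groupby("student_id")["grade"].min())

    n = panel["student_id"].nunique()
    grades = sorted(panel["grade"].unique())

    # Share of initial enrolees still in FI at the start of each grade, and the
    # implied hazard (exit rate) between consecutive grades.
    surv = pd.Series({g: 1 - (exit_grade <= g).sum() / n for g in grades})
    hazard = -surv.diff().fillna(0) / surv.shift(1).fillna(1)
    print(pd.DataFrame({"surviving": surv, "hazard": hazard}))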
Since parents have less information about their child at the start of the program, they will be more responsive to new information in earlier grades. This evidence, however, is far from conclusive, as other phenomena — unrelated to learning — could also generate these attrition rates. For example, one possibility is that in the earlier grades, many parents are close to indifferent with regards to having their child enrolled in FI; thus, even small negative idiosyncratic shocks will induce these parents to remove their child from FI. However, as parents with a low preference for FI exit the program, the remaining parents all have a high preference for the program. For these parents it takes a larger negative shock for them to remove their child from the program, and as a result, attrition declines compared to earlier grades. Given this and other potential non-learning explanations for the pattern observed in figure 2.2, I now focus on descriptives related to factors correlated with FI attrition.

32 (continued) in English and English is not formally introduced into the curriculum until grades 3 or 4, then the FI children need time before they can fully catch up to their counterparts in the standard program (Genesee, 2007).

33 For example, parents might view the payoffs to the FI program in secondary school differently than they do when their child is in elementary school.

34 These are the school districts of Greater Victoria, Coquitlam and Abbotsford.

35 Figure 2.3 also shows a much larger drop between grades 5 and 6 than that seen in figure 2.2. This drop is caused by the fact that districts where secondary school begins at grade 9 also have separate schools for elementary (grades K–5) and middle school (grades 6–8). In contrast, schools in the remaining school districts will offer all grades from kindergarten all the way up to grade seven.

In order to ascertain whether parents are responding to their child's performance when making their program decisions, I would ideally estimate hazard regressions of the following form

    Exit_{i(t+1)(g+1)} = G(X_t′ β + θ I_{itg} + δ_{st} + ε_{i(t+1)(g+1)})    (2.1)

where G(.) is the logistic function, Exit_{i(t+1)(g+1)} is an indicator variable equal to 1 if student i exited the FI program between grade g in year t and grade g+1 in year t+1, X_t is a set of (not necessarily) time-varying observable characteristics, δ_{st} are school-year fixed effects, and I_{itg} represents all information received by parents about their child's performance in school during grade g in year t. We can think of I_{itg} as an index of all the information received by parents in a given school year, such as their child's report card, interactions with their child's teacher or their child's homework. Unfortunately, I do not observe I_{itg} directly and the only outcome measures I have are the students' standardized test scores in grades four and seven. Therefore, I instead run the following regression

    Exit_{i(t+1)(g+1)} = G(X_t′ β + θ TS_{itg} + δ_{st} + ε_{i(t+1)(g+1)})

where TS_{itg} is the student's own average test score in grade g in year t. Here, I continue to assume that parents receive all of the information I mentioned above and make an additional assumption that the standardized test scores proxy for the information received by parents in a given school year.36 (Reassuringly, as shown in figure 2.4, there is a clear negative relationship between program exit and the standardized test scores.) The sample in all regressions is limited to students who are enrolled in FI at the start of grade g for g = 4, 7.
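As a concrete illustration of this specification, a minimal sketch using hypothetical variable names is given below; the handful of controls shown stands in for the full set of observable characteristics:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical analysis file: one row per child enrolled in FI at the start of
    # grade g (g = 4 or 7), with exit_next = 1 if the child left FI before grade g+1.
    df = pd.read_csv("fi_hazard_sample.csv")

    # Discrete-time logit hazard of FI exit on the average test score, a few
    # observable controls and school-year fixed effects (C(...) builds dummies).
    res = smf.logit("exit_next ~ avg_score + female + esl + C(school_year)", data=df).fit()

    # Transformed odds ratio used in the text: -100 * (exp(theta) - 1) is the
    # percentage change in the odds of exit from a 1 sd higher average score.
    theta = res.params["avg_score"]
    print(f"1 sd higher score: {-100 * (np.exp(theta) - 1):.1f}% change in the odds of FI exit")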
For ease of interpretation, all results displayed are transformed odds ratios: (e^θ − 1). These odds ratios can be interpreted analogously to elasticities; a 1σ increase in the test score reduces the probability of FI exit by −100 · (e^θ − 1)%.

Table 2.2 shows the results of the hazard regressions from equation (2.1). Columns 1–3 of table 2.2 show that the relationship between the test score and program exit is quite consistent even as we add in additional controls. On average, a 1σ increase in the average test score is associated with a decrease in the probability of program exit of 27–29%. In columns 4 and 5, I introduce additional fixed effects into the model in the form of school fixed effects and school-year fixed effects, respectively.37 The purpose of these specifications is to make sure that the correlation between test scores and program exit is not being driven by the school attended or shocks in a given school year (e.g. the effect of a teacher or peer effects). Even with these additional fixed effects, we continue to observe a correlation between test scores and program exit similar to that observed in columns 1–3.38 All of these results remain suggestive of parents responding to information about their child's performance when making program enrolment decisions for the following year.

36 I am not saying that parents are responding to the Foundation Skills Assessment tests directly, but that the tests are correlated with a child's overall performance in a given year. In Appendix B, I present evidence that parents do not respond directly to the Foundation Skills Assessment tests.

By taking the average across the Foundation Skills Assessment components, I might be suppressing substantial heterogeneity in terms of the responses of parents across different subject matter. For example, one concern might be that parents of children in FI are more sensitive to their child's English language ability than their math skills. However, this belief is not supported by the data. I test this hypothesis by running separate regressions for the child's math and ELA scores. The results are shown in Appendix table B.1. I find a similar correlation with FI exit across each of the components. For example, in columns 3 and 6, a 1σ increase in the test scores for math and ELA is correlated with a 31% and 32% reduction in the probability of FI exit, respectively. Furthermore, running a regression with both the math and ELA scores included finds remarkably similar results for both components. A 1σ increase in either the test score for math or reading/writing is correlated with a 16–23% reduction in the probability of FI exit. Testing the equality of the coefficients in every specification fails to reject the null hypothesis that the two coefficients are the same.

If the values in table 2.2 reflect learning on the part of the parents, then we should see a smaller (in magnitude) effect in grade seven. The reason is that parents of children in grade seven have more information about their child and thus should be less responsive to any new information. In order to test this hypothesis, I modify equation (2.1) to include an interaction term between a dummy variable for grades 4 or 7 and the standardized test score:

    Exit_{i(t+1)(g+1)} = G(X_t′ β + θ TS_{itg}·1(grade = 4) + π TS_{itg}·1(grade = 7) + δ_{st} + ε_{i(t+1)(g+1)})    (2.2)

Table 2.3 displays the associated odds ratios for both grades four and seven as well as the test of the hypothesis that the two coefficients are equal (the row labelled "P-Value").
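A corresponding sketch of the grade-interaction specification in equation (2.2), again with hypothetical variable names, including the transformed odds ratios and the test that the two slopes are equal:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf
    from scipy import stats

    # Hypothetical pooled sample of FI students observed in grades 4 and 7.
    df = pd.read_csv("fi_hazard_sample.csv")
    df["g4"] = (df["grade"] == 4).astype(int)
    df["g7"] = (df["grade"] == 7).astype(int)

    # Equation (2.2): separate test-score slopes for grade 4 and grade 7 observations.
    res = smf.logit("exit_next ~ avg_score:g4 + avg_score:g7 + female + esl + C(school_year)",
                    data=df).fit()

    # Transformed odds ratios, -100 * (exp(coef) - 1), as reported in table 2.3.
    for name in ["avg_score:g4", "avg_score:g7"]:
        print(name, round(-100 * (np.exp(res.params[name]) - 1), 1), "% reduction in exit odds")

    # Test of equal slopes in the two grades (the row labelled "P-Value").
    b, V = res.params, res.cov_params()
    diff = b["avg_score:g4"] - b["avg_score:g7"]
    se = np.sqrt(V.loc["avg_score:g4", "avg_score:g4"] + V.loc["avg_score:g7", "avg_score:g7"]
                 - 2 * V.loc["avg_score:g4", "avg_score:g7"])
    print("p-value:", 2 * (1 - stats.norm.cdf(abs(diff) / se)))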
I find that a 1σ increase in the average test score is associated with a reduction in the probability of FI exit by approximately 44–51% and 11–22% in grades four and seven, respectively. As before, this result is robust to a variety of specifications. Across all specifications, the log-odds ratio in grade four is significantly higher than the corresponding value in grade seven with a p-value < 0.001. All of these results suggest that — consistent with the learning hypothesis — parents are more responsive to information received in grade four than in grade seven.

37 Note that the logistic regression automatically drops all school years in which either every student attritted or none of them did.

38 In order to further test the robustness of this result, I also re-run the specification in table 2.2 while including an interaction term between TS and a dummy variable equal to 1 if the number of FI students in the school in a given grade is less than 20. The idea is that we not only want to look within grades, but within classrooms as well. In the data, I do not know to which classrooms children were assigned. However, by limiting the sample to schools with fewer than 20 children, it is highly likely that there was only 1 class in that year-grade. The coefficient on this interaction term was close to 0 and insignificant.

The above results suggest parents are learning about the match quality between their child and the FI program. If this is indeed the case, then we should also observe that students who exit the FI program subsequently improve. The formal model in the next section addresses this question in greater detail, but for now I present some descriptive results. For each child in grade four, I compute the value-added difference between their grade 7 and grade 4 test scores, VA = TS_7 − TS_4. In order to purge the value of VA of any "regression to the mean" effect, I follow the methodology of Rivkin et al. (2005) and calculate a standardized value-added measure.39 Using this measure, I calculate the average performance change of children exiting FI after grade four. The results are shown in table 2.4.40 The first row of panel A compares the average performance of the FI leavers to the remainder of the population (which includes both children who remained in FI and children who never enrolled in the first place). Children exiting FI have an average standardized VA score of 0.14σ compared to -0.001σ for the remainder of the sample (significant at the one percent level). This suggests the students who exit the FI program do experience an improvement in achievement after leaving.41 Panel B of table 2.4 performs a similar analysis to panel A, except now I compare students who switched out of FI in grade four to those students who remained in FI in grade five. The results in panel B are qualitatively similar to those in panel A, except now the differences are not as large (and also less precise because of smaller sample sizes). We continue to observe that, overall, switchers experience a higher value added than stayers (0.14σ versus 0.06σ, with the difference significant at the ten percent level).

The results presented in this section are consistent with the claim that not only are parents quite responsive to their child's performance in school, but also that parents are learning and are more sensitive to information received earlier on. However, these estimates are also quite limited.
First, the samplewas restricted to those children only in FI in grades four and seven, thus ignoring all students who hadalready left the program. Second, the estimates did not account for any of the existing informationalready available to parents. Finally, these estimates can not tell us anything about how parents wouldrespond to more or less information (or any other counterfactual scenario). In order to account forthese issues and further explore deeper questions related to information and uncertainty, I now turn to astructural model of program choice and student achievement.39The standardized value added measure is computed as follows. I first computed the difference between the grade seventest score and grade four test score, VAi = T Si7−T Si4, for all grade four children (not just those in FI). Next, I split the gradefour test score into 40 different bins, B j, and standardize the value added measure based on the mean and standard deviationof VA in each bin.40In table 2.4, I use a residual standardized value added measure derived from a regression of the standardized value addedonto grade four child characteristics. I do this in order to make sure I am not picking up the effects of characteristics that mightbe correlated with program exit and student achievement.41It is also important to remember that this result is not saying that most children would have been better off exiting the FIprogram.162.4 Model2.4 ModelThe model used in this paper combines a dynamic discrete choice model with a model of student out-comes. The key source of uncertainty in the model is how a child will perform academically in FIrelative to not being enrolled in FI, which is unknown to parents at the initial point of enrolment. Eachyear a child is enrolled in the FI program, parents receive information about this unknown ability. Wecan think of this information as encompassing multiple sources such as a child’s report card/schoolwork and interactions with a child’s teacher. Using this new information, parents update their beliefsas to how their child will perform in FI and next period decide whether or not to keep their child in theprogram. The model focuses on decisions of parents from kindergarten all the way up to the start ofsecondary school.2.4.1 SetupParent’s ProblemAt the beginning of each year, parents make educational choices for their children. The choice setavailable to parents is C = {0,1} where C = 1 refers to a choice of FI and C = 0 implies a child notenrolled in FI. The parents’ objective is to maximize total expected utilityMax{Cit∈{0,1})Tt=0{EAi,Ct[T∑t=0β tut(Cit ;Zit ,Ai,t,Cit )]}(2.3)where Ai,t,Ct refers to a child’s level of ability (which potentially depends on parents’ enrolment choices),Zit are observable characteristics, and β is the discount rate.42 The term uit(.) is the per-period utilityfunction and is given byuit(Cit ;Zit ,Ai,Cit )≡ ZitδC−SCt +αAi,t,Cit +χi,Cit ,t (2.4)where SCt refers to a variety of switching costs and χi,Cit ,t represents unobserved shocks to utility. Inorder to allow for unobserved preferences for a given program choice, I make the following parametricassumption on χ:χi,Cit ,t = θi1(Cit = 1)+ξi,Cit ,twhere θi is an unobserved (to the econometrician) preference for FI and ξi,Cit ,t is a program-specificidiosyncratic shock drawn from a Generalized Extreme Value Distribution, where the variance of theseshocks depends on the current grade. 
For all grades t ≥ 1, I assume that the utility shocks follow astandard Gumbel distribution (that is, they have a variance of pi26 ). For grade t = 0, I allow the varianceto differ and treat it as a parameter to be estimated by the model, τ pi26 . The purpose of allowing τ tovary in period 0 is to allow the shocks to have a different variance for parents’ initial enrolment decision42I follow the previous literature by setting the discount rate, β , equal to 0.95. All results are robust to using a discount rateof 0.9 as well.172.4 Modelas opposed to their decision to keep their child enrolled in a given program. Alternatively, since thenormalization of the error variance in the remaining periods is required since logistic regressions haveno inherent scale, this assumption can also be viewed as a way to allow the independent variables seenin equation (2.4) to have an impact on program choice in the initial period that is greater or lower inmagnitude compared to the remaining periods.The parents’ choice set in a given period follows directly from the institutional rules discussed insection 2.2. In each grade, parents are free to enrol their child in a non-FI school; however, parents mayonly enrol their child in FI in grades K or 1. For all grades after grade one, only parents of childrenalready enrolled in the FI program can continue on in FI.Ct ∈={0,1} if t ≤ 1 Or (Ct−1 = 1 for t ≥ 2)0 if t−1 = 0 And t ≥ 2The model contains two types of switching costs, SC10 and SC101 which represent the costs of switchingout of and into FI, respectively. Note that the latter switching cost only applies to children switchinginto FI in grade one as opposed to kindergarten. There are many possible interpretations of these costs.For example, it could be that switching schools requires parents to make a significant investment oftheir time or that children enrolled in school make friends and thus do not wish to leave the program orschool.Test ScoresIn addition to program choices, I also have data on student achievement in the form of the children’sgrades four and seven standardized test scores. As in section 2.3, the term “test score” here refers to theaverage of the each child’s scores across the three Foundation Skills Assessment components. I assumethat a child’s test score, T S, is given by the following equation:T Si,t,Cit = Ai,t,Cit + εitwhere test score is equal to the child’s ability (which could depend on program of enrolment, Ct) plussome error term εit , such that εit ∼ N(0,σ2ε ).AbilityAbility, Ai,t,C, is defined asAi,t,Cit ≡ Xitβ 0+FIit(βFI +ηi)+Ψi (2.5)where Xit are observable characteristics, FIit is a dummy variable equal to 1 if a child is enrolled inFrench Immersion, βFI represents the average effect of the FI program and ηi is an FI-specific randomtreatment effect. Ψi represents baseline ability that is independent of program enrolment. Recall (fromsection 2.2) that studies have shown that several traits are correlated with success for both children in FI182.4 Modeland children in monolingual programs. Ψi is known to parents but unobserved by the econometricianwhile ηi is unknown to both the parents and the econometrician and is distributed according to η ∼N(0,σ20 ).While parents do not observe η , they are able to learn about it over time. At the end of each period,all parents receive a noisy measure of ability, It , defined as:Iit = Ai,t,Cit + vit (2.6)where vit ∼ N(0,σ2v ) represents the noisy component of the signal. 
If we substitute equation (2.5) into(2.6) and rearrange terms, then we can express equation (2.6) as saying that at the end of each period,parents of children in FI receive a noisy signal about ηSit = Iit −Xiβ 0−FIβFI−Ψi = ηi+ vit (2.7)The signal, Sit,, is used by parents to update their beliefs about the true value of η . How parents updateis a function of their beliefs over the distributions of ηi and vit ; therefore, I make the assumption thatparents’ beliefs about these distributions are given by η ∼ N(0,σ20,par) and vit ∼ N(0,σ2v,par). Sinceboth η and vit have normal distributions, there is a closed form solution for the updated distribution(Chamley, 2004). Given the parents’ initial prior and signals St = {S1,S2, ...,St}, we have thatη |St ∼ N (µt ,σ2t )µt = tWt∑tr=1 Srtσ2t = Wtσ2v,parWt =1t+σ2v,parσ20,par(2.8)where Wt above also represents the weight parents place on a new signal when updating E[η ]. Inparticular, given a prior distribution of η ∼ N(µt−1,σ2t ) and signal St we have thatη |{µt−1,σ2t−1,St} ∼ N (µt ,σ2t )µt = (1−Wt)µt−1+WtSt (2.9)Wt =1t+σ2v,parσ20,parDiscussion of Ability SetupEquation (2.5) makes important assumptions with regards to both what parents are learning about andhow enrolment in FI affects performance. In this section, I discuss both of these assumptions in greaterdetail.192.4 ModelThe key source of parental uncertainty in the model is their child’s specific impact of being enrolledin FI, represented by parents’ uncertainty over the parameter η . In practice, parents likely have uncer-tainty over several dimensions of ability (e.g. for both how a child will perform inside and out of theFI program). However, in a dynamic discrete choice framework like the one described in section 2.4.1,all that will matter to parents — and all the econometrician will be able to identify — is the differencebetween how a child performs in and out of the FI program. The setup of the model is designed toreflect this fact. Any richer model with additional sources of uncertainty will not be identified withoutadditional or stronger assumptions.43Next, by making η additively separable in ability, the model assumes that it is possible for anychild to be well suited for the FI program. Alternatively, one might believe only children with a highbaseline level of ability are likely to perform well in the program while children with a low baselineability are better off out of the FI program. However, there are several reasons to believe that this is notin fact the case. First, in section 2.2, I discussed how this belief is not consistent with the current FrenchImmersion research. Secondly, if it is the case that only children with strong cognitive skills succeedin the FI program, then a quantile regression of the average test score on FI enrolment should yieldcoefficients that are increasing over the quantiles. The results, shown in Appendix figure B.2, show thatnot only are the coefficients not increasing over the quantiles, but if anything there is a slight decrease.Finally, a world in which FI performance is increasing in ability has implications for the distributionof test scores. If parents were initially sorting on ability, then based on initial FI enrolment we wouldexpect to see a much larger percentage of low-performing non-FI children when compared to children inFI. 
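Stepping back to the learning rule in equations (2.8) and (2.9), the short sketch below traces how the posterior mean, posterior variance and weights W_t evolve as signals arrive; the variance values used are purely illustrative and are not the estimated values of σ^2_{0,par} and σ^2_{v,par}:

    import numpy as np

    def update_beliefs(signals, sigma2_prior, sigma2_noise):
        """Sequential normal updating of beliefs about eta, as in equations (2.8)-(2.9)."""
        mu = 0.0                                  # prior: eta ~ N(0, sigma2_prior)
        path = []
        for t, s in enumerate(signals, start=1):
            w = 1.0 / (t + sigma2_noise / sigma2_prior)   # weight W_t on the newest signal
            mu = (1 - w) * mu + w * s                     # posterior mean, equation (2.9)
            sigma2 = w * sigma2_noise                     # posterior variance, equation (2.8)
            path.append((t, w, mu, sigma2))
        return path

    # Illustrative values only: these are not the thesis estimates.
    rng = np.random.default_rng(0)
    eta_true, sigma2_0, sigma2_v = -0.3, 0.25, 0.50
    signals = eta_true + rng.normal(0.0, np.sqrt(sigma2_v), size=8)
    for t, w, mu, s2 in update_beliefs(signals, sigma2_0, sigma2_v):
        print(f"grade {t}: weight={w:.2f}  E[eta]={mu:.2f}  Var[eta]={s2:.3f}")

Under these illustrative variances the weight on a new signal falls from 0.33 in the first year to 0.10 by the eighth, the same qualitative decline that the estimated parameters generate in figure 2.8.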
However, looking at the bottom panel in figure 2.1, we see that the distributions are nearly identicalfor the grade four test scores and only slightly shifted to the right for the grade seven test scores ofthe initial FI children. In either case, there remains a very large percentage of low-performing studentsinitially enrolling (and remaining) in the FI program.Capacity ConstraintsThe model incorporates the limited capacity of the FI program. All parents that wish to enrol their childin FI are assigned some probability that their child actually gets in. This probability is the total numberof children who wish to enrol in FI in a given district-year-grade divided by the number of availablespaces. The former is estimated within the model while the latter is exogenous and is calculated asfollows: For every district-year-grade combination, I calculate the number of available FI spaces by firstimputing the total number of FI classes from the data and then multiplying the number of classes by theprovincial limit on the maximum number of students permitted per class.44 Appendix B discusses in43This setup also has implications for the assumption of risk-neutrality on the part of parents. Now, the choice to not enroltheir child in FI is “risk-free”. Any aversion on the part of the parents to enrol their child in the “risky” FI alternative willbe captured by estimates of the coefficients in Z, γˆ , (and most likely in the constant term). However, as discussed below,enrolling their child in FI provides parents with an option value with regard to FI or non-FI enrolment, which helps to mitigatethe riskiness of enrolling their child in FI.44The imputation is done according to the following formula. Let Rsty be the number of classrooms in school s in grade t inyear y, Enrol be FI enrolment and Mty be the provincially mandated maximum number of students permitted in a classroom.202.4 Modelgreater detail how these capacity constraints are incorporated into the estimation of the model. Capacityconstraints only apply to students who are entering FI for the first time. Children enrolled in FI in gradeK are guaranteed a spot in grade one.Unobserved HeterogeneityThe model as constructed allows for both an unobserved preference for FI, θi, and unobserved ability,Ψi. In order to allow for a correlation between these terms, I assume that each parent is characterizedby a finite type m = 1...M with probability pm. Each type consists of a taste for FI, θm, and unobservedability, Ψm. All types are known to the parents but unobserved to the econometrician.Solving the ModelThe model is easily solved using backward induction and Bellman equations. The key state variablesof the model are a child’s enrolment in the previous period, Ct−1, and the set of signals parents havereceived so far, St = {S1, ...,St}. For simplicity defineUt,(Ct ;St ,Ct−1)≡ZitδCt +θi,Ct −SC10 ∗ (1−Ct)+αE[Ai,t,Ct |St]if t > 1ZitδCt +θi,Ct −SC10 ∗ (1−Ct)∗Ct−1−SC01 ∗ (1−Ct−1)Ct +αE [Ait |St ] if t = 1ZitδCt +θi,Ct +αE [Ait ] if t = 0(2.10)This term represents parents’ direct grade t expected utility from choosing choice Ct given St and Ct−1.The Bellman equation in the final period is given byVT (CT−1,ST) =maxCT∈{0,1}{UT (CT ;ST )+βE(VT+1(CT ))+ξi,T,C}if CT−1 = 1UT (0)+βVT+1(0)+ξi,T,0 if CT−1 6= 1(2.11)where VT+1(.) refers to the continuation value of a given choice. I discuss estimating this function forperiod T +1 below. 
Equation (2.11) says that parents choose the option with the highest expected utility,provided that they have a choice.Next, I focus on the value functions for grades 2 ≤ t < T . It will be useful to first define the term,Vt+1(Ct ,St+1|St), which represents the expected value of the value function in grade t + 1 when theThe number of classrooms in a given school is calculated by:Rsty = Round(EnrolstyMty)∗MtyI chose to use the Round function in order to be more conservative in the number of FI spaces. It also produces percentcapacity values that are much more in line with anecdotal evidence. The drawback of using the round function is that itpotentially leads to cases where the actual total FI enrolment is greater than the number of imputed spots. This occurs inapproximately 6% of all district-grade-years. In these situations, I increased the capacity to match the actual total number ofFI children.212.4 Modelunobserved shocks ξi,t,+1,Ct have not yet been observed:45Vt+1(Ct,St+1|St) ≡ Eξi,t+1,C[Vt+1(Ct,St+1|St)](2.12)The Bellman equations for grades 2≤ t < T are given byVt(Ct−1,St) =maxCt∈{0,1}{Ut(Ct ;St)+βESt+1(Vt+1(Ct,St+1|St))+ξi,t,C}if Ct−1 = 1Zitδ 0+αE[Ait |St−1]+βESt+1(Vt+1(0,St+1|St))+ξi,t,0 if Ct−1 6= 1(2.13)Note that when predicting next period’s value functions, parents take expectations over both futureshocks and future signals. Appendix B describes how parents take expectations over the signal they willreceive next period.For grade t = 1, we have to take into account that all children are eligible to enrol in FI:V1(C0,,S1) = maxC1∈{0,1}{U1(C1;St ,C0)+βES2(V2(C1,S2|S1))+ξi,1,C}(2.14)The value function for the initial period, t = 0, is similar to equation (2.14). The key difference is thatparents must now take into account the fact that if they do not enrol their child in FI in grade K, theymight not get into FI in grade one even if they desire to do so (because of the capacity constraints). Thiswill complicate the expression for V1(0) — the expected utility in grade one conditional on not enrollingyour child in FI in grade K. In Appendix B, I describe how to adjust V1(0) in order to account for thecapacity constraints.Option Value of Learning and the Immersion ProgramIn the model parents do not know exactly how their child will perform in the immersion languageprogram and the only way parents can learn this information is by enrolling their child in the programand observing his or her performance. Since parents can remove their child from the program at anytime, there is value to parents from being able to try out and learn about their child’s match quality withthe program. If the match quality is really good, then parents likely choose to remain in the program;conversely, if the child is poorly suited for the immersion program, parents can switch their child to thestandard curriculum, limiting any downside risk.Formally, this option value enters into the parents’ utility function through the equation:ESt+1(Vt+1(1,St+1|St))The term inside the expectation, Vt(1,St+1|St), is both convex and increasing in St+1 (see proof in45An advantage of using logit errors is that the expectation in equation (2.12) has a closed form solution (see Appendix B).222.4 ModelAppendix B). 
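The recursion in equations (2.11) through (2.14) can be illustrated with a deliberately stripped-down two-choice problem. The sketch below holds the parent's belief about η fixed, treats exit as absorbing, ignores capacity constraints, and takes the expectation over the Gumbel shocks with the usual log-sum-exp formula (the closed form referred to in footnote 45); all parameter values are illustrative rather than the thesis estimates:

    import numpy as np

    GAMMA = 0.5772156649   # Euler-Mascheroni constant, the mean of a standard Gumbel shock

    def solve_backward(T, beta, flow_fi, flow_out, switch_cost, v_terminal):
        """Backward induction for a two-choice stopping problem: stay in FI or exit (absorbing)."""
        v_in = np.zeros(T + 2)     # expected value at grade t for a child currently in FI
        v_out = np.zeros(T + 2)    # expected value at grade t for a child already out of FI
        v_in[T + 1] = v_terminal   # continuation value of FI after the final period
        stay_prob = np.zeros(T + 1)
        for t in range(T, 0, -1):
            u_stay = flow_fi + beta * v_in[t + 1]
            u_exit = flow_out - switch_cost + beta * v_out[t + 1]
            # Expected maximum over the two Gumbel-shocked alternatives (log-sum-exp).
            v_in[t] = GAMMA + np.log(np.exp(u_stay) + np.exp(u_exit))
            v_out[t] = GAMMA + flow_out + beta * v_out[t + 1]
            stay_prob[t] = 1.0 / (1.0 + np.exp(u_exit - u_stay))   # logit choice probability
        return stay_prob

    # Illustrative run: a poor perceived match (low flow_fi) makes early exit likely.
    print(solve_backward(T=8, beta=0.95, flow_fi=-0.2, flow_out=0.0,
                         switch_cost=0.5, v_terminal=1.0)[1:])

The full model layers belief updating, switching costs in both directions, capacity constraints and the unobserved types on top of this same backward recursion.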
Using Jensen’s Inequality, this implies that46ESt+1(Vt+1(1,St+1|St))≥Vt+1(1,St) (2.15)Equation (2.15) says that the expected utility parents derive from knowing they will receive additionalsignals in the future, ESt+1(Vt+1(1,St+1|St)), is greater than the expected utility in the case parents willnot receive any new information, Vt+1(1,St).Period T +1 Value FunctionsIn most finite period models, the continuation value in the final period is assumed to be 0. In thismodel, this would involve setting VT+1(CT ) = 0. This would imply there is no benefit to continuingon in FI after period T. Given that some students continue on in this program up until grade 12, this islikely an unrealistic assumption. In practice I treat the value function as a parameter to be estimated:VT+1(CT = 1) = Vˆ . This methodology assumes that the continuation value of being in FI after periodT is constant for all students.47 The value of not continuing on in FI, VT+1(0), is normalized to theexpected value of the unobserved shock ξ .482.4.2 IdentificationIdentification of the model parameters comes from two primary sources. The first is the dynamic dis-crete choice model of program choices. Variation in choices is used to estimate the learning parameters(σ20,par,σ2v,par), parental preferences (α,δC), unobserved types ({θm, pm}), the utility shock error vari-ance (τ) and the continuation value (VT+1(1)). The second source is the two periods of standardizedtest scores which will help estimate: the coefficients in the ability equation(β 0,βFI), unobserved types({Ψm, pm}), variance on FI heterogeneity (σ20 ) and the variance on the measurement error (σ2ε ).One parameter that cannot be identified is the (actual) variance on the noise term in the signalprovided to parents (the variance of vit in equation (2.7)). I therefore normalize σ2v by making theassumption that Iit = T Sit . This assumption implies that σ2v = σ2ε . It is a special case of the more generalassumption that both T Sit and Iit are drawn from some joint distribution. The advantage of making thisassumption is that it allows the econometrician to observe two of the eight signals received by parents.46This follows from ESt+1(Vt+1(1,St+1|St))≥ Vt+1(1,E(St+1|St)) = Vt+1(1,{µt ,St}) ≥ Vt+1(1,St). Writing outE(St+1|St) = {µt ,St} is to say that, in expectation, the signal parents expect to receive in period t + 1 is equal to theirbelief of their child’s match quality.47I also estimated a version where the value of Vˆ depending on the parents’ type m, but that did not lead to any materialchanges in the model estimates.48An alternate method I tried is based on the innovation by Hotz and Miller (1993) that expected value functions can bewritten in terms of choice probabilities:log(1+ eUT+1(1)−UT+1(0))=−log(PT+1(FIT+1 = 0))and estimated outside of the model. However, I found this method consistently did not fit the data as well as the baseline modelpresented in section 2.4.232.4 ModelInstead of the identification of the learning parameters coming off unexplained residual variation in FIexit rates (discussed further below), these parameters are now partially estimated off observed variationin FI exit in response to observed variation in information. The disadvantage of this assumption is that itmight be subject to proxy error. One case where this assumption is violated is if parents respond to theFoundation Skills Assessment tests directly. This would imply that the tests represent signals in additionto the information parents usually receive. 
In Appendix B, I use a policy change in the administrationof the Foundation Skills Assessment to present evidence that suggests parents’ FI exit decisions are notresponding to the tests themselves. These results from Appendix B also suggest that proxy error is nota huge concern because while parents are not responding to the test scores directly, in section 2.3, wesaw that there are very large and significant correlations between the test scores and program exit. Thissuggests that the test scores do in fact have significant informational content as any measurement errorin the test scores would bias the coefficients in table (2.2) towards zero.Program Choice ParametersGiven an average signal, St , the coefficient on St in the program choice equation is tαWt . One of thekey identification problems is how do we distinguish between α — the coefficient on performance inparents’ per-period utility — and Wt — the weight parents place on the signals when updating theirbeliefs about η (recall from equation (2.8) that Wt is a function ofσ2v,parσ20,parand time). Intuitively, theselearning parameters are identified off the variation in exit rates which respond to new information overtime. For example, suppose parents are given the exact same signal in two different time periods, t, t ′,where t < t ′. If parents are more responsive to the information in grade t than in grade t ′, then thisindicates parents are placing a higher weight on the new signal in earlier periods (and as such σ2v,parσ20,parisvery low). If, however, parents are equally responsive to the information in the two time periods, thenthis suggests parents place a low, more equal weight on the information they receive over time.Formally, suppose that for some t 6= t ′, we knew the values of tαWt and t ′αWt ′ — how informationaffected choices in these two time periods. Taking the ratio of these two terms allows us to cancel outthe α term and we are left with an expression that is a function of known time values and σ2v,parσ20,par. Thus,we can calculate σ2v,parσ20,parand then use it to back out α . The proof in the case where none of the signals areobserved is similar. Instead of having values of αWt and αWt ′ , we have values of different variances,(t2α2W 2t var(St))and(t ′2α2W 2t ′ var(St ′)), where once again, we can isolate σ2v,parσ20,parand then back out α .Disentangling the variance of the parents’ prior distribution, σ20,par, separately from the ratioσ2v,parσ20,paris done through the implied option value of learning discussed above. Intuitively, this option valueis increasing in the variance of the parents’ prior distribution. If parents believe that there is a lot ofvariation in terms of how children are able to perform in an immersion program, then this increases theprobability that their child is either a very good or a very bad match for the FI program. As before, sinceparents have the option to remove their child from the program, the increase in the benefits to parentsfrom their child being a good match with the immersion program will outweigh the increase in the242.4 Modelcosts of being a bad match. Formally, this implies that∂ESt+1(Vt(Ct,St+1|St))∂σ20,par> 0.49 Thus, the identifyingvariation of the parents’ prior distribution comes from the model’s estimates for the implied option valueto parents of being able to learn their child’s comparative advantage in FI.50The estimation of the remaining parameters in the program choice problem is straightforward. 
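The ratio argument can be checked numerically. The sketch below generates the two reduced-form slopes tαW_t at grades four and seven from assumed values of α and σ^2_{v,par}/σ^2_{0,par}, and then inverts the ratio to recover both parameters:

    # Illustrative check of the identification argument: the reduced-form slope on the
    # average signal at grade t is c_t = t * alpha * W_t with W_t = 1 / (t + k), where
    # k = sigma2_v,par / sigma2_0,par. Two such slopes pin down both k and alpha.
    def slope(t, alpha, k):
        return t * alpha / (t + k)

    alpha_true, k_true = 1.0, 2.0          # assumed values, not the thesis estimates
    t1, t2 = 4, 7
    c1, c2 = slope(t1, alpha_true, k_true), slope(t2, alpha_true, k_true)

    # Invert the ratio of the two slopes to back out k, then use either slope for alpha.
    k_hat = t1 * t2 * (c2 - c1) / (c1 * t2 - c2 * t1)
    alpha_hat = c1 * (t1 + k_hat) / t1
    print(k_hat, alpha_hat)                # recovers 2.0 and 1.0

Running the snippet returns the assumed values, confirming that two slopes observed at different grades are enough to separate the weight ratio from α.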
Thecoefficients on the observable characteristics are identified off variation in FI exit rates in response toobserved variation in the relevant variable. Unobserved preferences for FI are modelled as a finitemixture of mass points (Heckman and Singer, 1984), which are extensively used in dynamic discretechoice models (e.g. Keane and Wolpin, 1997). They are identified off the distribution of unexplainedvariation in the FI entry/exit decisions. Note that we cannot identify all types as well as a constant in Z;therefore, I normalize the value of θ2 to be 0.Ability ParametersIdentification of the finite mixture of mass points, {Ψm, pm} once again comes from residual variationin test scores not explained by any of the observable characteristics (or the error term). As before, weneed to make the normalization Ψ2 = 0. For cases where FIt = 1, we can view the combination ofthe unobservables Ψim and ηi as a finite mixture of normals with means {Ψm} and common varianceσ20 . Identification of the parameters in the case of a finite mixture of normals is also well-established.It is important to note the role played by estimating the program choice equation and the test scoreequation jointly when it comes to estimating the population values of these parameters. If we onlylooked at the test score equation, our estimates of σ20 are solely based on children enrolled in the FIprogram in grades four or seven. However, it is unlikely that the distribution of η for these childrenequals the distribution of η in the entire population. By modelling the choice equation, we can backout the population distribution of η given the conditional distribution σ20|{FI} and the probabilities ofenrolment.51While the identification of the remaining ability parameters from the test score equation is straight-forward, there are concerns that must be addressed. The first is that in such analyses we often wish tocontrol for factors such as unobserved neighbourhood or school quality. Usually, this would take theform of neighbourhood or some alternative set of fixed effects. This option is not feasible in an analysissuch as this one because it would be too computationally burdensome; therefore, I take an alternative49See proof in Appendix B. This is analogous to standard results in finance that the option value of a stock is increasing thevariance of the stock price.50The parsimony of the model is what allows the identification of the parents’ prior distribution.51For example, suppose we have a conditional variance σ20|{FI}. Then using Bayes rule, we have that:σ20|{FI} =ˆx2 f (x|{FI})dx=ˆx2[P({FI}|x) 1σ0 φ( xσ0 )P{FI}]dxwhere both P({FI}|x) and P{FI} come from the program choice problem. Thus, the only remaining unknown parameter onthe right-hand side is the population standard deviation σ0252.4 Modelapproach. For every FI child in a given neighbourhood-year, I randomly select a non FI child in thesame neighbourhood-year. I define a neighbourhood by the first 3 digits of the child’s postal code, for-mally known as a Forward Sortation Area (FSA). This is a geographic area that on average containsapproximately 5,000 households or 18,000 persons.52 In section 2.7, I present the results of a versionof the model without this grouping approach and show that many of the baseline model’s predictionsremain.In addition, I include a variable in the program choice equation that is not in the test score equa-tion. 
The purpose of this variable is to ensure that the identification of the coefficients in βFI arenot coming solely from functional form assumptions. The variable I have chosen for this purpose isthe log of the ratio of the driving distance to the closest FI school to the driving distance of the clos-est non-FI school, rel_dist = ln( Closest FI SchoolClosest non-FI School). The exclusion restriction that must be satisfied isE [rel_dist ∗ εit |Xit ,Ψm,θm] = 0; that is, conditional on observable characteristics X, and unobservedtypes and the fact that I have grouped people at the neighbourhood level, the relative distance to thenearest FI school is uncorrelated with the error term in the test score equation. One way to think aboutthis assumption is that conditional upon having chosen a neighbourhood where to live, the exact locationparents end up at is a function of factors that relate more to the available supply of housing and otherexogenous characteristics. In chapter 3, I discuss this assumption in greater detail and show that the cor-relations between the observable characteristics and the relative distance variable are all close to zeroonce we look within a given neighbourhood. These balance tests provide further credibility regardingthe underlying exclusion restriction assumption.2.4.3 EstimationSample RestrictionsSeveral restrictions are imposed on the sample prior to estimation. First, in order to keep the institu-tional setting among students as constant as possible, I limit the sample to children in school districtswhere secondary school begins in grade 8.53 Next, since the key variables of the model are identifiedoff variation in attrition rates over time, I exclude all students who left the data prior to the start of grade3. This restriction ensures that I observe all children a minimum of four years. In total, this restrictionremoves approximately 5% of the sample. Children missing key variables (e.g. test scores) are also ex-cluded. This drops approximately 8% of the remaining students. The next step is exactly as is describedin section 2.4.2 above. For every remaining early-FI student, I randomly select a non-FI student in theexact same FSA-year (without replacement).54 Finally, I take a 50% sample for computation purposes.This leaves me with a sample of 15,448 children — 50% of whom initially enrol in the early-FI program— and 107,412 student-year observations.52Card, Dooley and Payne (2010) use FSAs in their analysis of Catholic schools in Ontario as a proxy for the characteristicsof the households in a school’s catchment area.53This step leads to approximately 25% of children being removed from the sample.54This grouping occurs based on the neighbourhood the child lives in during kindergarten.262.4 ModelLikelihoodEstimation of the model parameters is calculated using maximum simulated likelihood estimation (MSLE).I first calculate the likelihood assuming all unobserved terms are known to the econometrician and thenproceed to integrate them out. Appendix B contains much more detail on how the likelihood is calcu-lated than what is described below. With no unobserved heterogeneity, the likelihood for child i of typem given signals S is given by:55Lim({FI},{T S}|η ,St) =Ti∏t=0Limt(FIit ,T Sit |η ,St)=Ti∏t=0Limt(FIit |η ,St)Lismt(T Sit |FIit ,η) (2.16)The first term in equation (2.16), Limt(FIit |η ,St), is the probability of FI enrolment. 
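A rough sketch of these two construction steps, the relative-distance variable and the FSA-year matched sample, under hypothetical file and column names (how cells with too few non-FI children are handled is an assumption):

    import numpy as np
    import pandas as pd

    # Hypothetical kindergarten cross-section: one row per child, with the Forward
    # Sortation Area (first three postal-code characters), cohort year, early-FI
    # status and the two distances to the nearest schools.
    kids = pd.read_csv("kindergarten.csv")   # student_id, fsa, year, early_fi, dist_fi, dist_non_fi

    # Relative-distance variable excluded from the test score equation.
    kids["rel_dist"] = np.log(kids["dist_fi"] / kids["dist_non_fi"])

    # For every early-FI child, draw one non-FI child from the same FSA-year,
    # without replacement; cells with too few non-FI children are left short.
    matched = []
    for (fsa, year), cell in kids.groupby(["fsa", "year"]):
        fi = cell[cell["early_fi"] == 1]
        non_fi = cell[cell["early_fi"] == 0]
        n = min(len(fi), len(non_fi))
        matched.append(fi)
        matched.append(non_fi.sample(n=n, random_state=0))
    sample = pd.concat(matched, ignore_index=True)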
It is given byLimt(FIit |η ,St) =P(FIit = 1)FIt (1−P(FIit = 1))1−FIt if t ≤ 1 Or (FIt−1 = 1 for t ≥ 2)1 Otherwise (2.17)The second term in equation (2.16), Lismt(T Sit |FIit ,η), is the conditional PDF from the test score equa-tion:Lismt(T Sit |Cit ,η) =1σε φ(T Sit−(x′i,tβ1+FIt(βFI2 +ηi)+Ψi)σε)if t = 4,71 Otherwise(2.18)where φ(.) is the pdf of the standard normal distribution.The next step, given equation (2.16) is to integrate out the types. This is done by multiplying eachlikelihood by the type probabilities and then adding up the results:Li({FI},{T S}|η ,S) =M∑m=1pmLim({FI},{T S}|η ,S) (2.19)Finally, we need to integrate out the unobserved FI-specific random effect, η and the unobserved signalsSt . Because these are continuous variables, I integrate them out by taking draws from the joint distribu-tion of η and St . For each draw, d = 1...D, I calculate the likelihood as in equation (2.19), and then atthe end take the average over each of the draws.Li =1DD∑d=1Li({FI},{T S}|ηd ,Sd)55For notational simplicity I have suppressed that fact that all likelihoods are conditional upon {Z,X ,θm,Ψm} and theestimated parameters.272.5 ResultsOnce we have the individual likelihoods, we can calculate the total log-likelihood, L, given byL =N∑i=1log(Li)Our objective is to find the parameters that maximize the value of L.2.5 Results2.5.1 Model FitIn this section I examine the ability of the model to fit the data using both in-sample and out-of-sampletests. Figure 2.5 compares how well four different versions of the model are able to match moments inthe data related to FI enrolment and attrition. Two models are based on the setup above (with parentallearning and the number of types, M, set to M = 2,3). The remaining two models are estimated underthe assumption that there is no learning on the part of the parents (once again I estimate versions withM = 2,3).56 This exercise allows us to see how much adding in parental learning improves the overallfit of the model.57 Figure 2.5a examines how well the models match the initial enrolment rates into FI,showing both the true percentages from the sample and the simulated values. Actual initial enrolment(for the grouped estimation sample) in grades K and one are 46% and 8.8%, respectively. While eachmodel does a reasonable job at matching the initial enrolment rates, in general the overall fit of the modelis improved by adding in additional types and going from a model without learning to a model withlearning. For example, in the model with no learning and two types, the simulated initial enrolment ingrades K and one are 49% and 7.6%, respectively. The corresponding values in the model with learningand three types are 46% and 8.9% respectively.Figure 2.5b displays the fraction of children remaining in FI. As in figure 2.5a, the models do a de-cent job at matching the attrition rates. Typically, the estimates are usually a few percentage points offfrom the true value. This is despite the fact that none of the models contain any time-varying preferenceparameters in the parents’ per-period utility function. The one area that each model has the most dif-ficulty with is the transition between grades six and seven, during which each model overestimates thepercentage of students who exit FI. 
This result can be viewed as “smoothing” on the part of the models.Instead of matching the sharp drop in FI exit between grades seven and eight, the models “smooth” thisdrop out across the later grades.58 Once again, we see that the models which include parental learningproduce the best overall fit with the data, especially in the earlier grades. All of the results presented inthe remainder of this section are based on the model with parental learning and three types.5956In this version of the model there are no estimates of σ20,par or σ2v,par57In order to test the fit of the models, I simulate each model 75 times by drawing values for θm,Ψm,εit , ηi and for theprobability of enrolment when enrolment is constrained.58 This follows from the fact that if utility in the final period has a low continuation value, then this will also be reflected inthe grade seven utility because parents are forward looking and discount the future.59The log-likelihoods of the models with no learning and two and three types are -49,065 and -48,804, respectively. Thelog-likelihoods of the models with learning and two and three types are -48,832 and -48,504.282.5 ResultsNext, I increase the demands on the model by looking at how well the model is able match out-of-sample moments using data not used in estimating the parameters. Out-of-sample tests provide addi-tional credibility for the ability of the model to predict parents’ behaviour. First, in figure 2.6, I replicatefigure 2.4, by examining the correlation between test scores and program exit in grades four and seven.In the model, parents decompose their child’s performance into those factors that are FI dependent andthose that are not and it is only this latter group that impacts parents’ program choices. In reality, parentsmight not have such perfect information or act so rationally. Thus, this exercise can be viewed as a testof external fit because the relationship being examined is not entirely accounted for in the model. Figure2.6 shows that the model generates a negative relationship between the test score and FI exit that closelymirrors the actual values. The slopes of the two curves are very similar across a majority of the testscores. The one area that the simulated graph deviates from the true values is at the very end where theactual graph begins to flatten out and then increase while the simulated graph keeps decreasing.60The other out-of-sample test I preform is to: first estimate the model using students who enrolledin kindergarten between 1998 and 2003, use the estimated parameters to simulate the model for thosestudents who entered kindergarten between 2004 and 2009 and finally compare the actual and predictedmoments. Figure 2.7a presents both the observed and predicted initial enrolment and program attritionrates for this out-of-sample period. Looking at initial enrolment, the model almost perfectly matchesenrolment in grade K (46% actual versus 45.9% predicted) and slightly overshoots initial enrolmentin grade one (7.6% actual versus 9.5% predicted). Figure 2.7b shows that the model is also able tomatch most of the moments with regards to program attrition. Overall, the model does extremely wellat predicting observed outcomes in each of these out-of-sample tests.2.5.2 Model ParametersTable 2.5 presents the estimates of the program choice and test score parameters for the preferred modelspecification (the model with learning and three types). 
In the program choice equation, most of theestimated parameters are what we would expect given what we saw in Table 2.1. FI enrolment is highlycorrelated with gender, relative distance, special education status and speaking English at home. Theestimated parameters also suggest that psychic switching costs have a large effect on parents’ choices,even though there are no financial costs of entry or exit.61 Table 2.5 also shows the estimates of theparameters which govern how parents update their beliefs (σ20,par,σ2v,par) and how much parents preferto see their child perform well academically (α). The estimated value for α of 1.03 suggests parentshighly weight their child’s performance when making their program choices. The estimates of σ20,parand σ2v,par represent parents’ beliefs about the variances of the prior and signal noise respectively. Justby looking at these parameters themselves, it is difficult to understand what they imply in terms ofparents’ behaviour. In section 2.5.3 below, I discuss in greater detail the implications of these parameterestimates for both how parents’ update their beliefs and how information affects parents’ choices.60This is due to children exiting FI to enter private school. I discuss the issue of private school choice in section 2.7.61I discuss the role of switching costs in greater detail in the counterfactuals in the next section.292.5 ResultsAverage test scores are positively correlated with gender, home values in a dissemination area, per-centage of people in dissemination areas with university degrees or higher, and private school enrolmentand negatively correlated with special-education designations. While FI is found to have a positive over-all effect, the variance of the distribution of η is estimated to be 0.16σ . Thus, there is economicallymeaningful variation in how different children will perform in the FI program. The difference betweena child 1 standard deviation above the mean and a child 1 standard deviation below the mean is a verylarge test score decline of 0.8σ .Appendix table B.2 shows the results of the remaining parameters in the model including the val-ues of the unobserved types. Relative to the constant terms (which by virtue of the normalization of(θ2,Ψ2) = (0,0) represents the values for type 2 children), type 1 children have a lower preference forFI but a higher unobserved ability while type 3 children have roughly the same preference for FI, buta lower level of unobserved ability. Approximately 53% of the sample is estimated to be type 2, 29%are estimated to be type 3 and 18% are estimated to be type 1. Other parameters of note include anestimated value of continuing on in FI of 1 and the estimated variance of the utility shocks in grade Krelative to the normalized variance in the remaining years is 0.6. This latter value suggests that eachof the characteristics has a larger impact in the initial enrolment period than they do in the remainingperiods.2.5.3 Role of LearningInterpreting the Choice ParametersThe model’s estimates by themselves cannot tell us much about the roles uncertainty and new infor-mation play in parents’ decision making or their impact on student achievement. These parameters areinputs into other more complicated functions that govern parents’ behaviour. Here, I present the im-plications of these parameters for both how parents’ update their beliefs and how information affectsparents’ choices. 
First, I discuss what the estimated values of σ²_{0,par} and σ²_{v,par} tell us about how parents update their beliefs about their child's match quality with the program. In figure 2.8, I graph how the weight parents place on new information changes over time in accordance with equation (2.8) (in figure 2.8, this is the graph with the solid line), with the remaining weight placed on the prior expected value. These weights range from 0.93 in grade one and decrease to 0.12 by grades seven and eight. This implies that the weight parents place on new information in earlier grades is as much as seven times larger than in later grades.

Next, I explore how information given to parents impacts their program choice decisions. Simulating the model 75 times, I divide the parents' signal, η + ε_{it}, into 10 deciles, and then for each grade-decile combination, I take the average FI exit grade. Figure 2.9 shows the results for FI children in grades K and four. In order to make the graphs more comparable, I de-mean each of the averages; therefore, the y-axis should be interpreted as the difference between the average exit period for a given decile and the average exit period among all children in FI in the given grade. Information impacts parents' choices, and information received earlier on has a larger impact on program choices than information received in later grades. Parents of children enrolled in FI in grade K that receive a signal in the 5th decile remain in the program an average of 4 years longer than parents that receive a signal in the bottom decile. For parents of children enrolled in FI in grade four, that same difference is at most 0.5 years.[62]

An alternative method of examining the relationship between information and choice is to look at the relationship between the underlying ability η and the average exit grade among all FI enrolees. In appendix figure B.3, I split η into 10 deciles and look at the average exit grade across each decile. The results are such that parents of children with a value of η in the 1st, 5th and 10th decile spend on average 3.7 years, 6.5 years and 7.6 years in the program, respectively.

[62] The corresponding values, focussing on the difference in years remaining for signals in the highest decile and the fifth decile, are 1 year and 0.4 years, respectively.
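As a concrete illustration of how the weights in figure 2.8 follow from the variance parameters, the short sketch below applies standard normal-normal updating, in which beliefs evolve as μ_t = (1 − W_t)μ_{t−1} + W_t·S_t and the weight on each new signal falls as the posterior variance shrinks. This is a minimal sketch, not the estimation code, and it simply re-derives the two curves plotted in figure 2.8 from the parameter estimates reported in table 2.5.

```python
import numpy as np

def learning_weights(sigma0_sq, sigma_v_sq, n_grades=8):
    """Weights W_t placed on each successive signal under normal-normal updating.

    Beliefs evolve as mu_t = (1 - W_t) * mu_{t-1} + W_t * S_t, where
    W_t = var_{t-1} / (var_{t-1} + sigma_v_sq) and the posterior variance
    shrinks after every observed signal.
    """
    weights, var = [], sigma0_sq
    for _ in range(n_grades):
        w = var / (var + sigma_v_sq)
        weights.append(w)
        var = var * sigma_v_sq / (var + sigma_v_sq)  # posterior variance update
    return np.array(weights)

# Parents' subjective beliefs (table 2.5): sigma_{0,par} = 0.81, sigma_{v,par} = 0.23
print(np.round(learning_weights(0.81**2, 0.23**2), 2))
# -> [0.93 0.48 0.32 0.25 0.2 0.16 0.14 0.12], the solid line in figure 2.8

# Weights implied by the estimated variances: sigma_0 = 0.4, sigma_eps = 0.617
print(np.round(learning_weights(0.4**2, 0.617**2), 2))
# -> [0.3 0.23 0.19 0.16 0.14 0.12 0.11 0.1], the dashed line in figure 2.8
```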
No-Information Simulations

In this section, I further explore the role that new information plays in parents' program choices and in student achievement by asking what would happen if we took away all of the post-enrolment information that parents receive. This involves simulating the model under the assumption that parents receive no signals, which takes away from parents the information they use to update their beliefs about their child's match quality. This implies that μ_t = 0 for all t. The results of this simulation are shown in figures 2.10 and 2.11. First, as shown in the bottom of figure 2.10, removing parents' ability to update causes a drop in initial enrolment (from 50% to 32%). This drop in enrolment is caused by the decrease in the option value of enrolling in FI that arises when parents know they will not receive any new information.[63] Since initial enrolment drops by 36%, this suggests that parents' knowledge that in the future they will be able to observe their child's performance and switch them out if need be plays an important role in their initial program choice. This would not be the case if, for example, parents' enrolment decisions were purely a function of types or preferences beyond program performance.

Figure 2.10 also shows that taking away parents' information leads to large drops in program attrition. These decreases in attrition are concentrated in the early grades of the program. By the beginning of grade 5, only 5.5% of children exit the program in the no-information simulation compared to 19% in the baseline case — a drop of 70%. For the remaining grades (six and up), an additional 25% of initial enrolees leave the program in the no-information scenario versus 21% in the baseline model; thus, only a small fraction of the "missing" attrition in the earlier grades is made up for in the later grades. Furthermore, the fact that the percentages in the later grades are so similar suggests that attrition in these grades has more to do with the decreasing value from remaining in FI (and idiosyncratic shocks) than with parental learning.

Taking away information given to parents also affects student achievement if children with a poor match quality — who would have otherwise exited the program — remain. Figure 2.11 graphs the difference between the average predicted test score in the no-information scenario and the baseline model for all children in every grade from K through 8. These predicted outcomes are based on the program of enrolment and the parameters estimated in table 2.5 using the grade four and seven test scores.[64] Predicting outcomes across all grades shows how differences between the counterfactual and baseline outcomes change over time. For example, figure 2.11 clearly shows a widening gap between the outcomes in the no-information scenario and the baseline model. These differences range from -0.01σ in grade one to -0.045σ in grade eight. Intuitively, the predicted outcomes decrease because parents who would have otherwise chosen to remove their child from the FI program (due to poor match quality) now choose to remain. The outcome gap widens over time due to the cumulative effect of the children who are now remaining in FI.

Figure 2.11 is based on test scores for the entire estimation sample; however, the no-information counterfactual will only affect those children predicted to enrol in the program in the baseline model. These are the children for whom parents have the ability to learn about the match quality between their child and the program. In this sense, the estimates above are analogous to intent-to-treat (ITT) effects. In order to recover the average treatment effect on the treated (ATT), Appendix figure B.5 reproduces figure 2.11, but now limiting the sample to children originally enrolled in FI. Here we see larger effects of removing information, ranging from -0.03σ in grade one to -0.09σ in grades seven and eight.

[63] The difference between the baseline and no-information simulations consists of both a direct and an indirect effect of learning. Given parents' utility as described in the Bellman equation (2.13), the direct effect of learning is the expectation of ability in period t, E[A|S_{t−1}] = μ_t. The indirect effect of learning operates through the option value in the expected value function tomorrow, E_{S_{t+1}}[V_t(C_t, S_{t+1}) | S_t].
[64] It is possible to predict outcomes for all grades because the relevant observable characteristics are known in every period. Appendix figure B.4 graphs the predicted test scores for each grade.
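The ITT/ATT distinction here is purely a matter of which children the simulated score differences are averaged over. A minimal sketch of that calculation is below; the arrays and the function name are hypothetical, standing in for output from the model's simulations.

```python
import numpy as np

def info_effects(score_no_info, score_baseline, enrolled_fi_baseline):
    """Average simulated test-score gaps between the no-information
    counterfactual and the baseline model.

    ITT-style: averaged over every child in the estimation sample.
    ATT-style: averaged only over children enrolled in FI in the baseline,
    the only group whose parents had anything to learn about.
    """
    gap = np.asarray(score_no_info) - np.asarray(score_baseline)
    in_fi = np.asarray(enrolled_fi_baseline, dtype=bool)
    return gap.mean(), gap[in_fi].mean()

# Toy example: five children, two of whom enrol in FI in the baseline simulation.
itt, att = info_effects([0.1, -0.2, 0.0, -0.1, 0.3],
                        [0.1, -0.1, 0.0, 0.0, 0.3],
                        [False, True, False, True, False])
print(round(itt, 3), round(att, 3))  # the ATT-style average is larger in magnitude here
```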
A potential concern with the no-information simulation above is that changes in the composition of FI students (caused by the initial drop in enrolment) are driving the attrition and achievement results. This could occur if, for example, removing the option value of learning removes those children from FI whose parents are close to the utility threshold of FI enrolment. These parents are also likely to be more sensitive to poor information and have a much higher likelihood of attrition. In order to address this possibility, I run an alternative "no-information" simulation based on Stinebrickner and Stinebrickner (2014a). In this alternative simulation, I set parents' beliefs about their child's ability in period t equal to their prior beliefs (i.e. μ_t = 0); however, parents still act as if they will receive future signals. The logic behind this simulation is that although parents are not learning, they are acting as if they will learn in the future. This simulation results in an initial enrolment that is nearly identical to that of the baseline model, but afterwards the direct effect of learning is removed. The results for this simulation are shown in figures 2.10–2.11.[65] Despite there being no change in initial enrolment, the survival function in the alternative no-information simulation is nearly identical to that in the previous no-information simulation. Thus, it is not the change in the composition of enrolees that is driving the attrition results. Figure 2.11 (and Appendix figure B.5) also shows similar achievement results across the two simulations.

In summary, the above results show the important roles played by learning and the information parents receive after the initial enrolment decision. This new information is what allows parents to overcome initial schooling choices that turned out to be poor fits with their children. This leads to better sorting of children to schooling choices and increases in academic achievement (estimated to be 0.09σ by the start of secondary school). The results also show that parents are much more responsive to information in earlier grades and therefore act quickly on information received. Learning is a much larger determinant of program attrition in grades 1–5 than in grades 6–8. Finally, the results also show that parents highly value the opportunity to learn in the future. Without the ability to update, 36% of parents no longer elect to enrol their child in the FI program.

[65] While grade K enrolment is identical across the baseline and alternative no-information simulations, there will be minor differences in grade 1 enrolment since the lack of attrition in the latter case reduces the number of available spaces.
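The two no-information exercises differ only in what is shut down: in the first, parents receive no signals and know it, so both the belief path and the option value of future learning disappear; in the alternative, beliefs are frozen at the prior (μ_t = 0) but parents still expect future signals, leaving initial enrolment essentially unchanged. The following is a schematic sketch of the belief path under the three regimes, not the model itself; in the full model these beliefs also feed into the Bellman equation, which is where the two counterfactuals diverge.

```python
import numpy as np

def belief_path(signals, weights, mode="baseline"):
    """Posterior mean of FI match quality, mu_t, under three regimes.

    mode = "baseline":    Bayesian updating on every observed signal.
    mode = "no_info":     no signals at all; mu_t stays at the prior (0) and,
                          in the full model, the option value of learning is
                          also removed from the value function.
    mode = "alt_no_info": beliefs frozen at the prior (mu_t = 0), but parents
                          still expect to learn, so the option value (and hence
                          initial enrolment) is essentially unchanged.
    """
    mu, path = 0.0, []
    for s, w in zip(signals, weights):
        if mode == "baseline":
            mu = (1 - w) * mu + w * s   # equation-(2.8)-style update
        else:
            mu = 0.0                    # both counterfactuals hold mu_t at the prior
        path.append(mu)
    return np.array(path)

weights = [0.93, 0.48, 0.32, 0.25, 0.20, 0.16, 0.14, 0.12]
signals = [-0.6, -0.4, -0.5, -0.3, -0.4, -0.2, -0.3, -0.1]  # a child with poor signals
print(np.round(belief_path(signals, weights, "baseline"), 2))
print(belief_path(signals, weights, "no_info"))
```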
2.6 Additional Counterfactuals

2.6.1 Post-Enrolment Interventions

The results in the previous section are primarily based on comparing parental learning to a situation in which parents do not learn at all. In this section, I explore additional important issues surrounding parents' post-enrolment behaviour. First, it could be the case that parents are placing too high or too low a weight on the new information they receive. For example, the weight parents place on new information in grades K and one is 0.93 and 0.5, respectively. However, based on the actual values for the variances σ²_ε and σ²_0 (seen in table 2.5), these weights would be 0.3 and 0.23, respectively. These values — as well as those for the remaining periods — are shown in the dashed line in figure 2.8 and suggest that parents are overweighting the information they receive early on. More generally, this relates to the question of how parents' post-enrolment actions compare to an optimal scenario where parents are provided with perfect information. An additional question concerns the role of the psychic switching costs and how much they are preventing parents from choosing alternative schooling options for their child. For some parents, even if they learn that their child is a poor fit in the immersion program, they are prevented from leaving or choose to leave at a later grade because of the high cost of exit. Thus, to some degree, switching costs hinder the impact of learning.

Motivated by these issues, I run three counterfactual simulations. The first counterfactual (i) involves providing parents of children enrolled in the program with additional information about their child's program-specific match quality. Formally, I run this simulation assuming that parents perfectly learn the value of η after the first year their child is enrolled in the program. In the second simulation (ii), I reduce the 'cost' to parents of switching their child out of the program by cutting the estimated switching cost, SC, in half.[66] In the third simulation (iii), I reduce the cost of switching, but only for those children with below-average performance.[67] Formally, I set SC_t = 0.5·SC if TS_{t−1} < 0. To make the counterfactual a little more tractable, I further assume that parents do not take the possibility of the lower switching cost into account when making their program choices. In contrast to counterfactual (ii), where the reduction in SC was universal and permanent, in counterfactual (iii) the reduction is targeted and temporary. In each of these three counterfactuals, I hold initial enrolment in the FI program constant (i.e. identical to the baseline model) in order to focus on post-enrolment behaviour.

[66] In this simulation I also make a corresponding change to the estimated continuation value, FV = V_{T+1}(1). Under the assumption that the continuation value has the functional form FV = ln(e^x + e^{−SC}), it therefore follows that ∂FV/∂SC = −e^{−FV−SC}.
[67] This counterfactual relates to various policies (not necessarily specific to the FI program) that are designed to make it easier for parents of certain students (typically those in low-performing schools) to enrol their child in an alternative choice. For example, today many school districts are required to provide all parents of children in struggling schools the opportunity to transfer to higher-performing schools (with the district also paying for the child's transportation to the new school).
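The three interventions amount to small changes to two model objects: the belief about η and the per-period cost of leaving the program. The sketch below records those changes in one place. It is illustrative only: SC = 6.49 is the estimated exit cost from table 2.5, the continuation-value form follows the assumption stated in footnote 66, and the function names are mine rather than the model's.

```python
import math

SC_EXIT = 6.49  # estimated utility cost of switching out of FI (table 2.5)

def switching_cost(prior_score, counterfactual, sc=SC_EXIT):
    """Per-period cost of leaving FI under the three post-enrolment interventions.

    (i)   "revealed_info": eta is revealed after the first year; SC is unchanged.
    (ii)  "half_sc":       permanent, universal 50% cut in SC.
    (iii) "targeted_sc":   one-time 50% cut, only if last year's standardized
                           score was below average (negative).
    """
    if counterfactual == "half_sc":
        return 0.5 * sc
    if counterfactual == "targeted_sc":
        return 0.5 * sc if prior_score < 0 else sc
    return sc  # baseline and counterfactual (i)

def continuation_value(x, sc):
    """FV = ln(e^x + e^-SC), the functional form assumed in footnote 66,
    so lowering SC mechanically raises the terminal continuation value."""
    return math.log(math.exp(x) + math.exp(-sc))

print(switching_cost(prior_score=-0.4, counterfactual="targeted_sc"))  # 3.245
print(round(continuation_value(1.0, SC_EXIT) - continuation_value(1.0, 0.5 * SC_EXIT), 4))
# negative: halving SC raises the continuation value, as noted in footnote 66
```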
The results for the three counterfactuals are shown in figures 2.12–2.13. Figure 2.12 compares the simulated attrition rates for each of the counterfactuals to the baseline model. Counterfactuals (ii) and (iii) involve lowering the cost of switching out of the FI program and thus lead to increases in attrition. In counterfactual (ii), I find that attrition between grades K and one goes from 6% in the baseline model to 9.5%, and attrition between grades one and two goes from 4% to 6%. In total, by the beginning of grades seven and eight, the percentage of initial enrolees that have exited the program, when compared to the baseline model, increases by 26pp and 42pp, respectively.[68] In counterfactual (iii), attrition rates also increase, but — somewhat surprisingly — attrition starts off at a higher rate than in counterfactual (ii), only becoming smaller in the later grades. This is despite the fact that fewer parents are eligible for the lower switching cost. The attrition in earlier grades is greater in counterfactual (iii) than in counterfactual (ii) because, from the parents' point of view, the decrease in SC is a one-time-only event. If parents remain in the program, they will have to pay the higher value of SC in the future. This induces many parents who receive the option of the lower switching cost to exit the program now. In contrast, if SC permanently decreases, then there are two opposing forces at work. The lower value of SC today induces more parents to exit the program today, while a lower value of SC next year induces more parents to remain in the program for the current period. The latter result follows from the fact that the value of remaining in FI increases as the cost of leaving in the future goes down.

Unlike in counterfactuals (ii) and (iii), it is much more ambiguous whether total attrition for counterfactual (i) will increase or decrease. The total effect on attrition is now made up of two sets of parents. One set are parents who originally decided to remain in FI, but who with perfect post-enrolment information now choose to exit the program (hereafter the "new leavers"). The other set are parents who otherwise exited the program, but now choose to remain (hereafter the "new stayers"). The former group causes attrition to increase while the latter causes attrition to decrease. The attrition results for counterfactual (i) presented in figure 2.12 show the net effect of these two sets of parents. I find that the total percentage of initial enrolees that exit FI in grade one decreases from 6% in the baseline model to 2.5%. Declines in attrition are observed in the next few grades as well. By the beginning of grade 3, the total number of children enrolled in FI is 5pp higher when compared to the baseline model — a difference that remains steady for the remainder of the grades. As shown in Panel A of Appendix table B.3, this decrease in attrition is the result of a 3-4pp increase in attrition caused by the new leavers and an 8-9pp decrease in attrition caused by the new stayers. In the case of the new stayers, these are parents who originally responded to information they received by removing their child from the program — a decision that was not only suboptimal ex post, but also one they could not undo. The "new stayers" also make up a large fraction of parents that originally exited the program. Focussing on grades 1–5, 40% (0.08/0.20) of parents who originally removed their child from the program now instead choose to remain.[69] Conversely, 4% (0.03/0.8) of parents who originally remained in the program now instead choose to remove their child from FI.

[68] In the baseline model, 40% of initial enrolees have exited by the start of grade 8.
[69] From section 2.5, we know that a majority of program exit choices are made in response to information received prior to grade 5.
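The "new leaver"/"new stayer" decomposition in Panel A of Appendix table B.3 is just a cross-tabulation of exit behaviour across the baseline and counterfactual simulations. A minimal pandas sketch of that bookkeeping is below; the column names and toy data are hypothetical.

```python
import pandas as pd

def classify_switchers(df, grade):
    """Split initially enrolled FI children by how a counterfactual changes exit.

    df has one row per initially enrolled child with columns
    'exit_grade_base' and 'exit_grade_cf' (use float('inf') if the child
    never exits). "New leavers" exit by `grade` only in the counterfactual;
    "new stayers" exit by `grade` only in the baseline.
    """
    exited_base = df["exit_grade_base"] <= grade
    exited_cf = df["exit_grade_cf"] <= grade
    return pd.Series({
        "new_leavers": (exited_cf & ~exited_base).mean(),
        "new_stayers": (exited_base & ~exited_cf).mean(),
        "net_change_in_attrition": exited_cf.mean() - exited_base.mean(),
    })

# Toy example with four initially enrolled children.
toy = pd.DataFrame({"exit_grade_base": [2, 5, float("inf"), 3],
                    "exit_grade_cf": [float("inf"), 5, 4, 3]})
print(classify_switchers(toy, grade=6))
```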
One might expect both the additional information and the lower switching costs to lead to better sorting and large gains in student achievement; however, this does not end up being the case. Figure 2.13 shows the difference in the average test scores of children in the three counterfactual simulations and the baseline model. There is clearly a very small overall effect on average test scores across every grade. No effect in a given grade is larger than 0.012σ, and in counterfactual (i) the difference in outcomes fails to even be statistically significant. For counterfactuals (ii) and (iii), the difference is only marginally positive (and significant) in the middle years. In counterfactual (ii), the difference actually becomes negative for grades seven and eight.

These results are driven by the characteristics of the children whose status changes from the baseline model (the "new leavers" and "new stayers"). The key characteristic in the model relating to how achievement changes with a child's program of enrolment is the parameter η, which represents the child-specific treatment effect from being enrolled in FI. A negative (positive) value for η represents a gain (loss) in achievement from leaving the FI program. In Appendix table B.3 I calculate the average value of η at grades 2–8 for both the new leavers and new stayers in each of the three counterfactuals. Focussing first on counterfactuals (ii) and (iii), both of which make it easier for parents to leave FI, I find that it is primarily the children leaving FI in the earlier grades that gain the most from switching out. In counterfactual (ii) the average value of η among the new leavers at the beginning of grades four and eight is -0.17σ and 0.07σ, respectively. The corresponding values in counterfactual (iii) are -0.23σ and -0.07σ, respectively. Therefore, while the lower switching costs lead to slightly better sorting in the early and middle grades, by grades seven and eight many of the children induced to leave are those that actually perform better in the program. This explains the hump-shaped achievement differences for these counterfactuals seen in figure 2.13. These results imply that switching costs are not a large constraint on parents' ability to optimally sort their children into the program where they can succeed academically.

In counterfactual (i), I also find that the average test score gain among the new leavers is larger in the earlier grades (Appendix table B.3 — Panel A). For example, the average value of η among the new leavers at the beginning of grades three and six is -0.52σ and -0.36σ, respectively. However, any gain in achievement from these students is being offset by the new stayers. The average value of η for these children in grades three and six is -0.22σ and -0.15σ, respectively. In other words, although these children would have higher academic outcomes outside of FI, giving parents additional information actually leads them to keep their child in the program. This explains why figure 2.13 shows such small overall gains in achievement; the two effects are almost perfectly offsetting. However, why is it that the added information leads many parents of children who perform better outside of FI to remain in the program? The answer is parents' preferences. Intuitively, without any information on match quality, initial program enrolment is primarily a function of parental preferences. For parents with a high preference for FI, being told that their child will slightly underperform in FI is likely not sufficient to induce these parents to remove their child from the program.
By providing these parents with more accurate information, a subset of parents learn that while their child performs worse in the program, it is not nearly as bad as they originally thought.[70] An interesting implication of this result is that, to the extent that initial enrolment is a function of (non-performance) preferences, any additional information post-enrolment will not necessarily lead to improved outcomes and sorting across choices.[71]

In summary, the above results have interesting implications for parents' post-enrolment behaviour. First, by providing parents with additional information, I found that 40% of parents who would have otherwise removed their child from the program by the end of grade six now choose to remain. This is consistent with the notion that parents are placing too high a weight on the information they receive, especially in the earlier grades. Providing parents with perfect information prevents them from pulling their child too quickly from the program in response to a given poor signal. One might expect that this would lead to better matching and large increases in student achievement; however, this did not end up being the case. On average, the children who went from leaving FI to remaining experienced a drop in predicted test scores. What drives this result is the fact that initial enrolment is largely a function of parental preferences and types. While parents will remove their child from the program if they believe the match is really poor, learning that their child only marginally underperforms in the program is not sufficient to induce these parents to choose an alternative. The second implication concerns the issue of switching costs. The above results in counterfactuals (ii) and (iii) did not present any evidence that the estimated cost of switching children out of the program is a large hindrance on the impact of learning. The children induced to leave FI by the lower switching costs are those children whose achievement gains from leaving are small (and in some instances negative).

2.6.2 Providing Parents with Pre-Enrolment Information

In the baseline model, parents can only learn about match quality after the initial enrolment decision. However, even with learning, there is still a welfare loss from parents' initial uncertainty. Parents of students who are great matches with the program do not apply, while parents of students who are poor matches with the program apply and remain in the program for several years. The next set of simulations addresses the question: how do the results change if parents have access to this information prior to the initial enrolment decision? Formally, in each of the next three simulations, parents are given an additional signal, S*, prior to their initial entry decision in grade K. S* is defined similarly to S above: S* = η + ε*. The variance of the noise term ε* is given by Ω and will take on the values σ²_ε, 0.25σ²_ε and 0. The latter case is identical to giving parents full information ex ante.[72]

[70] For the parents that now keep their child enrolled in FI because of the extra information, Panel A in Appendix table B.3 shows the average value of S̄_t (i.e. the average over all signals a parent received) these parents had received at the point when they originally removed their child from FI. The results clearly show that these are parents who received very negative signals. For example, by the beginning of grade 6, the average signal parents had received was -0.82σ compared to an actual average FI ability of -0.15σ.
[71] Another possible explanation relates to the discussion above that these estimates should be interpreted analogously to ITT effects.
[72] An additional contribution of these simulations is that they relate to the many situations (though not in FI) in which students are required to write an entrance exam or submit to some other screening mechanism. A stated purpose of these exams is for the school to select the children most likely to succeed in the given environment. Here, the notion of learning about match quality remains, except now it is the school that is attempting to predict the child's future performance. The counterfactuals in this section will speak to the potential benefits of optimally matching children to the correct program — at least in a context where heterogeneity in match quality is found to be present.
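These pre-enrolment counterfactuals change only what parents know at the grade-K decision: the prior over η is updated once with the extra signal S* = η + ε* before any enrolment choice is made. The sketch below illustrates that single initial update under the three values of Ω, assuming parents apply the same normal-normal updating to the pre-enrolment signal as they do to later signals; the parameter values are taken from table 2.5 and the function name is illustrative.

```python
import numpy as np

def ex_ante_belief(eta, omega, sigma0_sq=0.81**2, rng=None):
    """Parents' belief about eta at the grade-K decision after observing one
    pre-enrolment signal S* = eta + eps*, with Var(eps*) = omega.

    omega = 0 corresponds to the full-information case (belief = eta itself).
    """
    if rng is None:
        rng = np.random.default_rng(0)
    if omega == 0:
        return eta
    s_star = eta + rng.normal(0.0, np.sqrt(omega))
    weight = sigma0_sq / (sigma0_sq + omega)   # normal-normal updating on one signal
    return weight * s_star                      # prior mean is 0

sigma_eps_sq = 0.617**2
for omega in (sigma_eps_sq, 0.25 * sigma_eps_sq, 0.0):
    print(round(ex_ante_belief(eta=-0.4, omega=omega), 3))
```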
The results for these simulations are shown in table 2.6 and figures 2.14 and 2.15. First, giving parents additional information leads to a drop in initial enrolment. As the accuracy of the initial information given to parents increases, initial enrolment drops to 46% (Ω = σ²_ε), 44% (Ω = 0.25σ²_ε) and 43% (full info). However, these seemingly small overall changes in initial enrolment mask the fact that a large percentage of parents are changing their enrolment choices. On one hand, providing parents with information ex ante might induce some parents who would have otherwise enrolled their child in FI to no longer choose to do so. On the other hand, parents who would have otherwise not enrolled their child in FI might now choose to do so upon learning their child is an excellent match for the program. These breakdowns are shown in detail in table 2.6. For each simulation, at least 40% of parents that enrolled their child in FI no longer choose to do so, while at least 30% of parents that originally did not enrol their child in FI now choose to enrol in the program. Thus, the extra information ex ante leads to large changes in the enrolment decisions of parents in both directions.

Table 2.6 also shows that many of the children who no longer enrol in FI are children who are better off out of the program. For example, in the full-info case, the average value of η among these children is -0.34σ. Similarly, many of the children whose parents now enrol them in FI are much better off in the FI program. The average value of η among these children in the full-info case is 0.44σ. Similar values are also observed in the other two counterfactuals.

By providing parents with information ex ante, we are causing parents to update their beliefs prior to the enrolment choice as opposed to afterwards. This is what leads some parents to no longer enrol their child in FI. Meanwhile, the new parents induced to enrol in FI are parents of children with a high value of η and thus are at a lower risk of exiting the program (Appendix figure B.3). Therefore, these counterfactuals effectively replace children at a high risk of leaving FI with students who are at a low risk of leaving FI. This implies that each of these counterfactuals should yield large decreases in program attrition. Figure 2.14 shows that this is in fact the case. For example, between grades K and one, the percentage of initial FI enrolees who exit the program ranges from 2% (Ω = σ²_ε) to 0.01% (full info). As before, we continue to see that the biggest drops in attrition occur in the earlier grades.

In these simulations, parents have the opportunity to choose the program that will produce the best outcomes for their child. In figure 2.15, we see that giving parents additional information leads to an overall increase in test scores ranging from an average of 0.06σ (Ω = σ²_ε) to 0.11σ (full info).
These differences are quite large (especially in relation to the counterfactuals in section 2.6.1) and reflect the greater amount of positive selection into FI that is occurring in these simulations. These differences also decline over time as parents in the baseline model "catch up" in terms of information.

In summary, these simulations show that, even with learning, there are meaningful gains to be had in terms of decreased program attrition and increased student achievement by providing parents with additional information prior to the initial enrolment decision. Parents clearly prefer to have this information before, rather than after, the initial enrolment decision. Absent the counterfactual, many parents would have enrolled and subsequently removed their child from the FI program. The counterfactual allows these parents to not enrol in FI altogether, avoiding both the negative effects of the poor match and the costs of removal. Meanwhile, for parents who otherwise did not enrol their child in the program, the extra information gives them the opportunity to increase their child's academic performance.

2.7 Extensions and Robustness Checks

In this section I check whether the results of the main model with regards to the role of learning are robust to both extensions of the model and alternative specifications. For each model extension discussed, I recreate figures 2.8 and 2.9, which show the weight parents place on new information and how (grade K) signals are correlated with program exit. Furthermore, I also present the associated counterfactuals from estimating the no-information simulation, recreating figure 2.10 and appendix figure B.4. I chose appendix figure B.4 in order to make the results more comparable across each of the extensions.[73] I also limit the results to the alternate no-information simulation (where parents act as if they will learn in the future) in order to focus on differences in the impact of learning and not on changes in the composition of students.[74]

[73] Recall that figure 2.11 shows the average effect across the entire sample and this can be viewed analogously to ITT effects. However, if the FI take-up rate varies across each of the model extensions, these effects will not be comparable. By focussing on the effects for only those children initially enrolled in the program, we implicitly account for these differences.
[74] This will primarily be an issue for the extension which adds in private school to the parents' choice set and is discussed further below. Most of the remaining results are comparable regardless of the chosen figure to duplicate.

Including Trend in FI Performance

The first extension is one in which I allow for trends in both the average effect of the program, replacing β_2 with β_2 + π_1·t, and the cost of switching out of the FI program, replacing SC with SC + π_2·t. In section 2.3, I presented evidence that FI children perform better in grade seven than in grade four (see also chapter 3). One of the common explanations of this result is that since the tests are administered in English and English is not formally introduced into the curriculum until grades three or four, the FI children need time before they can fully catch up to their counterparts in the standard program (Genesee, 2007). Allowing the average FI impact to vary over time would be a way of capturing this effect. The main assumption here is that unknown FI ability (η) is a level effect and that all students improve by the same amount by remaining in the program.
The other change in this model is to include a corresponding trend in the program selection equation, entering through the estimated switching costs.[75] For example, it could be that parents do not wish to remove their child from the school so soon after enrolment, but are more amenable to removing their child in later grades.

[75] I put the trend in the switching cost parameter in order to give it a more intuitive interpretation.

The results for this model are presented in column 2 of Appendix tables B.4 and B.5. As expected, allowing the FI effect to vary by grade implies a larger effect in grade seven than in grade four of approximately 0.06σ ((7 − 4) × 0.02). The coefficient on the estimated trend in SC is 0.09, suggesting that the cost of switching out of the program is increasing slightly over time. Most of the remaining parameters are very similar to those estimated in the baseline model. From figure 2.16, we see that the weights parents place on new information are smaller in the early grades when compared to the baseline case. These weights now range from 0.77 in grade one to 0.12 in grade eight. Furthermore, figure 2.17 shows that in this version of the model, the information parents receive impacts their program choices in a manner similar to that seen in the baseline model.

Next, I once again simulate this model under the assumption that parents do not receive any new information. Figures 2.18 and 2.19 show the impact on attrition and student achievement, respectively. Just as in the baseline model, we continue to observe that taking away parents' information has large effects on program attrition, particularly in the earlier grades. Figure 2.19 shows that taking away parents' information continues to adversely impact student achievement. However, unlike in the baseline model, here we observe the rate of decline in achievement become much smaller over time, and thus by the start of secondary school, achievement only declines by 0.06σ. This outcome follows from the fact that the declines in achievement are being offset by the upward trend in performance that comes from remaining in the program.

Separate High School Switching Cost

The next variant of the model examined is one in which I allow for a separate switching cost once the child reaches the beginning of high school. In the baseline model, the estimated utility cost of exit is the same at the start of high school as it is in grades two or three. However, there is reason to suspect that this is not the case. Most of the schools in the sample are either elementary schools containing grades K–7 or secondary schools containing grades 8–12.
Therefore, most parents of children entering the start of secondary school are forced to enrol in a new school anyway, and this could form a natural exit point from the FI program.

As expected, the estimated cost of switching out of FI in grade eight is approximately 15% lower than the cost of switching out in the earlier grades (column 3 of Appendix table B.4). Furthermore, as shown in Appendix Figure B.6, this version of the model does a much better job at matching the transitions out of FI between grades six and eight. Despite these better matches, the model's predictions with regards to the role of learning and information are very similar to those in the baseline case. Parents continue to weight new information much more highly in earlier grades (figure 2.16), which in turn affects their program choices (figure 2.17). Finally, as shown in figures 2.18 and 2.19, taking away information from parents continues to have large effects on overall attrition and student achievement.

Estimating the Model on the Entire Sample

Next, I test the sensitivity of the results to how the main estimation sample is constructed. The baseline model is estimated on a sample of children grouped at the FSA-year level. For every FI child in the data, a non-FI child in the same FSA-year was randomly selected (without replacement). Here, I estimate the model using the entire sample. The only restriction I impose is that all included FSA-years had to have at least one early-FI student.[76] Once again, for computational tractability, I selected a subset (25%) of the sample for estimation. In the new dataset, there are now 33,903 children, 10% of whom initially enrol in FI. The estimated parameters for this specification are shown in column (4) of Appendix tables B.4 and B.5. Figures 2.16–2.19 show that the learning and information results of the main model are robust to this alternative estimation sample.

[76] This restriction drops approximately 10% of the sample.

Extending the Model to include a Private School Option

The final extension of the model I examine is one in which I expand the choice set of parents to include the choice of private school. The parents' choice set is now given by C = {0,1,2}, where C = 2 refers to a choice of private school, C = 1 refers to a public FI program and C = 0 implies a child enrolled in a public non-FI program. Unlike FI, which had restricted entry points in grades K or one, parents are free to enrol their child in private school in every period. This implies that parents' available choice set each period is given by:

    C_t ∈ {0,1,2}   if t ≤ 1 or (C_{t−1} = 1 for t ≥ 2)
    C_t ∈ {0,2}     if C_{t−1} = 0 and t ≥ 2

In order to add in a private school choice, several adjustments to the model are required. Parents' per-period utility from observable characteristics Z will now differ depending on the program of enrolment (δ_FI, δ_Pri). Switching costs will also differ depending on whether the child is switching into and out of FI (SC_{10}, SC_{01}) or switching into or out of private school (SC_{20}, SC_{02}). A type now consists of the tuple (θ_FI, θ_Pri, Ψ), where θ_FI is a taste for FI and θ_Pri represents a taste for private school. Finally, I allow the variance of the utility shocks to differ (from the standard Gumbel distribution assumption) both for the initial enrolment period (t = 0) and also when parents are choosing between non-FI public and private school. Note that in this model there is no heterogeneity or learning about private school ability.[77]

[77] The key assumption of the model with private school choice is the Independence of Irrelevant Alternatives (IIA) assumption. This assumption states that the addition of private school does not affect the relative odds that a parent opts to enrol their child in FI versus the non-FI public option. This could occur if, for example, parents first chose to enrol their child in private or public school and then, conditional on public school, chose FI or not FI. One way we can quickly examine this notion is to look at where the FI students go after they leave FI. If parents were first choosing between private or public, then those parents that chose FI must have a strong overall preference for public school. However, on average, students who exit FI go to private school at roughly the same rate as the rest of the population (15%).
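The feasible choice set displayed above is simple to encode; restricted FI entry is the only constraint. A minimal sketch is below (labels follow the text: 0 = public non-FI, 1 = public FI, 2 = private). The displayed cases do not spell out the C_{t−1} = 2 branch; the sketch treats it like C_{t−1} = 0, consistent with FI entry being restricted to grades K and one.

```python
def choice_set(t, prev_choice=None):
    """Feasible programs in period t given last period's choice.

    Early FI entry is only possible in grades K (t = 0) or 1; afterwards a
    child can be in FI only by having been in FI last period. Public non-FI
    and private school are always available.
    """
    if t <= 1 or prev_choice == 1:
        return {0, 1, 2}
    return {0, 2}

assert choice_set(0) == {0, 1, 2}                  # grade K: everything is open
assert choice_set(4, prev_choice=1) == {0, 1, 2}   # currently in FI: can stay or leave
assert choice_set(4, prev_choice=0) == {0, 2}      # out of FI after grade 1: no re-entry
```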
The estimated coefficients of this model are presented in the last two columns of Appendix table B.4 and the last column of Appendix table B.5, while the simulated learning outcomes are shown in figures 2.16–2.19. Despite giving parents an additional choice, the simulated outcomes with respect to parental learning continue to be very much in line with the outcomes of the main model. The weights parents place on new information range from 0.8 in grade one to 0.11 in grade 8, and a lower signal induces parents to exit the FI program at a faster pace. Taking away information also leads to declines in program attrition and student achievement.[78]

[78] While not shown, this model does generate different predictions for the no-learning simulations in which parents know ahead of time they will not receive any new information. In this model, approximately 10% of parents who initially enrolled their child in FI now elect to enrol in private school instead. Since the model estimates higher returns to achievement for private school students, the result is that, compared to the baseline model, average student achievement initially increases. However, it is still the case that the difference decreases over time. In addition, parents are also worse off since they are now forced to make a choice that yields a lower expected utility.

2.8 Conclusion

This chapter expands on the current literature that examines the role of informational frictions on parents' education choices in school-choice settings. This study is the first attempt to estimate the roles of learning and uncertainty in a setting where parents make dynamic schooling choices and learn about uncertain (child-specific) returns over time. This is in contrast to the current literature, which primarily focusses on parents' responses to average quality measures (for example, district report cards) at the initial point of enrolment. The existing research mirrors the fact that, in school-choice settings, average quality measures are the primary source of information provided to parents by the school districts.

I find that parents are very responsive to new information received after the initial enrolment decision. The ability of parents to update their beliefs in the face of new information accounts for a large percentage of the fraction of students who exit the FI program, especially in earlier grades. In particular, I find that if parents did not have the ability to update, the percentage of students who exit the program would be 70% lower by the beginning of grade 5. Learning allows parents to better sort their children into the program where they can succeed academically; specifically, I find that learning causes an increase in test scores of 0.09σ by the start of secondary school. Finally, learning does not only affect parents' post-enrolment choices, but their initial choices as well.
This follows from the fact that parents highly value the opportunity to learn about unknown returns in the future. Notably, I find that if parents know they will not receive any new information about their child's ability in FI, then 36% of FI parents will no longer choose to enrol their child in the program.

I also find interesting results simulating the model under a variety of counterfactual scenarios. By providing parents with perfect information after the initial enrolment decision, I find evidence that many parents are responding too quickly to signals received in earlier grades by pulling their child from the program too soon. This result is consistent with the fact that parents' beliefs about the variances of the distribution of FI heterogeneity and signal noise cause them to weight early information at much higher levels than those suggested by the true values. Furthermore, many of the parents who now decide to remain in FI are those parents whose child's performance would actually improve upon exiting the program. This leads to the interesting result that providing parents with more accurate information will not necessarily lead to higher test scores. Next, I explore the benefits of lowering the implied cost of switching out of the FI program. While lower switching costs do induce more parents to remove their child from the program, these children are, on average, at the margin of being better off out of the FI program. Therefore, the gains in achievement from making it easier to exit the program are very small (and in some cases negative). Finally, I find that providing parents with information pre-enrolment leads to large changes in the composition of FI enrolees, which in turn leads to large decreases in attrition and large increases in test scores. Thus, even with learning, there are still large differences between the parents' optimal sequence of choices ex ante and the optimal sequence ex post.

Understanding the roles that uncertainty and learning play has important policy implications both for designing school-choice programs and for the types of information provided to parents. This study finds that parents are very responsive to the information they receive, particularly in the child's early years. This emphasizes the importance of making sure parents receive information about their child's performance and of using the appropriate evaluation tools and methods. While the model is agnostic as to what "learning" represents, it is likely not a stretch to suggest that learning is correlated with parental involvement. Any policy that allows parents to have a better sense of how their child is performing in school will allow parents to be better informed about their child's ability and make more optimal choices.[79] The downside to many of these policies is that they only tell parents about their child's overall performance in school. One of the key difficulties for parents is disentangling how much of their child's performance is the result of being in a given program.
While it may be difficult to do in practice, providing parents with information about the characteristics of students who succeed in a given program (going beyond average quality measures) will potentially make parents much better off, particularly if the information is given to parents prior to the initial enrolment decision. Finally, the results of the counterfactuals which make it easier for parents to exit FI (which lead to very small achievement gains) suggest that it is better to try and find a way to help out children struggling in a particular program than it is to simply make it easier for parents to choose some alternative.

[79] In practice, many such policies already exist. Some examples include requiring parents to sign off on all homework or tests given to their child or increasing the amount of feedback parents receive from teachers through additional report cards or parent-teacher interviews.

2.9 Figures

Figure 2.1: Distribution of Achievement by Program of Enrolment
[Four density plots of test scores (x-axis: test score; y-axis: density) for FI and non-FI students. Panel annotations report mean (se) performance. Grade 4, current enrolment: FI .08 (.007), non-FI .04 (.002). Grade 7, current enrolment: FI .15 (.009), non-FI .01 (.003). Grade 4, initial enrolment: FI .04 (.007), non-FI .05 (.002). Grade 7, initial enrolment: FI .08 (.008), non-FI .01 (.003).]
Notes: This figure shows the distribution of student achievement by program of enrolment for both grade 4 and grade 7. In the top panel, the sample is split depending on whether or not the child is enrolled in the (early) FI program during the actual grade of the test. In the bottom panel, the sample is split depending on whether or not the child was initially enrolled in the FI program. The "test score" here corresponds to the average of the three FSA components.

Figure 2.2: Fraction of Children Remaining Enrolled in FI (i)
[Line graph: fraction of FI enrolees remaining (y-axis, 0.50 to 1.00) by grade (x-axis, 0 to 8).]
Notes: This figure shows the fraction of FI students remaining in the program at the beginning of each grade. The graph accounts for the fact that some children are censored and that others only enter FI in grade one.

Figure 2.3: Fraction of Children Remaining Enrolled in FI (ii)
[Line graph: fraction of FI enrolees remaining (y-axis, 0.40 to 1.00) by grade (x-axis, 0 to 9), for districts where secondary school begins in grade 9.]
Notes: This figure shows the fraction of FI students remaining in the program at the beginning of each grade. The graph is limited to students enrolled in a school district where secondary school begins in grade nine. The graph accounts for the fact that some children are censored and that others only enter FI in grade one.

Figure 2.4: Correlation Between Program Exit and Test Score
[Local polynomial plot: residual exit probability (y-axis, -.04 to .04) against normalized test score (x-axis, -1.5 to 1.5).]
Notes: Residual FI exit is the probability of exit from the program after accounting for year-grade fixed effects. The graph displays the result of a local polynomial regression of residual FI exit on average test scores.
The sample is limited to grades 4 and 7 — the grades of the tests. Each dot represents the mean exit rate for a given range of test scores.

Figure 2.5: Model Fit
[Panel (a) Initial Enrolment: fraction enrolled (0 to 0.7) in grades K and 1, actual versus each model. Panel (b) Program Attrition: fraction remaining (0.4 to 1) by grade (0 to 8), actual versus each model. Series: Actual; No Learning, 2 Types; No Learning, 3 Types; Learning, 2 Types; Learning, 3 Types.]
Notes: This figure compares the actual enrolment and attrition rates associated with the FI program with the corresponding predicted values from four different versions of the main structural model. The four models are: (i) a model estimated without any learning on the part of the parents and 2 types; (ii) a model estimated without any learning on the part of the parents and 3 types; (iii) the baseline model described in section 2.4 with 2 types; and (iv) the baseline model described in section 2.4 with 3 types. All displayed predicted results are the average over 75 simulated draws of each model, while the confidence intervals are the 2.5th and 97.5th percentile values from the 75 simulations.

Figure 2.6: Model Fit: Correlation of Program Exit and Test Scores
[Local polynomial plot: residual exit probability (y-axis, -.02 to .06) against normalized test score (x-axis, -1.75 to 1.75), actual versus simulated.]
Notes: Sample is limited to all children observed in FI in grades 4 or 7 with a valid test score within the 5th and 95th percentile. Residual exit probability is the residual from a regression of program exit on year and grade fixed effects.

Figure 2.7: Model Fit: Out of Sample
[Panel (a) Initial Enrolment: fraction enrolled (0 to 0.7) in grades K and 1, actual versus predicted. Panel (b) Program Attrition: fraction remaining (0.4 to 1) by grade (0 to 8), actual versus predicted.]
Notes: This figure compares the actual enrolment and attrition rates for children who entered kindergarten in 2004 or later to the corresponding predicted values. These predicted values are based on parameter estimates from a version of the model estimated using only students who entered kindergarten in 1998–2003. All predicted results are the average over 75 simulated draws of the model, while the confidence intervals are the 2.5th and 97.5th percentile values from the 75 simulations.

Figure 2.8: Weight Placed on New Information Over Time
[Line graph of the updating weight (y-axis, 0 to 1) by grade (x-axis, 1 to 8), where μ_t = (1 − W_t)·μ_{t−1} + W_t·S_t. Values based on parents' beliefs (solid line): .93, .48, .32, .25, .2, .16, .14, .12. Values based on actual parameters (dashed line): .3, .23, .19, .16, .14, .12, .11, .1.]
Notes: The solid line graphs the weight parents place on a given signal, W_t, when updating their beliefs about their child's match quality. The dashed line shows the implied weights from the actual estimated values of the parameters.

Figure 2.9: Number of Years Remaining in FI by Signal Decile
[Graph: de-meaned average years remaining (y-axis, -4 to 2) by decile of the signal of FI ability (η + ε) (x-axis, 1 to 10), for signals received in grade K and in grade 4.]
Notes: This figure shows the effect of receiving a signal in a given grade-decile for both grades K and four. The y-axis corresponds to the difference between the average exit grade in a given decile and the average exit grade across all deciles.
All displayed results are the average taken over 75 simulated draws of each model, while the confidence intervals are the 2.5th and 97.5th percentile values from the 75 simulations.

Figure 2.10: Simulated Attrition Under No Updating Scenarios
[Line graph: fraction remaining (y-axis, 0.6 to 1) by grade (x-axis, 0 to 8) for the baseline, no-information and alternate no-information simulations. Initial program enrolment is 50% in the baseline case, 32% in the no-information case and 50% in the alternate no-information case.]
Notes: This figure displays the simulated attrition and initial enrolment rates for both the baseline model and two counterfactual no-information simulations. In these counterfactuals, parents are stripped of all information. The difference between the "no-information" and "alternate no-information" simulations is that in the former parents know they will not receive any information, while in the latter parents still act as if they will receive information in the future. All displayed results are the average taken over 75 simulated draws of each model.

Figure 2.11: Difference in Predicted Test Scores From Baseline
[Line graph: difference in simulated test score from baseline (y-axis, -.05 to 0) by grade (x-axis, 0 to 8) for the no-information and alternate no-information simulations.]
Notes: This figure displays the difference in the average simulated student achievement in grades K–8 between the two counterfactual no-information simulations and the baseline model. For each simulation-grade, the mean is taken over all children observed in the sample for that year. All displayed results are the average taken over 75 simulated draws of each model, while the confidence intervals are the 2.5th and 97.5th percentile values from the 75 simulations.

Figure 2.12: Simulated Attrition Under Ex-post Scenarios
[Line graph: fraction remaining (y-axis, .2 to 1) by grade (x-axis, 0 to 8) for the baseline, "1/2 SC", "Revealed Info" and "Targeted 1/2 SC" simulations.]
Notes: This figure displays the simulated attrition rates for both the baseline model and three counterfactual simulations. The first simulation, "1/2 SC", involves a permanent, 50% reduction in the estimated cost of leaving FI. The second simulation, "Targeted 1/2 SC", involves a temporary, one-time-only 50% reduction in the switching costs for only those students who scored below average achievement in the previous year. In the final simulation, "Revealed Info", a child's type is perfectly revealed to parents after the initial enrolment decision. All displayed results are the average taken over 75 simulated draws of each model.

Figure 2.13: Simulated Test Scores Under Ex-post Scenarios
[Line graph: difference in simulated test score from baseline (y-axis, -.03 to .03) by grade (x-axis, 0 to 8) for the "Revealed Info", "1/2 SC" and "Targeted SC" simulations.]
Notes: This figure displays the difference in the average simulated student achievement in grades K–8 between the three counterfactual simulations described in figure 2.12 and the baseline model. For each simulation-grade, the mean is taken over all children observed in the sample for that year. All displayed results are the average taken over 75 simulated draws of each model, while the confidence intervals are the 2.5th and 97.5th percentile values from the 75 simulations.

Figure 2.14: Simulated Attrition Under Additional Ex-ante Information Scenarios
[Line graph: fraction remaining (y-axis, .6 to 1) by grade (x-axis, 0 to 8) for the baseline and the ex-ante simulations with Ω = σ²_ε, Ω = 0.25σ²_ε and full information.]
Notes: This figure displays the simulated attrition rates for both the baseline model and three counterfactual pre-enrolment simulations.
Each counterfactual simulation involves an additional signal with variance Ω being provided to parents prior to the initial enrolment decision. All displayed results are the average taken over 75 simulated draws of each model.

Figure 2.15: Predicted Test Scores Under Additional Information Scenarios
[Line graph: difference in simulated test score from baseline (y-axis, .04 to .14) by grade (x-axis, 0 to 8) for the ex-ante simulations with Ω = σ²_ε, Ω = 0.25σ²_ε and full information.]
Notes: This figure displays the difference in the average simulated student achievement in grades K–8 between the three counterfactual pre-enrolment simulations (described in section 2.6.2) and the baseline model. For each simulation-grade, the mean is taken over all children observed in the sample for that year. All displayed results are the average taken over 75 simulated draws of each model, while the confidence intervals are the 2.5th and 97.5th percentile values from the 75 simulations.

Figure 2.16: Robustness Check: Weight Placed on New Information Over Time
[Line graph: updating weight (y-axis, .1 to 1) by grade (x-axis, 1 to 8) for the baseline, grade trend, high school SC, unmatched (whole) sample and private choice specifications.]
Notes: For each of the model extensions discussed in section 2.7, this figure graphs the weights parents place on a given signal, W_t, when updating their beliefs about their child's match quality.

Figure 2.17: Robustness Check: Average Program Exit Grade by Grade K Signal
[Graph: de-meaned average years remaining (y-axis, -4 to 2) by decile of the signal of FI ability (η + ε) (x-axis, 1 to 10) for the baseline, trends, high school SC, whole sample and private choice specifications.]
Notes: For each of the model extensions discussed in section 2.7, this figure shows the effect of receiving a signal in a given grade-decile for grade K. The y-axis corresponds to the difference between the average exit grade in a given decile and the average exit grade across all deciles. All displayed results are the average taken over 75 simulated draws of each model, while the confidence intervals are the 2.5th and 97.5th percentile values from the 75 simulations.

Figure 2.18: Robustness Check: Simulated Attrition With No Information
[Graph: fraction remaining (y-axis, 0 to 1) by grade (grades K through 8) for the baseline, trends, high-school SC, whole sample and private choice specifications.]
Notes: For each of the model extensions discussed in section 2.7, this figure displays the simulated attrition under the alternate no-information counterfactual. In the alternate no-information counterfactual, parents do not receive any new information, but parents act as if they will receive information in the future. All displayed results are the average taken over 75 simulated draws of each model.

Figure 2.19: Robustness Check: Simulated Achievement with No Information
[Line graph: test score difference (y-axis, -.08 to 0) by grade (x-axis, 0 to 8) for the baseline, trends, high school SC, whole sample and private choice specifications.]
Notes: For each of the model extensions discussed in section 2.7, this figure displays the difference between the average simulated student achievement in grades K–8 for the counterfactual no-information simulation and the predicted achievement in the baseline version of the model extension. In the alternate no-information counterfactual, parents do not receive any new information, but parents act as if they will receive information in the future. Averages are only calculated over children originally predicted to be enrolled in FI.
2.10 Tables

Table 2.1: Summary Statistics

                                   Enrolled in FI          Not Enrolled in FI
                                   Mean      Std Dev       Mean      Std Dev      Difference
Female                             0.55      0.50          0.49      0.50         0.06***
ESL                                0.04      0.21          0.31      0.46         -0.27***
English at Home                    0.87      0.34          0.65      0.48         0.22***
Closest FI School (km)             2.16      2.05          3.10      2.66         -0.94***
Closest non-FI School (km)         2.48      2.15          2.69      2.46         -0.21***
Distance (FI – non-FI) (km)        0.95      1.07          0.98      1.15         -0.02***
DA – Unemp %                       0.06      0.05          0.07      0.05         -0.01***
DA – % Univ Degree                 0.24      0.13          0.19      0.12         0.05***
DA – ln(Average HH Income)         10.98     1.14          10.93     1.11         0.05***
DA – ln(Average Home Value)        12.51     1.69          12.39     1.85         0.12***
Observations                       26,063                  222,454

Notes: All summary statistics are based on those observed at the time the child is observed in grade K. A child is considered to be enrolled in FI if they are observed in the FI program in either kindergarten or grade 1. Distance (FI – non-FI) is the difference of the distance to the closest FI school and closest non-FI school. All distance variables are the calculated driving distance between the centroid of the child's postal code and the longitude and latitude coordinates of the individual schools. All dissemination area variables (indicated by the "DA" prefix) are in real 2004 dollars. % Univ Degree uses the total population 15 years and older as its denominator. This was done in order to make the variable consistent across census years. * p < 0.1, ** p < 0.05, *** p < 0.01

Table 2.2: Correlation of Test Scores with Program Exit

                      (1)          (2)          (3)          (4)          (5)
Test Score            -0.269***    -0.269***    -0.287***    -0.322***    -0.363***
                      [0.040]      [0.035]      [0.031]      [0.028]      [0.027]
Observations          22581        22581        22581        22216        17906
Pseudo R-squared      0.079        0.097        0.124        0.171        0.199
Mean Exit Rate        0.085        0.085        0.085        0.086        0.104
Child Controls        No           Yes          Yes          Yes          Yes
DA Controls           No           No           Yes          Yes          Yes
School FE             No           No           No           Yes          No
School–Year FE        No           No           No           No           Yes

Notes: Standard errors clustered at the school level are in brackets. Displayed coefficients are transformed odds ratios, e^β − 1, where β is the coefficient from the logistic regression. Test Score is the average of the numeracy, reading and writing tests, standardized to be mean 0 and variance 1 at the year-grade level. All regressions include year and grade fixed effects. "Child" controls include gender, aboriginal status, ESL status, special-ed status, a binary variable for whether the child speaks English at home, a binary variable for whether the child attends a private FI school, gifted status, binary variables for quarter of birth, a binary variable if the school the child attends ends in the current grade and the driving distance between the child's home address and the closest FI school. The driving distance is the value that the variable took when the child was enrolled in kindergarten. "DA" controls refer to the dissemination area of the child's home address. They are the unemployment rate, % with bachelor degree or higher, log(real average household income), % immigrants and % homeowner. All regressions are limited to children originally enrolled in FI in either grade K or 1 and who are currently enrolled in the program. The dependent variable is an indicator variable equal to 1 if the child exited the program by the beginning of the following year. * p < 0.1, ** p < 0.05, *** p < 0.01
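The coefficients in Table 2.2 are transformations of logit estimates rather than raw coefficients. A minimal sketch of that calculation — assuming a pandas DataFrame of currently enrolled FI students with hypothetical columns exit_next_year, test_score, year, grade and school_id, and omitting most of the controls listed in the notes — might look like:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical panel of currently enrolled FI students; file and column names are illustrative.
df = pd.read_csv("fi_panel.csv")

# Logit of next-year program exit on the standardized test score,
# with year and grade fixed effects; standard errors clustered by school.
fit = smf.logit("exit_next_year ~ test_score + C(year) + C(grade)", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["school_id"]}, disp=False
)

# Transformed odds ratio reported in the table: exp(beta) - 1 is the proportional
# change in the odds of exit associated with a one-SD higher test score.
beta = fit.params["test_score"]
print(f"exp(beta) - 1 = {np.exp(beta) - 1:.3f}")
```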
Table 2.3: Correlation of Test Scores with Program Exit Across Grades

Panel A – All Students
                          (1)          (2)          (3)          (4)          (5)
Test Score – Grade 4      -0.429***    -0.425***    -0.428***    -0.449***    -0.514***
                          [0.041]      [0.044]      [0.042]      [0.040]      [0.036]
Test Score – Grade 7      -0.113*      -0.114**     -0.134***    -0.190***    -0.218***
                          [0.063]      [0.053]      [0.046]      [0.039]      [0.041]
Observations              22581        22581        22581        22216        17906
P-Value                   0.000        0.000        0.000        0.000        0.000
Child Controls            No           Yes          Yes          Yes          Yes
DA Controls               No           No           Yes          Yes          Yes
School FE                 No           No           No           Yes          No
School–Year FE            No           No           No           No           Yes

Notes: Standard errors clustered at the school level are in brackets. Displayed coefficients are transformed odds ratios, e^β − 1, where β is the coefficient from the logistic regression. Test Score is the average of the numeracy, reading and writing tests, standardized to be mean 0 and variance 1 at the year-grade level. All controls are identical to those described in table 2.2. The dependent variable is an indicator variable equal to 1 if the child exited the program by the beginning of the following year. "P-Value" is the p-value from the test of equality between the coefficients on the grade 4 and grade 7 test scores (the null is that the difference is zero). * p < 0.1, ** p < 0.05, *** p < 0.01

Table 2.4: Residual Change in Test Scores

Panel A – All Students
                              Left FI at Grade 4               All Other Students
                              Mean     Std Error   Count       Mean       Std Error   Count      Difference
Standardized Value Added      0.14     0.03        473         -0.0008    0.002       132175     0.15***

Panel B – Children in FI in Grade 4 Only
                              Left FI after Grade 4            Remained in FI
                              Mean     Std Error   Count       Mean       Std Error   Count      Difference
Standardized Value Added      0.14     0.03        473         0.06       0.01        9863       0.09*

Notes: Each cell displays the average residual value-added test score among all children with the given characteristics. Residual value added is the residual from a regression of the standardized value-added measure between grades 4 and 7 onto gender, ESL status, special-ed status, a binary variable for whether the child speaks English at home, FSA-year fixed effects and variables indicating if the child moved. The standardized value-added measures are based on the method used in Rivkin et al. (2005) and are designed to purge the basic value-added measures of any "regression-to-the-mean" effect. See text for details on how these measures are constructed.

Table 2.5: Model Results

# of Children = 15,448; Draws = 100; Log Likelihood = -48,504

Program Choice Parameters                      Test Score Parameters

Ability/Learning and Preferences               Standard Deviations
σ_v,par                 0.23  [0.07]           σ_ε                   0.617  [0.003]
σ_0,par                 0.81  [0.12]           σ_0                   0.4    [0.02]
α                       1.03  [0.05]

Observable Characteristics                     Observable Characteristics
ln(Relative Distance)   -0.038 [0.004]         FI                    0.08   [0.02]
Female                  0.03   [0.005]         Female                0.12   [0.01]
Special Ed              -0.161 [0.029]         Special Ed            -0.35  [0.03]
English at Home         0.132  [0.011]         English at Home       -0.05  [0.02]
ln(Mean Home Value)     0.004  [0.002]         ln(Mean Home Value)   0.015  [0.004]
% Univ                  -0.002 [0.02]          % Univ                0.99   [0.05]
Year                    0.002  [0.001]         Year                  -0.02  [0.002]

                                               Switching Costs
Private                 0.58   [0.02]          Switch Into FI        -1.67  [0.06]
Grade                   0.017  [0.004]         Switch Out of FI      6.49   [0.22]

Notes: Standard errors in brackets are calculated using the outer product gradient method. ln(Relative Distance) is held constant for each child at the value the variable took at the time the child was in kindergarten. The year variable starts at 0 for the year 1998.
σ²_v,par and σ²_0,par are the estimated beliefs that parents hold over the variance of the signal noise and the variance of the prior distribution of ability, respectively. Switch Into FI only applies to parents first enrolling their child in FI in grade 1.

Table 2.6: Change in Composition of Initial Enrolees from Pre-Enrolment Information

                                        σ²_* = σ²_ε     σ²_* = 0.25σ²_ε     σ²_* = 0 (Full Info)
New Entrants       % of all Parents     0.16            0.15                0.15
                   Mean of η            0.23            0.35                0.44
No Longer Enter    % of all Parents     0.20            0.21                0.22
                   Mean of η            -0.20           -0.26               -0.34

Notes: This table examines the change in the initial composition of FI enrolees in the counterfactuals where an additional signal is provided to parents prior to the initial enrolment decision. "New Entrants" are parents who enrol their child in FI as a result of the extra information. "No Longer Enter" refers to parents who originally enrolled their child in FI, but now do not. All displayed results are the average over 75 simulated draws of each model.

Chapter 3: The Impact of Dual Language Learning on Student Outcomes

3.1 Introduction

The past decade has seen large growth in the number of parents enrolling their children in foreign language education programs. In this era of globalization, many parents perhaps feel that learning an additional language provides their children with unique comparative advantages and improves future labour market prospects. Bilingualism can aid one's prospects in both the public and private sectors. For example, in Canada — a country where both English and French are official languages — many Federal Government jobs require bilingualism as a condition of employment. Even politicians are beginning to view bilingual education as an important component of a nation's overall education strategy and future economic competitiveness. According to former United States Secretary of Education Arne Duncan, "It's absolutely essential for the citizens of the United States to become fluent in other languages...".80 In Canada, the Federal Government subsidizes bilingual education by providing funding to the provinces to support French language programs. The most popular of these programs is the French Immersion (FI) program — an immersion language program that is attended by 10% of all elementary school children in Canada and is growing in popularity every year (FSL Enrolment Trends, 2012). In a typical immersion language program, children are primarily taught in the foreign language, and it is not until the later grades that the children's "native" language is introduced into the classroom. Immersion language programs are one of the most effective ways of teaching foreign languages to children (see Genesee, 1994 or Archibald et al., 2004 for reviews). These types of programs are not just growing in popularity in Canada. According to the Centre for Applied Linguistics (CAL), the number of U.S. schools offering immersion language education increased from 92 in 1990 to 278 in 2000 to 433 in 2011. Immersion language programs make up a small, but growing, segment of the U.S. student population.81

Immersion language programs are an interesting case study because, while studies show that they are effective ways of learning a new language, what is less clear are the trade-offs that come with immersing a child in a foreign language. Learning topics like math and science in French might not be the most efficient or effective way to develop a child's skills in these areas; the child might not fully understand all of the information being conveyed by his or her teacher.
80 In addition, see Wiley et al. (2012) for a similar claim by the Council on Foreign Relations.
81 For more local evidence, Harris (2015) reports on the rise of immersion language programs being offered in New York City.

There also might be substitution effects at play, with children needing to substitute time away from other areas in order to develop their French language skills. This idea also leads to issues of specialization and comparative advantage. One common refrain about FI is that it provides a more "well-rounded education"; however, it is not clear that this is welfare improving. Perhaps children with a proclivity towards fields such as math are better served by being taught these subjects in their native language. In addition to cognitive outcomes, these programs have potential implications for non-cognitive outcomes. Students who end up struggling in an immersion language program might also be at risk for behavioural issues if they are bored or not enjoying school.

While the discussion so far has focussed on the negative effects of immersion language programs, there are channels through which these programs can lead to positive outcomes. One example is through positive peer effects. This would occur if parents positively sort into immersion language programs or if parents remove students who are struggling in the program, and thus the students who remain are all above-average academically (see chapter 2). There is also a strand of the education literature that argues that immersion language programs can lead to overall cognitive benefits (see, for example, Cummins, 1979, 2000). Given all of these potential costs and benefits, it remains an empirical question as to what the effects of immersion language programs are on the cognitive outcomes of the students. Most of the education literature that examines immersion language programs looks at the FI program in Canada. These studies are typically based on either small field experiments in individual schools or school boards or use large administrative datasets. The general consensus of the education literature is that while children in FI might struggle in earlier grades in subjects like math and English language arts, in later grades — particularly once English is introduced into the classroom — the FI children are able to catch up to if not surpass their peers in the traditional English program (see, for example, Genesee 2007, Paradis et al., 2011). Since both FI entry and exit are endogenous, one of the major concerns with this literature is that these studies tend not to adequately account for the dual biases of selection bias (in terms of who enters the FI program) and survivorship bias (in terms of who remains in FI at the time the tests are administered).

The main contribution of this paper is that I am able to address both selection and survivorship bias by taking advantage of a longitudinal administrative dataset that follows children over time from entry into primary school and contains detailed information about each child. In order to control for survivorship bias, I focus on the impact on future academic achievement from initial enrolment in the FI program. In order to control for the fact that selection into FI is non-random, I use a control function approach and also instrument for FI enrolment using the relative distance to the nearest FI school within a given neighbourhood.
As I show below, controlling for the neighbourhood of the child alleviates some of the usual concerns associated with using distance as an instrument for choice of schooling (Altonji et al., 2005b). These concerns primarily arise because parents choose where they want to live, and it is well established that one of the factors parents care about is the quality of the schools in the area (see, for example, Black, 1999 and Black and Machin, 2011). It could be that the parents for whom schooling is an important factor in determining location are also those parents whose children perform well in school regardless of the program of enrolment. However, by looking at relative distance within a given neighbourhood, I am implicitly allowing parents to choose the area in which they want to live, but conditional upon this chosen area, where parents actually end up is likely more a function of exogenous circumstances such as the availability of housing, actions of other buyers, or possibly even school locations. I discuss the credibility of this argument by showing correlations between relative distance and other observables both with and without looking within a given neighbourhood. The results show that looking within neighbourhoods eliminates or greatly reduces many of the observed correlations between relative distance and the remaining observable characteristics.

An additional contribution of this paper relates to the issue of sample attrition when looking at outcomes in an education setting. Specifically, in my data, I am able to observe children even if they enrol in a private or independent school. In contrast, in many datasets children are not observed if they are not enrolled in a publicly funded school, which can lead to issues of differential attrition. If private school is not a realistic option for the type of children being studied, then this is likely not a major source of bias; however, if private school is a viable outside option, then not accounting for children in private school potentially leads to biased estimates.82 In this paper, I am able to directly speak to this effect by comparing the results both with the private school students included in the sample and with private school students excluded from the sample.

I find that initial enrolment into FI has large negative and statistically significant effects on children's math, reading and writing scores in grade 4 on the order of -0.46 test score standard deviations (hereafter, σ), -0.37σ and -0.35σ, respectively. For grade 7, I find that while initial enrolment into the FI program still has negative and significant effects, the magnitudes for both reading and writing are smaller than they are in grade 4. Finally, for grade 10, I find no effect of initial FI enrolment on students' English exam scores, but negative and significant effects on exam scores in Math (-0.25σ) and Science (-0.33σ). Therefore, even after accounting for both selection and survivorship bias, I still find at least a partial catch-up effect. Students who initially enrol in FI are able to fully catch up to their non-FI peers in terms of English Language Arts and partially catch up in technical subjects such as Math and Science, but negative effects do remain by grade 10. Next, I test whether these results differ depending on the gender of the child.
A large literature emphasizes gender differences with regards to languageacquisition and it is possible this is reflected in terms of the children’s performance in the FI program.However, I find that the impact that FI has on both boys and girls is nearly identical. Finally, I explore thepossibility that the estimated causal impact of FI is not actually driven by the FI program, but by parentswho are enrolling their child in private school instead of FI. In order to examine this issue, I run severalspecifications controlling for initial and future private school enrolment. Across each specification, I findthat accounting for private school enrolment leads to an improvement in the estimated causal impact ofFI. The actual differences vary by grade and subject, ranging from 0.05σ to 0.1σ . These differencesare often both economically meaningful and statistically significant. However, these changes are notenough to qualitatively change our previous findings; initial enrolment into FI continues to cause large,82According to the What Works Clearinghouse, acceptable differential attrition rates range from 6–10pp in order to ensurethat coefficients are not significantly biased. (What Works Clearinghouse, 2014a, b). But, these results are primarily based onoverall sample attrition in randomized control trials that take place across a variety of settings and are not specifically aboutprivate school enrolment.623.2 The French Immersion Programsignificant declines in achievement in grade 4, but these declines either do not last (for English examscores) or are smaller in magnitude (for Math and Science) by grade 10.The remainder of this chapter is organized as follows. Section 3.2 describes the FI program. Section3.3 reviews the economics and education literature about immersion language programs. Section 3.4describes the data. Section 3.5 describes the empirical strategy. Section 3.6 presents and discusses theresults and finally section 3.7 concludes along with a discussion of future topics of interest.3.2 The French Immersion ProgramAs described in chapter 2, French Immersion refers to a publically funded immersion language pro-gram offered throughout most of Canada. The primary goal of the program is to promote bilingualism;French is one of two official languages of Canada, with English being the second. Since the data for thischapter consists solely of students in the province of British Columbia (BC), the remainder of this sec-tion describes the program as it pertains to BC (although the program generally has a similar structurethroughout Canada). The program is designed for parents and students who speak little to no French.FI is primarily offered through local public school boards and any child is free to initially attend an FIprogram as they would attend any other public school. While the FI enrolment rate in BC is approx-imately 10%, this number understates the true demand for FI because enrolment is often constraineddue to a limited number of classrooms and provincial rules on class sizes. The curriculum taught inFI mirrors that taught in the traditional public school program; the major difference is the language ofinstruction. For elementary school children, the most common type of French Immersion program is theEarly French Immersion program which children can enter in grades kindergarten (K) or one. In othertypes of French Immersion programs such as Late Immersion entry occurs in grade six. The focus ofthis paper is purely on the Early French Immersion program. 
The exact distribution of classroom time inEnglish and French varies by school district. For example, the Vancouver School Board requires that theprogram be 100% French in grades K-3, introduces English in grade 4 at an 80/20 split favouring Frenchand for the remaining years until grade 7 approximately 50-80% of the classroom instruction time re-mains in French. In practice, most school districts have a language distribution that closely follows theVancouver example.The French Immersion program is not without controversy. One of the biggest criticisms of theprogram is that it is a form of tracking or streaming. Examples of headlines in newspapers and newsmagazines include “French Immersion is Education for the Elite” (Vancouver Sun, 2008) and “Just say’Non’: The problem with French Immersion” (Maclean’s, 2015). The general argument is that childrenwith lower cognitive ability either do not enter the FI program or are the first to leave after experienc-ing academic difficulty. Thus, the students who remain in the program are all above average in termsof cognitive skills. Further adding to the controversy is that one also gets a divide in terms of socioe-conomic status with FI children coming from wealthier households and having more educated parents(Worswick, 2003). For their part, the information that school districts convey to parents generally mir-rors the literature in that they acknowledge that children might struggle early on, but are able to catch-up633.3 Literature Reviewin later grades as English is gradually introduced into the classroom.833.3 Literature ReviewAs discussed in the introduction, immersion language programs are seen as an excellent way for childrento learn a foreign language. There is, however, a subset of the education literature that argues thatlearning a second language (hereafter “L2” — from the education literature) can aid the development ofa child’s first language (“L1”) and vice versa. A main proponent of this theory is Cummins (1979, 2000)who calls one aspect of the theory the “Interdependence Hypothesis.” Cummins’ argument is that thereis a common set of skills that both influences and is influenced by a child’s language development andthis is true regardless of the language of instruction. Therefore, even students who are being taught in anL2 language are developing skills that will aid their L1 development. As long as children are exposedto their L1 language regularly (in settings besides school) and their language skills are sufficientlydeveloped, then having classroom instruction in L2 will not lead to any negative effects on the child’sL1 development. Cummins (1976, 1979) also argues that cognitive development is a function of thelevel of bilingualism attained; however, a child will only experience positive cognitive development ifhe or she reaches a sufficiently high proficiency in both languages. For this reason, this theory is calledthe “threshold hypothesis”.84A sizeable literature attempts to evaluate the impact of immersion language programs and the FIprogram in Canada in particular. This literature can be split up into two main groups: longitudinalstudies that follow a small number of students in both FI and non-immersion programs over time andcross sectional studies that utilize administrative data. Most of early literature (c.f. 
Genesee, 1977, 1978 or Swain and Lapkin, 1982) finds that FI has either a negative or no effect on children in earlier grades, but by later grades the FI children either outperform the non-FI children or there are no significant differences in performance. The big drawback of most of these earlier papers is that they are based on a small sample of students. However, more recently, similar results are found in a series of papers examining the performance of all immersion children in the Canadian province of Ontario (Hart, Turnball and Lapkin, 2001, 2003). A bigger concern with all of these papers is that they are all subject to both selection and survivorship bias. Selection bias arises if parents' unobserved preference for enrolling their child in FI is correlated with factors related to a child's performance in school. For example, it might be the case that parents that place their child in FI are also those parents that exert more effort into seeing their child do well in school. Furthermore, since the authors look at grade-by-grade averages, their results are also subject to survivorship bias if only the "cream of the crop" remain in FI (see chapter 2).

83 For example, on the Vancouver School Board website, parents are told that, "Research shows that sometime between Grades 4 and 6, Early French Immersion students from English-speaking homes catch up to (and sometimes surpass) the level of English skills of their regular program counterparts...".
84 For a deeper review of this literature, see Bournot-Trites and Tellowitz (2002).

The key contribution of this paper to the literature discussed above is that I explicitly attempt to overcome the issues of selection and survivorship bias by taking advantage of a longitudinal administrative dataset that follows children over time and contains detailed information about each child. I avoid the issue of survivorship bias by focussing on the impact on future academic achievement from initial enrolment in the FI program. I am able to do this because I observe children in every grade and not simply the grade of the standardized tests. In order to control for the fact that selection into FI is non-random, I instrument for FI enrolment using the relative distance to the nearest FI school within a given neighbourhood. I am able to construct such an instrument because the data contains the home postal code of the child, which I am able to map to longitude and latitude coordinates.

Moving away from the FI program in Canada, a paper by Steele et al. (2015) examines various immersion programs in the Portland school district. Here, the authors use lotteries in the event demand for an immersion program exceeds supply in order to identify the causal impact of these immersion programs. Steele et al. (2015) find that children in immersion outperform non-immersion children by between 0.2σ–0.5σ in reading and -0.03σ to 0.25σ in mathematics between grades 3–8; however, the math results are all insignificant. While the authors have a clean identification strategy, there are reasons to think that the results of this paper do not generalize to the FI program in Canada. First, while the institutional features of the individual school programs vary, they are all different from the FI program in Canada. Approximately one third of the elementary programs studied are bilingual programs — meaning there is a 50-50 split between English and the immersion language.
In contrast, the FI program is 100% French in grades K-3, with English Language Arts only gradually introduced in grade 4 at an 80/20 split favouring French. The remaining two-thirds of programs also have a language distribution that differs from FI. The Portland immersion programs start off at 90% in favour of the immersion language and this percentage declines by ten percentage points each year until reaching 40–50% by grades 4 or 5. The distribution of language throughout an immersion program is potentially an important factor influencing a child's performance. Second, the authors claim that approximately 50% of students in the immersion classes (i.e. those that start off with only 10% of classroom instruction in English) are native speakers of the immersion language. In contrast, as described above, the FI program is explicitly designed for non-French-speaking households. This potentially impacts students through peer effects associated with the immersion programs.85

85 An additional difference is that the immersion programs in Portland studied by the authors are those in Spanish, Russian, Japanese and Korean. Of these, only Spanish is in the same category of languages as French and uses the same alphabet as English, which potentially increases the burden on the enrolling students. Additionally, none of these languages are official languages as French is in Canada. This can potentially affect both the quality of the teachers (with Canada being able to draw from a larger pool of qualified applicants) and also the level of exposure to the immersion language outside of school.

3.4 Data

The data used in this chapter is nearly identical to the data used in chapter 2 and described in greater detail in Appendix A, and thus in this section I only discuss those aspects of the data that are relevant to this chapter. There are two important changes to the data from what I described in chapter 2. First, I further limit the sample to children with at least one valid test score. The major change caused by this restriction is that it forces all of the children from the 2009 entering cohort to be dropped from the sample, since they are only observed until grade 3. Excluding these children, approximately 8% of students are observed to be missing a test. The final sample contains 201,041 students, 20,468 of whom initially enrol in the FI program.86 Secondly, in addition to examining elementary school outcomes as measured by the Foundation Skills Assessment tests, in this chapter I also look at outcomes at the secondary school level.

86 The sample is also limited to FSAs and catchment areas with at least 15 observations. This restriction affects less than 0.1% of the sample.

Data on student achievement at the secondary school level comes in the form of province-wide examinations for specific courses at the grade 10 level. The courses I have chosen are English, Science, and two math courses: Principles of Mathematics and Foundations of Mathematics and Pre-Calculus. The reason there are two math courses is that, starting in the 2010/2011 school year, the BC government changed its mathematics curriculum at the secondary school level. Additional details about this policy change and the mathematics courses available are found in Appendix A. The courses I have chosen make up the core of the grade 10 curriculum. All of the grade 10 provincial examinations included in this analysis were mandatory and accounted for 20% of a student's final grade. Students are permitted to retake the provincial exams. Ideally, I would follow the literature and use the first test score (Dobbie and Fryer, 2014), but that information is not in the data.
Instead, I use the average of the student's test scores. Exam retaking rates in the data range from 1.7% to 4.5%. Appendix A contains additional details regarding these provincial exams.

3.5 Empirical Strategy

The main estimating equation of this paper is given by:

Y_igtn = X_it′β + γ·FI_i + δ_n + δ_t + ε_int    (3.1)

where Y_igtn is an outcome of child i in grade g who entered kindergarten in year t and lives in neighbourhood n. FI_i is a binary variable equal to one if the child initially enrolled in the FI program in either grade K or 1 and zero otherwise, X_it is a set of demographic controls, δ_n is a set of neighbourhood fixed effects and δ_t is a set of entering cohort-year fixed effects. The main parameter of interest is γ, which yields the causal impact of initial enrolment in the FI program on student outcomes. In practice, I estimate equation (3.1) separately by subject and grade. The observable characteristics are split into two categories: child characteristics (age, gender, language spoken at home, special education status, ESL status, and dummy variables for month of birth) and census characteristics at the dissemination area level based on the child's listed six-digit postal code (household income, percentage with a bachelor's degree or higher, unemployment status, home prices and percent home-ownership). All of the observable characteristics are based on the values observed while the child was enrolled in grade K.

Equation (3.1) can be estimated via OLS; however, it is likely that the OLS estimation is biased because of unobserved factors that are correlated with both student outcomes and FI enrolment. In table 3.1 below I show that FI children are positively selected based on observable characteristics. There may be differences along unobserved dimensions as well. These unobserved dimensions could include characteristics such as parents that place a greater emphasis on education or parents that are more involved in their child's schoolwork, or even parents that were in French Immersion themselves.

Several methods exist which are designed to account for these potential biases. Suppose that we have some instrument Z, such that Z is independent of ε_1 given X_it, δ_n, δ_t. The standard TSLS setup is given by:

Y_int = X_i′β + γ·FI_i + δ_n + δ_t + ε_int    (3.2)
FI_i = Z′π + X_i′β′ + δ′_n + δ′_t + u_int    (3.3)

which is estimated using standard techniques. Since the endogenous variable, FI, is a binary variable with a small take-up rate, it is possible that the TSLS estimates are imprecise despite the overall large sample size. In order to address this possibility, I turn to a control function (CF) approach first described by Heckman (1976, 1979). From equation (3.1), we have that:

E[Y_it | X, FI, Z] = X_i′β + γ·FI_i + δ_n + δ_t + E[ε_1it | X, FI, Z]    (3.4)

Now suppose further that

FI = 1(X′π_1 + δ′_n + δ′_t + Z·π_2 + ε_2 > 0)    (3.5)
Z ⊥ ε_1, ε_2;    ε_2 ~ N(0, 1);    ρ = corr(ε_1, ε_2);
ε_1 = ρ·ε_2 + η,    η ~ N(0, σ²_η)    (3.6)

where η is a normally distributed term with mean zero that is independent of each of the error terms. From equations (3.5) and (3.6), it follows that:

E[ε_1it | X, FI, Z] = ρ·E[ε_2it | X, FI, Z]
                    = ρ·[ FI·λ(X′π_1 + δ′_n + δ′_t + Z·π_2) − (1 − FI)·λ(−X′π_1 − δ′_n − δ′_t − Z·π_2) ] ≡ v

where λ(·) is the inverse Mills ratio function. The control function approach involves regressing Y on X, FI, δ′_n, δ′_t and v. However, v is unknown since we do not know the true values of π_1, π_2 or the fixed effects, δ′.
Instead, I calculate v̂ using the predicted values from a probit regression and then regress Y on X, FI, δ_n, δ_t and v̂:

Y_igtn = X_it′β + γ·FI_i + δ_n + δ_t + ω·v̂ + ε_int    (3.7)

The advantage of the control function method is that it is more efficient than the TSLS approach. The disadvantage is that the results are potentially sensitive to the specification of the first-stage probit regression. However, I have also run the estimation including additional interaction terms in the first-stage probit equation and find that this does not materially alter any of the results. For these reasons, the control function approach is my preferred estimation method.

The instrument I have chosen is the relative distance of the child to the nearest FI school. Define d^FI_ij to be the distance of child i to school j — where school j offers FI — and d^nonFI_ij similarly for non-FI schools. These distance measures are calculated using the given postal code of the child. Using these values, the distance instrument is calculated as follows:

distance_i = log( min_j {d^FI_ij} / min_j′ {d^nonFI_ij′} )    (3.8)

In words, equation (3.8) says that distance is defined as the log of the ratio of the distance to the closest school that offers FI to the distance to the closest school that does not offer FI.

Interpreting the Coefficient on FI in the Presence of Unobserved Heterogeneity

If the impact of initial FI enrolment is the same for all children, then both the TSLS and CF estimation methods estimate the average treatment effect (ATE) of initial FI enrolment. On the other hand, if there is unobserved heterogeneity in the impact that initial FI enrolment has on student outcomes (see chapter 2), then neither the TSLS nor the control function approach identifies the ATE without additional assumptions. In this case, the actual impact that initial FI enrolment has on a child would be given by γ_i, and estimating a constant parameter, γ, as in the TSLS and CF methods will involve averaging over these individual effects. For the TSLS estimation, under a set of assumptions the estimate of γ is known as the Local Average Treatment Effect, or LATE (Angrist and Imbens, 1994). The effect is a "local" one because, by using distance as an instrument, the TSLS estimate of the causal impact of initial FI enrolment on student outcomes is identified off the "compliers" — those families induced to enrol their child in FI as a result of living closer to an FI school. The two main assumptions with regard to estimating the LATE are: (i) independence — that relative distance is independent of the outcomes given treatment; and (ii) monotonicity — for each individual, the probability of treatment is strictly increasing or decreasing in the relative distance to the nearest FI school. The interpretation of the estimates in the control function approach is more complicated. Here, the estimate of γ is a weighted average treatment effect on the treated (ATT), where the weights are a function of the probabilities of enrolment (or, more accurately, their densities) given distance and the other observable characteristics (see, for example, Heckman and Vytlacil, 2005 and Blundell and Mias, 2007).87

87 As shown in Vytlacil (2002), there is a relationship between the LATE and control function approaches. To see this, note that both approaches require the independence assumption to hold and that, if we assume additive separability with regards to distance in the selection model given by equation (3.5), the monotonicity assumption must be true as well. More generally, Vytlacil (2002) shows that given the LATE assumptions, there always exists a selection model that rationalizes the observed data.
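A stylized two-step implementation of the control function in equations (3.5)–(3.8) is sketched below. The file and column names (students.csv, dist_fi, dist_nonfi, fi, score, female, esl, fsa, cohort, school_id) are placeholders, the control set is heavily abbreviated, and the sketch assumes complete cases; it is meant to illustrate the mechanics rather than reproduce the full specification used in this chapter.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy.stats import norm

df = pd.read_csv("students.csv")  # hypothetical file; column names are illustrative

# Relative-distance instrument, equation (3.8): log of the ratio of the distance to
# the closest FI school over the distance to the closest non-FI school.
df["rel_dist"] = np.log(df["dist_fi"] / df["dist_nonfi"])

# Step 1: probit for initial FI enrolment on the instrument, (abbreviated) controls
# and FSA / entry-cohort fixed effects. fittedvalues returns the linear index.
probit = smf.probit("fi ~ rel_dist + female + esl + C(fsa) + C(cohort)",
                    data=df).fit(disp=False)
xb = probit.fittedvalues

# Generalized residual: v_hat = FI * lambda(xb) - (1 - FI) * lambda(-xb),
# where lambda() is the inverse Mills ratio.
def inv_mills(z):
    return norm.pdf(z) / norm.cdf(z)

df["v_hat"] = np.where(df["fi"] == 1, inv_mills(xb), -inv_mills(-xb))

# Step 2: outcome regression with the control function included; the coefficient
# on fi corresponds to gamma in equation (3.7). Note that these standard errors
# ignore the fact that v_hat is estimated; bootstrapping both steps addresses this.
second = smf.ols("score ~ fi + female + esl + v_hat + C(fsa) + C(cohort)",
                 data=df).fit(cov_type="cluster",
                              cov_kwds={"groups": df["school_id"]})
print(second.params["fi"], second.bse["fi"])
```

Under homogeneous effects, dropping v_hat and instead instrumenting fi with rel_dist would give the TSLS analogue of equations (3.2)–(3.3).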
In both methods, the instrument plays a key role by increasing the probability of FI enrolment for a subset of the population. These are possibly parents who have an otherwise medium or low preference for FI and are likely near the margin between enrolling or not enrolling their child in the program. This raises the question of whether performance in FI differs for those children whose parents chose the program because they lived closer to an FI school. This is an issue I return to in the conclusion of this chapter. Furthermore, note that the program choice equation in equation (3.5) is additive in both relative distance and the unobserved error term. In practice, this specification might be misspecified if the instrument is just one dimension parents take into account in conjunction with other factors; that is, there might also be interactions between the relative distance instrument and the remaining observable and unobservable characteristics. However, with regard to the former, I find that all of my estimated CF results are robust to re-estimating equation (3.5) with additional interaction terms. The potential correlation between distance and unobservables is discussed in the next section in conjunction with the exclusion restriction.

Examining the Validity of the Exclusion Restriction Regarding Relative Distance

The two conditions required in order for an instrument to be valid are relevance — that the instrument is able to predict the endogenous variable, in this case early FI enrolment — and excludability — that the instrument is unrelated to the outcomes of interest except through its impact on FI enrolment. One of the biggest concerns about using distance as an instrument is that the decision about where to live is not random, and schooling is often a big factor in parents' decisions (see, for example, Black, 1999 or Black and Machin, 2011). If the parents who choose to live in areas with greater access to an FI school are also those parents who differ along other dimensions related to student achievement, then this is a violation of the exclusion restriction. In order to address these concerns, I follow a method similar in spirit to what other papers have done and include a set of neighbourhood fixed effects in all regressions (see, for example, Foley, 2012 or Barrow et al., 2013). The idea behind including neighbourhood fixed effects is that the model now permits parents to choose the area in which they want to live; that is, the neighbourhood fixed effects soak up the sorting behaviour of parents. But, conditional upon that neighbourhood, the precise location where parents end up is now more a function of exogenous factors such as the supply of housing, actions of other buyers and school locations. Thus, I am arguing that the exclusion restriction is satisfied conditional upon looking within a given neighbourhood. I define a neighbourhood using two primary measures: Forward Sortation Areas (FSA) and catchment areas. An FSA is a geographical region with an average of 7,000 households. An FSA is defined by the first three digits of a student's postal code. Since postal codes are included in the data, I know exactly in which one a student resides.
A school's catchment area is a region in which resident students get first priority for attending a specific (non-FI) school.88 Unfortunately, I do not have geocoded data on the location of the catchment areas over time. In addition, catchment areas are based on street addresses, which are also not in the data. As a result, I impute catchment areas by calculating the school that most children attend within a given region.89 The combination of each region in which a given school is the most attended is labelled the "catchment area". Because of the noisiness associated with the construction of the catchment areas, all of the results presented in the main tables use FSA as the primary neighbourhood definition. However, in the appendix I also present the results of the main specifications using catchment areas instead of FSA fixed effects.

88 Some school districts do have catchment areas for FI schools; however, here I focus on non-FI catchment areas.
89 Postal codes themselves ended up being too small, with not enough observations; therefore, I aggregated up to the census tract level when calculating the school that children in a given year and grade attended the most.

While the exclusion restriction on distance is not directly testable, we can at the very least check to see if the observable characteristics are related to distance. This method is similar in spirit to that used in other papers that evaluate the validity of using distance as an instrument (c.f. Altonji et al., 2005b, Cullen et al., 2005 and Barrow et al., 2013). Table 3.3 shows the results of a regression of individual characteristics, X_i, onto the log of the relative distance measure and year fixed effects, with and without neighbourhood fixed effects. The results show that with just year fixed effects included in the regressions, distance is highly correlated with several of the remaining observable characteristics (panel A) in both a statistically and economically significant sense. For example, from panel A we see that a 100% increase in one's relative distance to an FI school is associated with a 6.5pp decrease in the likelihood the child is an ESL student (relative to a mean of 30%), a 6pp increase in the probability the child speaks English at home, a 1.6pp increase in the share of the area with a bachelor's degree or higher and a 1.2pp increase in the share of the area made up of home owners. Furthermore, most of the signs on the estimated coefficients are consistent with a story that suggests children of parents who live closer to FI schools would also be expected to perform better in school.90

In contrast to the results in panel A, including neighbourhood fixed effects (panel B) causes distance to have either no or only a small correlation with these variables. Every coefficient in panel B decreases in magnitude when compared to panel A, and now the coefficients on distance are significant for only ESL and speaking English at home — two variables which are highly correlated. Furthermore, the signs on many of the coefficients are no longer consistent with a story that distance would be correlated with higher student achievement. While it remains the case that living closer to an FI school is associated with a lower probability of being enrolled in an ESL program, a higher probability of speaking English at home and lower unemployment, it is also the case that living closer to an FI school is associated with lower levels of education, lower levels of income and lower home ownership rates; although, none of these results is significant and the magnitudes implied by many of these coefficients are extremely small. Overall, these results suggest that looking at relative distance within a given neighbourhood is a much more valid instrument than looking at relative distance alone.

90 In addition, I find that running the OLS regressions in equation (3.1) with fixed effects that are even smaller geographically than FSA or catchment areas has no material effect on the estimated coefficients. This result is consistent with the assumption stated above that sorting on unobserved ability cannot take place in too small of a region.
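The balance checks in Table 3.3 amount to regressing each kindergarten characteristic on the log relative-distance measure, once with only year fixed effects and once adding FSA fixed effects. A compact sketch of that loop follows; the column names are illustrative and standard errors are left at their defaults.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("students.csv")  # hypothetical file; rel_dist constructed as in equation (3.8)

characteristics = ["esl", "english_at_home", "da_pct_university", "da_pct_homeowner"]

# Panel A: year fixed effects only; Panel B: add FSA (neighbourhood) fixed effects.
for label, fe in [("Panel A", "C(year)"), ("Panel B", "C(year) + C(fsa)")]:
    for x in characteristics:
        fit = smf.ols(f"{x} ~ rel_dist + {fe}", data=df).fit()
        print(label, x,
              round(fit.params["rel_dist"], 4),
              round(fit.pvalues["rel_dist"], 3))
```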
The discussion above centres around the concern that distance is correlated with unobserved ability characteristics that directly affect student achievement. Another possibility is that even if relative distance is uncorrelated with unobserved ability, it could still be correlated with parental preferences for French Immersion — and through preferences affect future student achievement. For example, suppose those parents who live as close as possible to an FI school are those parents with a high preference for having their child in the FI program. Furthermore, now suppose that, in the presence of unobserved heterogeneity in the impact of FI, these same parents are also more likely to keep their child enrolled in FI regardless of performance. This example would generate a negative relationship between distance and future student achievement because the parents who live closer to an FI school are less likely to remove their child from the program in the event of a poor match quality (chapter 2). However, I find that re-running the reduced-form analyses from chapter 2 in equation (2.1) shows no evidence that the relationship between achievement and program exit varies with the relative distance to the nearest FI school.

3.6 Results and Discussion

3.6.1 Summary Statistics

Table 3.1 compares the characteristics of children who initially enrol in FI (that is, children enrolled in FI in either kindergarten or grade 1) with those who do not. All of the statistics in table 3.1 are based on each child's characteristics in kindergarten. The largest differences between FI and non-FI children are seen in variables that relate to the child's first or home language. Children in FI are far less likely to be designated an ESL student (4% vs 31%) and much more likely to come from a household in which English is spoken at home (88% vs 68%). Other differences include that FI children are more likely to be female, live in areas with slightly higher average levels of income and education and lower unemployment, and live 1 km closer to the nearest FI school.

In Appendix table C.1, I examine the difference in average test scores between the FI and non-FI children. These results are very similar to those presented in section 2.3 of chapter 2. Panel A of Appendix table C.1 looks at test score differences between children enrolled in FI during the grade of the test and all other children.
In grade 4, we see that the FI children outperform the non-FI children by0.03σ and 0.13σ in math and reading, respectively while performing 0.05σ lower in terms of writingoutcomes. These differences are all statistically significant at the one percent level. In grade 7, however,the FI students now greatly outperform the non-FI students with differences in average test scores of0.09σ , 0.24σ and 0.09σ in math, reading and writing, respectively. Finally, in grade 10 we observeeven larger differences with the students remaining in FI performing 0.25σ , 0.45σ and 0.25σ better onprovincial exams in math, English and science, respectively. However, some of these differences couldbe the result of survivorship bias within the program. I explore this possibility in panel B of Appendixtable C.1, which shows the difference in test scores between students who initially enrolled in FI andthose who did not. What we find is that the differences are now much lower in magnitude; although,by grades 7 and 10, we continue to see the initial FI children outperforming the remaining students.Nevertheless, these results suggest that higher performing students are more likely to remain in the FIprogram (see chapter 2).713.6 Results and Discussion3.6.2 OLS ResultsI begin by presenting the OLS estimates of the impact that initial enrolment into the FI program has onchildren’s math, reading and writing scores in grades 4, 7 and 10. Each cell in table 3.2 displays thecoefficient on FI in equation (3.1) with the dependent variable varying along the columns. The grade4 results are presented in columns 1–3, the grade 7 results are in columns 4–6 and the grade 10 resultsare in columns 7–9. Each of the regressions include a full set of controls at the child and disseminationarea levels as well FSA fixed effects. Focussing first on grade 4, the results for these specificationssuggest a negative effect of FI on each of the child’s math (-0.1σ ), reading (-0.07σ ) and writing scores(-0.17σ ) with all coefficients significant at the one percent level. In grade 7, we see that the impactof FI on math outcomes increases from -0.1σ to -0.05σ , the impact on reading increases from -0.07σto 0.02σ and the impact on writing scores increases from -0.17σ to -0.06σ . The math and writingscores are significant at the one percent level while the reading scores are insignificant. Thus, evenafter accounting for survivorship bias by looking at initial enrolment into the FI program, we still seeevidence — at least in the OLS context — that children who enrol in the FI program do experience animprovement over time in both mathematics and English language arts. These differences — ranging inmagnitude from 0.05σ to 0.1σ — are economically meaningful and statistically significant at the onepercent level. Looking at secondary school outcomes, the OLS results show that the initial FI studentsare consistently outperforming their peers. Children who initially enrolled in FI perform 0.16σ betteron the grade 10 English exam, 0.12σ on the math exams and even 0.11σ better on the grade 10 scienceexam. All of these results are significant at the one percent level. Thus, the outcomes for grade 10continue to be consistent with the idea that the children who initially enrol in FI are able to catch-up andeven out-perform their peers who did not enrol in FI.The assumption needed in order for the OLS estimates above to be unbiased is one of selection-on-observables. 
In contrast, these coefficients are likely biased if entry into FI is correlated with unobservedfactors that could also influence student outcomes. In order to address this issue, I now turn to the controlfunction method discussed above in section 3.5.3.6.3 Accounting for the Endogeneity of Initial FI EnrolmentOne concern is that by looking within a given neighbourhood, there will not be enough variation leftover in order to find a strong correlation between distance and FI enrolment and thus we are left witha problem of weak instruments. Fortunately, this does not appear to be the case. Table 3.4 shows theresults of a regression of FI enrolment onto the relative distance measure, distance, and a variety ofcontrols. From columns 1–3 — which do not include any neighbourhood fixed effects — we see thatthe coefficient on distance ranges from -0.04 and -0.05 with t-statistics all around 10. Since distanceis defined as the log of the ratio of the distance of the closest FI school and the distance to the closestnon-FI school, a coefficient of -0.04 implies that a 100% increase in the relative distance between FIand non-FI schools is correlated with a decline in the probability of FI enrolment of 4pp or 40% of theaverage enrolment probability. In columns 4 and 5 — where I add in neighbourhood fixed effects —723.6 Results and Discussionwhile we do see a decrease in the impact that relative distance has on early FI enrolment (declining to-0.03), the first stage remains quite strong with t-statistics of approximately 10. Finally, panel B of table3.4 show that the instruments remain strong even after limiting the sample to children with valid testscores observed in grade 7. This is important because all regressions are run separately for each gradeand subject combination.91 Finally, while table 3.4 shows the results of a linear probability model, allresults are similar using a probit specification.Control Function ResultsPanel B of table 3.2 shows the results using the control function approach. As described above, thecontrol function estimates are my preferred specification of this paper; although, I have also includedthe TSLS results in Appendix table C.2. As before, all regressions include a full set of controls as well asneighbourhood fixed effects. Focussing on the results in grade 4, the estimated causal impact that initialenrolment in the FI program has on student outcomes in grade 4 is now much lower than the impact seenin the OLS regressions. The impact on math scores in grade 4 declines from −0.1σ in the OLS case to−0.46σ using the control function approach. The reading results decline from −0.07σ to −0.37σ andthe writing results decline from −0.17σ to −0.35σ . Each of these coefficients is significant at the onepercent level and is a very substantive effect. Thus, the control function results suggest a large, negativeimpact of initial enrolment in the immersion program on each of a child’s grade 4 outcomes in math,reading and writing.For grade 7, the estimated coefficients are now −0.45, −0.25 and −0.16 for math, reading andwriting, respectively. The coefficients for math and reading outcomes are significant at the one percentlevel while the coefficient for the writing test is significant at the ten percent level. From these values, wecontinue to observe a much more negative impact from initial entry into the FI program when comparedto the OLS results. 
For both the reading and writing outcomes, the point estimates have improvedrelative to the estimates observed for outcomes in grade 4 with increases in performance between grades7 and 4 of 0.12σ and 0.19σ , respectively. The former difference is significant at the ten percent levelwhile the latter is significant at the one percent level. For math, the grade 7 coefficient is virtuallyidentical to the one observed in grade 4.Looking at the secondary school outcomes in columns 7–9, the coefficients on FI are now -0.25,-0.02 and -0.33 when the dependent variable is the grade 10 exam in math, English, and science, respec-tively. Only the coefficients for the math and science outcomes are significant, at the ten and five percentlevels, respectively. Thus, the secondary school results suggest that while children who initially enrol inFI are able to catch-up to their peers in terms of English Language Arts, there appears to be some lastingnegative effects on both math and science outcomes.92 Finally, with the caveat that the grade 10 examshave a very different format and set-up than the standardized tests in the earlier grades, I compare the91Even limiting the sample to students with valid grade 10 observations yields similar coefficients with t-statistics between7.5–9.92While the standard error on the estimated coefficient for the English exam scores is not trivial, we will see that this resultis robust to a variety of specifications.733.6 Results and Discussioncoefficients for the math and English exam scores to the grade 4 results for math and reading. What wefind is that for both math and reading, the point estimates in grade 10 are lower in magnitude than theestimates we saw for outcomes in grade 4 with test score differences between the two grades of 0.21σand 0.35σ , respectively; differences that are both economically and, in the case of reading, statisticallysignificant.In summary, these results show that initial enrolment into the FI program at first leads to large de-creases in student achievement in each of the standardized tests for math, reading and writing. However,the results also show that FI students are — to some extent — able to reverse these declines. In termsof reading and writing, I find no significant impact of initial FI enrolment on grade 10 English examscores, suggesting that children are able to fully catch-up to their peers by the end of grade 10 in theseareas. For math scores, I find significant negative effects in grade 10 that are lower in magnitude thanthe effects seen in grades 4 and 7, but still very substantive. Similar results are also seen in Appendixtable C.3 which uses catchment area fixed effects instead of FSA fixed effects.93 Finally, I also testedthe robustness of the results to using a TSLS approach, the results of which are shown in Appendixtable C.2. For grade 4, we continue to see large, negative and significant effects of initial FI enrolment.The TSLS results imply that initial FI enrolment leads to declines in achievement of 0.27σ , 0.34σ and0.25σ in math, reading and writing, respectively. Each of these coefficients is smaller in magnitudethan the control function results, but also less precisely estimated. For outcomes in grade 7, the TSLSresults show negative effects on math (γT SLS = −0.17) and writing scores (γT SLS = −0.15) , but posi-tive effects on reading scores (γT SLS = 0.12). However, these effects are estimated with lower levels ofprecision and none is significant at conventional levels. 
Finally, the TSLS results for outcomes in grade 10 are also qualitatively similar to the control function results, with coefficients on outcomes in math, English and science of -0.15, 0.12 and -0.2, respectively, none of which is significant at conventional levels. In general, we see that the TSLS results are qualitatively similar to those in the CF but slightly more positive and less precisely estimated.94 In addition, as in the control function results, the TSLS estimated coefficients show that children who initially enter the FI program will be negatively impacted in the earlier grades, but by grade 10 are able to fully catch up in reading and writing (with significant differences in the coefficients across grades 4–10), while some negative effects remain in STEM subjects such as math and science.95

93 Furthermore, I tested the robustness of the control function results by running different specifications for the first-stage probit regression; adding in additional interaction terms does not alter the results in any material way.
94 One possible explanation for these differences is that, as discussed above, in the presence of heterogeneity the CF results use information from all enrolled parents while the TSLS estimates are only based on the compliers.
95 An additional specification I run is based on Wooldridge (2010) and involves using the predicted probabilities of initial FI enrolment (from a probit specification with relative distance included as one of the controls) as an instrument in a TSLS estimation. The results are qualitatively similar to those seen in the CF approach, but less precisely estimated. Here we observe that students catch up in terms of ELA outcomes, but there is no catch-up effect in math. Furthermore, the estimates are actually closer to the CF results — and in most cases are more negative compared to the CF estimates — than the standard TSLS results described above (which are generally more positive compared to the CF estimates).

What explains the fact that children are having difficulty early on in the program, while at the same time we do not see these difficulties last into the future, or at least not to the same degree, particularly for reading and writing? One of the common explanations of this result is that since the tests are administered in English and English is not formally introduced into the curriculum until grades three or four, the FI children need time before they can fully catch up to their counterparts in the standard program (Genesee, 2007). This explanation also potentially accounts for the fact that the catch-up is greater in English Language Arts than in mathematics. Even if they are not being taught English in school, children still have a large exposure to English through other settings (e.g. at home or with friends), and this can allow a child's English language skills to develop (Cummins, 1979). In contrast, children do not necessarily have the same exposure to mathematical concepts. Furthermore, math is also likely a more cumulative subject, in that basic concepts such as addition or division are required to understand more complicated notions such as fractions or geometry and so forth (National Mathematics Advisory Panel, 2008). Thus, negative effects early on are potentially longer lasting.

Relatedly, another question surrounding the catch-up of the early FI children is: which children are driving the result?
Specifically, is the result driven by children who remain in FI or by children whose parents take them out of the program? The answer is important because it speaks to whether children are "naturally" catching up or are only doing so because their parents remove them from the program. In order to explore this issue, I re-run equations (3.1) and (3.7) including an interaction term between initial FI enrolment, FI, and a variable indicating whether children are still in French Immersion during the grade of the test, still in FI. This variable is obviously endogenous and cannot be used to say whether parents should or should not remove their child from FI. What we can do, however, is see how these variables are changing over time to see what is driving the catch-up results. Formally, I estimate the following regression specification:

Y_{igtn} = X_{it}'\beta + \gamma FI_{i} + \theta(\text{still in FI}_{i}) + \delta_{n} + \delta_{t} + \varepsilon_{int}

The results are shown in Appendix table C.5. Given the results of chapter 2, which showed that lower test scores are correlated with a higher probability of leaving the FI program, it is not surprising that we find a positive coefficient for children still in FI and a negative coefficient for children who left the program. But, once again, these are not causal estimates. More interesting is the fact that we also observe both of these coefficients increasing over time with significant differences. This suggests that one group is not more likely than the other to experience this catch-up phenomenon. Even the children who remain in the FI program and have English gradually introduced into the curriculum are experiencing an improvement in their performance over time.

Bounding and Testing the Robustness of the Results using Oster (2015)

In this section I use the methodology developed by Oster (2015) in order to examine the robustness of — and bound — the results seen in table 3.2. The first exercise I run is to calculate the bias-adjusted coefficients for the OLS results in panel A using the techniques seen in Oster (2015). This methodology requires two inputs: a ratio of the degree of selection on unobservables to the selection on observables, and the hypothetical maximum R-squared assuming all possible controls were included in equation (3.1). I follow Oster (2015) by assuming the value of the former is 1. I also assume that the maximum R-squared is 0.4. This is the R-squared from a regression of the grade 7 test score with the grade 4 test score included as an additional control. The results of this exercise are shown in panel A of appendix table C.4. In some cases the adjusted coefficients are lower in magnitude compared to the control function approach while in others they are greater in magnitude. In all cases except grade 10 math, the signs of the adjusted coefficients using the Oster (2015) method are identical to those seen using the control function approach.96

The Oster (2015) method can also be used to bound the control function results. The problem is that now it no longer makes sense to assume an equal degree of selection on both observables and unobservables. This is because, in theory, the inclusion of the inverse Mills ratio in equation (3.1) should account for the remaining selection on unobservables. However, because of either functional form assumptions or biases in the instrument, some small amount of bias could remain.
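To make the mechanics of this adjustment concrete, the sketch below implements the commonly used approximation to Oster's bias-adjusted coefficient (the full estimator solves a higher-order polynomial, and Oster's Stata command psacalc is the standard implementation). The function and the illustrative inputs are placeholders rather than the estimates reported in this chapter.

def oster_beta_star(beta_tilde, r_tilde, beta_dot, r_dot, r_max, delta=1.0):
    """Approximate bias-adjusted coefficient from Oster (2015).

    beta_dot,   r_dot   : coefficient and R-squared from the uncontrolled regression
    beta_tilde, r_tilde : coefficient and R-squared from the controlled regression
    r_max               : assumed maximum attainable R-squared
    delta               : relative degree of selection on unobservables vs. observables
    """
    return beta_tilde - delta * (beta_dot - beta_tilde) * (r_max - r_tilde) / (r_tilde - r_dot)

# Purely illustrative inputs (not estimates from this chapter):
beta_star = oster_beta_star(beta_tilde=-0.10, r_tilde=0.25,
                            beta_dot=-0.05, r_dot=0.05, r_max=0.40, delta=1.0)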
In panel B of appendix table C.4, I calculate the bias-adjusted coefficients using the second stage of the control function method and values of 0.05–0.2 for the ratio of the degree of selection on unobservables to the selection on observables. What the results show is that for small ratios of 0.05 or 0.1, the bias-adjusted coefficients are very similar to the control function results. For the higher ratio of 0.2, while the coefficients are much greater in magnitude in some cases, they retain the same qualitative characteristics as the control function results, with improvements seen over time, particularly in ELA.

3.6.4 Results by Gender

A large literature emphasizes differences between boys and girls when learning a foreign language.97 Because of these differences, it stands to reason that FI potentially has much different implications for boys than it does for girls. Furthermore, recall that it was shown above that girls are much more likely than boys to be initially enrolled in the FI program, suggesting parents possess similar prior beliefs about these differences. Conversely, if parents are more reluctant to enrol boys in FI, then it is possible that the boys who are enrolled in the program are drawn from the higher end of the ability distribution or have parents that differ along some unobserved dimension. In order to explore these possibilities, I follow Wooldridge (2010, 2015) by re-running the control function analysis above with an additional interaction term between FI and a dummy variable equal to one if the child is female. The results are shown in table 3.5, where I show the estimated coefficients on FI as well as the interaction term, FI*female, for both the OLS and control function estimation methods, limiting the regressions to my preferred specification, which includes a full set of controls along with FSA fixed effects.

Table 3.5 shows that there do not appear to be large differences between boys and girls in terms of the impact that FI has on student achievement. This is true for both the OLS and control function results. Most of the estimated coefficients on FI*female are close to 0 and relatively precisely estimated. The only outcomes where we observe significant differences for both grades 4 and 7 are the writing scores, with girls performing relatively better in grade 4 (higher by 0.03σ) and relatively worse in grade 7 (lower for girls by 0.04σ). We also see girls performing better on the grade 10 math exam by 0.06σ. In every case, none of these differences is enough to qualitatively change the total impact that FI has on student outcomes. Thus, contrary to what many parents might expect, I find that the causal impact of FI on student achievement does not vary much by gender.

96 The reason the grade 10 math results are unchanged is because the coefficient on early FI enrolment with all controls included is very similar to the coefficient on early FI enrolment with no controls. This is not to say that the controls have no effect on the coefficient, but that there appear to be offsetting effects from the inclusion of all the controls. This is in contrast to the remaining outcomes, where additional controls seem to primarily lead to lower or more negative effects of FI enrolment.
97 For example, see Burman et al. (2008) or Eliot (2012). Note that these differences need not all be biological. Differences could arise because of environmental factors such as the amount of time parents spend reading to their daughters as opposed to their sons (Baker and Milligan, 2012).
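As a rough illustration of how this interaction enters the second stage of the control function sketch given earlier (re-using the hypothetical df and gen_resid defined there), one could write the following. The full Wooldridge (2010, 2015) treatment can also involve interacting the control function term itself with the covariate, which is omitted here for brevity.

second_gender = smf.ols(
    "score_math_g4 ~ FI + FI:female + gen_resid + female + esl + english_home"
    " + C(fsa) + C(year)",
    data=df,
).fit(cov_type="cluster", cov_kwds={"groups": df["schoolK_id"]})
# The coefficient on FI:female is the differential effect of early FI entry for girls.
print(second_gender.params[["FI", "FI:female"]])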
However, as discussed above, this result could be due to parents' preferences with respect to initial selection into the program.

3.6.5 The Impact of Students Enrolling in Private School

The estimated coefficient on FI represents the difference between enrolling in FI and not enrolling in FI. But what does it mean to not enrol in FI? For a majority of households it means enrolling their child in a traditional public school. However, for a non-trivial number of parents another viable option is to enrol their child in private school. This is likely especially true among parents of early FI students since, from table 3.1, we saw that the characteristics of the FI children and their households are closer to those found in private school than to those of households who enrol their child in a traditional public school. All of this raises the question of the extent to which the negative effects of FI found above are the result of an increase in performance among comparison children whose parents chose to enrol them in private school as opposed to FI.

The exact answer to this question requires knowledge of both the causal effect of private school on student outcomes and also the percentage of FI parents who would have otherwise enrolled their child in private school — the estimation of both of which is beyond the scope of this paper. Instead, I examine this question by running two alternative specifications of equations (3.1) and (3.7): one in which I exclude all children who are enrolled in private school in grade K, and another in which I exclude all children in private school during the grade of the test. In addition to testing the robustness of the previous results, these specifications are also interesting for two additional reasons. First, in many administrative datasets, children are not observed if they are enrolled in a private school. This leads to potential issues of censoring and attrition from the sample. In contrast, in my data I observe all children in public and most private schools so long as they remain in the province of British Columbia. Thus, comparing the coefficients in these specifications to the baseline results speaks to the biases that result if students enrolled in private school are not actually observed. Second, it is also possible that children switching from FI into private school are responsible for the catch-up of the FI students observed in the previous sections. This follows from the fact that approximately 15% of children who leave FI after grade 5 are then enrolled in a private school. The specification in which I exclude children in private school in the grade of the test will allow us to see if the catch-up is being driven by these children.

The results for the specification in which I exclude all students who initially enrol in private school are shown in columns 2, 5 and 8 of table 3.6 for grades 4, 7, and 10, respectively. The OLS coefficients for every subject and grade (panel A) show that the coefficients on FI increase after accounting for initial private school enrolment. In grade 4, the impacts of initial FI enrolment on math and reading outcomes are now close to zero and insignificant (as opposed to the negative effects seen in the baseline case). For writing, the coefficient on FI improves from -0.17 to -0.09. Each of the differences in the coefficients is substantively meaningful and statistically significant at the one percent level.
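In case it is useful, the two exclusion samples can be sketched as simple filters on the hypothetical DataFrame used in the earlier sketches; the indicator columns private_gradeK and private_grade4 are placeholders, and the OLS and control function specifications are then simply re-estimated on each subsample.

# Exclude children enrolled in private school in kindergarten ...
sample_no_pvt_K = df[df["private_gradeK"] == 0]
# ... or in the grade of the test (here, grade 4).
sample_no_pvt_g4 = df[df["private_grade4"] == 0]

# Example: the baseline OLS specification re-run on the first restricted sample.
ols_no_pvt_K = smf.ols(
    "score_math_g4 ~ FI + female + esl + english_home + C(fsa) + C(year)",
    data=sample_no_pvt_K,
).fit(cov_type="cluster",
      cov_kwds={"groups": sample_no_pvt_K["schoolK_id"]})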
For grade 7, the estimated coefficients for math and writing go from negative and significant to small, positive and significant at the ten percent level (the differences in the coefficients are significant at the one percent level), while the coefficient on FI for reading outcomes increases by 0.07 to a 0.09σ gain from initial FI enrolment. Finally, in grade 10 we see that the coefficients on FI continue to increase after excluding the children initially enrolled in private school. The differences between these coefficients and the baseline cases are smaller than before, ranging from 0.03σ to 0.05σ, though the differences remain statistically significant at conventional levels.

The results for the second specification, in which I exclude children based on private school enrolment during the grade of the test, are nearly identical to the set of results discussed above. Overall, the OLS results suggest large improvements in the impact that FI has on student outcomes after accounting for private school enrolment. Any negative effect that initial enrolment into the FI program has on student outcomes is gone by grade 7, and by grade 10 these children are greatly out-performing their non-FI peers.

Turning to the control function results in panel B, we continue to see overall increases in the coefficients on FI after excluding initial private school enrolment (columns 2, 5 and 8) relative to the baseline coefficients. In math and reading, the coefficient on FI in grade 4 increases by 0.07 and 0.04, respectively; however, neither of these differences is significant at standard levels. In writing, we see an increase in the coefficient on FI of 0.09, which is quite large and also significant at the ten percent level. For outcomes in grade 7 (column 5), the coefficients on FI improve by 0.07 for math and reading and 0.1 for writing; each of these differences is substantively large and significant at the five percent level. Finally, for outcomes in grade 10, the increases in the coefficients are approximately 0.05 and are not significant at standard levels. Furthermore, as in the OLS case, each of these results is robust to excluding children based on private school enrolment during the grade of the test.

Overall, these results suggest that non-FI students entering private school are driving a part of the estimated negative coefficients on FI. While accounting for private school enrolment did lead to significant improvements in the estimated causal impact of FI, particularly in grades 4 and 7, the main conclusion of the previous sections is qualitatively unchanged. We continue to observe large negative and significant effects from initial enrolment into the FI program on outcomes in grade 4. These impacts decline in magnitude by grade 7 and then decline further by grade 10. Once again, the catch-up is strongest for both reading and writing. Furthermore, there is no evidence that the catch-up phenomenon is being driven by children who initially enrol in FI but then subsequently switch to private school. Finally, these results also demonstrate the value of being able to observe children in private school. If private school is a viable option for both the treated and control groups, it has a significant impact on the estimated counterfactual of interest when performing a program evaluation.

3.7 Conclusion

This paper examined the impact that initial enrolment in the French Immersion program has on future student outcomes.
The main contribution of this paper was to estimate the causal impact of FI programentry while accounting for the dual biases of selection and survivorship bias. I addressed survivorshipbias by looking at the impact of initial enrolment in the FI program instead of program enrolment duringthe grade of the test. I addressed selection bias by using a control function approach along with relativedistance to the nearest FI school within a given neighbourhood as an instrument for FI enrolment.I find that initial FI program entry leads to large declines in student achievement in grade 4 of 0.46σ ,0.37σ ,and 0.35σ for each of the outcomes in math, reading and writing; however, this effect does notappear to last into the future. For outcomes in grade 10, I find that initial FI children are performing0.25σ worse in math, 0.33σ worse in science and no different in terms of English Language Arts. Inaddition, I also find that the results do not differ based on the gender of the child and that non-FI childrenenrolling in private school account for a portion of this negative effect. However, accounting for privateschool enrolment does not qualitatively change any of the results.By using distance as an instrument, much of my main identifying variation comes off how distanceaffects a parents’ probability of enrolling their child in FI. One question that cannot be directly addressedin this paper is whether those parents who are most affected by living closer to an FI school differ fromother parents along some unobserved dimensions. This can potentially explain why the impact of FIfound in this paper is worse than other papers that use random assignment (Steele et al., 2015) orstructural equation modelling (see chapter 2).98 One possibility is that this is a set of parents who areless involved in their child’s education than the average FI parent.99 Many school districts emphasizethat parental involvement is crucial for success in immersion language programs. For example, oneoften-made suggestion is that parents should read to their child in their native language.100 The ideais that this helps to develop overall language and cognitive abilities (Cummins, 1979). In chapter 2, Ishowed that there is a lot of heterogeneity in terms of which children perform well in the FI program.Understanding the mechanisms through which children are able to excel in an immersion program is aninteresting avenue for future research.The outcomes in this paper were limited to short and medium-run results on standardized test scoresand provincial exams up to grade 10. These results by themselves are not enough to say whether parentsshould or should not enrol their child in an FI program. In order to do a full welfare analysis, I needlong-term outcomes of the FI children such as university attended, performance in university and labor98There are several important differences between the setup in chapter 2 and this current chapter. Chapter 2 uses a structuralequation model to examine parents’ program choices (and their preferences) each year and models the impact of current FIenrolment and also explicitly accounts for unobserved FI heterogeneity.99Similarly, it could also be the case that these are parents who are less able to tell if their child is a good fit for FI or not.100On their website, the Vancouver School Board recommends that parents, “Concentrate on enriching your child’s firstlanguage. Read aloud to your child every day in the dominant language of your home. 
This will develop your child’sknowledge and vocabulary, as well as instilling an all important love of language and literature.” Similarly, in an informationsheet for parents, the Surrey School Board says that one way parents can help their children in FI is to, “Read aloud daily in thelanguage of the home and talk about what is being learned at school. Children quickly transfer knowledge between languages”(District Info Sheet – French Immersion, page 1)793.7 Conclusionmarket outcomes. The latter of these is likely the most important because as discussed in the intro-duction, learning a second language can be a significant comparative advantage and also increase one’slabor market opportunities. Given the declines in performance observed in grade 10, it is still entirelyreasonable to think that the overall gains from learning a second language may outweigh any of the costsassociated with these declines in student achievement.803.8 Tables3.8 TablesTable 3.1: Summary StatisticsPanel A: FI and non-FI StudentsEnrolled in FI Not Enrolled in FIMean Std Dev Mean Std Dev DifferenceFemale 0.55 0.50 0.49 0.50 0.06∗∗∗Aboriginal 0.02 0.13 0.03 0.16 -0.01∗∗∗ESL 0.04 0.20 0.31 0.46 -0.26***English Home 0.88 0.33 0.66 0.48 0.22∗∗∗Distance to FI (km) 2.16 1.83 3.1 2.43 -0.92∗∗∗Distance to non-FI (km) 0.94 0.82 0.96 1.04 -0.02∗∗∗Distace to Private (km) 2.50 2.07 2.68 2.36 -0.16∗∗∗Unemp % (DA) 0.06 0.05 0.07 0.05 -0.01∗∗∗% Univ (DA) 0.24 0.13 0.18 0.12 0.05∗∗∗ln(HH Income) (DA) 10.98 1.15 10.92 1.12 0.04∗∗∗ln(HH Value) (DA) 12.48 1.66 12.37 1.79 0.10∗∗∗% Own Home (DA) 0.71 0.22 0.7 0.22 0.01∗∗∗In a Gr4 Private School 0.04 0.20 0.16 0.36 -0.12∗∗∗In a Gr7 Private School 0.06 0.23 0.15 0.35 -0.09∗∗∗Observations 20,468 180,573Panel B: FI, Non-FI and Private School StudentsFI Private Non-FI, publicMean Mean MeanDifference Difference(FI – Private) (FI – Non-FI)Female 0.55 0.50 0.49 0.05∗∗∗ 0.06∗∗∗Aboriginal 0.02 0.00 0.03 0.01∗∗∗ -0.01∗∗∗ESL 0.04 0.14 0.34 -0.10∗∗∗ -0.29∗∗∗English Home 0.88 0.73 0.64 0.15∗∗∗ 0.23∗∗∗Distance to FI (km) 2.16 3.14 3.09 -0.98∗∗∗ -0.93∗∗∗Distance to non-FI (km) 0.94 1.14 0.92 -0.20∗∗∗ 0.01∗∗Distace to Private (km) 2.50 2.75 2.67 -0.25∗∗∗ -0.17∗∗∗Unemp % (DA) 0.06 0.07 0.08 -0.01*** -0.01∗∗∗% Univ (DA) 0.24 0.22 0.18 0.02*** 0.056∗∗∗ln(HH Income) (DA) 10.98 10.98 10.91 0.00 0.06∗∗∗ln(HH Value) (DA) 12.48 12.53 12.35 -0.05∗∗∗ 0.14∗∗∗% Own Home (DA) 0.71 0.71 0.70 -0.01∗∗∗ 0.01∗∗∗In a Gr4 Private School 0.04 0.82 0.03 -0.78∗∗∗ 0.01∗∗∗In a Gr7 Private School 0.06 0.75 0.04 -0.69∗∗∗ 0.02∗∗∗Observations 20,468 28,162 152,411Notes: All summary statistics (except the last 2) are based on those observed at the time the child isobserved in grade K. A child is considered to be enrolled in FI if they are observed in the FI programin kindergarten or grade 1. For the purpose of this exercise only, a child enrolled in a private FI schoolis considered to be in FI, but not private school. (This does not materially affect any of the results.) Alldistance variables are calculated based on the driving distance between the centroid of the child’s homepostal code and the longitude and latitude coordinates of the individual schools. All Dissemination Areavariables (with "DA" in parentheses) are in real 2004 $. % Univ uses the total population 15 years andolder as its denominator in order to make the variable consistent across census years. 
Note that thistable differs from table 2.1 because the sample here exlcudes all children with missing test scores whichincludes all children who entered grade K in 2009 ∗ p < 0.1, ∗∗ p < 0.05, ∗∗∗ p < 0.01.813.8TablesTable 3.2: The Impact of French Immersion on Student OutcomesPanel A: OLS ResultsGrade 4 Grade 7 Grade 10(1) (2) (3) (4) (5) (6) (7) (8) (9)Math Reading Writing Math Reading Writing Math English ScienceFI-0.10∗∗∗ -0.07∗∗∗ -0.17∗∗∗ -0.05∗∗∗ 0.02 -0.06∗∗∗ 0.12∗∗∗ 0.16∗∗∗ 0.09∗∗∗[0.02] [0.01] [0.02] [0.02] [0.02] [0.02] [0.02] [0.01] [0.02]Observations 190870 191665 188478 133492 134556 132312 72679 86126 85675Child Controls Yes Yes Yes Yes Yes Yes Yes Yes YesDA Controls Yes Yes Yes Yes Yes Yes Yes Yes YesFSA Fixed Effects Yes Yes Yes Yes Yes Yes Yes Yes YesP-Value (H0: γt = γ4) N/A N/A N/A 0.001 0.000 0.000 0.000 0.000 N/APanel B: Control Function ResultsGrade 4 Grade 7 Grade 10Math Reading Writing Math Reading Writing Math English Science(1) (2) (3) (4) (5) (6) (7) (8) (9)FI-0.46∗∗∗ -0.37∗∗∗ -0.35∗∗∗ -0.45∗∗∗ -0.25∗∗∗ -0.16∗ -0.25∗ -0.02 -0.33∗∗[0.09] [0.10] [0.09] [0.12] [0.09] [0.09] [0.15] [0.14] [0.14]Observations 190870 191665 188478 133492 134556 132312 72679 86126 85675P-Value (H0: γt = γ4) N/A N/A N/A 0.981 0.086 0.009 0.113 0.002 N/ANotes: Standard errors clustered at the grade K school level are in brackets. In panel B, standard errors are calculated using a block-bootstrappingapproach with 100 draws. All cells display the coefficient on initial FI enrolment, FI, in a regression where the dependent variable varies byoutcome and grade. All regressions include controls at the child and DA levels as well as FSA and year fixed effects. "Child" level controlsinclude gender, aboriginal status, ESL status, special-ed status, a dummy for whether the child speaks English at home, and binary variablesfor month of birth. "DA" controls refer to census variables at the level of the Dissemination Area of the child’s home address. They are theunemployment rate, % with bachelor degree or higher, log(real average household income), log(real home value) and % home-owner. An FSAis a geographic unit based on the first three digits of one’s postal code. All observable characteristics take the values seen while the child wasenrolled in grade K. P-Value (H0: γt = γ4) shows the p-value from a test of whether the given coefficient is equal to the corresponding value ingrade 4. P-values in panel B are calculated using 100 block-bootstrapped draws clustered at the grade K school level. The coefficient on FI forthe grade 10 English exams is compared to the grade 4 reading exam. 
∗ p < 0.1, ∗∗ p < 0.05, ∗∗∗ p < 0.01.823.8TablesTable 3.3: Distance Balancing TestsPanel A: Year Fixed Effects(1) (2) (3) (4) (5) (6) (7) (8) (9)Female Aboriginal ESL English at Home Unemp % % Univ Deg ln(HH Income) ln(Home Value) % Home OwnerDistance-0.002∗ -0.001 0.065∗∗∗ -0.061∗∗∗ 0.004∗∗∗ -0.016∗∗∗ -0.022∗ -0.000 -0.012∗[0.001] [0.001] [0.018] [0.018] [0.002] [0.006] [0.012] [0.035] [0.007]Constant0.499∗∗∗ 0.029∗∗∗ 0.217∗∗∗ 0.733∗∗∗ 0.049∗∗∗ 0.233∗∗∗ 11.047∗∗∗ 12.735∗∗∗ 0.748∗∗∗[0.004] [0.004] [0.028] [0.027] [0.002] [0.014] [0.031] [0.071] [0.015]Observations 198237 198237 198237 198237 198237 198237 198237 198237 198237Panel B: FSA and year Fixed Effects(1) (2) (3) (4) (5) (6) (7) (8) (9)Female Aboriginal ESL English at Home Unemp % % Univ Deg ln(HH Income) ln(Home Value) % Home OwnerDistance-0.001 -0.001 0.018∗∗∗ -0.015∗∗ 0.001 0.001 0.012 0.002 -0.000[0.002] [0.001] [0.007] [0.007] [0.001] [0.002] [0.010] [0.020] [0.005]Constant0.498∗∗∗ 0.030∗∗∗ 0.273∗∗∗ 0.680∗∗∗ 0.053∗∗∗ 0.214∗∗∗ 11.003∗∗∗ 12.728∗∗∗ 0.731∗∗∗[0.004] [0.002] [0.007] [0.007] [0.002] [0.003] [0.018] [0.025] [0.005]Observations 198237 198237 198237 198237 198237 198237 198237 198237 198237Notes: Standard errors clustered at the FSA level in brackets. Distance is defined as the log of the ratio of the distance to the closest FI school and the closest non-FI school. Allregressions are limited to children in grade K and include year fixed effects. All values are based on those observed while the child was enrolled in grade K. ∗ p < 0.1, ∗∗ p < 0.05,∗∗∗ p < 0.01.833.8 TablesTable 3.4: Impact of Distance on Early FI EnrolmentPanel A: All Children(1) (2) (3) (4) (5)Distance-0.051∗∗∗ -0.043∗∗∗ -0.040∗∗∗ -0.034∗∗∗ -0.029∗∗∗[0.005] [0.004] [0.004] [0.003] [0.003]Observations 201041 201041 201041 201041 201041Child Controls No Yes Yes Yes YesDA Controls No No Yes Yes YesFSA Fixed Effects No No No Yes NoCatchment FE No No No No YesPanel B: Children with Grade 7 Test Scores Only(1) (2) (3) (4) (5)Distance-0.046∗∗∗ -0.040∗∗∗ -0.037∗∗∗ -0.031∗∗∗ -0.026∗∗∗[0.005] [0.004] [0.004] [0.003] [0.003]Observations 135312 135312 135312 135312 135312Child Controls No Yes Yes Yes YesDA Controls No No Yes Yes YesFSA Fixed Effects No No No Yes NoCatchment FE No No No No YesNotes: Standard errors clustered at the grade K school level are in brackets. The dependentvariable in each regression is a binary variable equal to 1 if the child enrolled in the early FIprogram and 0 otherwise. "Distance" is defined as the log of the ratio of the distance to theclosest FI school and the closest non-FI school. All independent variables — including distance— take on the values observed while the child was enrolled in grade K. "Catchment" refers toimputed geographic areas in which all children are given priority to attend a specifc neighbour-hood school. See notes in table 3.2 for details about the variables included in "child" and "DA"levels. 
∗ p < 0.1, ∗∗ p < 0.05, ∗∗∗ p < 0.01.843.8TablesTable 3.5: Impact on Student Outcomes by GenderPanel A: OLS ResultsGrade 4 Grade 7 Grade 10(1) (2) (3) (4) (5) (6) (7) (8) (9)Math Reading Writing Math Reading Writing Math English ScienceFI-0.10∗∗∗ -0.06∗∗∗ -0.18∗∗∗ -0.04 0.03∗ -0.04∗ 0.09∗∗∗ 0.17∗∗∗ 0.10∗∗∗[0.02] [0.02] [0.02] [0.02] [0.02] [0.02] [0.02] [0.02] [0.02]FI∗Female -0.01 -0.01 0.02 -0.03 -0.02 -0.04∗∗ 0.05∗ -0.01 -0.03[0.02] [0.02] [0.02] [0.02] [0.02] [0.02] [0.03] [0.02] [0.02]Observations 190870 191665 188478 133492 134556 132312 72679 86126 85675Panel B: Control Function ResultsGrade 4 Grade 7 Grade 10Math Reading Writing Math Reading Writing Math English Science(1) (2) (3) (4) (5) (6) (7) (8) (9)FI-0.45∗∗∗ -0.36∗∗∗ -0.38∗∗∗ -0.44∗∗∗ -0.24∗∗ -0.13 -0.31∗∗ -0.02 -0.32∗∗[0.10] [0.10] [0.09] [0.12] [0.09] [0.09] [0.15] [0.15] [0.15]FI∗Female -0.00 -0.00 0.03∗ -0.02 -0.01 -0.04∗∗ 0.06∗∗∗ -0.00 -0.01[0.02] [0.02] [0.02] [0.02] [0.02] [0.02] [0.02] [0.02] [0.02]Observations 190870 191665 188478 133492 134556 132312 72679 86126 85675Notes: Standard errors clustered at the grade K school level are in brackets. In panel B, standard errors are calculated using a block-bootstrapping approach with 100 draws. This analysis is identical to the one seen in table 3.2, except now I include an interaction termbetween FI and gender. All regressions include the full set of controls described in table 3.2. See notes in table 3.2 for additional details. ∗p < 0.1, ∗∗ p < 0.05, ∗∗∗ p < 0.01.853.8TablesTable 3.6: Impact of FI After Excluding Children Enrolled in Private SchoolPanel A: OLS ResultsGrade 4 Grade 7 Grade 10Baseline ExcludingPvt0 = 1ExcludingPvt4 = 1Baseline ExcludingPvt0 = 1ExcludingPvt7 = 1Baseline ExcludingPvt0 = 1ExcludingPvt10 = 1(1) (2) (3) (4) (5) (6) (7) (8) (9)Math-0.10∗∗∗ -0.02 -0.02 -0.05∗∗∗ 0.03∗ 0.03∗Math0.12∗∗∗ 0.15∗∗∗ 0.14∗∗∗[0.02] [0.02] [0.02] [0.02] [0.02] [0.02] [0.02] [0.02] [0.02]Reading-0.07∗∗∗ 0 -0.01 0.02 0.09∗∗∗ 0.09∗∗∗English0.16∗∗∗ 0.21∗∗∗ 0.20∗∗∗[0.01] [0.01] [0.01] [0.02] [0.01] [0.02] [0.01] [0.01] [0.01]Writing-0.17∗∗∗ -0.09∗∗∗ -0.09∗∗∗ -0.06∗∗∗ 0.03∗ 0.02Science0.09∗∗∗ 0.13∗∗∗ 0.12∗∗∗[0.02] [0.01] [0.01] [0.02] [0.02] [0.02] [0.02] [0.02] [0.02]P-Value (Math) N/A 0.000 0.000 N/A 0.000 0.000 N/A 0.000 0.000P-Value (Reading) N/A 0.000 0.000 N/A 0.000 0.000 N/A 0.000 0.000P-Value (Writing) N/A 0.000 0.000 N/A 0.000 0.000 N/A 0.000 0.000Panel B: Control Function ResultsGrade 4 Grade 7 Grade 10Baseline ExcludingPvt0 = 1ExcludingPvt4 = 1Baseline ExcludingPvt0 = 1ExcludingPvt7 = 1Baseline ExcludingPvt0 = 1ExcludingPvt10 = 1(1) (2) (3) (4) (5) (6) (7) (8) (9)Math-0.46∗∗∗ -0.39∗∗∗ -0.42∗∗∗ -0.45∗∗∗ -0.38∗∗∗ -0.39∗∗∗Math-0.25∗∗ -0.2 -0.18[0.09] [0.09] [0.10] [0.12] [0.11] [0.09] [0.13] [0.13] [0.12]Reading-0.37∗∗∗ -0.33∗∗∗ -0.34∗∗∗ -0.25∗∗∗ -0.18∗∗ -0.18∗∗English-0.02 0.03 -0.01[0.10] [0.09] [0.09] [0.09] [0.09] [0.09] [0.13] [0.13] [0.12]Writing-0.35∗∗∗ -0.26∗∗∗ -0.27∗∗∗ -0.16∗ -0.06 -0.06Science-0.33∗∗ -0.30∗∗ -0.32∗∗∗[0.09] [0.07] [0.07] [0.09] [0.07] [0.07] [0.13] [0.12] [0.11]P-Value (Math) N/A 0.258 0.514 N/A 0.05 0.181 N/A 0.202 0.09P-Value (Reading) N/A 0.324 0.571 N/A 0.037 0.075 N/A 0.207 0.792P-Value (Writing) N/A 0.047 0.078 N/A 0.038 0.04 N/A 0.38 0.866Notes: Standard errors clustered at the grade K school level are in brackets. In panel B, standard errors are calculated using a block-bootstrapping approach with 100 draws. All cells displaythe coefficient on initial FI enrolment in a regression where the dependent variable varies by outcome and grade. 
All regressions include year fixed effects, child controls, DA controls andFSA fixed effects. The regressions in columns 2, 5 and 8 exclude all children who were enrolled in private school during grade K. The regressions in columns 3, 6 and 9 exclude all childrenwho were enrolled in private school during the grade of the test. P-Value represents the p-value from the test that the coefficient on FI after excluding children enrolled in private school isidentical to the coefficient in the baseline model for the exact same subject-grade. All P-values in panel B are calculated using 100 block-bootstrapped draws clustered at the grade K schoollevel. All regressions include the full set of controls described in table 3.2. See notes in table 3.2 for additional details. ∗ p < 0.1, ∗∗ p < 0.05, ∗∗∗ p < 0.01.86Chapter 4: Peer Effects, Heterogeneityand School Choice Policies4.1 IntroductionAs school districts offer additional options to parents for their children’s education, the decisions of par-ents to remove their children from their current school and enrol in another can lead to large changes inthe composition of a student’s peers — which in turn impacts a student’s own performance. Identifyingthe impact of peer effects in a school-choice setting — and in education settings more generally — isa difficult task (see, for example, Manski, 1993 and Angrist, 2014), though many papers have exploredthese issues. This paper expands on the peer effects literature by taking advantage of a unique featureof public education in Canada in which parents have the option of enrolling their child in an immersionlanguage program that begins in grades 4 or 6, depending on the jurisdiction. Formally, this programis known as Middle French Immersion (beginning in grade 4) or Late French Immersion (beginningin grade 6). The children who enrol in these programs are, on average, students with high levels ofachievement and are thus likely drawn from a higher end of the ability distribution. The removal ofthese children, in turn, potentially impacts those students who remain in their current school. In thispaper, I estimate the impact of the students who leave a given school cohort on the remaining studentsusing administrative data from two separate Canadian Provinces — Ontario and British Columbia.The major impediment to identifying the causal impact of the leaving students is that entry into theseimmersion programs is not random. There could be unobserved factors correlated with both a parent’sdecision to enrol their child in the immersion program and a child’s future academic achievement.In addition to a model with school fixed effects (Hoxby, 2000), I address this endogeneity problemusing a control function approach known as the two-stage residual inclusion (TSRI) method, whichrelates to the two-step approaches mainly seen in the health economics literature.101 In the first stage,I predict the percentage of students who leave to enter the immersion program by explicitly modellingparents’ program choices. In the second stage, I run the baseline model with school fixed effects, butI also include as a control the difference between the predicted and actual percentage of children wholeave for the LFI program. The idea is that this residual encompasses unobserved factors potentiallycorrelated with both program choice and changes in student achievement. In the first stage, I instrumentfor program choice using the distance of the child’s school (in grade 3 or 4) to the nearest late immersionschool. 
Since all models include school fixed effects, the main identifying assumption is that deviations from the average change in student achievement in a given school are uncorrelated with deviations in the average distance to locations where the immersion program is available. In other words, I exploit changes in the availability of the immersion program over time and need these to be uncorrelated with changes in student achievement over time. I examine the credibility of these assumptions by conducting balance tests showing that my distance measure is uncorrelated with other observable characteristics including, but not limited to, baseline standardized test scores. While these tests do not prove the instrument is valid, they do enhance its credibility.

101 See, for example, Kessler and McLennan (2000) or Gowrisankaran and Town (2003).

This paper contributes to the large literature in economics about the role of peer effects.102 Specifically, it contributes to a subset of the literature which exploits natural experiments that lead to the (exogenous) introduction of students to — or removal of students from — a given cohort. Examples include the impact of desegregation and other policies related to the racial makeup of students (Angrist and Lang, 2004, Hoxby and Weingarth, 2005, Billings et al., 2014), displacement of students from Hurricane Katrina (Imberman et al., 2012) and immigration (Lavy et al., 2011).103 Despite the random nature of these changes to class composition, one of the concerns about using exogenous shocks is that they also might be accompanied by other policies designed to mitigate any (negative) effects of the event which caused the change in peers in the first place. For example, extra funding might be provided in order to help school districts cope with a sudden influx of students. This makes it difficult to isolate the pure peer effects. Furthermore, if the setting is an extreme one, it is not clear how much of the findings apply in terms of general school-district policies. The main contribution of this paper is that it directly avoids both of these concerns. The setting I examine is one in which all changes occur "naturally" within a given cohort; that is, there is no additional response on the part of the school districts. As a result, I am able to isolate the impact of peer composition while reducing the concern with regard to external validity.

A second contribution of this paper relates to heterogeneity and non-linearity of peer effects. Many papers have found that the impact of a peer on a student's own performance depends on the interaction of both the ability of the peer and the student's own ability.104 For example, Imberman et al. (2012) find that high-performing students entering a cohort have positive effects on the remaining students while low-performing students have negative effects; furthermore, these impacts also differ by the baseline ability of the children already in the existing cohort. In this paper, I am able to explore these non-linearities as well by exploiting variation in the quality of both the students who enrol in the immersion programs and also the children who remain behind.

Finally, this paper contributes to the literature examining general equilibrium effects of choice policies (see, for example, Epple and Romano, 1998, Nechyba, 2000, Avery and Pathak, 2015 and Altonji et al., 2015). This paper emphasizes how even small choice programs can affect many students in a school district by altering the composition of their peers.
In this sense it is most similar in spirit to Altonji et al.102See Sacerdote (2015) for a recent review of this literature.103The Hoxby and Weingarth (2005) and Imberman et al. (2012) papers are the studies that are closest to this one in terms ofboth estimation strategy and types of analyses run.104See, for example, Antecol et al. (2016), Burke and Sass (2013), Imberman et al. (2012), Gould et al. (2009) and Hoxbyand Weingarth (2005).884.1 Introduction(2015) who examine the impact of the cream-skimming effect of a voucher program (and private schoolsin general) on the performance of the remaining public school children. In contrast, in this paper I focuson changes in enrolment caused by a specific publically-administered choice program.105 As discussedabove, this reduces the concern that exists when changes in enrolment are caused by out-of-districtschooling options. For example, private schools might exert competitive pressures on public schoolsand thus combining peer effects with the impact of school competition. In addition, another differenceis that in my paper there will be much more heterogeneity in the quality of the leaving students allowingme to explore heterogeneous responses in the impact on the remaining children.Looking in the province of Ontario, I find that a 10pp increase in the percentage of children whoenter the middle immersion program causes a decrease in math scores of 0.06 test score standard devi-ations (σ ), a decrease in reading scores of 0.04σ and a decrease in writing scores of 0.04σ . In BritishColumbia, I find that a 10pp increase in the percentage of children who enter the late immersion pro-grams causes a decrease in math scores of 0.12σ , a decrease in reading scores of 0.1σ and a decrease inwriting scores of 0.05σ . In each subject, I find smaller overall effects in Ontario as opposed to BritishColumbia; though because of how the estimation samples are constructed, the Ontario sample has animmersion enrolment rate (9%) approximately double that of British Columbia (4%).106 Based on thesepercentages, the impact of going from no LFI leavers to the mean LFI rates in both jurisdictions is adecrease in achievement of 0.01σ -0.04σ across all three subjects. However, further analysis shows thatthese overall effects are the result of substantial heterogeneity in the impact that different students leav-ing for the immersion program have on the remaining students. For example, I find that a 10pp increasein the fraction of low-performing students leaving a school cohort to enter the immersion program leadsto increases in achievement for the remaining students of 0.1σ −0.14σ in math, 0.04σ −0.1σ in read-ing, and 0.06σ − 0.19σ in writing. Breaking down these effects even further shows that the largesteffect is on remaining students in the upper end of the ability distribution. Conversely, I find that high-performing leavers cause large reductions in the achievement of the remaining students of 0.15σ–0.19σin math, 0.1σ − 0.13σ in reading, and 0.06σ–0.09σ in writing. The impact of average performingleavers on the remaining students varies by jurisdiction and subject, with negative impacts in Ontariofor each subject while in British Columbia I find negative effects on reading outcomes but positive ef-fects for writing outcomes.107 The final exercise I run is to use the immersion program as an instrumentfor looking at the total effect of a change in the composition of a student’s peers. 
I find that replacing10pp of one’s peers from low-performing students to average-performing students leads to test scoregains of 0.05σ – 0.07σ in Ontario and 0.03σ −0.04σ in BC. Replacing 10pp of low-performing peerswith high-performing students leads to test score gains of 0.06σ – 0.12σ in Ontario and 0.06σ −0.08σin BC.The remainder of this chapter is organized as follows. Section 4.2 gives an overview of the Late105Altonji et al. (2015) do motivate their paper by discussing how choice programs offered within a public school systemcan potentially affect the remaining children through similar mechanisms.106Furthermore, the Ontario estimates have much higher levels of precision than the estimates from British Columbia.107Much of this difference likely has to do with how “average” performing students are defined in the two jurisdictions.These differences arise because of differences in the scores provided, as discussed in greater detail below.894.2 Settingand Middle Immersion program. Section 4.3 describes the data and estimation samples. Section 4.4describes the empirical approach while section 4.5 presents and discusses the results. Section 4.6 con-cludes.4.2 SettingThe French Immersion program is a school-choice program offered by many public school districtsacross Canada. The program is an immersion language program — meaning that most of the students’instruction is in their non-native language, which in this case is French. In practice, there are many vari-ations of the program offered by school districts. The most common type is the Early French Immersion(EFI) program seen in chapters 2 and 3, in which initial enrolment takes place in kindergarten or grade1. In addition, school districts may also offer Middle French Immersion (hereafter, ‘MFI’, with entryin grades 4 or 5) or Late French Immersion (hereafter, ‘LFI’, with entry in grades 6 or 7). Typically, aschool district will offer one of the MFI or LFI program; however, in rare cases both are offered. As de-scribed in section 4.3 below, the data used in this paper are separate administrative datasets on studentsin select school districts in the Canadian province of British Columbia (BC) as well as the universe ofstudents in the province of Ontario. From examining school-district websites, I found that none of theBC school districts examined offer the MFI program, only the LFI program and in a couple of casesan additional French program known as Intensive French which is discussed further below. Meanwhile,in Ontario, a majority of school districts only offer the MFI program, as well as the Extended Frenchprogram. Finally, although this chapter takes advantage of both the LFI and MFI programs, throughoutthis chapter I simplify notation by often using the acronym LFI to refer to both types of programs, unlessstated otherwise, as many of the institutional features are similar for both programs as I describe below.The LFI program is designed for children in households in which French is not the language spokenat home; however, it is expected for the children to have some knowledge of French as Canadian ele-mentary school children enrolled in the traditional public school system are taught basic French as partof the curriculum. The program usually lasts until the start of secondary school; at this point, studentshave the option of continuing on in the secondary immersion program (joining the Early French Immer-sion students) or switching to alternative secondary schooling options (e.g. 
traditional high schools orprivate schools).108 The curriculum taught to students in the program mirrors that taught in the tradi-tional public school system; the major difference is the language of instruction. The exact distributionof classroom time in English and French varies by school district, but most share a similar pattern. Forexample, the Vancouver School Board requires that their LFI program be 100% French in grade 6. Ingrade 7 children are taught English Language Arts in English and the remainder of subjects in French.109Similarly, the MFI program in Toronto is 100% French in grades 4 and 5 with English introduced ingrade 6.108Alternatively, some school districts have a set-up such that the MFI program lasts until the start of middle school (grade 6or 7) at which point the children join in with the other EFI children.109Source: https://www.vsb.bc.ca/programs/late-french-immersion.904.3 DataThere is generally no screening process for accepting students into an LFI program. However, someschool districts do emphasize that the program can be quite challenging. For example, a presentation inthe school district of North Vancouver states that LFI is an, “intensive and challenging program...[an]excellent option for motivated students who are willing to put themselves out of their comfort zone”and that it would be helpful for the students who possess “good work habits and a positive attitude”(North Vancouver School District, 2016, slide 6). Therefore, while there is no formal screening processin place, all of these factors could still lead to overall positive selection into these programs.In most of Ontario and the school districts of Vancouver and Surrey in BC, there is a separate pro-gram that can be viewed as a less extreme version of MFI or LFI known as Extended French (hereafter‘EF’, primarily in Ontario) or Intensive French (found in BC). Extended French begins around grades4 (Ontario) or 6 (BC) and continues on until the start of secondary school. Unlike MFI, which startat 100% French in the first year, Extended French is a bilingual program — meaning that 50% of theclass is conducted in French and the remaining 50% in English.110 Unfortunately, there are few officialEF statistics available for Ontario, but data from individual school boards suggest that in some schooldistricts it is much more popular than MFI. For example, in the 2015/2016 school year, the TorontoDistrict School Board had 28 schools in which the EF program was offered, but only 3 which offeredthe MFI program. This, in turn, causes measurement issues particularly in the Ontario data with regardsto identifying MFI and LFI students. Ideally, in Ontario I would want to identify both the immersionand Extended French students. I discuss this issue in greater detail in section 4.3 below when I definethe estimation samples.1114.3 DataThis study uses two separate confidential administrative panel datasets from two large provinces inCanada — Ontario and British Columbia. Below, I discuss these two datasets in greater detail andoutline the methodology used to construct the main estimation samples.4.3.1 Data from OntarioThe data from Ontario comes from confidential administrative data obtained from the Education Qualityand Accountability Office of Ontario and accessed through the PEDAL data laboratories. The dataconsists of all standardized-test takers in grades 3 and 6 for the entire province of Ontario between2004 and 2012 inclusive.112 Informally, the standardized tests in Ontario are known as the “EQAO”exams. 
The EQAO exams are a series of tests in math, reading comprehension and writing administered to all Ontario students in publically funded schools in grades 3 and 6. These are low-stakes examinations; their results have no direct implications for student advancement, teacher evaluations or school funding. For every subject, a child who takes the EQAO is awarded a score from the set {0,1,2,3,4}, with 4 being the highest possible score; a score of 0 is referred to in the data as "not enough for level 1". I standardize all test scores at the year-grade-subject level to be mean 0 and variance 1.113 The EQAO tests are mandatory for all children enrolled in a publically funded school; therefore, the data consists primarily of children enrolled in public and Catholic school boards.114

110 In BC, the Intensive French program is often set up such that classroom instruction is 80% French in the first half of the school year and 20% in the second half.
111 This will not be a large concern in the BC data because of the size of the Intensive French program and, as described in section 2.3.2, I am better able to identify LFI schools.
112 While the test score data goes back several years, it is only more recently that student identifiers were added to the data, allowing a researcher to follow a child over time.

The data contains an identifier if a student is enrolled in an immersion language program, but it does not distinguish between Early French Immersion and Middle French Immersion. I define a child to be enrolled in the MFI program if they are observed to be enrolled in a French Immersion program in grade 6, but not in grade 3.115 One large concern about the Ontario data is the issue of Extended French. While the data identifies students enrolled in early or middle immersion programs, it does not identify students in Extended French. Thus, it is possible I am severely understating the percentage of children who are leaving a cohort to pursue a French as a Second Language (FSL) intensive education program. In order to determine the extent to which this is a concern, I will run several robustness checks on the main results looking at all non-MFI children who leave a given school. The results suggest that EF students are not a significant omitted variable. One possible explanation is that program preference might be very local; that is, parents who live near MFI schools enrol their child in those schools, and similarly for parents who live near Extended French schools. In my main specification of interest, I exclude all schools which are never observed to have students in grade 3 subsequently enrol in the LFI program.

Since the primary interest in this paper is the impact on the remaining students from enrolment in the MFI program, I exclude all school-board years with fewer than 5 MFI students in a given year. This restriction causes 270 board-years and 60% of students to be dropped from the sample.116 In addition, since the MFI designation needs to be imputed from the data, there is a concern about measurement error in how MFI students are identified. In order to limit the extent to which this is the case, I check the number of MFI students enrolled in a given school. If I find that the total number is less than five, then I flag these children and remove their (and the school's) MFI designation. The idea is that I want to ensure that my estimate of the number of MFI leavers is a conservative one.117 Less than 1% of MFI children are affected by this requirement and all of my results are robust to different thresholds.
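To fix ideas, the imputation and the first two sample screens described above can be sketched roughly as follows; the DataFrame wide and all of its column names (fi_g3, fi_g6, eqao_math_g6, board, year_g6) are hypothetical stand-ins, not the variable names used in the confidential data.

# wide: pandas DataFrame, one row per student (hypothetical columns).
# Standardize raw EQAO scores to mean 0, variance 1 within year (shown here for
# one grade-subject; the text does this for every year-grade-subject cell).
wide["z_math_g6"] = (wide.groupby("year_g6")["eqao_math_g6"]
                         .transform(lambda s: (s - s.mean()) / s.std()))

# Impute MFI enrolment: in immersion in grade 6 but not in grade 3.
wide["mfi"] = ((wide["fi_g6"] == 1) & (wide["fi_g3"] == 0)).astype(int)

# Drop school-board years with fewer than five MFI students.
mfi_in_board_year = wide.groupby(["board", "year_g6"])["mfi"].transform("sum")
wide = wide[mfi_in_board_year >= 5].copy()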
Finally, I also exclude school years in which more than 75% of children enrolled in the MFI program. I chose this threshold in order to avoid potential cases where a negative shock to a given cohort forces many parents to choose alternative schooling and the MFI program is a convenient alternative. In total, this restriction affects less than 1% of the sample, and all results are unchanged if these children are included in the estimating equations. The final sample contains 243,652 children, 13,609 of whom are labelled as being enrolled in the MFI program. Appendix A contains additional details on how the sample is constructed.

113 This is to make the results comparable with the standardized test scores in the BC data.
114 In Ontario, Catholic school boards are publically funded — and regulated — entities.
115 All children enrolled in EFI in grade 3 are dropped from the sample.
116 Over half of these students come from school boards with no MFI students and 75% come from school boards with one or fewer MFI students.
117 All of my results are robust to an alternative condition in which I allowed schools to have fewer than 5 MFI students as long as there were other early immersion students present in the school. This condition accounts for the possibility that some students joined in with the early French Immersion program.

Even with the restriction that a school must have at least 5 MFI students to be labelled an MFI school, there is still evidence of some measurement error taking place. Many schools are observed to have MFI students for only 1 out of a possible 6 years. While there are legitimate reasons why this could be the case, for example if a school district tried out offering an MFI program but demand was not as high as anticipated (which would also be consistent with the low enrolment rates observed), there were several cases where I was able to find historical information and see whether or not a school offered the MFI program. What I find is that there are several cases in which students enrolled in a school labelled MFI are actually enrolled in a school which offered the Extended French program. As we will see in section 4.3.3 below, most of these cases occur in the first two grade 6 years observed in the data — 2007 and 2008. One potential solution to this issue is to alter the designations of students in schools I am able to identify as EF. However, since this would involve manually changing a non-random subset of these schools and the data, it is not an appealing option. Furthermore, as discussed above, EF students are a key omitted variable from the analysis. It is primarily for these reasons that I decided to leave these children in the main estimation sample with their MFI designation intact. However, in order to make sure these students are not driving the estimated results, I construct and present select results using an alternative estimation sample with a stricter definition of MFI enrolment. In this alternative sample, I restrict the definition of an MFI school to those schools observed to have at least five MFI students in at least two different years. As in the main sample, if I find that this condition is not satisfied then I flag the MFI children enrolled in these schools and remove their (and the school's) MFI designation.
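Both school-level screens (the main five-student rule and the stricter five-students-in-two-years definition) can be sketched in the same hypothetical notation used above.

# Main definition: strip the MFI designation in schools with fewer than five
# imputed MFI students in a given grade 6 year.
mfi_in_school_year = wide.groupby(["school_g6", "year_g6"])["mfi"].transform("sum")
wide.loc[mfi_in_school_year < 5, "mfi"] = 0

# Stricter alternative: an MFI school must have at least five MFI students
# in at least two different years.
years_with_5plus = (wide.groupby(["school_g6", "year_g6"])["mfi"].sum()
                        .ge(5)
                        .groupby(level="school_g6").sum())
strict_schools = set(years_with_5plus[years_with_5plus >= 2].index)
wide["mfi_strict"] = wide["mfi"].where(wide["school_g6"].isin(strict_schools), 0)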
As we will see, where I present the results using both samples, many of these results are both qualitatively and quantitatively similar regardless of the MFI definition used.

4.3.2 Data from British Columbia

The British Columbia data is nearly identical to that described in chapters 2 and 3, and thus I discuss it in much less detail. The main difference between this chapter and the previous two is that I now limit the sample to children observed in the greater Vancouver area in grade 4 as opposed to grade K. This is done for two reasons: to make the results as comparable as possible to the data from Ontario, and because grade 4 is the year of the first set of standardized tests.[118] As in the Ontario data, I further limit the sample to children observed in both grades 4 and 7 and to those children with non-missing observable characteristics in grade 4. In addition, all children enrolled in the early French Immersion program in grade 4 or in Francophone schools are dropped from the sample.

[118] Since my main estimation sample of interest involves limiting the analysis to children observed in the same school in both grades 4 and 7, this effectively requires me to drop all children in the school districts in which secondary school begins in grade 9 and most elementary schools only go up to grade 5. Putting these districts back in for the estimation does not qualitatively affect any of the results.

The BC data also does not distinguish between the different types of immersion programs; therefore, I need to impute whether a child enrolled in an LFI program. A student is defined to be enrolled in an LFI program if they are enrolled in a French Immersion program in grade 6, but not in grade 4.[119] Furthermore, as in the Ontario data, if the total number of LFI students enrolled in a given school (in grade 6) is fewer than five, I flag these children and remove their LFI designation. Approximately 3% of LFI students are affected by this change.

[119] This is one area where I deviate from the definition used in the Ontario data. In the Ontario data, I used a child's immersion enrolment status during the grades of the tests because those are the only two years I observe. In contrast, since I observe all years in the BC data, I can define LFI enrolment based on the first grade in which it becomes available (grade 6) as opposed to the grade of the test (grade 7). I chose this definition because I want to focus on modelling initial enrolment into the LFI program as opposed to the decision to enrol and subsequently remain. In total, 9% of children who enrol in LFI in grade 6 are no longer enrolled in grade 7.

In the case of British Columbia, I was able to obtain a list of schools which offered the LFI program during the sample years; therefore, another restriction I impose is that an LFI student must be enrolled in a school that was indicated as offering LFI. Furthermore, this external list of LFI schools also allows me to test my definition of LFI enrolment.[120] In the BC data, approximately 96% of children labelled LFI (prior to the sample cuts described above) are in a school that was verified to offer the LFI program. Furthermore, my definition of an LFI school is able to identify 94% of all the LFI schools in the main school districts that, according to the external list, offered the program during the sample period.[121] Therefore, my definitions appear to do a good job of identifying LFI students and LFI schools in BC. In total, the data contains 91,498 students, 3,382 of whom are labelled as being enrolled in an LFI program. Appendix A contains additional details on how the sample is constructed.[122]

[120] All of my results are robust to excluding this restriction.
[121] It is not the case that the remaining 6% are missing because they all have fewer than five LFI students enrolled. Accounting for these children only raises the percentage of schools matched to 96%. Alternate explanations for why some schools listed as LFI in my external list have no observed LFI students in my data include measurement error in the external list, or that the external list also includes Extended French schools.
[122] An additional change I make in order to make the analyses more comparable is that I no longer drop those children with a special education designation. This is because, unlike the BC data, the Ontario data does not say why a child was given a special education designation. However, since I limit the sample in both datasets to children with all three test scores, many special-education children will be dropped from both samples.

4.3.3 Summary Statistics

Summary Statistics at the School Level

In section 4.3.2 above, I discussed how I need to use the data in order to identify whether a school offers the MFI or LFI program. Appendix table D.1 shows the estimated total number of MFI and LFI schools across each of the sample years for each jurisdiction. First, focussing on British Columbia, we see that the estimated number of schools offering the LFI program ranges from 17 in 2004 to 20 in 2011, while hitting a high of 25 in 2007.
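Counts of this kind, along with the year-to-year entries and exits discussed next, can be tabulated directly from the imputed flags. The following sketch assumes the hypothetical columns used above (an imputed program flag that already embeds the five-student threshold and a grade 6 school identifier); it is meant to illustrate the bookkeeping behind appendix table D.1, not to reproduce it.

```python
import pandas as pd

def program_school_counts(df, flag_col="lfi", school_col="school_id_gr6"):
    """Per-year count of imputed program schools and the number of schools
    entering or leaving that set from one year to the next (illustrative)."""
    by_year = (df[df[flag_col]]
               .groupby("year")[school_col]
               .apply(lambda s: set(s.unique())))
    rows, prev = [], None
    for year, schools in by_year.sort_index().items():
        rows.append({
            "year": year,
            "n_schools": len(schools),
            "introduced": len(schools - prev) if prev is not None else None,
            "removed": len(prev - schools) if prev is not None else None,
        })
        prev = schools
    return pd.DataFrame(rows)
```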
Furthermore, in each year there are both LFI schools introduced and LFI schools removed from the choice set available to parents; thus, there will exist within grade-4-school variation in the distance to the nearest LFI school.[123] Next, I turn to the estimated number of MFI schools in Ontario, which are also shown in appendix table D.1. Here, we see the extent of the issue I first described in section 4.3.2, where a lot of schools in the first two years of the data are labelled as MFI for one year only. From appendix table D.1 we see that using the main MFI definition of at least five students implies 134 MFI schools in 2007, 94 in 2008 (with 56 of the 2007 schools no longer offering MFI in 2008) and 79 in 2009, before the count slowly rises back up to 95 by 2012. At the same time, appendix table D.1 also shows how the more conservative MFI school definition of at least five students in two different years leads to a more stable range of MFI schools between 75 and 85. The biggest differences are seen in the number of schools that no longer offer the MFI program. Under the alternative conservative definition of MFI enrolment, only three schools that offered MFI in 2007 no longer do so in 2008.[124] As discussed above, in this paper I will present select tables using both samples, and we will see that both yield similar estimates, emphasizing the robustness of the results.[125]

[123] While the total number of changes in the available LFI schools might appear small at first, note that even small changes potentially affect students enrolled in many different schools by altering the distance to the nearest LFI program.
[124] Similarly, the number of schools labelled MFI in 2008 but not 2009 declines from 27 in the main sample to 14 in the alternative sample.
[125] In particular, I present all of the TSRI results using both definitions. All other results are based on the main definition of at least 5 students in a given year, unless stated otherwise.

Table 4.1 looks at school-level averages related to LFI entry.
On average, after all sample cuts have been made (the "Whole Sample" column), the average percentage of children who leave a school to enrol in the LFI program is 5% in Ontario and 4% in BC.[126] Excluding from the sample those schools which are never observed to have a student leave for the LFI program — as is done in my preferred specification (the "Regression Sample" columns) — increases these percentages to 10% and 5%, respectively. Table 4.1 also shows the distribution of the fraction of children in a given school year who enrol in the LFI program. In a majority of cases, this fraction is between 0 and 10%, although we do also see some cases in which over half of a given school cohort has left to enrol in LFI.

[126] Without the sample cuts, the averages are approximately 2.5% in Ontario and 3.3% in BC. The percentages increase in Ontario because of the restriction that a board have at least 5 LFI students. In contrast, the data in BC is already limited to large urban school districts.

Table 4.2 examines the baseline achievement of the LFI leavers in grade 3 (Ontario) and grade 4 (BC). Here, I divide all children into three groups by baseline achievement. In BC, the groups are based on the tercile of the child's performance.[127] In Ontario, the groups are based on the (unstandardized) test scores of students in grade 3; that is, on the 0–4 scale. Group 1 (hereafter the "low performing" group) consists of students with scores of 2 or lower. Group 2 (hereafter the "average performing" group) consists of students with a score of 3. Group 3 (hereafter the "high performing" group) consists of students with perfect scores of 4. Looking at the whole sample in Ontario, we see that, on average across all subjects, 70% of children who enrol in the middle immersion program are in the average performing group, while 15% of children are in each of the low and high performing groups. This is in contrast to the population distributions; for example, the population distribution of grade 3 math scores is such that 28 percent score 2 or lower, 60 percent score at level 3 and 12 percent score at level 4.[128] Thus, we see that low-performing students are underrepresented in MFI while average and high-performing students are overrepresented. These differences are even more stark in the "regression sample" columns. Similarly, in BC we see that between 40 and 50 percent of children who enrol in LFI are in the top performing tercile. In summary, what the results in table 4.2 show is that while LFI students are drawn from a higher end of the ability distribution, there is heterogeneity in terms of the quality of the LFI students.

[127] Note that since the writing scores in BC are also on an integer scale, it is not possible to split the sample into three equally sized achievement groups. The sizes of the three groups are 30%, 40% and 30% for the low, average and high performers, respectively.
[128] For reading, the corresponding grade 3 percentages are 33%, 57% and 7%, respectively, while for writing, the percentages are 31%, 63% and 6%.

While table 4.2 shows heterogeneity across schools, it tells us nothing about how much heterogeneity there is within schools. For example, it could be that in some schools only high-performing children leave for LFI while in other schools only low-performing children leave. In order to determine how much within-school variation there is, I examine the fraction of LFI leavers within each school at each achievement level, as sketched below.
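A minimal sketch of this within-school tabulation, which is the logic behind table 4.3, is given below. It assumes a student-level table with hypothetical columns for the imputed leaver flag and the baseline achievement bin, restricts attention to school-years with at least two leavers, and reports how many of the three bins contain at least one leaver.

```python
def within_school_leaver_mix(df, flag_col="lfi", bin_col="baseline_bin"):
    """For each school-year with >= 2 imputed leavers, count how many of the
    three baseline-achievement bins contain at least one leaver, then return
    the share of school-years with leavers in 1, 2 or 3 bins."""
    leavers = df[df[flag_col]]
    per_school = (leavers.groupby(["school_id", "year"])
                  .agg(n_leavers=("student_id", "size"),
                       n_bins=(bin_col, "nunique")))
    per_school = per_school[per_school["n_leavers"] >= 2]
    return per_school["n_bins"].value_counts(normalize=True).sort_index()
```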
These results are shown in table 4.3, which is limited to school-years in which at least two students left for LFI. Columns 1 and 4 of table 4.3 show the percentage of schools in which at least one student in each of the three achievement bins defined above in table 4.2 left for LFI in Ontario and BC, respectively. Columns 2 and 5 show the percentage of schools with LFI leavers in two of the three achievement bins, while columns 3 and 6 show the percentage of schools in which all of the students who left for LFI were in the same achievement grouping. For example, based on math scores in Ontario, we see that the percentage of schools in each of these three categories is 26% (MFI leavers in all achievement bins), 53% (MFI leavers in two achievement bins) and 21% (all MFI leavers have similar baseline achievement scores). Similar results are seen in the remaining subjects as well for Ontario. For BC, the corresponding percentages in terms of math outcomes in columns 4, 5 and 6 are 27%, 46% and 27%, respectively. Similar results are seen in terms of grade 4 reading scores, while for grade 4 writing scores we see fewer schools with LFI leavers in all three terciles and more with leavers in only two terciles. Furthermore, in appendix table D.2, I show that there are differences in the quality of the LFI leavers when looking at the same school over time. Appendix table D.2 is a transition matrix comparing the distribution of achievement of LFI leavers in a school in years t and t+1. For example, in Ontario schools where the LFI leavers all score at the same math level, in the following year only 27% of these schools had leavers that were once again all in the same achievement category. More generally, appendix table D.2 shows a lot of variation in the distribution of LFI leavers within a given school from one year to the next. In summary, these results show that there is a lot of within-school variation in the quality of the LFI leavers, and that it is not the case that the students who leave a given school to enter the LFI program are all of similar ability.

The final set of results presented in this subsection examines differences between grade 3 or 4 schools that have children subsequently enrol in the LFI program and those that do not. The motivation is to see whether there are specific school characteristics that influence LFI enrolment. For example, parents of students in cohorts with many low-performing peers might view the LFI program as a way to increase the quality of their child's peers. These questions are examined in appendix table D.3, which shows the characteristics of those school cohorts in which a child did (columns 1 and 2) and did not (columns 3 and 4) enrol in the LFI program. However, comparing columns 1 and 3 shows no significant differences between these two sets of schools. School cohorts with students who subsequently enrol in LFI do not disproportionately have more lower or higher performing students, nor do they differ along other dimensions, such as percentage female, percentage ESL or percentage special-education.

Statistics at the Student Level

All of the results presented so far have been at the school level.
In this section, I present results at the student level, examining how LFI children differ from the remaining population and what factors go into a student's LFI enrolment choice. Appendix table D.4 examines how LFI students differ from the remaining students along several observable characteristics. LFI students are 10pp more likely to be female, less likely to have English as a Second Language (in BC), less likely to be labelled "exceptional" (in Ontario; these are primarily students with special needs) and much more likely to attend a grade 3 or 4 school that is within 2km of a school that offers LFI, with differences of 50–60pp in Ontario and 30pp in BC.[129]

[129] Some of the differences across jurisdictions are likely due to how the programs and labels are assigned. From appendix table D.4, note that Ontario has a high percentage of students labelled exceptional (special needs) while BC has a higher percentage of students with an ESL label.

Similarly, in table 4.4, I present the marginal effects from a logistic regression of LFI enrolment onto a set of controls based on when the child was enrolled in grade 3 (Ontario) or 4 (BC). For each specification, I present the results both without school fixed effects (the odd-numbered columns) and with school fixed effects (the even-numbered columns). Furthermore, columns 1 and 2 are estimated using the main sample from Ontario while columns 3 and 4 are estimated on the alternative sample with the more conservative definition of MFI enrolment. For each specification, table 4.4 shows that girls are 2–3pp more likely to enrol in an LFI program. Note that while the marginal effects in Ontario are generally larger, the way the estimation samples are constructed implies that the LFI enrolment rate in Ontario (10%) is nearly twice that in BC (5%). In addition, consistent with appendix table D.4, table 4.4 also shows that being labelled as ESL in BC decreases the probability of LFI enrolment by 4pp, while a child indicated as exceptional in Ontario is 10–16pp less likely to enrol in LFI. Furthermore, looking at the impact of achievement, we see that a 1σ increase in the baseline level of achievement in math, reading and writing leads to increases in the probability of LFI enrolment of 2–3pp for each subject in Ontario and 1pp for each subject in BC. All of these results are significant at the one percent level. Furthermore, the fact that these effects remain even after including school fixed effects suggests that it is not just absolute scores that matter, but also children doing better than the average child in their school. Finally, the last parameter I want to discuss is the impact of attending a grade 3 or 4 school that is within 2km of an LFI school. From columns 2 and 6, we see that — even with school fixed effects — attending a school close to an LFI school raises the probability of attending an LFI school by 14pp and 2.4pp in Ontario and BC, respectively, with both coefficients significant at the one percent level. While the result from Ontario is quite high, it is possibly being driven by the large number of MFI schools in the first two years. In columns 3 and 4, I re-run the regression using the sample with the more conservative definition of MFI students and schools. Even in this specification with far fewer changes in the number of available MFI schools, we continue to see a positive and significant impact of being within 2km of an LFI school of 10pp (column 4). In summary, these results continue to show that high-achieving students are more likely to enrol in the LFI program, while other factors, such as gender and the distance to the nearest LFI school, matter as well.
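A minimal sketch of the kind of enrolment regression reported in table 4.4 is shown below: a logistic regression of the imputed LFI indicator on baseline characteristics, baseline scores and the within-2km indicator, with school fixed effects entered as dummies, summarized through average marginal effects. Column names are hypothetical and the control set is abbreviated; with many school dummies a specification like this can be slow to fit, so it should be read as illustrating the structure of the regression rather than reproducing the table.

```python
import statsmodels.formula.api as smf

def lfi_enrolment_logit(df):
    """Logit of imputed LFI enrolment on grade 3/4 controls with school and
    year fixed effects; returns average marginal effects (illustrative)."""
    model = smf.logit(
        "lfi ~ female + esl + math_z + read_z + write_z"
        " + near_lfi_2km + C(school_id) + C(year)",
        data=df)
    res = model.fit(disp=False)
    return res.get_margeff(at="overall").summary()
```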
4.4 Empirical Strategy

4.4.1 Within School Variation

I begin by calculating the impact of students leaving for the LFI program on the remaining students. For the analyses given in subsections 4.4.1–4.4.3, I primarily follow the methodology used in Imberman et al. (2012). The main specification of interest is given by

Y_{is(t+3)(g+3)} = \alpha + X_{istg}'\beta + f(Y_{istg}) + \gamma\,\%LFI_{istg} + \delta_t + \delta_s + \varepsilon_{is(t+3)(g+3)}    (4.1)

where Y_{istg} is the value of outcome Y — a child's standardized test score in math, reading or writing — for student i in grade g in year t who attended school s during grade g. Note that equation (4.1) controls for the child's baseline test scores in grade g; thus, we are primarily interested in changes in achievement. For the Ontario data, f(Y_{istg}) takes the form of year-score fixed effects, taking advantage of the fact that students can only earn one of five possible scores.[130] In BC, f(Y_{istg}) takes the form of a third-order polynomial in Y_{istg}.[131] The value of g depends upon the jurisdiction being examined and is based on the grade of the standardized tests; for Ontario we have g = 3 while for BC we have g = 4. X is a set of observable characteristics, the values of which are based on those observed at the time the child was in grade g. %LFI_{istg} is the percentage of children originally enrolled in school s with student i in grade g who are observed to be enrolled in the LFI program in grade (g+3). The main parameter of interest of this paper is given by γ, representing the impact on future test scores from a 100pp increase in LFI leavers. Finally, δ_t and δ_s represent year and school fixed effects, respectively. Regression specification (4.1) is run separately by subject and province and is limited to children who do not enrol in the LFI program. In my preferred specification, the sample is further limited to children observed in the same school for both grades g and (g+3). This removes any impact from other non-LFI students who have left the school for unknown reasons.

[130] In this paper I follow the previous literature that makes use of EQAO data (e.g., Card, Dooley and Payne, 2010, and Baker, 2013) in treating the Ontario test score as a continuous dependent variable. However, as a robustness check, I will also present select results using an ordered probit specification.
[131] Allowing the BC coefficients for f(Y_{istg}) to vary by year does not materially affect the results.
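A minimal sketch of how a specification like (4.1) can be estimated is given below: OLS of the grade (g+3) score on the school-year leaver share, baseline-score-by-year effects, controls and school and year dummies, with standard errors clustered at the grade g school. The column names are hypothetical and the control set is abbreviated; it is not the exact estimation code behind table 4.5.

```python
import statsmodels.formula.api as smf

def estimate_within_school_spec(df):
    """OLS version of the baseline specification on the non-LFI stayers,
    with year-score effects for f(.) and school/year fixed effects.
    (For BC, the year-score effects would be replaced by a cubic in the
    baseline score.)"""
    stayers = df[df["lfi"] == 0]
    res = smf.ols(
        "score_out ~ pct_lfi + female + esl"
        " + C(base_score):C(year) + C(school_id) + C(year)",
        data=stayers).fit(
        cov_type="cluster", cov_kwds={"groups": stayers["school_id"]})
    return res.params["pct_lfi"], res.bse["pct_lfi"]
```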
4.4.2 Heterogeneity

The regression specification in equation (4.1) looks only at the percentage of children in a given cohort that leave for the LFI program, but does not account for the quality of the leaving peers. While table 4.2 shows that LFI students perform better on baseline standardized tests compared to the average student, tables 4.2 and 4.3 also show that there is a lot of heterogeneity in terms of the quality of the LFI students both across and within schools. Furthermore, previous papers such as Burke and Sass (2013), Imberman et al. (2012) and Hoxby and Weingarth (2005) have shown that different peers have different effects. In this section, I explore whether different sub-groups — divided by baseline performance — are impacting the remaining students in different ways. The main estimating equation is now given by

Y_{is(t+3)(g+3)} = \alpha + X_{istg}'\beta + f(Y_{istg}) + \sum_{j=1}^{J}\gamma_j\,\%LFIQ_{jystg} + \delta_t + \delta_s + \varepsilon_{is(t+3)(g+3)}    (4.2)

where the major difference between equations (4.2) and (4.1) is that equation (4.2) replaces the percentage of all children who left a cohort for the LFI program with the percentage of all children who left for the LFI program and were in the jth bin based on their grade g test score. In practice, I use J = 3 in order to make the results comparable across the two jurisdictions. For the Ontario data, it is not possible to split the data into three equal groups. Instead, the three bins I use are as described in table 4.2 above and are given by Q_1 = {y_{istg} | y_{istg} < 3}, Q_2 = {y_{istg} | y_{istg} = 3} and Q_3 = {y_{istg} | y_{istg} > 3}, where y_{istg} is the original unstandardized score of the child. For the BC data, the three bins are based on the actual terciles of the standardized scores at the year-grade-subject level.[132]

[132] The one exception is the writing score, which was grouped into bins combining all years. This was done because, as described in earlier chapters, there is no scaled score for the writing tests, only a raw score on an integer scale. Similar to the Ontario data, in some years it was not possible to get an equal distribution of test scores into 3 bins.

4.4.3 Endogeneity of the LFI Decision

Identifying the impact of one's peers is extremely difficult due to factors such as the non-random assignment of children to schools. This omitted variable bias is what motivates the inclusion of school fixed effects in equations (4.1) and (4.2). All variables should now be interpreted in terms of deviations from their average school levels. The main identifying assumption is that deviations from the mean in terms of a school's average percentage of children who leave for the LFI program are uncorrelated with other factors which impact deviations from the average change in student achievement among children enrolled in a given school.[133]

[133] Furthermore, there are also more technical issues surrounding peer effects, such as the reflection problem (Manski, 1993), which arises due to the simultaneity of estimating both the impact of an individual on her peers and the impact of her peers on the individual. Exploiting the fact that the LFI program becomes available after grade 4, and that there is variation over time in the availability of the program, helps to address this concern. The impact of peers is being driven by exogenous variation in terms of who enrols in the LFI program.

Even with school fixed effects, the estimate of γ will still be biased if there are specific school-grade-year shocks or trends that affect both future test scores and parents' LFI decisions. For example, a negative shock to an entire school year could induce many students to leave and enter the LFI program, while at the same time cause an individual child's future achievement to decrease. More generally, Angrist (2014) surveys these methods and finds that even taking advantage of random variation can lead to biased estimates. In order to address these concerns, I turn to a two-step residual inclusion approach which is similar in spirit to the techniques popularized in the health economics literature by Kessler and McClellan (2000) and Gowrisankaran and Town (2003). The idea is to explicitly model parents' LFI choices and construct variables predicting the percentage of students in each school who leave for LFI. This method consists of two main steps. First, I run a logistic regression of each child's LFI enrolment on a set of child and school characteristics as well as school fixed effects and use the
estimated parameters in order to calculate (i) each child's predicted probability of entering the LFI program and (ii) the predicted percentage of students in each school year who will leave for the LFI program. Second, I rerun the regression given in equation (4.1), except now I also include as a control variable the difference between the actual percentage of students in a given school who enter the LFI program and the predicted percentage calculated in step 1 above. Similar to a two-stage least squares estimator, my goal is to isolate exogenous changes in the percentage of students who leave to enter the LFI program.

Formally, the first stage of the 2SRI approach is given by the following logistic regression equation:

LFI_{istg} = G\big(h(X_{itg}, \{Y_{ijtg}\}_{j=1}^{J}, D_{st}) + \delta_s + \delta_t + \xi_{istg}\big)    (4.3)

where LFI_{istg} is an indicator variable equal to 1 if student i in school s in year t subsequently enrols in the LFI program, G(·) is the logistic function, X is a set of observable characteristics, Y_{ijtg} is the student's achievement in grade g in subject j, D_{st} is a measure of the distance to the nearest LFI school (discussed further below), δ_s and δ_t are school and year fixed effects, respectively, and ξ is an idiosyncratic shock with a type 1 extreme value distribution. As before, all observable characteristics take on the values observed while the child was enrolled in school during grade g. In practice, I assume that h(X_{itg}, \{Y_{ijtg}\}_{j=1}^{J}, D_{st}) is given by the linear specification

h(X_{itg}, \{Y_{ijtg}\}_{j=1}^{J}, D_{st}) = X_{itg}\theta_1 + \sum_{j=1}^{J} f(Y_{ijtg}) + D_{st}\theta_3;

however, all results are robust to an alternative specification in which I allow the distance variable to interact with the child's own baseline test scores. Once equation (4.3) is estimated, I follow McFadden (1973) by calculating the probability that each child i in school s in year t enrols in an LFI program,

\hat{p}_{ist} = P(LFI_{ist} = 1 \mid \hat{\theta}) = \frac{\exp\big(\hat{h}(X_{itg}, Y_{itg}, D_{st}) + \hat{\delta}_s + \hat{\delta}_t\big)}{1 + \exp\big(\hat{h}(X_{itg}, Y_{itg}, D_{st}) + \hat{\delta}_s + \hat{\delta}_t\big)}

where expressions with a "hat" denote predicted values based on the parameter estimates from equation (4.3). Given the individual probabilities of LFI enrolment, I can then aggregate up to the school level by calculating the predicted percentage of students who will leave to enter the LFI program, \widehat{\%LFI}_{st}:

\widehat{\%LFI}_{st} = \frac{\sum_{i \in s} \hat{p}_{ist}}{N_{st}}

where N_{st} is the total number of children enrolled in school s in year t (during grade g). Finally, I estimate a version of equation (4.1) with an additional term consisting of the difference between the predicted and actual percentage of children who leave for LFI:

Y_{is(t+3)(g+3)} = \alpha + X_{istg}'\beta + f(Y_{istg}) + \gamma\,\%LFI_{stg} + \delta_t + \delta_s + \pi\big(\widehat{\%LFI}_{stg} - \%LFI_{stg}\big) + \varepsilon_{is(t+3)(g+3)}    (4.4)

where all other parameters are defined as above.

Similarly to equation (4.1), the estimates of γ_j in equation (4.2) will also be biased if there are omitted variables that affect deviations from the school average in both the percentage of children in a given bin who leave for LFI and changes in standardized test scores. Fortunately, we can address this issue using the 2SRI approach described above. The major difference is that I now use the first-stage results in order to predict the percentage of children in each bin that will leave for the LFI program. I then re-run equation (4.2), but now include as controls the difference between the actual and predicted percentage of children in each group who left for the LFI program.
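The two steps just described can be sketched compactly. The snippet below is a simplified illustration under hypothetical column names (lfi, near_lfi_2km, the baseline score columns, school_id): a first-stage logit with school and year dummies, aggregation of the predicted probabilities to school-year leaver shares, and a second-stage OLS on the non-LFI stayers that adds the predicted-minus-actual share as a control. It abbreviates the control set and the f(·) terms and is not the thesis estimation code.

```python
import statsmodels.formula.api as smf

def two_stage_residual_inclusion(df):
    """Sketch of the 2SRI procedure for one subject/province sample."""
    # Step 1: logit of LFI enrolment on baseline covariates, the distance
    # dummy and school/year fixed effects (in the spirit of equation 4.3).
    first = smf.logit(
        "lfi ~ female + esl + math_z + read_z + write_z + near_lfi_2km"
        " + C(school_id) + C(year)", data=df).fit(disp=False)
    df = df.assign(p_hat=first.predict(df))

    # Aggregate predicted probabilities to school-year leaver shares and
    # compare with the actual share of LFI leavers.
    grp = df.groupby(["school_id", "year"])
    df["pct_lfi_hat"] = grp["p_hat"].transform("mean")
    df["pct_lfi"] = grp["lfi"].transform("mean")
    df["lfi_resid"] = df["pct_lfi_hat"] - df["pct_lfi"]

    # Step 2: outcome regression on the non-LFI stayers (equation 4.4 style),
    # adding the predicted-minus-actual share as an extra control.
    stayers = df[df["lfi"] == 0]
    second = smf.ols(
        "score_out ~ pct_lfi + lfi_resid + female + esl + math_z"
        " + C(school_id) + C(year)", data=stayers).fit(
        cov_type="cluster", cov_kwds={"groups": stayers["school_id"]})
    return first, second
```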
In order to ensure that the identification and estimates of γ are not being driven by the non-linearities inherent in the TSRI approach, I include a variable in the LFI choice equation which does not appear in equation (4.1). The variable I have chosen for this purpose is a function of the distance to the nearest school which offers the LFI program. Define d^{FI}_{sjt} to be the distance from the school that student i attends in grade 3 or 4 (school s) to school j observed in year t, where school j offers LFI. I define the variables this way since in the Ontario data I do not have any information regarding the home addresses of the children; the only location data I have is the location of the school attended. Using these distance measures, I define the instrument D_{st} as follows:

D_{st} = \mathbf{1}\big(\min_j \{d_{sjt}\} < 2\big)    (4.5)

In words, equation (4.5) says that D_{st} is a dummy variable equal to 1 if school s is located within 2km of an LFI school. Since equation (4.3) contains school fixed effects, the key identifying assumption is that changes in the availability of LFI around a particular school are uncorrelated with changes in student achievement except through their impact on LFI enrolment.

One of the biggest concerns about the exclusion restriction related to distance is that parents might sort into neighbourhoods that offer or do not offer the LFI program. Similarly, LFI schools might locate in areas where demand is highest, and this itself might be correlated with changes in the student achievement of the non-LFI students. While the exclusion restriction on distance is not directly testable, we can at the very least check to see if it is correlated with changes in the observable characteristics, after accounting for school fixed effects. This is similar in spirit to the tests run in chapter 3. Appendix table D.5 shows the results of a regression of school-year level averages, x_{sgt}, onto D_{st} along with year and school fixed effects. The results show that distance has either no or a small correlation with these variables. For example, I find that a school becoming within 2km of an LFI school in Ontario is correlated with changes in grade 3 test scores of -0.008σ, 0.028σ and 0.033σ in math, reading and writing, respectively, none of which is significant at standard levels. Similar results are seen in BC as well, with coefficients of 0.013, 0.012 and 0.008, respectively. Furthermore, appendix table D.5 also shows that being closer to an LFI school is uncorrelated with other school-level averages such as percentage female or percentage ESL. Therefore, we do not see any evidence that changes in the availability of the LFI program are correlated with changes in baseline achievement.
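The appendix D.5 balance checks can be approximated with a short school-year-level regression. The sketch below, again under hypothetical column names, collapses the data to school-year averages and regresses one average characteristic at a time on the within-2km indicator plus school and year fixed effects, clustering by school; it illustrates the test rather than reproducing the appendix table.

```python
import statsmodels.formula.api as smf

def distance_balance_check(df, char="math_z"):
    """Regress a school-year average characteristic on the within-2km
    indicator, with school and year fixed effects (illustrative)."""
    school_year = (df.groupby(["school_id", "year"])
                   .agg(avg_char=(char, "mean"),
                        near_lfi_2km=("near_lfi_2km", "max"))
                   .reset_index())
    res = smf.ols("avg_char ~ near_lfi_2km + C(school_id) + C(year)",
                  data=school_year).fit(
        cov_type="cluster", cov_kwds={"groups": school_year["school_id"]})
    return res.params["near_lfi_2km"], res.pvalues["near_lfi_2km"]
```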
4.4.4 Using LFI to Look at Peer Effects More Broadly

All of the specifications presented so far are designed to show the impact of children leaving for the LFI program on the remaining students. Alternatively, these specifications can also be viewed as "reduced form" regressions if the main parameter of interest is not the impact of LFI leavers, but the impact of one's peers more generally. Thus, as a final exercise, I present two-stage least squares (TSLS) results for the impact of changes in peer quality on student achievement, using the predicted percentage of LFI leavers of different abilities as instruments for changes in peer quality. Formally, there are now three steps to this methodology. The first step is identical to equation (4.3) above. The second and third stages are to run a TSLS regression using the predicted percentage of LFI leavers of different abilities as instruments for changes in peer quality, and including the difference between the predicted and actual percentage of leavers in each bin (the RQ terms below) as a baseline control. In particular, let %YQ_{jy(-i)stg} represent the percentage of student i's peers who score in bin j in grade g in year t, and Δ%YQ_{jy(-i)s(t+3)(g+3)} represent the change in these percentages between grades (g+3) and g. The main estimating equations are given by

\Delta\%YQ_{jy(-i)s(t+3)(g+3)} = \pi_{0j} + X_{istg}'\pi_{1j} + f_j(Y_{istg}) + \sum_{r=1}^{J}\big[\pi_{2rj}\,\%FIQ_{rystg} + \pi_{3rj}\,RQ_{rystg}\big] + \delta_t + \delta_s + u_{jistg}, \qquad \forall\, j \in \{2,\dots,J\}    (4.6)

Y_{is(t+3)(g+3)} = \alpha + X_{istg}'\beta + f(Y_{istg}) + \sum_{j=2}^{J}\theta_j\,\Delta\%YQ_{jy(-i)s(t+3)(g+3)} + \sum_{j=1}^{J}\rho_j\,RQ_{jystg} + \delta_t + \delta_s + v_{is(t+3)(g+3)}    (4.7)

where all other parameters are defined as before. Note that, by construction, the total sum of the changes in peer quality, \sum_{j=1}^{J}\Delta\%YQ_{jy(-i)s(t+3)(g+3)}, is 0, which is why I have excluded the change in the percentage of low-performing peers from the regression. The main coefficients of interest are now given by θ_j, which show how changes in the composition of peer quality affect student achievement. The interpretation of the values of θ_2 and θ_3 is that they represent the impact of replacing 100% of one's peers from low-performing students with average- and high-performing students, respectively.[134] By extension, the difference between θ_3 and θ_2 gives the impact of replacing average-performing peers with high-performing peers.

[134] The main advantage of the specification given in equation (4.7) is that it explicitly looks at the impact of replacing one type of peer with another.
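A compact sketch of the second and third steps is given below using the linearmodels package: the changes in peer-quality shares are treated as endogenous, the bin-specific leaver shares serve as excluded instruments, and the residual terms enter as included controls. The column names (d_peer_q2, pct_lfi_q2, resid_q1, ...) are hypothetical placeholders for variables constructed as described above, and the control set and f(·) terms are abbreviated; it is a sketch of the structure, not the thesis implementation.

```python
import pandas as pd
from linearmodels.iv import IV2SLS

def peer_quality_tsls(df):
    """TSLS of the outcome on changes in peer-quality shares (bins 2 and 3),
    instrumented with the bin-specific LFI leaver shares; residual terms and
    school/year dummies enter as included controls."""
    fe = pd.get_dummies(df[["school_id", "year"]].astype(str),
                        drop_first=True).astype(float)
    exog = pd.concat([df[["female", "esl", "base_score",
                          "resid_q1", "resid_q2", "resid_q3"]], fe], axis=1)
    exog = exog.assign(const=1.0)
    endog = df[["d_peer_q2", "d_peer_q3"]]          # changes in peer shares
    instruments = df[["pct_lfi_q2", "pct_lfi_q3"]]  # bin-specific leaver shares
    return IV2SLS(df["score_out"], exog, endog, instruments).fit(
        cov_type="clustered", clusters=df["school_id"])
```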
4.5 Results and Discussion

4.5.1 Institutional Differences Between Ontario and BC Data

Prior to discussing the results, I first discuss key differences in the data between the two jurisdictions and how these differences could affect the estimated results. While I do my best to ensure that the analyses in both the Ontario and BC datasets are as comparable as possible, there are some differences that cannot be avoided and will potentially affect the results. The major difference concerns how children are assigned to bins based on achievement. In BC, because the scaled scores are continuous measures, it is much easier to split the scores into 3 equal terciles. In Ontario, because all scores are on an integer scale from 0–4 and a majority of students have a score of 3, the distribution will have a much higher variance. In addition, children in the low performing group in Ontario likely have a lower underlying ability, on average, than children in the lowest performing group in BC. Conversely, children in the highest performing group in Ontario likely have a greater underlying ability than the top performing tercile in BC. This will have implications for the estimated coefficients if we think there is a monotonic relationship between peer quality and student achievement. This is an issue I will come back to when discussing the actual results.

Furthermore, the coarseness of the Ontario scores potentially biases any measure of the changes in test scores — children with scores of 4 can only go down, while children with scores of 0 or 1 have nowhere to go but up. Formally, I find that 56% of children have the same score in both grades 3 and 6. However, the results below suggest that this is not a major concern. There will be several instances where we see positive and significant effects even on the top performing students. Intuitively, even though it is true that children cannot score higher than 4 (or lower than 0/1), since all scores are standardized, children can still perform relatively better (or worse) than their peers, even though their absolute score remains the same.

Another difference between the two jurisdictions is that while the BC data spans a longer time period, there are more schools in the Ontario data. This has implications for standard errors, which are clustered at the grade 3 or 4 school level. Accordingly, we will observe that the Ontario estimates are, by and large, more precisely estimated (and also more robust to alternative specifications) than their BC counterparts.[135]

[135] There are 856 schools in the regression sample for Ontario and 357 for British Columbia.

4.5.2 Within School Variation

The results of equation (4.1) are shown in table 4.5. Columns 1–4 show the results from Ontario while columns 5–8 show the results from BC. In columns 1 and 5, the sample is limited to all non-LFI students enrolled in a given grade 3 school so long as that school was ever observed to have students subsequently enrol in the LFI program. In both jurisdictions, columns 1 and 5 show negative effects on the achievement of the remaining students as more individuals in a cohort leave for the LFI program. In particular, the coefficients on %LFI in the case of Ontario are -0.16, -0.19 and -0.22 for outcomes in math, reading and writing, respectively. Since %LFI is defined on a scale from 0 to 1, these coefficients imply that a 10pp increase in the fraction of children who leave for LFI in Ontario is associated with a 0.016σ, 0.019σ and 0.022σ decline in the grade 6 test score in each of math, reading and writing, respectively. These coefficients are significant at the one percent level for outcomes in reading and writing and significant at the five percent level for achievement in math. Below, I discuss the magnitudes of these effects in much greater detail. For BC, the corresponding coefficients in column 5 are -0.15, -0.19 and -0.2, respectively, with only the coefficient for reading outcomes significant at the five percent level. While the coefficients on the remaining subjects are not significant, they are all similar to the values obtained in Ontario despite the large differences between the jurisdictions.

Columns 2 and 6 in table 4.5 further limit the sample to only those non-LFI students enrolled in the same school in the later grade. This restriction further emphasizes that it is the LFI program that leads to a "break" in the cohort, and not factors relating to issues such as schools closing, schools feeding into other schools or students leaving the school more generally for other reasons. Columns 2 and 6 represent the preferred specification of the paper. I find that limiting the sample to children enrolled in the same school causes a slight increase in the magnitude of each of the coefficients. This makes sense, as children who switch schools should not be greatly affected by the percentage of their former peers who leave for LFI because these children have lost all of their former peers by changing schools. In Ontario, the coefficients on %LFI in column 2 are -0.24, -0.20 and -0.23 for outcomes in math, reading and writing, respectively, with each coefficient significant at at least the five percent level.
As before, we also see similar results for BC, with coefficients of -0.18, -0.23 and -0.25, respectively; furthermore, the coefficients for reading and writing are significant at the one and ten percent levels, respectively.

The coefficients on %LFI in columns 2 and 6 above range from -0.25 to -0.18. In the estimation sample, the average value of %LFI is between 3% and 7%; therefore, the expected impact of LFI leavers on the remaining students is a decrease of 0.01σ–0.02σ. These differences are not very large. Furthermore, recall from table 4.1 that the percentage of schools with over 10% of children leaving for LFI was 12% and 25% in BC and Ontario, respectively. Therefore, these results suggest that the overall impact of the LFI leavers on the remaining children is small for a majority of students.

Columns 3, 4, 7 and 8 all examine the robustness of these results to alternative specifications. In columns 3 and 7, I further limit the sample to all children enrolled in those schools which are observed to have students enrol in LFI in at least three different years (as opposed to only once in columns 2 and 6). Columns 4 and 8 limit the sample to only those school years in which a school is observed to have at least one child enrol in the LFI program. The motivation behind each of these specifications is to check whether the key identifying variation is coming off the intensive or the extensive margin with regard to LFI enrolment. For Ontario, what we find is that the results in both columns 3 and 4 are qualitatively similar to the previous set of results — albeit lower in magnitude and less precise because of the lower sample size. In BC, while the results in column 7 are similar to those in column 6, the coefficients in column 8 for outcomes in math and reading are over twice those seen in column 6, while the results are similar when looking at achievement scores in writing. Therefore, overall it is not the case that changes in the extensive margin are driving the results found so far.

One question regarding these results is how much the LFI leavers are proxying for the total percentage of students in grade 3 or 4 who leave a given school cohort.[136] In order to examine this question, I re-estimate equation (4.1), except now the key independent variable of interest is the percentage of students who exited a given cohort but did not enrol in LFI. What I want to know is whether we see the same coefficients as in table 4.5 above. The results are shown in appendix table D.6. Unlike table 4.5, which showed negative and significant effects of LFI leavers, appendix table D.6 shows small, often positive and often insignificant effects of non-LFI leavers.[137] Therefore, it does not appear to be the case that the impact of LFI leavers in equation (4.1) is proxying for the total exit rate from a given school cohort.

[136] For example, suppose that a school-year-level shock encouraged parents to remove their child from a given school. For many of these parents, the most viable outside option might be the LFI program.
[137] The one exception is a positive and significant effect on reading scores in BC for non-LFI schools. But this result is not seen for the other subjects or at all in Ontario.

4.5.3 Heterogeneity in the LFI Students

While the overall impact of the LFI leavers on the remaining students is not substantial, it could be that these results are masking larger offsetting effects from different sub-groups. In order to explore this issue, I now turn to estimating equation (4.2).
The results for Ontario and BC are shown separately in tables 4.6 and 4.7, respectively. All regressions are in the preferred specification — limited to students in the same school for both grades and to schools that have at least 1 LFI student. In column 1, I examine the effect of different LFI leavers on all of the remaining students. The remaining columns examine how the LFI leavers affect different students based on their baseline performance. The idea is that, for example, high-performing students who leave for LFI might affect high-performing children who remain in the traditional school system differently than low-performing students who remain, and vice versa. For the remainder of this section, I give an overview of the results from Ontario and BC separately; however, I save discussing the results in greater detail until the following section, where we have accounted for unobserved heterogeneity in the LFI choices of parents.

Results from Ontario

First, I start by showing the impact of students with the lowest scores leaving for the LFI program. In column 1, we see that as low-performing students leave for the LFI program, the effect on the remaining students is always positive, and significant when looking at outcomes in math and writing. For example, the coefficient on %LFIQ1 when the dependent variable is math achievement is 1.05, implying that a 10pp increase in the fraction of LFI students scoring 2 or lower leads to an increase in math achievement of 0.1σ — an outcome that is both economically and statistically significant. The corresponding coefficients for outcomes in reading and writing are 0.24 and 0.56, respectively, with only the latter significant at the five percent level. The results in the remaining columns look at how different students (split by grade 3 test scores) are affected by the LFI leavers. In columns 2–4, we see that low-performing students leaving for the LFI program have positive effects on each of the remaining students, with the largest impact on the top-performing remaining students. This is true for each subject; for example, a 10pp increase in the fraction of LFI students scoring 2 or lower increases the math, reading and writing scores of the remaining students with perfect scores of 4 (in grade 3) by 0.2σ, 0.15σ and 0.25σ, respectively, though only the coefficient with math outcomes as the dependent variable is significant at the five percent level.

Next, I examine the impact of children who leave for LFI and had a grade 3 score of 3 (the middle-performing leavers). In column 1, we see that the overall impact of these children leaving for LFI is a small decrease in achievement. For math outcomes, the coefficient on %LFIQ2 is -0.31, implying that a 10pp increase in the fraction of LFI students with a score of 3 leads to a decrease in math achievement of 0.03σ. For outcomes in reading and writing, the coefficient on %LFIQ2 is -0.17 and -0.32, respectively. While only the latter coefficient is significant at the one percent level, neither of these coefficients is substantively large. In columns 2–4, we see that the largest impact of the middle-performing leavers is on the remaining students in the bottom and average performing bins. At the same time, I find no effect of these LFI leavers on the top-performing remaining students.

The impact of children with perfect scores of 4 leaving for LFI is negative, with coefficients that are much larger in magnitude than those seen for children leaving for LFI with scores of 3.
The coefficients in column 1 show that a 10pp increase in the fraction of LFI students with perfect scores decreases the math, reading and writing scores of the remaining students by 0.07σ, 0.07σ and 0.05σ, respectively. These coefficients are both statistically and economically significant. From the remaining columns, we see that high-performing leavers tend to have a negative and often significant effect on all remaining students across each subject.[138]

[138] The one exception is students with low math scores, where we see a small positive coefficient.

Results from British Columbia

In general, the coefficients seen in the BC regressions are qualitatively similar to those seen in the Ontario regressions, but smaller in magnitude and less precisely estimated. From column 1, we continue to observe that low-performing students leaving for the LFI program have positive effects on the remaining students. The coefficient on %LFIQ1 is 0.53, 0.21 and 0.14 when the dependent variable is achievement in math, reading and writing, respectively, with only the former significant at the ten percent level. The coefficient on %LFIQ2 is -0.29, -0.14 and -0.10 when the dependent variable is achievement in math, reading and writing, respectively. None of these coefficients is significant at standard levels, and they also imply very small overall effects of average-performing leavers. Finally, the coefficients on the fraction of top-performing students leaving for LFI are -0.37, -0.44 and -0.55 when looking at outcomes in math, reading and writing, respectively. Each of these coefficients is significant at the five percent level and implies that — as in Ontario — high-performing leavers have a negative effect on the remaining students.

The results in the remaining columns look at how different students are affected by the LFI leavers. Unfortunately, due to low levels of precision, it is difficult to make any claims with much certainty. For the most part, the results are consistent with those seen in the Ontario regressions: low-performing leavers have positive effects on remaining students in each of the three groups and for each subject. Similarly, high-performing leavers have negative effects on the remaining students for each of the three subjects.[139]

[139] Interestingly, another result consistent with the Ontario regressions is that I find that middle-performing students leaving for LFI have large negative and significant effects on the math outcomes of low-performing students who remain, but that high-performing students who leave for LFI have no effect.

4.5.4 Accounting for Unobserved Heterogeneity in the LFI Decision

The key identifying assumption in the previous set of results is that deviations from a school's average percentage of children leaving for LFI are uncorrelated with deviations from average changes in test scores. Examples of cases where this assumption is violated include school-year shocks that affect both LFI enrolment and future performance. In order to address these concerns, I turn to the 2SRI approach described in section 4.4 and equations (4.3)–(4.5). The results of estimating equation (4.3) are very similar to those already seen in table 4.4. The major difference is that (4.3) includes a more involved function of baseline achievement, while the regressions run in table 4.4 do not. However, this does not appear to make any material changes.
In particular, the estimated marginal effect of the instrument D_{st} when estimating equation (4.3) is 0.15 in Ontario and 0.025 in BC, with t-statistics of 13.7 and 4.3, respectively.

After equation (4.3) is estimated, the next step is to generate a predicted probability of LFI enrolment for each student and to aggregate the results to the school-year level. Using the predicted percentages of LFI enrolment, I then proceed to estimate equation (4.4). For the remainder of this subsection, I describe the results for both the impact of all LFI leavers combined and the impact of different children leaving for LFI on different children who remain in the same school. I then conclude with an overall discussion of the results.

TSRI Results – All Leavers

The results of estimating equation (4.4) are shown in table 4.8. The results from Ontario in columns 1–4 are qualitatively very similar to the results seen in table 4.5, but are larger in magnitude. For example, in the preferred specification in column 2, I find that a 10pp increase in the fraction of LFI leavers leads to a decrease in achievement of 0.06σ, 0.04σ and 0.04σ in the grade 6 outcomes for math, reading and writing, respectively. Furthermore, each of the coefficients in column 2 is significant at the one percent level, and similar results are also seen in column 1, which looks at the entire sample. Finally, in columns 3 and 4, we see coefficients which are smaller in magnitude — for example, the coefficients for math, reading and writing outcomes in column 4 are -0.5, -0.31 and -0.21, respectively — though still significant (for math and reading) and more economically meaningful than the corresponding results seen in table 4.5. Thus, we continue to see that the estimated results are not being driven by changes along the extensive margin with regard to whether schools have any LFI leavers or not.

The TSRI results for BC — shown in columns 5–8 — indicate much larger negative effects than those seen in table 4.5, particularly for outcomes in math and reading. For math outcomes in the preferred specification (column 6), we see that a 10pp increase in the fraction of LFI leavers leads to a decrease in achievement of 0.12σ, an effect that is significant at the one percent level. Similar results are also seen in columns 7 and 8. For outcomes in reading, a 10pp increase in the fraction of LFI leavers decreases reading achievement by 0.1σ. Furthermore, the coefficients in columns 6–8 are all statistically significant and economically meaningful. Finally, when the dependent variable is achievement in writing, a 10pp increase in the fraction of LFI leavers decreases outcomes by 0.04σ; however, neither this coefficient nor those seen in the remaining columns are significant at standard levels.

TSRI Results – Heterogeneity in Ontario

Table 4.9 shows the corresponding TSRI results from Ontario looking at heterogeneity in the impact of the LFI leavers. Once again, we see that the results are qualitatively similar to those seen in table 4.6, but with coefficients which are larger in magnitude. Low-performing students leaving for LFI continue to have a positive effect on the remaining students, with corresponding coefficients for math, reading and writing of 1.43, 0.45 and 0.58, respectively. The coefficients for math and writing outcomes are significant at the one and ten percent levels, respectively.
Furthermore, we also continue to observe that while this positive effect is present for all students, it is highest for the remaining children with top baseline test scores. A 10pp increase in the fraction of LFI leavers with grade 3 test scores of 2 or lower leads those remaining students with high baseline levels of achievement to experience test score gains of 0.36σ, 0.38σ and 0.48σ in math, reading and writing, respectively. These are substantial gains, and each of the corresponding coefficients is significant at the five or one percent level.

Table 4.9 also shows that children who leave for LFI and have a baseline test score of 3 negatively impact the remaining students. The corresponding coefficients in column 1 are now -0.7, -0.48 and -0.53 for outcomes in math, reading and writing, respectively, with each coefficient significant at the one percent level and more economically meaningful than the values observed in table 4.6. Furthermore, columns 2–4 show that most of the estimated negative effects are on the performance of the remaining students with baseline scores of 3 or lower.

Finally, table 4.9 shows large negative effects on the remaining students from top-performing students leaving for the LFI program. A 10pp increase in the percentage of LFI leavers with perfect scores leads to decreases in achievement of 0.15σ, 0.13σ and 0.06σ for math, reading and writing, respectively, although only the coefficients for math and reading are significant at standard levels. Consistent with table 4.6, we continue to see that the set of remaining students most affected by these leavers varies by subject. For math, it is the top-performing students who suffer the biggest decline (γ_3 = -3.01); for reading, it is the middle-performing students (γ_3 = -1.7); and for writing, it is the low-performing students (γ_3 = -1.9).

TSRI Results – Heterogeneity in British Columbia

Table 4.10 shows the corresponding heterogeneity results for the province of British Columbia. First, we see large positive effects on the remaining students from low-performing students leaving for the LFI program. The coefficient on %LFIQ1 in column 1 when the dependent variable is achievement in math, reading and writing is 1.14, 1.02 and 1.9, respectively, with only the latter significant at the five percent level. Looking at columns 2–4, we see that the effect is strongest on the remaining students whose performance is in the second and third terciles, although, for math and reading outcomes, none of the estimated coefficients is significant at conventional levels.

Next, I examine the impact of students in the middle tercile leaving for LFI. Here, the coefficients vary substantially by subject. For math outcomes, the coefficient on %LFIQ2 is small (γ_2 = 0.14), but also highly imprecisely estimated. For outcomes in reading, the coefficient is negative (γ_2 = -1.03) and
This is in contrast to the results from Ontario where we saw thataverage performing leavers have negative and significant effects on both low and average-performingremaining students. I discuss these and other differences in more detail in the discussion section below.For reading outcomes, we see large, negative and signifiant effects on the remaining students fromaverage-performing students leaving for LFI. Interestingly, these effects appear to be concentrated onthe remaining students in the bottom and top terciles, with coefficients of -1.95 and -3.45, respectively,both significant at standard levels. The coefficient on %LFIQ2 when the sample is limited to average-performing remaining students is actually positive (γ2 = 0.79), but the large standard error (1.06) makesit difficult to make any precise claims. For outcomes in writing, we see large, positive and significanteffects from average-performing LFI leavers on the achievement of the remaining students in both themiddle and top terciles with coefficients of 1.81 and 2.28, respectively.Finally, for high-performing leavers, we see that for each subject, the effect on the remaining stu-dents is negative and both economically and statistically significant. The coefficient on %LFIQ3 incolumn 1 when the dependent variable is achievement in math, reading and writing is -1.93, -1.03 and-0.92, respectively, with each significant at at least the five percent level. Columns 2–4 also show thatthis negative effect is present on almost all remaining students. For math outcomes, the largest impactis on students in the middle and top terciles. For outcomes in reading and writing, the largest impact ison students in the bottom and top terciles.Discussion of the TSRI ResultsThe results shown in table 4.8 tell a story similar to the one discussed above looking only at withinschool variation, albeit with larger magnitudes. Overall, I find that the effect on the remaining studentsfrom a 10pp increase in the fraction of their peers who leave for the LFI program is a decrease in studentachievement of 0.04σ - 0.06σ in Ontario and a decrease in achievement of 0.04σ– 0.12σ in BC. Whilethe coefficients in BC are larger in magnitude, recall that the average fraction of children leaving for LFIin the BC estimation sample is approximately half that seen in the Ontario estimation sample. Usingthe average LFI percentages and the preferred specification in columns 2 and 6, I find that the impactof going from no LFI leavers to the average LFI leaving rates decreases student achievement in math,reading and writing by 0.04σ , 0.03σ and 0.03σ respectively in Ontario and 0.04σ , 0.03σ and 0.01σ ,respectively in BC. These results imply that children in schools where many of their fellow students haveleft for LFI experience significant and meaningful declines in achievement; however, at the same time,for a majority of students (where the percentage of children who leave for LFI is below the jurisdictionaverage) the total impact of the LFI leavers on the remaining students is quite small.1094.5 Results and DiscussionFurther analysis shows that the peer effects seen in table 4.8 are the net result of several offsettingeffects. In particular, I find evidence of substantial heterogeneity in terms of both the quality of theleavers and their impact on the remaining students. The results of these analyses are mostly consistentwith the notion of monotonicity. 
Low-performing students leaving for LFI have large positive effects on the remaining students, while high-performing students leaving for LFI have large negative effects on the remaining students. Thus, these results suggest that so long as children are drawn from all levels of the ability distribution, the average impact on the remaining students is likely to be small because of offsetting effects.

Looking at the heterogeneity results in tables 4.9 and 4.10, what explains some of the large differences in the estimated coefficients across the two jurisdictions? For example, in the case of average-performing leavers, we see large negative effects in Ontario, but large positive effects for math and writing outcomes in BC. One possible explanation of this difference relates to how a child's performance is mapped to achievement bins in the two jurisdictions. In Ontario, the high-performing group — those with perfect scores of 4 — makes up only 7–12% of the total fraction of students; most students have a score of 3 (the "average" performers). Thus, it is likely that within the average group there are many students who are in the top 15–25% of their grade. In contrast, in the BC data, each of these students would be in the top-performing bin. Thus, to the extent that these high-performing leavers have a negative effect on the remaining students, particularly in math and writing, it potentially explains the differences across the two jurisdictions.140 These negative effects would be seen through the average-performing leavers in Ontario and through the top-performing leavers in BC. Similar reasoning could also explain why we see smaller effects from low-performing leavers in BC. "Low performers" in Ontario make up only 20% of the population as opposed to 33% in BC. Given that the low-performing students in Ontario are likely drawn from a lower end of the ability distribution, having them leave the program could imply a greater impact on their peers.

140 Conversely, the same argument could also be applied with children in the bottom quartile being contained in the average-performing group as well. However, because of the skewness of the distribution of test scores (for example, in the case of reading scores, 7% of students have scores of 4 versus 33% with scores of 2 or lower), this issue is more relevant at the top end of the distribution.

4.5.5 Robustness Checks

One potential issue with the above methodology is that while I use all baseline test scores to predict LFI enrolment, the main estimating equation includes as a control only the baseline achievement in the same subject as the outcome of interest. This is done because we are interested in the impact on future achievement, conditional on baseline achievement, from a loss of peers due to LFI enrolment. But, at the same time, the remaining achievement scores are effectively being treated as excluded variables. In order to see how much this is driving the estimated results, I re-estimate equations (4.1) and (4.2) with all grade 3 or 4 test scores included as controls. The results from these specifications are shown in Appendix tables D.7–D.9. In the case of Ontario, the estimated coefficients are both qualitatively and quantitatively similar to those seen in tables 4.8 and 4.9. In the case of BC, the results for math and reading are similar to those seen in tables 4.8 and 4.9. For writing, we see much larger negative effects in Appendix table D.7 than those seen in table 4.8.
Comparing Appendix table D.9 with table 4.10, we see that what is driving this result is the fact that we no longer observe any positive effects from low-performing leavers. However, once again, each of the writing results in BC is estimated with low levels of precision, making it difficult to accurately compare coefficients across the two models.

Two issues in the Ontario data that are not present in the BC data are the coarseness of the test scores and the apparently greater measurement error with regards to identifying LFI schools. In this section, I examine the extent to which these phenomena are influencing the results. First, I address the coarseness of the test scores by re-running the second-stage equation (4.4) as an ordered probit (with scores of 0 and 1 grouped together, since the percentage of children with scores of 0 is extremely small) and examine whether we still see significant effects for the percentage of LFI leavers. The estimates of this ordered probit specification are shown in column 1 of appendix table D.10. Here, we continue to see large and negative effects of LFI leavers on the test scores of the remaining children. In order to interpret these coefficients, I calculate the marginal effect at each outcome level. These marginal effects are shown in columns 2–4.141 From column 2, we see that the marginal effect of achieving a math score of 2 is a positive and significant 0.16. The interpretation of this coefficient is that a 10pp increase in the fraction of LFI leavers increases a student's probability of achieving a grade 6 math test score of 2 by 0.016, relative to a mean of 0.3. In contrast, in columns 3 and 4 we see that a 10pp increase in the fraction of LFI leavers decreases a student's probability of achieving a score of 3 or 4 on the math test in grade 6 by 0.01 and 0.015, respectively (relative to means of 0.5 and 0.13). Next, looking at reading outcomes, we see that a 10pp increase in the fraction of LFI leavers changes the probability of achieving a test score of 2, 3 and 4 by 0.012, -0.005 and -0.01, respectively. Similar results are seen when examining the impact on student achievement in writing. All of the results for reading and writing are also significant at the one percent level. As a final exercise, I compare the ordered probit results with the baseline results using a second-stage OLS regression. In order to do this, I calculate the average predicted change in the standardized test scores using the marginal effects and the average standardized test score in each bin.142 The results are shown in columns 5 and 6 of appendix table D.10. Comparing the results in the two columns, we see that the implied test score change in the ordered probit regression is nearly identical to what is estimated using a second-stage OLS estimation.

141 All marginal effects must sum to 0 and thus the excluded marginal effect is the impact on the probability of a student achieving a score of 1 or lower.
142 Formally, let Ȳ_j denote the average standardized test score for children with a level score of j, j = 1, 2, 3, 4, and let m_{yj} be the marginal effect for outcome y of score j. Then the expected test score change for outcome y is given by ∑_{j=1}^{4} m_{yj} Ȳ_j.
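To make the footnote 142 calculation concrete, the sketch below converts a set of ordered probit marginal effects into an implied change in the standardized score. The marginal effects are illustrative round numbers in the spirit of the math estimates reported above (the effect for the excluded bottom bin is simply backed out so that the four effects sum to zero), and the bin-level average standardized scores are hypothetical placeholders rather than values from the data.

```python
import numpy as np

# Marginal effects of the fraction of LFI leavers on the probability of each
# grade 6 score bin (1 or lower, 2, 3, 4). Illustrative values only: 0.16, -0.10
# and -0.15 echo the math estimates discussed above, and the first entry is
# backed out so that the four effects sum to zero (see footnote 141).
m = np.array([0.09, 0.16, -0.10, -0.15])

# Hypothetical average standardized grade 6 score within each score bin.
y_bar = np.array([-1.5, -0.6, 0.3, 1.4])

# Expected change in the standardized score per unit change in %LFI
# (footnote 142): the sum over bins of m_j times the bin's average score.
implied_change = m @ y_bar

# Rescale to the 10pp thought experiment used throughout the chapter.
print(round(0.10 * implied_change, 3))
```

With these placeholder inputs the implied effect of a 10pp increase in LFI leavers comes out at roughly -0.05σ, which is the kind of magnitude being compared against the second-stage OLS estimates in columns 5 and 6.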
Therefore, it does not appear that treating the Ontario test scores as continuous leads to any bias in the results.

Next, I examine the robustness of the results in tables 4.8 and 4.9 by estimating them on the alternative sample discussed above, which uses the more conservative definition of MFI enrolment.143 Before I discuss the TSRI results, one concern might be that, with less variation in the number of new or lost MFI schools, there will not be a significant effect of distance when estimating equation (4.3), which in turn brings up issues related to identification. However, similar to what we saw in table 4.4, this does not end up being the case. Estimating equation (4.3) using the alternative sample implies a marginal effect on Dst of 0.11 along with a t-statistic of 6.3.

The TSRI results under this alternative estimation sample are shown in appendix table D.11. The results show that the impact of LFI leavers is qualitatively similar to the estimates seen in table 4.8. For example, in the preferred specification in column 2 of appendix table D.11, a 10pp increase in the fraction of LFI leavers causes a decrease in achievement of 0.08σ, 0.05σ and 0.04σ in math, reading and writing, respectively (compared to 0.06σ, 0.04σ and 0.04σ in table 4.8). Similar results are seen in the remaining columns as well. Next, appendix table D.12 examines the robustness of the heterogeneity results in table 4.9 to this alternative estimation sample. Once again, the results are both qualitatively and quantitatively very similar across the two tables. We continue to observe large positive effects from low-performing students leaving for LFI — an effect that is greatest on the remaining children with the highest baseline performance. Similarly, we also continue to observe that high-performing leavers have negative effects on the remaining students. In summary, re-estimating the TSRI results using the more conservative definition did not materially affect any of the observed results. Thus, it is not the case that the schools (and the students within them) which offered the MFI program for only one year are driving the TSRI results seen in tables 4.8 and 4.9 above.

143 Recall that in this alternative sample, a school is only designated an MFI school if that school is observed to have at least 5 MFI students in two or more different years.

4.5.6 Using LFI to Examine Overall Peer Effects

The results in section 4.5.4 above can be viewed as "reduced form" results for changes in peer characteristics. In this section, I expand on this idea by estimating equations (4.6) and (4.7), which use the percentage of children who leave for LFI as instruments for overall changes in peer composition between the two test score grades. Once again, I start off by discussing the results from Ontario and BC separately, with an overall discussion in the final subsection.

Results from Ontario

The results of equations (4.6) and (4.7) for Ontario are shown in table 4.11. The main independent variables of interest are the changes between grades 3 and 6 in the fraction of average- and high-performing peers (a stylized sketch of the instrumental variables setup follows below).
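The sketch below is a minimal, self-contained illustration of that instrumental variables setup on synthetic data. Every variable name is hypothetical (z_q1 to z_q3 stand in for the predicted fractions of leavers by baseline bin, d_peer_q2 and d_peer_q3 for the changes in peer composition, and the resid_* columns for the actual-minus-predicted controls), and the manual two-step procedure recovers point estimates only; the actual estimation behind table 4.11 would also require proper 2SLS standard errors clustered at the school level.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2000

# Synthetic student-level data; every column name here is hypothetical.
df = pd.DataFrame({
    "school_id": rng.integers(0, 50, n),
    "x_female": rng.integers(0, 2, n),
    # predicted fractions of the grade 3 cohort leaving for LFI, by baseline bin
    "z_q1": rng.uniform(0, 0.05, n),
    "z_q2": rng.uniform(0, 0.10, n),
    "z_q3": rng.uniform(0, 0.05, n),
})
# residual controls: actual minus predicted leaver shares (pure noise in this mock-up)
for q in ("q1", "q2", "q3"):
    df[f"resid_{q}"] = rng.normal(0, 0.01, n)
# endogenous changes in the fraction of middle/high-performing peers, grades 3 to 6
df["d_peer_q2"] = 0.5 * df["z_q2"] - df["z_q1"] + rng.normal(0, 0.05, n)
df["d_peer_q3"] = 0.5 * df["z_q3"] - df["z_q1"] + rng.normal(0, 0.05, n)
# grade 6 standardized score with built-in "true" peer effects of 0.7 and 1.2
df["y6"] = 0.7 * df["d_peer_q2"] + 1.2 * df["d_peer_q3"] + rng.normal(0, 1, n)

controls = "x_female + resid_q1 + resid_q2 + resid_q3 + C(school_id)"

# First stage: project each endogenous peer-composition change on the instruments.
fs2 = smf.ols(f"d_peer_q2 ~ z_q1 + z_q2 + z_q3 + {controls}", data=df).fit()
fs3 = smf.ols(f"d_peer_q3 ~ z_q1 + z_q2 + z_q3 + {controls}", data=df).fit()
df["d2_hat"] = fs2.fittedvalues
df["d3_hat"] = fs3.fittedvalues

# Second stage: replace the endogenous regressors with their fitted values.
# This manual two-step yields 2SLS point estimates; correct standard errors
# (and clustering by school) would need a dedicated IV routine.
ss = smf.ols(f"y6 ~ d2_hat + d3_hat + {controls}", data=df).fit()
print(ss.params[["d2_hat", "d3_hat"]])  # analogues of theta_2 and theta_3
```

Reporting the joint first-stage F-statistics, as table 4.11 does, is the natural diagnostic for whether the predicted leaver shares move the peer-composition changes strongly enough for this kind of setup to be informative.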
The excluded category is the change in the percentage of peers in the bottom-performing bin; therefore, the coefficients should be interpreted as the impact of replacing peers in the lowest-performing bin with children in either the middle or highest performing levels. For each subject, table 4.11 also includes four additional values. The first, labelled H0: θ2 = θ3, is the p-value of the test of equality between these two coefficients; this tests the impact of replacing average-performing students with high-performing students. The second, labelled H0: θ2 = (θ3 − θ2), tests whether the effect on achievement of replacing low-performing peers with average-performing peers is the same as that of replacing average-performing peers with high-performing peers. Note, however, that a failure to reject this hypothesis does not necessarily imply a rejection of non-linear peer effects, because of the skewness of the achievement bins. Table 4.11 also includes the F-statistics from each of the two first-stage regressions of the change in the fraction of peers onto the predicted fraction of children leaving for LFI in each (grade 3) performance level. The F-statistic is the joint test of significance on the three instruments. Recall that both the first and second stage regressions include as controls the differences between the actual and predicted fractions of LFI leavers in each of the three achievement groups. These residuals are not treated as instruments and are therefore not included in the F-statistics. From table 4.11, we can see that in every specification the joint F-statistics are very large (over 20), indicating that students leaving for the LFI program are strong instruments for overall changes in peer composition.

First, looking at math outcomes, the coefficients in column 1 of table 4.11 — 0.73 and 1.25 — imply that replacing 10pp of a student's lowest-performing peers with average- and high-performing peers leads to increases in math achievement of 0.07σ and 0.12σ, respectively. For reading, the corresponding impacts are 0.05σ and 0.08σ, while for writing outcomes the impacts are 0.05σ and 0.06σ. Each of these coefficients is significant at the one or five percent level and is economically meaningful. For each subject, we also find positive impacts from replacing 10pp of average-performing students with high-performing students, although the magnitudes of the differences are smaller, ranging from 0.006σ for writing outcomes to 0.05σ for math outcomes. The impact of going from average to high-performing peers is significant at the one percent level in the case of math outcomes and at the five percent level in the case of reading outcomes. Finally, comparing the difference between going from low to average peers and from average to high peers, we see that while the former is larger in every subject, it is only significantly larger in the case of writing outcomes, at the ten percent level.

Columns 2–4 of table 4.11 examine which children are most affected by the change in peer composition. As in the previous tables, students are split based on their grade 3 test scores. Column 2 examines the impact of changes in peer quality on students with low baseline levels of achievement. For math outcomes, the two coefficients are 0.73 and 0.94 for middle- and high-achieving peers, respectively. Both of these coefficients are substantive and statistically significant at the one percent level, but are not significantly different from each other.
Thus, what appears to matter most is the removal of low-performing peers. The results are qualitatively similar for outcomes in reading, with coefficients of 0.58 and 0.80, respectively. Once again, these coefficients are significantly different from zero at the one percent level, but their difference is not significant at standard levels. In writing, the coefficients are 0.32 and 1.13, and their difference is statistically significant at the ten percent level. Thus, the writing outcomes of students with low baseline levels of achievement appear to gain the most when low- or average-performing peers are replaced with high-performing peers.

The results in column 3 are qualitatively very similar to those seen in column 1. The one exception is that we find no significant effect of replacing low-performing students with high-performing students on the writing outcomes of children with average baseline levels of achievement. Finally, for students with high levels of baseline achievement in column 4, the results show that math outcomes improve as the quality of peers improves, with coefficients on the change in the percentage of average- and high-performing peers of 0.65 and 1.7, respectively, both of which — along with their difference — are significant at the one percent level. In contrast, for outcomes in reading and writing, once again most of the gains come from replacing low-performing peers only. In fact, the estimated effect of replacing average peers with high-performing peers is negative. However, the large standard errors on these coefficients make it difficult to make any further claims with much certainty.

Results from BC

The BC results shown in table 4.12 are qualitatively very similar to the results from Ontario, though with smaller magnitudes and relatively weaker instruments in the first stage, particularly in the set of regressions with achievement in writing as the dependent variable. In column 1, the coefficients on the change in the percentage of average-performing peers are 0.27, 0.39 and 0.26 for math, reading and writing, respectively. Thus, replacing 10pp of low-performing peers with average-performing peers leads to gains in achievement of 0.03σ in math (significant at the five percent level), 0.04σ in reading (significant at the five percent level) and 0.03σ in writing (not significant at standard levels). Similarly, replacing 10pp of low-performing students with high-performing peers leads to even greater gains of 0.08σ in math, 0.06σ in reading and 0.07σ in writing. Each of these coefficients is significant at the one percent level. Finally, column 1 also shows that there are positive and significant gains from replacing average-performing students with high-performing students for outcomes in math and writing, but not in reading.

Columns 2–4 show how the above results vary by a student's own baseline performance in grade 4. For math outcomes, we see that all of the gain to students with a baseline level of achievement in the bottom tercile comes from replacing their low-performing peers; there is no gain from replacing average-performing peers with high-performing peers. In contrast, for students with average and high baseline levels of achievement (columns 3 and 4), there are larger gains from replacing average peers with high-performing peers (coefficients of 0.82 and 0.53, respectively) than there are from replacing low-performing peers with average-performing peers (coefficients of 0.33 and 0.16, respectively).
However, in neither case is the difference statistically significant at conventional levels. For outcomes in reading, most of the gains from better peers come from replacing low-performing peers with either average- or high-performing peers. This is true for students with either a low or a high baseline level of achievement. For students with an average baseline level of achievement, we see that replacing low-performing peers with average peers leads to no gain in achievement, while replacing low-performing peers with high-performing peers does have a positive, but insignificant, effect. Finally, for writing outcomes, table 4.12 shows that all of the positive peer effects come from having high-performing peers. For all students, replacing low-performing peers with average-performing peers leads to small and insignificant changes in writing achievement. In contrast, replacing 10pp of low-performing peers with high-performing peers leads to increases in achievement of between 0.04σ and 0.06σ, with significant effects seen for students with either a low or a high baseline level of achievement.

Discussion of Results

Overall, these results show that there exist large peer effects from replacing a low-performing peer with either an average- or a higher-performing student. In every specification in column 1 in both tables 4.11 and 4.12, at least one of the coefficients on the change in the percentage of average peers or the change in the percentage of high-performing peers is significant. These results are also consistent with those found in the previous literature. That low-performing students negatively impact their peers is also found in Billings et al. (2014), Imberman et al. (2012), Lavy et al. (2011) and Hoxby and Weingarth (2005). The main contribution of this paper is that I am able to show these results as a consequence of changes in the current peer composition from the introduction of a school-choice program. This is in contrast to the above papers, which tend to focus on the external introduction of peers from exogenous shocks or events. With regards to replacing average peers with high-performing peers, while we do see differences between the two jurisdictions, most of these differences are likely the result of how children are assigned to different groups based on performance. For example, in Ontario, replacing peers of average quality with high-performing peers led to smaller gains than replacing low-performing peers with average-performing peers. In contrast, in BC we see that for both math and writing, replacing average peers with high-performing peers leads to gains that are larger than the gains when going from low-performing peers to average peers, although these differences are not always statistically significant due to low levels of precision. As discussed above, students labelled "average" in Ontario likely have a much higher underlying ability than children labelled "average" in BC.
The transition from average to high-performing peers in Ontario might therefore be a much smaller jump in underlying ability than it is in BC, and thus we would expect a smaller gain.144 Similarly, children labelled "low" performers in Ontario likely have lower underlying ability, which potentially explains the larger (in magnitude) coefficients observed in the Ontario results compared to those in BC.

144 Furthermore, these results — particularly for Ontario — are also consistent with the findings of previous papers which use admission cutoffs to identify peer effects. For example, both Abdulkadiroğlu et al. (2014) and Dobbie and Fryer (2014) fail to find any significant differences in student outcomes between students just above and below admission cutoffs to elite exam schools. Being admitted into an elite school implies a shift in peers from average to high performers, while being at the margin of an elite school admission cutoff likely implies that said student is in the top-performing group as well.

4.6 Conclusion

In this paper, I used administrative data from two separate jurisdictions to examine the impact on a school cohort when a fraction of children leaves to enrol in an alternative choice program known as middle or late immersion. I find that, in Ontario, a 10pp increase in the fraction of leavers leads to a statistically significant decrease in achievement of between 0.04σ and 0.06σ across each of the three subjects of math, reading and writing. In BC, I find larger overall effects for math and reading, with a 10pp increase in the fraction of leavers causing a decrease in test scores of 0.12σ and 0.10σ, respectively; however, the average LFI rate in the estimation sample in BC is approximately half the rate in Ontario. Thus, I find that going from no LFI peers to the jurisdiction average has a similar effect in both Ontario and BC: a decrease in test scores of 0.01σ–0.04σ. In both Ontario and BC, I find that these overall effects mask considerable heterogeneity in the impact that leaving students have on the remaining cohort. Removing low-achieving peers leads to increases in test scores, while removing high-achieving peers leads to decreases in test scores. Finally, I used the LFI leavers to examine peer effects more generally. Not surprisingly given the previous results, I find that changes in peer composition have large effects on one's own achievement. Replacing low-performing peers with average peers and replacing average peers with high-performing peers both lead to gains in achievement that are statistically significant and economically meaningful. In total, replacing 10pp of low-performing peers with high-performing peers led to test score gains of 0.07σ–0.13σ in Ontario and 0.06σ–0.08σ in BC.

A big limitation of this paper is the inability to say much about the mechanisms through which these effects operate. Is the reason that removing low-performing students from a cohort leads to an increase in test scores that low-performing students are also likely to be disruptive in class (e.g. Carrell and Hoekstra, 2010), or is it because children are now more likely to have peers who can help them out and whom they can learn from? Similar questions can be asked about the removal of high-performing students as well. For example, is it that removing high-performing students from a school cohort increases the likelihood of having a disruptive peer, or is it that high-performing peers exert some positive externality on the remaining students?
From the final section in the paper, we were able to see some gains evenwhen average-performing students are replaced with high-performing peers. This suggests that at leastpart of the effect does come from having higher performing peers, but even then the mechanism is stillnot truly known. Understanding how it is that peers are affecting one’s own achievement is an importantand necessary step needed before any sort of policy recommendation can be made with regards to theoptimal way students should be assigned to classrooms or programs.1164.7 Tables4.7 TablesTable 4.1: Summary Statistics: School Level AveragesPanel A: OntarioWhole Sample Regression SampleMean Std Dev Mean Std DevGrade 3 Size 37.61 20.76 35.73 18.25% Leaving for LFI 0.05 0.13 0.10 0.16No LFI Leavers 0.70 0.46 0.47 0.500 < %LFI < 10 0.15 0.36 0.27 0.4410 < %LFI < 25 0.07 0.25 0.12 0.3225 < %LFI < 50 0.04 0.21 0.08 0.2750 < %LFI 0.03 0.17 0.05 0.22School-years 7001 3952Panel B: British ColumbiaWhole Sample Regression SampleMean Std Dev Mean Std DevGrade 4 Size 34.49 16.89 34.83 16.71% Leaving for LFI 0.04 0.08 0.05 0.09No LFI Leavers 0.61 0.49 0.53 0.50 < %LFI < 10 0.29 0.45 0.35 0.4810 < %LFI < 25 0.07 0.25 0.08 0.2725 < %LFI < 50 0.02 0.15 0.03 0.1750 < %LFI 0.01 0.09 0.01 0.1School-years 3343 2724Notes: All values are based on observations at the school-by-year level ingrade 3 (Ontario) or grade 4 (BC). "Whole Sample" consists of all chil-dren and schools left over after all sample cuts are made (see text for de-tails). "Regression Sample" further excludes all children in schools whichare never observed to have a student subsequently enrol in the LFI program.1174.7 TablesTable 4.2: Distribution of Achievement of LFI StudentsPanel A: OntarioWhole Sample Regression SampleMean Std Dev Mean Std Dev%LFI Math Level ≤ 2 0.01 0.03 0.01 0.03%LFI Math Level = 3 0.04 0.10 0.07 0.12%LFI Math Level = 4 0.01 0.04 0.02 0.05%LFI Read Level ≤ 2 0.01 0.03 0.01 0.04%LFI Read Level = 3 0.04 0.10 0.07 0.13%LFI Read Level = 4 0.01 0.03 0.01 0.03%LFI Write Level ≤ 2 0.01 0.03 0.01 0.03%LFI Write Level = 3 0.04 0.11 0.08 0.14%LFI Write Level = 4 0.01 0.02 0.01 0.03School-years 7001 3952Panel B: British ColumbiaWhole Sample Regression SampleMean Std Dev Mean Std Dev%LFI Math Q1 0.01 0.03 0.01 0.03%LFI Math Q2 0.01 0.04 0.02 0.04%LFI Math Q3 0.02 0.05 0.02 0.05%LFI Read Q1 0.01 0.02 0.01 0.02%LFI Read Q2 0.01 0.04 0.02 0.04%LFI Read Q3 0.02 0.03 0.02 0.05%LFI Write Q1 0.01 0.03 0.01 0.04%LFI Write Q2 0.01 0.04 0.02 0.05%LFI Write Q3 0.02 0.04 0.02 0.05School-years 3343 2724Notes: All values are based on observations at the school-year level in grade 3(Ontario) or grade 4 (BC). "Whole Sample" consists of all children and schoolsleft over after all sample cuts are made (see text for details). "Regression Sam-ple" further excludes all children in schools which are never observed to havea student subsequently enrol in the LFI program. Test score levels in Ontariorefer to overall scores on a scale from 0-4. In BC, “Q1", “Q2" and “Q3" refer tothe lowest tercile, middle tercile and highest tercile of test scores, respectivelyat the year-grade level. 
The one exception is the writing scores which are notgrouped by year.1184.7 TablesTable 4.3: Within School Variation of LFI LeaversOntario British ColumbiaAll Bins 2 Bins 1 Bin All Terciles 2 Terciles 1 TercileMath 0.26 0.53 0.21 0.27 0.46 0.27Reading 0.29 0.48 0.24 0.26 0.47 0.27Writing 0.21 0.49 0.30 0.14 0.56 0.30Notes: This table caclulates the percentage of LFI leavers in a given school-year in differentcategories grouped by achievement. All values are based on observations at the school-yearlevel in grade 3 (Ontario) or grade 4 (BC). All test scores in each jurdisdiction have beensplit into the same three groups as defined in table 4.2. The sample is limited to grade 3 or 4school-year level observations with at least two students who leave for the LFI program.Table 4.4: Marginal Effects from a Logistic Regression of Late Immersion EnrolmentOntario British Columbia(1) (2) (3) (4)Female0.026∗∗∗ 0.026∗∗∗ 0.032∗∗∗ 0.032∗∗∗ 0.017∗∗∗ 0.016∗∗∗[0.002] [0.002] [0.003] [0.002] [0.002] [0.002]Exceptional /Spec Ed-0.002 0.013∗∗∗ -0.008 0.015∗∗∗ -0.038∗∗∗ -0.037∗∗∗[0.007] [0.004] [0.009] [0.005] [0.008] [0.006]ESL-0.131∗∗∗ -0.103∗∗∗ -0.164∗∗∗ -0.129∗∗∗ -0.009 -0.007∗∗[0.008] [0.005] [0.010] [0.006] [0.008] [0.003]Math Score0.020∗∗∗ 0.022∗∗∗ 0.026∗∗∗ 0.028∗∗∗ 0.007∗∗∗ 0.008∗∗∗[0.002] [0.001] [0.003] [0.001] [0.001] [0.001]Reading Score0.027∗∗∗ 0.025∗∗∗ 0.033∗∗∗ 0.032∗∗∗ 0.014∗∗∗ 0.013∗∗∗[0.002] [0.001] [0.002] [0.001] [0.002] [0.001]Writing Score0.017∗∗∗ 0.019∗∗∗ 0.022∗∗∗ 0.023∗∗∗ 0.002 0.005∗∗∗[0.002] [0.001] [0.002] [0.001] [0.001] [0.001]LFI < 2km0.190∗∗∗ 0.146∗∗∗ 0.208∗∗∗ 0.111∗∗∗ 0.050∗∗∗ 0.024∗∗∗[0.012] [0.011] [0.015] [0.018] [0.010] [0.006]Observations 130264 130264 94846 94846 75395 75395School FE No Yes No Yes No YesNotes: Standard errors clustered at the school level are in brackets. All results are the displayed marginal effectsfrom a logistic regression of LFI (MFI in Ontario) enrolment onto the variables above as well as additionalcontrols and year fixed effects. The estimation sample excludes schools never observed to have at least 1 LFIstudent. Columns 1 and 2 are estimated using the main estimation sample in Ontario in which schools are labelledLFI only if a school has at least 5 LFI students (if not, then the students enrolled in these schools are also no longerlabelled LFI). Columns 3 and 4 are estimated on the sample constructed using the more conservative definiton ofLFI enrolment in which a school is labelled an LFI school only if it is observed to have at least 5 LFI studentsin at least two different years. Additional controls included in the regressions from Ontario are whether a childis labelled gifted in grade 3 and the number of children in the grade 3 school cohort. Additional controls in theregressions from British Columbia include an indicator for children labelled as gifted, whether a child speaksEnglish at home, number of children in the grade 4 school cohort and census characteristics at the DisseminationArea level based on the postalcode of the child’s grade 4 address. These variables are the unemployment rate,% with bachelor degree or higher, log(real average household income), log(real home value) and % home-owner.Math, Reading and Writing scores are standardized test scores with mean 0 and variance 1 (standardized at thejurisdiction-year-grade-subject level). All variables take on the values observed while the child was in grade 3(Ontario) or 4 (BC). 
∗ p < 0.1, ∗∗ p < 0.05, ∗∗∗ p < 0.01.1194.7TablesTable 4.5: Impact of LFI Leavers on Student AchievementOntario British ColumbiaSubject Coeff (1) (2) (3) (4) (5) (6) (7) (8)Math %LFI-0.16∗∗ -0.24∗∗∗ -0.10 -0.14 -0.15 -0.18 -0.22 -0.45∗∗[0.07] [0.09] [0.12] [0.14] [0.12] [0.13] [0.14] [0.19]Reading %LFI-0.19∗∗∗ -0.20∗∗ -0.18 -0.13 -0.19∗∗ -0.23∗∗∗ -0.29∗∗∗ -0.48∗∗∗[0.06] [0.08] [0.11] [0.12] [0.10] [0.09] [0.09] [0.14]Writing %LFI-0.22∗∗∗ -0.23∗∗∗ -0.09 -0.06 -0.20 -0.25∗ -0.35∗∗ -0.28[0.06] [0.08] [0.11] [0.11] [0.13] [0.15] [0.16] [0.21]Observations 116655 70701 27238 27891 72013 56990 30394 23828Mean %LFI 0.07 0.07 0.16 0.17 0.03 0.03 0.05 0.07Same School No Yes Yes Yes No Yes Yes YesNotes: Standard errors clustered at the school level are in brackets. The estimation sample excludes all schools that are neverobserved to have at least 1 LFI student. "Same School" refers to a sample restriction limiting the regression to all chlildrenobserved in the same school for both grades 3 and 6 (Ontario) or grades 4 and 7 (BC). Columns 3 and 7 limit the sample to onlychildren observed in schools which had students leave for LFI in at least 3 different years. Columns 4 and 8 limit the sample toonly children observed in a given school if a student left for LFI in that school and in that year. Columns 1–4 include grade 3test score by year fixed effects while columns 5–8 include a third order polynomial in the grade 4 test score. Additional controlsfor the regressions run using the Ontario sample include gender, whether a child has an ESL designation, whether a child hasan exceptional (special needs) designation, whether a child is labelled gifted, and total number of students in the grade 3 cohort.Additional controls for the regressions run using the sample from British Columbia include gender, whether a child has an ESLdesignation, whether a child has special-ed designation, whether a child is labelled as gifted, whether a child speaks English athome, year fixed effects and census characteristics at the Dissemination Area level based on the postalcode of the child’s grade4 address. These variables are the unemployment rate, % with bachelor degree or higher, log(real average household income),log(real home value) and % home-owner. All regressions include school fixed effects. ∗ p < 0.1, ∗∗ p < 0.05, ∗∗∗ p < 0.01.1204.7 TablesTable 4.6: Heterogeneity of LFI Leavers – OntarioAll Students Level ≤ 2 Level = 3 Level = 4Subject Coeff (1) (2) (3) (4)Math%LFIQ11.05∗∗∗ 1.20∗∗∗ 0.80∗∗ 1.96∗∗[0.29] [0.32] [0.38] [0.96]%LFIQ2-0.31∗∗∗ -0.50∗∗∗ -0.28∗∗ 0.27[0.11] [0.15] [0.13] [0.25]%LFIQ3-0.71∗∗∗ 0.14 -0.82∗∗∗ -1.24∗∗∗[0.18] [0.29] [0.21] [0.39]Observations 70701 19882 42606 8213Reading%LFIQ10.24 0.28 -0.01 1.46[0.21] [0.28] [0.26] [1.09]%LFIQ2-0.17 -0.28∗ -0.12 0.13[0.11] [0.16] [0.11] [0.38]%LFIQ3-0.71∗∗∗ -0.72∗∗ -0.78∗∗∗ -0.57[0.18] [0.31] [0.23] [0.80]Observations 70701 25536 40453 4712Writing%LFIQ10.56∗∗ 0.29 0.52 2.51[0.23] [0.31] [0.32] [1.59]%LFIQ2-0.32∗∗∗ -0.30∗∗ -0.27∗∗∗ 0.01[0.08] [0.14] [0.10] [0.40]%LFIQ3-0.50∗ -0.79∗ -0.31 -1.56[0.29] [0.44] [0.33] [1.04]Observations 70701 22670 44501 3530Notes: Standard errors clustered at the grade 3 school level are in brackets. %LFIQ j representsthe fraction of children with scores in bin j as defined in the text. All regressions includegrade 3 achievement by year fixed effects, a full set of baseline controls (see notes in table4.5 for additional details) and school fixed effects. 
Each sample is limited to children who areenrolled in the same school in both grades 3 and 6. The sample is further limited to childrenenrolled in schools observed to have at least 1 student subsequently enrol in the LFI program.∗ p < 0.1, ∗∗ p < 0.05, ∗∗∗ p < 0.01.1214.7 TablesTable 4.7: Heterogeneity of LFI Leavers – British ColumbiaAll Students Q1 Q2 Q3Subject Coeff (1) (2) (3) (4)Math%LFIQ10.53∗ 0.59∗ 0.79 0.47[0.31] [0.32] [0.54] [0.60]%LFIQ2-0.29 -0.98∗∗∗ -0.19 0.27[0.23] [0.32] [0.36] [0.35]%LFIQ3-0.37∗∗ -0.18 -0.58∗∗ -0.24[0.19] [0.24] [0.27] [0.27]Observations 56990 17667 19427 19896Reading%LFIQ10.21 0.01 0.13 0.45[0.33] [0.45] [0.48] [0.43]%LFIQ2-0.14 -0.06 0.06 -0.37[0.22] [0.31] [0.33] [0.35]%LFIQ3-0.44∗∗∗ -0.27 -0.50∗ -0.44∗∗[0.16] [0.26] [0.27] [0.22]Observations 56990 18119 19427 19444Writing%LFIQ10.14 0.09 0.07 0.04[0.28] [0.30] [0.53] [0.49]%LFIQ2-0.10 0.50 -0.29 0.35[0.19] [0.35] [0.25] [0.43]%LFIQ3-0.55∗∗ -0.92∗∗ -0.19 -0.58[0.25] [0.39] [0.31] [0.37]Observations 56990 16102 23314 17574Notes: Standard errors clustered at the grade 4 school level are in brackets. “Q1",“Q2" and “Q3" refer to the lowest tercile, middle tercile and highest tercile of testscores, respectively at the year-grade level. The one exception is the writing scoreswhich are not grouped by year. All regressions include a third order polynomial ingrade 4 test scores, a full set of baseline controls (see notes in table 4.5 for additionaldetails), year fixed effects and grade 4 school fixed effects. Each sample is limitedto children who are enrolled in the same school in both grades 4 and 7. The sampleis further limited to children enrolled in schools observed to have at least 1 studentsubsequently enrol in the LFI program. ∗ p < 0.1, ∗∗ p < 0.05, ∗∗∗ p < 0.01.1224.7TablesTable 4.8: TSRI – Impact of LFI Leavers on Student AchievementOntario British ColumbiaSubject Coeff (1) (2) (3) (4) (5) (6) (7) (8)Math %LFI-0.40∗∗∗ -0.58∗∗∗ -0.33∗ -0.50∗∗ -0.89∗ -1.24∗∗∗ -1.33∗∗∗ -1.16∗∗[0.10] [0.13] [0.19] [0.21] [0.49] [0.48] [0.48] [0.52]Reading %LFI-0.47∗∗∗ -0.43∗∗∗ -0.36∗∗ -0.31∗ -0.58 -1.02∗∗ -1.04∗∗ -1.03∗∗[0.09] [0.12] [0.17] [0.17] [0.38] [0.41] [0.41] [0.45]Writing %LFI-0.38∗∗∗ -0.44∗∗∗ -0.20 -0.20 0.22 -0.37 -0.52 -0.63[0.09] 2[0.12] [0.16] [0.16] [0.37] [0.48] [0.47] [0.51]Observations 116655 70701 27238 27891 72013 56990 30394 23828Mean %LFI 0.07 0.07 0.16 0.17 0.03 0.03 0.05 0.07Notes: Standard errors clustered at the school level are in brackets. The estimation sample excludes all schools that are neverobserved to have at least 1 LFI student. Columns 1–4 include grade 3 test score by year fixed effects while columns 5–8 includea third order polynomial in the grade 4 test score. All regressions also include a complete set of baseline characteristics ascontrols (see notes in table 4.5 for details) and an additional control equal to the difference between the actual percentage of acohort leaving to enter LFI and the predicted percentage. This predicted percentage comes from a logistic regression of studentLFI enrolment onto a function of grade 3 or 4 achievement (either grade 3 test score by year fixed effects in all subjects or athird order polynomial in grade 4 achievement for all subjects and with time-varying coeeficients), baseline characteristics, anindicator for whether the child’s current school is within 2km of an LFI school as well as school and year fixed effects. Seenotes in table 4.5 for additional details. 
∗ p < 0.1, ∗∗ p < 0.05, ∗∗∗ p < 0.01.1234.7 TablesTable 4.9: TSRI – Heterogeneity of LFI Leavers – OntarioAll Students Level ≤ 2 Level = 3 Level = 4Subject Coeff (1) (2) (3) (4)Math%LFIQ11.43∗∗∗ 1.13∗∗ 1.24∗∗ 3.60∗∗∗[0.42] [0.45] [0.55] [1.20]%LFIQ2-0.70∗∗∗ -0.83∗∗∗ -0.63∗∗∗ -0.09[0.18] [0.24] [0.21] [0.45]%LFIQ3-1.50∗∗∗ -0.98∗ -1.19∗∗∗ -3.01∗∗∗[0.29] [0.55] [0.35] [0.71]hline Observations 70701 19882 42606 8213Reading%LFIQ10.45 0.55 -0.01 3.76∗∗[0.33] [0.45] [0.37] [1.51]%LFIQ2-0.48∗∗∗ -0.61∗∗ -0.28 -1.15[0.16] [0.27] [0.17] [0.74]%LFIQ3-1.30∗∗∗ -1.11 -1.68∗∗∗ 0.69[0.40] [0.72] [0.55] [2.10]Observations 70701 25536 40453 4712Writing%LFIQ10.58∗ 0.15 0.43 4.82∗∗[0.32] [0.41] [0.42] [2.03]%LFIQ2-0.53∗∗∗ -0.38∗ -0.41∗∗∗ 0.45[0.13] [0.21] [0.16] [0.75]%LFIQ3-0.56 -1.92∗ -0.13 -1.04[0.67] [1.04] [0.70] [2.79]Observations 70701 22670 44501 3530Notes: Standard errors clustered at the school level are in brackets. Each sample is limitedto children who are enrolled in the same school in both grades 3 and 6 and excludes schoolswhich are never observed to have at least 1 student subsequently enrol in LFI. All regressionsinclude a complete set of baseline observable grade 3 characteristics (see notes in table 4.5for additional details), grade 3 achievement by year fixed effects, school fixed effects andadditional controls equal to the difference between the actual percentage of a cohort with agiven test score leaving to enter LFI and the predicted percentage. This predicted percentagecomes from a logistic regression of student LFI enrolment onto grade 3 test score by year fixedeffects in all subjects, baseline characteristics, an indicator for whether the child’s currentschool is within 2km of an LFI school and school fixed effects. ∗ p < 0.1, ∗∗ p < 0.05, ∗∗∗p < 0.01.1244.7 TablesTable 4.10: TSRI – Heterogeneity of LFI Leavers – British ColumbiaAll Students Q1 Q2 Q3Subject Coeff (1) (2) (3) (4)Math%LFIQ11.14 1.12 1.80 1.45[0.91] [1.09] [1.53] [1.38]%LFIQ20.14 -1.52∗ 0.59 0.83[0.67] [0.91] [0.79] [0.93]%LFIQ3-1.93∗∗∗ -1.14∗ -2.61∗∗∗ -1.58∗∗∗[0.45] [0.65] [0.64] [0.56]Observations 56990 17667 19427 19896Reading%LFIQ11.02 -0.27 1.76 1.43[1.22] [1.73] [1.76] [1.59]%LFIQ2-1.03 -1.95∗∗ 0.79 -3.45∗∗∗[0.77] [0.93] [1.06] [1.24]%LFIQ3-1.03∗∗ -1.04 -0.46 -1.37∗∗[0.46] [0.73] [0.59] [0.55]Observations 56990 18119 19427 19444Writing%LFIQ11.90∗∗ 1.08 2.04∗∗ 2.55∗∗[0.89] [1.12] [1.03] [1.05]%LFIQ21.35∗∗ 0.78 1.81∗∗ 2.28∗∗[0.65] [1.05] [0.83] [0.90]%LFIQ3-0.92∗∗ -1.64∗∗ -0.01 -1.60∗∗[0.43] [0.64] [0.68] [0.71]Observations 56990 16102 23314 17574Notes: Standard errors clustered at the school level are in brackets. Each sample islimited to children who are enrolled in the same school in both grades 4 and 7 and ex-cludes schools which are never observed to have at least 1 student subsequently enrolin LFI. All regressions include a complete set of baseline observable grade 4 charac-teristics (see notes in table 4.5 for additional details), a third order polynomial in grade4 achievement, school fixed effects and additional controls equal to the difference be-tween the actual percentage of a cohort with a given test score leaving to enter LFI andthe predicted percentage. This predicted percentage comes from a logistic regressionof student LFI enrolment onto a third order polynomial in grade 4 achievement (forall subjects and with time-varying coefficients), baseline characteristics, an indicatorfor whether the child’s current school is within 2km of an LFI school as well as schooland year fixed effects. 
∗ p < 0.1, ∗∗ p < 0.05, ∗∗∗ p < 0.01.1254.7 TablesTable 4.11: TSLS Estimates of Change in Peer Composition - OntarioAll Students Level ≤ 2 Level = 3 Level = 4Subject Coeff (1) (2) (3) (4)Math∆%YQ2(−i)stg0.73∗∗∗ 0.73∗∗∗ 0.59∗∗∗ 0.65∗∗[0.09] [0.14] [0.11] [0.28]∆%YQ3(−i)stg1.25∗∗∗ 0.94∗∗∗ 1.01∗∗∗ 1.69∗∗∗[0.11] [0.23] [0.15] [0.26]Observations 70701 19882 42606 8213H0: θ2 = θ3 0.00 0.39 0.00 0.00H0: θ2 = (θ3−θ2) 0.17 0.11 0.38 0.38F2 65.29 42.42 77.18 27.76F3 60.72 74.16 61.67 24.96Reading∆%YQ2(−i)stg0.49∗∗∗ 0.58∗∗∗ 0.25∗∗ 1.15∗∗[0.10] [0.18] [0.12] [0.46]∆%YQ3(−i)stg0.81∗∗∗ 0.80∗∗∗ 0.80∗∗∗ 0.38[0.14] [0.29] [0.21] [0.63]Observations 70701 25536 40453 4712H0: θ2 = θ3 0.03 0.52 0.00 0.34H0: θ2 = (θ3−θ2) 0.41 0.43 0.23 0.09F2 43.08 37.99 48.25 25.61F3 60.33 65.67 63.13 13.21Writing∆%YQ2(−i)stg0.53∗∗∗ 0.32∗∗ 0.39∗∗∗ 0.71[0.08] [0.13] [0.11] [0.51]∆%YQ3(−i)stg0.56∗∗ 1.13∗∗ 0.31 0.53[0.25] [0.46] [0.28] [0.86]Observations 70701 22670 44501 3530H0: θ2 = θ3 0.90 0.08 0.79 0.85H0: θ2 = (θ3−θ2) 0.08 0.33 0.15 0.49F2 68.64 65.94 54.17 17.66F3 51.52 50.53 47.46 12.96Notes: Standard errors clustered at the school level are in brackets. Each sample is limited to childrenwho are enrolled in the same school in both grades 3 and 6. The sample is further limited to childrenenrolled in grade 3 schools observed to have at least 1 student subsequently enrol in the LFI program.All regressions include a complete set of baseline observable characteristics (see notes in table 4.5for additional details), grade 3 achievement by year fixed effects, school fixed effects and additionalcontrols equal to the difference between the actual percentage of a cohort with a given test scoreleaving to enter LFI and the predicted percentage. This predicted percentage comes from a logisticregression of student LFI enrolment onto grade 3 test score by year fixed effects in all subjects,baseline characteristics, an indicator for whether the child’s current school is within 2km of an LFIschool as well as school and year fixed effects. The predicted percentages are also used as instrumentsfor the change in the percentage of peers at the given performance level between grades 3 and 6. “F2"represents the joint F-statistic on the three predicted LFI leaver variables in the first stage equationwith ∆%YQ2(−i)stg as the dependent variable. “F3" is similarly defined for ∆%YQ3(−i)stg. H0: θ2 = θ3and H0: θ2 = (θ3 − θ2) display the associated p-values from the given test of the coefficients. 
∗p < 0.1, ∗∗ p < 0.05, ∗∗∗ p < 0.01.1264.7 TablesTable 4.12: TSLS Estimates of Change in Peer Composition – BCAll Students Q1 Q2 Q3Subject Coeff (1) (2) (3) (4)Math∆%YQ2(−i)stg0.27∗∗ 0.59∗∗∗ 0.30∗ 0.18[0.11] [0.18] [0.18] [0.22]∆%YQ3(−i)stg0.84∗∗∗ 0.54∗∗∗ 1.12∗∗∗ 0.71∗∗∗[0.08] [0.15] [0.15] [0.20]Observations 56990 17667 19427 19896H0: θ2 = θ3 0.00 0.78 0.00 0.00H0: θ2 = (θ3−θ2) 0.12 0.06 0.10 0.28F2 22.49 16.73 21.28 23.03F3 21.96 20.63 19.00 21.27Reading∆%YQ2(−i)stg0.39∗∗ 0.43∗∗ 0.07 0.76∗∗∗[0.17] [0.18] [0.26] [0.21]∆%YQ3(−i)stg0.61∗∗∗ 0.46∗∗ 0.40 0.74∗∗∗[0.15] [0.23] [0.25] [0.20]Observations 56990 18119 19427 19444H0: θ2 = θ3 0.19 0.88 0.16 0.91H0: θ2 = (θ3−θ2) 0.58 0.29 0.54 0.05F2 18.61 18.14 16.06 19.07F3 22.01 23.60 18.17 19.65Writing∆%YQ2(−i)stg0.26 -0.11 -0.02 -0.28[0.32] [0.38] [0.45] [0.45]∆%YQ3(−i)stg0.63∗∗∗ 0.61∗∗∗ 0.28 0.53∗∗∗[0.18] [0.16] [0.41] [0.20]Observations 56990 16102 23314 17574H0: θ2 = θ3 0.05 0.09 0.08 0.02H0: θ2 = (θ3−θ2) 0.82 0.29 0.56 0.15F2 7.35 5.71 12.01 6.16F3 26.76 19.78 17.94 16.71Notes: Standard errors clustered at the school level are in brackets. Each sample is limitedto children who are enrolled in the same school in both grades 4 and 7 and excludes schoolswhich are never observed to have at least 1 student subsequently enrol in LFI. All regressionsinclude a complete set of baseline observable grade 4 characteristics (see notes in table 4.5for additional details), a third order polynomial in grade 4 achievement, school fixed effectsand additional controls equal to the difference between the actual percentage of a cohortwith a given test score leaving to enter LFI and the predicted percentage. This predictedpercentage comes from a logistic regression of student LFI enrolment onto a third orderpolynomial in grade 4 achievement (for all subjects and with time-varying coefficients),baseline characteristics, an indicator for whether the child’s current school is within 2kmof an LFI school as well as school and year fixed effects. The predicted percentages areused as instruments for the change in the percentage of peers at the given performance levelbetween grades 4 and 7. “F2" represents the joint F-statistic on the three predicted LFIleaver variables in the first stage equation with ∆%YQ2(−i)stg as the dependent variable. “F3"is similarly defined for ∆%YQ3(−i)stg. H0: θ2 = θ3 and H0: θ2 = (θ3 − θ2) display theassociated p-values from the given test of the coefficients. ∗ p < 0.1, ∗∗ p < 0.05, ∗∗∗p < 0.01.127Chapter 5: ConclusionThis dissertation uses school-choice programs in Canada in order to examine broader research questionsin — and make several important contributions to — the economics of education literature. In chapter 2,I used the Early French Immersion program in Canada in order to examine the ability of parents to learnover time about the fit or match quality between their child and a given schooling option. In particular, Iestimate the impact that learning has on parents’ subsequent schooling choices and children’s academicachievement. I find that learning, or more accurately, new information parents receive after the initialenrolment decision has a large impact on parents’ decision to remove their child from the immersionprogram. This is especially true in earlier grades — with learning accounting for 70% of the variationin program attrition between grades K and 5. 
Learning allows parents to better sort their children into the program where they can succeed academically; specifically, I find that learning causes an increase in test scores of 0.09σ by the start of secondary school. I also find interesting results simulating the model under a variety of counterfactual scenarios. First, I find evidence that parents are pulling their child out of the program too early. These results follow from the fact that providing parents with additional information leads to declines in attrition and that the weight parents place on new information is much higher than what would be implied by the data. Furthermore, many of the parents who now decide to remain in the program are those whose child's performance would actually improve upon exiting the program. This leads to the interesting result that providing parents with more accurate information will not necessarily lead to higher levels of achievement. Second, I find that while making it easier for parents to remove their child from FI leads to higher rates of attrition, the children who now leave FI are, on average, at the margin of being better off academically out of the program. Therefore, the gains in achievement from making it easier to exit the program are very small (and in some cases negative). Finally, I find that providing parents with information pre-enrolment leads to large changes in the composition of program enrolees, which in turn leads to large decreases in attrition and large increases in test scores. Thus, even with learning, there are still large differences between the parents' optimal sequence of choices ex-ante and the optimal sequence ex-post.

In chapter 3, I estimate the causal impact that immersion language programs have on short- and medium-run student outcomes by once again using the Early French Immersion program. In order to address the fact that entry into and exit from the immersion program are non-random, I focus on estimating the causal impact of initial enrolment into the program and instrument for initial enrolment using the relative distance of the child's home postal code to the nearest immersion school. A major criticism of using distance as an instrument is that parents choose where to live and that choosing to live near an FI school might be correlated with other parental characteristics that affect student outcomes. In order to address this concern, I include a series of neighbourhood fixed effects in all my specifications. The main identifying assumption is now that parents are allowed to choose what neighbourhood they want to live in, but where they end up within the neighbourhood is a function of more exogenous factors such as the available supply of housing. Turning to the results, I find that initial enrolment into the immersion program leads to large and significant declines in student achievement in grade 4 in each of math, reading and writing. However, the results also suggest that some of these negative effects wear off in later grades. In grade 10, I find that initial FI enrolment has no impact on students' English exam scores and a negative and significant effect on exam scores in math (though smaller in magnitude than the effect seen in grade 4) and science. Furthermore, I find that some part of these negative effects is being driven by parents who are enrolling their child in private school instead of the immersion program. Accounting for private school enrolment reduces the estimated negative effects that immersion language programs have on student achievement.
Finally, I find that the impact of FI does not differ by the gender of the child.

In chapter 4, I use the Late (or Middle) French Immersion programs in order to estimate the effect that students who leave a given school cohort have on the remaining students, and also to look at peer effects more broadly. I am able to obtain a causal impact of the leaving students on the remaining students by using a two-stage residual inclusion approach along with a school fixed effects model. The logic behind the TSRI method is that we want to isolate exogenous attrition from a given school into the LFI program. I find that an increase in the percentage of students who leave for the LFI program has a negative and significant effect on the students who remain. However, this result masks considerable heterogeneity in the impact of students leaving for LFI. I find that students who leave for LFI with low levels of baseline achievement have a large positive impact on the remaining students. Conversely, students who leave for LFI with a high level of baseline achievement have a large negative impact on the students who remain. As a final exercise, I use the percentage of LFI leavers as an instrument for overall changes in peer composition. I find that both replacing low-performing peers with average-performing peers and replacing average-performing peers with high-performing peers have a positive and significant impact on a student's own level of achievement.

Overall, these three chapters demonstrate a number of complex issues that are inherent to almost all school-choice policies. For example, chapter 2 emphasizes the heterogeneity of students; a school-choice option that is a great match for some students might be a poor match for others, and vice versa. Furthermore, while parents do not necessarily possess information on which choice is best for their child at the initial enrolment decision, I show how parents are able to learn this information over time and mitigate any negative effects from a potentially poor match. Chapter 3 emphasizes the trade-offs involved in school-choice programs. Some programs are designed to enhance a particular skill; for example, math and science skills in the case of STEM schools, artistic abilities in the case of theatre and drama schools or, in the setting I examine, second-language acquisition in the case of immersion programs. However, devoting extra time to a school's core focus potentially takes time away from other areas; thus, gains in one area potentially come at the expense of another. Ultimately, evaluating the costs and benefits of this trade-off is up to the parents and their child, but we must acknowledge that these trade-offs exist. Finally, chapter 4 emphasizes the interconnectedness of all school-choice programs. If a school district decides to introduce an additional schooling option for parents, then this will affect the student composition in the remaining schools. Thus, through peer effects, a new choice program potentially affects all students, and not just those who enrol in the new program.

Bibliography

Abdulkadiroğlu, A., Angrist, J., and Pathak, P. (2014). The elite illusion: Achievement effects at Boston and New York exam schools. Econometrica, 82(1):137–196.

Adda, J. and Cooper, R. W. (2003). Dynamic economics: quantitative methods and applications. MIT Press.

Aguirregabiria, V. and Mira, P. (2010). Dynamic discrete choice structural models: A survey. Journal of Econometrics, 156(1):38–67.

Allen, M. (2004).
Reading achievement of students in French immersion programs. EducationalQuarterly Review, 9(4):25–30.Altonji, J. G., Elder, T. E., and Taber, C. R. (2005a). An evaluation of instrumental variable strategiesfor estimating the effects of catholic schooling. Journal of Human Resources, 40(4):791–821.Altonji, J. G., Elder, T. E., and Taber, C. R. (2005b). Selection on observed and unobserved variables:Assessing the effectiveness of catholic schools. Journal of Political Economy, 113(1):151.Altonji, J. G., Huang, C.-I., and Taber, C. R. (2015). Estimating the cream skimming effect of schoolchoice. Journal of Political Economy, 123(2):266–324.Altschuler, G. and Skorton, D. (2012). America’s foreign language deficit. Forbes, 27:2014.Andrabi, T., Das, J., and Khwaja, A. I. (2014). Report cards: The impact of providing school and childtest scores on educational markets.Angrist, J. D. (2014). The perils of peer effects. Labour Economics, 30:98–108.Angrist, J. D., Pathak, P. A., and Walters, C. R. (2013). Explaining charter school effectiveness.American Economic Journal: Applied Economics, 5(4):1–27.Antecol, H., Eren, O., and Ozbeklik, S. (2016). Peer effects in disadvantaged primary schools evidencefrom a randomized experiment. Journal of Human Resources, 51(1):95–132.Archibald, J., Roy, S., Harmel, S., Jesney, K., Dewey, E., Moisik, S., and Lessard, P. (2006). A reviewof the literature on second language learning. ERIC.131Arcidiacono, P. (2004). Ability sorting and the returns to college major. Journal of Econometrics,121(1):343–375.Avery, C. and Pathak, P. A. (2015). The distributional consequences of public school choice. NBERworking paper No. 21525.Baker, M. (2013). Industrial actions in schools: strikes and student achievement. Canadian Journal ofEconomics/Revue Canadienne d’économique, 46(3):1014–1036.Baker, M. and Milligan, K. (2013). Boy-girl differences in parental time investments: Evidence fromthree countries. Technical report, NBER Working Paper No. 18893.Barrow, L., Schanzenbach, D. W., and Claessens, A. (2015). The impact of Chicago’s small highschool initiative. Journal of Urban Economics, 87:100–113.Becker, G. S. (1962). Investment in human capital: A theoretical analysis. The Journal of PoliticalEconomy, 70(5):9–49.Benson, J. (2012). Foreign language education improves young students’ academic suc-cess. Retrieved from <http://www.huffingtonpost.com/2012/12/05/foreign-language-education-students-success_n_2244477.html>. Ac-cessed: 2016-04-13.Bergman, P. (2015). Parent-child information frictions and human capital investment: Evidence froma field experiment. CESifo working paper series.Billings, S. B., Deming, D. J., and Rockoff, J. E. (2012). School segregation, educational attainmentand crime: Evidence from the end of busing in charlotte-mecklenburg. NBER working paper No.18487.Black, S. and Machin, S. (2011). Housing valuations of school performance. Handbook of the Eco-nomics of Education, 3:485–519.Black, S. E. (1999). Do better schools matter? parental valuation of elementary education. QuarterlyJournal of Economics, 114(2):577–599.Blundell, R. and Dias, M. C. (2009). Alternative approaches to evaluation in empirical microeco-nomics. Journal of Human Resources, 44(3):565–640.Bournot-Trites, M. and Tellowitz, U. (2002). Report of current research on the effects of secondlanguage learning on first language literacy skills. Printing House.Bowey, J. A. (2005). Predicting individual differences in learning to read. 
The Science of Reading: AHandbook, pages 155–172.Bowles, S., Gintis, H., and Osborne, M. (2001). The determinants of earnings: A behavioral approach.Journal of Economic Literature, 39(4):1137–1176.132British Columbia Ministry of Education (2006a). Applications of mathematics 10 to 12: Integratedresource package 2006. Retrieved from <https://www.bced.gov.bc.ca/irp/pdfs/mathematics/2006appofmath1012.pdf>. Accessed: 2016-04-13.British Columbia Ministry of Education (2006b). Applications of mathematics 10 to 12: Integratedresource package 2006. Retrieved from <https://www.bced.gov.bc.ca/irp/pdfs/mathematics/2006appofmath1012.pdf>. Accessed: 2016-04-13.British Columbia Ministry of Education (2006c). Essentials of mathematics 10 to 12: Integratedresource package 2006. Retrieved Mar 2016 from <https://www.bced.gov.bc.ca/irp/pdfs/mathematics/2006essofmath1012.pdf>. Accessed: 2016-04-13.British Columbia Ministry of Education (2006d). Principles of mathematics 10 to 12: Integratedresource package 2006. Retrieved from <https://www.bced.gov.bc.ca/irp/pdfs/mathematics/2006prinofmath1012.pdf>. Accessed: 2016-04-13.British Columbia Ministry of Education (2008). Foundations of mathematics and pre-calculusgrade 10. Retrieved from <http://www.bced.gov.bc.ca/irp/pdfs/mathematics/math_foundations_precalc10.pdf>. Accessed: 2016-04-13.British Columbia Ministry of Education (2009). Information for students, parents andguardians foundation skills assessment: Questions and answers. Retrieved from <https://www.bced.gov.bc.ca/assessment/fsa/translations/questions_answers/09_english_qa.pdf>. Accessed: 2016-04-13.Bruck, M. (1985a). Consequences of transfer out of early French immersion programs. AppliedPsycholinguistics, 6(02):101–120.Bruck, M. (1985b). Predictors of transfer out of early French immersion programs. Applied Psycholin-guistics, 6(01):39–61.Burke, M. A. and Sass, T. R. (2013). Classroom peer effects and student achievement. Journal ofLabor Economics, 31(1):51–82.Burman, D. D., Bitan, T., and Booth, J. R. (2008). Sex differences in neural processing of languageamong children. Neuropsychologia, 46(5):1349–1362.Caldas, S. J. and Boudreaux, N. (1999). Poverty, race, and foreign language immersion: Predictors ofmath and English language arts performance. Learning Languages, 5(1):4–15.Canadian Parents for French (2010). Annual FSL Enrolment 2006-2011. Retrieved from <http://cpf.ca/en/files/CPF-FSL-Enrolment-Stats.pdf>. Accessed: 2016-04-13.Card, D. (1994). Earnings, schooling, and ability revisited. NBER working paper No. 4832.133Card, D., Dooley, M. D., and Payne, A. A. (2010). School competition and efficiency with publiclyfunded Catholic schools. American Economic Journal: Applied Economics, 2(4):150–176.Center for Applied Linguistics (2011). Directory of foreign language immersion programs in u.s.schools. Retrieved from <http://webapp.cal.org/Immersion>. Accessed: 2013-06-21.Chamley, C. (2004). Rational herds: Economic models of social learning. Cambridge UniversityPress.Chernew, M., Gowrisankaran, G., and Scanlon, D. P. (2008). Learning and the value of information:Evidence from health plan report cards. Journal of Econometrics, 144(1):156–174.Chin, A., Daysal, N. M., and Imberman, S. A. (2013). Impact of bilingual education programs onlimited English proficient students and their peers: Regression discontinuity evidence from Texas.Journal of Public Economics, 107:63–78.Croll, J. and Lee, P. (2008). 
A comprehensive review of French second language programs and serviceswithin the anglophone sector of the New Brunswick Department of Education. Technical report,Report of the French Second Language Commission.Cullen, J. B., Jacob, B. A., and Levitt, S. D. (2005). The impact of school choice on student outcomes:an analysis of the Chicago Public Schools. Journal of Public Economics, 89(5):729–760.Cummins, J. (1976). The influence of bilingualism on cognitive growth: A synthesis of researchfindings and explanatory hypotheses. ERIC. Working Papers on Bilingualism, No. 9.Cummins, J. (1978). Educational implications of mother tongue maintenance in minority languagegroups. Canadian Modern Language Review, 34(3):395–416.Cummins, J. (1979). Linguistic interdependence and the educational development of bilingual chil-dren. Review of Educational Research, 49(2):222–251.Cummins, J. (2000). Immersion education for the millennium: What we have learned from30 years of research on second language immersion. Retrieved from <http://s3.amazonaws.com/academia.edu.documents/30285331/immersion2000.pdf?AWSAccessKeyId=AKIAJ56TQJRTWSMTNPEA&Expires=1468953313&Signature=NYoZwrTxZC8L8g4jvND85WZxh6g%3D&response-content-disposition=inline%3B%20filename%3DImmersion_education_for_the_millennium_W.pdf>. Accessed: 2016-04-13.Cunha, F., Heckman, J., and Navarro, S. (2005). Separating uncertainty from heterogeneity in lifecycle earnings. Oxford Economic Papers, 57(2):191–261.Cunha, F. and Heckman, J. J. (2008). Formulating, identifying and estimating the technology of cog-nitive and noncognitive skill formation. Journal of Human Resources, 43(4):738–782.134Deming, D., Hastings, J., Kane, T. J., and Staiger, D. O. (2014). School choice, school quality, andpostsecondary attainment. American Economic Review, 104(3):991–1013.District Info Sheet: Education Services (2014). French immersion. Retrieved from<https://www.surreyschools.ca/departments/EDSC/AnalyticsReports/District%20Info%20Sheet%20French%20Immersion%20Oct%2014pdf.pdf?>.Accessed: 2016-04-13.Dizon-Ross, R. (2015). Parents’ perceptions and children’s education: Experimental evidence frommalawi. Working Paper.Dobbie, W. and Fryer Jr, R. G. (2013). Getting beneath the veil of effective schools: Evidence fromNew York City. American Economic Journal: Applied Economics, 5(4):28–60.Dobbie, W. and Fryer Jr, R. G. (2014). The impact of attending a school with high-achieving peers: ev-idence from the New York City exam schools. American Economic Journal: Applied Economics,6(3):58–75.“Early French Immersion: K-7” (2016). Vancouver School Board. Retrieved from http://www.vsb.bc.ca/programs/early-french-immersion. Accessed: 2016-04-13.Eliot, L. (2012). Pink brain, blue brain: How small differences grow into troublesome gaps-and whatwe can do about it. Oneworld Publications.Epple, D. and Romano, R. E. (1998). Competition between private and public schools, vouchers, andpeer-group effects. American Economic Review, pages 33–62.Foley, K. (2012). Can neighbourhoods change the decisions of youth on the margins of universityparticipation? Canadian Journal of Economics/Revue canadienne d’économique, 45(1):167–188.Freeman, Y. S., Freeman, D. E., and Mercuri, S. (2005). Dual language essentials for teachers andadministrators. Heinemann Portsmouth, NH.“French Immersion is Education for the Elite” (2008). Editorial. Retrieved from<http://www.canada.com/vancouversun/news/editorial/story.html?id=144196bf-8a12-47e8-8109-b7be65a7bb9b>. 
Accessed: 2016-04-13.“Frequently Asked Questions (FAQs) about French Immersion” (2016). North Vancou-ver School Board. Retrieved from http://www.sd44.ca/ProgramsServices/FrenchImmersion/FIFAQ/Pages/default.aspx. Accessed: 2016-04-13.Friesen, J., Javdani, M., Smith, J., and Woodcock, S. (2012). How do school report cards affect schoolchoice decisions? Canadian Journal of Economics/Revue Canadienne d’économique, 45(2):784–807.135Genesee, F. (1978). A longitudinal evaluation of an early immersion school program. CanadianJournal of Education/Revue Canadienne de l’éducation, pages 31–50.Genesee, F. (2007). French immersion and at-risk students: A review of research evidence. CanadianModern Language Review, 63(5):655–687.Genesee, F. (2015). Myths about early childhood bilingualism. Canadian Psychology/PsychologieCanadienne, 56(1):6.Genesee, F. et al. (1977). An experimental french immersion program at the secondary school level1969 to 1974. Canadian Modern Language Review, 33(3):318–332.Genesee, F. and Jared, D. (2008). Literacy development in early French immersion programs. Cana-dian Psychology/Psychologie Canadienne, 49(2):140.Geweke, J. and Keane, M. (2001). Computationally intensive methods for integration in econometrics.Handbook of Econometrics, 5:3463–3568.Gowrisankaran, G. and Town, R. J. (2003). Competition, payers, and hospital quality1. Health ServicesResearch, 38(6p1):1403–1422.Handel, B. R. and Kolstad, J. T. (2015). Health insurance for “humans”: Information frictions, planchoice, and consumer welfare. The American Economic Review, 105(8):2449–2500.Hanushek, E. A., Kain, J. F., and Rivkin, S. G. (2004). Disruption versus Tiebout improvement: Thecosts and benefits of switching schools. Journal of Public Economics, 88(9):1721–1746.Hanushek, E. A., Kain, J. F., Rivkin, S. G., and Branch, G. F. (2007). Charter school quality andparental decision making with school choice. Journal of Public Economics, 91(5):823–848.Harris, E. (2015). Dual language programs are on the rise, even for native english speakers. New YorkTimes. October 8, 2015.Hastings, J. S., Kane, T. J., and Staiger, D. O. (2005). Parental preferences and school competition:Evidence from a public school choice program. NBER working paper No. 11805.Hastings, J. S. and Weinstein, J. M. (2008). Information, school choice, and academic achievement:Evidence from two experiments. The Quarterly Journal of Economics, 123(4):1373–1414.Heckman, J. and Singer, B. (1984). A method for minimizing the impact of distributional assumptionsin econometric models for duration data. Econometrica: Journal of the Econometric Society, pages271–320.Heckman, J. J. and Vytlacil, E. (2005). Structural equations, treatment effects, and econometric policyevaluation. Econometrica, 73(3):669–738.136Hotz, V. J. and Miller, R. A. (1993). Conditional choice probabilities and the estimation of dynamicmodels. The Review of Economic Studies, 60(3):497–529.Hoxby, C. (2000). Peer effects in the classroom: Learning from gender and race variation. NBERworking paper No. 7867.Hoxby, C. M. and Murarka, S. (2009). Charter schools in New York City: Who enrolls and how theyaffect their students’ achievement. NBER working paper No. 14852.Hoxby, C. M. and Weingarth, G. (2005). Taking race out of the equation: School reassignment and thestructure of peer effects. Working paper.Hutchins, A. (2015). Just say ‘non’: The problem with French immersion. Maclean’s, 128(12):16–20.Imbens, G. W. and Angrist, J. D. (1994). 
Identification and estimation of local average treatmenteffects. Econometrica, 62(2):467–475.Imberman, S. A., Kugler, A. D., and Sacerdote, B. I. (2012). Katrina’s children: Evidence on thestructure of peer effects from hurricane evacuees. The American Economic Review, pages 2048–2082.Jared, D., Cormier, P., Levy, B. A., and Wade-Woolley, L. (2011). Early predictors of biliteracydevelopment in children in French immersion: A 4-year longitudinal study. Journal of EducationalPsychology, 103(1):119.Jensen, R. (2010). The (perceived) returns to education and the demand for schooling. QuarterlyJournal of Economics, 125(2).Kasahara, H. and Shimotsu, K. (2009). Nonparametric identification of finite mixture models of dy-namic discrete choices. Econometrica, 77(1):135–175.Keane, M. P. and Wolpin, K. I. (1994). The solution and estimation of discrete choice dynamic pro-gramming models by simulation and interpolation: Monte Carlo evidence. The Review of Eco-nomics and Statistics, pages 648–672.Keane, M. P. and Wolpin, K. I. (1997). The career decisions of young men. Journal of PoliticalEconomy, 105(3):473–522.Kessler, D. P. and McClellan, M. B. (2000). Is hospital competition socially wasteful? The QuarterlyJournal of Economics, 115(2):577–615.Lange, F. (2007). The speed of employer learning. Journal of Labor Economics, 25(1):1–35.Lapkin, S., Hart, D., and Turnbull, M. (2003). Grade 6 French immersion students’ performance onlarge-scale reading, writing, and mathematics tests: Building explanations. Alberta Journal ofEducational Research, 49(1).137Lavy, V., Paserman, M. D., and Schlosser, A. (2012). Inside the black box of ability peer effects: Evi-dence from variation in the proportion of low achievers in the classroom. The Economic Journal,122(559):208–237.MacCoubrey, S., Wade-Woolley, L., Klinger, D., and Kirby, J. (2004). Early identification of at-risk l2readers. Canadian Modern Language Review, 61(1):11–29.Manski, C. F. (1993). Identification of endogenous social effects: The reflection problem. The Reviewof Economic Studies, 60(3):531–542.McFadden, D. (1973). Conditional logit analysis of qualitative choice behavior. Frontiers in Econo-metrics, pages 105–142.Nagypál, É. (2007). Learning by doing vs. learning about match quality: Can we tell them apart? TheReview of Economic Studies, 74(2):537–566.National Reading Panel (US) and National Institute of Child Health and Human Development (US)(2000). Teaching children to read: An evidence-based assessment of the scientific research litera-ture on reading and its implications for reading instruction. National Institute of Child Health andHuman Development, National Institutes of Health.Nechyba, T. J. (2000). Mobility, targeting, and private-school vouchers. American Economic Review,90(1):130–146.North Vancouver School District (2016). Late French Immersion (LFI) information meeting2016. [Powerpoint slides]. Retrieved from <http://www.sd44.ca/ProgramsServices/FrenchImmersion/LateImmersion/Documents/LFIPresentation.pdf>.Obadia, A. and Theriault, C. (1997). Attrition in French immersion programs: Possible solutions.Canadian Modern Language Review, 53(3):506–529.Panel, N. M. A. (2008). Foundations for success: The final report of the National Mathematics Advi-sory Panel. US Department of Education.Paradis, J., Genesee, F., and Crago, M. B. (2011). Dual language development and disorders: Ahandbook on bilingualism and second language learning. ERIC.Rivkin, S. G., Hanushek, E. A., and Kain, J. F. (2005). 
Teachers, schools, and academic achievement.Econometrica, 73(2):417–458.Robelan, E. (2011). Study of Mandarin Chinese by US students booming. Re-trieved from http://blogs.edweek.org/edweek/curriculum/2011/03/study_of_foreign_language_cree.html. Accessed: 2016-04-13.Sacerdote, B. (2014). Experimental and quasi-experimental analysis of peer effects: two steps forward?Annu. Rev. Econ., 6(1):253–272.138Slavin, R. E. and Cheung, A. (2005). A synthesis of research on language of reading instruction forEnglish language learners. Review of Educational Research, 75(2):247–284.Stange, K. M. (2012). An empirical investigation of the option value of college enrollment. AmericanEconomic Journal: Applied Economics, 4(1):49–84.Steele, J. L., Slater, R. O., Zamarro, G., Miller, T., Li, J., Burkhauser, S., and Bacon, M. (2015).Effects of dual language immersion on students academic performance. EDRE Working PaperNo. 2015-09.Stinebrickner, R. and Stinebrickner, T. (2014a). Academic performance and college dropout: Us-ing longitudinal expectations data to estimate a learning model. Journal of Labor Economics,32(3):601–644.Stinebrickner, R. and Stinebrickner, T. R. (2014b). A major in science? initial beliefs and final out-comes for college major and dropout. The Review of Economic Studies, 81(1):426–472.Stinebrickner, T. and Stinebrickner, R. (2012). Learning about academic ability and the college dropoutdecision. Journal of Labor Economics, 30(4):707–748.Swain, M. and Lapkin, S. (1982). Evaluating Bilingual Education: A Canadian Case Study. Multilin-gual Matters 2.Synder, T. and Dillow, S. (2012). Digest of education statistics 2011 (NCES 2012-001). NationalCenter for Education Statistics, Institute of Education Statistics, US Department of Education:Washington, DC.Turnbull, M., Lapkin, S., and Hart, D. (2001). Grade 3 immersion students’ performance in literacyand mathematics: Province-wide results from Ontario (1998-99). Canadian Modern LanguageReview, 58(1):9–26.U.S. Department of Education (2010). Education and the language gap: Secretary Arne Duncan’sremarks at the foreign language summit. Retrieved from <http://www.ed.gov/news/speeches/>. Accessed: 2014-08-10.Vytlacil, E. (2002). Independence, monotonicity, and latent index models: An equivalence result.Econometrica, 70(1):331–341.Walters, C. R. (2014). The demand for effective charter schools. NBER working paper No. 20640.Weisbrod, B. A. et al. (1962). Education and investment in human capital. Journal of Political Econ-omy, 70.Weise, E. (2007). As China booms, so does Mandarin in US schools. USA Today.What Works Clearinghouse (2013). Assessing attrition bias.139What Works Clearinghouse (2014a). Assessing attrition bias - addendum.What Works Clearinghouse (2014b). What works clearinghouse procedures and standards handbookv. 3.0.Whitehurst, G. and Klein, E. (2015). Is it groundhog day for school choice? Working paper, BrookingsInstitute Brown Center of Education.Wiley, T., Moore, S., and Fee, M. (2012). A “languages for jobs” initiative: Policy innovationmemorandum No. 24. Retrieved from Council on Foreign Relations Press website: http://www.cfr.org/united-states/languages-jobs-initiative/p28396. Ac-cessed: 2016-04-13.Wise, N. (2014). Phonological Awareness Training for Struggling Readers in Grade 1 French Immer-sion. PhD thesis, University of Toronto.Wise, N. and Chen, X. (2010). At-risk readers in French immersion: Early identification and early in-tervention. 
Canadian Journal of Applied Linguistics/Revue Canadienne de linguistique appliquée, 13(2):128–149.

Wooldridge, J. M. (2010). Econometric analysis of cross section and panel data. MIT Press.

Wooldridge, J. M. (2015). Control function methods in applied econometrics. Journal of Human Resources, 50(2):420–445.

Worswick, C. (2003). School program choice and streaming: Evidence from French immersion programs. Paper presented to the Canadian Employment Research Forum Conference, Ottawa, June. Carleton University, Canada.

Appendix A: Data Appendix

Data From British Columbia

The data for this paper come from the British Columbia Ministry of Education (MED) and were obtained through Edudata, a research unit at the Faculty of Education at the University of British Columbia. The MED defines data at the cohort level. For example, a cohort could be all students who entered the Vancouver School District in 2009. Once a cohort is defined, all students in that cohort are subsequently followed over time so long as they remain in a BC school. The main sample for this paper consists of all entering Kindergarten or grade 1 cohorts between 1998 and 2009 in 11 school districts: Vancouver, Burnaby, Surrey, Richmond, Abbotsford, Coquitlam, North Vancouver, West Vancouver, Langley, Delta and Greater Victoria. With the exception of the last district, all of these districts are located in "Metro Vancouver", the area consisting of the city of Vancouver and its surrounding municipalities.

The original dataset contains 319,558 unique students with a total of 2.3 million student-year observations for grades K–8 inclusive. For a small fraction of observations (15,100, or 0.5%), the grade variable is missing. Of these, 3,190 observations come from children being homeschooled and 8,063 are what the ministry labels "Elementary ungraded", meaning that the Ministry of Education is unable to declare a specific grade for the child. This can occur if, for example, the child is in a special education program for children of multiple ages; indeed, many of these children are indicated as having some sort of special-education designation. For children with a missing grade variable, I impute what the grade would have been and also generate an imputation flag in case we want to later reverse these changes. The imputations assume that children progress from one grade to the next. Even after the imputations, there are still 1,339 observations with no grade assigned. In 1,072 of these cases, the previous grade was grade 10 or above, suggesting that the student was enrolled in some special high school course or program for students who had already graduated. Since this paper's focus is on schooling decisions up to a maximum of grade 8, these children are not a large concern. The remaining cases all end up being dropped because they take place after a censored year (see below).
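A minimal sketch of this forward-progression imputation is given below. The data layout and column names (student_id, year, grade) are hypothetical, and the sketch only illustrates the stated assumption that children advance one grade per year, together with a flag that allows the imputation to be reversed; the actual cleaning code may differ in detail.

```python
import numpy as np
import pandas as pd

def impute_missing_grades(df):
    """Fill missing grades by assuming one-grade-per-year progression.

    Hypothetical schema: one row per student-year with columns
    'student_id', 'year' and 'grade' (NaN where the ministry reports
    no grade, e.g. homeschooled or 'Elementary ungraded' records).
    """
    df = df.sort_values(["student_id", "year"]).copy()
    df["grade_imputed_flag"] = df["grade"].isna()

    def fill(group):
        grades = group["grade"].to_numpy(dtype=float)
        years = group["year"].to_numpy()
        for i in range(1, len(grades)):
            if np.isnan(grades[i]) and not np.isnan(grades[i - 1]):
                # assume the child progressed one grade per calendar year
                grades[i] = grades[i - 1] + (years[i] - years[i - 1])
        group["grade"] = grades
        return group

    return df.groupby("student_id", group_keys=False).apply(fill)
```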
In addition to the grade variable, there were also cases where the French Immersion indicator required adjustment: for example, if all children in an FI-only school were labelled as out of FI, or if a child was enrolled in FI in grades K and 1, not enrolled in grade 2, and then enrolled again from grade 3 onwards.[145] In total, less than 1% of all children are affected by these changes.[146]

[145] In later grades, there are cases where this latter example would be perfectly acceptable; for example, if a child leaves FI after grade 4 and re-enters the late French Immersion program in grade 6.

[146] I consider these changes to be conservative in the sense that they reduce the amount of FI attrition observed, which is a key source of identifying variation in the paper.

All students who ever attended a Francophone school are dropped from the sample. (Francophone schools are designed for children of French-speaking parents, as opposed to FI, which is designed for children of non-French-speaking households.) This causes 973 children to be dropped. I drop all students who were ever enrolled in a private school for disabled children (1,944 students, or 0.67%) and all special-education students ever labelled as being "Physically Dependant", "Moderate to High Intellectually Disabled", "Physical Disability/Chronic Condition", "Autistic" or having a "Mild Intellectual Disability".[147] This causes an additional 8,918 children to be dropped (or 2.9% of all children). For chapters 2 and 3, all students who ever repeated a grade or skipped a grade are dropped from the sample in order to simplify the estimation of the discrete choice model; this drops 8,602 students, or 2.5% of the sample. Another 1,654 children are dropped for miscellaneous reasons such as improperly entered data (for example, if all children in a given school-year were recorded as out of FI even though enrolment reports list that school as offering both the FI and the traditional program).

[147] A label was chosen to be excluded if at least 60% of children with said label failed to write the standardized tests.

In chapters 2 and 3, although I have data on all entering cohorts for both grades K and 1, I limit the sample to the entering grade K cohorts only. This was done for consistency and because the structural model in chapter 2 focuses on parental choices from grade K onwards. This drops a further 39,768 children, or about 13.3% of the total number of children.[148] Another 7,077 children are dropped because they are only observed in grade K. Finally, another 2,105 children (0.8%) are dropped because they are missing either the census variables or the driving distance measures (see below).[149] In chapter 2, the final starting sample is 248,517 children (or 1,853,548 student-year observations between grades K and 8), 26,063 of whom enrol in the early FI program. For chapters 3 and 4, the sample is smaller because both require at least one observed test score. Furthermore, in chapter 4, because of the timing of the standardized tests and the structure of the LFI program, I focus on children enrolled in the main school districts starting in grade 4 rather than grade K. The individual chapters detail how their respective samples are further reduced.

[148] Approximately two-thirds of these children come from the initial 1998 cohort, where I have children in grade 1 but do not observe them in grade K for the year 1997. Note that this change did not have any material effect on the descriptive results shown in section 3, suggesting that the exclusion of these children did not adversely affect the main results.

[149] Note that some of these children end up back in the sample because, for the structural model, I impute some missing census variables.

Censored Observations

Approximately 3% of children are censored, in that they either leave the sample permanently (prior to 2012, the final year for which I have data) or leave the sample and return a few years later.[150] In these circumstances, I drop all observations after the missing year, from which point the child is considered to be a censored observation.
This causes 35,591 student-year observations to be dropped, or 1.5% of the sample.

[150] Most children are censored during their high school years; however, the model of interest in this paper only goes up to grade 8, so for the purposes of this paper a child not censored until high school is considered to be uncensored.

Test Score Data — Foundation Skills Assessment Exams

The Foundation Skills Assessment tests are a series of tests in numeracy (hereafter referred to as the "math" test), reading comprehension (hereafter referred to as the "reading" test) and writing, administered to all BC students in grades four and seven. The math test is a combination of multiple-choice and problem-solving questions. The reading test is a combination of multiple-choice and short-answer (approximately one-paragraph) questions. The writing test consists of longer, essay-type questions, which are typically descriptive in nature.

From the MED, I obtained a file of test score results for all children in the main dataset. In the data, I observe which students wrote the tests, their "raw score" corresponding to the number of correct responses, and a "scaled score" which adjusts the student's raw score for the degree of difficulty of the various questions.[151] While a "year" variable is not included in the data, I do have information on the student's date of birth and the child's age in days at the time of the test. This allows me to back out the year variable (and provides a useful check on the data). The data also contain the language the test was written in, allowing me to confirm that FI students write the tests in English.

[151] There is no scaled score for the writing component, which is an integer score on a scale from 1 to 12. All transformations for the writing test were done using the raw score.

In the main dataset, there are 373,593 instances of a child observed in either grade four or grade seven. Over 99.8% of the test scores were successfully merged onto the student data file. The merge was done at the student-year-grade level. (Note: even students who did not write the tests should still be in the test score file, with a missing value for the test score.) In the data, students who did not write a test are given a raw score of 0 but a missing value for the scaled score. In total, approximately 9% of students do not have a valid test score in any of the three components.[152] For all tests written prior to the 2007/2008 school year, I also have a variable indicating whether a child was excused from the test. A child could be excused from the test if they have a disability or were ill on the day of the test; being excused from the standardized test is the prerogative of the school principal. Approximately 50% of the children with no test score in the years prior to the 2007/2008 school year are indicated as being excused from the test.

[152] This percentage is lower than the true total because I have already dropped children with severe disabilities and students who ever repeated or skipped a grade. Both of these groups have a disproportionate number of missing test scores. In the raw test score data, approximately 12% of all students do not have a valid test score.
For a small percentage of students, the raw score is positive but the scaled score is still missing. These are students who did not answer enough questions to qualify for a scaled score; they would have received the designation "student did not respond meaningfully". There are only 2,067 such cases, or 0.6% of the total number of test scores. In these cases, I assigned these children the lowest possible scaled score and flagged them to make sure their inclusion is not biasing any results; this does not appear to be the case. This leaves a missing test score rate of 8.2%.

For every year-grade-subject combination, I standardized the scaled score to be mean 0 and variance 1. This is a standard technique in the education literature to make test scores more comparable over time, by comparing a child's relative position over time instead of their actual scores.
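As an illustration, here is a minimal sketch of this within-cell standardization. The column names (year, grade, subject, scaled_score) are hypothetical and stand in for whatever the cleaned test-score file actually uses.

```python
import pandas as pd

def standardize_scores(df):
    """Z-score scaled test scores within each year-grade-subject cell.

    Hypothetical schema: columns 'year', 'grade', 'subject' and
    'scaled_score' (NaN for students without a valid score).
    After this step, each cell has mean 0 and standard deviation 1.
    """
    grouped = df.groupby(["year", "grade", "subject"])["scaled_score"]
    df["std_score"] = (df["scaled_score"] - grouped.transform("mean")) / grouped.transform("std")
    return df
```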
Test Score Data — Provincial Exams

Data on student achievement at the secondary school level come in the form of provincial exams administered to all children taking specific courses in grade 10. Based on the initial year of entering cohorts, I limit the sample to children who entered kindergarten up to and including the 2002/2003 school year; these are the children who would have entered grade 10 by the final school year of the data, 2012/2013.[153] The courses I have chosen to include in my analysis are Grade 10 English, Grade 10 Science, and two grade 10 math courses: Principles of Mathematics and Foundations of Mathematics and Pre-Calculus. In BC, students are required in grade 10 to take one math course, one science course and one English course, and the vast majority of students elect to take the courses listed above.

[153] Fewer than 1% of exams were written by students who entered kindergarten after 2002, and a disproportionate share of these students did not write the English or science exam.

The reason there are two math courses is that, starting in the 2010/2011 school year, the BC government significantly changed its mathematics curriculum at the secondary school level. Principles of Mathematics was the primary math course taken by students enrolled in grade 10 prior to the 2010/2011 school year; it was effectively replaced by Foundations of Mathematics and Pre-Calculus starting in the 2010/2011 school year. While the courses are not identical, they have significant overlap, including (but not limited to) trigonometry, geometry, irrational numbers, factoring and simplifying expressions, patterns and relations, and graphing concepts. For this reason, I have chosen to combine these two courses into one. In addition, another difference between the pre- and post-2010/2011 regimes was the set of alternative options available to students. Prior to the 2010/2011 school year, students could also elect to take Essentials of Mathematics or Applications of Mathematics. The former was designed to "provide students with the necessary numeracy skills and concepts to be successful in their daily lives, business, industry, and government" ("Essentials of Mathematics 10 to 12", 2006, page 4), while the latter prepares "students for non-calculus based post-secondary programs of study such as certificate programs, diploma programs, continuing education programs, trades programs, technical programs, and some university programs" (Introduction to Applications of Mathematics 10 to 12). Starting in the 2010/2011 school year, students had the option of taking Apprenticeship and Workplace Mathematics, a course whose goals are similar to the two described above; that is, to provide students with the basic numerical skills needed for today's job market. Finally, note that if a student decides to re-take any of these courses, they have the option of whether or not to re-take the provincial exam. If they elect not to, the exam mark from the previous year is used instead; if they re-take the exam, the higher of the two exam marks is used in the calculation of the student's final grade.

Census Data

The administrative data do not contain very much demographic information beyond gender and home language. In order to account for additional socioeconomic characteristics, I link the individual student records to the 1996, 2001 and 2006 Canadian Censuses and the 2011 National Household Survey. This linkage is done by first mapping each student's postal code to their corresponding dissemination area (DA)[154] and then merging on DA characteristics obtained from the Statistics Canada Cumulative Area Profile datasets. A postal code is a six-digit alphanumeric string and is analogous (though much smaller) to a zip code in the US. A dissemination area is a geographic area that contains on average a population of 400–700 persons and is the smallest geographic area for which census data are aggregated and made available.[155]

[154] This linkage was done using the Statistics Canada Postal Code Conversion Files (PCCF). In some cases the PCCF listed more than one DA for a postal code, an issue that arises more in rural settings, where postal codes generally encompass a much larger area than postal codes in urban areas. In the event that more than one DA was given for a postal code, I chose the one centred on the largest number of dwellings as indicated in the PCCF.

[155] Prior to 2001, dissemination areas did not exist; instead, Statistics Canada used a similarly defined "Enumeration Area". For the purposes of this paper, I use the term dissemination area to refer to enumeration areas as well.

In total, I was able to successfully merge on census-level variables for 99.5% of observations. Of the approximately 12,000 instances where a merge failed to occur, 1,800 occurred because no postal code was given (in that specific year). In 2,000 of the remaining cases, the given postal code was valid in earlier years but not the current year. This could occur if Statistics Canada altered or retired certain postal codes but parents were not yet aware of this fact. Missing census characteristics were imputed using the following methodology. If a postal code was invalid in a given year but valid in an adjacent year, I used the adjacent year's values; this changed approximately 2,913 observations, or 0.2% of all observations. Next, I imputed missing census characteristics by calculating a linear trend within each forward sortation area (FSA) and then interpolating for the missing years;[156] this led to a further 1,674 observations changed, or 0.1% of all observations. Finally, if the variable still had a missing value (for example, if the postal code was missing entirely for that year), I used the child's average value for the remaining years, which led to 3,568 changes, or 0.3% of all observations. Even after these imputation procedures, 2,105 children still needed to be dropped from the sample because they were missing either the census characteristics or the distance measures (see next section).

[156] Formally, for a given census characteristic $X$, I ran the regression
\[
X_{it} = \alpha + \beta \cdot \text{year} + FSA_i + u_{it}
\]
and calculated the predicted values for the missing years.
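A minimal sketch of the FSA-level trend interpolation described in footnote [156] is shown below. The data layout and the column names (fsa, year, and the characteristic being imputed) are hypothetical, and the sketch fits the trend FSA-by-FSA rather than as one pooled regression with FSA fixed effects; the two are equivalent for generating within-FSA predictions.

```python
import numpy as np
import pandas as pd

def impute_census_trend(df, var):
    """Impute a missing census characteristic from an FSA-level linear time trend.

    Hypothetical schema: columns 'fsa', 'year' and `var` (NaN where missing).
    Within each forward sortation area, fit var ~ a + b*year on the observed
    years and predict the missing ones, mirroring the regression in footnote 156.
    """
    def fill(group):
        observed = group[var].notna()
        if observed.sum() >= 2:  # need at least two observed years to fit a trend
            slope, intercept = np.polyfit(group.loc[observed, "year"],
                                          group.loc[observed, var], deg=1)
            predicted = intercept + slope * group["year"]
            group[var] = group[var].fillna(predicted)
        return group

    return df.groupby("fsa", group_keys=False).apply(fill)
```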
Distance to Schools

The distance-to-schools variables are created using the address of the school and the postal code of the child's listed home address. On the DataBC website, there is a file available for download that provides the exact address and the longitude and latitude coordinates of every BC school going back to 1995. Each child's postal code was mapped to a corresponding latitude and longitude coordinate using the Statistics Canada Postal Code Conversion Files (PCCF).[157] These coordinates generally refer to the geographic "centre" of the area contained within a given postal code.

[157] As in a footnote above, in some cases the PCCF listed more than one set of coordinates for a postal code. Once again, I chose the set of coordinates centred on the most populated area as indicated in the PCCF.

Using these two sets of coordinates (one for the child and one for the school), I calculated the distance variable using two separate methods: "as the crow flies" and "travel time". The former method uses the spherical law of cosines to calculate the shortest distance between any two points on the planet. The latter was calculated using the traveltime3 command in Stata, which uses Google Maps to calculate the distance and the time it takes to travel by car between the two points. Note that, because of data confidentiality, I could not run this command on the data directly. Instead, I estimated the closest FI and non-FI school for the entire universe of postal codes in the main school districts and then merged these onto the main dataset along with the other census characteristics. All results presented in chapters 2 and 3 use the "travel time" definition, which deals with obstacles such as rivers or inlets that can lead to large differences between the two methods. In chapter 4, I use the "as the crow flies" measure in order to make the results consistent with the methodology used for the Ontario data.

For approximately 1.5% of children, I was not able to successfully calculate a driving-time measure. Possible reasons include an invalid postal code (which, as noted above, happened in about 0.5% of cases) or cases where the traveltime3 command (or Google Maps) had trouble converting the given longitude and latitude coordinates into driving distances. All children with a missing driving distance measure in grade K are dropped from the sample.
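The "as the crow flies" calculation can be sketched as follows using the spherical law of cosines. The coordinates in the example are hypothetical, and the travel-time measure (traveltime3/Google Maps) is not reproduced here.

```python
import math

EARTH_RADIUS_KM = 6371.0

def crow_flies_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in
    decimal degrees, via the spherical law of cosines."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlambda = math.radians(lon2 - lon1)
    cos_angle = (math.sin(phi1) * math.sin(phi2)
                 + math.cos(phi1) * math.cos(phi2) * math.cos(dlambda))
    # clamp for floating-point safety before taking the arc cosine
    central_angle = math.acos(min(1.0, max(-1.0, cos_angle)))
    return EARTH_RADIUS_KM * central_angle

# Example with two hypothetical Vancouver-area coordinates
# (a postal-code centroid and a school location).
print(round(crow_flies_km(49.2827, -123.1207, 49.3200, -123.0724), 2))
```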
Data From Ontario — Chapter 4

Confidential administrative data on Ontario students were obtained from the Education Quality and Accountability Office (EQAO) of Ontario and accessed through the PEDAL data laboratories. The data consist of all standardized-test takers in grades 3 and 6 for the entire province of Ontario between 2004 and 2012 inclusive, and begin with approximately 873,000 unique students. All students observed twice in the same grade, all students ever enrolled in a Francophone school, and all children enrolled in a French Immersion program in grade 3 are dropped from the sample. In total, these three steps cause approximately 100,000 students to be dropped, with the majority coming from children in immersion and Francophone schools. Next, since my MFI designation requires observing a child in both grades 3 and 6, I drop all children who are not observed in both grades for any reason, as well as children without valid immersion designations in both of these grades; this leads to another 100,000 children being dropped. Furthermore, since this paper's primary focus is on students leaving to enrol in the MFI program, I drop all school boards in which the program is unavailable. Formally, I drop all school-board-years in which fewer than 5 students observed in grade 3 subsequently enrol in the MFI program. This step causes approximately 380,000 children to be dropped, most of whom are in school-board-years without any MFI students. This is not to say that no French programs were offered to children in later grades in these school boards: alternative programs such as Extended French, which is more popular than MFI, are not observed in the data. Next, as described in the data section, I only label a school an MFI school if it is observed with at least 5 MFI students in a given year; if not, I flag these children and remove their MFI designation for the purposes of the remainder of the statistical analysis. As a final step, I limit the sample to children without missing test scores and baseline characteristics, and drop all school-year combinations in which more than 75% of children left to enter the MFI program. The final sample contains 243,652 children, 13,609 of whom enrol in the MFI program.

Appendix B: Appendix to Chapter 2

B.1 Additional Model Details

Value Functions

This section provides additional details on solving the parents' problem in the model and on the estimation methods. From equation (2.13), the value functions for grades $2 \le t \le T-1$ are given by
\[
V_t(C_{t-1}, S_t) =
\begin{cases}
\max\bigl\{\, U_t(1,S_t) + \beta\, E_{S_{t+1}}\!\bigl[V_{t+1}(1, S_{t+1} \mid S_t)\bigr] + \xi_{i,t,1},\;\; -SC_{10} + \beta\, V_{t+1}(0) + \xi_{i,t,0} \,\bigr\} & \text{if } C_{t-1} = 1 \\[4pt]
\beta\, V_{t+1}(0) + \xi_{i,t,0} & \text{if } C_{t-1} \neq 1
\end{cases}
\tag{B.1}
\]
where $U_t$ is as defined in equation (2.10). Recall that $V_{t+1}(C_t, S_{t+1} \mid S_t) = E_{\xi_{i,t+1,C}}\bigl[V_{t+1}(C_t, S_{t+1})\bigr]$. This term has a well-known closed form (McFadden, 1973). In particular, suppose that $z_1 = U_1 + \xi_1$ and $z_2 = U_2 + \xi_2$, with $\xi_1, \xi_2$ having type 1 extreme value distributions; then $E[\max(z_1, z_2)] = \mathrm{euler} + \log(e^{U_1} + e^{U_2})$, where $\mathrm{euler}$ denotes the Euler–Mascheroni constant. This implies that
\[
V_{t+1}(1, S_{t+1} \mid S_t) = \mathrm{euler} + \ln\!\Bigl( e^{\,U_{t+1}(1,S_{t+1}) + \beta E_{S_{t+2}}[V_{t+2}(1, S_{t+2} \mid S_{t+1})]} + e^{\,-SC_{10} + \beta V_{t+2}(0)} \Bigr)
\tag{B.2}
\]
\[
V_{t+1}(0, S_{t+1} \mid S_t) = \mathrm{euler} + \beta\, V_{t+2}(0)
\tag{B.3}
\]
for all grades $2 \le t \le T-1$. It is clear from the above equations that parents do not just have to take expectations over next period's shocks, but also over next period's signal $S_{t+1}$. This requires an updated distribution for $S_{t+1}$. At grade $t$, suppose that parents' updated belief about the distribution of $\eta$ is given by $\eta \sim N(\mu_t, \sigma_t^2)$. It then follows that, from the parents' point of view,
\[
S_t = \eta + \varepsilon_{it}, \qquad S_t \sim N\bigl(\mu_t,\; \sigma_t^2 + \sigma_\varepsilon^2\bigr)
\tag{B.4}
\]
Using equation (B.4), we can rewrite the expected value function as
\[
E_{S_{t+1}}\!\bigl[V_{t+1}(1, S_{t+1} \mid S_t)\bigr]
= \int V_{t+1}(1, S_{t+1} \mid S_t)\, \frac{1}{\sqrt{\sigma_t^2+\sigma_\varepsilon^2}}\, \phi\!\left(\frac{S_{t+1}-\mu_t}{\sqrt{\sigma_t^2+\sigma_\varepsilon^2}}\right) dS_{t+1}
\tag{B.5}
\]
Since it is the expected mean that directly enters the utility function, in practice I calculate the expectations over the distribution of $\mu_{t+1}$ directly.
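As a quick sanity check on the McFadden (1973) closed form used in equations (B.2) and (B.3), the sketch below verifies by simulation that the expected maximum of two utilities with additive type 1 extreme value shocks equals the Euler–Mascheroni constant plus the log-sum of exponentiated utilities. The utility values and the simulation size are arbitrary illustration choices, not taken from the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)
EULER_MASCHERONI = 0.5772156649015329

U1, U0 = 0.8, -0.3   # arbitrary choice-specific utilities
n = 2_000_000

# Standard Gumbel (type 1 extreme value) taste shocks for each alternative.
xi1 = rng.gumbel(size=n)
xi0 = rng.gumbel(size=n)

simulated = np.maximum(U1 + xi1, U0 + xi0).mean()
closed_form = EULER_MASCHERONI + np.log(np.exp(U1) + np.exp(U0))

print(f"simulated Emax  : {simulated:.4f}")
print(f"closed-form Emax: {closed_form:.4f}")  # the two agree to roughly 3 decimals
```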
Recall from equation (2.9) that $\mu_{t+1} = (1-W_{t+1})\mu_t + W_{t+1}S_{t+1}$; thus
\[
\mu_{t+1} \sim N\bigl((1-W_{t+1})\mu_t + W_{t+1}\mu_t,\; W_{t+1}^2(\sigma_t^2+\sigma_\varepsilon^2)\bigr) = N\bigl(\mu_t,\; W_{t+1}^2(\sigma_t^2+\sigma_\varepsilon^2)\bigr)
\tag{B.6}
\]
Equations (B.1)–(B.5) apply to the value functions from grade 2 onwards. For grade $t = 1$, we now need to take into account that all children are eligible for FI enrolment:
\[
V_1(C_0, S_1) =
\begin{cases}
\max\bigl\{\, U_{1,1}(1,S_1) + \beta E_{S_2}[V_2(1,S_2 \mid S_1)] + \xi_{i,1,1},\;\; -SC_{10} + \beta V_2(0) + \xi_{i,1,0} \,\bigr\} & \text{if } C_0 = 1 \\[4pt]
\max\bigl\{\, U_{1,0}(1,S_1) + \beta E_{S_2}[V_2(1,S_2 \mid S_1)] + \xi_{i,1,1} - SC_{01},\;\; \beta V_2(0) + \xi_{i,1,0} \,\bigr\} & \text{if } C_0 \neq 1
\end{cases}
\tag{B.7}
\]
where $V_2(1, S_2 \mid S_1)$ and $V_2(0)$ are calculated as in equations (B.2) and (B.3). In the initial period ($t = 0$), the value functions are more complicated because of the capacity constraints on enrolment in grade one. The value functions at $t = 0$ are given by
\[
V(C_0) = \max\bigl\{\, U_0(1,S_1) + \beta E_{S_1}[V_1(1,S_1)] + \xi_{i,0,1},\;\; \beta V_1(0) + \xi_{i,0,0} \,\bigr\}
\]
where $V_1(1,S_1)$ is defined as above using equations (B.2) and (B.7). In order to solve for the formula for $V_1(0)$, we need to define $q_{dyt}$ as the probability of entry into the FI program for children in district $d$ in year $y$ in grade $t$. The expected value of not being in FI in grade 0 must take into account the probability that, in the following period, parents might prefer FI but be unable to enrol. Define $U_1$ and $U_0$ as the utilities parents receive in grade one from choosing FI and non-FI, respectively, and let $p_1$ be the probability that FI yields the higher utility, $p_1 = \Pr(U_1 > U_0)$. We then have
\begin{align}
V_1(0) &= E[U_1 \mid U_1 > U_0]\, p_1 q_{dy1} + E[U_0 \mid U_1 > U_0]\, p_1 (1-q_{dy1}) + E[U_0 \mid U_0 > U_1](1-p_1) \tag{B.8}\\
&= q_{dy1}\bigl\{ E[U_1 \mid U_1 > U_0]\, p_1 + E[U_0 \mid U_0 > U_1](1-p_1) \bigr\} + (1-q_{dy1})\bigl\{ E[U_0 \mid U_1 > U_0]\, p_1 + E[U_0 \mid U_0 > U_1](1-p_1) \bigr\} \notag\\
&= q_{dy1}\, E[\max(U_0, U_1)] + (1-q_{dy1})\, E[U_0] \tag{B.9}\\
&= q_{dy1}\, V_1(1) + (1-q_{dy1})\, E[V_1(0)] \notag
\end{align}
The first line, equation (B.8), says that the expected value function equals the expected utility from choosing FI (provided that the choice of FI yields the highest utility and parents are able to get in) plus the expected utility from not going to FI. The latter can result either because parents chose that option or because they were unable to secure a spot in FI despite it providing higher utility. From equation (B.9), we can interpret the expected value function in terms of two cases: one in which parents have a choice ($V_1(1)$) and one in which they do not ($V_1(0)$).

Proof that $V_{t+1}(1, S_{t+1} \mid S_t)$ is convex in $S_{t+1}$

The convexity of the expected value function follows from the closed-form solution implied by the logit errors. From equation (B.2), we have
\[
y \equiv E[\max(z_1, z_0)] = \mathrm{euler} + \log(e^{U_1} + e^{U_0}), \qquad z_1 = U_1 + \xi_1, \qquad z_0 = U_0 + \xi_0,
\]
where $\xi_1, \xi_0$ are type-I extreme value distributed shocks and $U_i$ is the direct utility from choosing $FI = i \in \{0,1\}$. Taking the first and second derivatives with respect to $U_1$,
\[
\frac{\partial y}{\partial U_1} = \frac{e^{U_1}}{e^{U_1}+e^{U_0}} > 0, \qquad
\frac{\partial^2 y}{\partial U_1^2} = \frac{e^{U_1+U_0}}{(e^{U_1}+e^{U_0})^2} > 0,
\]
and thus $y$ is convex with respect to $U_1$. The last part of the proof is to show that parents' utility from choosing $FI = 1$, $U_1$, is increasing in $S_{t+1}$. This holds because a higher value of $S_{t+1}$ implies that the child is a better match with the FI program, which raises parents' utility from choosing FI.

Proof that $E_{S_{t+1}}\!\bigl[V_{t+1}(1, S_{t+1} \mid S_t)\bigr]$ is increasing in $\mathrm{Var}(S_{t+1})$

Suppose that (from the parents' point of view) $S_{t+1} \sim N(\mu_t, \sigma_t^2)$.
Define $y$ to be the expected value function tomorrow:
\[
y \equiv E_{S_{t+1}}\!\bigl[V_{t+1}(C_t, S_{t+1} \mid S_t)\bigr]
= \int_{-\infty}^{\infty} V_{t+1}\bigl(C_t, \{S_{t+1}, S_t\} \mid S_t\bigr)\, \phi\!\left(\frac{S_{t+1}-\mu_t}{\sigma_t}\right) \frac{1}{\sigma_t}\, dS_{t+1}
\tag{B.10}
\]
To simplify notation, let $F(S_{t+1}) \equiv V_{t+1}(C_t, \{S_{t+1}, S_t\} \mid S_t)$ and substitute $m_{t+1} \equiv \frac{S_{t+1}-\mu_t}{\sigma_t}$ into equation (B.10):
\[
y = \int_{-\infty}^{\infty} F(m_{t+1}\sigma_t + \mu_t)\, \phi(m_{t+1})\, dm_{t+1}
\tag{B.11}
\]
Differentiating equation (B.11) with respect to $\sigma_t$:
\begin{align}
\frac{\partial y}{\partial \sigma_t}
&= \int_{-\infty}^{\infty} F'(m_{t+1}\sigma_t + \mu_t)\, m_{t+1}\, \phi(m_{t+1})\, dm_{t+1} \notag\\
&= \int_{-\infty}^{0} F'(m_{t+1}\sigma_t + \mu_t)\, m_{t+1}\, \phi(m_{t+1})\, dm_{t+1} + \int_{0}^{\infty} F'(m_{t+1}\sigma_t + \mu_t)\, m_{t+1}\, \phi(m_{t+1})\, dm_{t+1} \notag\\
&= -\int_{0}^{\infty} F'(-m_{t+1}\sigma_t + \mu_t)\, m_{t+1}\, \phi(-m_{t+1})\, dm_{t+1} + \int_{0}^{\infty} F'(m_{t+1}\sigma_t + \mu_t)\, m_{t+1}\, \phi(m_{t+1})\, dm_{t+1} \notag\\
&= \int_{0}^{\infty} \bigl[ F'(m_{t+1}\sigma_t + \mu_t) - F'(-m_{t+1}\sigma_t + \mu_t) \bigr]\, m_{t+1}\, \phi(m_{t+1})\, dm_{t+1} \tag{B.12}\\
&> 0 \notag
\end{align}
where equation (B.12) uses the fact that $\phi(x) = \phi(-x)$, and the final inequality uses the convexity of $F(\cdot)$ with respect to $S_{t+1}$ (see the proof above) together with the fact that both $m_{t+1}$ and $\phi(m_{t+1})$ are strictly positive over the region of integration.

Likelihood

With the value functions solved, it is straightforward to calculate the likelihood of observing a given outcome. In the final period, the probability of FI enrolment is given by
\[
P(FI_T = 1) =
\begin{cases}
0 & \text{if } FI_{T-1} = 0 \\[4pt]
\dfrac{e^{\,U_T(1,S_T) + \beta V_{T+1}(FI)}}{1 + e^{\,U_T(1,S_T) + \beta V_{T+1}(FI)}} & \text{if } FI_{T-1} = 1
\end{cases}
\]
For $2 \le t \le T-1$, using equation (B.1), we have
\[
P(FI_t = 1) =
\begin{cases}
0 & \text{if } FI_{t-1} = 0 \\[4pt]
\dfrac{e^{\,U_t(1,S_t) - U_t(0,S_t) + \beta E_{S_{t+1}}[V_{t+1}(1,S_{t+1}\mid S_t)] - \beta V_{t+1}(0)}}{1 + e^{\,U_t(1,S_t) - U_t(0,S_t) + \beta E_{S_{t+1}}[V_{t+1}(1,S_{t+1}\mid S_t)] - \beta V_{t+1}(0)}} & \text{if } FI_{t-1} = 1
\end{cases}
\]
For $t = 1$, we need to adjust the value functions for the capacity constraints:
\[
P(FI_1 = 1) =
\begin{cases}
\left( \dfrac{e^{\,U_{10}(1,S_1) - U_{10}(0,S_1) + \beta E_{S_2}[V_2(1,S_2)] - \beta V_2(0)}}{1 + e^{\,U_{10}(1,S_1) - U_{10}(0,S_1) + \beta E_{S_2}[V_2(1,S_2)] - \beta V_2(0)}} \right) q_{dy} & \text{if } FI_0 = 0 \\[10pt]
\dfrac{e^{\,U_{11}(1,S_1) - U_{11}(0,S_1) + \beta E_{S_2}[V_2(1,S_2)] - \beta V_2(0)}}{1 + e^{\,U_{11}(1,S_1) - U_{11}(0,S_1) + \beta E_{S_2}[V_2(1,S_2)] - \beta V_2(0)}} & \text{if } FI_0 = 1
\end{cases}
\]
and similarly for $t = 0$. The remainder of the likelihood is solved exactly as described in the main text in section 4.3.

Estimation

Estimating the Value Functions

In order to solve the integral in equation (B.5), I use numerical integration. I discretize the distribution $F(\mu_{t+1})$ using the Adda and Cooper (2003) method and solve the value inside the integrand at each point.[158] Since this is a finite-period model, one concern with discretizing the integral is that evaluating the expected value functions, especially in early periods, will be computationally intensive due to the curse of dimensionality: if $\mu_{t+1}$ is discretized into $m$ points, each point requires a further $m$ values the following period (for a total of $m^2$), and a total of $m^3$ the period after that. In order to avoid this problem, I evaluate the value function at only a subset of points (for each individual-time combination) and then interpolate. This method is similar to the one used by Keane and Wolpin (1994) in their structural model. In practice, I evaluate the value function at 30 different values, and there are separate value functions for each individual–grade–type–FI–grade-of-entry combination (the latter because the perceived distribution of $S_t$ varies with the total number of signals received).

[158] The Adda and Cooper (2003) method of approximating a normal distribution $y \sim N(\mu, \sigma^2)$ is as follows:
1. Set the number of values $n$ (in practice I used $n = 15$; higher values led to very small changes in the likelihood).
2. Find all $m_i$, $i = 1, \dots, n$, such that $\Phi\!\left(\frac{m_i - \mu}{\sigma}\right) = \frac{i}{n} \iff m_i = \sigma\,\Phi^{-1}\!\left(\frac{i}{n}\right) + \mu$.
3. Define the intervals $Z_i = (-\infty, m_1]$ if $i = 1$; $[m_{i-1}, m_i]$ if $i = 2, \dots, n-1$; $[m_{n-1}, \infty)$ if $i = n$.
4. Define $y_i = E[y \mid y \in Z_i]$, that is, $y_i = \mu - \sigma n\, \phi\!\left(\frac{m_1-\mu}{\sigma}\right)$ if $i = 1$; $y_i = \mu - \sigma n \left[ \phi\!\left(\frac{m_i-\mu}{\sigma}\right) - \phi\!\left(\frac{m_{i-1}-\mu}{\sigma}\right) \right]$ if $i = 2, \dots, n-1$; $y_i = \mu + \sigma n\, \phi\!\left(\frac{m_{n-1}-\mu}{\sigma}\right)$ if $i = n$.
5. Since $P(Y = y_i) = \frac{1}{n}$ for all $i = 1, \dots, n$, we only need to evaluate the function at each value $y_i$ and then take the average.
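A minimal sketch of the discretization in footnote [158] follows. The function and variable names are my own; the sketch is only meant to illustrate the equal-probability construction, not to reproduce the estimation code.

```python
import numpy as np
from scipy.stats import norm

def discretize_normal(mu, sigma, n=15):
    """Equal-probability discretization of N(mu, sigma^2) as in footnote 158.

    Returns n points, each carrying probability 1/n, equal to the conditional
    mean of the distribution within each of n equiprobable intervals.
    """
    i = np.arange(1, n + 1)
    cutoffs = norm.ppf(i[:-1] / n, loc=mu, scale=sigma)  # m_1, ..., m_{n-1}
    z = (cutoffs - mu) / sigma                           # standardized cutoffs
    phi = norm.pdf(z)

    points = np.empty(n)
    points[0] = mu - sigma * n * phi[0]
    points[1:-1] = mu - sigma * n * (phi[1:] - phi[:-1])
    points[-1] = mu + sigma * n * phi[-1]
    return points

# The expectation of a function g under N(mu, sigma^2) is then approximated by
# the simple average of g over the n points (step 5 of the footnote).
grid = discretize_normal(mu=0.0, sigma=1.0, n=15)
print(grid.mean())  # approximately mu = 0
```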
Calculating $q_{dy}$

In the data, I only observe whether a child is enrolled in FI; I do not observe the application decision. Therefore, the probability of being accepted into FI is computed as a function of the estimated parameters as well as the exogenous number of FI spaces. By definition,
\[
P(\text{spot} \mid \text{District}, \text{Apply}) =
\begin{cases}
1 & \text{if } t > 0 \text{ and } FI_{t-1} = 1 \\[4pt]
\min\!\left(1,\; \dfrac{\text{Number of Spaces in District } d \text{ in year } y}{\text{Number of Applications in District } d \text{ in year } y}\right) & \text{otherwise}
\end{cases}
\]
where this formula makes use of the fact that if a child is enrolled in FI, then they are automatically granted a spot in the following year. As discussed in the main text, the total number of spots in a district-year is treated as exogenous: it is a function of the number of FI classrooms and provincial regulations on class size. Let $D_{dy}$ refer to the set of all individuals in district $d$ in year $y$. The "Number of Applications in District $d$ in year $y$" is defined as
\[
\sum_{i \in D_{dy}} \frac{e^{\Sigma_{1it}}}{1 + e^{\Sigma_{1it}}}
\]
That is, I approximate the probability of acceptance by calculating the expected number of applications in a given district-year. For the grade one probability, I adjust for the fact that children already enrolled in FI in grade K are automatically accepted into FI.[159]

[159] This method also assumes that parents perform a similar calculation when making their initial entry decision, because parents need to forecast what the FI entry probability will be in grade one. Here, I avoid adding another unknown state variable (initial FI enrolment) by making the following simplifying assumption. Let $p_{FI}$ denote the actual proportion of children enrolled in FI in grade K, $App_0$ the expected number of FI applications in grade one assuming everyone was enrolled in FI in grade K, and $App_1$ the expected number of FI applications in grade one assuming no one was enrolled in FI in grade K. Then parents' beliefs about the probability their child will be accepted into an FI program in grade one, conditional upon not applying in grade K, are given by
\[
\min\!\left(1,\; \frac{\text{Number of Spaces in District } d \text{ in year } y - p_{FI} \cdot App_0}{App_1 \cdot (1 - p_{FI})}\right)
\]
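The basic acceptance-probability calculation described above can be sketched as follows. The array names are hypothetical, `utility_index` stands in for the model's application index $\Sigma_{1it}$, and the grade-one adjustment in footnote [159] is not reproduced.

```python
import numpy as np

def entry_probability(spaces, utility_index, already_in_fi):
    """Approximate q_dyt, the probability of securing an FI spot.

    spaces        : exogenous number of FI spaces in the district-year
    utility_index : array of latent application indices (Sigma_1it) for the
                    children in D_dy
    already_in_fi : boolean array; continuing FI students keep their spot
    """
    utility_index = np.asarray(utility_index, dtype=float)
    already_in_fi = np.asarray(already_in_fi, dtype=bool)

    # Expected number of applications: sum of logit application probabilities.
    expected_applications = (1.0 / (1.0 + np.exp(-utility_index))).sum()

    q = min(1.0, spaces / expected_applications)

    # Continuing students are automatically accepted; others face probability q.
    return np.where(already_in_fi, 1.0, q)
```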
Censored Observations

In the dataset I am using, there are a significant number of censored observations, mostly because the data end in 2012 and many children are still in school. In the estimation of the model, I allow for right-censored individuals. This requires estimating the period $t+1$ value function for individuals not observed after period $t$. This is done by using the uncensored observations to estimate a relationship between the value function in period $t+1$ and period-$t$ characteristics, and then plugging in the predicted values for the censored observations. In practice, I ran a separate linear regression of $V_{t+1}$ on $Z_t$ and $\mu_t$ for each grade $t$ and type $m$.[160]

[160] Including additional interactions in this regression does not materially affect the results.

B.2 Analysis of Change in Administration of the Standardized Tests

In this section, I present evidence that the Foundation Skills Assessment tests themselves are not influencing parents' FI exit decisions. I take advantage of a change in the administration of the tests. Prior to the 2007/2008 school year, all tests were administered in May, and results were not given to parents until September of the following school year. Starting in the 2007/2008 school year, the tests were administered in February and the results (the child's raw scores) were distributed to parents by the end of March.[161] Because most school districts have an enrolment period that takes place around February–April (and in some cases extending to June), it is very unlikely that parents in the pre-2007 period were able to respond to the Foundation Skills Assessment tests. In addition, parents in this period would have had to switch their child out of FI after the start of the school year, something many parents may be reluctant to do. Therefore, I test the hypothesis that parents are responding to their child's test scores directly by re-running the regression in equation 2.1, but now including an interaction term between test scores and a dummy variable indicating whether the test was taken before 2007:
\[
Exit_{i,t+1,g+1} = G\bigl( X_t'\beta + \theta\, TS_{itg} \cdot \mathbf{1}(t \ge 2007) + \pi\, TS_{itg} \cdot \mathbf{1}(t < 2007) + \delta_{sgt} + \xi_{i,t+1,g+1} \bigr)
\tag{B.13}
\]
where all parameters are as defined in section III.

[161] There was also a change in the information given to parents. Prior to the 2007/2008 school year, for each subject parents were effectively told whether their child fell into one of five designations: below expectations, between below and meeting expectations, meeting expectations, between meeting and exceeding expectations, and exceeding expectations. Starting in the 2007/2008 school year, parents were given the child's raw test score for each subject. In order to help parents place these scores into context, parents were also told how these scores map into one of three designations: "Below Expectations", "Meeting Expectations" and "Exceeding Expectations". (For example, raw math scores of 0–22 would be associated with "below expectations.")

The results for equation (B.13) are displayed in panel A of Appendix Table B.6, which shows that there is no significant difference in the test score log-odds ratio before or after 2007; in fact, the coefficients are larger in magnitude in the pre-2007 period. In panel B, I re-run the specification in equation (B.13) but limit the sample to grade four only. I do this for two reasons. First, we saw above that parents are much more sensitive to information in grade four than in grade seven. Second, because the first year of data comes from entering cohorts in 1998, most of the grade seven tests fall in the post-2007 period. Since we know the grade seven result is closer to 0, this potentially biases the estimates in panel A towards finding no difference. However, panel B shows that this is not the case, as there is no significant difference found between the log-odds ratios across the two regimes.
At the same time, despite the fact that parents in the pre-2007 period were not able to respond to the Foundation Skills Assessment, we still observe a correlation between the standardized test score and program exit that is significant at the one percent level.

B.3 Supplemental Figures and Tables

Figure B.1: Distribution of Achievement by Program Enrolment
[Figure: kernel densities of standardized test scores for FI and non-FI students; panels for math and ELA, by current enrolment (top) and initial enrolment (bottom). Annotated means (s.e.): current enrolment, math .11 (.006) for FI vs .07 (.002) for non-FI and ELA .1 (.006) vs .01 (.002); initial enrolment, math .05 (.005) vs .07 (.002) and ELA .05 (.005) vs .01 (.002).]
Notes: This figure shows the distribution of student achievement by test score subject and FI program enrolment. In the top panels, the sample is split depending on whether or not the child is enrolled in the (early) FI program during the actual grade of the test. In the bottom panels, the sample is split depending on whether or not the child was initially enrolled in the early-FI program. All graphs combine the grades 4 and 7 Foundation Skills Assessment tests. "ELA" is the average score of the child's reading and writing tests.

Figure B.2: Quantile Regression of Test Score on FI Enrolment
[Figure: estimated impact of FI enrolment by quantile (0–100) for grades 4 and 7; top panel uses current enrolment, bottom panel uses initial enrolment.]
Notes: This figure displays the coefficients on FI in a quantile regression of test scores on FI and both child-specific and Dissemination Area controls. In the top graph, FI is defined based on the child's status at the grade of the test; that is, FI = 1 if a child is enrolled in the (early) FI program in grade four or seven and 0 otherwise. In the bottom graph, FI is based on initial enrolment into the FI program.

Figure B.3: Average Exit Grade by Ability
[Figure: average exit grade (1–8) plotted against decile of FI ability (η).]
Notes: This figure displays the predicted average grade of exit for FI children by decile of ability. All displayed results are the average taken over 75 simulated draws of each model.

Figure B.4: Predicted Test Scores under No Updating Scenarios
[Figure: simulated average test score by grade (K–8) for the baseline model and the two no-information counterfactuals, "No Info" and "No Info (Alternate)".]
Notes: This figure displays the simulated average student achievement in grades K–8 for both the baseline model and the two counterfactual no-information simulations. Simulated achievement is based on the estimated parameters using the grades 4 and 7 test scores.