
How Does Your Kindergarten Classroom Affect Your Earnings? Evidence from Project STAR

Citation: Chetty, Raj, John N. Friedman, Nathaniel Hilger, Emmanuel Saez, Diane Whitmore Schanzenbach, and Danny Yagan. 2011. How Does Your Kindergarten Classroom Affect Your Earnings? Evidence from Project STAR. Quarterly Journal of Economics 126(4): 1593-1660.

Published Version: http://dx.doi.org/10.1093/qje/qjr041

Terms of Use: This article was downloaded from Harvard University’s DASH repository, and is made available under the terms and conditions applicable to Open Access Policy Articles.

NBER WORKING PAPER SERIES

HOW DOES YOUR KINDERGARTEN CLASSROOM AFFECT YOUR EARNINGS? EVIDENCE FROM PROJECT STAR

Raj Chetty
John N. Friedman
Nathaniel Hilger
Emmanuel Saez
Diane Whitmore Schanzenbach
Danny Yagan

Working Paper 16381
http://www.nber.org/papers/w16381

NATIONAL BUREAU OF ECONOMIC RESEARCH
1050 Massachusetts Avenue
Cambridge, MA 02138
September 2010

We thank Lisa Barrow, David Card, Gary Chamberlain, Elizabeth Cascio, Janet Currie, Jeremy Finn, Edward Glaeser, Bryan Graham, James Heckman, Caroline Hoxby, Guido Imbens, Thomas Kane, Lawrence Katz, Alan Krueger, Derek Neal, Jonah Rockoff, Douglas Staiger, numerous seminar participants, and anonymous referees for helpful discussions and comments. We thank Helen Bain and Jayne Zaharias at HEROS for access to the Project STAR data. The tax data were accessed through contract TIRNO-09-R-00007 with the Statistics of Income (SOI) Division at the US Internal Revenue Service. Gregory Bruich, Jane Choi, Jessica Laird, Keli Liu, Laszlo Sandor, and Patrick Turley provided outstanding research assistance. Financial support from the Lab for Economic Applications and Policy at Harvard, the Center for Equitable Growth at UC Berkeley, and the National Science Foundation is gratefully acknowledged. The views expressed herein are those of the authors and do not necessarily reflect the views of the National Bureau of Economic Research.

NBER working papers are circulated for discussion and comment purposes. They have not been peer-reviewed or been subject to the review by the NBER Board of Directors that accompanies official NBER publications.

© 2010 by Raj Chetty, John N. Friedman, Nathaniel Hilger, Emmanuel Saez, Diane Whitmore Schanzenbach, and Danny Yagan. All rights reserved. Short sections of text, not to exceed two paragraphs, may be quoted without explicit permission provided that full credit, including © notice, is given to the source.

How Does Your Kindergarten Classroom Affect Your Earnings? Evidence From Project STAR
Raj Chetty, John N. Friedman, Nathaniel Hilger, Emmanuel Saez, Diane Whitmore Schanzenbach, and Danny Yagan
NBER Working Paper No. 16381
September 2010, Revised August 2011
JEL No. H0, J0

ABSTRACT

In Project STAR, 11,571 students in Tennessee and their teachers were randomly assigned to classrooms within their schools from kindergarten to third grade. This paper evaluates the long-term impacts of STAR by linking the experimental data to administrative records. We first demonstrate that kindergarten test scores are highly correlated with outcomes such as earnings at age 27, college attendance, home ownership, and retirement savings. We then document four sets of experimental impacts. First, students in small classes are significantly more likely to attend college and exhibit improvements on other outcomes. Class size does not have a significant effect on earnings at age 27, but this effect is imprecisely estimated. Second, students who had a more experienced teacher in kindergarten have higher earnings. Third, an analysis of variance reveals significant classroom effects on earnings. Students who were randomly assigned to higher quality classrooms in grades K-3 – as measured by classmates' end-of-class test scores – have higher earnings, college attendance rates, and other outcomes. Finally, the effects of class quality fade out on test scores in later grades but gains in non-cognitive measures persist.

Raj Chetty
Department of Economics
Harvard University
1805 Cambridge St.
Cambridge, MA 02138
and NBER

Emmanuel Saez
Department of Economics
University of California, Berkeley
549 Evans Hall #3880
Berkeley, CA 94720
and NBER

John N. Friedman
Harvard Kennedy School
Taubman 356
79 JFK St.
Cambridge, MA 02138
and NBER

Diane Whitmore Schanzenbach
School of Education and Social Policy
Northwestern University
Annenberg Hall, Room 205
2120 Campus Drive
Evanston, IL 60208
and NBER

Nathaniel Hilger
Department of Economics
Harvard University
1805 Cambridge St.
Cambridge, MA 02138

Danny Yagan
Department of Economics
Littauer 200, North Yard
Harvard University
Cambridge, MA 02138

I. Introduction

What are the long-term impacts of early childhood education? Evidence on this important policy question remains scarce because of a lack of data linking childhood education and outcomes in adulthood. This paper analyzes the long-term impacts of Project STAR, one of the most widely studied education experiments in the United States. The Student/Teacher Achievement Ratio (STAR) experiment randomly assigned one cohort of 11,571 students and their teachers to different classrooms within their schools in grades K-3. Some students were assigned to small classes (15 students on average) in grades K-3, while others were assigned to large classes (22 students on average). The experiment was implemented across 79 schools in Tennessee from 1985 to 1989. Numerous studies have used the STAR experiment to show that class size, teacher quality, and peers have significant causal impacts on test scores (see Schanzenbach 2006 for a review). Whether these gains in achievement on standardized tests translate into improvements in adult outcomes such as earnings remains an open question.

We link the original STAR data to administrative data from tax returns, allowing us to follow 95% of the STAR participants into adulthood.[1] We use these data to analyze the impacts of STAR on outcomes ranging from college attendance and earnings to retirement savings, home ownership, and marriage. We begin by documenting the strong correlation between kindergarten test scores and adult outcomes. A one percentile increase in end-of-kindergarten (KG) test scores is associated with a $132 increase in wage earnings at age 27 in the raw data, and a $94 increase after controlling for parental characteristics. Several other adult outcomes – such as college attendance rates, quality of college attended, home ownership, and 401(k) savings – are also all highly correlated with kindergarten test scores.
These strong correlations motivate the main question of the paper: do classroom environments that raise test scores – such as smaller classes and better teachers – cause analogous improvements in adult outcomes?

Our analysis of the experimental impacts combines two empirical strategies. First, we study the impacts of observable classroom characteristics. We analyze the impacts of class size using the same intent-to-treat specifications as Krueger (1999), who showed that students in small classes scored higher on standardized tests. We find that students assigned to small classes are 1.8 percentage points more likely to be enrolled in college at age 20, a significant improvement relative to the mean college attendance rate of 26.4% at age 20 in the sample. We do not find significant differences in earnings at age 27 between students who were in small and large classes, although these earnings impacts are imprecisely estimated. Students in small classes also exhibit statistically significant improvements on a summary index of the other outcomes we examine (home ownership, 401(k) savings, mobility rates, percent college graduate in ZIP code, and marital status).

[1] The data for this project were analyzed through a program developed by the Statistics of Income (SOI) Division at the U.S. Internal Revenue Service to support research into the effects of tax policy on economic and social outcomes and improve the administration of the tax system.

We study variation across classrooms along other observable dimensions, such as teacher and peer characteristics, using a similar approach. Prior studies (e.g. Krueger 1999) have shown that STAR students with more experienced teachers score higher on tests. We find similar impacts on earnings. Students randomly assigned to a KG teacher with more than 10 years of experience earn an extra $1,093 (6.9% of mean income) on average at age 27 relative to students with less experienced teachers.[2] We also test whether observable peer characteristics have long-term impacts by regressing earnings on the fraction of low-income, female, and black peers in KG. These peer impacts are not significant, but are very imprecisely estimated because of the limited variation in peer characteristics across classrooms.

Because we have few measures of observable classroom characteristics, we turn to a second empirical strategy that captures both observed and unobserved aspects of classrooms. We use an analysis of variance approach analogous to that in the teacher effects literature to test whether earnings are clustered by kindergarten classroom. Because we observe each teacher only once in our data, we can only estimate “class effects” – the combined effect of teachers, peers, and any class-level shock – by exploiting random assignment to KG classrooms of both students and teachers. Intuitively, we test whether earnings vary across KG classes by more than what would be predicted by random variation in student abilities.
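As a rough illustration of this intuition, the following Python sketch computes a one-way analysis-of-variance F statistic for earnings grouped by classroom. It is a simplified stand-in rather than the specification used in the paper: it ignores the within-school structure of the randomization, and the function and argument names are hypothetical.

```python
from collections import defaultdict


def one_way_f_statistic(earnings, class_ids):
    """One-way ANOVA F statistic: between-class vs. within-class variation.

    Simplifying assumption: classes are compared directly, with no
    adjustment for the school in which randomization took place.
    """
    groups = defaultdict(list)
    for y, c in zip(earnings, class_ids):
        groups[c].append(y)

    n = len(earnings)
    k = len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and more students than classes")

    grand_mean = sum(earnings) / n
    ss_between = 0.0
    ss_within = 0.0
    for values in groups.values():
        class_mean = sum(values) / len(values)
        # Variation of class means around the overall mean.
        ss_between += len(values) * (class_mean - grand_mean) ** 2
        # Variation of students around their own class mean.
        ss_within += sum((y - class_mean) ** 2 for y in values)

    if ss_within == 0:
        raise ValueError("no within-class variation; F statistic is undefined")
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

Under the null hypothesis of no class effects, this statistic would be compared with an F distribution with k - 1 and n - k degrees of freedom, where k is the number of classes and n the number of students.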
An F test rejects the null hypothesis that KG classroom assignment has no effect on earnings. The standard deviation of class effects on annual earnings is approximately 10% of mean earnings, highlighting the large stakes at play in early childhood education.

The analysis of variance shows that kindergarten classroom assignment has significant impacts on earnings, but it does not tell us whether classrooms that improve scores also generate earnings gains. That is, are class effects on earnings correlated with class effects on scores? To analyze this question, we proxy for each student’s KG “class quality” by the average test scores of his classmates at the end of kindergarten. We show that end-of-class peer test scores are an omnibus measure of class quality because they capture peer effects, teacher effects, and all other classroom characteristics that affect test scores. Using this measure, we find that kindergarten class quality has significant impacts on both test scores and earnings. Students randomly assigned to a classroom that is one standard deviation higher in quality earn 3% more at age 27. Students assigned to higher quality classes are also significantly more likely to attend college, enroll in higher quality colleges, and exhibit improvements in the summary index of other outcomes. The class quality impacts are similar for students who entered the experiment in grades 1-3 and were randomized into classes at that point. Hence, the findings of this paper should be viewed as evidence on the long-term impacts of early childhood education rather than kindergarten in particular.

[2] Because teacher experience is correlated with many other unobserved attributes – such as attachment to the teaching profession – we cannot conclude that increasing teacher experience would improve student outcomes. This evidence simply establishes that a student’s KG teacher has effects on his or her earnings as an adult.

Our analysis of “class quality” must be interpreted very carefully. The purpose of this analysis is to detect clustering in outcomes at the classroom level: are a child’s outcomes correlated with his peers’ outcomes? Although we test for such clustering by regressing own scores and earnings on peer test scores, we emphasize that such regressions are not intended to detect peer effects. Because we use post-intervention peer scores as the regressor, these scores incorporate the impacts of peer quality, teacher quality, and any random class-level shock (such as noise from construction outside the classroom). The correlation between own outcomes and peer scores could be due to any of these factors. Our analysis shows that the classroom a student was assigned to in early childhood matters for outcomes 20 years later, but does not shed light on which specific factors should be manipulated to improve adult outcomes.
Further research on which factors contribute to high “class quality” would be extremely valuable in light of the results reported here.

The impacts of early childhood class assignment on adult outcomes may be particularly surprising because the impacts on test scores “fade out” rapidly. The impacts of class size on test scores become statistically insignificant by grade 8 (Krueger and Whitmore 2001), as do the impacts of class quality on test scores. Why do the impacts of early childhood education fade out on test scores but re-emerge in adulthood? We find some suggestive evidence that part of the explanation may be non-cognitive skills. We find that KG class quality has significant impacts on non-cognitive measures in 4th and 8th grade such as effort, initiative, and lack of disruptive behavior. These non-cognitive measures are highly correlated with earnings even conditional on test scores but are not significant predictors of future standardized test scores. These results suggest that high quality KG classrooms may build non-cognitive skills that have returns in the labor market but do not improve performance on standardized tests. While this evidence is far from conclusive, it highlights the value of further empirical research on non-cognitive skills.
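The class-quality proxy introduced earlier in this section, the average end-of-kindergarten test score of a student's classmates, can be sketched as a leave-one-out mean. Reading "classmates" as everyone in the class except the student is an assumption made here for illustration, and the function and argument names are hypothetical.

```python
from collections import defaultdict


def leave_one_out_class_quality(scores, class_ids):
    """Mean end-of-year score of each student's classmates, excluding the student.

    Returns None for a student with no observed classmates.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for s, c in zip(scores, class_ids):
        totals[c] += s
        counts[c] += 1

    quality = []
    for s, c in zip(scores, class_ids):
        if counts[c] < 2:
            quality.append(None)  # class of one: no peers to average over
        else:
            # Remove the student's own score before averaging.
            quality.append((totals[c] - s) / (counts[c] - 1))
    return quality
```

Excluding the student's own score matters because including it would mechanically correlate the proxy with the student's own outcomes.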

In addition to the extensive literature on the impacts of STAR on test scores, our study builds on and contributes to a recent literature investigating selected long-term impacts of class size in the STAR experiment. These studies have shown that students assigned to small classes are more likely to complete high school (Finn, Gerber, and Boyd-Zaharias 2005) and take the SAT or ACT college entrance exams (Krueger and Whitmore 2001) and are less likely to be arrested for crime (Krueger and Whitmore 2001). Most recently, Muennig et al. (2010) report that students in small classes have higher mortality rates, a finding that we do not obtain in our data as we discuss below. We contribute to this literature by providing a unified evaluation of several outcomes, including the first analysis of earnings, and by examining the impacts of teachers, peers, and other attributes of the classroom in addition to class size.

Our results also complement the findings of studies on the long-term impacts of other early childhood interventions, such as the Perry and Abecedarian preschool demonstrations and the Head Start program, which also find lasting impacts on adult outcomes despite fade-out on test scores (see Almond and Currie 2010 for a review). We show that a better classroom environment from ages 5-8 can have substantial long-term benefits even without intervention at earlier ages.

The paper is organized as follows. In Section II, we review the STAR experimental design and address potential threats to the validity of the experiment. Section III documents the cross-sectional correlation between test scores and adult outcomes. Section IV analyzes the impacts of observable characteristics of classrooms – size, teacher characteristics, and peer characteristics – on adult outcomes. In Section V, we study class effects more broadly, incorporating unobservable aspects of class quality. Section VI documents the fade-out and re-emergence effects and the potential role of non-cognitive skills in explaining this pattern.
Section VII concludes.

II. Experimental Design and Data

II.A. Background on Project STAR

Word et al. (1990), Krueger (1999), and Finn et al. (2007) provide a comprehensive summary of Project STAR; here, we briefly review the features of the STAR experiment most relevant for our analysis. The STAR experiment was conducted at 79 schools across the state of Tennessee over four years. The program oversampled lower-income schools, and thus the STAR sample exhibits lower socioeconomic characteristics than the state of Tennessee and the U.S. population as a whole.

In the 1985-86 school year, 6,323 kindergarten students in participating schools were randomly assigned to a small (target size 13-17 students) or regular-sized (20-25 students) class within their schools.[3] Students were intended to remain in the same class type (small vs. large) through 3rd grade, at which point all students would return to regular classes for 4th grade and subsequent years. As the initial cohort of kindergarten students advanced across grade levels, there was substantial attrition because students who moved away from a participating school or were retained in grade no longer received treatment. In addition, because kindergarten was not mandatory and due to normal residential mobility, many children joined the initial cohort at the participating schools after kindergarten. A total of 5,248 students entered the participating schools in grades 1-3. These new entrants were randomly assigned to classrooms within school upon entry. Thus all students were randomized to classrooms within school upon entry, regardless of the entry grade. As a result, the randomization pool is school-by-entry-grade, and we include school-by-entry-grade fixed effects in all experimental analyses below.

Upon entry into one of the 79 schools, the study design randomly assigned students not only to class type (small vs. large) but also to a classroom within each type (if there were multiple classrooms per type, as was the case in 50 of the 79 schools). Teachers were also randomly assigned to classrooms. Unfortunately, the exact protocol of randomization into specific classrooms was not clearly documented in any of the official STAR reports, where the emphasis was instead the random assignment into class type rather than classroom (Word et al. 1990). We present statistical evidence confirming that both students and teachers indeed appear to be randomly assigned directly to classrooms upon entry into the STAR project, as the original designers attest.

As in any field experiment, there were some deviations from the experimental protocol. In particular, some students moved from large to small classes and vice versa.
To account for such potentially non-random sorting, we adopt the standard approach taken in the literature and assign treatment status based on initial random assignment (intent-to-treat).

In each year, students were administered the grade-appropriate Stanford Achievement Test, a multiple choice test that measures performance in math and reading. These tests were given only to students participating in STAR, as the regular statewide testing program did not extend to the early grades.[4] Following Krueger (1999), we standardize the math and reading scale scores in each grade by computing the scale score’s corresponding percentile rank in the distribution for students in large classes. We then assign the appropriate percentile rank to students in small classes and take the average across math and reading percentile ranks. Note that this percentile measure is a ranking of students within the STAR sample.

[3] There was also a third treatment group: regular sized class with a full-time teacher’s aide. This was a relatively minor intervention, since all regular classes were already assigned a 1/3 time teacher’s aide. Prior studies of STAR find no impact of a full-time teacher’s aide on test scores. We follow the convention in the literature and group the regular and regular plus aide class treatments together.

[4] These K-3 test scores contain considerable predictive content. As reported in Krueger and Whitmore (2001), the correlation between test scores in grades g and g+1 is 0.65 for KG and 0.80 for each grade 1-3. The values for grades 4-7 lie between 0.83 and 0.88, suggesting that the K-3 test scores contain similar predictive content.

II.B. Variable Definitions and Summary Statistics

We measure adult outcomes of Project STAR participants using administrative data from United States tax records. 95.0% of STAR records were linked to the tax data using an algorithm based on standard identifiers (SSN, date of birth, gender, and names) that is described in Online Appendix A.[5] We obtain data on students and their parents from federal tax forms such as 1040 individual income tax returns. Information from 1040’s is available from 1996-2008. Approximately 10% of adults do not file individual income tax returns in a given year. We use third-party reports to obtain information such as wage earnings (form W-2) and college attendance (form 1098-T) for all individuals, including those who do not file 1040s. Data from these third-party reports are available since 1999. The year always refers to the tax year (i.e., the calendar year in which the income is earned or the college expense incurred). In most cases, tax returns for tax year t are filed during the calendar year t+1. The analysis dataset combines selected variables from individual tax returns, third party reports, and information from the STAR database, with individual identifiers removed to protect confidentiality.

We now describe how each of the adult outcome measures and control variables used in the empirical analysis is constructed. Table I reports summary statistics for these variables for the STAR sample as well as a random 0.25% sample of the US population born in the same years (1979-1980).

Earnings. The individual earnings data come from W-2 forms, yielding information on earnings for both filers and non-filers.[6] We define earnings in each year as the sum of earnings on all W-2 forms filed on an individual’s behalf.
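The test-score standardization described at the end of Section II.A can be sketched in a few lines of Python. The sketch assumes that a percentile rank is the share of large-class scores at or below a given scale score; the text does not spell out the exact convention (for example, how ties are handled), and all function and argument names are hypothetical.

```python
from bisect import bisect_right


def percentile_rank(score, reference_sorted):
    """Percent of reference scores that are less than or equal to `score`.

    `reference_sorted` must be sorted in ascending order.
    """
    if not reference_sorted:
        raise ValueError("reference distribution is empty")
    return 100.0 * bisect_right(reference_sorted, score) / len(reference_sorted)


def standardized_score(math, reading, large_math, large_reading):
    """Average of math and reading percentile ranks.

    Both ranks are computed against the large-class distribution,
    so the same function applies to small-class and large-class students.
    """
    math_rank = percentile_rank(math, sorted(large_math))
    reading_rank = percentile_rank(reading, sorted(large_reading))
    return 0.5 * (math_rank + reading_rank)
```

Ranking every student against the large-class distribution keeps the scale fixed by the control group, so a small-class student's score reads as a position in the distribution that would have prevailed without the treatment.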
We express all monetary variables in 2009 dollars, adjusting for inflation using the Consumer Price Index. We cap earnings in each year at $100,000 to reduce the influence of outliers; fewer than 1% of individuals in the STAR sample report earnings above $100,000 in a given year. To increase precision, we typically use average (inflation indexed) earnings from year 2005 to 2007 as an outcome measure. The mean individual earnings for the STAR sample in 2005-2007 (when the STAR students are 25-27 years old) is $15,912. This earnings measure includes zeros for the 13.9% of STAR students who report no income in 2005-2007. The mean level of earnings in the STAR sample is lower than in the same cohort in the U.S. population, as expected given that Project STAR targeted more disadvantaged schools.

[5] All appendix material is available as an on-line appendix posted as supplementary material to the article. Note that the matching algorithm was sufficiently precise that it uncovered 28 cases in the original STAR dataset that were a single split observation or duplicate records. After consolidating these records, we are left with 11,571 students.

[6] We obtain similar results using household adjusted gross income reported on individual tax returns. We focus on the W-2 measure because it provides a consistent definition of individual wage earnings for both filers and non-filers. One limitation of the W-2 measure is that it does not include self-employment income.

College Attendance. Higher education institutions eligible for federal financial aid – Title IV institutions – are required to file 1098-T forms that report tuition payments or scholarships received for every student.[7] Title IV institutions include all colleges and universities as well as vocational schools and other postsecondary institutions. Comparisons to other data sources indicate that 1098-T forms accurately capture US college enrollment.[8] We have data on college attendance from 1098-T forms for all students in our sample since 1999, when the STAR students were 19 years old. We define college attendance as an indicator for having one or more 1098-T forms filed on one’s behalf in a given year. In the STAR sample, 26.4% of students are enrolled in college at age 20 (year 2000). 45.5% of students are enrolled in college at some point between 1999 and 2007, compared with 57.1% in the same cohort of the U.S. population. Because the data are based purely on tuition payments, we have no information about college completion or degree attainment.

College Quality. Using the institutional identifiers on the 1098-T forms, we construct an earnings-based index of college quality as follows. First, using the full population of all individuals in the United States aged 20 on 12/31/1999 and all 1098-T forms for year 1999, we group individuals by the higher education institution they attended in 1999.
This sample contains over 1.4 million individuals.[9] We take a 1% sample of those not attending a higher education institution in 1999, comprising another 27,733 individuals, and pool them together in a separate “no college” category. Next, we compute average earnings of the students in 2007 when they are aged 28 by grouping students according to the educational institution they attended in 1999. This earnings-based index of college quality is highly correlated with the US News ranking of the best 125 colleges and universities: the correlation coefficient of our measure and the log US News rank is 0.75. The advantages of our index are that while the US News ranking only covers the top 125 institutions, ours covers all higher education institutions in the U.S. and provides a simple cardinal metric for college quality. Among colleges attended by STAR students, the average value of our earnings index is $35,080 for four-year colleges and $26,920 for two-year colleges.[10] For students who did not attend college, the imputed mean wage is $16,475.

[7] These forms are used to administer the Hope and Lifetime Learning education tax credits created by the Taxpayer Relief Act of 1997. Colleges are not required to file 1098-T forms for students whose qualified tuition and related expenses are waived or paid entirely with scholarships or grants; however, in many instances the forms are available even for such cases, perhaps because of automation at the university level.

[8] In 2009, 27.4 million 1098-T forms were issued (Internal Revenue Service, 2010). According to the Current Population Survey (US Census Bureau, 2010, Tables V and VI), in October 2008, there were 22.6 million students in the U.S. (13.2 million full time, 5.4 million part-time, and 4 million vocational). As an individual can be a student at some point during the year but not in October and can receive a 1098-T form from more than one institution, the number of 1098-T forms for the calendar year should indeed be higher than the number of students as of October.

[9] Individuals who attended more than one institution in 1999 are counted as students at all institutions they attended.

Other Outcomes. We identify spouses using information from 1040 forms. For individuals who file tax returns, we define an indicator for marriage based on whether the tax return is filed jointly. We code non-filers as single because most non-filers in the U.S. who are not receiving Social Security benefits are single (Cilke 1998, Table I). We define a measure of ever being married by age 27 as an indicator for ever filing a joint tax return in any year between 1999 and 2007. By this measure, 43.2% of individuals are married at some point before age 27.

We measure retirement savings using contributions to 401(k) accounts reported on W-2 forms from 1999-2007. 28.2% of individuals in the sample make a 401(k) contribution at some point during this period. We measure home ownership using data from the 1098 form, a third party report filed by lenders to report mortgage interest payments. We include the few individuals who report a mortgage deduction on their 1040 forms but do not have 1098’s as homeowners. We define any individual who has a mortgage interest deduction at any point between 1999 and 2007 as a homeowner. Note that this measure of home ownership does not cover individuals who own homes without a mortgage, which is rare among individuals younger than 27. By our measure, 30.8% of individuals own a home by age 27. We use data from 1040 forms to identify each household’s ZIP code of residence in each year.
For non-filers, we use the ZIP code of the address to which the W-2 form was mailed. If an individual did not file and has no W-2 in a given year, we impute current ZIP code as the last observed ZIP code. We define a measure of cross-state mobility by an indicator for whether the individual ever lived outside Tennessee between 1999 and 2007. 27.5% of STAR students lived outside Tennessee at some point between age 19 and 27. We construct a measure of neighborhood quality using data on the percentage of college graduates in the individual’s 2007 ZIP code from the 2000 Census. On average, STAR students lived in 2007 in neighborhoods with 17.6% college graduates.

We observe dates of birth and death until the end of 2009 as recorded by the Social Security Administration. We define each STAR participant’s age at kindergarten entry as the student’s age (in days divided by 365.25) as of September 1, 1985. Virtually all students in STAR were born in the years 1979-1980. To simplify the exposition, we say that the cohort of STAR children is aged a in year 1980 + a (e.g., STAR children are 27 in 2007). Approximately 1.7% of the STAR sample is deceased by 2009.

[10] For the small fraction of STAR students who attend more than one college in a single year, we define college quality based on the college that received the largest tuition payments on behalf of the student.

Parent Characteristics. We link STAR children to their parents by finding the earliest 1040 form from 1996-2008 on which the STAR student was claimed as a dependent. Most matches were found on 1040 forms for the tax year 1996, when the STAR children were 16. We identify parents for 86% of the STAR students in ou
