
NCEE 2009-4039
U.S. DEPARTMENT OF EDUCATION

Reading First Impact Study
Final Report
Executive Summary

Reading First Impact Study
Final Report
Executive Summary

NOVEMBER 2008

Beth C. Gamse, Project Director, Abt Associates
Robin Tepper Jacob, Abt Associates/University of Michigan
Megan Horst, Abt Associates
Beth Boulay, Abt Associates
Fatih Unlu, Abt Associates

Laurie Bozzi
Linda Caswell
Chris Rodger
W. Carter Smith
Abt Associates

Nancy Brigham
Sheila Rosenblum
Rosenblum Brigham Associates

With the assistance of
Howard Bloom
Yequin He
Corinne Herlihy
James Kemple
Don Laliberty
Ken Lam
Kenyon Maree
Rachel McCormick
Rebecca Unterman
Pei Zhu

NCEE 2009-4039
U.S. DEPARTMENT OF EDUCATION

This report was prepared for the Institute of Education Sciences under Contract No. ED-01-CO0093/0004. The project officer was Tracy Rimdzius in the National Center for Education Evaluation and Regional Assistance.

U.S. Department of Education
Margaret Spellings, Secretary

Institute of Education Sciences
Grover J. Whitehurst, Director

National Center for Education Evaluation and Regional Assistance
Phoebe Cottingham, Commissioner

November 2008

This report is in the public domain. Authorization to reproduce it in whole or in part is granted. While permission to reprint this publication is not necessary, the citation should be: Gamse, B.C., Jacob, R.T., Horst, M., Boulay, B., and Unlu, F. (2008). Reading First Impact Study Final Report Executive Summary (NCEE 2009-4039). Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education.

IES evaluation reports present objective information on the conditions of implementation and impacts of the programs being evaluated. IES evaluation reports do not include conclusions or recommendations or views with regard to actions policymakers or practitioners should take in light of the findings in the reports.

To order copies of this report:

- Write to ED Pubs, Education Publications Center, U.S. Department of Education, P.O. Box 1398, Jessup, MD 20794-1398.
- Call in your request toll free to 1-877-4ED-Pubs. If 877 service is not yet available in your area, call 800-872-5327 (800-USA-LEARN). Those who use a telecommunications device for the deaf (TDD) or a teletypewriter (TTY) should call 800-437-0833.
- Fax your request to 301-470-1244.
- Order online at www.edpubs.org.

This report also is available on the IES website at http://ncee.ed.gov.

Alternate Formats
Upon request, this report is available in alternate formats such as Braille, large print, audiotape, or computer diskette. For more information, please contact the Department's Alternate Format Center at 202-260-9895 or 202-205-8113.

Acknowledgements

The Reading First Impact Study Team would like to thank the students, faculty, and staff in the study's participating schools and districts. Their contributions to the study (via assessments, observations, surveys, and more) are deeply appreciated. We are the beneficiaries of their generosity of time and spirit.

The listed authors of this report represent only a small part of the team involved in this project. We would like to acknowledge the support of staff from Computer Technology Services (for the study's data collection website), from DataStar (for data entry), from MDRC, from Retail Solutions at Work (and the hundreds of classroom observers who participated in intensive training and data collection activities), from Paladin Pictures (for developing training videos for classroom observations), from RMC Research (especially Chris Dwyer, for help on developing instruments and on training observers), from Rosenblum Brigham Associates (for district site visits), from Westat (Sherry Sanborne and Alex Ratnofsky, for managing the student assessment, and the Student Assessment Coordinators and test administrators), and from Westover (Wanda Camper, LaKisha Dyson, and Pamela Wallace for helping with meeting logistics).

The study has also benefited from both external and internal technical advisors, including:

External Advisors
Josh Angrist
David Card
Robert Brennan
Thomas Cook*
Jack Fletcher*
David Francis
Larry Hedges*
Robinson Hollister*
Guido Imbens
Brian Jacob
David Lee
Sean Reardon
Tim Shanahan*
Judy Singer
Jeff Smith
Faith Stevens*
Petra Todd
Wilbert Van der Klaauw
Sharon Vaughn*

Internal Advisors
Steve Bell (A)
Gordon Berlin (M)
Nancy Burstein (A)
Fred Doolittle (M)
Barbara Goodson (A)
John Hutchins (M)
Jacob Klerman (A)
Marc Moss (A)
Chuck Michalopoulous (M)
Larry Orr (A)
Cris Price (A)
Janet Quint (M)
Howard Rolston (A)

(A—Abt Associates)
(M—MDRC)
* Individuals who have served on the study's Technical Work Group

We also want to recognize the steady contributions of Abt-SRBI staff, including Brenda Rodriguez, Fran Coffey, Kay Ely, Joanne Melton, Judy Meyer, Lynn Reneau, Davyd Roskilly, Jon Schmalz, Estella Sena, and Judy Walker, who were instrumental in completing multiple data collections, and Eileen Fahey, Katheleen Linton, and Jan Nicholson for countless hours of production support. Finally, we want to acknowledge Diane Greene, whose wisdom helped us all.

Disclosure of Potential Conflicts of Interest¹

The research team for this evaluation consists of a prime contractor, Abt Associates, and two major subcontractors, MDRC and Westat. None of these organizations or their key staff has financial interests that could be affected by findings from the Reading First Impact Study. No one on the Technical Work Group, convened to provide advice and guidance, has financial interests that could be affected by findings from the evaluation.

¹ Contractors carrying out research and evaluation projects for IES frequently need to obtain expert advice and technical assistance from individuals and entities whose other professional work may not be entirely independent of or separable from the particular tasks they are carrying out for the IES contractor. Contractors endeavor not to put such individuals or entities in positions in which they could bias the analysis and reporting of results, and their potential conflicts of interest are disclosed.

Executive Summary

This report presents findings from the third and final year of the Reading First Impact Study (RFIS), a congressionally mandated evaluation of the federal government's $1.0 billion-per-year initiative to help all children read at or above grade level by the end of third grade. The No Child Left Behind Act of 2001 (PL 107-110, Title I, Part B, Subpart 1) established Reading First (RF) and mandated its evaluation. This evaluation is being conducted by Abt Associates and MDRC with collaboration from RMC Research, Rosenblum-Brigham Associates, Westat, Computer Technology Services, DataStar, Field Marketing Incorporated, and Westover Consulting, under the oversight of the U.S. Department of Education, Institute of Education Sciences (IES).

This report examines the impact of Reading First funding on 248 schools in 13 states and includes 17 school districts and one statewide program for a total of 18 sites. The study includes data from three school years: 2004-05, 2005-06, and 2006-07.

The Reading First Impact Study was commissioned to address the following questions:

1) What is the impact of Reading First on student reading achievement?
2) What is the impact of Reading First on classroom instruction?
3) What is the relationship between the degree of implementation of scientifically based reading instruction and student reading achievement?

The primary measure of student reading achievement was the Reading Comprehension subtest from the Stanford Achievement Test—10 (SAT 10), given to students in grades one, two, and three. A secondary measure of student reading achievement in decoding was given to students in first grade. The measure of classroom reading instruction was derived from direct observations of reading instruction, and measures of program implementation were derived from surveys of educational personnel.
Findings related to the first two questions are based on results pooled across the study's three years of data collection (2004-05, 2005-06, and 2006-07) for classroom instruction and reading comprehension, results from first grade students in one school year (spring 2007) for decoding, and aspects of program implementation from spring 2007 surveys. Key findings are as follows:

- Reading First produced a positive and statistically significant impact on the amount of instructional time spent on the five essential components of reading instruction promoted by the program (phonemic awareness, phonics, vocabulary, fluency, and comprehension) in grades one and two. The impact was equivalent to an effect size of 0.33 standard deviations in grade one and 0.46 standard deviations in grade two.

- Reading First produced positive and statistically significant impacts on multiple practices that are promoted by the program, including professional development in scientifically based reading instruction (SBRI), support from full-time reading coaches, amount of reading instruction, and supports available for struggling readers.

- Reading First did not produce a statistically significant impact on student reading comprehension test scores in grades one, two, or three.

- Reading First produced a positive and statistically significant impact on decoding among first grade students tested in one school year (spring 2007). The impact was equivalent to an effect size of 0.17 standard deviations.

Results are also presented from exploratory analyses that examine some hypotheses about factors that might account for the observed patterns of impacts. These analyses are considered exploratory because the study was not designed to provide a rigorous test of these hypotheses, and therefore the results must be considered suggestive. Across different potential predictors of student outcomes, these exploratory analyses are based on different subgroups of students, schools, grade levels, and/or years of data collection. Key findings from these exploratory analyses are as follows:

- There was no consistent pattern of effects over time in the impact estimates for reading instruction in grade one or in reading comprehension in any grade. There appeared to be a systematic decline in reading instruction impacts in grade two over time.

- There was no relationship between reading comprehension and the number of years a student was exposed to RF.

- There was no statistically significant site-to-site variation in impacts, either by grade or overall, for classroom reading instruction or student reading comprehension.

- There was a positive association between time spent on the five essential components of reading instruction promoted by the program and reading comprehension as measured by the SAT 10, but these findings are sensitive to both model specification and the sample used to estimate the relationship.

The Reading First Program

Reading First promotes instructional practices that have been validated by scientific research (No Child Left Behind Act, 2001). The legislation explicitly defines scientifically based reading research and outlines the specific activities state, district, and school grantees are to carry out based upon such research (No Child Left Behind Act, 2001).
The Guidance for the Reading First Program provides further detail to states about the application of research-based approaches in reading (U.S. Department of Education, 2002). Reading First funding can be used for:

- Reading curricula and materials that focus on the five essential components of reading instruction as defined in the Reading First legislation: 1) phonemic awareness, 2) phonics, 3) vocabulary, 4) fluency, and 5) comprehension;

- Professional development and coaching for teachers on how to use scientifically based reading practices and how to work with struggling readers; and

- Diagnosis and prevention of early reading difficulties through student screening, interventions for struggling readers, and monitoring of student progress.

Reading First is an ambitious federal program, yet it is also a funding stream that combines local flexibility and national commonalities. The commonalities are reflected in the guidelines to states, districts, and schools about allowable uses of resources. The flexibility is reflected in two ways: first, states (and districts) could allocate resources to various categories within target ranges rather than on a strictly formulaic basis, and second, states could make local decisions about the specific choices within given categories (e.g., which materials, reading programs, assessments, professional development providers,

etc.). The activities, programs, and resources that were likely to be implemented across states and districts would therefore reflect both national priorities and local interpretations.

Reading First grants were made available to states between July 2002 and September 2003. By April 2007, states had awarded subgrants to 1,809 school districts, which had provided funds to 5,880 schools.² Districts and schools with the greatest demonstrated need, in terms of student reading proficiency and poverty status, were intended to have the highest funding priority (U.S. Department of Education, 2002). States could reserve up to 20 percent of their Reading First funds to support staff development, technical assistance to districts and schools, and planning, administration, and reporting. According to the program guidance, this funding provided "States with the resources and opportunity to improve instruction beyond the specific districts and schools that receive Reading First subgrants" (U.S. Department of Education, 2002). Districts could reserve up to 3.5 percent of their Reading First funds for planning and administration (No Child Left Behind Act, 2001). For the purposes of this study, Reading First is defined as the receipt of Reading First funding at the school level.

The Reading First Impact Study

Research Design

The Reading First Impact Study uses a regression discontinuity design that capitalizes on the systematic processes some school districts used to allocate Reading First funds once their states had received RF grants.³ A regression discontinuity design is the strongest quasi-experimental method available for estimating program impacts; under certain conditions, all of which are met by the present study, it can produce unbiased estimates of program impacts.
Within each district or site:

1) Schools eligible for Reading First grants were rank-ordered for funding based on a quantitative rating, such as an indicator of past student reading performance or poverty;⁴

2) A cut-point in the rank-ordered priority list separated schools that did or did not receive Reading First grants, and this cut-point was set without knowing which schools would then receive funding; and

3) Funding decisions were based only on whether a school's rating was above or below its local cut-point; nothing superseded these decisions.

² Data were obtained from the SEDL website (www.sedl.org/readingfirst).
³ Appendix A in the full report indicates when study sites first received their Reading First grants.
⁴ Each study site could (and did) use different metrics to rate or rank schools; it is not necessary for all study sites to use the same metric.

Also, assuming that the shape of the relationship between schools' ratings and outcomes is correctly modeled, once the above conditions have been met, there should be no systematic differences between eligible schools that did and did not receive Reading First grants (Reading First and non-Reading First schools, respectively), except for the characteristics associated with the school ratings used to determine funding decisions. Controlling for differences in schools' ratings allows one to control statistically for all systematic pre-existing differences between the two groups. One can then estimate the impact of Reading First by comparing the outcomes for Reading First schools and non-Reading First schools in the study

sample, controlling for differences in their ratings. Non-Reading First schools in a regression discontinuity analysis thereby play the same role as control schools in a randomized experiment: it is their regression-adjusted outcomes that represent the best indication of what outcomes would have been for the treatment group (in this instance, Reading First schools) in the absence of the program being evaluated.

Study Sample

The study sample was selected purposively to meet the requirements of the regression discontinuity design, by identifying sites that had used a systematic rating or ranking process to select their Reading First school grantees. Within these sites, the selection of schools focused on schools as close to the site-specific cut-points as possible, in order to obtain treatment and comparison groups that were as comparable as possible.

The study sample includes 18 study sites: 17 school districts and one statewide program. Sixteen districts and one statewide program were selected from among 28 districts and one statewide program that had demonstrably met the three criteria listed above. One other school district agreed to randomly assign some of its eligible schools to Reading First or a control group. The final selection reflected wide variation in district characteristics and provided enough schools to meet the study's sample size requirements. The regression discontinuity sites provide 238 schools for the analysis, and the randomized experimental site provides 10 schools. Half the schools at each site are Reading First schools and half are non-Reading First schools: in three sites, the study sample includes all the RF schools in that site; in the remaining 15 sites, the study sample includes some, but not all, of the RF schools.

At the same time, the study deliberately endeavored to obtain a sample that was geographically diverse and as similar as possible to the population of all RF schools.
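The estimation logic described above (compare funded and unfunded schools while controlling statistically for the rating that determined funding) can be sketched in a few lines of code. The data below are simulated and the linear model is deliberately minimal; the study's actual impact models are more elaborate, and all variable names here are illustrative.

```python
# Minimal sketch of a sharp regression discontinuity estimate.
# Hypothetical, simulated data; not the study's actual model or data.
import numpy as np

rng = np.random.default_rng(0)
n = 248  # number of schools, as in the study sample

# 1) Each school receives a quantitative rating (e.g., prior reading
#    performance or poverty), which determines funding priority.
rating = rng.normal(0.0, 1.0, n)

# 2) A cut-point splits the ranked list; schools at or above it get RF funds.
cutoff = 0.0
treated = (rating >= cutoff).astype(float)

# 3) Outcomes depend smoothly on the rating, plus a simulated program impact.
true_impact = 5.0
outcome = 600 + 10 * rating + true_impact * treated + rng.normal(0, 8, n)

# Estimate the impact by least squares, controlling linearly for the rating:
#   outcome = b0 + b1 * treated + b2 * rating + error
# b1 is the regression discontinuity impact estimate.
X = np.column_stack([np.ones(n), treated, rating])
beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
impact_estimate = beta[1]
```

Because treatment is a deterministic function of the rating, controlling for the rating removes the systematic pre-existing differences between funded and unfunded schools, so the coefficient on `treated` recovers the program impact (up to sampling error).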
The final study sample of 248 schools, 125 of which are Reading First schools, represents 44 percent of the Reading First schools in their respective sites (at the time the study selected its sample in 2004). The study's sample of RF schools is large, is quite similar to the population of all RF schools, is geographically diverse, and represents states (and districts) that received their RF grants across the range of RF state award dates. The average Year 1 grant for RF schools in the study sample ranged from about $81,790 to $708,240, with a mean of $188,782. This translates to an average of $601 per RF student. For more detailed information about the selection process and the study sample, see the study's Interim Report (Gamse, Bloom, Kemple & Jacob, 2008).

Data Collection Schedule and Measures

Exhibit ES.1 summarizes the study's three-year, multi-source data collection plan. The present report is based on data for school years 2004-05, 2005-06, and 2006-07. Data collection included student assessments in reading comprehension and decoding, and classroom observations of teachers' instructional practices in reading, teachers' instructional organization and order, and students' engagement with print. Data were also collected through surveys of teachers, reading coaches, and principals, and interviews of district personnel.

Exhibit ES.1: Data Collection Schedule for the Reading First Impact Study

                                        2004-05        2005-06        2006-07
Data Collection Elements              Fall  Spring   Fall  Spring   Fall  Spring
Student Assessments
  Stanford Achievement Test, 10th
  Edition (SAT 10)                     ✓     ✓             ✓              ✓
  Test of Silent Word Reading
  Fluency (TOSWRF)                                                        ✓
Classroom Observations
  Instructional Practice in Reading
  Inventory (IPRI)                           ✓       ✓     ✓        ✓     ✓
  Student Time-on-Task and
  Engagement with Print (STEP)                       ✓     ✓        ✓     ✓
  Global Appraisal of Teaching
  Strategies (GATS)                                  ✓     ✓        ✓     ✓
Teacher, Principal, Reading Coach
Surveys                                      ✓                            ✓
District Staff Interviews                    ✓                            ✓

Exhibit ES.2 lists the principal domains for the study, the outcome measures within each domain, and the data sources for each measure. These include:

Student reading performance, assessed with the reading comprehension subtest of the Stanford Achievement Test, 10th Edition (SAT 10, Harcourt Assessment, Inc., 2004). The SAT 10 was administered to students in grades one, two, and three during fall 2004, spring 2005, spring 2006, and spring 2007, with an average completion rate of 83 percent across all administrations. In the spring of 2007 only, first grade students were assessed with the Test of Silent Word Reading Fluency (TOSWRF, Mather et al., 2004), a measure designed to assess students' ability to decode words from among strings of letters. The average completion rate was 86 percent. Three outcome measures of student reading performance were created from SAT 10 and TOSWRF data.

Classroom reading instruction, assessed in first-grade and second-grade reading classes through an observation system developed by the study team, called the Instructional Practice in Reading Inventory (IPRI). Observations were conducted during scheduled reading blocks in each sampled classroom on two consecutive days during each wave of data collection: spring 2005, fall 2005 and spring 2006, and fall 2006 and spring 2007. The average completion rate was 98 percent across all years.
The IPRI, which is designed to record instructional behaviors in a series of three-minute intervals, can be used for observations of varying lengths, reflecting the fact that schools' defined reading blocks can and do vary. Most reading blocks are 90 minutes or more. Eight outcome measures of classroom reading instruction were created from IPRI data to represent the components of reading instruction emphasized by the Reading First legislation.⁵ Six of these measures are reported in terms of the amount of time spent on the various dimensions of instruction. Two of these measures are reported in terms of the proportion of the intervals within each observation.

⁵ For ease of explication, the measures created from IPRI data are referred to as the five dimensions of reading instruction (or "the five dimensions") throughout the report. References to the programmatic emphases as required by legislation are labeled as the five essential components of reading instruction.

Exhibit ES.2: Description of Domains, Outcome Measures, and Data Sources Utilized in the Reading First Impact Study

Domain: Student reading achievement
- Mean scaled scores for 1st, 2nd, and 3rd grade students, represented as a continuous measure of student reading comprehension. Because scaled scores are continuous across grade levels, values for all three grade levels can be shown on a single set of axes. (Source: Stanford Achievement Test, 10th Edition (SAT 10))
- Percentage of 1st, 2nd, and 3rd grade students at or above grade level, based upon established test norms that correspond to grade level performance, by grade and month. The on or above grade level performance percentages were based on the start of the school year, date of the test and the scaled score, as well as the related grade equivalent. (Source: Stanford Achievement Test, 10th Edition (SAT 10))
- Mean standard scores for 1st grade students, represented as a continuous measure of first grade students' decoding skill. (Source: Test of Silent Word Reading Fluency)

Domain: Classroom instruction
- Minutes of instruction in phonemic awareness, or how much instructional time 1st and 2nd grade teachers spent on phonemic awareness. (Source: RFIS Instructional Practice in Reading Inventory (IPRI))
- Minutes of instruction in phonics, or how much instructional time 1st and 2nd grade teachers spent on phonics. (Source: RFIS IPRI)
- Minutes of instruction in fluency building, or how much instructional time 1st and 2nd grade teachers spent on fluency building. (Source: RFIS IPRI)
- Minutes of instruction in vocabulary development, or how much instructional time 1st and 2nd grade teachers spent on vocabulary development. (Source: RFIS IPRI)
- Minutes of instruction in comprehension, or how much instructional time 1st and 2nd grade teachers spent on comprehension of connected text. (Source: RFIS IPRI)
- Minutes of instruction in all five dimensions combined, or how much instructional time 1st and 2nd grade teachers spent on all five dimensions combined. (Source: RFIS IPRI)
- Proportion of each observation with highly explicit instruction, or the proportion of time spent within the five dimensions when teachers used highly explicit instruction (e.g., instruction included teacher modeling, clear explanations, and the use of examples). (Source: RFIS IPRI)
- Proportion of each observation with high quality student practice, or the proportion of time spent within the five dimensions when teachers provided students with high quality student practice opportunities (e.g., teachers asked students to practice such word learning strategies as context, word structure, and meanings). (Source: RFIS IPRI)

Domain: Student engagement with print
- Percentage of 1st and 2nd grade students engaged with print, represented as the per-classroom average of the percentage of students engaged with print across three sweeps in each classroom during observed reading instruction. (Source: RFIS Student Time-on-Task and Engagement with Print (STEP))

Domain: Professional development in scientifically based reading instruction
- Amount of PD in reading received by teachers, or teachers' self-reported number of hours of professional development in reading during 2006-07. (Source: RFIS Teacher Survey)
- Teacher receipt of PD in the five essential components of reading instruction, or the number of essential components teachers reported were covered in professional development they received during 2006-07. (Source: RFIS Teacher Survey)
- Teacher receipt of coaching, or whether or not a teacher reported receiving coaching or mentoring from a reading coach in reading programs, materials, or strategies in 2006-07. (Source: RFIS Teacher Survey)
- Amount of time dedicated to serving as K-3 reading coach, or reading coaches' self-reported percentage of time spent as the K-3 reading coach for their school in 2006-07. (Source: RFIS Reading Coach Survey)

Domain: Amount of reading instruction
- Minutes of reading instruction per day, or teachers' reported average amount of time devoted to reading instruction per day over the prior week. (Source: RFIS Teacher Survey)

Domain: Supports for struggling readers
- Availability of differentiated instructional materials for struggling readers, or whether or not schools reported that specialized instructional materials beyond the core reading program were available for struggling readers. (Source: RFIS Reading Coach and Principal Surveys)
- Provision of extra classroom practice for struggling readers, or the number of dimensions in which teachers reported providing extra practice opportunities for struggling students in the past month. (Source: RFIS Teacher Survey)

Domain: Use of assessments
- Use of assessments to inform classroom practice, or the number of instructional purposes for which teachers reported using assessment results. (Source: RFIS Teacher Survey)

Student engagement with print. Beginning in fall 2005, the study conducted classroom observations using the Student Time-on-Task and Engagement with Print (STEP) instrument to measure the percentage of students engaged in academic work who are reading or writing print. The STEP observation was completed by recording a time-sampled "snapshot" of student engagement three times in each observed classroom, for a total of three such "sweeps" during each STEP observation. The STEP was used to observe classrooms in fall 2005, spring 2006, fall 2006, and spring 2007, with an average completion rate of 98 percent across all years. One outcome measure was created using STEP data.

Professional development in scientifically based reading instruction, amount of reading instruction, supports for struggling readers, and use of assessments. Within these four domains, eight outcome measures were created based on data from surveys of principals, reading coaches, and teachers about school and classroom resources. The eight outcome measures represent aspects of scientifically based reading instruction promoted in the Reading First legislation and guidance. Surveys were fielded in spring 2005 and again in spring 2007, with an average completion rate across all respondents of 73 percent in spring 2005 and 86 percent in spring 2007. This final report includes findings from 2007 surveys only.
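As a rough illustration of how interval-coded observations can become the time-based measures described above, the sketch below tallies minutes of instruction per dimension from IPRI-style three-minute intervals. The data structures and tallying rules here are illustrative assumptions only, not the study's actual coding procedures.

```python
# Hypothetical sketch: turning interval-coded observation data into
# minutes-of-instruction measures. The IPRI records behaviors in
# three-minute intervals; here each interval lists the dimensions observed.
INTERVAL_MINUTES = 3

# One observed reading block, coded interval by interval (illustrative data).
observation = [
    {"phonics", "phonemic_awareness"},
    {"phonics"},
    {"vocabulary"},
    {"comprehension"},
    {"comprehension", "vocabulary"},
]

DIMENSIONS = ["phonemic_awareness", "phonics", "fluency",
              "vocabulary", "comprehension"]

# Minutes per dimension: credit three minutes for each interval in which
# the dimension was coded.
minutes = {d: INTERVAL_MINUTES * sum(d in interval for interval in observation)
           for d in DIMENSIONS}

# Minutes on all five dimensions combined: here, intervals in which at least
# one dimension was coded (an assumed rule; overlapping dimensions count once).
combined = INTERVAL_MINUTES * sum(bool(interval & set(DIMENSIONS))
                                  for interval in observation)
```

The same per-interval structure would also support the proportion-based measures (e.g., the share of coded intervals rated as highly explicit instruction).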

Additional data were collected by the study team in order to create measures used in correlational analyses. These data include:

The Global Appraisal of Teaching Strategies (GATS), a 12-item checklist designed to measure teachers' instructional strategies related to overall instructional organization and order, is adapted from The Checklist of Teacher Competencies (Foorman and Schatschneider, 2003). Unlike the IPRI, which focuses on discrete teacher behaviors, the GATS was designed to capture global classroom management and environmental factors. Items covered topics such as the teacher's organization of materials, lesson delivery, responsiveness to students, and behavior management. The GATS was completed by the classroom observer immediately after each IPRI observation, meaning that each sampled classroom was rated on the GATS twice in the fall and twice in the spring in both the 2005-2006 school year and the 2006-2007 school year. The GATS was fielded in fall 2005, spring 2006, fall 2006, and spring 2007, with an average completion rate of over 99 percent. A single measure from the GATS data was created for use in correlational analyses.

Average Impacts on Classroom Reading Instruction, Key Components of Scientifically Based Reading Instruction, and Student Reading Achievement

Exhibit ES.3 reports average impacts on classroom reading instruction and student reading comprehension pooled across school years 2004-05, 2005-06, and 2006-07.⁶ Exhibit ES.4 reports average impacts on key components of scientifically based reading instruction from spring 2007. Exhib
