Cognitive Difficulties in Struggling Comprehenders and their Relation to Reading Comprehension: A Comparison of Group Selection and Regression-Based Models
Difficulties suppressing previously encountered but currently irrelevant information from working memory characterize less skilled comprehenders in studies in which they are matched to skilled comprehenders on word decoding and nonverbal IQ. These "extreme" group designs are associated with several methodological issues. When sample size permits, regression approaches permit a more accurate estimation of effects. Using data for students in grades 6 to 12 (N = 766), regression techniques assessed the significance and size of the relation of suppression to reading comprehension across the distribution of comprehension skill. After accounting for decoding efficiency and nonverbal IQ, suppression, measured by performance on a verbal proactive interference task, accounted for a small amount of significant unique variance in comprehension (less than 1%). A comparison of suppression in less skilled comprehenders matched to more skilled comprehenders (48 per group) on age, word reading efficiency and nonverbal IQ did not show significant group differences in suppression. The implications of the findings for theories of reading comprehension and for informing comprehension assessment and intervention are discussed.
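The "unique variance" reported above is typically read as the increase in R² when the predictor of interest is entered after the control variables. The following is a minimal sketch of that calculation on synthetic data; the coefficients, variable names, and helper functions are illustrative assumptions and are not taken from the study.

```python
import numpy as np

def r_squared(X, y):
    """R^2 from an ordinary least-squares fit of y on X (intercept added)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - resid.var() / y.var()

def unique_variance(controls, predictor, y):
    """Increase in R^2 when `predictor` is added after `controls`."""
    r2_reduced = r_squared(controls, y)
    r2_full = r_squared(np.column_stack([controls, predictor]), y)
    return r2_full - r2_reduced

# Synthetic data only: the coefficients below are arbitrary assumptions.
rng = np.random.default_rng(0)
n = 766
decoding = rng.normal(size=n)
nonverbal_iq = rng.normal(size=n)
suppression = rng.normal(size=n)
comprehension = (0.5 * decoding + 0.4 * nonverbal_iq
                 + 0.08 * suppression + rng.normal(size=n))

controls = np.column_stack([decoding, nonverbal_iq])
delta_r2 = unique_variance(controls, suppression, comprehension)
print(f"Delta R^2 for suppression: {delta_r2:.4f}")
```

Because the two models are nested, the increment cannot be negative; whether it is statistically significant is a separate question that depends on sample size.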
Reducing Inequality in Academic Success for Incoming College Students: A Randomized Trial of Growth Mindset and Belonging Interventions
Light-touch social psychological interventions have gained considerable attention for their potential to improve academic outcomes for underrepresented and/or disadvantaged students in postsecondary education. While findings from previous interventions have demonstrated positive effects for racial and ethnic minority and first-generation students in small samples, few interventions have been implemented at a larger scale with more heterogeneous student populations. To address this research gap, 7,686 students, representing more than 90% of incoming first-year students at a large Midwestern public university, were randomly assigned to an online growth mindset intervention, social belonging intervention, or a comparison group. Results suggest that after the fall semester, the growth mindset intervention significantly improved grade point averages for Latino/a students by about .40 points. This represents a 72% reduction in the GPA gap between White and Latino/a students. Further, this effect was replicated for both spring semester GPA and cumulative GPA. These findings indicate that light-touch interventions may be a minimally invasive approach to improving academic outcomes for underrepresented students. Our findings also highlight the complexity of implementing customized belonging interventions in heterogeneous contexts.
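As a back-of-the-envelope reading of the two figures above, the baseline gap can be inferred from the effect size and the share of the gap it closed. This sketch assumes the 72% figure is simply the effect divided by the comparison-group gap; the study's own calculation may be regression-adjusted and could differ. The function name is hypothetical.

```python
def implied_gap(effect, share_closed):
    """Baseline gap implied when `effect` closes `share_closed` of it."""
    return effect / share_closed

effect = 0.40        # approximate GPA gain reported in the abstract
share_closed = 0.72  # reported share of the gap that was closed

gap_before = implied_gap(effect, share_closed)
gap_after = gap_before - effect
print(f"implied gap before: {gap_before:.2f}, remaining: {gap_after:.2f}")
```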
Persistence and Fadeout in the Impacts of Child and Adolescent Interventions
Many interventions targeting cognitive skills or socioemotional skills and behaviors demonstrate initially promising but then quickly disappearing impacts. Our paper seeks to identify the key features of interventions, as well as the characteristics and environments of the children and adolescents who participate in them, that can be expected to sustain persistently beneficial program impacts. We describe three such processes: skill-building, foot-in-the-door and sustaining environments. We argue that skill-building interventions should target "trifecta" skills - ones that are malleable, fundamental, and would not have developed eventually in the absence of the intervention. Successful foot-in-the-door interventions equip a child with the right skills or capacities at the right time to avoid imminent risks (e.g., grade failure or teen drinking) or seize emerging opportunities (e.g., entry into honors classes). The sustaining environments perspective views high quality of environments subsequent to the completion of the intervention as crucial for sustaining early skill gains. These three perspectives generate both complementary and competing hypotheses regarding the nature, timing and targeting of interventions that generate enduring impacts.
Does Early Mathematics Intervention Change the Processes Underlying Children's Learning?
Early educational intervention effects typically fade in the years following treatment, and few studies have investigated why achievement impacts diminish over time. The current study tested the effects of a preschool mathematics intervention on two aspects of children's mathematical development. We tested for separate effects of the intervention on "state" (occasion-specific) and "trait" (relatively stable) variability in mathematics achievement. Results indicated that, although the treatment had a large impact on state mathematics, the treatment had no effect on trait mathematics, or the aspect of mathematics achievement that influences stable individual differences in mathematics achievement over time. Results did suggest, however, that the intervention could affect the underlying processes in children's mathematical development by inducing more transfer of knowledge immediately following the intervention for students in the treated group.
Developing Optimized Adaptive Interventions in Education
Hedges (2018) encourages us to consider asking new scientific questions concerning the optimization of adaptive interventions in education. In this commentary, we have expanded on this (albeit briefly) by providing concrete examples of scientific questions and associated experimental designs to optimize adaptive interventions, and commenting on some of the ways such designs might challenge us to think differently. A great deal of methodological work remains to be done. For example, we have only begun to consider experimental design and analysis methods for developing "cluster-level adaptive interventions" (NeCamp, Kilbourne, & Almirall, 2017), or to extend methods for comparing the marginal mean trajectories between the adaptive interventions embedded in a SMART (Lu et al., 2016) to accommodate random effects. These methodological advances, among others, will propel educational research concerning the construction of more complex, yet meaningful, interventions that are necessary for improving student and teacher outcomes.
Using a Multi-Site RCT to Predict Impacts for a Single Site: Do Better Data and Methods Yield More Accurate Predictions?
Multi-site randomized controlled trials (RCTs) provide unbiased estimates of the average impact in the study sample. However, their ability to accurately predict the impact for individual sites outside the study sample, to inform local policy decisions, is largely unknown. To extend prior research on this question, we analyzed six multi-site RCTs and tested modern prediction methods, namely lasso regression and Bayesian Additive Regression Trees (BART), using a wide range of moderator variables. The main study findings are that: (1) all of the methods yielded accurate impact predictions when the variation in impacts across sites was close to zero (as expected); (2) none of the methods yielded accurate impact predictions when the variation in impacts across sites was substantial; and (3) BART typically produced "less inaccurate" predictions than lasso regression or the Sample Average Treatment Effect. These results raise concerns that when the impact of an intervention varies considerably across sites, statistical modelling using the data commonly collected by multi-site RCTs will be insufficient to explain the variation in impacts across sites and accurately predict impacts for individual sites.
Exploring the Impact of Student Teaching Apprenticeships on Student Achievement and Mentor Teachers
We exploit within-teacher variation in the years that math and reading teachers in grades 4-8 host an apprentice ("student teacher") in Washington State to estimate the causal effect of these apprenticeships on student achievement, both during the apprenticeship and afterwards. While the average causal effect of hosting a student teacher on student performance in the year of the apprenticeship is indistinguishable from zero in both math and reading, hosting a student teacher is found to have modest positive impacts on student math and reading achievement in a teacher's classroom in following years. These findings suggest that schools and districts can participate in the student teaching process without fear of short-term decreases in student test scores while potentially gaining modest long-term test score increases.
Teaching for All? Teach For America's Effects across the Distribution of Student Achievement
This paper examines the effect of Teach For America (TFA) on the distribution of student achievement in elementary school. It extends previous research by estimating quantile treatment effects (QTE) to examine how student achievement in TFA and non-TFA classrooms differs across the broader distribution of student achievement. It also updates prior distributional work on TFA by correcting for previously unidentified missing data and estimating unconditional, rather than conditional QTE. Consistent with previous findings, results reveal a positive impact of TFA teachers across the distribution of math achievement. In reading, however, relative to veteran non-TFA teachers, students at the bottom of the reading distribution score worse in TFA classrooms, and students in the upper half of the distribution perform better.
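Under random assignment, an unconditional quantile treatment effect can be read as the difference between the treated and control outcome distributions at a given quantile. The following is a minimal sketch on synthetic data; the paper's actual estimator is not described in the abstract and may involve weighting or covariate adjustment, and the function name and simulated scores are assumptions.

```python
import numpy as np

def unconditional_qte(treated, control, quantiles):
    """Difference between treated and control outcome quantiles."""
    q = np.asarray(quantiles)
    return np.quantile(treated, q) - np.quantile(control, q)

# Synthetic standardized scores, not data from the study.
rng = np.random.default_rng(1)
control = rng.normal(0.0, 1.0, size=5000)
# Hypothetical treated group: small mean shift and more spread.
treated = rng.normal(0.1, 1.2, size=5000)

quantiles = [0.1, 0.25, 0.5, 0.75, 0.9]
qte = unconditional_qte(treated, control, quantiles)
for q, effect in zip(quantiles, qte):
    print(f"q={q:.2f}: QTE={effect:+.3f}")
```

In this simulated case the wider treated distribution produces negative effects at the bottom and positive effects at the top, the kind of pattern an average treatment effect alone would hide.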
Why Does a Growth Mindset Intervention Impact Achievement Differently across Secondary Schools? Unpacking the Causal Mediation Mechanism from a National Multisite Randomized Experiment
The growth mindset, or the belief that intelligence is malleable, has garnered significant attention for its positive association with academic success. Several recent randomized trials, including the National Study of Learning Mindsets (NSLM), have been conducted to understand why, for whom, and under what contexts a growth mindset intervention can promote beneficial achievement outcomes during critical educational transitions. Prior research suggests that the NSLM intervention was particularly effective in improving low-achieving 9th graders' GPA, while the impact varied across schools. In this study, we investigated the underlying causal mediation mechanism that might explain this impact and how the mechanism varied across different types of schools. By extending a recently developed weighting method for multisite causal mediation analysis, the analysis enhances the external and internal validity of the results. We found that challenge-seeking behavior played a significant mediating role only in medium-achieving schools, which may partly explain why the intervention worked differently across schools. We conclude by discussing implications for designing interventions that not only promote students' growth mindsets but also foster supportive learning environments under different school contexts.
Replication and extension of a family-based training program to improve cognitive abilities in young children
Childhood socioeconomic status (SES) is associated with persistent academic achievement gaps, which necessitates evidence-based, scalable interventions to improve children's outcomes. The present study reports results from a replication and extension of a family-based training program previously found to improve cognitive development in lower-SES preschoolers (Neville et al., 2013). One hundred and one primarily low-SES families with 107 children aged 4-7 years were randomly assigned to the intervention or passive control group. Intent-to-treat regression models revealed that children whose families were assigned to the intervention group did not exhibit significant benefit on composite measures of nonverbal IQ, executive functioning, or language skills, though post-hoc analyses suggested marginal improvement on the fluid reasoning subcomponent of nonverbal IQ. Treatment-on-treated models revealed a significant positive effect of intervention attendance on fluid reasoning and a negative effect on vocabulary. We discuss potential causes for the non-replication, including differences in the sample composition, size, and assessment choices. Results suggest the need to more broadly assess scalable interventions with varying populations and ensure appropriate cultural and geographical adaptations to achieve maximum benefits for children from diverse backgrounds.
Teacher Effects on Student Achievement and Height: A Cautionary Tale
We apply "value-added" models to estimate the effects of teachers on an outcome they cannot plausibly affect: student height. When fitting commonly estimated models to New York City data, we find that the standard deviation of teacher effects on height is nearly as large as that for math and reading, raising potential concerns about value-added estimates of teacher effectiveness. We consider two explanations: non-random sorting of students to teachers and idiosyncratic classroom-level variation. We cannot rule out sorting on unobservables, but find students are not sorted to teachers based on lagged height. The correlation in teacher effects estimates on height across years and the correlation between teacher effects on height and teacher effects on achievement are insignificant. The large estimated "effects" for height appear to be driven by year-to-year classroom by teacher variation that is not often separable from true effects in models commonly estimated in practice. Reassuringly for use of these models in research settings, models which disentangle persistent effects from transient classroom-level variation yield the theoretically expected effects of zero for teacher value added on height.
Design and Analytic Features for Reducing Biases in Skill-Building Intervention Impact Forecasts
Despite policy relevance, longer-term evaluations of educational interventions are relatively rare. A common approach to this problem has been to rely on longitudinal research to determine targets for intervention by looking at the correlation between children's early skills (e.g., preschool numeracy) and medium-term outcomes (e.g., first-grade math achievement). However, this approach has sometimes over- or under-predicted the long-term effects (e.g., 5th-grade math achievement) of successfully improving early math skills. Using a within-study comparison design, we assess various approaches to forecasting medium-term impacts of early math skill-building interventions. The most accurate forecasts were obtained when including comprehensive baseline controls and using a combination of conceptually proximal and distal short-term outcomes (in the nonexperimental longitudinal data). Researchers can use our approach to establish a set of designs and analyses to predict the impacts of their interventions up to two years post-treatment. The approach can also be applied to power analyses, model checking, and theory revisions to understand mechanisms contributing to medium-term outcomes.
Consistency between Household and County Measures of Onsite Schooling during the COVID-19 Pandemic
The academic, socioemotional, and health impacts of school policies throughout the COVID-19 pandemic have been a source of many questions that require accurate information about the extent of onsite schooling occurring. This article investigates school operational status datasets during the pandemic, comparing (1) self-report data collected nationally on the household level through a Facebook-based survey, (2) county-level school policy data, and (3) a school-level closure status dataset based on phone GPS tracking. The percentages of any onsite instruction within states and counties are compared across datasets from December 2020 to May 2021. Sources were relatively consistent at the state level and for large counties, but key differences were revealed between units of measurement, showing differences between policy and household decisions surrounding children's schooling experiences. The consistency levels across sources support the usage of each of the school policy sources to answer questions about the educational experiences, factors, and impacts related to K-12 education across the nation during the pandemic, but it remains vital to think critically as to which unit of measurement is most relevant to targeted research questions.
Characteristics of School Districts That Participate in Rigorous National Educational Evaluations
Given increasing interest in evidence-based policy, there is growing attention to how well the results from rigorous program evaluations may inform policy decisions. However, little attention has been paid to documenting the characteristics of schools or districts that participate in rigorous educational evaluations, and how they compare to potential target populations for the interventions that were evaluated. Utilizing a list of the actual districts that participated in 11 large-scale rigorous educational evaluations, we compare those districts to several different target populations of districts that could potentially be affected by policy decisions regarding the interventions under study. We find that school districts that participated in the 11 rigorous educational evaluations differ from the interventions' target populations in several ways, including size, student performance on state assessments, and location (urban/rural). These findings raise questions about whether, as currently implemented, the results from rigorous impact studies in education are likely to generalize to the larger set of school districts (and thus schools and students) of potential interest to policymakers, and how we can improve our study designs to retain strong internal validity while also enhancing external validity.
Teachers, Schools, and Pre-K Effect Persistence: An Examination of the Sustaining Environment Hypothesis
Latent Profiles of Reading and Language and Their Association With Standardized Reading Outcomes in Kindergarten Through Tenth Grade
The objective of this study was to determine the latent profiles of reading and language skills that characterized 7,752 students in kindergarten through tenth grade and to relate the profiles to norm-referenced reading outcomes. Reading and language skills were assessed with a computer-adaptive assessment administered in the middle of the year and reading outcome measures were administered at the end of the year. Three measures of reading comprehension were administered in third through tenth grades to create a latent variable. Latent profile analysis (LPA) was conducted on the reading and language measures and related to reading outcomes in multiple regression analyses. Within-grade multiple regressions were subjected to a linear step-up correction to control the false-discovery rate. LPA results revealed five to six profiles in the elementary grades and three in the secondary grades that were strongly related to standardized reading outcomes, with average absolute between-profile effect sizes ranging from 1.10 to 2.53. The profiles in the secondary grades followed a high, medium, and low pattern. Profiles in the elementary grades revealed more heterogeneity, suggestive of strategies for differentiating instruction.
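The linear step-up correction mentioned above is commonly known as the Benjamini-Hochberg procedure. A minimal sketch follows, assuming a false-discovery rate of 0.05; the p-values are made up for illustration and do not come from the study.

```python
import numpy as np

def benjamini_hochberg(p_values, q=0.05):
    """Return a boolean array marking hypotheses rejected at FDR level q."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                      # ranks p-values, smallest first
    thresholds = q * np.arange(1, m + 1) / m   # i/m * q for rank i
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])       # largest rank meeting its threshold
        reject[order[: k + 1]] = True          # reject that rank and all smaller ones
    return reject

# Illustrative p-values only.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
print(benjamini_hochberg(p_values, q=0.05))
```

The "step-up" character shows in the last step: every p-value ranked at or below the largest passing rank is rejected, even if it missed its own threshold.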
Distinctions without a difference? Preschool curricula and children's development
Assessing methods for generalizing experimental impact estimates to target populations
Randomized experiments are considered the gold standard for causal inference, as they can provide unbiased estimates of treatment effects for the experimental participants. However, researchers and policymakers are often interested in using a specific experiment to inform decisions about other target populations. In education research, increasing attention is being paid to the potential lack of generalizability of randomized experiments, as the experimental participants may be unrepresentative of the target population of interest. This paper examines whether generalization may be assisted by statistical methods that adjust for observed differences between the experimental participants and members of a target population. The methods examined include approaches that reweight the experimental data so that participants more closely resemble the target population and methods that utilize models of the outcome. Two simulation studies and one empirical analysis investigate and compare the methods' performance. One simulation uses purely simulated data while the other utilizes data from an evaluation of a school-based dropout prevention program. Our simulations suggest that machine learning methods outperform regression-based methods when the required structural (ignorability) assumptions are satisfied. When these assumptions are violated, all of the methods examined perform poorly. Our empirical analysis uses data from a multi-site experiment to assess how well results from a given site predict impacts in other sites. Using a variety of extrapolation methods, predicted effects for each site are compared to actual benchmarks. Flexible modeling approaches perform best, although linear regression is not far behind. Taken together, these results suggest that flexible modeling techniques can aid generalization while underscoring the fact that even state-of-the-art statistical techniques still rely on strong assumptions.
Do High-Quality Kindergarten and First-Grade Classrooms Mitigate Preschool Fadeout?
Prior research shows that short-term effects from preschool may disappear, but little research has considered which environmental conditions might sustain academic advantages from preschool into elementary school. Using secondary data from two preschool experiments, we investigate whether features of elementary schools, particularly advanced content and high-quality instruction in kindergarten and first grade, as well as professional supports to coordinate curricular instruction, reduce fadeout. Across both studies, our measures of instruction did not moderate fadeout. However, results indicated that targeted teacher professional supports substantially mitigated fadeout between kindergarten and first grade but that this was not mediated through classroom quality. Future research should investigate the specific mechanisms through which aligned preschool-elementary school curricular approaches can sustain the benefits of preschool programs for low-income children.
Benchmarks for Expected Annual Academic Growth for Students in the Bottom Quartile of the Normative Distribution
Effect sizes are commonly reported for the results of educational interventions. However, researchers struggle with interpreting their magnitude in a way that transcends generic guidelines. Effect sizes can be interpreted in a meaningful context by benchmarking them against typical growth for students in the normative distribution. Such benchmarks are not currently available for students in the bottom quartile. This report remedies this by providing a comparative context for interventions involving these students. Annual growth effect sizes for K-12 students were computed from nationally normed assessments and a longitudinal study of students in special education. They reveal declining growth over time, especially for reading and math. These results allow researchers to better interpret the effects of their interventions and help practitioners by quantifying typical growth for struggling students. More longitudinal research is needed to show growth trajectories for students in the bottom quartile.
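One common way to express annual growth as an effect size is the difference between adjacent-grade means divided by their pooled standard deviation. The sketch below assumes that convention; the report's exact formula is not given in the abstract, and the scale scores are hypothetical.

```python
import math

def annual_growth_effect_size(mean_lo, sd_lo, n_lo, mean_hi, sd_hi, n_hi):
    """Standardized mean difference between two adjacent grades (pooled SD)."""
    pooled_var = ((n_lo - 1) * sd_lo**2 + (n_hi - 1) * sd_hi**2) / (n_lo + n_hi - 2)
    return (mean_hi - mean_lo) / math.sqrt(pooled_var)

# Hypothetical scale scores for two adjacent grades (not from the report).
growth = annual_growth_effect_size(
    mean_lo=180.0, sd_lo=15.0, n_lo=1000,
    mean_hi=189.0, sd_hi=15.0, n_hi=1000,
)
print(f"annual growth effect size: {growth:.2f}")
```

An intervention effect can then be compared with this figure, for example as a fraction of a typical year's growth for the same grade and subject.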
Student Behavior Ratings and Response to Tier 1 Reading Intervention: Which Students Do Not Benefit?
Core reading instruction and interventions have differential effects based on student characteristics such as cognitive ability and pre-intervention skill level. Evidence for differential effects based on affective characteristics is scant and ambiguous; however, students with problem behavior are more often non-responsive to core reading instruction and intensive reading interventions. In this study, we estimated the range of students' behavior ratings in which a core reading instruction intervention was effective using a data set including 3,024 students in K-3. Data came from seven independent studies evaluating the Individualized Student Instruction (ISI) Tier 1 reading intervention and were pooled using integrative data analysis. We estimated Johnson-Neyman intervals of student behavior ratings that showed a treatment effect at both the within- and between-classroom levels. ISI was effective in improving reading scores (=0.51, =.020, = 0.08). However, students with very low or very high behavior ratings did not benefit from the approaches (range of behavior rating factor scores: -0.95 to 2.87). At the classroom level, students in classrooms with a higher average of problem behaviors did not benefit from ISI (average classroom behavior rating factor score: 0.05 to 4.25). Results suggest differentiating instruction alone is not enough for students with behavior problems to grow in reading ability.
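A Johnson-Neyman interval marks the moderator values at which a conditional treatment effect crosses the significance threshold. The sketch below assumes a single-level model y = b0 + b1*T + b2*M + b3*T*M and a critical value of 1.96; the study itself worked at the within- and between-classroom levels with pooled data, so this is a simplification. All coefficient and variance values are hypothetical, as are the function names.

```python
import numpy as np

def conditional_effect(m, b1, b3, v11, v13, v33):
    """Treatment effect and its standard error at moderator value m."""
    effect = b1 + b3 * m
    se = np.sqrt(v11 + 2.0 * m * v13 + m**2 * v33)
    return effect, se

def johnson_neyman_bounds(b1, b3, v11, v13, v33, t_crit=1.96):
    """Moderator values where |effect| / SE equals t_crit, or None if no real roots."""
    # Squaring (b1 + b3*m)^2 = t^2 * Var(b1 + b3*m) gives a quadratic in m.
    a = b3**2 - t_crit**2 * v33
    b = 2.0 * (b1 * b3 - t_crit**2 * v13)
    c = b1**2 - t_crit**2 * v11
    disc = b**2 - 4.0 * a * c
    if a == 0 or disc < 0:
        return None
    roots = (-b + np.array([-1.0, 1.0]) * np.sqrt(disc)) / (2.0 * a)
    lower, upper = np.sort(roots)
    return float(lower), float(upper)

# Hypothetical estimates: b1 = treatment, b3 = treatment x behavior rating,
# v11, v33 = their sampling variances, v13 = their covariance.
bounds = johnson_neyman_bounds(b1=0.5, b3=-0.3, v11=0.04, v13=0.0, v33=0.01)
print(bounds)
```

Whether the effect is significant between the two bounds or outside them depends on the sign of the leading quadratic coefficient, so it is worth checking `conditional_effect` at a value on each side of a bound.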