The title of this post captures the key findings of a study that has come across my desk more than 25 times during the past two weeks; hence, I decided to summarize and share it here, as it is also significant to our collective understanding of value-added models (VAMs).
The study — “Teacher Effects on Student Achievement and Height: A Cautionary Tale” — was recently published by the National Bureau of Economic Research (NBER) (Note 1) and authored by Marianne Bitler (Professor of Economics at the University of California, Davis), Sean Corcoran (Associate Professor of Public Policy and Education at Vanderbilt University), Thurston Domina (Professor at the University of North Carolina at Chapel Hill), and Emily Penner (Assistant Professor at the University of California, Irvine).
In short, study researchers used administrative data from New York City Public Schools to estimate the “value” teachers “add” to student achievement, and (also in comparison) to student height. The assumption herein, of course, is that teachers cannot plausibly or literally “grow” their students’ heights. If they were found to do so using a VAM (also oft-referred to as “growth” models, hereafter referred to more generally as VAMs), this would threaten the overall validity of the output derived via any such VAM, given VAMs’ sole purpose is to measure teacher effects on “growth” in student achievement, and only student achievement, over time. Put differently, if a VAM were found to “grow” students’ height, this would ultimately negate the validity of any such VAM given the very purposes for which VAMs have been adopted, implemented, used, misused, and abused across states, especially over the last decade.
Notwithstanding, study researchers found that “the standard deviation of teacher effects on height is nearly as large as that for math and reading achievement” (Abstract). More specifically, they found that the “estimated teacher ‘effects’ on height [were] comparable in magnitude to actual teacher effects on math and ELA achievement, 0.22 [standard deviations] compared to 0.29 [standard deviations] and 0.26 [standard deviations], respectively” (p. 24).
Put differently, teacher effects, as measured by a commonly used VAM, were about the same in size whether the outcome was students’ growth in achievement over time or students’ physical height. Clearly, this raises serious questions about the overall validity of this (and perhaps any) VAM in terms of not only what VAMs are intended to do but also what this one did (at least in this study). Yielding such spurious results (i.e., results that are nonsensical and more likely due to noise than anything else) threatens the overall validity of the output derived via these models, as well as the extent to which that output can or should be trusted. This is clearly an issue of validity, or rather of the validity of the inferences to be drawn from this (and perhaps/likely any other) VAM.
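To make the “noise” point concrete, here is a minimal toy simulation of how sampling error alone could produce a non-trivial spread of teacher “effects” on an outcome teachers cannot influence. To be clear, this is not the model or data the study authors used; the function name, the 200 teachers, the class size of 25, and the use of a simple classroom mean as the “effect” estimate are all assumptions I chose for illustration.

```python
import random
import statistics


def simulate_placebo_teacher_effects(n_teachers=200, class_size=25, seed=0):
    """Return the SD of naive teacher 'effects' on a placebo outcome.

    Hypothetical setup: each student's standardized height change is pure
    noise (mean 0, SD 1), so every true teacher effect is exactly zero.
    The naive 'effect' for a teacher is simply the classroom mean.
    """
    rng = random.Random(seed)
    teacher_means = []
    for _ in range(n_teachers):
        # Students are randomly assigned; teachers contribute nothing.
        heights = [rng.gauss(0.0, 1.0) for _ in range(class_size)]
        teacher_means.append(statistics.fmean(heights))
    # Spread of the estimated 'effects' across teachers.
    return statistics.pstdev(teacher_means)


sd_placebo = simulate_placebo_teacher_effects()
print(f"SD of estimated teacher 'effects' on height: {sd_placebo:.2f}")
```

Under these assumptions the spread should land near 1/sqrt(class_size), which shrinks only as classes get larger. The takeaway is modest: a non-zero standard deviation of estimated teacher effects does not, by itself, show that teachers caused anything.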
Ultimately, the authors conclude that the findings from their paper should “serve as a cautionary tale” for the use of VAMs in practice. With all due respect to my colleagues, in my opinion their findings are much more serious than those that might merely warrant caution. Only one other study of which I am aware (Note 2), akin to the study conducted here, could be as damning to the validity of VAMs and their too often “naïve application[s]” (p. 24).
Citation: Bitler, M., Corcoran, S., Domina, T., & Penner, E. (2019, November). Teacher effects on student achievement and height: A cautionary tale. National Bureau of Economic Research (NBER) Working Paper No. 26480. Retrieved from https://www.nber.org/papers/w26480.pdf
Note 1: As I have oft-commented in prior posts about papers published by the NBER, it is important to note that NBER papers such as these (i.e., “working papers”) have not been internally reviewed (e.g., by NBER Board Directors), nor have they been peer-reviewed or vetted. Rather, such “working papers” are widely circulated for discussion and comment, prior to what the academy of education would consider appropriate vetting. While listed in the front matter of this piece are highly respected scholars who helped critique and likely improve this paper, this is not the same as putting any such piece through a double-blinded, peer-reviewed process. Hence, caution is also warranted here when interpreting study results.
Note 2: Rothstein (2009, 2010) conducted a falsification test by which he tested, also counter-intuitively, whether a teacher in the future could cause, or have an impact on his/her students’ levels of achievement in the past. Rothstein demonstrated that given non-random student placement (and tracking) practices, VAM-based estimates of future teachers could be used to predict students’ past levels of achievement. More generally, Rothstein demonstrated that both typical and complex VAMs demonstrated counterfactual effects and did not mitigate bias because students are consistently and systematically grouped in ways that explicitly bias value-added estimates. Otherwise, the backwards predictions Rothstein demonstrated could not have been made.
Citations: Rothstein, J. (2009). Student sorting and bias in value-added estimation: Selection on observables and unobservables. Education Finance and Policy, 4(4), 537-571. doi:10.1162/edfp.2009.4.4.537
Rothstein, J. (2010). Teacher quality in educational production: Tracking, decay, and student achievement. Quarterly Journal of Economics, 125(1), 175-214. doi:10.1162/qjec.2010.125.1.175