New Mexico Lawsuit: Final Update

In December 2015 in New Mexico, via a preliminary injunction set forth by state District Judge David K. Thomson, all consequences attached to teacher-level value-added model (VAM) scores (e.g., flagging the files of teachers with low VAM scores) were suspended throughout the state until the state (and/or others external to the state) could prove to the state court that the system was reliable, valid, fair, uniform, and the like. The trial during which this evidence was to be presented was set, and re-set, and re-set again, never to actually occur. More specifically, after December 2015 and through 2018, multiple depositions and hearings occurred. In April 2019, the case was reassigned to a new judge (via a mass reassignment state policy), again, while the injunction was still in place.

Thereafter, teacher evaluation was a hot policy issue during the state’s 2018 gubernatorial election. The now-prior state governor, Republican Susana Martinez, who essentially ordered and helped shape the state’s teacher evaluation system at issue during this lawsuit, had reached the maximum number of terms served and could not run again. All candidates running to replace her had grave concerns about the state’s teacher evaluation system. Democrat Michelle Lujan Grisham ending up winning.

Two days after Grisham was sworn in, she signed an Executive Order for the entire state system to be amended, including no longer using value-added data to evaluate teachers. Her Executive Order also stipulated that the state department was to work with teachers, administrators, parents, students, and the like, to determine more appropriate methods of measuring teacher effectiveness. While the education task force charged with this task is still in the process of finalizing the state’s new system, it is important to note that now, although actually beginning in the 2018-2019 school year, teachers are being evaluated via (primarily) classroom observations and student/family surveys. The value-added component (and a teacher attendance component that was also the source of contention during this lawsuit) were removed entirely from the state’s teacher evaluation framework.

Likewise, the plaintiffs (the lawyers, teachers, and administrators with whom I worked on this case) are no longer moving forward with the 2015 lawsuit as Grisham’s Executive Order also rendered this lawsuit as moot.

The initial victory that we achieved in 2015 ultimately yielded a victory in the end. Way to go New Mexico!

*This post was co-authored by one of my PhD students – Tray Geiger – who is finishing up his dissertation about this case.

Teachers “Grow” Their Students’ Heights As Much As Their Achievement

The title of this post captures the key findings of a study that has come across my desk now over 25 times during the past two weeks; hence, I decided to summarize and share out, also as significant to our collective understandings about value-added models (VAMs).

The study — “Teacher Effects on Student Achievement and Height: A Cautionary Tale” — was recently published by the National Bureau of Economic Research (NBER) (Note 1) and authored by Marianne Bitler (Professor of Economics at the University of California, Davis), Sean Corcoran (Associate Professor of Public Policy and Education at Vanderbilt University), Thurston Domina (Professor at the University of North Carolina at Chapel Hill), and Emily Penner (Assistant Professor at the University of California, Irvine).

In short, study researchers used administrative data from New York City Public Schools to estimate the “value” teachers “add” to student achievement, and (also in comparison) to student height. The assumption herein, of course, is that teachers’ cannot plausibly or literally “grow” their students’ heights. If they were found to do so using a VAM (also oft-referred to as “growth” models, hereafter referred to more generally as VAMs), this would threaten the overall validity of the output derive via any such VAM, given VAMs’ sole purposes are to measure teacher effects on “growth” in student achievement and only student achievement over time. Put differently, if a VAM was found to “grow” students’ height, this would ultimately negate the validity of any such VAM given the very purposes for which VAMs have been adopted, implemented, and used, misused, and abused across states, especially over the last decade.

Notwithstanding, study researchers found that “the standard deviation of teacher effects on height is nearly as large as that for math and reading achievement” (Abstract). More specifically, they found that the “estimated teacher ‘effects’ on height [were] comparable in magnitude to actual teacher effects on math and ELA achievement, 0.22 [standard deviations] compared to 0.29 [standard deviations] and 0.26 [standard deviations], respectively (p. 24).

Put differently, teacher effects, as measured by a commonly used VAM, were about the same in terms of the extent to which teachers “added value” to their students’ growth in achievement over time and their students’ physical heights. Clearly, this raises serious questions about the overall validity of this (and perhaps all) VAMs in terms of not only what they are intended to do, and what they did (at least in this study) as well. To yield such spurious results (i.e., results that are nonsensical and more likely due to noise than anything else) threatens the overall validity of the output derived via these models, as well as the extent to which their output can or should be trusted. This is clearly an issue with validity, or rather the validity of the inferences to be drawn from this (and perhaps/likely any other) VAM.

Ultimately the authors conclude that the findings from their paper should “serve as a cautionary tale” for the use of VAMs in practice. With all due respect to my colleagues, in my opinion their findings are much more serious than those that might merely warrant caution. Only one other study of which I am aware (Note 2), as akin to the study conducted here, could be as damming to the validity of VAMs and their too often “naïve application[s]” (p. 24).

Citation: Bitler, M., Corcoran, S., Domina, T., & Penner, E. (2019, November). Teacher effects on student achievement and height: A cautionary tale. National Bureau of Economic Research (NBER) Working Paper No. 26480. Retrieved from

Note 1: As I have oft-commented in prior posts about papers published by the NBER, it is important to note that NBER papers such as these (i.e., “working papers”) have not been internally reviewed (e.g., by NBER Board Directors), nor have they been peer-reviewed or vetted. Rather, such “working papers” are widely circulated for discussion and comment, prior to what the academy of education would consider appropriate vetting. While listed in the front matter of this piece are highly respected scholars who helped critique and likely improve this paper, this is not the same as putting any such piece through a double-blinded, peer reviewed, process. Hence, caution is also warranted here when interpreting study results.

Note 2: Rothstein (2009, 2010) conducted a falsification test by which he tested, also counter-intuitively, whether a teacher in the future could cause, or have an impact on his/her students’ levels of achievement in the past. Rothstein demonstrated that given non-random student placement (and tracking) practices, VAM-based estimates of future teachers could be used to predict students’ past levels of achievement. More generally, Rothstein demonstrated that both typical and complex VAMs demonstrated counterfactual effects and did not mitigate bias because students are consistently and systematically grouped in ways that explicitly bias value-added estimates. Otherwise, the backwards predictions Rothstein demonstrated could not have been made.

Citations: Rothstein, J. (2009). Student sorting and bias in value-added estimation: Selection on observables and unobservables. Education Finance and Policy, (4)4, 537-571. doi:

Rothstein, J. (2010). Teacher quality in educational production: Tracking, decay, and student achievement. Quarterly Journal of Economics. 175-214. doi:10.1162/qjec.2010.125.1.175