Individual-Level VAM Scores Over Time: “Less Reliable than Flipping a Coin”

In case you missed it (I did), an article authored by Stuart Yeh (Associate Professor at the University of Minnesota) titled “A Reanalysis of the Effects of Teacher Replacement Using Value-Added Modeling” was (recently) published in the esteemed, peer-reviewed journal: Teachers College Record. While the publication suggests a 2013 publication date, my understanding is that the actual article was more recently released.

Regardless, its contents are important to share, particularly regarding VAM-based levels of reliability, which Yeh frames as follows: “The question of stability [reliability/consistency] is not a question about whether average teacher performance rises, declines, or remains flat over time. The issue that concerns critics of VAM is whether individual teacher performance fluctuates over time in a way that invalidates inferences that an individual teacher is “low-” or “high-” performing. This distinction is crucial because VAM is increasingly being applied such that individual teachers who are identified as low-performing are to be terminated. From the perspective of individual teachers, it is inappropriate and invalid to fire a teacher whose performance is low this year but high the next year, and it is inappropriate to retain a teacher whose performance is high this year but low next year. Even if average teacher performance remains stable over time, individual teacher performance may fluctuate wildly from year to year” (p. 7).

Yeh’s conclusion, then (based on the evidence presented in this piece), is that “VAM is less reliable than flipping a coin for the purpose of categorizing high- and low-performing teachers” (p. 19). More specifically, VAMs have an estimated overall error rate of 59% (see Endnote 2, p. 26, for further explanation).

That being said, the assumption that teacher quality is a fixed characteristic (i.e., that a high-performing teacher this year will be a high-performing teacher next year, and a low-performing teacher this year will be a low-performing teacher next year) is false and not supported by the available data, even though many VAM proponents continue to assume it (including Chetty et al.; see, for example, here, here, and here). Beyond that, prior estimates that using VAMs to identify teachers is no different than the flip of a coin may actually understate the problem, given current reliability estimates (see also Table 2, p. 19; see also pp. 25-26).
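For readers who want to see why low year-to-year stability matters for individual classifications, here is a minimal simulation sketch of my own. It is not Yeh's method and it does not reproduce his 59% figure. It assumes each teacher's yearly score is a persistent component plus independent yearly noise, mixed so that the year-to-year correlation equals a value `r` that I pick for illustration; the function name, the bottom-20% cutoff, and the values of `r` are all hypothetical.

```python
import random


def quintile_flip_rate(r, n_teachers=20000, seed=0):
    """Share of teachers in the bottom 20% in year 1 who are NOT there in year 2.

    Assumption (mine, for illustration only): each yearly score is
    sqrt(r) * persistent + sqrt(1 - r) * noise, so the correlation between
    a teacher's year-1 and year-2 scores is r.
    """
    rng = random.Random(seed)
    year1, year2 = [], []
    for _ in range(n_teachers):
        persistent = rng.gauss(0, 1)  # the part of performance that carries over
        year1.append(r ** 0.5 * persistent + (1 - r) ** 0.5 * rng.gauss(0, 1))
        year2.append(r ** 0.5 * persistent + (1 - r) ** 0.5 * rng.gauss(0, 1))

    # Bottom-quintile cutoffs, computed separately for each year.
    cutoff1 = sorted(year1)[n_teachers // 5]
    cutoff2 = sorted(year2)[n_teachers // 5]

    flagged = [i for i in range(n_teachers) if year1[i] < cutoff1]
    moved_out = sum(1 for i in flagged if year2[i] >= cutoff2)
    return moved_out / len(flagged)


if __name__ == "__main__":
    # Hypothetical correlations, not estimates taken from the article.
    for r in (0.0, 0.3, 0.8, 1.0):
        print(f"r = {r:.1f}: flip rate = {quintile_flip_rate(r):.2f}")
```

In this sketch the average score is the same in both years by construction, yet the lower the assumed correlation, the larger the share of "low-performing" teachers who would no longer be flagged a year later. That is the distinction between average stability and individual stability that the quotation above draws.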

In section two of this article, for those of you following the Chetty et al. debates, Yeh also critiques other assumptions underlying the Chetty et al. studies (see other, similar critiques here, here, here, and here). For example, Yeh critiques the VAM-based proposals to raise student achievement by (essentially) terminating low-value-added teachers. Here, the assumption is that “the use of VAM to identify and replace the lowest-performing 5% of teachers with average teachers would increase student achievement and would translate into sizable gains in the lifetime earnings of their students” (p. 2). However, because this also assumes that “there is an adequate supply of unemployed teachers who are ready and willing to be hired and would perform at a level that is 2.04 standard deviations above the performance of teachers who are fired based on value-added rankings [and] Chetty et al. do not justify this assumption with empirical data” (p. 14), this too proves much more idealistic than realistic.

In section three of this article, for those of you generally interested in better (and, in this case, more cost-effective) solutions, Yeh discusses a number of cost-effectiveness analyses comparing 22 leading approaches for raising student achievement, the results of which suggest that “the most efficient approach—rapid performance feedback (RPF)—is approximately 5,700 times as efficient as the use of VAM to identify and replace low-performing teachers” (p. 25; see also pp. 23-24).


Citation: Yeh, S. S. (2013). A re-analysis of the effects of teacher replacement using value-added modeling. Teachers College Record, 115(12), 1-35. Retrieved from
