Research Study: VAM-Based Bias


A study by researchers from Indiana University and Michigan State University, released in the fall of 2012 but recently circulated through my email again (thanks to Diane Ravitch), deserves a special post here because it speaks not only to VAMs in general but also to the extent to which all VAMs yield biased results.

In this study (albeit still not peer reviewed, so please interpret accordingly), researchers “investigate whether commonly used value-added estimation strategies produce accurate estimates of teacher effects under a variety of…student grouping and teacher assignment scenarios.” They find that no VAM “accurately captures true teacher effects in all scenarios, and the potential for misclassifying teachers as high- or low-performing can be substantial [emphasis added].”

While these researchers suggest alternative statistical controls that yield less biased results (i.e., a dynamic ordinary least squares [DOLS] estimator), the bottom line is that VAMs cannot “effectively isolate the ‘true’ contribution of teachers and schools to achievement growth” over time. Whether this will ever be possible is highly doubtful, mainly because of the extraneous variables that lie outside the control of teachers and schools but continue to confound and complicate VAM-based estimates, rendering them (still) unreliable and invalid, particularly for the high-stakes decisions VAMs are increasingly being tasked with informing.

The only way we might reach truer, more valid, and less biased results is to randomly assign students and teachers to classrooms, which, as evidenced in an article one of my doctoral students and I recently published in the highly esteemed American Educational Research Journal, is highly impractical, professionally unacceptable, and realistically impossible. Hence, “[i]f higher achieving students are grouped within certain schools and lower achieving students in others, then the teachers in the high-achieving schools, regardless of their true teaching ability, will [continue to] have higher probabilities of high-achieving classrooms. Similarly, if higher ability teachers are grouped within certain schools and lower ability teachers in others, then students in the schools with better teachers will [continue to] realize higher gains.” This exacerbates the nonrandom sorting problem immensely.

The researchers also write that “it is clear that every estimator has an Achilles heel (or more than one area of potential weakness).” While VAMs seem to have plenty of potential and very real weaknesses, VAM-based bias is one that certainly stands out, here and elsewhere, especially because so many pro-VAM statisticians believe, and continue to perpetuate the belief, that their complex statistics (e.g., shrinkage estimators) can (miraculously) control for everything and anything causing chaos. As evidenced in this study, in the notable work of Jesse Rothstein, Associate Professor at UC Berkeley, and in other studies cited in the aforementioned study, this is not, and likely never will be, the case. It just isn’t!

Finally, these researchers conclude that, “even in the best scenarios and under the simplistic and idealized conditions…the potential for misclassifying above average teachers as below average or for misidentifying the ‘worst’ or ‘best’ teachers remains nontrivial.” Accordingly, misclassification rates can range “from at least seven to more than 60 percent” depending on the statistical controls and estimators used and the moderately to highly non-random student sorting practices and scenarios across schools.
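To make the sorting mechanism a bit more concrete, here is a minimal toy simulation in Python. It is not the design Guarino, Reckase, and Wooldridge used; it is a sketch under loudly stated assumptions. It assumes (hypothetically) that each student’s test-score gain equals the teacher’s true effect plus an unobserved student growth factor plus test noise, and that the “VAM” is simply each classroom’s mean gain. All parameter values (500 teachers, 20 students per class, the standard deviations) and the function name `misclassification_rate` are illustrative inventions. The sketch compares random assignment with an extreme scenario in which students are grouped into classrooms by their unobserved growth factor, and reports the share of teachers placed in the wrong half (above vs. below the median).

```python
import random
import statistics


def misclassification_rate(sorted_by_growth, n_teachers=500, class_size=20,
                           growth_sd=1.0, noise_sd=1.0, seed=0):
    """Share of teachers placed in the wrong half (above vs. below the median).

    Toy model (hypothetical): gain = true teacher effect + unobserved student
    growth factor + test noise; the naive "VAM" is the classroom mean gain.
    """
    rng = random.Random(seed)
    # Hypothetical true teacher effects, independent of classroom order.
    true_effects = [rng.gauss(0.0, 1.0) for _ in range(n_teachers)]
    # Hypothetical unobserved student growth factors the estimator cannot see.
    growth = [rng.gauss(0.0, growth_sd) for _ in range(n_teachers * class_size)]
    if sorted_by_growth:
        # Extreme nonrandom grouping: fast growers share classrooms.
        growth.sort()
    else:
        # The draws are already i.i.d.; shuffling just makes random assignment explicit.
        rng.shuffle(growth)

    estimates = []
    for j, effect in enumerate(true_effects):
        roster = growth[j * class_size:(j + 1) * class_size]
        # Observed gain for each student in teacher j's classroom.
        gains = [effect + g + rng.gauss(0.0, noise_sd) for g in roster]
        # Naive value-added estimate: the classroom's mean gain.
        estimates.append(statistics.fmean(gains))

    half = n_teachers // 2
    truly_top = set(sorted(range(n_teachers), key=lambda j: true_effects[j])[half:])
    est_top = set(sorted(range(n_teachers), key=lambda j: estimates[j])[half:])
    # Teachers truly above the median but estimated below it, and vice versa.
    misclassified = len(truly_top - est_top) + len(est_top - truly_top)
    return misclassified / n_teachers


if __name__ == "__main__":
    for scenario in (False, True):
        label = "sorted by growth" if scenario else "random assignment"
        print(label, misclassification_rate(scenario))
```

Because the growth factor is invisible to this naive estimator, sorting on it shifts classroom means in ways that get credited (or blamed) to teachers, so the sorted scenario should show a noticeably higher misclassification rate than random assignment. The exact numbers depend entirely on the assumed parameters and should not be read as real-world rates or as a replication of the study’s seven to 60 percent range.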

Full study citation: Guarino, C. M., Reckase, M. D., & Wooldridge, J. M. (2012, December 12). Can value-added measures of teacher performance be trusted? East Lansing, MI: The Education Policy Center at Michigan State University. Retrieved from

1 thought on “Research Study: VAM-Based Bias”

  1. I wonder whether there is any agreement on a framework or set of variables that can indicate the effectiveness of teachers with 98% probability (the common 95% probability might not be enough), regardless of whether these variables are measurable. If researchers can reach agreement on this, then the problem becomes which evaluation model can most effectively measure these variables. Otherwise, researchers might need to focus on identifying the factors, variables, or framework that can indicate a teacher’s effectiveness. I do think ethnographic studies have a role to play here, first to identify the effectiveness variables/indicators, rather than focusing on isolated statistical models.
