In a recent speech and subsequent paper written by Dr. Edward Haertel – National Academy of Education member and Professor at Stanford University – he writes about VAMs and the extent to which VAMs, being based on student test scores, can be used to make reliable and valid inferences about teachers and teacher effectiveness. This is a must-read, particularly for those out there who are new to the research literature in this area. Dr. Haertel is certainly an expert here, actually one of the best we have, and in this piece he captures the major issues well.
Some of the issues highlighted include concerns about the tests used to model value-added and how their scales (falsely assumed to be as objective and equal as units on a measuring stick) complicate and distort VAM-based estimates. He also discusses the general issues with the tests almost if not always used when modeling value-added (i.e., the state-level tests mandated as per No Child Left Behind in 2002).
He discusses why VAM estimates are least trustworthy, and most volatile and error prone, when used to compare teachers who work in very different schools with very different student populations – students who do not attend schools in randomized patterns and who are rarely if ever randomly assigned to classrooms. The issues with bias, as highlighted by Dr. Haertel and also in a recent VAMboozled! post with a link to a new research article here, are probably the most major VAM-related, problems/issues going. As captured in his words, “VAMs will not simply reward or penalize teachers according to how well or poorly they teach. They will also reward or penalize teachers according to which students they teach and which schools they teach in” (Haertel, 2013, p. 12-13).
He reiterates issues with reliability, or a lack thereof. As per one research study he cites, researchers found that “a minimum of 10% of the teachers in the bottom fifth of the distribution one year were in the top fifth the next year, and conversely. Typically, only about a third of 1 year’s top performers were in the top category again the following year, and likewise, only about a third of 1 year’s lowest performers were in the lowest category again the following year. These findings are typical [emphasis added]…[While a] few studies have found reliabilities around .5 or a little higher…this still says that only half the variation in these value-added estimates is signal, and the remainder is noise [and/or error, which makes VAM estimates entirely invalid about half of the time]” (Haertel, 2013, p. 18).
Dr. Haertel also discusses other correlations among VAM estimates and teacher observational scores, VAM estimates and student evaluation scores, and VAM estimates taken from the same teachers at the same time but using different tests, all of which also yield abysmally (and unfortunately) low correlations, similar to those mentioned above.
His bottom line? “VAMs are complicated, but not nearly so complicated as the reality they are intended to represent” (Haertel, 2013, p. 12). They just do not measure well what so many believe they measure so very well.
Again, to find out more reasons and more in-depth explanations as to why, click here for the full speech and subsequent paper.