VAMs Are Never “Accurate, Reliable, and Valid”

The Educational Researcher (ER) journal is the highly esteemed, flagship journal of the American Educational Research Association. It may sound familiar because many of what I view to be the best research articles published about value-added models (VAMs) appeared in ER (see my full reading list on this topic here). More specific to this post, the recent “AERA Statement on Use of Value-Added Models (VAM) for the Evaluation of Educators and Educator Preparation Programs” was also published in this journal (see also a prior post about this position statement here).

After this position statement was published, however, many critiqued AERA and the authors of this piece for going too easy on VAMs, as well as on VAM proponents and users, and for not taking a firmer stance against VAMs given the current research. The lightest of these critiques, authored by Brookings Institution affiliate Michael Hansen and the University of Washington Bothell’s Dan Goldhaber, was highlighted here, after which Boston College’s Dr. Henry Braun responded, also here. Some believed even this response to be too, let’s say, collegial or symbiotic.

Just this month, however, ER released a critique of this same position statement, as authored by Steven Klees, a Professor at the University of Maryland. Klees wrote, essentially, that the AERA Statement “only alludes to the principal problem with [VAMs]…misspecification.” To isolate the contributions of teachers to student learning is not only “very difficult,” but “it is impossible—even if all the technical requirements in the [AERA] Statement [see here] are met.”

Rather, Klees wrote, “[f]or proper specification of any form of regression analysis…All confounding variables must be in the equation, all must be measured correctly, and the correct functional form must be used. As the 40-year literature on input-output functions that use student test scores as the dependent variable make clear, we never even come close to meeting these conditions…[Hence, simply] adding relevant variables to the model, changing how you measure them, or using alternative functional forms will always yield significant differences in the rank ordering of teachers’…contributions.”

Therefore, Klees argues “that with any VAM process that made its data available to competent researchers, those researchers would find that reasonable alternative specifications would yield major differences in rank ordering. Misclassification is not simply a ‘significant risk’— major misclassification is rampant and inherent in the use of VAM.”
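Klees’s misspecification argument can be illustrated with a toy simulation (my own hypothetical sketch, not from his letter): simulate student test scores driven partly by true teacher effects and partly by an omitted socio-economic confounder, then estimate each teacher’s “value-added” under two reasonable specifications and compare the resulting rankings. Every coefficient, sample size, and variable name below is an assumption made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n_teachers, n_students = 20, 25

# Hypothetical setup: true teacher effects, plus a socio-economic (SES)
# confounder correlated with which teacher a student is assigned to.
true_effect = rng.normal(0.0, 1.0, n_teachers)
teacher_ses = rng.normal(0.0, 1.0, n_teachers)

teacher_id, prior, ses, score = [], [], [], []
for t in range(n_teachers):
    s = teacher_ses[t] + rng.normal(0.0, 0.5, n_students)  # student SES
    p = 0.5 * s + rng.normal(0.0, 1.0, n_students)         # prior achievement
    y = p + true_effect[t] + 0.8 * s + rng.normal(0.0, 1.0, n_students)
    teacher_id += [t] * n_students
    prior += list(p)
    ses += list(s)
    score += list(y)

teacher_id = np.array(teacher_id)
y = np.array(score)
X_a = np.column_stack([np.ones(len(y)), prior])       # spec A: prior score only
X_b = np.column_stack([np.ones(len(y)), prior, ses])  # spec B: prior score + SES

def teacher_ranks(X):
    """Rank teachers by their students' mean regression residual."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    va = np.array([resid[teacher_id == t].mean() for t in range(n_teachers)])
    return va.argsort().argsort()  # 0 = lowest estimated value-added

rank_a, rank_b = teacher_ranks(X_a), teacher_ranks(X_b)
moved = int((rank_a != rank_b).sum())
print(f"{moved} of {n_teachers} teachers change rank between specifications")
```

Under this setup, two specifications that are both defensible on their face disagree about where teachers rank, which is exactly the instability Klees describes: the ordering depends on modeling choices, not just on what teachers did.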
Klees concludes: “The bottom line is that regardless of technical sophistication, the use of VAM is never [and, perhaps, never will be] ‘accurate, reliable, and valid’ and will never yield ‘rigorously supported inferences’” as expected and desired.

Citation: Klees, S. J. (2016). VAMs Are Never “Accurate, Reliable, and Valid.” Educational Researcher, 45(4), 267. doi:10.3102/0013189X16651081

11 thoughts on “VAMs Are Never ‘Accurate, Reliable, and Valid’”

  1. Dr. Beardsley, you know this is misleading. We are not trying to measure the speed of light to the tenth significant digit. We are trying to get as reliable a measure as we can. We do know that the current evaluation system, which results in 99%+ teacher effectiveness ratings, is completely unreliable. Even inter-observer reliability is much more unstable than that of VAMs.

    When the military began to develop fire control systems in WWII to track enemy planes so that anti-aircraft guns could shoot them down, there was significant error. However, those systems were way more successful than a single gunner trying to eyeball the target. They led to our great success in repelling aerial assaults on carriers. Would you simply dismiss such fire control systems because they are not 100% accurate?

    You seem to imply that a teacher has a greater right to his or her job than a student has to an effective education. That is what this really comes down to. Folks care about having a zero false-negative rate (a bad rating for a decent teacher), but kids can’t afford the extreme false-positive rate (a good rating for a bad teacher) that we know exists right now!

    • I love the false equivalence scenario that you set up to argue your support for an invalid, arbitrary and capricious evaluation system. What financial benefit of yours is so threatened that you reliably show up on every blog post that refutes your pet theories about standardized testing and VAMs?

    • Interesting, revealing and profoundly appalling juxtaposition of shooting down planes, children learning and developing, and firing teachers.


  2. “Just this month, however, ER released a critique…”

    Just to be clear, this was not an editorial position taken by ER nor was it a peer-reviewed submission accepted for publication. It was a 3-paragraph letter from a reader.

  3. Virginiasgp,

    The analytics are already there, or at the very least the exam results are, within the context of a particular school, its recent history of achievement and initiatives, and its current state of evolution. Giving them undue weight is deleterious to everything, including the students. Giving undue power to the State, a political entity, is reprehensible.

  4. Hmmmm… For some reason, in most cases the lowest-ranked teachers end up being the teachers at schools with students from the lowest socio-economic neighborhoods. It boggles the mind that the work of 20 to 150 students would be considered enough to get an accurate statistical picture of a teacher’s work.

    This would seem to indicate that the formulas are flawed. Ranking teachers, regardless of how it is done, encourages competition, which is contrary to best educator practices. In the best schools teachers mentor, share, and support each other as professionals.

    Meanwhile the stress of this model causes teachers, regardless of quality, to leave the profession.
