In the most recent post, I highlighted Dr. Edward Haertel’s recent speech and subsequent paper about value-added models (VAMs). As noted in that post, Dr. Haertel did a solid and thorough review of the major issues with VAMs.
Also worth mentioning is his conclusion, where he and I (for the most part) agree on his answer to what he positions as a bottom-line question: “Are teacher-level VAM scores good for anything?”
He writes: “Yes, absolutely,” but only for some purposes, and for some purposes “they must be used with considerable caution.”
For example, VAMs are useful “for researchers comparing large groups of teachers to investigate the effects of teacher training approaches or educational policies, or simply to investigate the size and importance of long-term teacher effects…[I]t is clear that value-added scores are far superior to unadjusted end-of-year student test scores” (Haertel, 2013, p. 23).
VAMs are indeed better alternatives to the snapshot or status models used for decades to measure student achievement at a single point in time, typically once per year. Most if not all VAM researchers agree on this, largely because these snapshot tests were (and are) incapable of capturing where students stood academically before entering a classroom or school and where they ended up X months later. VAMs are far better alternatives to such prior approaches in that they at least attempt to measure student growth over time.
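To make the snapshot-versus-growth distinction concrete, here is a minimal sketch in Python. The classrooms and scores are entirely hypothetical, and the simple gain score (post-test minus pre-test) is only a stand-in for the idea of measuring growth; it is not a VAM, since actual VAMs typically rely on regression-based models with additional statistical controls.

```python
from statistics import mean

# Hypothetical (pre-test, post-test) scores for two made-up classrooms.
# These numbers are invented purely for illustration.
classrooms = {
    "Classroom A": [(70, 78), (75, 82), (80, 86)],
    "Classroom B": [(40, 58), (45, 62), (50, 69)],
}


def status_score(pairs):
    """Snapshot/status view: mean end-of-year score, ignoring starting points."""
    return mean(post for _, post in pairs)


def mean_gain(pairs):
    """Simple growth view: mean of (post - pre) across students."""
    return mean(post - pre for pre, post in pairs)


# Summarize each classroom under both views.
summary = {
    name: {"status": status_score(pairs), "gain": mean_gain(pairs)}
    for name, pairs in classrooms.items()
}

# Which classroom looks "best" depends on the view taken.
top_by_status = max(summary, key=lambda name: summary[name]["status"])
top_by_gain = max(summary, key=lambda name: summary[name]["gain"])

for name, stats in summary.items():
    print(f"{name}: status={stats['status']}, mean gain={stats['gain']}")
print("Highest status:", top_by_status)
print("Highest mean gain:", top_by_gain)
```

In this made-up example, the classroom with the higher end-of-year average is not the one whose students grew the most, which is exactly the information a snapshot alone cannot reveal.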
The more nuanced question to ask, though, is whether VAMs offer much more than other statistical and analytical alternatives (e.g., ANCOVA [analysis of covariance] and multiple regression models) when conducting such large-scale analyses. I disagree that VAMs are, at the very least, useful for conducting large-scale research, because the other methods often used to measure gains, or the impacts of large-scale interventions and policies over time, seem to fare just as well. If anything, VAMs might make statistically significant differences more difficult to come by and shrink effect sizes because of the multiple (and sometimes hyper) controls typically used.
VAMs are neither the only nor the best method for examining things like large-scale interventions and policies. Hence, I disagree with the statement that, “for research purposes, VAM estimates definitely have a place” (Haertel, 2013, p. 23). Plenty of other models perform well for such large-scale educational research studies.
Dr. Haertel also writes, “A considerably riskier use, but one I would cautiously endorse, would be providing individual teachers’ VAM estimates to the teachers themselves and to their principals, provided all 5 of the following critically important conditions are met” (Haertel, 2013, p. 25):
- Scores are based on sound and appropriate student tests;
- Comparisons are limited to homogeneous teacher groups (to facilitate apples-to-apples comparisons);
- Fixed weights (e.g., 50% of a teacher’s effectiveness score will be based on VAM output) are NOT used, and educators retain full rights in terms of how they interpret their VAM scores, in context, all things considered, case-by-case, etc. Educators also retain the right to set VAM scores aside entirely if they have specific information that could plausibly render the VAM output invalid;
- Users are well trained and able to interpret scores validly; and
- Clear and accurate information about VAM levels of uncertainty (e.g., margins of error) is disclosed and continuously made available so that all can be critical consumers of the data that pertain directly to them.
I agree with these assertions, though my endorsement is likewise a very cautious one. VAMs are not “all” wrong, but they are wrong too often, especially at the teacher level, where a lot of “things” cannot be statistically controlled or accounted for.
Accordingly, “VAMs may have a modest place in teacher evaluation systems, but only as an adjunct to other information [and] used in a context where teachers and principals have genuine autonomy in their decisions about using and interpreting teacher effectiveness estimates [i.e. VAM output] in local contexts” (Haertel, 2013, p. 25).