According to the Merriam-Webster dictionary, wizardry is defined as something that is “very impressive in a way that seems magical”; it includes the “magical things done by a wizard.” While educational statisticians of all sorts have certainly engaged in statistical wizardry in one form or another, across many states and for many years past, especially when it comes to working VAM magic, the statistical wizards in the Land of Enchantment (New Mexico) are at it again (see prior posts about this state here, here, and here).
The Albuquerque Journal recently published an article titled “Teacher Evaluations Show Dip in ‘Effective’ Rating.” The full headline should have read more explicitly that, across the state, “the percentage of effective teachers decreased while the percentage of highly effective and exemplary teachers rose.”
What the article highlights and advances (as a causal conjecture) is that the state’s “overhauled” teacher evaluation system for the 2014-15 academic year (under which teachers are now to be evaluated, on average, 50% using student test scores, 40% using observational scores, and 10% using other “multiple measures,” including attendance) was the cause of the aforementioned decrease and both increases.
That is, the state’s system not only helped with (1) the more accurate identification and labeling of even more ineffective teachers; it also helped with, albeit in contradiction, (2) the improvement of other teachers who were otherwise accurately positioned the year prior. In other words, the teachers on the left side of the bell curve (see below) were more accurately identified this year, while the teachers on the “right” side became more effective, thanks to the new and improved teacher evaluation system constructed by the state, or what might be renamed the Hogwarts Department of Education, led by Hanna Skandera, the state’s Voldemort, who pointed out in the article that these results evidence (and I use that term loosely) “that the system is doing a better job of pointing out good teachers.”
But is this really the reality, oh wise one of the dark arts?
Here is the primary figure of interest:
The figure illustrates the proportions of New Mexico’s teachers by rating category (i.e., Ineffective to Exemplary) for the 2013-2014 and 2014-2015 years. More importantly, what it evidences is yet another growing trend across the country, one in which New Mexico is taking the lead, especially in terms of publicity.
The trend is this: instead of producing figures in which 99% of teachers are rated as satisfactory or above (see “The Widget Effect” report here), these new and improved teacher evaluation systems are designed to distribute teachers’ evaluation scores around a normal curve, on the assumption that such a distribution is more likely true, so that many more teachers are identified as ineffective.
Apparently, it’s working! Or is it…
This can occur, regardless of what is actually happening in terms of actual effectiveness across America’s classrooms, when the purported value that teachers add to or detract from student learning (i.e., 50% of the state’s model) substantively counts, because VAM output is calculated not in absolute terms but in relative or normative terms. Herein lies the potion that produces the policy results so desired.
VAM-based scores can be easily manufactured by those charged with producing such figures and graphs, in part because the tests themselves are constructed to fit normal curves; hence, it is actually quite easy to distribute such scores around a bell curve, even if the underlying data do not look nearly as clean from the beginning (they never do) and even if the resulting figures do not reflect reality.
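To make the point concrete, here is a minimal sketch, in Python, of how normative scoring bakes a distribution into the ratings. The data are entirely hypothetical and the percentile cut points are made up for illustration; this is not New Mexico’s actual model, only a demonstration that once scores are ranked relative to one another, the shape of the final rating distribution is a design choice, not a finding.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical raw "value-added" estimates for 1,000 teachers.
# The underlying data are deliberately messy and skewed -- nothing
# like a clean bell curve.
raw = rng.exponential(scale=2.0, size=1000) + rng.normal(0, 0.5, size=1000)

# Normative scoring: convert each raw score to a percentile rank,
# i.e., each teacher's standing relative to all other teachers.
percentiles = raw.argsort().argsort() / (len(raw) - 1) * 100

# Illustrative cut points (again, not the state's actual ones): force a
# fixed share of teachers into each rating category by percentile.
labels = np.select(
    [percentiles < 10, percentiles < 35, percentiles < 75, percentiles < 95],
    ["Ineffective", "Minimally Effective", "Effective", "Highly Effective"],
    default="Exemplary",
)

# The resulting distribution of ratings is baked in by the cut points,
# regardless of what the raw (absolute) scores actually look like.
for category in ["Ineffective", "Minimally Effective", "Effective",
                 "Highly Effective", "Exemplary"]:
    print(category, (labels == category).mean())
```

No matter how lopsided or messy the raw scores are, the printed shares land exactly where the cut points put them; the “bell curve” is an artifact of the scoring rules, not evidence about teachers.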
Regardless, such figures are often used because they give the public easy-to-understand illustrations that lead to commonsensical perceptions that teachers not only vary widely in their effectiveness, but also that new and improved evaluation systems are helping to better differentiate and identify teachers in terms of that variation in (in)effectiveness.
However, before we accept these figures and the text around them as truth, we must agree that such a normal curve is actually a reflection of reality. We must also question whether, for every high-performing teacher, there must be another teacher performing equally poorly, and vice versa. Generalizing upward, we must also question whether 50% of all of America’s public school teachers are actually effective as compared to the other 50% who are not. Where some teachers get better, must others get worse? For every one who succeeds, must there be one who fails? For those of you who might be familiar, recall the debate surrounding The Bell Curve, as this is precisely what we are witnessing here.
By statistical design, in such cases there will always be some teachers who appear relatively less effective simply because they fall on the wrong side of the mean, and vice versa; but nothing here (or elsewhere, as per similar graphs and figures) is actually a “true” indicator of teachers’ actual effectiveness. This is yet another assumption that must be kept in check, especially when grand wizards claim that the new teacher evaluation systems they put in place caused such magical moments.
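A second small sketch, again with purely hypothetical numbers, shows why relative scoring guarantees “losers” by construction. Even if every single teacher improves in absolute terms from one year to the next, standardizing scores within each year still places roughly half of them below the mean each year.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical absolute effectiveness for the same 1,000 teachers in
# two consecutive years, with EVERY teacher improving in year two.
year_1 = rng.normal(loc=50, scale=10, size=1000)
year_2 = year_1 + rng.uniform(1, 5, size=1000)   # all gains are positive

# Normative (relative) scoring: standardize within each year.
z_1 = (year_1 - year_1.mean()) / year_1.std()
z_2 = (year_2 - year_2.mean()) / year_2.std()

# Even though every teacher improved in absolute terms, roughly half
# still land on the "wrong" side of the mean each year by construction.
print("Share below the mean, year 1:", (z_1 < 0).mean())
print("Share below the mean, year 2:", (z_2 < 0).mean())
print("Share who improved in absolute terms:", (year_2 > year_1).mean())
```

The last line prints 1.0 (everyone improved), yet the first two lines hover around 0.5; relative rankings simply cannot show across-the-board improvement, which is exactly the assumption that needs to be kept in check.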