Recently, I posted a critique of the newly released and highly publicized Mathematica Policy Research study about the (vastly overstated) “value” of value-added measures and their ability to effectively measure teacher quality. The study, which did not go through a peer review process, is wrought with methodological and conceptual problems, which I dismantled in the post, with a consumer alert.
Yet again, VAM enthusiasts are attempting to VAMboozle policymakers and the general public with another faulty study, this time released to the media by the National Bureau of Economic Research (NBER). The “working paper” (i.e., not peer-reviewed, and in this case not even internally reviewed by those at NBER) analyzed the controversial teacher evaluation system (i.e., IMPACT) that was put into place in DC Public Schools (DCPS) under the then Chancellor, Michelle Rhee.
The authors, Thomas Dee and James Wyckoff (2013), present what they term “novel evidence” to suggest that the “uniquely high-powered incentives” linked to “teacher performance” worked to improve the “performance” of high-performing teachers, and that “dismissal threats” worked to increase the “voluntary attrition of low-performing teachers.” The authors, however, and similar to those of the Mathematica study, assert highly troublesome claims based on a plethora of problems that had this study undergone peer review before it was released to the public and before it was hailed in the media, would not have created the media hype that ensued. Hence, it is appropriate to issue yet another consumer alert.
The most major problems include, but are not limited to, the following:
“Teacher Performance:” Probably the largest fatal flaw, or the study’s most major limitation was that only 17% of the teachers included in this study (i.e., teachers of reading and mathematics in grades 4 through 8) were actually evaluated under the IMPACT system for their “teacher performance,” or for that which they contributed to the system’s most valued indicator: student achievement. Rather, 83% of the teachers did not have student test scores available to determine if they were indeed effective (or not) using individual value-added scores. It is implied throughout the paper, as well as the media reports covering this study post release, that “teacher performance” was what was investigated when in fact for four out of five DC teachers their “performance” was evaluated only as per what they were observed doing or self-reported doing all the while. These teachers were evaluated on their “performance” using almost exclusively (except for the 5% school-level value-added indicator) the same subjective measures integral to many traditional evaluation systems as well as student achievement/growth on teacher-developed and administrator-approved classroom-based tests, instead.
Score Manipulation and Inflation: Related, a major study limitation was that the aforementioned indicators that were used to define and observe changes in “teacher performance” (for the 83% of DC teachers) were based almost entirely on highly subjective, highly manipulable, and highly volatile indicators of “teacher performance.” Given the socially constructed indicators used throughout this study were undoubtedly subject to score bias by manipulation and artificial inflation as teachers (and their evaluators) were able to influence their ratings. While evidence of this was provided in the study, the authors banally dismissed this possibility as “theoretically [not really] reasonable.” When using tests, and especially subjective indicators to measure “teacher performance,” one must exercise caution to ensure that those being measured do not engage in manipulation and inflation techniques known to effectively increase the scores derived and valued, particularly within such high-stakes accountability systems. Again, for 83% of the teachers their “teacher performance” indicators were almost entirely manipulable (with the exception of school-level value-added weighted at 5%).
Unrestrained Bias: Related, the authors set forth a series of assumptions throughout their study that would have permitted readers to correctly predict the study’s findings without reading it. This is highly problematic, as well, and this would not have been permitted had the scientific community been involved. Researcher bias can certainly impact (or sway) study findings and this most certainly happened here.
Other problems include gross overstatements (e.g., about how the IMPACT system has evidenced itself as financially sound and sustainable over time), dismissed yet highly complex technical issues (e.g., about classification errors and the arbitrary thresholds the authors used to statistically define and examine whether teachers “jumped” thresholds and became more effective), other over-simplistic treatments of major methodological and pragmatic issues (e.g., cheating in DC Public Schools and whether this impacted outcome “teacher performance” data, and the like.
The claims the authors have asserted in this study are disconcerting, at best. I wouldn’t be as worried if I knew that this paper truly was in a “working” state and still had to undergo peer-review before being released to the public. Unfortunately, it’s too late for this, as NBER irresponsibly released the report without such concern. Now, we as the public are responsible for consuming this study with critical caution and advocating for our peers and politicians to do the same.