As most of us likely know well by now, speakers at last week’s Republican convention and this week’s Democratic convention made, and in many cases spewed, a number of exaggerated, misleading, and outright false claims about multiple areas of American public policy, educational policy included. Hence, many fact-checking journalists, websites, social media users, and the like have since been trying to hold both parties accountable for their claims and make “the actual facts” more evident. For a funny video about all of this, see HBO’s John Oliver’s most recent bit on “last week’s unsurprisingly surprising Republican convention” here (11 minutes), including some of the speakers’ expressions of “feelings” as “facts.”
Fittingly, The 74, an (allegedly) non-partisan, honest, and fact-based news site (ironically) covering America’s education system “in crisis,” publishing articles “backed by investigation, expertise, and experience,” and backed by Editor-in-Chief Campbell Brown, took on such a fact-checking challenge in an article written by senior staff writer Matt Barnum: “Researchers: No Consensus Against Using Test Scores in Teacher Evaluations, Contra Democratic Platform.”
Apparently, what Barnum actually did to justify the title and contents of his article was (1) take the claim written into the 55-page “2016 Democratic Party Platform” document that “We [the Democratic Party] oppose…the use of student test scores in teacher and principal evaluations, a practice which has been repeatedly rejected by researchers” (p. 33); then (2) generalize what being “repeatedly rejected by researchers” means, inferring that a “consensus,” “wholesale,” and “categorical rejection” exists among researchers “that such scores should not be used whatsoever in evaluation”; then (3) ask a non-random, non-representative sample of nine researchers on the topic whether his inferred conclusion was indeed true; and (4) ultimately claim that “the [alleged] suggestion that there is a scholarly consensus against using test scores in teacher evaluation is misleading.”
Misleading, rather, is Barnum’s framing of his entire piece. Barnum twisted the original statement into something more alarmist, which apparently warranted his fact-checking; he then engaged in a weak, convenience-based investigation, and its unsubstantiated findings ultimately made the headline of his article. It seems that those involved in reporting “the actual facts” also need some serious editing and fact-checking themselves, in that “The 74’s poll of just nine researchers [IS NOT, as opposed to “may not be,”] a representative sample of expert opinion,” whatsoever.
Nonetheless, the nine respondents (we also do not know who was contacted but did not respond, i.e., the response rate) included: Dan Goldhaber, Adjunct Professor of Education and Economics at the University of Washington, Bothell; Kirabo Jackson, Associate Professor of Education and Economics at Northwestern University; Cory Koedel, Associate Professor of Economics and Public Policy at the University of Missouri; Matthew Kraft, Assistant Professor of Education and Economics at Brown University; Susan Moore Johnson, Professor of Teacher Policy at Harvard University; Jesse Rothstein, Professor of Public Policy and Economics at the University of California, Berkeley; Matthew Steinberg, Assistant Professor of Educational Policy at the University of Pennsylvania; Katharine Strunk, Associate Professor of Educational Policy at the University of Southern California; and Jim Wyckoff, Professor of Educational Policy at the University of Virginia. You can see what appear to be these researchers’ full responses to Barnum’s undisclosed solicitation at the bottom of his article, available again here. The opinions of these nine are individually important, as I too would count some of them among (but not representative of) the experts in this area of research (see a fuller list of 37 such experts here, 2/3rds of whom are listed above).
Regardless, and assuming that Barnum’s original misinterpretation was correct, I think Katharine Strunk’s response is likely more representative of what researchers on this topic as a whole would say, based on the research: “I think the research suggests that we need multiple measures – test scores [depending on the extent to which evidence supports low- and more importantly high-stakes use], observations, and others – to rigorously and fairly evaluate teachers.” Likewise, Jesse Rothstein’s response is, in my opinion, another takeaway for those looking for a more accurate and representative statement on this hypothetical consensus: “the weight of the evidence, and the weight of expert opinion, points to the conclusion that we haven’t figured out ways to use test scores in teacher evaluations that yield benefits greater than costs.”
With that being said, what is likely closest to the “fact” desired in this particular instance is that “the use of student test scores in teacher and principal evaluations, [IS] a practice which has been repeatedly rejected by researchers.” But it has also been disproportionately promoted by researchers with disciplinary backgrounds in economics (although this is not always the case), and disproportionately rejected by those with disciplinary backgrounds in education, educational policy, educational measurement and statistics, and the like (although this is not always the case). The bottom line is that reaching a consensus in this area of research is much more difficult than Barnum and others might otherwise assume.
Should one really want to “factually” answer such a question, (s)he would have to more carefully (1) define the problem and subsequent research question (e.g., the platform never claimed in the first place that said “consensus” existed); (2) engage in background research to (3) methodically define the population of researchers, from which (4) the research sample is to be drawn so that it adequately represents that population; and then (5) secure an appropriate response rate. If there are methodological weaknesses in any of these steps, the research exercise should likely stop, as Barnum’s should have at step #1 in this case.