States’ Teacher Evaluation Systems Now “All over the Map”

We are now just one year past the federal passage of the Every Student Succeeds Act (ESSA), under which states are no longer required to set up teacher-evaluation systems based in significant part on their students’ test scores. Accordingly, per a recent article in Education Week, most states are still tinkering with their teacher evaluation systems, particularly regarding the student growth or value-added measures (VAMs) that were formerly required to help states assess teachers’ purported impacts on students’ test scores over time.

“States now have a newfound flexibility to adjust their evaluation systems—and in doing so, they’re all over the map.” Likewise, though, “[a] number of states…have been moving away from [said] student growth [and value-added] measures in [teacher] evaluations,” said Kimberly Kappler Hewitt (University of North Carolina at Greensboro), a friend, colleague, co-editor, and occasional writer on this blog (see, for example, here and here). She added that this is occurring “whether [this] means postponing [such measures’] inclusion, reducing their percentage in the evaluation breakdown, or eliminating those measures altogether.”

While states like Alabama, Iowa, and Ohio seem to still be moving forward with the attachment of students’ test scores to their teachers, other states seem to be going “back and forth” or putting a halt to all of this altogether (e.g., California). Alaska cut back the weight of the measure, while New Jersey tripled the weight to count for 30% of a teacher’s evaluation score, and then introduced a bill to reduce it back to 0%. In New York, teachers are still to receive a test-based evaluation score, but it is not to be tied to consequences, and the system is to be completely revamped by 2019. In Alabama, a bill that would have tied 25% of a teacher’s evaluation to his/her students’ ACT and ACT Aspire college-readiness tests has yet to see the light of day. In North Carolina, state leaders re-framed the use(s) of such measures to be more of an improvement tool (e.g., for professional development), but not “a hammer” to be used against schools or teachers. The same thing is happening in Oklahoma, although this state is not specifically mentioned in this piece.

While some might see all of this as good news (or rather, better news than what we have seen for nearly the last decade, during which states, state departments of education, and practitioners have been grappling with and trying to make sense of student growth measures and VAMs), others are still (and likely forever will be) holding onto some of the seemingly unfulfilled promises attached to such stronger accountability measures.

Namely, in this article, Daniel Weisberg of The New Teacher Project (TNTP), and author of the now famous “Widget Effect” report about “Our National Failure to Acknowledge and Act on Differences in Teacher Effectiveness” that helped to “inspire” the last near-decade of these policy-based reforms, “doesn’t see states backing away” from using these measures given ESSA’s new flexibility. We “haven’t seen the clock turn back to 2009, and I don’t think [we]’re going to see that.”

Citation: Will, M. (2017). States are all over the map when it comes to how they’re looking to approach teacher-evaluation systems under ESSA. Education Week. Retrieved from

The Elephant in the Room – Fairness

While VAMs have many issues pertaining, fundamentally, to their levels of reliability, validity, and bias, they are also thoroughly unfair. This is one thing that is so very important but so rarely discussed when those external to VAM-based metrics and metrics use are debating, mainly, the benefits of VAMs.

Issues of “fairness” arise when a test, or more likely its summative (i.e., summary and sometimes consequential) and formative (i.e., informative) uses, impacts some more than others in unfair yet often important ways. In terms of VAMs, the main issue here is that VAM-based estimates can be produced for only approximately 30-40% of all teachers across America’s public schools. The other 60-70%, which sometimes includes entire campuses of teachers (e.g., early elementary and high school teachers), cannot be evaluated or “held accountable” at all using teacher- or individual-level VAM data.

Put differently, what VAM-based data provide, in general, “are incredibly imprecise and inconsistent measures of supposed teacher effectiveness for only a tiny handful [30-40%] of teachers in a given school” (see reference here). But this is often entirely overlooked, not only in the debates surrounding VAM use (and abuse) but also in the discussions surrounding how much taxpayer-derived funding is still being used to support such a (purportedly) reformatory overhaul of America’s public education system. The fact of the matter is that VAMs directly impact only a large minority of teachers.

While some states and districts are rushing into adopting “multiple measures” to alleviate at least some of these issues with fairness, what state and district leaders don’t entirely understand is that this, too, is grossly misguided. Should any of these states and districts also tie serious consequences to such output (e.g., merit pay, performance plans, teacher termination, denial of tenure), or rather tie serious consequences to measures of growth derived via any variety of the “multiple assessments” that can be pulled from increasingly prevalent multiple assessment “menus,” states and districts are also setting themselves up for lawsuits…no joke! Starting with the basic psychometrics, and moving on to the (entire) lack of research in support of using more “off-the-shelf” tests to help alleviate issues with fairness, would be the (easy) approach to take in a court of law as, really, doing any of this is entirely wrong.

School-level value-added is also being used to accommodate the issue of “fairness,” just less frequently now than before given the aforementioned “multiple assessment” trends. Regardless, many states and districts also continue to attribute a school-level aggregate score to teachers who do not primarily teach reading/language arts and mathematics in grades 3-8. That’s right, a majority of teachers receive a value-added score that is based on students whom they do not teach. This, too, calls for legal recourse, given that this has been a contested issue within all of the lawsuits in which I’ve thus far been engaged.