A research article titled “The Stability of Teacher Performance and Effectiveness: Implications for Policies Concerning Teacher Evaluation” was recently published in the peer-reviewed journal Education Policy Analysis Archives. The article was authored by professors/doctoral researchers from Baylor University, Texas A&M University, and the University of South Carolina. The researchers set out to explore the stability (or fluctuation) over time of effectiveness ratings for 132 teachers across 23 schools in South Carolina, using both observational and VAM-based output.
Researchers found, not surprisingly given prior research in this area, that neither teacher performance as measured by value-added nor effectiveness as measured by observations was highly stable over time. This is most problematic when “sound decisions about continued employment, tenure, and promotion are predicated on some degree of stability over time. It is imprudent to make such decisions if the performance and effectiveness of a teacher is rated ‘Excellent’ one year and ‘Mediocre’ the next.”
They also observed “a generally weak positive relationship between the two sets of ratings” [i.e., value-added estimates and observations], a relationship that has also been the subject of many studies in the literature.
On both of these findings, really, we are well past the point of saturation. That is, we could not have more agreement across research studies on the following: (1) teacher-level value-added scores are highly unstable over time, and (2) these value-added scores do not align well with observational scores, as they should if both measures were appropriately capturing the “teacher effectiveness” construct.
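To make these two points concrete, here is a minimal sketch of how both claims are typically operationalized: stability as the correlation of a teacher’s value-added score from one year to the next, and alignment as the correlation between value-added and observational scores. The data and the 0.3 signal weight below are simulated and purely illustrative, not the study’s actual numbers; the point is simply that when most of each score’s variance is noise rather than stable teacher effect, both correlations come out low.

```python
import numpy as np

rng = np.random.default_rng(0)
n_teachers = 132  # matches the study's sample size; all scores below are simulated

# Hypothetical model: a stable "true" teacher effect buried under large
# year- and measure-specific noise (the 0.3 signal weight is an assumption)
true_effect = rng.normal(0, 1, n_teachers)
vam_year1 = 0.3 * true_effect + rng.normal(0, 1, n_teachers)
vam_year2 = 0.3 * true_effect + rng.normal(0, 1, n_teachers)
obs_score = 0.3 * true_effect + rng.normal(0, 1, n_teachers)

# (1) Stability: does a teacher's value-added score persist across years?
stability = np.corrcoef(vam_year1, vam_year2)[0, 1]
# (2) Alignment: do value-added and observational scores agree?
alignment = np.corrcoef(vam_year1, obs_score)[0, 1]
print(f"year-to-year stability r = {stability:.2f}")
print(f"VAM-observation alignment r = {alignment:.2f}")
```

Both printed correlations land far below the levels one would want before making employment, tenure, or promotion decisions, which is exactly the pattern the research literature keeps reporting.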
Another interesting finding from this study, one discussed before but not yet at the point of saturation in the research literature like the prior two, is that (3) teacher performance ratings based on observational data also differ markedly across schools. At issue here is “that performance ratings may be school-specific,” or, as per a recent post on this blog, that there is indeed much “Arbitrariness Inherent in Teacher Observations.” This is also highly problematic in that where a teacher happens to be housed may determine his/her ratings, based not necessarily (or entirely) on his/her actual “quality” or “effectiveness” but on his/her location, his/her rater, and his/her rater’s scoring approach, given raters’ differential tendencies toward leniency or severity. This leaves us with more of a luck-of-the-draw approach than an actually “objective” measurement of true teacher quality, contrary to current and popular (especially policy) beliefs.
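To illustrate that luck-of-the-draw quality, here is a small, purely hypothetical simulation. The severity shifts (±0.8), the rating noise, and the “Excellent” cutoff are all invented for illustration: two groups of teachers with identical underlying performance end up with very different labels simply because one school’s rater is lenient and the other’s is severe.

```python
import numpy as np

rng = np.random.default_rng(1)

# Identical underlying performance for 50 teachers at each of two schools
performance = rng.normal(0, 1, 50)

# Hypothetical rater tendencies: School A's rater is lenient (+0.8),
# School B's rater is severe (-0.8), plus a little rating noise
ratings_a = performance + 0.8 + rng.normal(0, 0.3, 50)
ratings_b = performance - 0.8 + rng.normal(0, 0.3, 50)

# The same fixed district-wide cutoff turns scores into labels
share_excellent_a = np.mean(ratings_a > 0.5)
share_excellent_b = np.mean(ratings_b > 0.5)
print(f"School A rated 'Excellent': {share_excellent_a:.0%}")  # most teachers
print(f"School B rated 'Excellent': {share_excellent_b:.0%}")  # few teachers
```

Same teachers, same performance, very different ratings: the only thing that changed between the two schools is the rater’s scoring tendency.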
Accordingly, and also per the research, this is not getting much better. As per the authors of this article, as well as many other scholars: (1) “the variance in value-added scores that can be attributed to teacher performance rarely exceeds 10 percent”; (2) “gross” measurement errors come, first, from the tests being used to calculate value-added; (3) the ranges of teacher effectiveness scores are restricted, given these same tests’ limited stretch and depth and their instructional insensitivity (this was also at the heart of a recent post, which demonstrated that “the entire range from the 15th percentile of effectiveness to the 85th percentile of [teacher] effectiveness [using the EVAAS] cover[ed] approximately 3.5 raw score points [given the tests used to measure value-added]”); (4) context, or student, family, school, and community background effects, simply cannot be controlled for or factored out; and (5) this is especially true at the classroom/teacher level when students are not randomly assigned to classrooms (and teachers are not randomly assigned to teach those classrooms)… although such random assignment will likely never happen, as it would put improving the sophistication and rigor of the value-added model above students’ “best interests.”
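Point (1) above also connects directly back to the instability findings: under a simple classical test theory view, if only about 10 percent of the variance in value-added scores reflects stable teacher performance, the expected year-to-year correlation of those scores is capped near 0.10. A quick check, using hypothetical variance shares rather than the authors’ actual model:

```python
import numpy as np

# If stable teacher performance accounts for 10% of score variance and
# noise for the remaining 90%, the expected year-to-year correlation
# (a reliability ceiling) is the stable share of total variance:
var_teacher, var_noise = 0.10, 0.90
print(var_teacher / (var_teacher + var_noise))  # 0.10

# Monte Carlo confirmation with simulated two-year score pairs
rng = np.random.default_rng(2)
n_teachers, n_reps = 132, 2000
teacher = rng.normal(0, np.sqrt(var_teacher), (n_reps, n_teachers))
year1 = teacher + rng.normal(0, np.sqrt(var_noise), (n_reps, n_teachers))
year2 = teacher + rng.normal(0, np.sqrt(var_noise), (n_reps, n_teachers))
rs = [np.corrcoef(year1[i], year2[i])[0, 1] for i in range(n_reps)]
print(f"mean simulated year-to-year r = {np.mean(rs):.3f}")  # ~0.10
```

In other words, the low variance share and the low year-to-year stability are two faces of the same problem: there simply is not enough stable teacher signal in these scores to support high-stakes decisions about individual teachers.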