Some researchers continue to explore the potential worth of value-added models (VAMs) for measuring teacher effectiveness. I do not endorse the perpetual tweaking of this or twisting of that to explore how VAMs might be made “better” for such purposes, especially given the decades of research we now have evidencing the plethora of problems with using VAMs for such purposes. But I do try to write on this blog about current events, including current research published on this topic. Hence, I write here about a study that researchers from Mathematica Policy Research released last month on whether more teachers might be made VAM-eligible (see the reference below for the full study).

One of the main issues with VAMs is that they can typically be used to measure the effects of only approximately 30% of all public school teachers. The other 70% cannot be evaluated or held accountable using VAM data. This group sometimes includes entire campuses of teachers (e.g., early elementary and high school teachers), as well as teachers who do not teach the core subject areas assessed using large-scale standardized tests (e.g., mathematics and reading/language arts). This is more generally termed an issue of *fairness*, defined by our profession’s *Standards for Educational and Psychological Testing* as the impartiality of “test score interpretations for intended use(s) for individuals from *all* [emphasis added] relevant subgroups” (p. 219). Issues of fairness arise when a test, or a test-based inference or use, affects some individuals more than others in unfair or prejudiced, and often consequential, ways.

Accordingly, in this study researchers explored whether VAMs can be used to evaluate teachers of subject areas that are tested only occasionally and in non-consecutive grade levels (e.g., science and social studies in grades 4 and 7, or in grades 5 and 8). More specifically, they examined whether students’ scores on other, consecutively administered subject-area tests (i.e., mathematics and reading/language arts) can serve as proxy pretests to help isolate teachers’ contributions to students’ achievement in these otherwise excluded subject areas. Indeed, it is true that “states and districts have little information about how value-added models [VAMs] perform in grades when tests in the same subject are not available from the previous year.” Yet states (e.g., New Mexico) continue to do this without evidence that it works. This is also one point of contention in the ongoing lawsuit there. Hence, the purpose of this study was to explore (using state-level data from Oklahoma) how well this approach works, given that the use of such proxy pretests “could allow states and districts to increase the number of teachers for whom value-added models [could] be used” (i.e., increase fairness).

However, the researchers found that when doing just this, (1) VAM estimates that do not account for a same-subject pretest may be less credible than estimates that use same-subject pretests from prior and adjacent grade levels (note that the authors do not explicitly define what they mean by credible, but imply the term to be synonymous with valid). In addition, (2) doing this may lead to relatively more biased VAM estimates, even more so than changing some other features of VAMs, and (3) doing this may make VAM estimates less precise, or less reliable. Put more succinctly, using mathematics and reading/language arts scores as pretests to help measure the value-added effects of (e.g., science and social studies) teachers yields VAM estimates that are less credible (aka less valid), more biased, and less precise (aka less reliable).

The authors nonetheless conclude that while “some policy makers might interpret [these] findings as firm evidence against using value-added estimates that rely on proxy pretests,” such an interpretation “[may be] too strong. The choice between different evaluation measures always involves trade-offs, and alternatives to value-added estimates [e.g., classroom observations and student learning objectives (SLOs)] also have important limitations.”

Their suggestion, rather, is for “[p]olicymakers [to] reduce the weight given to value-added estimates from models that rely on proxy pretests relative to the weight given to those of other teachers in subjects with pretests.” With all of this, I disagree. As I have said before, at this point using this or that statistical adjustment, shrinkage approach, adjusted weighting scheme, etc. is frivolous.

Reference: Walsh, E., Dotter, D., & Liu, A. Y. (2018). *Can more teachers be covered? The accuracy, credibility, and precision of value-added estimates with proxy pre-tests.* Washington, DC: Mathematica Policy Research. Retrieved from https://www.mathematica-mpr.com/our-publications-and-findings/publications/can-more-teachers-be-covered-the-accuracy-credibility-and-precision-of-value-added-estimates