Recall that the peer-reviewed journal Educational Researcher (ER) recently published a “Special Issue” including nine articles examining value-added measures (VAMs). Here I review the next of these nine articles (#7 of 9), which is actually a commentary titled “The Value in Value-Added Depends on the Ecology.” This commentary is authored by Henry Braun – Professor of Education and Public Policy, Educational Research, Measurement, and Evaluation at Boston College (and also the author of a previous post on this site here).
In this article Braun, importantly, makes explicit the assumptions on which this special issue of ER is based: (1) that too many students in America’s public schools are being inadequately educated, (2) that evaluation systems as they currently exist “require radical overhaul,” and (3) that it is therefore essential to use student test performance, with low and high stakes attached, to improve what educators do (or don’t do) to adequately address the first assumption. Braun also offers readers counterarguments to each of these assumptions (see p. 127), but more importantly he makes evident that the focus of this special issue is situated otherwise, in line with current education policies. Overall, then, this special issue “raise[s] important questions regarding the potential for high-stakes, test-driven educator accountability systems to contribute to raising student achievement” (p. 127).
Given this context, the “value-added” provided within this special issue, again according to Braun, is that the authors of each of the five main research articles report on how VAM output actually plays out in practice, giving “careful consideration to how the design and implementation of teacher evaluation systems could be modified to enhance the [purportedly, see comments above] positive impact of accountability and mitigate the negative consequences” at the same time (p. 127). In other words, if we more or less agree to the aforementioned assumptions (also given the educational policy context influencing, perpetuating, or actually forcing these assumptions), these articles should help others better understand VAMs’ and observational systems’ potentials and perils in practice.
At the same time, Braun encourages us to note that “[t]he general consensus is that a set of VAM scores does contain some useful information that meaningfully differentiates among teachers, especially in the tails of the distribution [although I would argue bias has a role here]. However, individual VAM scores do suffer from high variance and low year-to-year stability as well as an undetermined amount of bias [which may be greater in the tails of the distribution]. Consequently, if VAM scores are to be used for evaluation, they should not be given inordinate weight and certainly not treated as the ‘gold standard’ to which all other indicators must be compared” (p. 128).
Likewise, it’s important to note that IF consequences are to be attached to said indicators of teacher evaluation (i.e., VAM and observational data), validity evidence should be made available and transparent to warrant the inferences and decisions to be made, and that evidence “should strongly support a causal [emphasis added] argument” (p. 128). However, both indicators still face major “difficulties in establishing defensible causal linkage[s]” as theorized and desired (p. 128), which undermines the validity of any such inferences. Nor does it help when VAM scores are given precedence over other indicators, or when principals align teachers’ observational scores with those same teachers’ VAM scores, given the precedence often awarded to (what are often viewed as the superior, more objective) VAM-based measures. This sometimes occurs given external pressures (e.g., applied by superintendents) to artificially inflate levels of agreement between indicators (i.e., convergent validity).
Relatedly, in the section Braun titles his “Trio of Tensions” (p. 129), he notes that (1) “[B]oth accountability and improvement are undermined, as attested to by a number of the articles in this issue. In the current political and economic climate, [if possible] it will take thoughtful and inspiring leadership at the state and district levels to create contexts in which an educator evaluation system constructively fulfills its roles with respect to both public accountability and school improvement” (pp. 129-130); (2) “[T]he chasm between the technical sophistication of the various VAM[s] and the ability of educators to appreciate what these models are attempting to accomplish…sow[s] further confusion…[hence]…there must be ongoing efforts to convey to various audiences the essential issues—even in the face of principled disagreements among experts on the appropriate role(s) for VAM[s] in educator evaluations” (p. 130); and finally (3) “[H]ow to balance the rights of students to an adequate education and the rights of teachers to fair evaluations and due process [especially for]…teachers who have value-added scores and those who teach in subject-grade combinations for which value-added scores are not feasible…[must be addressed; this] comparability issue…has not been addressed but [it] will likely [continue to] rear its [ugly] head” (p. 130).
In the end, Braun argues for another “Trio,” but this one including three final lessons: (1) “although the concerns regarding the technical properties of VAM scores are not misplaced, they are not necessarily central to their reputation among teachers and principals. [What is central is]…their links to tests of dubious quality, their opaqueness in an atmosphere marked by (mutual) distrust, and the apparent lack of actionable information that are largely responsible for their poor reception” (p. 130); (2) there is a “very substantial, multiyear effort required for proper implementation of a new evaluation system…[related, observational] ratings are not a panacea. They, too, suffer from technical deficiencies and are the object of concern among some teachers because of worries about bias” (p. 130); and (3) “legislators and policymakers should move toward a more ecological approach [emphasis added; see also the Review of Article (Essay) #6 – on VAMs as tools for “egg-crate” schools here] to the design of accountability systems; that is, “one that takes into account the educational and political context for evaluation, the behavioral responses and other dynamics that are set in motion when a new regime of high-stakes accountability is instituted, and the long-term consequences of operating the system” (p. 130).
If interested, see the Review of Article #1 – the introduction to the special issue here; see the Review of Article #2 – on VAMs’ measurement errors, issues with retroactive revisions, and (more) problems with using standardized tests in VAMs here; see the Review of Article #3 – on VAMs’ potentials here; see the Review of Article #4 – on observational systems’ potentials here; see the Review of Article #5 – on teachers’ perceptions of observations and student growth here; and see the Review of Article (Essay) #6 – on VAMs as tools for “egg-crate” schools here.
Article #7 Reference: Braun, H. (2015). The value in value-added depends on the ecology. Educational Researcher, 44(2), 127-131. doi:10.3102/0013189X15576341