Two weeks ago I published a post about the newly released “American Educational Research Association (AERA) Statement on Use of Value-Added Models (VAM) for the Evaluation of Educators and Educator Preparation Programs.”
In that post I also included a summary of the AERA Council’s eight key, and very important, points about VAMs and VAM use. I also noted that I contributed to this piece in one of its earliest forms. More importantly, however, the person who managed the statement’s external review and also assisted the AERA Council in producing the final statement before it was officially released was Boston College’s Dr. Henry Braun, Boisi Professor of Education and Public Policy and Educational Research, Measurement, and Evaluation.
Just this last week, the Brookings Institution published a critique of the AERA statement for, in my opinion, no other apparent reason than just being critical. The critique, written by Brookings affiliate Michael Hansen and University of Washington Bothell’s Dan Goldhaber, is titled “Response to AERA statement on Value-Added Measures: Where are the Cautionary Statements on Alternative Measures?”
Accordingly, I invited Dr. Henry Braun to respond, and he graciously agreed:
In a recent posting, Michael Hansen and Dan Goldhaber complain that the AERA statement on the use of VAMs does not take a similarly critical stance with respect to “alternative measures”. True enough! The purpose of the statement is to provide a considered, research-based discussion of the issues related to the use of value-added scores for high-stakes evaluation. It culminates in a set of eight requirements to be met before such use should be made.
The AERA statement does not stake out an extreme position. First, it is grounded in the broad research literature on drawing causal inferences from observational data subject to strong selection (i.e., the pairing of teachers and students is highly non-random), as well as empirical studies of VAMs in different contexts. Second, the requirements are consistent with the AERA, American Psychological Association (APA), and National Council on Measurement in Education (NCME) Standards for Educational and Psychological Testing. Finally, its cautions are in line with those expressed in similar statements released by the Board on Testing and Assessment of the National Research Council and by the American Statistical Association.
Hansen and Goldhaber are certainly correct when they assert that, in devising an accountability system for educators, a comparative perspective is essential. One should consider the advantages and disadvantages of different indicators, which ones to employ, how to combine them and, most importantly, consider both the consequences for educators and the implications for the education system as a whole. Nothing in the AERA statement denies the importance of subjecting all potential indicators to scrutiny. Indeed, it states: “Justification should be provided for the inclusion of each indicator and the weight accorded to it in the evaluation process.” Of course, guidelines for designing evaluation systems would constitute a challenge of a different order!
In this context, it must be recognized that rankings based on VAM scores and ratings based on observational protocols will necessarily have different psychometric and statistical properties. Moreover, they both require a “causal leap” to justify their use: VAM scores are derived directly from student test performance, but require a way of linking to the teacher of record. Observational ratings are based directly on a teacher’s classroom performance, but require a way of linking back to her students’ achievement or progress.
Thus, neither approach is intrinsically superior to the other. But the singular danger with VAM scores, being the outcome of a sophisticated statistical procedure, is that they are seen by many as providing a gold standard against which other indicators should be judged. Both the AERA and ASA statements offer a needed corrective, by pointing out the path that must be traversed before an indicator based on VAM scores approaches the status of a gold standard. Though the requirements listed in the AERA statement may be aspirational, they do offer signposts against which we can judge how far we have come along that path.
Henry Braun, Lynch School of Education, Boston College