Following up on my most recent post about whether “The Test Matters,” a colleague of mine from the University of Southern California (USC) – Morgan Polikoff, whom I also referenced on this blog this past summer regarding a study he and Andrew Porter (University of Pennsylvania) released about the “surprisingly weak” correlations among VAMs and other teacher quality indicators – wrote me an email. In it, he sent another study very similar to the one referenced above, also about whether “The Test Matters,” though this one is titled “Does the Test Matter?” (see full reference below).
Given that this comes from a book chapter included within a volume capturing many of the Bill & Melinda Gates Measures of Effective Teaching (MET) studies – studies that have also been the subjects of prior posts here and here – I thought it even more appropriate to share this with you all, given that book chapters are sometimes hard to access and find.
Like the authors of “The Test Matters,” Polikoff used MET data to investigate whether large-scale standardized state tests “differ in the extent to which they reflect the content or quality of teachers’ instruction” (i.e., tests’ instructional sensitivity). He also investigated whether differences in instructional sensitivity affect recommendations made in the MET reports for creating multiple-measure evaluation systems.
Polikoff found that “state tests indeed vary considerably in their correlations with observational and student survey measures of effective teaching.” These correlations, as in the previous article/post cited above, were statistically significant and positive, but fell in the very weak range in both mathematics (0.15 < r < 0.19) and English/language arts (0.07 < r < 0.15). These, along with other noted correlations approaching zero (i.e., r = 0, or no correlation at all), indicate, as per Polikoff, “weak sensitivity to instructional quality.”
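To get a feel for just how weak these correlations are, a quick back-of-the-envelope sketch (my illustration, not Polikoff’s) squares each correlation to obtain the coefficient of determination – the proportion of variance in one measure accounted for by the other. The endpoints below are the ones reported above; the arithmetic is standard, not drawn from the chapter itself.

```python
# Squaring a correlation coefficient (r) gives r^2, the proportion of
# variance the two measures share. The values below are the reported
# endpoints of the mathematics and English/language arts ranges.

correlations = {
    "mathematics, low end": 0.15,
    "mathematics, high end": 0.19,
    "ELA, low end": 0.07,
    "ELA, high end": 0.15,
}

for label, r in correlations.items():
    shared_variance = r ** 2  # proportion of shared variance
    print(f"{label}: r = {r:.2f}, r^2 = {shared_variance:.4f}")
```

Even at the high end for mathematics (r = 0.19), the test scores and the observational/survey measures share under 4% of their variance, which is what “very weak” means in practical terms.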
Put differently, while the extent to which state tests are instructionally sensitive appears slightly more favorable in mathematics than in English/language arts, we might otherwise conclude that the state tests used in at least the MET partner states “are only weakly sensitive to pedagogical quality, as judged by the MET study instruments” – the instruments (i.e., observational and student survey instruments) that are in many ways “the best” we have thus far to offer in terms of teacher evaluation.
But why does instructional sensitivity matter? Because the inferences made from these state test results, whether used independently or, more likely, after VAM calculations, “rely on the assumption that [state test] results accurately reflect the instruction received by the students taking the test. This assumption is at the heart of [investigations] of instructional sensitivity.” See another related (and still troublesome) post about instructional sensitivity here.
Polikoff, M. S. (2014). Does the test matter? Evaluating teachers when tests differ in their sensitivity to instruction. In T. J. Kane, K. A. Kerr, & R. C. Pianta (Eds.), Designing teacher evaluation systems: New guidance from the Measures of Effective Teaching project (pp. 278-302). San Francisco, CA: Jossey-Bass.