Some readers, particularly educational practitioners, might respond to the title of this post with a sarcastic “duh,” but a new study published in the highly reputable, peer-reviewed American Educational Research Journal (AERJ) now substantiates that very headline with empirical evidence, drawn from an extensive study the researchers conducted in the northeastern United States.
Researchers David Blazar (doctoral candidate at Harvard), Erica Litke (assistant professor at the University of Delaware), and Johanna Barmore (doctoral candidate at Harvard) examined (1) the comparability of teachers’ value-added estimates within and across four urban districts and (2), given the extent of the variation observed, whether and how those value-added estimates consistently captured differences in teachers’ observed, videotaped, and scored classroom practices.
Regarding their first point of investigation, they found that teachers were categorized differently when compared within versus across districts (i.e., when compared to other similar teachers within their own districts versus across districts, a methodological choice that value-added modelers often make). The researchers did not assert that either approach yielded more valid interpretations, however. Rather, they evidenced that the differences observed within and across districts were notable, and that these differences had notable implications for validity, in that a teacher classified as adding X value in one context could be classified as adding Y value in another, depending on the context in which (s)he was teaching. In other words, the validity of the inferences to be drawn about potentially any teacher depended greatly on the context in which the teacher taught, in that his/her value-added estimate did not necessarily generalize across contexts. Put in their words, “it is not clear whether the signal of teachers’ effectiveness sent by their value-added rankings retains a substantive interpretation across contexts” (p. 326). Put inversely, “it is clear that labels such as highly effective or ineffective based on value-added scores do not have fixed meaning” (p. 351).
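To make the within- versus across-district distinction concrete, here is a toy numerical sketch in Python. The scores, district names, and comparison logic are entirely invented for illustration; this is not the authors’ model, just the generic idea of ranking the same score against different comparison groups.

```python
# Toy illustration (invented numbers, not the authors' model): the same
# raw value-added score ranked within one district versus across a
# pooled, multi-district sample.
from statistics import mean, stdev

# Hypothetical value-added scores for teachers in two districts
districts = {
    "District A": [0.10, 0.15, 0.20, 0.25, 0.30],
    "District B": [0.30, 0.35, 0.40, 0.45, 0.50],
}

def z_score(score, comparison_group):
    """Standardize a score against a chosen comparison group."""
    return (score - mean(comparison_group)) / stdev(comparison_group)

teacher_score = 0.30  # a hypothetical teacher in District A

# Within-district comparison: top of District A (about +1.26 SD)
print(z_score(teacher_score, districts["District A"]))

# Across-district comparison: exactly average in the pooled sample (0.0 SD)
pooled = districts["District A"] + districts["District B"]
print(z_score(teacher_score, pooled))
```

The same teacher looks “highly effective” under one comparison and merely average under the other, which is precisely the kind of context-dependence the authors describe.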
Regarding their second point of investigation, they found “stark differences in instructional practices across districts among teachers who received similar within-district value-added rankings” (p. 324). In other words, “when comparing [similar] teachers within districts, value-added rankings signaled differences in instructional quality in some but not all instances” (p. 351); that is, similarly ranked teachers did not necessarily display similarly effective (or ineffective) instructional practices. This has also been more loosely evidenced by those who have investigated the correlations between teachers’ value-added and observational scores and have found weak to moderate correlations (see prior posts on this here, here, here, and here). In the simplest of terms, “value-added categorizations did not signal common sets of instructional practices across districts” (p. 352).
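For a sense of what “weak to moderate” means in practice, here is a minimal sketch, again with invented numbers chosen only so that the correlation lands in the range this literature typically reports:

```python
# Invented scores illustrating a "weak to moderate" correlation between
# teachers' value-added estimates and their observation ratings.
from statistics import correlation  # requires Python 3.10+

value_added = [-1.2, -0.5, 0.0, 0.3, 0.8, 1.1, -0.9, 0.4]
observation = [ 2.1,  3.0, 2.4, 3.2, 2.8, 2.6,  2.9, 3.5]

r = correlation(value_added, observation)
print(f"r = {r:.2f}")  # about 0.33 with these made-up numbers
```

A correlation of roughly 0.3 means the two measures share only about 10% of their variance (r²), so agreement between a teacher’s value-added rank and his/her observed practice is far from guaranteed.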
The bottom line here, then, is that those in charge of making consequential decisions about teachers, based even in part on teachers’ value-added estimates, need to be cautious when making particularly high-stakes decisions from said estimates. Based on the evidence presented in this particular study, a teacher could logically, and also legally, argue that had (s)he been teaching in a different district, even within the same state and using the same assessment instruments, (s)he could have received a substantively different value-added score, given the teachers to whom (s)he would have been compared when his/her value-added was estimated elsewhere. Hence, the validity of inferences asserting that a teacher was effective or ineffective based on his/her value-added estimates is suspect; it depends, again, on the context in which the teacher taught and on the other comparable teachers against whom his/her teacher-level value-added was estimated. “Here, the instructional quality of the lowest ranked teachers was not particularly weak and in fact was as strong as the instructional quality of the highest ranked teachers in other districts” (p. 353).
This has serious implications not only for practice but also for the lawsuits ongoing across the nation, especially those pertaining to teachers’ allegedly wrongful terminations.
Citation: Blazar, D., Litke, E., & Barmore, J. (2016). What does it mean to be ranked a “high” or “low” value-added teacher? Observing differences in instructional quality across districts. American Educational Research Journal, 53(2), 324–359. doi:10.3102/0002831216630407
The value-added model (VAM) is not only used to determine teacher effectiveness; in many districts it is also used to regulate teachers’ performance pay. More and more school districts around the country have moved toward a performance-related pay (PRP) model. The rationale behind this movement includes increasing the motivation and commitment of teachers and improving the quality of instruction to benefit students’ achievement, which is typically measured by student growth. Some researchers believe that the value-added model can be applied to demonstrate effective teaching. However, a statistical formula used for such purposes should not yield inaccurate and biased results. Researchers McCaffrey, Lockwood, Koretz, and Hamilton of the RAND Corporation (http://eric.ed.gov/?id=ED529961) identified numerous possible sources of error and bias in estimated teacher effects, and they recommended a number of steps for future research into these potential errors. Their conclusion was that there is currently insufficient evidence to support the use of VAMs for high-stakes decisions about individual teachers or schools. It is crucial that decision makers are aware of the potential pitfalls when applying VAMs to determine both teacher effectiveness and teacher performance-based pay.
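To make the underlying idea concrete, here is a deliberately simplified sketch of a growth-based value-added estimate, with invented scores. Real VAMs, including those the RAND researchers analyzed, add many controls and statistical adjustments; the point of the sketch is only that value-added boils down to the average gap between students’ actual and predicted scores, so any bias in the prediction step flows directly into a teacher’s estimated “effect.”

```python
# Simplified sketch of a growth-based value-added estimate (invented
# data; real VAMs add many controls and shrinkage adjustments).
# Value-added here = average gap between students' actual scores and
# the scores predicted from their prior achievement.
from statistics import linear_regression, mean  # requires Python 3.10+

# Hypothetical (prior score, current score) pairs for all tested students
prior   = [410, 455, 470, 500, 520, 540, 580, 610]
current = [430, 460, 490, 515, 530, 560, 600, 615]

slope, intercept = linear_regression(prior, current)

def residual(p, c):
    """Actual minus predicted current score for one student."""
    return c - (slope * p + intercept)

# Indices of one hypothetical teacher's students
teacher_students = [1, 3, 4]
teacher_effect = mean(residual(prior[i], current[i]) for i in teacher_students)
print(f"Estimated teacher effect: {teacher_effect:+.1f} scale-score points")
```

Note how sensitive this is: change the comparison sample used to fit the prediction line, and every student’s residual, and hence every teacher’s “effect,” shifts accordingly, which echoes the kinds of error and bias the RAND researchers caution about.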
This potential for error doesn’t mean VAMs are invalid. It just means that the results are more reliable when teachers’ student growth is compared across a larger set of educators. And it means that there are many different ways to teach. Some teachers connect with kids on a personal level and inspire hard work. Others are better at conveying concepts. Others might have better classroom discipline. That teachers obtained similar VAM scores using different methods supports their validity; it doesn’t discount it.