An article titled “Sensitivity of Teacher Value-Added Estimates to Student and Peer Control Variables” was recently published in the peer-reviewed Journal of Research on Educational Effectiveness. While this article is not open-access, here is a link to the earlier version released by Mathematica, which nearly mirrors the final published article.
In this study, researchers Matthew Johnson, Stephen Lipscomb, and Brian Gill, all of whom are associated with Mathematica, examined the sensitivity and precision of various VAMs. Specifically, they examined the extent to which VAM estimates vary depending on whether modelers include student- and peer-level background characteristics as control variables, and the extent to which the estimates vary depending on whether modelers include one or more years of students’ prior achievement scores, also as control variables. They did this using state data, which they compared with data from what they called District X, a district within the state with three times more African-American students, two times more students receiving free or reduced-price lunch, and generally more special education students than the state average. While the state data included more students, the district data included additional control variables, which supported the researchers’ analyses.
Here are the highlights, with thanks to lead author Matthew Johnson for edits and clarifications.
- Different VAMs produced similar results overall [when using the same data], almost regardless of specifications. “[T]eacher estimates are highly correlated across model specifications. The correlations [they] observe[d] in the state and district data range[d] from 0.90 to 0.99 relative to [their] baseline specification.”
This finding has been evidenced in the prior literature, when the same models are applied to the same datasets taken from the same sets of tests at the same time. Hence, many critics argue that different VAMs yield similar results on the same data because, when all of them use the same fallible, large-scale standardized test data, even the most sophisticated models are processing “garbage in” and producing “garbage out.” When the tests inserted into the same VAM vary, however, even if those tests measure the same constructs (e.g., mathematics learning in grade X), things go haywire. For more information about this, please see Papay (2010) here.
- However, “even correlation coefficients above 0.9 do not preclude substantial amounts of potential misclassification of teachers across performance categories.” The researchers also found that, even with such high correlations, 26% of teachers rated in the bottom quintile were placed in higher performance categories under an alternative model.
- Modeling choices impacted the rankings of teachers in District X in “meaningful” ways, given District X’s varying student demographics compared with those in the state overall. In other words, VAMs that do not include all relevant student characteristics can penalize teachers in districts that serve students who are more disadvantaged than statewide averages.
See an article related to whether VAMs can include all relevant student characteristics, also given the non-random assignment of students to teachers (and teachers to classrooms) here.
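To see why correlations above 0.9 can still coexist with substantial reclassification, consider a minimal simulation sketch. It is not the authors' method and uses no real data: it assumes teacher estimates under two models are standard normal draws with a chosen correlation, and the function name `share_reclassified` and its parameters are hypothetical, introduced here only for illustration. It counts the share of teachers in the bottom quintile under a baseline model who land in a higher quintile under an alternative model.

```python
import random


def share_reclassified(corr, n_teachers=20000, seed=0):
    """Share of bottom-quintile teachers (baseline model) who are placed
    above the bottom quintile by an alternative model.

    Assumption (illustrative only): both sets of estimates are standard
    normal, with correlation `corr` between them.
    """
    rng = random.Random(seed)
    noise_weight = (1 - corr ** 2) ** 0.5

    # Simulated baseline estimates, and alternative estimates built so that
    # their correlation with the baseline is approximately `corr`.
    baseline = [rng.gauss(0, 1) for _ in range(n_teachers)]
    alternative = [corr * b + noise_weight * rng.gauss(0, 1) for b in baseline]

    def in_bottom_quintile(scores):
        # Flag the lowest 20% of scores.
        cutoff = sorted(scores)[len(scores) // 5]
        return [s < cutoff for s in scores]

    base_flags = in_bottom_quintile(baseline)
    alt_flags = in_bottom_quintile(alternative)

    n_bottom = sum(base_flags)
    moved_up = sum(1 for b, a in zip(base_flags, alt_flags) if b and not a)
    return moved_up / n_bottom


if __name__ == "__main__":
    for corr in (0.90, 0.95, 0.99):
        share = share_reclassified(corr)
        print(f"correlation {corr:.2f}: {share:.1%} of bottom-quintile teachers move up")
```

Under these assumptions, the share of reclassified teachers shrinks as the correlation rises but reaches zero only when the two sets of estimates are identical. Real value-added estimates need not be normally distributed, so the simulated shares should not be read as a reproduction of the study's 26% figure.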
Original Version Citation: Johnson, M., Lipscomb, S., & Gill, B. (2013). Sensitivity of teacher value-added estimates to student and peer control variables. Princeton, NJ: Mathematica Policy Research.
Published Version Citation: Johnson, M., Lipscomb, S., & Gill, B. (2015). Sensitivity of teacher value-added estimates to student and peer control variables. Journal of Research on Educational Effectiveness, 8(1), 60-83. doi:10.1080/19345747.2014.967898