Recall the Chetty et al. study at focus of many posts on this blog (see for example here, here, and here)? The study was cited in President Obama’ 2012 State of the Union address when Obama said, “We know a good teacher can increase the lifetime income of a classroom by over $250,000,” and this study was more recently the focus of attention when the judge in Vergara v. California cited Chetty et al.’s study as providing evidence that “a single year in a classroom with a grossly ineffective teacher costs students $1.4 million in lifetime earnings per classroom.” Well, this study is at the source of a new, and very interesting VAM-based debate, again.
This time, new research conducted by Berkeley Associate Professor of Economics – Jesse Rothstein – provides evidence that puts the aforementioned Chetty et al. results under another appropriate light. While Rothstein and others have written critiques of the Chetty et al. study prior (see prior reviews here, here, here, and here), what Rothstein recently found (in his working, not-yet-peer-reviewed-study here) is that by using “teacher switching” statistical procedures, Chetty et al. masked evidence of bias in their prior study. While Chetty et al. have repeatedly claimed bias was not an issue (see for example a series of emails on this topic here), it seems indeed it was.
While Rothstein replicated Chetty et al.’s overall results using a similar dataset, Rothstein did not replicate Chetty et al.’s findings when it came to bias. As mentioned, Chetty et al. used a process of “teacher switching” to test for bias in their study, and by doing so found, with evidence, that bias did not exist in their value-added output. Rothstein found that when “teacher switching” is appropriately controlled, however, “bias accounts for about 20% of the variance in [VAM] scores.” This makes suspect, more now than before, Chetty et al.’s prior assertions that their model, and their findings, were immune to bias.
What this means, as per Rothstein, is that “teacher switching [the process used by Chetty et al.] is correlated with changes in students’ prior grade scores that bias the key coefficient toward a finding of no bias.” Hence, there was a reason Chetty et al. did not find bias in their value-added estimates because they did not use the proper, statistical controls to control for bias in the first place. When properly controlled, or adjusted, estimates yield “evidence of moderate bias;” hence, “[t]he association between [value-added] and long-run outcomes is not robust and quite sensitive to controls.”
This has major implications in the sense that this makes suspect the causal statements also made by Chetty et al. and repeated by President Obama, the Vergara v. California judge, and others – that “high value-added” teachers caused students to ultimately realize higher long-term incomes, fewer pregnancies, etc. X years down the road. If Chetty et al. did not appropriately control for bias, which again Rothstein argues with evidence they did not, it is likely that students would have realized these “things” almost if not entirely regardless of their teachers or what “value” their teachers purportedly “added” to their learning X years prior.
In other words, students were likely not randomly assigned to classrooms in either the Chetty et al. or the Rothstein datasets (making these datasets comparable). So if the statistical controls used did not effectively “control for” the lack of non random assignment of students into classrooms, teachers may have been assigned high value-added scores not necessarily because they were high value-added teachers but because they were non-randomly assigned higher performing, higher aptitude, etc. students, in the first place and as a whole. Thereafter, they were given credit for the aforementioned long-term outcomes, regardless.
If the name Jesse Rothstein sounds familiar, it should. I have referenced his research in prior posts here, here, and here, as he is well-known in the area of VAM research, in particular, for a series of papers in which he provided evidence that students who are assigned to classrooms in non-random ways can create biased, teacher-level value-added scores. If random assignment was the norm (i.e., whereas students are randomly assigned to classrooms and, ideally, teachers are randomly assigned to teach those classrooms of randomly assigned students), teacher-level bias would not be so problematic. However, given research I also recently conducted on this topic (see here), random assignment (at least in the state of Arizona) occurs 2% of the time, at best. Principals otherwise outright reject the notion as random assignment is not viewed as in “students’ best interests,” regardless of whether randomly assigning students to classrooms might mean “more accurate” value-added output as a result.
So it seems, we either get the statistical controls right (which I doubt is possible) or we randomly assign (which I highly doubt is possible). Otherwise, we are left wondering whether value-added analyses will ever work as per their intended (and largely ideal) purposes, especially when it comes to evaluating and holding accountable America’s public school teachers for their effectiveness.
In case you’re interested, Chetty et al have responded to Rothstein’s critique. Their full response can be accessed here. Not surprisingly, they first highlight that Rothstein (and another set of their colleagues at Harvard), replicated their results. That “value-added (VA) measures of teacher quality show very consistent properties across different settings” is that on which Chetty et al. focus first and foremost. What they dismiss, however, is whether the main concerns raised by Rothstein threaten the validity of their methods, and their conclusions. They also dismiss the fact that Rothstein addressed Chetty et al.’s counterpoints before they published them, in Appendix B of his paper given Chetty et al. shared their concerns with Rothstein prior to his study’s release.
Nonetheless, the concerns Chetty et al. attempt to counter are whether their “teacher-switching” approach was invalid, and whether the “exclusion of teachers with missing [value-added] estimates biased the[ir]conclusion[s]” as well. The extent to which missing data bias value-added estimates has also been discussed prior when statisticians force the assumption in their analyses that missing data are “missing at random” (MAR), which is a difficult (although for some like Chetty et al, necessary) assumption to swallow (see, for example, the Braun 2004 reference here).