A reader wrote a very good question (see the VAMmunition post) that I feel is worth “sharing out,” with a short but (hopefully) informative answer that will help others better understand some of “the issues.”
(S)he wrote: “[W]hat exactly would a test look like if it were, indeed, ‘designed to estimate teachers’ causal effects’? Moreover, how different would it be from today’s tests?”
Here is (most of) my response: While large-scale standardized tests are typically limited in both the number and types of items included, among other things, one could use a similar test with more items, and more “instructionally sensitive” items, to better capture a teacher’s causal effects, quite simply actually. This would be done with the pre- and post-tests occurring in the same year, while students are being instructed by the same (albeit not only…) teacher. However, this does not happen in any value-added system at this point, as these tests are given once per year (typically spring to spring). Hence, student growth scores include prior and other teachers’ effects, as well as the differential learning gains/losses that occur over the summers, during which students have little to no interaction with formal education systems or their teachers. This “biases” these measures of growth, big time!
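To make the spring-to-spring problem concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that a student's growth simply adds up across three periods: what was learned after last spring's test under the prior teacher, what was gained or lost over the summer, and what was learned with the current teacher between a fall pre-test and a spring post-test. The names (`StudentYear`, `prior_teacher_tail`, and so on) and all the numbers are hypothetical; no real value-added model is this simple.

```python
from dataclasses import dataclass


@dataclass
class StudentYear:
    """Hypothetical score changes between last spring's test and this spring's test."""
    prior_teacher_tail: float    # learning after last spring's test, under last year's teacher
    summer_change: float         # gain (+) or loss (-) over the summer
    current_teacher_gain: float  # learning from a fall pre-test to the spring post-test


def spring_to_spring_growth(s: StudentYear) -> float:
    # What a once-per-year testing schedule observes: all three pieces mixed together.
    return s.prior_teacher_tail + s.summer_change + s.current_teacher_gain


def fall_to_spring_growth(s: StudentYear) -> float:
    # What a same-year pre/post design would observe: only the current year's learning.
    return s.current_teacher_gain


# Two made-up students who learn exactly the same amount with the current teacher
# but have different summers.
student_a = StudentYear(prior_teacher_tail=1.0, summer_change=-4.0, current_teacher_gain=10.0)
student_b = StudentYear(prior_teacher_tail=1.0, summer_change=2.0, current_teacher_gain=10.0)

for name, s in [("A", student_a), ("B", student_b)]:
    print(name, spring_to_spring_growth(s), fall_to_spring_growth(s))
```

Under these made-up numbers the two students look six points apart on the spring-to-spring measure even though the current teacher contributed the same amount to each. Even the fall-to-spring number is not a pure teacher effect, since the teacher is not the only influence during the year.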
The other necessary condition for doing this would be random assignment. If students were randomly assigned to classrooms (and teachers were randomly assigned to classrooms), this would help to make sure that all students are indeed similar at the outset, before what we might term the “treatment” (i.e., how effectively a teacher teaches for X amount of time). However, again, this rarely if ever happens in practice, as administrators and teachers (rightfully) see random assignment practices as great for experimental research purposes but bad for students and their learning! Regardless, some statisticians suggest that their sophisticated controls can “account” for non-random assignment practices. Yet again, evidence suggests that no matter how sophisticated the controls are, they simply do not work here either.
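Why random assignment matters can be sketched with a small simulation. Everything here is an assumption made for illustration: two teachers with exactly the same true effect, students whose own growth tendencies are drawn from a normal distribution, and an unrealistically large number of students so the pattern is easy to see. The function name `simulate_gap` and the labels `"random"` and `"tracked"` are hypothetical. The sketch only shows how non-random assignment can create an apparent difference between identical teachers; it says nothing about whether statistical controls can or cannot remove that difference.

```python
import random
from statistics import mean


def simulate_gap(assignment: str, n_students: int = 2000, seed: int = 0) -> float:
    """Return the difference in mean gains between two equally effective teachers."""
    rng = random.Random(seed)
    true_teacher_effect = 5.0  # assumed identical for both teachers

    # Each student's own growth tendency, unrelated to either teacher.
    propensity = [rng.gauss(0.0, 3.0) for _ in range(n_students)]

    if assignment == "random":
        rng.shuffle(propensity)        # students land in classrooms by chance
    elif assignment == "tracked":
        propensity.sort(reverse=True)  # faster-growing students all go to teacher A
    else:
        raise ValueError(f"unknown assignment: {assignment}")

    half = n_students // 2
    gains_a = [true_teacher_effect + p for p in propensity[:half]]
    gains_b = [true_teacher_effect + p for p in propensity[half:]]
    return mean(gains_a) - mean(gains_b)


random_gap = simulate_gap("random")
tracked_gap = simulate_gap("tracked")
print(f"random assignment gap:  {random_gap:.2f}")
print(f"tracked assignment gap: {tracked_gap:.2f}")
```

With random assignment the gap between the two teachers should hover near zero, while with tracked assignment teacher A appears several points "better" despite being identical to teacher B by construction.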
See, for example, the Hermann et al. (2013), the Newton et al. (2010), and the Rothstein (2009, 2010) citations here, in this blog, under the “VAM Readings” link. These references should (hopefully) explain all of this with greater depth and clarity. I also have an article on this topic coming out this month, co-authored with one of my doctoral students, in a highly esteemed peer-reviewed journal. Here is the reference if you want to keep an eye out for it: Paufler, N. A. & Amrein-Beardsley, A. (2013, October). The random assignment of students into elementary classrooms: Implications for value-added analyses and interpretations. American Educational Research Journal. doi: 10.3102/0002831213508299