As per a recent post about Raj Chetty, the fantabulous study he and his colleagues (i.e., Chetty, Friedman, and Rockoff) published, “Keep on Giving…,” and a recent set of emails exchanged among Raj, Diane, and me about their (implied) Nobel-worthy work in this area, it seems Chetty et al are “et” it “al” again!
Excuse my snarkiness here, as I am still in shock that they took it upon themselves to “take on” the American Statistical Association. Yes, the ASA, as critiqued by Chetty et al. If that’s not a sign of how Chetty et al feel about their work in this area, then I don’t know what is.
Given my close proximity to this debate, I thought it wise to consult a colleague of mine, one who is, in fact, an economist (like Chetty et al) and who is also an expert in this area, to write a response to their newly released “take” on the ASA’s recently released “Statement on Using Value-Added Models for Educational Assessment.” Margarita Pivovarova is also an economist conducting research in this area, and I value her perspective more than that of many other potential reviewers out there, as she is both smart and wise when it comes to the careful use and interpretation of the data and methods at the focus of these papers. Here’s what she wrote:
What an odd set of opponents in the interpretation of the ASA statement, this one apparently being a set of scholars who self-selected to answer a question they seemingly believed others were asking; that is, “What say Chetty et al on this?”
That being said, the overall tone of this response is reminiscent of a “response to the editor,” as if, in a more customary manner, the ASA had reviewed the aforementioned Chetty et al paper and Chetty et al had been invited to respond to the ASA. Interestingly here, though, Chetty et al seemed compelled to take a self-assigned “expert” stance on the ASA’s statement, regardless of the fact that this statement represents the expertise and collective stance of a whole set of highly regarded and highly deserving scholars in this area, who quite honestly would NOT have asked for Chetty et al’s advice on the matter.
This is the ASA, and these are the ASA experts Chetty et al also explicitly marginalize in the reference section of their critique. [Note for yourselves the major lack of alignment between Chetty et al.’s references and the 70 articles linked to here on this website.] This should give others pause as to why Chetty et al are so extremely selective in their choices of relevant literature – the literature on which they rely “to prove” some of their own statements and “to prove” some of the ASA’s statements false or suspect. The largest chunk of the best and most relevant literature [as again linked to here] is completely left out of their “critique.” This also includes the whole set of peer-reviewed articles published in education and education finance journals. A search on ERIC (the Educational Resources Information Center) with “value-added” as the key words for the last 5 years yields 406 entries, and a similar search in JSTOR (a shared digital library) returns 495. This is quite a deep and rich literature to leave out, outright and in its entirety, of Chetty et al’s purportedly more informed and expert take on this statement, ESPECIALLY given that many in the ASA wrote these aforementioned, marginalized articles! And while Chetty et al critique the ASA for not including citations, omitting them is quite customary in a position statement that is meant to be research-based but not cluttered with the hundreds of articles in support of the association’s position.
Another point of contention surrounds ASA Point #5: “VAMs typically measure correlation, not causation: Effects – positive or negative – attributed to a teacher may actually be caused by other factors that are not captured in the model.” I should mention that this was taken from within the ASA statement, interpreted, and then defined by Chetty et al as “ASA Point #5.” Notwithstanding, the answer to “ASA Point #5” guides policy implications, so it is important to consider: Do value-added estimates yield causal interpretations, as opposed to just descriptive information? In 2004, the editors of the American Educational Research Association’s (AERA) Journal of Educational and Behavioral Statistics (JEBS) published a special issue entirely devoted to value-added models (VAMs). One notable publication in that issue was written by Rubin, Stuart, and Zanutto (2004). Any discussion of the causal interpretation of VAMs cannot be devoid of this foundational article and the potential outcomes framework, or Rubin Causal Model, advanced within this piece and elsewhere (see also Holland, 1986). These authors clearly communicated that value-added estimates cannot be considered causal unless a set of “heroic assumptions” is agreed to and imposed. Accordingly, and pertinent here, “anyone familiar with education will realize that this [is]…fairly unrealistic.” Even back then, Rubin et al suggested we switch gears and instead evaluate interventions and reward incentives based on the descriptive qualities of the indicators and estimates derived via VAMs. This is something Chetty et al overlooked here and continue to overlook and neglect in their research.
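To make Rubin et al’s point a bit more concrete, here is a minimal sketch of the potential outcomes framing in generic notation (the notation is ours, for illustration only; it is not Chetty et al’s model or the ASA’s):

```latex
% Illustrative notation only (not Chetty et al.'s or the ASA's specification).
% A typical value-added regression for student i taught by teacher j:
\[
  Y_{ij} \;=\; \alpha \;+\; \lambda\, Y_{i,\,t-1} \;+\; X_i'\beta \;+\; \theta_j \;+\; \varepsilon_{ij}
\]
% Reading the estimated \theta_j as teacher j's causal effect requires, among
% other "heroic assumptions," that students' potential outcomes Y_i(1), ..., Y_i(J)
% be independent of which teacher they were assigned, once the controls are
% taken into account:
\[
  \{\,Y_i(1), \ldots, Y_i(J)\,\} \;\perp\; j(i) \;\bigm|\; \bigl(Y_{i,\,t-1},\, X_i\bigr)
\]
% If students are sorted to teachers on anything the model omits, \theta_j
% mixes the teacher's contribution with those omitted factors.
```

In words: the estimated teacher “effect” can be read causally only if, after conditioning on the controls, students’ potential outcomes have nothing to do with which teacher they were assigned; that is exactly the kind of assumption that fails when students are sorted to teachers non-randomly.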
Another concern surrounds how Chetty et al suggest “we” deal with the interpretation of VAM estimates (what they define as ASA Points #2 and #3). The grander question here is: Does one need to be an expert to understand the “truth” behind the numbers? If we are talking about those who are in charge of making decisions based on these numbers, then the answer is yes, or at the very least they need to be aware of what the numbers do and do not do! While different levels of sophistication are probably required to develop VAMs versus interpret VAM results, interpretation is a science in itself and ultimately requires background knowledge about the model’s design and how the estimates were generated if informed, accurate, and valid inferences are to be drawn. While it is true that using an iPad doesn’t require one to understand the technology behind it, the same cannot be said about the interpretation of this set of statistics upon which people’s lives increasingly depend.
Relatedly, the ASA rightfully states that the standardized test scores used in VAMs are not the only outcomes that should be of interest to policy makers and stakeholders (ASA Point #4). In view of recent findings on the role of cognitive versus non-cognitive skills in students’ future well-being, the current agreement is that test scores might not even be one of the most important outcomes capturing a student’s educated self. Addressing this point, Chetty et al, rather, demonstrate how “interpretation matters.” In Jackson’s (2013) study [not surprisingly cited by Chetty et al, likely because it was published by the National Bureau of Economic Research (NBER) – the same organization that published their now infamous study without it being peer or even internally reviewed way back when], Jackson used value-added techniques to estimate teacher effects on the non-cognitive and long-term outcomes of students. One of his main findings was that teachers who are good at boosting test scores are not always the same teachers who have positive and long-lasting effects on non-cognitive skills acquisition. In fact, value-added to test scores and value-added to non-cognitive outcomes for the same teacher were then, and have since been, shown to be only weakly correlated with one another. Regardless, Chetty et al use the results of standardized test scores to rank teachers and then use those rankings to estimate the effects of “highly effective” teachers on the long-run outcomes of students, REGARDLESS OF THE RESEARCH in which their study should have been situated (excluding the work of Thomas Kane and Eric Hanushek). As noted before, if value-added estimates from standardized test scores cannot be interpreted as causal, then the effects of “high value-added” teachers on college attendance, earnings, and reduced teenage birth rates CANNOT BE CONSIDERED CAUSAL either.
Lastly, the ASA expressed concerns about the sensitivity of value-added estimates to model specifications (ASA Point #6). Likewise, researchers contributing to a now large body of literature have also recently explored this same issue, and collectively they have found that value-added estimates are highly sensitive to the assessments being used, even within the same subject areas (Papay, 2011), and to the different subject areas taught by the same teachers given different student compositions (Loeb & Candelaria, 2012). Echoing those findings, others (also ironically cited by Chetty et al in their discussion to demonstrate the opposite) have found that “even when the correlations between performance estimates generated by different models are quite high for the workforce as a whole, there are still sizable differences in teacher rankings generated by different models that are associated with the composition of students in a teacher’s classroom” (Goldhaber, Walch, & Gabele, 2014). And still others have noted that “models that are less aggressive in controlling for student-background and schooling-environment information systematically assign higher rankings to more-advantaged schools, and to individuals who teach at these schools” (Ehlert, Koedel, Parsons, & Podgursky, 2014).
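For readers who want to see how this kind of specification sensitivity can arise, here is a small, purely hypothetical simulation (made-up numbers and deliberately simplified “models,” not the data or methods of any of the studies cited above): the very same test scores, run through a specification with and then without a student-background control, can produce noticeably different teacher rankings.

```python
# Toy simulation only: invented data and two deliberately simplified "models."
# It illustrates why rankings can shift across specifications when classrooms
# differ in student composition; it is not any cited study's actual method.
import numpy as np

rng = np.random.default_rng(0)
n_teachers, n_students = 50, 30

true_effect = rng.normal(0, 1, n_teachers)        # teachers' "real" contributions
class_advantage = rng.normal(0, 1, n_teachers)    # classroom-level student advantage

classrooms = []
for j in range(n_teachers):
    advantage = class_advantage[j] + rng.normal(0, 1, n_students)
    score = true_effect[j] + 2.0 * advantage + rng.normal(0, 1, n_students)
    classrooms.append((score, advantage))

# Spec A ("less aggressive"): classroom mean score, no background control.
va_spec_a = np.array([score.mean() for score, _ in classrooms])

# Spec B: residualize scores on student advantage, then average by classroom.
all_scores = np.concatenate([score for score, _ in classrooms])
all_adv = np.concatenate([adv for _, adv in classrooms])
slope = np.polyfit(all_adv, all_scores, 1)[0]
va_spec_b = np.array([(score - slope * adv).mean() for score, adv in classrooms])

rank_a = va_spec_a.argsort().argsort()   # 0 = lowest-ranked teacher
rank_b = va_spec_b.argsort().argsort()

print("Rank correlation between specs:",
      round(float(np.corrcoef(rank_a, rank_b)[0, 1]), 2))
flipped = np.sum((rank_a < 13) & (rank_b >= 25)) + np.sum((rank_b < 13) & (rank_a >= 25))
print("Teachers in the bottom quartile under one spec but the top half under the other:",
      int(flipped))
```

In this toy setup both specifications are fit to exactly the same scores; only the treatment of student background differs, which is precisely the crux of the Goldhaber et al and Ehlert et al findings quoted above.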
In sum, these are only a few “points” from this “point-by-point discussion” that would, or rather should, immediately strike anyone fairly familiar with the debate surrounding the use and abuse of VAMs as way off base. While the debates surrounding VAMs continue, it is all too easy for a casual and/or careless observer to lose the big picture and get distracted by technicalities. This certainly seems to be the case here, as Chetty et al continue to make their technical arguments from a million miles away from the actual schools, teachers, and students whose data they study, and from which they make miracles.
*To see VAMboozled!’s initial “take” on the same ASA statement, a take that in our minds still stands as it was not a self-promotional critique but only a straightforward interpretation for our followers, please click here.
Somebody has a rather inflated opinion of himself and his work…..
I think you need a better expert who is an econometrician familiar with VAMs; the one you have is not an expert. More crucially though, I don’t think you need an expert to cast reasonable doubt on any statistical model whose estimates get used for high-stakes testing. Cheers!
While she has not yet published in peer-reviewed journals on this topic, she’s definitely emerging as an expert. We have a few papers going, and she gets the issues more than many others I know, hence my “expert” tag. But point taken nonetheless. Thanks.
Would I be wrong to say that VAMs and SLOs are not, in fact, an explicit part of federal policies for teacher evaluation (NCLB, the Teacher Incentive Fund, RttT)? Who might have this answer?
Found the answer to this question. VAMs were federally supported in a 2006 pilot under Margaret Spellings. SLOs were not; they are a by-product of heavy marketing by Slotnick et al., beginning with the Denver Pay-for-Performance pilot.
All measurement problems aside, I just don’t see the practicality of his plan. Let’s pretend he is actually correct, and that everything he says actually works. Let’s even pretend that VAMs are perfectly accurate, and always identify the bottom five percent of teachers every time, and no good teachers get fired. So far, so good.
Year 1: Fire the bottom 5% of teachers. Check. Replace these teachers… how? Brand new teachers obviously won’t have VAMs to put on their resumes, so they are no good. Get teachers from the next district over? Maybe, but their VAMs are not actually comparable to ours. As it turns out, it is entirely possible that a teacher with the highest VAM in her district could have the lowest VAM in a different district with the exact same scores. Ok, that’s a problem. For the sake of argument, let’s pretend we are clairvoyant, know each incoming teacher’s VAM, and are able to replace the bottom 5% with average teachers. Check.
Life is good now, right? All we need to do is make our principals clairvoyant! Well, actually, not so much.
Year 2: Teach students, test students, calculate VAMs. Wait, what is this? I thought we had gotten rid of the bottom 5%, but it is back! What happened? Well, VAMs are a comparative model. You could have better teachers than any other district in the entire country, and you would still have a bottom 5% (a small simulation below illustrates this). Should you still fire those teachers? Who would you hire to replace them? But ok, let’s continue. Fire the bad teachers. Hire better ones. Check. Check.
Year 3: Repeat. Year 4: Repeat. Year 5: Repeat.
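To make the point concrete, here’s a tiny, purely hypothetical simulation (invented numbers, nobody’s actual data): even if every teacher is well above whatever absolute bar you care about, a “bottom 5%” reappears every single year.

```python
# Hypothetical illustration: a percentile-based "bottom 5%" never goes away,
# no matter how strong the workforce is in absolute terms.
import numpy as np

rng = np.random.default_rng(1)
# Suppose every teacher's "true" value-added is far above an absolute bar of 0.
teachers = rng.normal(loc=2.0, scale=0.5, size=1000)

for year in range(1, 6):
    cutoff = np.percentile(teachers, 5)
    keep = teachers >= cutoff
    print(f"Year {year}: fired {(~keep).sum()} teachers; "
          f"highest fired value-added was {teachers[~keep].max():.2f} (still well above 0)")
    # Replace the fired 5% with "average" hires drawn from the same strong pool.
    teachers = np.concatenate([teachers[keep], rng.normal(2.0, 0.5, (~keep).sum())])
```

Run it and the Years 1 through 5 loop above plays out exactly as described: someone always lands in the bottom 5%, however good everyone is.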
Where will you get this supply of amazing teachers, I wonder? You had better hope that no one retires or leaves the profession, too.
The lack of validity is certainly a problem, but the lack of practicality is substantially worse.