Many of you will recall a post I made public in January including “Houston Lawsuit Update[s], with Summar[ies] of Expert Witnesses’ Findings about the EVAAS” (Education Value-Added Assessment System sponsored by SAS Institute Inc.). What you might not have recognized since, however, was that I pulled the post down a few weeks after I posted it. Here’s the back story.
In January 2016, the Houston Federation of Teachers (HFT) published an “EVAAS Litigation Update,” which summarized a portion of Dr. Jesse Rothstein’s expert report in which he conclude[d], among other things, that teachers do not have the ability to meaningfully verify their EVAAS scores. He wrote that “[a]t most, a teacher could request information about which students were assigned to her, and could read literature — mostly released by SAS, and not the product of an independent investigation — regarding the properties of EVAAS estimates.” On January 10, 2016, I published the post “Houston Lawsuit Update, with Summary of Expert Witnesses’ Findings about the EVAAS,” summarizing what I considered to be the twelve key highlights of HFT’s “EVAAS Litigation Update,” including Rothstein’s conclusions above.
Lawyers representing SAS Institute Inc. charged that this post, along with the more detailed “EVAAS Litigation Update” I summarized within the post (authored by the Houston Federation of Teachers (HFT) to keep their members in Houston up-to-date on the progress of this lawsuit) violated a protective order that was put in place to protect SAS’s EVAAS computer source code. Even though there is/was nothing in the “EVAAS Litigation Update” or the blog post that disclosed the source code, SAS objected to both as disclosing conclusions that, SAS said, could not have been reached in the absence of a review of the source code. They threatened HFT, its lawyers, and its experts (myself and Dr. Rothstein) with monetary sanctions. HFT went to court in order to get the court’s interpretation of the protective order and to see if a Judge agreed with SAS’s position. In the meantime, I removed the prior post (which is now back up here).
The great news is that the Judge found in HFT’s favor. He found that neither the “EVAAS Litigation Update” nor the related blog post violated the protective order. Further, he found that “we” have the right to share other updates on the Houston lawsuit, which is still pending, as long as the updates do not violate the protective order still in place. This includes discussion of the conclusions or findings of experts, provided that the source code is not disclosed, either explicitly or by necessary implication.
In more specific terms, the judge ruled in his Court Order that SAS Institute Inc.’s lawyers “interpret[ed] the protective order too broadly in this instance. Rothstein’s opinion regarding the inability to verify or replicate a teacher’s EVAAS score essentially mimics the allegations of HFT’s complaint. The Litigation Update made clear that Rothstein confirmed this opinion after review of the source code; but it [was] not an opinion ‘that could not have been made in the absence of [his] review’ of the source code. Rothstein [also] testified by affidavit that his opinion is not based on anything he saw in the source code, but on the extremely restrictive access permitted by SAS.” He added that “the overly broad interpretation urged by SAS would inhibit legitimate discussion about the lawsuit, among both the union’s membership and the public at large.” That, also in his words, would be an “unfortunate result” that should, in the future, be avoided.
- Large-scale standardized tests have never been validated for their current uses. In other words, as per my affidavit, “VAM-based information is based upon large-scale achievement tests that have been developed to assess levels of student achievement, but not levels of growth in student achievement over time, and not levels of growth in student achievement over time that can be attributed back to students’ teachers, to capture the teachers’ [purportedly] causal effects on growth in student achievement over time.”
- The EVAAS produces different results from another VAM. When, for this case, Rothstein constructed and ran an alternative, albeit sophisticated, VAM using the same HISD data, he found that the results “yielded quite different rankings and scores.” This should not happen if these models are indeed yielding indicators of truth, or true levels of teacher effectiveness from which valid interpretations and assertions can be made.
- EVAAS scores are highly volatile from one year to the next. Rothstein, when running the actual data, found that while “[a]ll VAMs are volatile…EVAAS growth indexes and effectiveness categorizations are particularly volatile due to the EVAAS model’s failure to adequately account for unaccounted-for variation in classroom achievement.” In addition, volatility is “particularly high in grades 3 and 4, where students have relatively few[er] prior [test] scores available at the time at which the EVAAS scores are first computed.”
- EVAAS overstates the precision of teachers’ estimated impacts on growth. As per Rothstein, “This leads EVAAS to too often indicate that teachers are statistically distinguishable from the average…when a correct calculation would indicate that these teachers are not statistically distinguishable from the average.”
- Teachers of English Language Learners (ELLs) and “highly mobile” students are substantially less likely to demonstrate added value, as per the EVAAS, and likely most/all other VAMs. This, what we term as “bias,” makes it “impossible to know whether this is because ELL teachers [and teachers of highly mobile students] are, in fact, less effective than non-ELL teachers [and teachers of less mobile students] in HISD, or whether it is because the EVAAS VAM is biased against ELL [and these other] teachers.”
- The number of students each teacher teaches (i.e., class size) also biases teachers’ value-added scores. As per Rothstein, “teachers with few linked students—either because they teach small classes or because many of the students in their classes cannot be used for EVAAS calculations—are overwhelmingly [emphasis added] likely to be assigned to the middle effectiveness category under EVAAS (labeled “no detectable difference [from average], and average effectiveness”) than are teachers with more linked students.”
- Ceiling effects are certainly an issue. Rothstein found that in some grades and subjects, “teachers whose students have unusually high prior year scores are very unlikely to earn high EVAAS scores, suggesting that ‘ceiling effects’ in the tests are certainly relevant factors.” While EVAAS and HISD have previously acknowledged such problems with ceiling effects, they apparently believe these effects are being mitigated by the new and improved tests recently adopted throughout the state of Texas. Rothstein, however, found that these effects persist even with the new and improved tests.
- There are major validity issues with “artificial conflation.” This is a term I recently coined to represent what is happening in Houston, and elsewhere (e.g., Tennessee), when district leaders (e.g., superintendents) mandate or force principals and other teacher effectiveness appraisers or evaluators, for example, to align their observational ratings of teachers’ effectiveness with value-added scores, with the latter being the “objective measure” around which all else should revolve, or align; hence, the conflation of the one to match the other, even if entirely invalid. As per my affidavit, “[t]o purposefully and systematically endorse the engineering and distortion of the perceptible ‘subjective’ indicator, using the perceptibly ‘objective’ indicator as a keystone of truth and consequence, is more than arbitrary, capricious, and remiss…not to mention in violation of the educational measurement field’s Standards for Educational and Psychological Testing” (American Educational Research Association (AERA), American Psychological Association (APA), National Council on Measurement in Education (NCME), 2014).
- Teaching-to-the-test is of perpetual concern. Both Rothstein and I, independently, noted concerns about how “VAM ratings reward teachers who teach to the end-of-year test [more than] equally effective teachers who focus their efforts on other forms of learning that may be more important.”
- HISD is not adequately monitoring the EVAAS system. According to HISD, EVAAS modelers keep the details of their model secret, even from HISD, and even though the district is paying an estimated $500K per year for district teachers’ EVAAS estimates. “During litigation, HISD has admitted that it has not performed or paid any contractor to perform any type of verification, analysis, or audit of the EVAAS scores. This violates the technical standards for use of VAM that AERA specifies, which provide that if a school district like HISD is going to use VAM, it is responsible for ‘conducting the ongoing evaluation of both intended and unintended consequences’ and that ‘monitoring should be of sufficient scope and extent to provide evidence to document the technical quality of the VAM application and the validity of its use’” (AERA Statement, 2015).
- EVAAS lacks transparency. AERA emphasizes the importance of transparency with respect to VAM uses. For example, as per the AERA Council who wrote the aforementioned AERA Statement, “when performance levels are established for the purpose of evaluative decisions, the methods used, as well as the classification accuracy, should be documented and reported” (AERA Statement, 2015). However, and in contrast to meeting AERA’s requirements for transparency, in this district and elsewhere, as per my affidavit, the “EVAAS is still more popularly recognized as the ‘black box’ value-added system.”
- Related, teachers lack opportunities to verify their own scores. This part is really interesting. “As part of this litigation, and under a very strict protective order that was negotiated over many months with SAS [i.e., SAS Institute Inc. which markets and delivers its EVAAS system], Dr. Rothstein was allowed to view SAS’ computer program code on a laptop computer in the SAS lawyer’s office in San Francisco, something that certainly no HISD teacher has ever been allowed to do. Even with the access provided to Dr. Rothstein, and even with his expertise and knowledge of value-added modeling, [however] he was still not able to reproduce the EVAAS calculations so that they could be verified.” Dr. Rothstein added, “[t]he complexity and interdependency of EVAAS also presents a barrier to understanding how a teacher’s data translated into her EVAAS score. Each teacher’s EVAAS calculation depends not only on her students, but also on all other students within HISD (and, in some grades and years, on all other students in the state), and is computed using a complex series of programs that are the proprietary business secrets of SAS Incorporated. As part of my efforts to assess the validity of EVAAS as a measure of teacher effectiveness, I attempted to reproduce EVAAS calculations. I was unable to reproduce EVAAS, however, as the information provided by HISD about the EVAAS model was far from sufficient.”
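
Two of the statistical points above (overstated precision, and the pull toward the middle category for teachers with few linked students) can be illustrated with a minimal sketch. This is not the EVAAS model, whose code is proprietary. It assumes a deliberately simple setup in which a teacher's estimate is the average of her students' gains, with normally distributed noise, and a teacher is labeled "no detectable difference" whenever the estimate is within a fixed number of reported standard errors of the average. The function name `prob_middle_category` and all of its parameters (`student_sd`, `threshold`, `reported_se_ratio`) are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist


def prob_middle_category(true_effect, n_students, student_sd=1.0,
                         threshold=2.0, reported_se_ratio=1.0):
    """Probability that a teacher is labeled 'no detectable difference'.

    Illustrative assumptions only (not the EVAAS method):
    - the estimate is a simple mean of n_students noisy student gains;
    - the label is 'middle' when |estimate| < threshold * reported SE;
    - reported_se_ratio < 1 means the reported SE understates the true SE.
    """
    # True sampling variability of a simple class average.
    true_se = student_sd / math.sqrt(n_students)
    # Width of the 'no detectable difference' band, based on the reported SE.
    cutoff = threshold * reported_se_ratio * true_se
    # Sampling distribution of the estimate around the teacher's true effect.
    estimate = NormalDist(mu=true_effect, sigma=true_se)
    return estimate.cdf(cutoff) - estimate.cdf(-cutoff)


# Same true effect, different numbers of linked students.
for n in (5, 20, 80):
    print(n, round(prob_middle_category(0.3, n), 3))

# An exactly average teacher, with correct vs. understated standard errors.
print(round(1 - prob_middle_category(0.0, 25), 3))
print(round(1 - prob_middle_category(0.0, 25, reported_se_ratio=0.5), 3))
```

Under these assumptions, a teacher whose true effect is 0.3 student-level standard deviations is more likely to land in the middle category with 5 linked students than with 80. Likewise, if the reported standard error is half of what it should be, the share of exactly average teachers flagged as different from average rises from roughly 5% to roughly 32%.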