Learning from What Doesn’t Work in Teacher Evaluation

One of my doctoral students — Kevin Close — and I just had a study published in the practitioner journal Phi Delta Kappan that I wanted to share out with all of you, especially before the study is no longer open-access or free (see full study as currently available here). As the title indicates, the study is about how states, school districts, and schools can “Learn from What Doesn’t Work in Teacher Evaluation,” given an analysis that the two of us conducted of all documents pertaining to the four teacher evaluation and value-added model (VAM)-centered lawsuits in which I have been directly involved, and that I have also covered in this blog. These lawsuits include Lederman v. King in New York (see here), American Federation of Teachers et al. v. Public Education Department in New Mexico (see here), Houston Federation of Teachers v. Houston Independent School District in Texas (see here), and Trout v. Knox County Board of Education in Tennessee (see here).

Via this analysis we set out to comb through the legal documents to identify the strongest objections, as also recognized by the courts in these lawsuits, to VAMs as teacher measurement and accountability strategies. “The lessons to be learned from these cases are both important and timely” given that “[u]nder the Every Student Succeeds Act (ESSA), local education leaders once again have authority to decide for themselves how to assess teachers’ work.”

The most pertinent and also common issues as per these cases were as follows:

(1) Inconsistencies in teachers’ VAM-based estimates from one year to the next that are sometimes “wildly different.” Across these lawsuits, issues with reliability were very evident, in that teachers classified as “effective” one year were either theorized or demonstrated to have around a 25%-59% chance of being classified as “ineffective” the next year, or vice versa, with other permutations also possible. As per our profession’s Standards for Educational and Psychological Testing, reliability should instead be evident, with VAM estimates of teacher effectiveness more or less consistent over time, from one year to the next, regardless of the type of students and perhaps subject areas that teachers teach.

(2) Bias in teachers’ VAM-based estimates was also of note, whereby documents suggested or evidenced that bias, or rather biased estimates of teachers’ actual effects, does indeed exist (although this area was also of most contention and dispute). Specific to VAMs, since teachers are not randomly assigned the students they teach, whether their students are invariably more or less motivated, smart, knowledgeable, or capable can bias students’ test-based data, and teachers’ test-based data when aggregated. Court documents, although again not without counterarguments, suggested that VAM-based estimates are sometimes biased, especially when relatively homogeneous sets of students (e.g., English Language Learners (ELLs), gifted and special education students, free-or-reduced lunch eligible students) are non-randomly concentrated into schools, purposefully placed into classrooms, or both. Research suggests that this also sometimes happens regardless of the sophistication of the statistical controls used to block said bias.

(3) The gaming mechanisms in play within teacher evaluation systems in which VAMs play a key role, or carry significant evaluative weight, were also of legal concern and dispute. That administrators sometimes inflate the observational ratings of the teachers whom they want to protect, while simultaneously offsetting the weight the VAMs sometimes carry, was of note, as was the inverse. That administrators also sometimes lower teachers’ ratings to better align them with their “more objective” VAM counterparts was also at issue. “So argued the plaintiffs in the Houston and Tennessee lawsuits, for example. In those systems, school leaders appear to have given precedence to VAM scores, adjusting their classroom observations to match them. In both cases, administrators admitted to doing so, explaining that they sensed pressure to ensure that their ‘subjective’ classroom ratings were in sync with the VAM’s ‘objective’ scores.” Both sets of behavior distort the validity (or “truthfulness”) of any teacher evaluation system and are in violation of the same, aforementioned Standards for Educational and Psychological Testing that call for VAM scores and observation ratings to be kept separate. One indicator should never be adjusted to offset or to fit the other.

(4) Transparency, or the lack thereof, was also a common issue across cases. Transparency, which can be defined as the extent to which something is accessible and readily capable of being understood, pertains to whether VAM-based estimates are accessible and make sense to those at the receiving ends. “Not only should [teachers] have access to [their VAM-based] information for instructional purposes, but if they believe their evaluations to be unfair, they should be able to see all of the relevant data and calculations so that they can defend themselves.” In no case was this more legally pertinent than in Houston Federation of Teachers v. Houston Independent School District in Texas. Here, the presiding judge ruled that teachers did have “legitimate claims to see how their scores were calculated. Concealing this information, the judge ruled, violated teachers’ due process protections under the 14th Amendment (which holds that no state — or in this case organization — shall deprive any person of life, liberty, or property, without due process). Given this precedent, it seems likely that teachers in other states and districts will demand transparency as well.”

In the main article (here) we also discuss what states are now doing to (hopefully) improve upon their teacher evaluation systems in terms of using multiple measures to help to evaluate teachers more holistically. We emphasize the (in)formative versus the summative and high-stakes functions of such systems, and allowing teachers to take ownership over such systems in their development and implementation. I will leave you all to read the full article (here) for these details.

In sum, though, when rethinking states’ teacher evaluation systems, especially given the new liberties afforded to states via the Every Student Succeeds Act (ESSA), educators, education leaders, policymakers, and the like would do well to look to the past for guidance on what not to do — and what to do better. These legal cases can certainly inform such efforts.

Reference: Close, K., & Amrein-Beardsley, A. (2018). Learning from what doesn’t work in teacher evaluation. Phi Delta Kappan, 100(1), 15-19. Retrieved from http://www.kappanonline.org/learning-from-what-doesnt-work-in-teacher-evaluation/

New Mexico Loses Major Education Finance Lawsuit (with Rulings Related to Teacher Evaluation System)

Followers of this blog should be familiar with the ongoing teacher evaluation lawsuit in New Mexico. The lawsuit — American Federation of Teachers – New Mexico and the Albuquerque Federation of Teachers (Plaintiffs) v. New Mexico Public Education Department (Defendants) — is being heard by a state judge who ruled in 2015 that all consequences attached to teacher-level value-added model (VAM) scores (e.g., flagging the files of teachers with low VAM scores) were to be suspended throughout the state until the state (and/or others external to the state) could prove to the state court that the system was reliable, valid, fair, uniform, and the like. This case is set to be heard in court again this November (see more about this case from my most recent update here).

While this lawsuit has been occurring, however, it is important to note that two other very important New Mexico cases (that have since been consolidated into one) have been ongoing since around the same time (2014) — Martinez v. State of New Mexico and Yazzie v. State of New Mexico. Plaintiffs in this lawsuit, filed by the New Mexico Center on Law and Poverty and the Mexican American Legal Defense and Education Fund (MALDEF), argued that the state’s schools are inadequately funded; hence, the state is also denying New Mexico students their constitutional rights to an adequate education.

Last Friday, a different state judge presiding over this case ruled, “in a blistering, landmark decision,” that New Mexico is in fact “violating the constitutional rights of at-risk students by failing to provide them with a sufficient education.” As such, the state, its governor, and its public education department (PED) are “to establish a funding system that meets constitutional requirements by April 15 [of] next year” (see full article here).

As this case does indeed pertain to the above mentioned teacher evaluation lawsuit of interest within this blog, it is also important to note that the judge:

  • “[R]ejected arguments by [Governor] Susana Martinez’s administration that the education system is improving…[and]…that the state was doing the best with what it had” (see here).
  • Emphasized that “New Mexico children [continue to] rank at the very bottom in the country for educational achievement” (see here).
  • Added that “New Mexico doesn’t have enough teachers…[and]…New Mexico teachers are among the lowest paid in the country” (see here).
  • “[S]uggested the state teacher evaluation system ‘may be contributing to the lower quality of teachers in high-need schools…[also given]…punitive teacher evaluation systems that penalize teachers for working in high-need schools contribute to problem in this category of schools’” (see here).
  • And concluded that all of “the programs being lauded by PED are not changing this [bleak] picture” (see here) and, more specifically, “offered a scathing assessment of the ways in which New Mexico has failed its children,” again, taking “particular aim at the state’s punitive teacher evaluation system” (see here).

Apparently, the state plans to appeal the decision (see a related article here).

Fired “Ineffective” Teacher Wins Battle with DC Public Schools

In November of 2013, I published a blog post about a “working paper” released by the National Bureau of Economic Research (NBER) and written by authors Thomas Dee – Economics and Educational Policy Professor at Stanford, and James Wyckoff – Economics and Educational Policy Professor at the University of Virginia. In the study titled “Incentives, Selection, and Teacher Performance: Evidence from IMPACT,” Dee and Wyckoff (2013) analyzed the controversial IMPACT educator evaluation system that was put into place in Washington DC Public Schools (DCPS) under the then Chancellor, Michelle Rhee. In this paper, Dee and Wyckoff (2013) presented what they termed to be “novel evidence” to suggest that the “uniquely high-powered incentives” linked to “teacher performance” via DC’s IMPACT initiative worked to improve the performance of high-performing teachers, and that dismissal threats worked to increase the voluntary attrition of low-performing teachers, as well as improve the performance of the students of the teachers who replaced them.

I critiqued this study in full (see both short and long versions of this critique here), ultimately asserting that the study had “fatal flaws” which compromised the exaggerated claims Dee and Wyckoff (2013) advanced. This past January (2017) they published another report, titled “Teacher Turnover, Teacher Quality, and Student Achievement in DCPS,” which was also (prematurely) released as a “working paper” by the same NBER. I also critiqued this study (see here).

Anyhow, a public interest story that should be of interest to followers of this blog was published two days ago in The Washington Post. The article, “‘I’ve Been a Hostage for Nine Years’: Fired Teacher Wins Battle with D.C. Schools,” details the last nine years of one fired, now 53-year-old, veteran teacher after he became one of nearly 1,000 educators fired during the tenure of Michelle Rhee. He was fired after district “leaders,” using the IMPACT system and a teacher evaluation system prior, deemed him “ineffective.” He “contested his dismissal, arguing that he was wrongly fired and that the city was punishing him for being a union activist and for publicly criticizing the school system.” That he made a significant salary at the time (2009) also likely had something to do with it in terms of cost-savings, although this is more peripherally discussed in this piece.

In short, “an arbitrator [just] ruled in favor of the fired teacher, a decision that could entitle him to hundreds of thousands of dollars in back pay and the opportunity to be a District teacher again” although, perhaps not surprisingly, he might not take them up on that offer. As well, apparently this teacher “isn’t the only one fighting to get his job back. Other educators who were fired years ago and allege unjust dismissals [as per the IMPACT system] are waiting for their cases to be settled.” The school system can appeal this ruling.

New Mexico Teacher Evaluation Lawsuit Updates

In December of 2015 in New Mexico, via a preliminary injunction set forth by state District Judge David K. Thomson, all consequences attached to teacher-level value-added model (VAM) scores (e.g., flagging the files of teachers with low VAM scores) were suspended throughout the state until the state (and/or others external to the state) could prove to the state court that the system was reliable, valid, fair, uniform, and the like. The trial during which this evidence is to be presented by the state is currently set for this October. See more information about this ruling here.

As the expert witness for the plaintiffs in this case, I was deposed a few weeks ago here in Phoenix, given my analyses of the state’s data (supported by one of my PhD students – Tray Geiger). In short, we found and I testified during the deposition that:

  • In terms of uniformity and fairness, there seem to be 70% or so of New Mexico teachers who are ineligible to be assessed using VAMs, and this proportion held constant across the years of data analyzed. This is even more important to note knowing that when VAM-based data are to be used to make consequential decisions about teachers, issues with fairness and uniformity become even more important given accountability-eligible teachers are also those who are relatively more likely to realize the negative or reap the positive consequences attached to VAM-based estimates.
  • In terms of reliability (or the consistency of teachers’ VAM-based scores over time), approximately 40% of teachers differed by one quintile (quintiles are derived when a sample or population is divided into fifths) and approximately 28% of teachers differed, from year-to-year, by two or more quintiles in terms of their VAM-derived effectiveness ratings. These results make sense when New Mexico’s results are situated within the current literature, in which teachers classified as “effective” one year can have a 25%-59% chance of being classified as “ineffective” the next, or vice versa, with other permutations also possible.
  • In terms of validity (i.e., concurrent related evidence of validity), and importantly as also situated within the current literature, the correlations between New Mexico teachers’ VAM-based and observational scores ranged from r = 0.153 to r = 0.210. Not only are these correlations very weak[1], they are also very weak as appropriately situated within the literature, via which it is evidenced that correlations between multiple VAMs and observational scores typically range from 0.30 ≤ r ≤ 0.50.
  • In terms of bias, New Mexico’s Caucasian teachers had significantly higher observation scores than non-Caucasian teachers implying, also as per the current research, that Caucasian teachers may be (falsely) perceived as being better teachers than non-Caucasian teachers given bias within these instruments and/or bias of the scorers observing and scoring teachers using these instruments in practice. See prior posts about observational-based bias here, here and here.
  • Also of note in terms of bias was that: (1) teachers with fewer years of experience yielded VAM scores that were significantly lower than teachers with more years of experience, with similar patterns noted across teachers’ observation scores, which could all mean, as also in line with common sense as well as the research, that teachers with more experience are typically better teachers; (2) teachers who taught English language learners (ELLs) or special education students had lower VAM scores across the board than those who did not teach such students; (3) teachers who taught gifted students had significantly higher VAM scores than teachers who did not teach gifted students, which runs counter to the current research evidencing that teachers’ gifted students oft-thwart or prevent them from demonstrating growth given ceiling effects; (4) teachers in schools with lower relative proportions of ELLs, special education students, students eligible for free-or-reduced lunches, and students from racial minority backgrounds, as well as higher relative proportions of gifted students, consistently had significantly higher VAM scores. These results suggest that teachers in these schools are as a group better, and/or that VAM-based estimates might be biased against teachers not teaching in these schools, preventing them from demonstrating comparable growth.
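For readers who want to see what the reliability figures above are measuring, here is a minimal sketch of how year-to-year quintile shifts could be tallied. It is not the procedure used in the affidavit; the function names, the rank-based quintile assignment, and the handling of ties (by list order) are all assumptions made for illustration, and no real New Mexico data are included.

```python
from collections import Counter


def assign_quintiles(scores):
    """Rank-based quintiles (an assumed rule): 1 = lowest fifth, 5 = highest fifth."""
    n = len(scores)
    # Indices of teachers ordered from lowest to highest score; ties keep list order.
    order = sorted(range(n), key=lambda i: scores[i])
    quintiles = [0] * n
    for rank, idx in enumerate(order):
        quintiles[idx] = rank * 5 // n + 1
    return quintiles


def quintile_shift_shares(year1, year2):
    """Share of teachers whose quintile moved by 0, 1, or 2+ between two years.

    year1 and year2 are hypothetical lists of VAM scores for the same
    teachers, listed in the same order.
    """
    if len(year1) != len(year2):
        raise ValueError("Both years must cover the same teachers, in the same order.")
    q1, q2 = assign_quintiles(year1), assign_quintiles(year2)
    # Cap the shift at 2 so that every move of two or more quintiles is pooled.
    counts = Counter(min(abs(a - b), 2) for a, b in zip(q1, q2))
    n = len(year1)
    return {
        "same": counts[0] / n,
        "one": counts[1] / n,
        "two_or_more": counts[2] / n,
    }
```

Under these assumptions, a perfectly stable set of scores would put every teacher in the "same" share, while the figures reported above would correspond to roughly 0.40 under "one" and 0.28 under "two_or_more".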

To read more about the data and methods used, as well as other findings, please see my affidavit submitted to the court attached here: Affidavit Feb2018.

In terms of another recent update, I should also note that a few weeks ago, as per an article in the Albuquerque Journal, New Mexico’s teacher evaluation system is now likely to be overhauled, or simply “expired,” as early as 2019. In short, “all three Democrats running for governor and the lone Republican candidate…have expressed misgivings about using students’ standardized test scores to evaluate the effectiveness of [New Mexico’s] teachers, a key component of the current system [at issue in this lawsuit and] imposed by the administration of outgoing Gov. Susana Martinez.” All four candidates described the current system “as fundamentally flawed and said they would move quickly to overhaul it.”

While I/we will proceed with our efforts pertaining to this lawsuit until further notice, this is also important to note at this time in that it seems that New Mexico’s incoming policymakers are going to be much wiser than those of late, at least in these regards.

[1] Interpreting r: 0.8 ≤ r ≤ 1.0 = a very strong correlation; 0.6 ≤ r ≤ 0.8 = a strong correlation; 0.4 ≤ r ≤ 0.6 = a moderate correlation; 0.2 ≤ r ≤ 0.4 = a weak correlation; and 0.0 ≤ r ≤ 0.2 = a very weak correlation, if any at all.
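The bands in this footnote can be written down as a small lookup, sketched here for illustration only. Two things are assumptions rather than part of the footnote: the footnote's bands share their endpoints (for example, 0.2 appears in two bands), so this sketch assigns a boundary value to the stronger band, and it applies the bands to the absolute value of r so that negative correlations are labeled by their size.

```python
def interpret_r(r):
    """Label a correlation coefficient using the bands in footnote [1].

    Assumptions (not stated in the footnote): boundary values such as 0.2
    go to the stronger band, and negative values are labeled by magnitude.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("A correlation coefficient must lie between -1 and 1.")
    size = abs(r)
    if size >= 0.8:
        return "very strong"
    if size >= 0.6:
        return "strong"
    if size >= 0.4:
        return "moderate"
    if size >= 0.2:
        return "weak"
    return "very weak"
```

For example, interpret_r(0.153) returns "very weak", and interpret_r(0.50), the upper end of the range typically reported in the literature, returns "moderate".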

 

New Mexico’s Motion for Summary Judgment, Following Houston’s Precedent-Setting Ruling

Recall that in New Mexico, just over two years ago, all consequences attached to teacher-level value-added model (VAM) scores (e.g., flagging the files of teachers with low VAM scores) were suspended throughout the state until the state (and/or others external to the state) could prove to the state court that the system was reliable, valid, fair, uniform, and the like. The trial during which this evidence was to be presented by the state was repeatedly postponed since, yet with teacher-level consequences prohibited all the while. See more information about this ruling here.

Recall as well that in Houston, just this past May, a district judge ruled that Houston Independent School District (HISD) teachers who had VAM scores (as based on the Education Value-Added Assessment System (EVAAS)) had legitimate claims regarding how EVAAS use in HISD was a violation of their Fourteenth Amendment due process protections (i.e., no state or in this case organization shall deprive any person of life, liberty, or property, without due process). More specifically, in what turned out to be a huge and unprecedented victory, the judge ruled that because HISD teachers “ha[d] no meaningful way to ensure correct calculation of their EVAAS scores,” they were, as a result, “unfairly subject to mistaken deprivation of constitutionally protected property interests in their jobs.” This ruling ultimately led the district to end the use of the EVAAS for teacher termination throughout Houston. See more information about this ruling here.

Just this past week, the plaintiffs in New Mexico charged that the Houston ruling regarding Houston teachers’ Fourteenth Amendment due process protections also applies to teachers throughout the state of New Mexico.

As per an article titled “Motion For Summary Judgment Filed In New Mexico Teacher Evaluation Lawsuit,” the American Federation of Teachers and Albuquerque Teachers Federation filed a “motion for summary judgment in the litigation in our continuing effort to make teacher evaluations beneficial and accurate in New Mexico.” They, too, are “seeking a determination that the [state’s] failure to provide teachers with adequate information about the calculation of their VAM scores violated their procedural due process rights.”

“The evidence demonstrates that neither school administrators nor educators have been provided with sufficient information to replicate the [New Mexico] VAM score calculations used as a basis for teacher evaluations. The VAM algorithm is complex, and the general overview provided in the NMTeach Technical Guide is not enough to pass constitutional muster. During previous hearings, educators testified they do not receive an explanation at the time they receive their annual evaluation, and teachers have been subjected to performance growth plans based on low VAM scores, without being given any guidance or explanation as to how to raise that score on future evaluations. Thus, not only do educators not understand the algorithm used to derive the VAM score that is now part of the basis for their overall evaluation rating, but school administrators within the districts do not have sufficient information on how the score is derived in order to replicate it or to provide professional development, whether as part of a disciplinary scenario or otherwise, to assist teachers in raising their VAM score.”

For more information about this update, please click here.

Breaking News: The End of Value-Added Measures for Teacher Termination in Houston

Recall from multiple prior posts (see, for example, here, here, here, here, and here) that a set of teachers in the Houston Independent School District (HISD), with the support of the Houston Federation of Teachers (HFT) and the American Federation of Teachers (AFT), took their district to federal court to fight against the (mis)use of their value-added scores derived via the Education Value-Added Assessment System (EVAAS) — the “original” value-added model (VAM) developed in Tennessee by William L. Sanders who just recently passed away (see here). Teachers’ EVAAS scores, in short, were being used to evaluate teachers in Houston in more consequential ways than any other district or state in the nation (e.g., the termination of 221 teachers in one year as based, primarily, on their EVAAS scores).

The case — Houston Federation of Teachers et al. v. Houston ISD — was filed in 2014 and just one day ago (October 10, 2017) came the case’s final federal suit settlement. Click here to read the “Settlement and Full and Final Release Agreement.” But in short, this means the “End of Value-Added Measures for Teacher Termination in Houston” (see also here).

More specifically, recall that the judge notably ruled prior (in May of 2017) that the plaintiffs did have sufficient evidence to proceed to trial on their claims that the use of EVAAS in Houston to terminate their contracts was a violation of their Fourteenth Amendment due process protections (i.e., no state or in this case district shall deprive any person of life, liberty, or property, without due process). That is, the judge ruled that “any effort by teachers to replicate their own scores, with the limited information available to them, [would] necessarily fail” (see here p. 13). This was confirmed by one of the plaintiffs’ expert witnesses who was also “unable to replicate the scores despite being given far greater access to the underlying computer codes than [was] available to an individual teacher” (see here p. 13).

Hence, and “[a]ccording to the unrebutted testimony of [the] plaintiffs’ expert [witness], without access to SAS’s proprietary information – the value-added equations, computer source codes, decision rules, and assumptions – EVAAS scores will remain a mysterious ‘black box,’ impervious to challenge” (see here p. 17). Consequently, the judge concluded that HISD teachers “have no meaningful way to ensure correct calculation of their EVAAS scores, and as a result are unfairly subject to mistaken deprivation of constitutionally protected property interests in their jobs” (see here p. 18).

Thereafter, and as per this settlement, HISD agreed to refrain from using VAMs, including the EVAAS, to terminate teachers’ contracts as long as the VAM score is “unverifiable.” More specifically, “HISD agree[d] it will not in the future use value-added scores, including but not limited to EVAAS scores, as a basis to terminate the employment of a term or probationary contract teacher during the term of that teacher’s contract, or to terminate a continuing contract teacher at any time, so long as the value-added score assigned to the teacher remains unverifiable” (see here p. 2; see also here). HISD also agreed to create an “instructional consultation subcommittee” to more inclusively and democratically inform HISD’s teacher appraisal systems and processes, and HISD agreed to pay the Texas AFT $237,000 in its attorney and other legal fees and expenses (State of Texas, 2017, p. 2; see also AFT, 2017).

This is yet another big win for teachers in Houston, and potentially elsewhere, as this ruling is an unprecedented development in VAM litigation. Teachers and others using the EVAAS or another VAM for that matter (e.g., that is also “unverifiable”) do take note, at minimum.

“Virginia SGP” Overruled

You might recall from a post I released approximately 1.5 years ago a story about how a person who self-identifies as “Virginia SGP,” who is also now known as Brian Davison, a parent of two public school students in the affluent Loudoun, Virginia area (hereafter referred to as Virginia SGP), sued the state of Virginia in an attempt to force the release of teachers’ student growth percentile (SGP) data for all teachers across the state.

More specifically, Virginia SGP “pressed for the data’s release because he thinks parents have a right to know how their children’s teachers are performing, information about public employees that exists but has so far been hidden. He also want[ed] to expose what he sa[id was] Virginia’s broken promise to begin [to use] the data to evaluate how effective the state’s teachers are.” The “teacher data should be out there,” especially if taxpayers are paying for it.

In January of 2016, a Richmond, Virginia judge ruled in Virginia SGP’s favor. The following April, a Richmond Circuit Court judge ruled that the Virginia Department of Education was to also release Loudoun County Public Schools’ SGP scores by school and by teacher, including teachers’ identifying information. Accordingly, the judge noted that the department of education and the Loudoun school system failed to “meet the burden of proof to establish an exemption” under Virginia’s Freedom of Information Act [FOIA] preventing the release of teachers’ identifiable information (i.e., beyond teachers’ SGP data). The court also ordered VDOE to pay Davison $35,000 to cover his attorney fees and other costs.

As per an article published last week, the Virginia Supreme Court overruled this former ruling, noting that the department of education did not have to provide teachers’ identifiable information along with teachers’ SGP data, after all.

See more details in the actual article here, but ultimately the Virginia Supreme Court concluded that the Richmond Circuit Court “erred in ordering the production of these documents containing teachers’ identifiable information.” The court added that “it was [an] error for the circuit court to order that the School Board share in [Virginia SGP’s] attorney’s fees and costs,” pushing that decision (i.e., the decision regarding how much to pay, if anything at all, in legal fees) back down to the circuit court.

Virginia SGP plans to ask for a rehearing of this ruling. See also his comments on this ruling here.

The New York Times on “The Little Known Statistician” Who Passed

As many of you may recall, I wrote a post last March about the passing of William L. Sanders at age 74. Sanders developed the Education Value-Added Assessment System (EVAAS) — the value-added model (VAM) on which I have conducted most of my research (see, for example, here and here) and the VAM at the core of most of the teacher evaluation lawsuits in which I have been (or still am) engaged (see here, here, and here).

Over the weekend, though, The New York Times released a similar piece about Sanders’s passing, titled “The Little-Known Statistician Who Taught Us to Measure Teachers.” Because I had multiple colleagues and blog followers email me (or email me about) this article, I thought I would share it out with all of you, with some additional comments, of course, but also given the comments I already made in my prior post here.

First, I will start by saying that the title of this article is misleading in that what this “little-known” statistician contributed to the field of education was hardly “little” in terms of its size and impact. Rather, Sanders and his associates at SAS Institute Inc. greatly influenced our nation in terms of the last decade of our nation’s educational policies, as largely bent on high-stakes teacher accountability for educational reform. This occurred in large part due to Sanders’s (and others’) lobbying efforts when the federal government ultimately chose to incentivize and de facto require that all states hold their teachers accountable for their value-added, or lack thereof, while attaching high-stakes consequences (e.g., teacher termination) to teachers’ value-added estimates. This, of course, was to ensure educational reform. This occurred at the federal level, as we all likely know, primarily via Race to the Top and the No Child Left Behind Waivers essentially forced upon states when states had to adopt VAMs (or growth models) to also reform their teachers, and subsequently their schools, in order to continue to receive the federal funds upon which all states still rely.

It should be noted, though, that we as a nation have been relying upon similar high-stakes educational policies since the late 1970s (i.e., for now over 35 years); however, we have literally no research evidence that these high-stakes accountability policies have yielded any of their intended effects, as still perpetually conceptualized (see, for example, Nevada’s recent legislative ruling here) and as still advanced via large- and small-scale educational policies (e.g., we are still A Nation At Risk in terms of our global competitiveness). Yet, we continue to rely on the logic in support of such “carrot and stick” educational policies, even with this last decade’s teacher- versus student-level “spin.” We as a nation could really not be more ahistorical in terms of our educational policies in this regard.

Regardless, Sanders contributed to all of this at the federal level (that also trickled down to the state level) while also actively selling his VAM to state governments as well as local school districts (i.e., including the Houston Independent School District in which teacher plaintiffs just won a recent court ruling against the Sanders value-added system here), and Sanders did this using sets of (seriously) false marketing claims (e.g., purchasing and using the EVAAS will help “clear [a] path to achieving the US goal of leading the world in college completion by the year 2020”). To see two empirical articles about the claims made to sell Sanders’s EVAAS system, the research non-existent in support of each of the claims, and the realities of those at the receiving ends of this system (i.e., teachers) as per their experiences with each of the claims, see here and here.

Hence, to assert that what this “little known” statistician contributed to education was trivial or inconsequential is entirely false. Thankfully, with the passage of the Every Student Succeeds Act (ESSA) the federal government came around, in at least some ways. While not yet acknowledging how holding teachers accountable for their students’ test scores, while ideal, simply does not work (see the “Top Ten” reasons why this does not work here), at least the federal government has given back to the states the authority to devise, hopefully, some more research-informed educational policies in these regards (I know….).

Nonetheless, may he rest in peace (see also here), perhaps also knowing that his forever stance of “[making] no apologies for the fact that his methods were too complex for most of the teachers whose jobs depended on them to understand,” just landed his EVAAS in serious jeopardy in court in Houston (see here) given this stance was just ruled as contributing to the violation of teachers’ Fourteenth Amendment rights (i.e., no state or in this case organization shall deprive any person of life, liberty, or property, without due process [emphasis added]).

Breaking News: Another Big Victory in Court in Texas

Earlier today I released a post regarding “A Big Victory in Court in Houston,” in which I wrote about how, yesterday, US Magistrate Judge Smith ruled — in Houston Federation of Teachers et al. v. Houston Independent School District — that Houston teacher plaintiffs have legitimate claims regarding how their Education Value-Added Assessment System (EVAAS) value-added scores, as used (and abused) in HISD, were a violation of their Fourteenth Amendment due process protections (i.e., no state or in this case organization shall deprive any person of life, liberty, or property, without due process). Hence, on this charge, this case is officially going to trial.

Well, also yesterday, “we” won another court case on which I also served as an expert witness (I served as an expert witness on behalf of the plaintiffs alongside Jesse Rothstein in the court case noted above). As per this case — Texas State Teachers Association v. Texas Education Agency, Mike Morath in his Official Capacity as Commissioner of Education for the State of Texas (although there were three similar cases also filed – see all four referenced below) — The Honorable Lora J. Livingston ruled that the Defendants are to make revisions to 19 Tex. Admin. Code § 150.1001 that most notably include the removal of (A) student learning objectives [SLOs], (B) student portfolios, (C) pre and post test results on district level assessments; or (D) value added data based on student state assessment results. In addition, “The rules do not restrict additional factors a school district may consider…,” and “Under the local appraisal system, there [will be] no required weighting for each measure…,” although districts can choose to weight whatever measures they might choose. “Districts can also adopt an appraisal system that does not provide a single, overall summative rating.” That is, increased local control.

If the Texas Education Agency (TEA) does not adopt the regulations put forth by the court by next October, this case will continue. This does not look likely, however, in that as per a news article released today, here, Texas “Commissioner of Education Mike Morath…agreed to revise the [state’s] rules in exchange for the four [below] teacher groups’ suspending their legal challenges.” As noted prior, the terms of this settlement call for the removal of the above-mentioned, state-required, four growth measures when evaluating teachers.

This was also highlighted in a news article, released yesterday, here, with this one more generally about how teachers throughout Texas will no longer be evaluated using their students’ test scores, again, as required by the state.

At the crux of this case, as also highlighted in this particular piece, and to which I testified (quite extensively), was that the value-added measures formerly required/suggested by the state did not constitute teachers’ “observable,” job-related behaviors. See also a prior post about this case here.

*****

Cases Contributing to this Ruling:

1. Texas State Teachers Association v. Texas Education Agency, Mike Morath, in his Official Capacity as Commissioner of Education for the State of Texas; in the 345th Judicial District Court, Travis County, Texas

2. Texas Classroom Teachers Association v. Mike Morath, Texas Commissioner of Education; in the 419th Judicial District Court, Travis County, Texas

3. Texas American Federation of Teachers v. Mike Morath, Commissioner of Education, in his official capacity, and Texas Education Agency; in the 201st Judicial District Court, Travis County, Texas

4. Association of Texas Professional Educators v. Mike Morath, the Commissioner of Education and the Texas Education Agency; in the 200th District Court of Travis County, Texas.

Breaking News: A Big Victory in Court in Houston

Recall from multiple prior posts (see here, here, here, and here) that a set of teachers in the Houston Independent School District (HISD), with the support of the Houston Federation of Teachers (HFT) and the American Federation of Teachers (AFT), took their district to federal court to fight against the (mis)use of their value-added scores, derived via the Education Value-Added Assessment System (EVAAS) — the “original” value-added model (VAM) developed in Tennessee by William L. Sanders who just recently passed away (see here). Teachers’ EVAAS scores, in short, were being used to evaluate teachers in Houston in more consequential ways than anywhere else in the nation (e.g., the termination of 221 teachers in just one year as based, primarily, on their EVAAS scores).

The case — Houston Federation of Teachers et al. v. Houston ISD — was filed in 2014 and just yesterday, United States Magistrate Judge Stephen Wm. Smith denied in the United States District Court, Southern District of Texas, the district’s request for summary judgment given the plaintiffs’ due process claims. Put differently, Judge Smith ruled that the plaintiffs did have legitimate claims regarding how EVAAS use in HISD was a violation of their Fourteenth Amendment due process protections (i.e., no state or in this case organization shall deprive any person of life, liberty, or property, without due process). Hence, on this charge, this case is officially going to trial.

This is a huge and unprecedented victory, and one that will likely set precedent, trial pending, for others, and more specifically for other teachers.

Of primary issue will be the following (as taken from Judge Smith’s Summary Judgment released yesterday): “Plaintiffs [will continue to] challenge the use of EVAAS under various aspects of the Fourteenth Amendment, including: (1) procedural due process, due to lack of sufficient information to meaningfully challenge terminations based on low EVAAS scores,” and given “due process is designed to foster government decision-making that is both fair and accurate.”

Related, and of most importance, as also taken directly from Judge Smith’s Summary, he wrote:

  • HISD’s value-added appraisal system poses a realistic threat to deprive plaintiffs of constitutionally protected property interests in employment.
  • HISD does not itself calculate the EVAAS score for any of its teachers. Instead, that task is delegated to its third party vendor, SAS. The scores are generated by complex algorithms, employing “sophisticated software and many layers of calculations.” SAS treats these algorithms and software as trade secrets, refusing to divulge them to either HISD or the teachers themselves. HISD has admitted that it does not itself verify or audit the EVAAS scores received from SAS, nor does it engage any contractor to do so. HISD further concedes that any effort by teachers to replicate their own scores, with the limited information available to them, will necessarily fail. This has been confirmed by plaintiffs’ expert, who was unable to replicate the scores despite being given far greater access to the underlying computer codes than is available to an individual teacher [emphasis added, as also related to a prior post about how SAS claimed that plaintiffs violated SAS’s protective order (protecting its trade secrets), that the court overruled, see here].
  • The EVAAS score might be erroneously calculated for any number of reasons, ranging from data-entry mistakes to glitches in the computer code itself. Algorithms are human creations, and subject to error like any other human endeavor. HISD has acknowledged that mistakes can occur in calculating a teacher’s EVAAS score; moreover, even when a mistake is found in a particular teacher’s score, it will not be promptly corrected. As HISD candidly explained in response to a frequently asked question, “Why can’t my value-added analysis be recalculated?”:
    • Once completed, any re-analysis can only occur at the system level. What this means is that if we change information for one teacher, we would have to re-run the analysis for the entire district, which has two effects: one, this would be very costly for the district, as the analysis itself would have to be paid for again; and two, this re-analysis has the potential to change all other teachers’ reports.
  • The remarkable thing about this passage is not simply that cost considerations trump accuracy in teacher evaluations, troubling as that might be. Of greater concern is the house-of-cards fragility of the EVAAS system, where the wrong score of a single teacher could alter the scores of every other teacher in the district. This interconnectivity means that the accuracy of one score hinges upon the accuracy of all. Thus, without access to data supporting all teacher scores, any teacher facing discharge for a low value-added score will necessarily be unable to verify that her own score is error-free.
  • HISD’s own discovery responses and witnesses concede that an HISD teacher is unable to verify or replicate his EVAAS score based on the limited information provided by HISD.
  • According to the unrebutted testimony of plaintiffs’ expert, without access to SAS’s proprietary information – the value-added equations, computer source codes, decision rules, and assumptions – EVAAS scores will remain a mysterious “black box,” impervious to challenge.
  • While conceding that a teacher’s EVAAS score cannot be independently verified, HISD argues that the Constitution does not require the ability to replicate EVAAS scores “down to the last decimal point.” But EVAAS scores are calculated to the second decimal place, so an error as small as one hundredth of a point could spell the difference between a positive or negative EVAAS effectiveness rating, with serious consequences for the affected teacher.

Hence, “When a public agency adopts a policy of making high stakes employment decisions based on secret algorithms incompatible with minimum due process, the proper remedy is to overturn the policy.”

Moreover, he wrote that all of this is part of the violation of teachers’ Fourteenth Amendment rights. Hence, he also wrote, “On this summary judgment record, HISD teachers have no meaningful way to ensure correct calculation of their EVAAS scores, and as a result are unfairly subject to mistaken deprivation of constitutionally protected property interests in their jobs.”

Otherwise, Judge Smith granted summary judgment to the district on the other claims forwarded by the plaintiffs, including plaintiffs’ equal protection claims. All of us involved in the case — recall that Jesse Rothstein and I served as the expert witnesses on behalf of the plaintiffs, and Thomas Kane of the Measures of Effective Teaching (MET) Project and John Friedman of the infamous Chetty et al. studies (see here and here) served as the expert witnesses on behalf of the defendants — knew that all of the plaintiffs’ claims would be tough to win given all of the constitutional legal standards would be difficult for plaintiffs to satisfy (e.g., that evaluating teachers using their value-added scores was not “unreasonable” was difficult to prove, as it was in the Tennessee case we also fought and was then dismissed on similar grounds (see here)).

Nonetheless, that “we” survived on the due process claim is fantastic, especially as this is the first case like this of which we are aware across the country.

Here is the press release, released last night by the AFT:

May 4, 2017 – AFT, Houston Federation of Teachers Hail Court Ruling on Flawed Evaluation System

Statements by American Federation of Teachers President Randi Weingarten and Houston Federation of Teachers President Zeph Capo on U.S. District Court decision on Houston’s Evaluation Value-Added Assessment System (EVAAS), known elsewhere as VAM or value-added measures:

AFT President Randi Weingarten: “Houston developed an incomprehensible, unfair and secret algorithm to evaluate teachers that had no rational meaning. This is the algebraic formula: [formula not legible in the source text]

“U.S. Magistrate Judge Stephen Smith saw that it was seriously flawed and posed a threat to teachers’ employment rights; he rejected it. This is a huge victory for Houston teachers, their students and educators’ deeply held contention that VAM is a sham.

“The judge said teachers had no way to ensure that EVAAS was correctly calculating their performance score, nor was there a way to promptly correct a mistake. Judge Smith added that the proper remedy is to overturn the policy; we wholeheartedly agree. Teaching must be about helping kids develop the skills and knowledge they need to be prepared for college, career and life—not be about focusing on test scores for punitive purposes.”

HFT President Zeph Capo: “With this decision, Houston should wipe clean the record of every teacher who was negatively evaluated. From here on, teacher evaluation systems should be developed with educators to ensure that they are fair, transparent and help inform instruction, not be used as a punitive tool.”