Alleged Violation of Protective Order in Houston Lawsuit, Overruled

Many of you will recall a post I made public in January including “Houston Lawsuit Update[s], with Summar[ies] of Expert Witnesses’ Findings about the EVAAS” (Education Value-Added Assessment System sponsored by SAS Institute Inc.). What you might not have noticed since, however, is that I pulled the post down a few weeks after I posted it. Here’s the back story.

In January 2016, the Houston Federation of Teachers (HFT) published an “EVAAS Litigation Update,” which summarized a portion of Dr. Jesse Rothstein’s expert report in which he concluded, among other things, that teachers do not have the ability to meaningfully verify their EVAAS scores. He wrote that “[a]t most, a teacher could request information about which students were assigned to her, and could read literature — mostly released by SAS, and not the product of an independent investigation — regarding the properties of EVAAS estimates.” On January 10, 2016, I published the post “Houston Lawsuit Update, with Summary of Expert Witnesses’ Findings about the EVAAS,” summarizing what I considered to be the twelve key highlights of HFT’s “EVAAS Litigation Update,” in which I highlighted Rothstein’s above conclusions.

Lawyers representing SAS Institute Inc. charged that this post, along with the more detailed “EVAAS Litigation Update” I summarized within the post (authored by the Houston Federation of Teachers (HFT) to keep their members in Houston up-to-date on the progress of this lawsuit) violated a protective order that was put in place to protect SAS’s EVAAS computer source code. Even though there is/was nothing in the “EVAAS Litigation Update” or the blog post that disclosed the source code, SAS objected to both as disclosing conclusions that, SAS said, could not have been reached in the absence of a review of the source code. They threatened HFT, its lawyers, and its experts (myself and Dr. Rothstein) with monetary sanctions. HFT went to court in order to get the court’s interpretation of the protective order and to see if a Judge agreed with SAS’s position. In the meantime, I removed the prior post (which is now back up here).

The great news is that the Judge found in HFT’s favor. He found that neither the “EVAAS Litigation Update” nor the related blog post violated the protective order. Further, he found that “we” have the right to share other updates on the Houston lawsuit, which is still pending, as long as the updates do not violate the protective order still in place. This includes discussion of the conclusions or findings of experts, provided that the source code is not disclosed, either explicitly or by necessary implication.

In more specific terms, as per his ruling in his Court Order, the judge ruled that SAS Institute Inc.’s lawyers “interpret[ed] the protective order too broadly in this instance. Rothstein’s opinion regarding the inability to verify or replicate a teacher’s EVAAS score essentially mimics the allegations of HFT’s complaint. The Litigation Update made clear that Rothstein confirmed this opinion after review of the source code; but it [was] not an opinion ‘that could not have been made in the absence of [his] review’ of the source code. Rothstein [also] testified by affidavit that his opinion is not based on anything he saw in the source code, but on the extremely restrictive access permitted by SAS.” He added that “the overly broad interpretation urged by SAS would inhibit legitimate discussion about the lawsuit, among both the union’s membership and the public at large.” That, also in his words, would be an “unfortunate result” that should, in the future, be avoided.

Here, again, are the 12 key highlights of the EVAAS Litigation Update:
  • Large-scale standardized tests have never been validated for their current uses. In other words, as per my affidavit, “VAM-based information is based upon large-scale achievement tests that have been developed to assess levels of student achievement, but not levels of growth in student achievement over time, and not levels of growth in student achievement over time that can be attributed back to students’ teachers, to capture the teachers’ [purportedly] causal effects on growth in student achievement over time.”
  • The EVAAS produces different results from another VAM. When, for this case, Rothstein constructed and ran an alternative, albeit sophisticated VAM using data from HISD both times, he found that results “yielded quite different rankings and scores.” This should not happen if these models are indeed yielding indicators of truth, or true levels of teacher effectiveness from which valid interpretations and assertions can be made.
  • EVAAS scores are highly volatile from one year to the next. Rothstein, when running the actual data, found that while “[a]ll VAMs are volatile…EVAAS growth indexes and effectiveness categorizations are particularly volatile due to the EVAAS model’s failure to adequately account for unaccounted-for variation in classroom achievement.” In addition, volatility is “particularly high in grades 3 and 4, where students have relatively few[er] prior [test] scores available at the time at which the EVAAS scores are first computed.”
  • EVAAS overstates the precision of teachers’ estimated impacts on growth. As per Rothstein, “This leads EVAAS to too often indicate that teachers are statistically distinguishable from the average…when a correct calculation would indicate that these teachers are not statistically distinguishable from the average.”
  • Teachers of English Language Learners (ELLs) and “highly mobile” students are substantially less likely to demonstrate added value, as per the EVAAS, and likely most/all other VAMs. This, what we term as “bias,” makes it “impossible to know whether this is because ELL teachers [and teachers of highly mobile students] are, in fact, less effective than non-ELL teachers [and teachers of less mobile students] in HISD, or whether it is because the EVAAS VAM is biased against ELL [and these other] teachers.”
  • The number of students each teacher teaches (i.e., class size) also biases teachers’ value-added scores. As per Rothstein, “teachers with few linked students—either because they teach small classes or because many of the students in their classes cannot be used for EVAAS calculations—are overwhelmingly [emphasis added] likely to be assigned to the middle effectiveness category under EVAAS (labeled “no detectable difference [from average], and average effectiveness”) than are teachers with more linked students.”
  • Ceiling effects are certainly an issue. Rothstein found that in some grades and subjects, “teachers whose students have unusually high prior year scores are very unlikely to earn high EVAAS scores, suggesting that ‘ceiling effects’ in the tests are certainly relevant factors.” While EVAAS and HISD have previously acknowledged such problems with ceiling effects, they apparently believe these effects are being mediated with the new and improved tests recently adopted throughout the state of Texas. Rothstein, however, found that these effects persist even with the new and improved tests.
  • There are major validity issues with “artificial conflation.” This is a term I recently coined to represent what is happening in Houston, and elsewhere (e.g., Tennessee), when district leaders (e.g., superintendents) mandate or force principals and other teacher effectiveness appraisers or evaluators, for example, to align their observational ratings of teachers’ effectiveness with value-added scores, with the latter being the “objective measure” around which all else should revolve, or align; hence, the conflation of the one to match the other, even if entirely invalid. As per my affidavit, “[t]o purposefully and systematically endorse the engineering and distortion of the perceptible ‘subjective’ indicator, using the perceptibly ‘objective’ indicator as a keystone of truth and consequence, is more than arbitrary, capricious, and remiss…not to mention in violation of the educational measurement field’s Standards for Educational and Psychological Testing” (American Educational Research Association (AERA), American Psychological Association (APA), National Council on Measurement in Education (NCME), 2014).
  • Teaching-to-the-test is of perpetual concern. Both Rothstein and I, independently, noted concerns about how “VAM ratings reward teachers who teach to the end-of-year test [more than] equally effective teachers who focus their efforts on other forms of learning that may be more important.”
  • HISD is not adequately monitoring the EVAAS system. According to HISD, EVAAS modelers keep the details of their model secret, even from them and even though they are paying an estimated $500K per year for district teachers’ EVAAS estimates. “During litigation, HISD has admitted that it has not performed or paid any contractor to perform any type of verification, analysis, or audit of the EVAAS scores. This violates the technical standards for use of VAM that AERA specifies, which provide that if a school district like HISD is going to use VAM, it is responsible for ‘conducting the ongoing evaluation of both intended and unintended consequences’ and that ‘monitoring should be of sufficient scope and extent to provide evidence to document the technical quality of the VAM application and the validity of its use’” (AERA Statement, 2015).
  • EVAAS lacks transparency. AERA emphasizes the importance of transparency with respect to VAM uses. For example, as per the AERA Council who wrote the aforementioned AERA Statement, “when performance levels are established for the purpose of evaluative decisions, the methods used, as well as the classification accuracy, should be documented and reported” (AERA Statement, 2015). However, and in contrast to meeting AERA’s requirements for transparency, in this district and elsewhere, as per my affidavit, the “EVAAS is still more popularly recognized as the ‘black box’ value-added system.”
  • Related, teachers lack opportunities to verify their own scores. This part is really interesting. “As part of this litigation, and under a very strict protective order that was negotiated over many months with SAS [i.e., SAS Institute Inc. which markets and delivers its EVAAS system], Dr. Rothstein was allowed to view SAS’ computer program code on a laptop computer in the SAS lawyer’s office in San Francisco, something that certainly no HISD teacher has ever been allowed to do. Even with the access provided to Dr. Rothstein, and even with his expertise and knowledge of value-added modeling, [however] he was still not able to reproduce the EVAAS calculations so that they could be verified.” Dr. Rothstein added, “[t]he complexity and interdependency of EVAAS also presents a barrier to understanding how a teacher’s data translated into her EVAAS score. Each teacher’s EVAAS calculation depends not only on her students, but also on all other students within HISD (and, in some grades and years, on all other students in the state), and is computed using a complex series of programs that are the proprietary business secrets of SAS Incorporated. As part of my efforts to assess the validity of EVAAS as a measure of teacher effectiveness, I attempted to reproduce EVAAS calculations. I was unable to reproduce EVAAS, however, as the information provided by HISD about the EVAAS model was far from sufficient.”
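
To make two of the statistical highlights above a bit more concrete (the one about overstated precision and the one about teachers with few linked students), here is a minimal sketch in Python. To be clear, this is not the EVAAS model, whose code is proprietary. It is a deliberately simplified illustration that assumes a teacher is labeled by dividing an estimated average gain by its standard error and comparing that index to a cutoff. The function name `classify`, the cutoff of 2.0, and all of the numbers are hypothetical.

```python
import math


def classify(mean_gain, student_sd, n_students, threshold=2.0):
    """Label a teacher using a hypothetical index = estimate / standard error.

    This is an illustrative assumption, not the EVAAS calculation.
    """
    # The standard error of an average shrinks as the number of students grows.
    standard_error = student_sd / math.sqrt(n_students)
    index = mean_gain / standard_error
    if index >= threshold:
        return "above average"
    if index <= -threshold:
        return "below average"
    return "no detectable difference"


# Same average gain (3 points) and same student-level spread (10 points),
# but different numbers of linked students.
for n in (10, 25, 60, 120):
    print(n, classify(3.0, 10.0, n))

# If the spread is understated (7 instead of 10), the standard error is too
# small, and the same 25-student teacher now looks distinguishable from average.
print(25, classify(3.0, 7.0, 25))
```

Under these assumptions, the teachers with 10 and 25 linked students land in the middle category while the teachers with 60 and 120 do not, even though all four have the same average gain. The last line shows the other problem: understate the uncertainty, and a teacher who should be "not statistically distinguishable from the average" gets flagged anyway.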

New Mexico’s Teacher Evaluation Trial Postponed Until October, w/Preliminary Injunction Still in Place

Last December in New Mexico, a Judge granted a preliminary injunction preventing consequences from being attached to the state’s teacher evaluation data as based on the state’s value-added model (VAM). More specifically, Judge David K. Thomson ruled that the state can proceed with “developing” and “improving” its teacher evaluation system, but the state is not to make any consequential decisions about New Mexico’s teachers using the data the state collects until the state (and/or others external to the state) can evidence to the court during another trial (which was set for April of 2016) that the system is reliable, valid, fair, uniform, and the like. See more details regarding Judge Thomson’s ruling in a previous post here: “Consequences Attached to VAMs Suspended Throughout New Mexico.” See more details about this specific lawsuit, sponsored by the American Federation of Teachers (AFT) New Mexico and the Albuquerque Teachers Federation (ATF), in a previous post here: “Lawsuit in New Mexico Challenging [the] State’s Teacher Evaluation System.” This is one of the cases on which I am continuing to serve as an expert witness.

Yesterday, however, and given another state-level lawsuit that is also ongoing regarding the state’s teacher evaluation system, although this one is sponsored by the National Education Association (NEA), Judge Thomson (apparently along with Judge Francis Mathew) pushed both the AFT-NM/ATF and NEA trials back to October of 2016, yielding a six-month delay for the AFT-NM/ATF hearing.

According to an article published this morning in the Santa Fe New Mexican, “To date, the [New Mexico] Public Education Department [PED] has been unsuccessful in its efforts to stop either suit or combine them;” hence, yesterday in court the state requested that the court postpone both hearings so that the state could introduce its new teacher evaluation system, on March 15 of 2016, along with its specifics and rules, as also based on the state’s new Partnership for the Assessment of Readiness for College and Careers (PARCC) test data. Recall that the state’s Secretary of Education – Hanna “Skandera is new chair of PARCC test board.” It is also anticipated, however, that the state’s new system is to still “rely heavily” (i.e., 50% weight) on VAMs. See also a related post about “New Mexico Chang[ing] its Teacher Evaluation System, But Not Really.”

This window of time is also to allow for the public forums needed to review the state’s new system, but also to allow time for “the acrimony to be resolved without trials.” The preliminary injunction granted by Judge Thomson in December, though, still remains in place. See also a related article, also published this morning, in the Albuquerque Journal.

Stephanie Ly, president of the AFT-NM, said she is not happy with the trial being postponed. She called this a “stalling tactic” to give the [state] education department more time to compile student achievement data that the plaintiffs have been requesting. “We had no option but to agree because they are withholding data,” she said.

Ly and ATF President Ellen Bernstein also responded yesterday via a joint statement, pasted in full below:

March 7, 2016

Contact: John Dyrcz — 505-554-8679

“The Public Education Department and Secretary Skandera have once again willfully delayed the AFT NM/ATF lawsuit against the current value added model [VAM] evaluation system due to their purposeful refusal to reveal the data being used to evaluate our educators in New Mexico.

“In addition to this stall tactic, and during a status hearing this morning in the First District Court, lawyers for the PED revealed that new rules and regulations were to be unveiled on March 15 by the PED, and would ‘rely heavily’ on VAM as a method of evaluation for educators.

“New Mexico educators will not cease in our fight against the abusive policies of this administration. Allowing PED or districts to terminate employees based on VAM and student test scores is completely unacceptable, it is unacceptable to allow PED or districts to refuse licensure advancement based upon VAM scores, and it is unacceptable for PED or districts to place New Mexico educators on growth plans based on faulty data.

“High-performing education systems have policies in place which respect and support their educators and use evaluations not as punitive measures but as opportunities for improvement. Educators, unions, and administrators should oversee the evaluation process to ensure it is thorough and of high quality, as well as fair and reliable. Educators, unions, and administrators should be involved in developing, implementing and monitoring the system to ensure it reflects good teaching well, that it operates effectively, that it is tied to useful learning opportunities for teachers, and that it produces valid results.

“It is well known the PED is in a current state of crisis with several high-level staff members abandoning the Department, an on-going whistle-blower lawsuit…the failure to produce meaningful changes to education in New Mexico during her six years as Secretary, and Skandera’s constant changes to the rules is a desperate attempt to right a sinking ship,” said Ly and Bernstein.

Vergara v. California Appeal Underway: The Case that Will Yield No Winners

In June of 2014, defendants in “Vergara v. California” in Los Angeles, California lost the case. Plaintiffs included nine public school students (backed by some serious corporate reformer funds as per Students Matter) who challenged five California state statutes that supported the state’s “ironclad [teacher] tenure system.” The plaintiffs’ argument was that students’ rights to a good education were being violated by teachers’ job protections…protections that were making it too difficult to fire “grossly ineffective” teachers. The plaintiffs’ suggested replacement for the “old” way of doing this, of course, was to use value-added scores to make “better” decisions about which teachers to fire and whom to keep around, as based on teachers’ causal impacts on students’ “data.”

This week, this case is being appealed, back in Los Angeles (see a recent Education Week article on the appeal here; see also the Students Matter website for daily appeal updates here). This, accordingly, is a very important case to watch, especially as many agree that this case will eventually end up in no less than the state’s Supreme Court.

On this note, though, I came across a great article, also in Education Week, this morning, capturing as per the article’s title, the “Five Reasons Vergara Is Still Unwinnable.” I already tweeted this one out, but for those of you not following us on Twitter, I didn’t want you to miss this one.

The author — Charles Taylor Kerchner, Research Professor at Claremont Graduate University — puts the key pieces of the case in context as well as under a fair and appropriate light, more specifically explaining why “this is a case that the plaintiffs can’t win and the defendants will lose regardless of the outcome.” This, in other words and as per his opinion, is a case that will ultimately yield no winners.

Do read Kerchner’s full Education Week piece here, and share out as you see fit. I’ve also copied/pasted the text below (e.g., for those of you who follow via email).

*****

As the trial court arguments concluded in the spring of 2014, one of the first ‘On California’ posts argued that, “from our perspective this is a case that the plaintiffs can’t win and the defendants will lose regardless of the outcome.”  It still is.

Oral arguments on its appeal began last week, a decision is due in 90 days, and an appeal to the state Supreme Court is considered a near certainty.  Just in case you haven’t been listening to the well-oiled noise machine surrounding the case, EdWeek’s Stephen Sawchuk provides a backgrounder.

Teacher Labor Market Realities

First of all, the plaintiffs can’t win this case because they don’t understand—or willfully ignore—the realities of the teacher labor market.  The underlying problem in the supply and demand for teachers is not that young very good teachers were being fired while old sluggish ones held on to their jobs.  As the recent data on teacher shortages shows, the problem is attracting good people to teaching in the first place and holding onto them.  Most young teachers who teach in challenging schools leave because the work is too hard, not because they were laid off. 

If the plaintiffs really want to increase the quality of the teacher work force, then they should put their money behind efforts to forgive student loans or provide residency programs for novice teachers so that they are not dissuaded by the shock of stepping into a classroom without a solid grounding in the practicalities of teaching.

Value Added Testing

Second, accepting Vergara equates to accepting value added testing as a valid means of assessing teacher performance.  Value added testing began as an attempt to substitute achievement gains for the more socially biased “league table” ranking of schools.  Its early advocates used the technique to demonstrate the influence that a good teacher has on a student’s long-term academic progress and economic life chances.  The economists that argued for the Vergara plaintiffs made much of this reasoning.

Unfortunately, value added systems are usually terrible when they are put in place. The “value” in value-added are nearly always scores on state standardized tests.  Some of these tests are not very good indicators.  For example, nearly all the state tests used by Vergara plaintiffs have been replaced by measures more aligned with the Common Core of state standards.

Most of the tests are only given in a few grades in a few subjects.  Teachers in other grades and subjects get a composite score based on how well the whole school or an entire grade performed, a score that has little to do with that teacher’s value added.

It’s nonsense to use such gross statistical artifacts as the means to dismiss a teacher, or to reward one.  (A Tennessee case featured a teacher who was denied a bonus because his value added scores didn’t make the cut.  He taught largely advanced students, who were not required to take the state tests, and thus his entire value added score rested on one class.)

Disparate Impact

Third, the case accepts the constitutional principle of “disparate impact.”  This evidentiary argument has its origins in housing discrimination cases where it has been held that a law or practice, such as a bank’s lending policy, need not be discriminatory on its face if its impact was unfairly felt. 

If one accepts that people of color are generally discriminated against, and that poor people of color are absolutely discriminated against, then any rule or regulation within the education system is vulnerable to a disparate impact challenge.  Any form of teacher tenure?  Licenses to teach?  A pension system that encourages older teachers to stay instead of making way for young, enthusiastic ones?  School district boundaries?  Civil service protections?  Because all these exist in an inherently discriminatory environment, they would all be vulnerable if Vergara were upheld.

Rich People and Simplistic Solutions

Fourth, Vergara points rich people toward simplistic solutions.  Venture philanthropy is built around the assumption that people with wealth can use their money to disrupt institutions rather than support existing ones.  Students Matter, which is bankrolling the Vergara lawsuit, is a good example. 

It tinkers with three relatively inconsequential aspects of teacher quality while ignoring the much more fundamental changes in teaching and learning that need to take place in order to create a 21st Century education system.

At least as a thought experiment, people with money ought to be required to specify where they are headed.  If public monopoly, which every high performing school system in the world uses to deliver education, is bad, then specify the alternative.  Hiding behind empty phrases such as “grossly incompetent teachers,” derived from a statistical analysis of state test scores, is no substitute for the hard intellectual work of designing a novel education system.

I’m with the so-called reformers in the belief that the education system put in place more than a century ago needs transformation, but certainly those who want to change it should be required to come up with something better than increasing the amount of time it takes to get tenure by 12 months.

Buying Bullets for Your Opponents

Fifth, Vergara has created yet another instance in which the California Teachers Association and the California Federation of Teachers can inflict damage on themselves.  I hope they prevail in this appeal.  They should.  But in winning, they lose.  They will continue to be a target of opportunity by Republicans and an object of scorn among school reformers. 

They have utterly failed to seize the opportunity for policy leadership presented by the lawsuit and the unprecedented but transitory political support they currently enjoy in Sacramento.

Rather than build on strength, a siege mentality has overtaken union leaders, as in “they’re all around us.”  If that’s the case, you’d think that the unions would quit supplying their opponents with ammunition.

I hope the appellate justices overturn Vergara, but regardless, the case will yield no winners.

 

Tennessee’s Trout/Taylor Value-Added Lawsuit Dismissed

As you may recall, one of 15 important lawsuits pertaining to teacher value-added estimates across the nation (Florida n=2, Louisiana n=1, Nevada n=1, New Mexico n=4, New York n=3, Tennessee n=3, and Texas n=1 – see more information here) was situated in Knox County, Tennessee.

Filed in February of 2015, with legal support provided by the Tennessee Education Association (TEA), Knox County teachers Lisa Trout and Mark Taylor charged that they were denied monetary bonuses after their Tennessee Value-Added Assessment System (TVAAS — the original Education Value-Added Assessment System (EVAAS)) teacher-level value-added scores were miscalculated. This lawsuit was also to contest the reasonableness, rationality, and arbitrariness of the TVAAS system, as per its intended and actual uses in this case, but also in Tennessee writ large. On this case, Jesse Rothstein (University of California – Berkeley) and I were serving as the Plaintiffs’ expert witnesses.

Unfortunately, however, last week (February 17, 2016) the Plaintiffs’ team received a Court order written by U.S. District Judge Harry S. Mattice Jr. dismissing their claims. While the Court had substantial questions about the reliability and validity of the TVAAS, the Court determined that the State satisfied the very low threshold of the “rational basis test” at issue. I should note here, however, that all of the evidence that the lawyers for the Plaintiffs collected via their “extensive discovery,” including the affidavits both Jesse and I submitted on Plaintiffs’ behalf, was unfortunately not considered in Judge Mattice’s ruling on the motion to dismiss. This, perhaps, makes sense given some of the assertions made by the Court, forthcoming.

Ultimately, the Court found that the TVAAS-based, teacher-level value-added policy at issue was “rationally related to a legitimate government interest.” As per the Court order itself, Judge Mattice wrote that “While the court expresses no opinion as to whether the Tennessee Legislature has enacted sound public policy, it finds that the use of TVAAS as a means to measure teacher efficacy survives minimal constitutional scrutiny. If this policy proves to be unworkable in practice, plaintiffs are not to be vindicated by judicial intervention but rather by democratic process.”

Otherwise, as per an article in the Knoxville News Sentinel, Judge Mattice was “not unsympathetic to the teachers’ claims,” for example, given the TVAAS measures “student growth — not teacher performance — using an algorithm that is not fail proof.” He conversely noted, however, in the Court order that the “TVAAS algorithms have been validated for their accuracy in measuring a teacher’s effect on student growth,” even if minimal. He also wrote that the test scores used in the TVAAS (and other models) “need not be validated for measuring teacher effectiveness merely because they are used as an input in a validated statistical model that measures teacher effectiveness.” This is, unfortunately, untrue. Nonetheless, he continued to write that even though the rational basis test “might be a blunt tool, a rational policymaker could conclude that TVAAS is ‘capable of measuring some marginal impact that teachers can have on their own students…[and t]his is all the Constitution requires.”

In the end, Judge Mattice concluded in the Court order that, overall, “It bears repeating that Plaintiff’s concerns about the statistical imprecision of TVAAS are not unfounded. In addressing Plaintiffs’ constitutional claims, however, the Court’s role is extremely limited. The judiciary is not empowered to second-guess the wisdom of the Tennessee legislature’s approach to solving the problems facing public education, but rather must determine whether the policy at issue is rationally related to a legitimate government interest.”

It is too early to know whether the Plaintiffs’ team will appeal, although Judge Mattice dismissed the federal constitutional claims within the lawsuit “with prejudice.” As per an article in the Knoxville News Sentinel, this means that “it cannot be resurrected with new facts or legal claims or in another court. His decision can be appealed, though, to the 6th Circuit U.S. Court of Appeals.”

New York Teacher Sheri Lederman’s Lawsuit Update

Recall the New York lawsuit pertaining to Long Island teacher Sheri Lederman, the teacher who by all accounts other than her recent (2013-2014) 1 out of 20 growth score is a terrific 4th grade, 18-year veteran teacher? She, along with her attorney and husband Bruce Lederman, is suing the state of New York to challenge the state’s growth-based teacher evaluation system. See prior posts about Sheri’s case here, here, and here. I, along with Linda Darling-Hammond (Stanford), Aaron Pallas (Columbia University Teachers College), Carol Burris (Executive Director of the Network for Public Education Foundation), Brad Lindell (Long Island Research Consultant), Sean Corcoran (New York University) and Jesse Rothstein (University of California – Berkeley) are serving as part of Sheri’s team.

Bruce Lederman just emailed me with an update, and some links re: this update (below), and he gave me permission to share all of this with you.

The judge hearing this case recently asked the lawyers on both sides of Sheri’s case to brief the court by the end of this month (February 29, 2016) on a new issue, positioned and pushed back into the court by the New York State Education Department (NYSED). The issue to be heard pertains to the state’s new “moratorium” or “emergency regulations” related to the state’s high-stakes use of its growth scores, all of which is likely related to the political reaction to the opt-out movement throughout the state of New York, the publicity pertaining to the Lederman lawsuit in and of itself, and the federal government’s adoption of the recent Every Student Succeeds Act (ESSA) given its specific provision that now permits states to decide whether (and if so how) to use teachers’ students’ test scores to hold teachers accountable for their levels of growth (in New York) or value-added.

While the federal government did not abolish such practices via its ESSA, the federal government did hand back to the states all power and authority over this matter. Accordingly, this does not mean growth models/VAMs are going to simply disappear, as states do still have the power and authority to move forward with their prior and/or their new teacher evaluation systems, based in small or large part, on growth models/VAMs. As also quite evident since President Obama’s signing of the ESSA, some states are continuing to move forward in this regard, and regardless of the ESSA, in some cases at even higher speeds than before, in support of what some state policymakers still apparently believe (despite the research) are the accountability measures that will still help them to (symbolically) support educational reform in their states. See, for example, prior posts about the state of Alabama, here, New Mexico, here, and Texas, here, which is still moving forward with its plans introduced pre-ESSA. See prior posts about New York here, here, and here, the state in which also just one year ago Governor Cuomo was promoting increased use of New York’s growth model and publicly proclaiming that it was “baloney” that more teachers were not being found “ineffective,” after which Cuomo pushed through the New York budget process amendments to the law increasing the weight of teachers’ growth scores to an approximate 50% weight in many cases.

Nonetheless, as per this case in New York, state Attorney General Eric Schneiderman, on behalf of the NYSED, offered to settle this lawsuit out of court by giving Sheri some accommodation on her aforementioned 2013-2014 score of 1 out of 20, if Sheri and Bruce dropped the challenge to the state’s VAM-based teacher evaluation system. Sheri and Bruce declined, for a number of reasons, including that under the state’s recent “moratorium,” the state’s growth model is still set to be used throughout the state of New York for the next four years, with teachers’ annual performance reviews based in part on growth scores reported to parents, newspapers (on an aggregate basis), and the like. While, again, high stakes are not to be attached to the growth output for four years, the scores will still “count.”

Hence, Sheri and Bruce believe that because they have already “convincingly” shown that the state’s growth model does not “rationally” work for teacher evaluation purposes, and that teacher evaluations as based on the state’s growth model actually violate state law since teachers like Sheri are not capable of getting perfect scores (which is “irrational”), they will continue with this case, also on behalf of New York teachers and principals who are “demoralized” by the system, as well as New York taxpayers who are paying millions (“if not tens of millions of dollars”) for the system’s (highly) unreliable and inaccurate results.

As per Bruce’s email: “Spending the next 4 years studying a broken system is a terrible idea and terrible waste of taxpayer $$s. Also, if [NYSED] recognizes that Sheri’s 2013-14 score of 1 out of 20 is wrong [which they apparently recognize given their offer to settle this suit out of court], it’s sad and frustrating that [NYSED] still wants to fight her score unless she drops her challenge to the evaluation system in general.”

“We believe our case is already responsible for the new administrative appeal process in NY, and also partly responsible for Governor Cuomo’s apparent reversal on his stand about teacher evaluations. However, at this point we will not settle and allow important issues to be brushed under the carpet. Sheri and I are committed to pressing ahead with our case.”

To read more about this case via a Politico New York article click here (registration required). To hear more from Bruce Lederman about this case via WCNY-TV, Syracuse, click here. The pertinent section of this interview starts at 22:00 minutes and ends at 36:21. It’s well worth listening!

New Mexico to Change its Teacher Evaluation System, But Not Really

As you all likely recall, the American Federation of Teachers (AFT), joined by the Albuquerque Teachers Federation (ATF), last year, filed a “Lawsuit in New Mexico Challenging [the] State’s Teacher Evaluation System.” In December 2015, state District Judge David K. Thomson granted a preliminary injunction preventing consequences from being attached to the state’s teacher evaluation data. More specifically, Judge Thomson ruled that the state can proceed with “developing” and “improving” its teacher evaluation system, but the state is not to make any consequential decisions about New Mexico’s teachers using the data the state collects until the state (and/or others external to the state) can evidence to the court that the system is reliable, valid, fair, uniform, and the like (see prior post on this ruling here).

Late Friday afternoon, New Mexico’s Public Education Department (PED) announced that they are accordingly changing their NMTEACH teacher evaluation system, and they will be issuing new regulations. Their primary goal is as follows: To (1) “Address major liabilities resulting from litigation” as these liabilities specifically pertain to the former NMTEACH system’s (a) Uniformity, (b) Transparency, and (c) Clarity. On the surface level, this is gratifying to the extent that the state is attempting to, at least theoretically, please the court. But we, and especially those in New Mexico, might refrain from celebrating too soon…given that when one reads the PED announcement here, one will see this is yet another example of the state’s futile attempts to keep with a very top-down teacher evaluation system. Note, however, that a uniform teacher evaluation system is also required under state law, although the governor has the right to change state statute should those at the state (including the governor, state superintendent, and PED) decide to eventually work with local districts and schools regarding the construction of a better teacher evaluation system for the state.

As per the PED’s subsequent goals, accordingly, things do not look much different than what they did in the past, especially given why and what got the state involved in such litigation in the first place. While the state also intends to (2) Simplify processes for districts/charters and also for the PED, and this is more or less fair, the state is also to (3) Establish a timeline for providing to districts and schools more current data in that currently such data are delayed by one school year, and these data are (still) needed for the state’s Pay for Performance plans (which were considered one high-stakes consequence at issue in Judge Thomson’s ruling). A tertiary goal is also to deliver in a more timely fashion such data to teacher preparation programs, which is something also of great controversy, as (uninformed) policymakers also continue to believe that colleges of education should also be held accountable for the test scores of their graduates’ students (see why this is problematic, for example, here). In the state’s final expressed goal, they make it explicit that (4) “Moving the timeline enhances the understanding that this system isn’t being used for termination decisions.” While this is certainly good, at least for now, the performance pay program is still something that is of concern. So are the state’s continued attempts to (still) use students’ test scores to evaluate teachers, and the state’s perpetual beliefs that the data errors also exposed by the lawsuit were the fault of the school districts, not the state, which Judge Thomson also noted.

Regardless, here is the state’s “Legal Rationale,” and here is also where things go a bit more awry. As re-positioned by the state/PED, they write that “the NEA and AFT recently advanced lawsuits set on eliminating any meaningful teacher evaluation [emphasis added to highlight the language that the state is using to distort the genuine purposes of these lawsuits]. These lawsuits have exposed that the flexibility provided to local authorities has created confusion and complexity. Judge Thomson used this complexity when granting an injunction in the AFT case—citing a confusing array of classifications, tags, assessments, graduated considerations, etc. Judge Thomson made clear that he views this local authority as a threat to the statutorily required uniformity of the system [emphasis added given Judge Thomson said nothing of this sort, in terms of devaluing local authority or control, but rather, he emphasized the state’s menu of options was arbitrary and not uniform, especially given the consequences the state was requiring districts to enforce].” This, again, pertains to what is written in the current state statute in terms of a uniform teacher evaluation system.

Accordingly, and unfortunately, the state’s proposed changes would: “Provide a single plan that all districts and charters would use, providing greater uniformity,” and “Simplify the model from 107 possible classifications to three.” See three other moves detailed in the PED announcement here (e.g., moving data delivery dates, eliminating all but three tests, and the fall 2016 date on which all of this is to become official).

Related, see a visual of what the state’s “new and improved” teacher evaluation system, in response to said litigation, is to look like. Unfortunately, again, it really does not look much different than it did prior except, perhaps, in the proposed reductions of testing options. See also the full document from which all of this came here.

[Screenshot: the state’s proposed teacher evaluation system]

Nonetheless, we will have to wait to see if this, again, will please the court and satisfy Judge Thomson’s ruling that the state is not to make any consequential decisions about New Mexico’s teachers using the data the state collects until the state (and/or others external to the state) can evidence to the court that the system is reliable, valid, etc.

And as for what the President of the American Federation of Teachers (AFT) New Mexico – Stephanie Biondo-Ly – had to say in response, see her press release below. See also an article in the Las Cruces – Sun Times here, in which President Ly is cited as “denounc[ing] the changes and call[ing] them attempts to obscure deficiencies in the [state’s] evaluation system.” From her original press release, she also wrote: “We are troubled…that once again, these changes are being implemented from the top down and if the secretary [Hanna Skandera] and her [PED] staff were serious about improving student outcomes and producing a fair evaluation system, they would have involved teachers, principals, and superintendents in the process.”

here

Houston Lawsuit Update, with Summary of Expert Witnesses’ Findings about the EVAAS

Recall from a prior post that a set of teachers in the Houston Independent School District (HISD), with the support of the Houston Federation of Teachers (HFT), are taking their district to federal court to fight for their rights as professionals, which they allege have been violated by their value-added scores, derived via the Education Value-Added Assessment System (EVAAS). The case, Houston Federation of Teachers, et al. v. Houston ISD, is to officially begin in court early this summer.

More specifically, the teachers are arguing that EVAAS output is inaccurate, that the EVAAS is unfair, that teachers are being evaluated via the EVAAS using tests that do not match the curriculum they are to teach, that the EVAAS fails to control for student-level factors that impact how well teachers perform but that are outside of teachers’ control (e.g., parental effects), that the EVAAS is incomprehensible and hence very difficult if not impossible to actually use to improve upon their instruction (i.e., it is not actionable), and, accordingly, that teachers’ due process rights are being violated because teachers do not have adequate opportunities to change as a result of their EVAAS results.

The EVAAS is the one value-added model (VAM) on which I’ve conducted most of my research, also in this district (see, for example, here, here, here, and here); hence, I, along with Jesse Rothstein – Professor of Public Policy and Economics at the University of California – Berkeley, who also conducts extensive research on VAMs – am serving as an expert witness in this case.

What was recently released regarding this case is a summary of the contents of our affidavits, as interpreted by authors of the attached “EVAAS Litigation Update,” in which the authors declare, with our and others’ research in support, that “Studies Declare EVAAS ‘Flawed, Invalid and Unreliable.’” Here are the twelve key highlights, again, as summarized by the authors of this report and re-summarized, by me, below:

  1. Large-scale standardized tests have never been validated for their current uses. In other words, as per my affidavit, “VAM-based information is based upon large-scale achievement tests that have been developed to assess levels of student achievement, but not levels of growth in student achievement over time, and not levels of growth in student achievement over time that can be attributed back to students’ teachers, to capture the teachers’ [purportedly] causal effects on growth in student achievement over time.”
  2. The EVAAS produces different results from another VAM. When, for this case, Rothstein constructed and ran an alternative, albeit sophisticated VAM using data from HISD both times, he found that results “yielded quite different rankings and scores.” This should not happen if these models are indeed yielding indicators of truth, or true levels of teacher effectiveness from which valid interpretations and assertions can be made.
  3. EVAAS scores are highly volatile from one year to the next. Rothstein, when running the actual data, found that while “[a]ll VAMs are volatile…EVAAS growth indexes and effectiveness categorizations are particularly volatile due to the EVAAS model’s failure to adequately account for unaccounted-for variation in classroom achievement.” In addition, volatility is “particularly high in grades 3 and 4, where students have relatively few[er] prior [test] scores available at the time at which the EVAAS scores are first computed.”
  4. EVAAS overstates the precision of teachers’ estimated impacts on growth. As per Rothstein, “This leads EVAAS to too often indicate that teachers are statistically distinguishable from the average…when a correct calculation would indicate that these teachers are not statistically distinguishable from the average.”
  5. Teachers of English Language Learners (ELLs) and “highly mobile” students are substantially less likely to demonstrate added value, as per the EVAAS, and likely most/all other VAMs. This, what we term as “bias,” makes it “impossible to know whether this is because ELL teachers [and teachers of highly mobile students] are, in fact, less effective than non-ELL teachers [and teachers of less mobile students] in HISD, or whether it is because the EVAAS VAM is biased against ELL [and these other] teachers.”
  6. The number of students each teacher teaches (i.e., class size) also biases teachers’ value-added scores. As per Rothstein, “teachers with few linked students—either because they teach small classes or because many of the students in their classes cannot be used for EVAAS calculations—are overwhelmingly [emphasis added] likely to be assigned to the middle effectiveness category under EVAAS (labeled “no detectable difference [from average], and average effectiveness”) than are teachers with more linked students.”
  7. Ceiling effects are certainly an issue. Rothstein found that in some grades and subjects, “teachers whose students have unusually high prior year scores are very unlikely to earn high EVAAS scores, suggesting that ‘ceiling effects’ in the tests are certainly relevant factors.” While EVAAS and HISD have previously acknowledged such problems with ceiling effects, they apparently believe these effects are being mitigated with the new and improved tests recently adopted throughout the state of Texas. Rothstein, however, found that these effects persist even given the new and improved tests.
  8. There are major validity issues with “artificial conflation.” This is a term I recently coined to represent what is happening in Houston, and elsewhere (e.g., Tennessee), when district leaders (e.g., superintendents) mandate or force principals and other teacher effectiveness appraisers or evaluators, for example, to align their observational ratings of teachers’ effectiveness with value-added scores, with the latter being the “objective measure” around which all else should revolve, or align; hence, the conflation of the one to match the other, even if entirely invalid. As per my affidavit, “[t]o purposefully and systematically endorse the engineering and distortion of the perceptible ‘subjective’ indicator, using the perceptibly ‘objective’ indicator as a keystone of truth and consequence, is more than arbitrary, capricious, and remiss…not to mention in violation of the educational measurement field’s Standards for Educational and Psychological Testing” (American Educational Research Association (AERA), American Psychological Association (APA), National Council on Measurement in Education (NCME), 2014).
  9. Teaching-to-the-test is of perpetual concern. Both Rothstein and I, independently, noted concerns about how “VAM ratings reward teachers who teach to the end-of-year test [more than] equally effective teachers who focus their efforts on other forms of learning that may be more important.”
  10. HISD is not adequately monitoring the EVAAS system. According to HISD, EVAAS modelers keep the details of their model secret, even from them and even though they are paying an estimated $500K per year for district teachers’ EVAAS estimates. “During litigation, HISD has admitted that it has not performed or paid any contractor to perform any type of verification, analysis, or audit of the EVAAS scores. This violates the technical standards for use of VAM that AERA specifies, which provide that if a school district like HISD is going to use VAM, it is responsible for ‘conducting the ongoing evaluation of both intended and unintended consequences’ and that ‘monitoring should be of sufficient scope and extent to provide evidence to document the technical quality of the VAM application and the validity of its use’ (AERA Statement, 2015).”
  11. EVAAS lacks transparency. AERA emphasizes the importance of transparency with respect to VAM uses. For example, as per the AERA Council who wrote the aforementioned AERA Statement, “when performance levels are established for the purpose of evaluative decisions, the methods used, as well as the classification accuracy, should be documented and reported” (AERA Statement, 2015). However, and in contrast to meeting AERA’s requirements for transparency, in this district and elsewhere, as per my affidavit, the “EVAAS is still more popularly recognized as the ‘black box’ value-added system.”
  12. Related, teachers lack opportunities to verify their own scores. This part is really interesting. “As part of this litigation, and under a very strict protective order that was negotiated over many months with SAS [i.e., SAS Institute Inc. which markets and delivers its EVAAS system], Dr. Rothstein was allowed to view SAS’ computer program code on a laptop computer in the SAS lawyer’s office in San Francisco, something that certainly no HISD teacher has ever been allowed to do. Even with the access provided to Dr. Rothstein, and even with his expertise and knowledge of value-added modeling, [however] he was still not able to reproduce the EVAAS calculations so that they could be verified.” Dr. Rothstein added, “[t]he complexity and interdependency of EVAAS also presents a barrier to understanding how a teacher’s data translated into her EVAAS score. Each teacher’s EVAAS calculation depends not only on her students, but also on all other students within HISD (and, in some grades and years, on all other students in the state), and is computed using a complex series of programs that are the proprietary business secrets of SAS Incorporated. As part of my efforts to assess the validity of EVAAS as a measure of teacher effectiveness, I attempted to reproduce EVAAS calculations. I was unable to reproduce EVAAS, however, as the information provided by HISD about the EVAAS model was far from sufficient.”
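
For readers who want to see the arithmetic behind highlight 6 above (teachers with few linked students being pushed into the middle category), here is a minimal sketch in Python. To be clear, this is not the EVAAS calculation, which is proprietary and which, as noted above, even Rothstein could not reproduce. It assumes a much simpler setup of my own for illustration: a teacher’s estimate is a plain class average of student growth, its standard error shrinks with the square root of the number of linked students, and a teacher is labeled as different from average only when the estimate is more than a cutoff number of standard errors from zero. The function name, the cutoff of 2.0, and the true effect of 0.3 student-level standard deviations are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def prob_middle_category(true_effect, n_students, student_sd=1.0, z_cutoff=2.0):
    """Probability that a teacher is labelled 'not distinguishable from average'.

    Hypothetical rule (an assumption, not the EVAAS rule): a teacher leaves the
    middle category only when the estimated effect is more than z_cutoff
    standard errors away from zero.
    """
    # Standard error of a simple class mean: fewer students, larger error.
    se = student_sd / sqrt(n_students)
    # The teacher's true effect, measured in standard-error units.
    shift = true_effect / se
    std_normal = NormalDist()
    # estimate / se is distributed Normal(shift, 1); the middle category
    # corresponds to that ratio falling between -z_cutoff and +z_cutoff.
    return std_normal.cdf(z_cutoff - shift) - std_normal.cdf(-z_cutoff - shift)


# Same (hypothetical) above-average teacher, different numbers of linked students.
for n in (10, 25, 100):
    print(n, round(prob_middle_category(0.3, n), 3))
```

Under these assumptions, the same teacher becomes more likely to land in the middle category as the number of linked students falls, simply because the estimate gets noisier and not because the teacher is any more “average.”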

Another Take on New Mexico’s Ruling on the State’s Teacher Evaluation/VAM System

John Thompson, a historian and teacher, wrote a post just published in Diane Ravitch’s blog (here) in which he took a closer look at the New Mexico court decision of which I was a part and which I covered a few weeks ago (here). This is the case in which state District Judge David K. Thomson, who presided over the five-day teacher-evaluation lawsuit in New Mexico, granted a preliminary injunction preventing consequences from being attached to the state’s teacher evaluation data.

Historian/Teacher John Thompson adds another, and also independent take on this ruling, again here, also having read through Judge Thomson’s entire ruling. Here’s what he wrote:

New Mexico District Judge David K. Thomson granted a preliminary injunction preventing consequences from being attached to the state’s teacher evaluation data. As Audrey Amrein-Beardsley explains, “[the state] can proceed with ‘developing’ and ‘improving’ its teacher evaluation system, but the state is not to make any consequential decisions about New Mexico’s teachers using the data the state collects until the state (and/or others external to the state) can evidence to the court during another trial (set for now, for April) that the system is reliable, valid, fair, uniform, and the like.”

This is wonderful news. As the American Federation of Teachers observes, “Superintendents, principals, parents, students and the teachers have spoken out against a system that is so rife with errors that in some districts as many as 60 percent of evaluations were incorrect. It is telling that the judge characterizes New Mexico’s system as a ‘policy experiment’ and says that it seems to be a ‘Beta test where teachers bear the burden for its uneven and inconsistent application.’”

A close reading of the ruling makes it clear that this case is an even greater victory over the misuse of test-driven accountability than even the jubilant headlines suggest. It shows that Judge Thomson made the right ruling on the key issues for the right reasons, and he seems to be predicting that other judges will be following his legal logic. Litigation over value-added teacher evaluations is being conducted in 14 states, and the legal battleground is shifting to the place where corporate reformers are weakest. No longer are teachers being forced to prove that there is no rational basis for defending the constitutionality of value-added evaluations. Now, the battleground is shifting to the actual implementation of those evaluations and how they violate state laws.

Judge Thomson concludes that the state’s evaluation systems don’t “resemble at all the theory” they were based on. He agreed with the district superintendent who compared it to the Wizard of Oz, where “the guy is behind the curtain and pulling levers and it is loud.” Some may say that the Wizard’s behavior is “understandable,” but that is not the judge’s concern. The Court must determine whether the consequences are assessed by a system that is “objective and uniform.” Clearly, it has been impossible in New Mexico and elsewhere for reformers to meet the requirements they mandated, and that is the legal terrain where VAM proponents must now fight.

The judge thus concludes, “New Mexico’s evaluation system is less like a [sound] model than a cafeteria-style evaluation system where the combination of factors, data, and elements are not easily determined and the variance from school district to school district creates conflicts with the [state] statutory mandate.”

The state of New Mexico counters by citing cases in Florida and Tennessee as precedents. But, Judge Thomson writes that those cases ruled against early challenges based on equal protection or constitutional issues, as they have also cited practical concerns in implementation. He writes of the Florida (Cook) case, “The language in the Cook case could be lifted from the Court findings in this case.” That state’s judge decided “‘The unfairness of this system is not lost on this Court.’” Judge Thomson also argues, “The (Florida) Court in fact seemed to predict the type of legal challenge that could result …‘The individual plaintiffs have a separate remedy to challenge an evaluation on procedural due process grounds if an evaluation is actually used to deprive the teacher of an evaluation right.’”

The question in Florida and Tennessee had been whether there was “a conceivable rational basis” for proceeding with the teacher evaluation policy experiment. Below are some of the more irrational results of those evaluations. The facts in the New Mexico case may be somewhat more absurd than those in other places that have implemented VAMs but, given the inherent flaws in those evaluations, I doubt they are qualitatively worse. In fact, Audrey Amrein-Beardsley testified about a similar outcome in Houston which was as awful as the New Mexico travesties and led to about 1/4th of their teachers subject to those evaluations being subject to “growth plans.”

As has become common across the nation, New Mexico teachers have been evaluated on students who aren’t in the teachers’ classrooms. They have been held accountable for test results from subjects that the teacher didn’t teach. Science teachers might be evaluated on a student taught in 2011, based on how that student scored in 2013.

The judge cited testimony regarding a case where 50% of the teachers rated Minimally Effective had missing data due to reassignment to a wrong group. One year, a district questioned the state’s data, and immediately it saw an unexplained 11% increase in effective teachers. The next year, also without explanation, the state’s original numbers on effectiveness were reduced by 6%.

One teacher taught 160 students but was evaluated on scores of 73 of them and was then placed on a plan for improvement. Because of the need to quantify the effectiveness of teachers in Group B and Group C, who aren’t subject to state End of Instruction tests, there are 63 different tests being used in one district to generate high-stakes data. And, when changing tests to the Common Core PARCC test, the state has to violate scientific protocol, and mix and match test score results in an indefensible manner. Perhaps just as bad, in 2014-15, 76% of teachers were still being evaluated on less than three years of data.

The Albuquerque situation seems exceptionally important because it serves 25% of the state’s students, and it is the type of high-poverty system where value-added evaluations are likely to be most unreliable and invalid. It had 1728 queries about data and 28% of its teachers ranked below the Effective level. The judge noted that if you teach a core subject, you are twice as likely as a French teacher to be judged Ineffective. But, that was not the most shocking statistic. In Albuquerque, Group A elementary teachers (where VAMs play a larger role) are five times more likely to be rated below Effective than their colleagues in Group B. In Roswell, Group B teachers are three times more likely to be rated below Effective than Group C teachers.

Curiously, VAM advocate Tom Kane testified, but he did so in a way that made it unclear whether he saw himself as a witness for the defense or the plaintiffs. When asked about Amrein-Beardsley’s criticism of using tests that weren’t designed for evaluating teachers, Kane countered that the Gates Foundation MET study used random samples and concluded that differing tests could be used in a way that was “useful in evaluating teachers” and valid predictors of student achievement. Kane also replied that he could estimate the state’s error rate “on average,” but he couldn’t estimate error rates for individual teachers. He did not address the judge’s real concern about whether New Mexico’s use of VAMs was uniform and objective.

I am not a lawyer but I have years of experience as a legal historian. Although I have long been disappointed that the legal profession did not condemn value-added evaluations as a violation of our democracy’s fundamental principles, I also knew that the first wave of lawsuits challenging VAMs would face an uphill battle. Using teachers as guinea pigs in a risky experiment, where non-educators imposed their untested opinions on public schools, was always bad policy. Along with their other sins, value-added evaluations would mean collective punishment of some teachers merely for teaching in schools and classes where it is harder to meet dubious test score growth targets. But, many officers of the court might decide that they did not have the grounds to overrule new teacher evaluation laws. They might have to hold their noses while ruling in favor of laws that make a mockery of our tenets of fairness in a constitutional democracy.

During the last few years, rather than force those who would destroy the hard-earned legal rights of teachers to meet the legal standard of “strict scrutiny,” those who would fire teachers without proving that their data was reliable and valid have mostly had to show that their policies were not irrational. Now that their policies are being implemented, reformers must defend the ways that their VAMs are actually being used. Corporate reformers and the Duncan administration were able to coerce almost all of the states into writing laws requiring quantitative components in teacher evaluations. Not surprisingly, it has often proven impossible to implement their schemes in a rational manner.

In theory, corporate reformers could have won if they required the high-stakes use of flawed metrics while maintaining the message discipline that they are famous for. School administrators could have been trained to say that they were merely enforcing the law when they assessed consequences based on metrics. Their job would have been to recite the standard soundbite when firing teachers – saying that their metrics may or may not reflect the actual performance of the teacher in question – but the law required that practice. Life’s not fair, they could have said, and whether or not the individual teacher was being unfairly sacrificed, the administrators who enforced the law were just following orders. It was the will of the lawmakers that the firing of the teachers with the lowest VAMs – regardless of whether the metric reflected actual effectiveness – would make schools more like corporations, so practitioners would have to accept it. But, this is one more case where reformers ignored the real world, did not play out the education policy and legal chess game, and [did or did not] anticipate that rulings such as Judge Thomson’s would soon be coming.

In the real world, VAM advocates had to claim that its results represented the actual effectiveness of teachers and that, somehow, their scheme would someday improve schools. This liberated teachers and administrators to fight back in the courts. Moreover, top-down reformers set out to impose the same basic system on every teacher, in every type of class and school, in our diverse nation. When this top-down micromanaging met reality, proponents of test-driven evaluations had to play so many statistical games, create so many made-up metrics, and improvise in so many bizarre ways, that the resulting mess would be legally indefensible.

And, that is why the cases in Florida and Tennessee might soon be seen as the end of the beginning of the nation’s repudiation of value-added evaluations. The New Mexico case, along with the renewal of the federal ESEA and the departure of Arne Duncan, is clearly the beginning of the end. Had VAM proponents objectively briefed attorneys on the strengths and weaknesses of their theories, they could have thought through the inevitable legal process. On the other hand, I doubt that Kane and his fellow economists knew enough about education to be able to anticipate the inevitable, unintended results of their theories on schools. In numerous conversations with VAM true believers, rarely have I met one who seemed to know enough about the nuts and bolts about schools to be able to brief legal advisors, much less anticipate the inevitable results that would eventually have to be defended in court.

Victory in Court: Consequences Attached to VAMs Suspended Throughout New Mexico

Great news for New Mexico and New Mexico’s approximately 23,000 teachers, and great news for states and teachers potentially elsewhere, in terms of setting precedent!

Late yesterday, state District Judge David K. Thomson, who presided over the ongoing teacher-evaluation lawsuit in New Mexico, granted a preliminary injunction preventing consequences from being attached to the state’s teacher evaluation data. More specifically, Judge Thomson ruled that the state can proceed with “developing” and “improving” its teacher evaluation system, but the state is not to make any consequential decisions about New Mexico’s teachers using the data the state collects until the state (and/or others external to the state) can evidence to the court during another trial (set for now, for April) that the system is reliable, valid, fair, uniform, and the like.

As you all likely recall, the American Federation of Teachers (AFT), joined by the Albuquerque Teachers Federation (ATF), last year, filed a “Lawsuit in New Mexico Challenging [the] State’s Teacher Evaluation System.” Plaintiffs charged that the state’s teacher evaluation system, imposed on the state in 2012 by the state’s current Public Education Department (PED) Secretary Hanna Skandera (with value-added counting for 50% of teachers’ evaluation scores), is unfair, error-ridden, spurious, harming teachers, and depriving students of high-quality educators, among other claims (see the actual lawsuit here).

Thereafter, one scheduled day of testimonies turned into five in Santa Fe, that ran from the end of September through the beginning of October (each of which I covered here, here, here, here, and here). I served as the expert witness for the plaintiff’s side, along with other witnesses including lawmakers (e.g., a state senator) and educators (e.g., teachers, superintendents) who made various (and very articulate) claims about the state’s teacher evaluation system on the stand. Thomas Kane served as the expert witness for the defendant’s side, along with other witnesses including lawmakers and educators who made counter claims about the system, some of which backfired, unfortunately for the defense, primarily during cross-examination.

See articles released about this ruling this morning in the Santa Fe New Mexican (“Judge suspends penalties linked to state’s teacher eval system”) and the Albuquerque Journal (“Judge curbs PED teacher evaluations”). See also the AFT’s press release, written by AFT President Randi Weingarten, here. Click here for the full 77-page Order written by Judge Thomson (see also, below, five highlights I pulled from this Order).

The journalist of the Santa Fe New Mexican, though, provided the most detailed information about Judge Thomson’s Order, writing, for example, that the “ruling by state District Judge David Thomson focused primarily on the complicated combination of student test scores used to judge teachers. The ruling [therefore] prevents the Public Education Department [PED] from denying teachers licensure advancement or renewal, and it strikes down a requirement that poorly performing teachers be placed on growth plans.” In addition, the Judge noted that “the teacher evaluation system varies from district to district, which goes against a state law calling for a consistent evaluation plan for all educators.”

The PED continues to stand by its teacher evaluation system, calling the court challenge “frivolous” and “a legal PR stunt,” all the while noting that Judge Thomson’s decision “won’t affect how the state conducts its teacher evaluations.” Indeed it will, for now and until the state’s teacher evaluation system is vetted, and validated, and “the court” is “assured” that the system can actually be used to take the “consequential actions” against teachers, “required” by the state’s PED.

Here are some other highlights that I took directly from Judge Thomson’s ruling, capturing what I viewed as his major areas of concern about the state’s system (click here, again, to read Judge Thomson’s full Order):

  • Validation Needed: “The American Statistical Association says ‘estimates from VAM should always be accompanied by measures of precision and a discussion of the assumptions and possible limitations of the model. These limitations are particularly relevant if VAM are used for high stake[s] purposes’” (p. 1). These are the measures, assumptions, limitations, and the like that are to be made transparent in this state.
  • Uniformity Required: “New Mexico’s evaluation system is less like a [sound] model than a cafeteria-style evaluation system where the combination of factors, data, and elements are not easily determined and the variance from school district to school district creates conflicts with the [state] statutory mandate” (p. 2)…with the existing statutory framework for teacher evaluations for licensure purposes requiring “that the teacher be evaluated for ‘competency’ against a ‘highly objective uniform statewide standard of evaluation’ to be developed by PED” (p. 4). “It is the term ‘highly objective uniform’ that is the subject matter of this suit” (p. 4), whereby the state and no other “party provided [or could provide] the Court a total calculation of the number of available district-specific plans possible given all the variables” (p. 54). See also the Judge’s points #78-#80 (starting on page 70) for some of the factors that helped to “establish a clear lack of statewide uniformity among teachers” (p. 70).
  • Transparency Missing: “The problem is that it is not easy to pull back the curtain, and the inner workings of the model are not easily understood, translated or made accessible” (p. 2). “Teachers do not find the information transparent or accurate” and “there is no evidence or citation that enables a teacher to verify the data that is the content of their evaluation” (p. 42). In addition, “[g]iven the model’s infancy, there are no real studies to explain or define the [s]tate’s value-added system…[hence, the consequences and decisions]…that are to be made using such system data should be examined and validated prior to making such decisions” (p. 12).
  • Consequences Halted: “Most significant to this Order, [VAMs], in this [s]tate and others, are being used to make consequential decisions…This is where the rubber hits the road [as per]…teacher employment impacts. It is also where, for purposes of this proceeding, the PED departs from the statutory mandate of uniformity requiring an injunction” (p. 9). In addition, it should be noted that indeed “[t]here are adverse consequences to teachers short of termination” (p. 33) including, for example, “a finding of ‘minimally effective’ [that] has an impact on teacher licenses” (p. 41). These, too, are to be halted under this injunction Order.
  • Clarification Required: “[H]ere is what this [O]rder is not: This [O]rder does not stop the PED’s operation, development and improvement of the VAM in this [s]tate, it simply restrains the PED’s ability to take consequential actions…until a trial on the merits is held” (p. 2). In addition, “[a] preliminary injunction differs from a permanent injunction, as does the factors for its issuance…’ The objective of the preliminary injunction is to preserve the status quo [minus the consequences] pending the litigation of the merits. This is quite different from finally determining the cause itself” (p. 74). Hence, “[t]he court is simply enjoining the portion of the evaluation system that has adverse consequences on teachers” (p. 75).

The PED also argued that “an injunction would hurt students because it could leave in place bad teachers.” As per Judge Thomson, “That is also a faulty argument. There is no evidence that temporarily halting consequences due to the errors outlined in this lengthy Opinion more likely results in retention of bad teachers than in the firing of good teachers” (p. 75).

Finally, given my involvement in this lawsuit and given the team with whom I was/am still so fortunate to work (see picture below), including all of those who testified as part of the team and whose testimonies clearly proved critical in Judge Thomson’s final Order, I want to thank everyone for all of their time, energy, and efforts in this case, thus far, on behalf of the educators attempting to (still) do what they love to do — teach and serve students in New Mexico’s public schools.


Left to right: (1) Stephanie Ly, President of AFT New Mexico; (2) Dan McNeil, AFT Legal Department; (3) Ellen Bernstein, ATF President; (4) Shane Youtz, Attorney at Law; and (5) me 😉

Houston’s “Split” Decision to Give Superintendent Grier $98,600 in Bonuses, Pre-Resignation

States of attention on this blog, and often of (dis)honorable mention as per their state-level policies bent on value-added models (VAMs), include Florida, New York, Tennessee, and New Mexico. As for a quick update about the latter state of New Mexico, we are still waiting to hear the final decision from the judge who recently heard the state-level lawsuit still pending on this matter in New Mexico (see prior posts about this case here, here, here, here, and here).

Another locale of great interest, though, is the Houston Independent School District. This is the seventh largest urban school district in the nation, and the district that has tied more high-stakes consequences to its value-added output than any other district/state in the nation. These “initiatives” were “led” by soon-to-resign/retire Superintendent Terry Grier who, during his time in Houston (2009-2015), implemented some of the harshest consequences ever attached to teacher-level value-added output, as per the district’s use of the Education Value-Added Assessment System (EVAAS) (see other posts about the EVAAS here, here, and here; see other posts about Houston here, here, and here).

In fact, the EVAAS is still used throughout Houston today to evaluate all EVAAS-eligible teachers, to also “reform” the district’s historically low-performing schools, by tying teachers’ purported value-added performance to teacher improvement plans, merit pay, nonrenewal, and termination (e.g., 221 Houston teachers were terminated “in large part” due to their EVAAS scores in 2011). However, pending litigation (i.e., this is the district in which the American and Houston Federation of Teachers (AFT/HFT) are currently suing the district for their wrongful use of, and over-emphasis on this particular VAM; see here), Superintendent Grier and the district have recoiled on some of the high-stakes consequences they formerly attached to the EVAAS. This particular lawsuit is to commence this spring/summer.

Nonetheless, my most recent post about Houston was about some of its future school board candidates, who were invited by The Houston Chronicle to respond to Superintendent Grier’s teacher evaluation system. For the most part, those who responded did so unfavorably, especially as the evaluation system was/is disproportionately reliant on teachers’ EVAAS data and high-stakes use of these data in particular (see here).

Most recently, however, as per a “split” decision registered by Houston’s current school board (i.e., 4:3, and without any new members elected last November), Superintendent Grier received a $98,600 bonus for his “satisfactory evaluation” as the school district’s superintendent. See more from the full article published in The Houston Chronicle. As per the same article, Superintendent “Grier’s base salary is $300,000, plus $19,200 for car and technology allowances. He also is paid for unused leave time.”

More importantly, take a look at the two figures below, taken from actual district reports (see references below), highlighting Houston’s performance (declining, on average, in blue) as compared to the state of Texas (maintaining, on average, in black), to determine for yourself whether Superintendent Grier, indeed, deserved such a bonus (not to mention salary).

Another question to ponder is whether the district’s use of the EVAAS value-added system, especially since Superintendent Grier’s arrival in 2009, is actually reforming the school district as he and other district leaders have for so long now intended.


Figure 1. Houston (blue trend line) v. Texas (black trend line) performance on the state’s STAAR tests, 2012-2015 (HISD, 2015a)


Figure 2. Houston (blue trend line) v. Texas (black trend line) performance on the state’s STAAR End-of-Course (EOC) tests, 2012-2015 (HISD, 2015b)

References:

Houston Independent School District (HISD). (2015a). State of Texas Assessments of Academic Readiness (STAAR) performance, grades 3-8, spring 2015. Retrieved here.

Houston Independent School District (HISD). (2015b). State of Texas Assessments of Academic Readiness (STAAR) end-of-course results, spring 2015. Retrieved here.