One Score and Seven Policy Iterations Ago…

I just read what might be one of the best articles I’ve read in a long time on using test scores to measure teacher effectiveness, and on why this is such a bad idea. Not surprisingly but unfortunately, this article was written about 30 years ago (i.e., in 1986) by Edward Haertel, National Academy of Education member and recently retired Professor at Stanford University. If the name sounds familiar, it should, as Professor Emeritus Haertel is one of the best on the topic of, and history behind, VAMs (see prior posts about his related scholarship here, here, and here). To access the full article, please scroll to the reference at the bottom of this post.

Haertel wrote this article at a time when policymakers were, as they still are now, trying to hold teachers accountable for their students’ learning as measured by states’ standardized test scores. Although the article deals with minimum competency tests, which were in policy fashion at the time (about seven policy iterations ago), its contents are still highly relevant given where we are today: investing in “new and improved” Common Core tests and still riding on unsinkable beliefs that this is the way to reform schools that have been in despair and (still) in need of major repair for decades.

Here are some of the points I found of most “value”:

  • On isolating teacher effects: “Inferring teacher competence from test scores requires the isolation of teaching effects from other major influences on student test performance,” while “the task is to support an interpretation of student test performance as reflecting teacher competence by providing evidence against plausible rival hypotheses or interpretation.” Yet “student achievement depends on multiple factors, many of which are out of the teacher’s control,” and many of which cannot, and likely never will, be “controlled.” In terms of home supports, “students enjoy varying levels of out-of-school support for learning. Not only may parental support and expectations influence student motivation and effort, but some parents may share directly in the task of instruction itself, reading with children, for example, or assisting them with homework.” In terms of school supports, “[s]choolwide learning climate refers to the host of factors that make a school more than a collection of self-contained classrooms. Where the principal is a strong instructional leader; where schoolwide policies on attendance, drug use, and discipline are consistently enforced; where the dominant peer culture is achievement-oriented; and where the school is actively supported by parents and the community.” All of this makes isolating the teacher effect nearly, if not wholly, impossible.
  • On the difficulties with defining the teacher effect: “Does it include homework? Does it include self-directed study initiated by the student? How about tutoring by a parent or an older sister or brother? For present purposes, instruction logically refers to whatever the teacher being evaluated is responsible for, but there are degrees of responsibility, and it is often shared. If a teacher informs parents of a student’s learning difficulties and they arrange for private tutoring, is the teacher responsible for the student’s improvement? Suppose the teacher merely gives the student low marks, the student informs her parents, and they arrange for a tutor? Should teachers be credited with inspiring a student’s independent study of school subjects? There is no time to dwell on these difficulties; others lie ahead. Recognizing that some ambiguity remains, it may suffice to define instruction as any learning activity directed by the teacher, including homework….The question also must be confronted of what knowledge counts as achievement. The math teacher who digresses into lectures on beekeeping may be effective in communicating information, but for purposes of teacher evaluation the learning outcomes will not match those of a colleague who sticks to quadratic equations.” Much, if not all, of this cannot, and likely never will, be “controlled” or “factored” in or out, either.
  • On standardized tests: Even the best standardized tests will (likely) always be too imperfect, and not up to the teacher evaluation task, no matter the extent to which they are pitched as “new and improved.” While it might appear that these “problem[s] could be solved with better tests,” they cannot. Ultimately, all that these tests provide is “a sample of student performance. The inference that this performance reflects educational achievement [not to mention teacher effectiveness] is probabilistic [emphasis added], and is only justified under certain conditions.” Likewise, these tests “measure only a subset of important learning objectives, and if teachers are rated on their students’ attainment of just those outcomes, instruction of unmeasured objectives [is also] slighted.” As it was then, and as it still is today, “it has become a commonplace that standardized student achievement tests are ill-suited for teacher evaluation.”
  • On the multiple choice formats of such tests: “[A] multiple-choice item remains a recognition task, in which the problem is to find the best of a small number of predetermined alternatives and the criteria for comparing the alternatives are well defined. The nonacademic situations where school learning is ultimately applied rarely present problems in this neat, closed form. Discovery and definition of the problem itself and production of a variety of solutions are called for, not selection among a set of fixed alternatives.”
  • On students and the scores they are to contribute to the teacher evaluation formula: “Students varying in their readiness to profit from instruction are said to differ in aptitude. Not only general cognitive abilities, but relevant prior instruction, motivation, and specific interactions of these and other learner characteristics with features of the curriculum and instruction will affect academic growth.” In other words, one cannot simply assume all students will learn or grow at the same rate with the same teacher. Rather, they will learn at different rates given their aptitudes, their “readiness to profit from instruction,” the teacher’s instruction, and sometimes despite the teacher’s instruction or what the teacher teaches.
  • And on the formative nature of such tests, as it was then: “Teachers rarely consult standardized test results except, perhaps, for initial grouping or placement of students, and they believe that the tests are of more value to school or district administrators than to themselves.”

Sound familiar?

Reference: Haertel, E. (1986). The valid use of student performance measures for teacher evaluation. Educational Evaluation and Policy Analysis, 8(1), 45-60.

The Late Stephen Jay Gould on IQ Testing (with Implications for Testing Today)

One of my doctoral students sent me a YouTube video I feel compelled to share with you all. It is an interview with one of my all-time favorite and most admired academics: Stephen Jay Gould. Gould, who passed away from cancer at age 60, was a paleontologist, evolutionary biologist, and scientist who spent most of his academic career at Harvard. He was “one of the most influential and widely read writers of popular science of his generation,” and he was also the author of one of my favorite books of all time: The Mismeasure of Man (1981).

In The Mismeasure of Man Gould examined the history of psychometrics and of intelligence testing (e.g., the methods of nineteenth-century craniometry, or the physical measurement of people’s skulls to “objectively” capture their intelligence). Gould examined psychological testing and the uses of all sorts of tests and measurements to inform decisions (which is still, as we know, uber-relevant today) as well as to “inform” biological determinism (i.e., the view that “social and economic differences between human groups—primarily races, classes, and sexes—arise from inherited, inborn distinctions and that society, in this sense, is an accurate reflection of biology”). Gould also examined in this book the general use of mathematics and “objective” numbers writ large to measure pretty much anything, as well as to measure, and supply evidence for, predetermined conclusions. This book is, as I mentioned, one of the best. I highly recommend it to all.

In this seven-minute video, you can get a sense of what this book is all about, and of how relevant it remains to what we continue to believe, or not believe, about tests and what they are or are not really worth. Thanks, again, to my doctoral student for finding this, as it is a treasure not to be buried, especially given Gould’s 2002 passing.

Another Oldie but Still Very Relevant Goodie, by McCaffrey et al.

I recently re-read in full an article that is now 10 years old, published in 2004 and, in the words of the authors, before VAM approaches were “widely adopted in formal state or district accountability systems.” Particularly when it comes to the research on VAMs, I consistently find it interesting to re-explore, or re-discover, what we actually knew 10 years ago, although, unfortunately, this most often serves as a reminder of how little has changed.

The article, “Models for Value-Added Modeling of Teacher Effects,” is authored by Daniel McCaffrey (Educational Testing Service [ETS] Scientist, and still a “big name” in VAM research), J. R. Lockwood (RAND Corporation Scientist), Daniel Koretz (Professor at Harvard), Thomas Louis (Professor at Johns Hopkins), and Laura Hamilton (RAND Corporation Scientist).

At the time the authors wrote this article, in addition to data and database issues, there were issues with “multiple measures on the same student and multiple teachers instructing each student,” as “[c]lass groupings of students change annually, and students are taught by a different teacher each year.” The authors, more specifically, questioned “whether VAM really does remove the effects of factors such as prior performance and [students’] socio-economic status, and thereby provide[s] a more accurate indicator of teacher effectiveness.”

The assertions they advanced in response to these questions follow:

  • Across different types of VAMs, given different types of approaches to control for some of the above (e.g., bias), teachers’ contribution to total variability in test scores (as per value-added gains) ranged from 3% to 20%. That is, teachers can realistically be held accountable for only 3% to 20% of the variance in test scores using VAMs, while the other 80% to 97% of the variance (still) comes from influences outside of the teacher’s control. A similar statistic (i.e., 1% to 14%) was recently highlighted in the position statement on VAMs released by the American Statistical Association.
  • Most VAMs focus exclusively on scores from standardized assessments, although I will take this one step further now, noting that all VAMs now focus exclusively on large-scale standardized tests. I evidenced this in a recent paper I published here (Putting growth and value-added models on the map: A national overview).
  • VAMs introduce bias when missing test scores are not missing completely at random. The missing at random assumption, however, underlies most VAMs because without it, data missingness would be pragmatically insolvable, especially “given the large proportion of missing data in many achievement databases and known differences between students with complete and incomplete test data.” The only real solution here is to use “implicit imputation of values for unobserved gains using the observed scores,” which is “followed by estimation of teacher effect[s] using the means of both the imputed and observe[d] gains [together].”
  • Bias “[still] is one of the most difficult issues arising from the use of VAMs to estimate school or teacher effects…[and]…the inclusion of student level covariates is not necessarily the solution to [this] bias.” In other words, “Controlling for student-level covariates alone is not sufficient to remove the effects of [students’] background [or demographic] characteristics.” There is a reason why bias is still such a highly contested issue when it comes to VAMs (see a recent post about this here).
  • All (or now most) commonly-used VAMs assume that teachers’ (and prior teachers’) effects persist undiminished over time. This assumption “is not empirically or theoretically justified,” either, yet it persists.
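To make the 3% to 20% figure more concrete, here is a minimal Python sketch. It is not one of the models McCaffrey et al. compared; it simply simulates students nested within teachers and recovers the teacher share of score variance with a one-way random-effects ANOVA (method-of-moments) estimator. The function names, the 500 teachers, the class size of 25, and the assumed true teacher share of 10% are hypothetical choices for illustration only.

```python
import random
import statistics


def simulate_scores(n_teachers, class_size, teacher_share, seed=0):
    """Simulate scores with total variance 1, of which `teacher_share` is between-teacher (hypothetical setup)."""
    rng = random.Random(seed)
    teacher_sd = teacher_share ** 0.5
    student_sd = (1 - teacher_share) ** 0.5
    classes = []
    for _ in range(n_teachers):
        effect = rng.gauss(0, teacher_sd)  # this teacher's true contribution
        classes.append([effect + rng.gauss(0, student_sd) for _ in range(class_size)])
    return classes


def teacher_variance_share(classes):
    """One-way random-effects ANOVA estimate of the between-teacher share of variance.

    Assumes every class has the same number of students.
    """
    k = len(classes)
    n = len(classes[0])
    grand_mean = statistics.fmean(x for c in classes for x in c)
    class_means = [statistics.fmean(c) for c in classes]
    ms_between = n * sum((m - grand_mean) ** 2 for m in class_means) / (k - 1)
    ms_within = sum((x - m) ** 2 for c, m in zip(classes, class_means) for x in c) / (k * (n - 1))
    var_teacher = max((ms_between - ms_within) / n, 0.0)  # truncate negative estimates at zero
    return var_teacher / (var_teacher + ms_within)


classes = simulate_scores(n_teachers=500, class_size=25, teacher_share=0.10)
share = teacher_variance_share(classes)
print(f"Estimated teacher share of variance: {share:.1%}")
print(f"Share attributable to everything else: {1 - share:.1%}")
```

Whatever share the teacher term takes, the remainder belongs by construction to everything outside it; in this toy setup, roughly nine-tenths of the variance has nothing to do with the teacher term.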

These authors’ overall conclusion, again from 10 years ago but in many ways still standing? VAMs “will often be too imprecise to support some of [their] desired inferences” and uses, including, for example, making low- and high-stakes decisions about teachers based on VAM-produced estimates. “[O]btaining sufficiently precise estimates of teacher effects to support ranking [and such decisions] is likely to [forever] be a challenge.”
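The ranking problem can be illustrated with a similar hypothetical simulation, again a sketch rather than any operational VAM: each teacher’s “estimate” is simply the mean score of one year’s class, drawn twice with new students, and the two years’ rankings are then compared. The 5% teacher share (inside the 3% to 20% range reported above), the 200 teachers, the class size of 20, and all function names are assumptions made for illustration.

```python
import random


def class_means_for_year(true_effects, class_size, student_sd, rng):
    """Estimate each teacher's effect as the mean score of one year's (new) class."""
    return [
        sum(effect + rng.gauss(0, student_sd) for _ in range(class_size)) / class_size
        for effect in true_effects
    ]


def ranks(values):
    """Rank positions (0 = lowest); assumes no ties, which holds for continuous simulated data."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for position, i in enumerate(order):
        r[i] = position
    return r


def spearman(a, b):
    """Spearman rank correlation for untied data."""
    n = len(a)
    ra, rb = ranks(a), ranks(b)
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


def top_quintile(values):
    """Indices of the teachers whose estimates fall in the top 20%."""
    cutoff = sorted(values)[int(0.8 * len(values))]
    return {i for i, v in enumerate(values) if v >= cutoff}


rng = random.Random(42)
n_teachers, class_size, teacher_share = 200, 20, 0.05
true_effects = [rng.gauss(0, teacher_share ** 0.5) for _ in range(n_teachers)]
student_sd = (1 - teacher_share) ** 0.5

year1 = class_means_for_year(true_effects, class_size, student_sd, rng)
year2 = class_means_for_year(true_effects, class_size, student_sd, rng)

rho = spearman(year1, year2)
top1 = top_quintile(year1)
stayed = len(top1 & top_quintile(year2)) / len(top1)
print(f"Year-to-year rank correlation: {rho:.2f}")
print(f"Top-quintile teachers who stay in the top quintile: {stayed:.0%}")
```

Because the teachers’ true effects never change between the two simulated years, any reshuffling of the rankings here comes purely from which students happened to land in each class.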

No More EVAAS for Houston: School Board Tie Vote Means Non-Renewal

Recall from prior posts (here, here, and here) that seven teachers in the Houston Independent School District (HISD), with the support of the Houston Federation of Teachers (HFT), are taking HISD to federal court over how their value-added scores, derived via the Education Value-Added Assessment System (EVAAS), are being used, and allegedly abused, in this district, which has tied more high-stakes consequences to value-added output than any other district or state in the nation. The case, Houston Federation of Teachers, et al. v. Houston ISD, is ongoing.

But it was just announced that the HISD school board, in a 3-3 split vote late last Thursday night, elected to no longer pay SAS Institute Inc. an annual $680K to calculate the district’s EVAAS value-added estimates. As per an HFT press release, HISD “will not be renewing the district’s seriously flawed teacher evaluation system, [which is] good news for students, teachers and the community, [although] the school board and incoming superintendent must work with educators and others to choose a more effective system.”


Apparently, HISD was holding onto the EVAAS, despite the research on the EVAAS in general and in Houston, because the district has received (and is still set to receive) over $4 million in federal grant funds that require value-added estimates as a component of its evaluation and accountability system(s).

While this means that the federal government still largely favors the use of value-added models (VAMs) in its funding priorities, despite the passage of the Every Student Succeeds Act (ESSA) (see here and here), it also means that HISD might have to find another growth model or VAM to remain in compliance with the feds.

Regardless, during the Thursday night meeting a board member noted that HISD has been kicking this EVAAS can down the road for 5 years. “If not now, then when?” the board member asked. “I remember talking about this last year, and the year before. We all agree that it needs to be changed, but we just keep doing the same thing.” A member of the community said to the board: “VAM hasn’t moved the needle [see a related post about this here]. It hasn’t done what you need it to do. But it has been very expensive to this district.” He then listed the other things on which HISD could spend (and could have spent) the $680K it pays annually for EVAAS estimates.

Soon thereafter, the HISD school board called for a vote, and it ended up being a 3-3 tie. Because of the 3-3 tie vote, the school board rejected the effort to continue with the EVAAS. What this means for the related and aforementioned lawsuit is still indeterminate at this point.

Massachusetts Also Moving To Remove Growth Measures from State’s Teacher Evaluation Systems

Since the passage of the Every Student Succeeds Act (ESSA) last December, in which the federal government handed back to states the authority to decide whether to evaluate teachers with or without students’ test scores, states have been dropping the value-added measure (VAM) or growth components (e.g., the Student Growth Percentiles (SGP) package) of their teacher evaluation systems, as formerly required by President Obama’s Race to the Top initiative. See my most recent post here, for example, about how legislators in Oklahoma recently removed VAMs from their state-level teacher evaluation system, while simultaneously increasing the state’s focus on the professional development of all teachers. Hawaii recently did the same.

Now it seems that Massachusetts is next, or is at least moving in this same direction.

As per a recent article in The Boston Globe (here), similar test-based teacher accountability efforts are facing increased opposition, primarily from school district superintendents and teachers throughout the state. At issue is whether all of this is simply “becoming a distraction,” whether the data can be impacted or “biased” by other statistically uncontrollable factors, and whether all teachers can be evaluated in similar ways, which is an issue of “fairness.” Also at issue is “reliability”: a 2014 study released by the Center for Educational Assessment at the University of Massachusetts Amherst, in which researchers examined student growth percentiles, found the “amount of random error was substantial.” Stephen Sireci, one of the study authors and a UMass professor, noted that, rather than relying upon such volatile results, “You might as well [just] flip a coin.”

Damian Betebenner, a senior associate at the National Center for the Improvement of Educational Assessment Inc. in Dover, N.H., who developed the SGP model in use in Massachusetts, added that “Unfortunately, the use of student percentiles has turned into a debate for scapegoating teachers for the ills.” Isn’t this the truth? Policymakers got hold of these statistical tools and then much too swiftly and carelessly singled out teachers for unmerited treatment and blame.

Regardless, stakeholders in Massachusetts recently lobbied the Senate to approve an amendment to the budget that would no longer require such test-based ratings in teachers’ professional evaluations, while also passing a policy statement urging the state to scrap these ratings entirely. “It remains unclear what the fate of the Senate amendment will be,” however. “The House has previously rejected a similar amendment, which means the issue would have to be resolved in a conference committee as the two sides reconcile their budget proposals in the coming weeks.”

Not surprisingly, Mitchell Chester, Massachusetts Commissioner for Elementary and Secondary Education, continues to defend the requirement. It seems that Chester, like others, is still holding tight to the default (yet still unsubstantiated) logic helping to advance these systems in the first place, arguing, “Some teachers are strong, others are not…If we are not looking at who is getting strong gains and those who are not we are missing an opportunity to upgrade teaching across the system.”

Oklahoma Eliminates VAM, and Simultaneously Increases Focus on Professional Development

Approximately two weeks ago, House leaders in the state of Oklahoma unanimously passed House Bill 2957, in which the state’s prior requirement to use value-added model (VAM) based estimates for teacher evaluation and accountability purposes, as written into the state’s prior Teacher and Leader Effectiveness (TLE) evaluation system, was eliminated. The new bill has been sent to Oklahoma’s Governor Fallin for her final signature.

As per the State’s Superintendent of Public Instruction, Joy Hofmeister: “Amid this difficult budget year when public education has faced a variety of challenges, House Bill 2957 is a true bright spot of this year’s legislative session…By giving districts the option of removing the quantitative portion of teacher evaluations, we not only increase local control but lift outcomes by supporting our teachers while strengthening their professional development and growth in the classroom.”

As per the press release issued by one of the bill’s sponsors, State Representative Michael Rogers, the bill is to “retain the qualitative measurements, which evaluate teachers based on classroom instruction and learning environment. The measure also creates a professional development component to be used as another qualitative tool in the evaluation process. The Department of Education will create the professional development component to be introduced during the 2018-2019 school year.” “Local school boards are in the best position to evaluate what tools their districts should be using to evaluate teachers and administrators,” he said. “This bill returns that to our local schools and removes the ‘one-size-fits-all’ approach dictated by government bureaucrats. This puts the focus back to the education of our students where it belongs.” School districts will, however, still have the option of continuing to use VAMs or other numerically based student growth measures when evaluating teachers, if they choose to do so and agree to pay the related expenses.

Oklahoma State Representative Scooter Park said that “HB2957 is a step in the right direction – driven by the support of Superintendents across the state, we can continue to remove the costly and time-consuming portions of the TLE system such as unnecessary data collection requirements as well as open the door for local school districts to develop their own qualitative evaluation system for their teachers according to their choice of a valid, reliable, research based and evidence-based qualitative measure.”

Oklahoma State Senator John Ford added that this bill was proposed, and this decision was made, “After gathering input from a variety of stakeholders through a lengthy and thoughtful review process.”

I am happy to say that I was a contributor during this review process, presenting twice to legislators, educators, and others at the Oklahoma State Capitol this past fall. See one picture of these presentations here.


See more here, and a related post on Diane Ravitch’s blog here. See more information about House Bill 2957 itself here. See also a post in the blog “Curmudgucation” about Hawaii recently passing similar legislation here. See another post about other states moving in similar directions here.

What ESSA Means for Teacher Evaluation and VAMs

In a prior post, written after President Obama signed the Every Student Succeeds Act (ESSA) into law in December, I wrote in some detail about what ESSA means for the U.S., and for states’ teacher evaluation systems given the federally mandated adoption and use of growth and value-added models (VAMs) across the nation.

Diane Ravitch recently covered, in her own words, what ESSA means for teacher evaluation systems as well, in what she called Part II of a nine-part series on all key sections of ESSA (see Parts I-IX here). I thought Part II was important to share with you all, especially given that it captures what followers of this blog are most interested in, although I do recommend that you all also see, in her Parts I-IX, what ESSA means for other areas of educational progress and reform, such as the Common Core, teacher education, and charter schools.

Here is what she captured in her Part II post, copied and pasted from her original post:

The stakes attached to testing: will teachers be evaluated by test scores, as Duncan demanded and as the American Statistical Association rejected? Will teachers be fired because of ratings based on test scores?

Short Answer:

The federal mandate on teacher evaluation linked to test scores, as created in the waivers, is eliminated in ESSA.

States are allowed to use federal funds to continue these programs, if they choose, or completely change their strategy, but they will no longer be required to include these policies as a condition of receiving federal funds. In fact, the Secretary is explicitly prohibited from mandating any aspect of a teacher evaluation system, or mandating a state conduct the evaluation altogether, in section 1111(e)(1)(B)(iii)(IX) and (X), section 2101(e), and section 8401(d)(3) of the new law.

Long Answer:

Chairman Alexander has been a long advocate of the concept, as he calls it, of “paying teachers more for teaching well.” As governor of Tennessee he created the first teacher evaluation system in the nation, and believes to this day that the “Holy Grail” of education reform is finding fair ways to pay teachers more for teaching well.

But he opposed the idea of creating or continuing a federal mandate and requiring states to follow a Washington-based model of how to establish these types of systems.

Teacher evaluation is complicated work and the last thing local school districts and states need is to send their evaluation system to Washington, D.C., to see if a bureaucrat in Washington thinks they got it right.

ESSA ends the waiver requirements on August 2016 so states or districts that choose to end their teacher evaluation system may. Otherwise, states can make changes to their teacher evaluation systems, or start over and start a new system. The decision is left to states and school districts to work out.

The law does continue a separate, competitive funding program, the Teacher and School Leader Incentive Fund, to allow states, school districts, or non-profits or for-profits in partnership with a state or school district to apply for competitive grants to implement teacher evaluation systems to see if the country can learn more about effective and fair ways of linking student performance to teacher performance.

Victory in New Mexico’s Lawsuit, Again

My most recent post about the state of New Mexico (here) included an explanation of a New Mexico Judge’s ruling to postpone New Mexico’s state-wide teacher evaluation trial until October 2016, with the December 2015 preliminary injunction against the state (described here) in place until (at least) then.

New Mexico’s Public Education Department (PED), however, also recently tried to appeal the Judge’s December 2015 injunction, taking it to New Mexico’s Court of Appeals for an emergency review of the Judge’s injunction order.

The state and its PED lost, again. Here is the court order, which essentially says that the appeal was denied, and pasted below is the press release issued by the American Federation of Teachers New Mexico and Albuquerque Teachers Federation (i.e., the plaintiffs in this case).

Here, too, is an article just released in the Santa Fe New Mexican about this ruling, including how the “Appeals court reject[ed the state’s] request to intervene in [this] teacher evaluation case.”

PRESS RELEASE, FOR IMMEDIATE RELEASE

Court Denies Request from Public Education Department; Keeps Case in District Court

March 16, 2016

Contact: John Dyrcz
505-554-8679

Albuquerque – American Federation of Teachers New Mexico (AFT NM) President Stephanie Ly and Albuquerque Teachers Federation (ATF) President Ellen Bernstein released the following statement:

“We are not surprised by today’s decision of the New Mexico Court of Appeals denying the New Mexico Public Education Department’s request for an interlocutory – or emergency – review of District Court Judge David Thomson’s injunction order. The December 2015 injunction preventing the PED from using its faulty evaluation system to penalize educators was well reasoned and the product of a fair and lengthy series of hearings over four months.

“We have maintained throughout this process that while the PED has every right to pursue all legal options under our judicial system, these frequent attempts at disrupting the progress of this case are nothing more than an attempt to stall the momentum of our efforts to seek relief for New Mexico’s education community.

“With this order, the case returns to Judge Thomson for final testimony from our expert witnesses, and we are pleased that the temporary injunction granted in December of 2015 will remain in place until at least October of 2016, when AFT NM and ATF will seek to make the injunction permanent,” said Ly and Bernstein.

VAMs: A Global Perspective by Tore Sørensen

Tore Bernt Sørensen is a PhD student at the University of Bristol in England, an emerging global educational policy scholar, and a future colleague whom I am to meet this summer during an internationally situated talk on VAMs. Just last week he released a paper, published by Education International (Belgium), in which he discusses VAMs and their use(s) globally. It is rare that I read about, or have the opportunity to write about, what is happening with VAMs worldwide; hence, I am taking this opportunity to share with you all some of the global highlights from his article. I have also attached his article to this post here for those of you who want to give the full document a thorough read (see also the article’s full reference below).

First, the US is “leading” the world in terms of its adoption of VAMs as an educational policy tool. I already knew this from my own prior attempts to explore what was happening in the universe of VAMs outside of the US, and, as per Sørensen, our nation’s ranking in this case still holds. In fact, “in the US the use of VAM as a policy instrument to evaluate schools and teachers has been taken exceptionally far [emphasis added] in the last 5 years, [while] most other high-income countries remain [relatively more] cautious towards the use of VAM;” this, “as reflected in OECD [Organisation for Economic Co-operation and Development] reports on the [VAM] policy instrument” (p. 1).

The second country making the most exceptional use of VAMs, so far, is England. England’s national school inspection system, run by the Office for Standards in Education, Children’s Services and Skills (OFSTED), for example, now has VAM as its central standard and accountability indicator.

These two nations are the most invested in VAMs, thus far, primarily because they have similar histories with the school effectiveness movement that emerged in the 1970s. In addition, both countries are also highly engaged in what Pasi Sahlberg, in his 2011 book Finnish Lessons, termed the Global Educational Reform Movement (GERM). GERM, in place since the 1980s, has “radically altered education sectors throughout the world with an agenda of evidence-based policy based on the [same] school effectiveness paradigm…[as it]…combines the centralised formulation of objectives and standards, and [the] monitoring of data, with the decentralisation to schools concerning decisions around how they seek to meet standards and maximise performance in their day-to-day running” (p. 5).

“The Chilean education system has [also recently] been subject to one of the more radical variants of GERM and there is [now] an interest [also there] in calculating VAM scores for teachers” (p. 6). In Denmark and Sweden, state authorities have begun to compare the predicted versus actual performance of schools, not teachers, while taking into consideration “the contextual factors of parents’ educational background, gender, and student origin” (i.e., “context value added”) (p. 7). In Uganda and Delhi, in “partnership” with ARK, an England-based international school development company, efforts are underway to gear up data systems so that VAM trials and analyses can be run to assess schools’ effects, likely with the aim of continuing to scale up and out.
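The “context value added” idea, comparing a school’s actual performance with the performance predicted from its context, can be sketched as a regression residual. This minimal Python example assumes a single contextual factor (a hypothetical share of parents with higher education) and invented school averages; the Danish and Swedish models Sørensen describes use several factors, and nothing here reproduces their actual specifications.

```python
from dataclasses import dataclass


@dataclass
class School:
    name: str
    parent_higher_ed_share: float  # hypothetical contextual factor, 0 to 1
    mean_score: float              # observed (invented) average test score


def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def context_value_added(schools):
    """Actual minus predicted score, where the prediction uses only the contextual factor."""
    intercept, slope = fit_line(
        [s.parent_higher_ed_share for s in schools],
        [s.mean_score for s in schools],
    )
    return {
        s.name: s.mean_score - (intercept + slope * s.parent_higher_ed_share)
        for s in schools
    }


schools = [
    School("A", 0.20, 480.0),
    School("B", 0.40, 500.0),
    School("C", 0.60, 530.0),
    School("D", 0.80, 540.0),
]
cva = context_value_added(schools)
for name, value in cva.items():
    print(f"School {name}: context value added = {value:+.1f}")
```

With these invented numbers, School D has the highest raw average yet a negative residual, while School C sits furthest above its context-based prediction. That reordering is exactly what such models are meant to produce, and it also shows how much the resulting judgment depends on which contextual factors the modeler chooses to include.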

The US-based World Bank is also backing such international moves, as is the UK-based Pearson testing corporation via its Learning Curve Project, which relies on input from some of the most prominent VAM advocates, including Eric Hanushek (see prior posts on Hanushek here and here) and Raj Chetty (see prior posts on Chetty here and here), to promote itself as a player in the universe of VAMs. This makes sense, “[c]onsidering Pearson’s aspirations to be a global education company… particularly in low-income countries” (p. 7). On that note, also as per Sørensen, “education systems in low-income countries might prove [most] vulnerable in the coming years as international donors and for-profit enterprises appear to be endorsing VAM as a means to raise school and teacher quality” in such educationally struggling nations (p. 2).

See also a related blog post about this piece here, written by Sørensen himself for the Education in Crisis blog, which is also sponsored by Education International. In that post he also discusses the use of data for political purposes, as is too often the case with VAMs when “the use of statistical tools as policy instruments is taken too far…towards bounded rationality in education policy.”

In short, “VAM, if it has any use at all, must expose the misleading use of statistical mumbo jumbo that effectively #VAMboozles [thanks for the shout out!!] teachers, schools and society. This could help to spark some much needed reflection on the basic propositions of school effectiveness, the negative effects of putting too much trust in numbers, and lead us to start holding policy-makers to account for their misuse of data in policy formation.”

Reference: Sørensen, T. B. (2016). Value-added measurement or modelling (VAM). Brussels, Belgium: Education International. Retrieved from http://download.ei-ie.org/Docs/WebDepot/2016_EI_VAM_EN_final_Web.pdf

Tennessee’s Trout/Taylor Value-Added Lawsuit Dismissed

As you may recall, one of 15 important lawsuits pertaining to teacher value-added estimates across the nation (Florida n=2, Louisiana n=1, Nevada n=1, New Mexico n=4, New York n=3, Tennessee n=3, and Texas n=1 – see more information here) was situated in Knox County, Tennessee.

Filed in February of 2015, with legal support provided by the Tennessee Education Association (TEA), the lawsuit was brought by Knox County teachers Lisa Trout and Mark Taylor, who charged that they were denied monetary bonuses after their teacher-level value-added scores from the Tennessee Value-Added Assessment System (TVAAS; the original Education Value-Added Assessment System, or EVAAS) were miscalculated. The lawsuit also contested the reasonableness, rationality, and arbitrariness of the TVAAS system, as per its intended and actual uses, both in this case and in Tennessee writ large. In this case, Jesse Rothstein (University of California, Berkeley) and I served as the Plaintiffs’ expert witnesses.

Unfortunately, last week (February 17, 2016) the Plaintiffs’ team received a Court order written by U.S. District Judge Harry S. Mattice Jr. dismissing their claims. While the Court had substantial questions about the reliability and validity of the TVAAS, it determined that the State satisfied the very low threshold of the “rational basis test,” the legal standard at issue. I should note here, however, that none of the evidence the Plaintiffs’ lawyers collected via their “extensive discovery,” including the affidavits Jesse and I submitted on the Plaintiffs’ behalf, was considered in Judge Mattice’s ruling on the motion to dismiss. This perhaps makes sense given some of the Court’s assertions, discussed below.

Ultimately, the Court found that the TVAAS-based, teacher-level value-added policy at issue was “rationally related to a legitimate government interest.” In the Court order itself, Judge Mattice wrote: “While the court expresses no opinion as to whether the Tennessee Legislature has enacted sound public policy, it finds that the use of TVAAS as a means to measure teacher efficacy survives minimal constitutional scrutiny. If this policy proves to be unworkable in practice, plaintiffs are not to be vindicated by judicial intervention but rather by democratic process.”

That said, as per an article in the Knoxville News Sentinel, Judge Mattice was “not unsympathetic to the teachers’ claims,” for example, given that the TVAAS measures “student growth — not teacher performance — using an algorithm that is not fail proof.” He nonetheless noted in the Court order that the “TVAAS algorithms have been validated for their accuracy in measuring a teacher’s effect on student growth,” even if minimal. He also wrote that the test scores used in the TVAAS (and other models) “need not be validated for measuring teacher effectiveness merely because they are used as an input in a validated statistical model that measures teacher effectiveness.” This is, unfortunately, untrue. Nonetheless, he went on to write that even though the rational basis test “might be a blunt tool, a rational policymaker could conclude that TVAAS is ‘capable of measuring some marginal impact that teachers can have on their own students…[and t]his is all the Constitution requires.’”

In the end, Judge Mattice concluded in the Court order: “It bears repeating that Plaintiff’s concerns about the statistical imprecision of TVAAS are not unfounded. In addressing Plaintiffs’ constitutional claims, however, the Court’s role is extremely limited. The judiciary is not empowered to second-guess the wisdom of the Tennessee legislature’s approach to solving the problems facing public education, but rather must determine whether the policy at issue is rationally related to a legitimate government interest.”

It is too early to know whether the Plaintiffs’ team will appeal. Judge Mattice dismissed the federal constitutional claims within the lawsuit “with prejudice,” which, as per an article in the Knoxville News Sentinel, means that “it cannot be resurrected with new facts or legal claims or in another court. His decision can be appealed, though, to the 6th Circuit U.S. Court of Appeals.”