Florida’s Released Albeit “Flawed” VAM Data

The Florida Times-Union’s lead op-ed letter on Monday was about why the value-added data recently released by the Florida Department of Education has, at best, made it “clearer than ever that the data is meaningless.” This is made all the more unfortunate by the nearly four years and millions of dollars (including human resource dollars) the state has spent perfecting its VAM and the estimates it advances as accurate.

In the letter, Andy Ford, the current president of the Florida Education Association (yes, the union), writes, in sum and among other key points:

  • “The lists released by the DOE are a confusing mess.”
  • “Throughout the state, teachers who have been honored as among the best in their districts received low VAM marks.”
  • “Band teachers, physical education teachers and guidance counselors received VAM ratings despite not teaching subjects that are tested.”
  • “Teachers who worked at their schools for a few weeks received VAM scores as did teachers who retired three years ago.”
  • “A given teacher may appear to have differential effectiveness from class to class, from year to year and from test to test. Ratings are most unstable at the upper and lower ends where the ratings are most likely to be used to determine high or low levels of effectiveness…Most researchers agree that VAM is not appropriate as a primary measure for evaluating individual teachers. Reviews of research on value-added methods have concluded that they are too unstable and too vulnerable to many sources of error to be used for teacher evaluation.”

“Once again the state of Florida has proven that it puts test scores above everything else in public education. And once again it provided false data that misleads more than informs…When will our political leaders and the DOE stop chasing these flawed data models and begin listening to the teachers, education staff professionals, administrators and parents of Florida?”

The union “fully supports teacher accountability. But assessments of teachers, like assessments of students, must be valid, transparent and multi-faceted. These value-added model calculations are none of these.”

VAMboozled! in Indianapolis

Last week I was in Indianapolis for the annual conference of the American Association of Colleges for Teacher Education (AACTE), during which Diane Ravitch gave the keynote speech for the conference’s Welcoming Session. While the video of her live talk is not yet posted online, she talked mainly about the many hoaxes surrounding America’s public education system, all of which were framed by her main question, revealed when she pulled back her jacket to show a t-shirt on which was written: “Where’s the evidence?”

The key part of her speech, most relevant here and captured by another author at Education News, went as follows:

In a far-reaching speech in which [Diane Ravitch] lambasted a variety of players on the educational landscape — from self-styled education “reformers” to Teach for America to U.S. Secretary of Education Arne Duncan — Ravitch said American schoolchildren are being shortchanged by an inordinate emphasis on testing. She took particular aim at teacher evaluation models known as value-added models, or VAMs, that seek to measure teacher effectiveness by the test scores of the students they teach. “For nearly five years, states and districts have been trying to evaluate teachers by test scores, and it hasn’t worked anywhere,” Ravitch said. “It makes testing too important, promotes teaching to the test, and gaming of the system. We all know this. But the policymakers don’t. Where’s the evidence? It’s a hoax.”

She also gave a “shout-out” to VAMboozled!, encouraging audience members to follow this blog and what we, yes we, are doing here, as I cannot do this without the support of those closest to me, as well as those of you from across the country who continue to write in with your stories.

On that note, one of my former doctoral students, Taryl Hansen, who was sitting in the audience when Diane gave the blog this scholarly praise, was also inspired by Diane’s words. An artist, she drafted the image below and sent it to me via email, after which I asked for her permission to share it with the rest of you as a wonderful visual of many of “the issues” discussed in Diane’s speech and also on this blog.

Note the VAMpire references in the picture? The VAM/empire, in other words. This, too, will surely be an article or post in the future.

[Image: VAMboozled, drawing by Taryl Hansen]

Research Study: Missing Data and VAM-Based Bias

A new assistant professor here at ASU, from outside the College of Education and instead in the College of Mathematical and Natural Sciences, also specializes in value-added modeling (and statistics). Her name is Jennifer Broatch; she is a rising star in this area of research, and she just sent me an article I had missed, just read, and certainly found worth sharing with you all.

The peer-reviewed article, published in Statistics and Public Policy this past November, is fully cited and linked below so that you all can read it in full. But as a CliffsNotes version, the researchers evidenced the following two key findings:

First, researchers found that “VAMs that include shorter test score histories perform fairly well compared to those with longer score histories.” The current thinking is that we need at least two, if not three, years of data to yield reliable estimates, or estimates that are consistent over time (which they should be). These authors argue, however, that when three years of scores are required, so much student data goes missing that the target is not worth shooting for; rather, this is an issue of trade-offs, as illustrated in the sketch below. This is certainly something to consider, as long as we continue to understand that all of this is about “tinkering towards a utopia” (Tyack & Cuban, 1997) that I’m not at all certain exists in terms of VAMs and VAM-based accuracy.
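To make the trade-off concrete, here is a minimal simulation sketch. Every parameter (class size, per-year missingness rate, noise level) is an invented assumption, and it models only the missing-data cost of requiring longer score histories via listwise deletion, not any offsetting gains in precision:

```python
# A toy simulation of the missing-data side of the trade-off: requiring a
# longer test-score history drops more students, so each teacher's estimate
# rests on fewer observations. All parameter values are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def rmse_of_estimates(years_required, n_teachers=200, n_students=25,
                      p_missing_per_year=0.15, noise_sd=1.0, n_reps=200):
    """RMSE of naive teacher-effect estimates when students lacking a
    complete `years_required`-year score history are listwise deleted."""
    errors = []
    for _ in range(n_reps):
        true_effects = rng.normal(0, 0.25, n_teachers)
        for t in range(n_teachers):
            # Chance a student has every required prior-year score.
            p_complete = (1 - p_missing_per_year) ** years_required
            n_kept = int((rng.random(n_students) < p_complete).sum())
            if n_kept == 0:
                continue  # no usable students; no estimate this round
            # Observed gains = true teacher effect + student-level noise.
            gains = true_effects[t] + rng.normal(0, noise_sd, n_kept)
            errors.append(gains.mean() - true_effects[t])
    return float(np.sqrt(np.mean(np.square(errors))))

for k in (1, 2, 3):
    print(f"{k} year(s) of history required -> RMSE {rmse_of_estimates(k):.3f}")
```

Under these made-up settings, the error grows with each extra year of required history because the per-teacher sample shrinks; a fuller treatment would also have to model the benefit side (better adjustment for prior achievement) that longer histories can buy.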

Second, researchers found that, “the decision about whether to control for student covariates [or background/demographic variables] and schooling environments, and how to control for this information, influences [emphasis added] which types of schools and teachers are identified as top and bottom performers. Models that are less aggressive in controlling for student characteristics and schooling environments systematically identify schools and teachers that serve more advantaged students as providing the most value-added, and correspondingly, schools and teachers that serve more disadvantaged students as providing the least.”

This certainly adds evidence to the research on VAM-based bias. There are many researchers who still claim that controlling for student background variables is unnecessary when using VAMs, and, if anything, bad practice, because controlling for such demographics can cause perverse effects (e.g., teachers focusing relatively less on students who are given such statistical accommodations or boosts). This study, however, adds more evidence that “to not control” for such demographics does indeed yield biased estimates. The authors do not disclose how much bias is “left over” after the controls are used; hence, this remains a very serious point of contention. Whether the controls, even if used, function appropriately is still something to be taken in earnest, particularly when consequential decisions are to be tied to VAM-based output (see also “The Random Assignment of Students into Elementary Classrooms: Implications for Value-Added Analyses and Interpretations”).
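For intuition on this specification sensitivity, consider a minimal sketch with entirely fabricated data and a deliberately crude covariate adjustment; it shows how simply including or excluding one student background variable can reshuffle which teachers land in the top group:

```python
# Fabricated example: disadvantaged students (x = 1) are unevenly distributed
# across teachers and show smaller gains for reasons unrelated to teaching.
# We compare teacher rankings with and without controlling for x.
import numpy as np

rng = np.random.default_rng(1)
n_teachers, n_students = 50, 30

true_effect = rng.normal(0, 1, n_teachers)      # each teacher's real impact
share_disadv = rng.uniform(0, 1, n_teachers)    # composition varies by class

gain_list, x_list, tid_list = [], [], []
for t in range(n_teachers):
    x = (rng.random(n_students) < share_disadv[t]).astype(float)
    gain = true_effect[t] - 2.0 * x + rng.normal(0, 1, n_students)
    gain_list.append(gain)
    x_list.append(x)
    tid_list.append(np.full(n_students, t))

gain = np.concatenate(gain_list)
x = np.concatenate(x_list)
tid = np.concatenate(tid_list)

# "Less aggressive" specification: teacher score = raw mean gain, no controls.
raw = np.array([gain[tid == j].mean() for j in range(n_teachers)])

# Specification with the covariate: regress out x (crude pooled slope),
# then average the residualized gains by teacher.
slope = np.polyfit(x, gain, 1)[0]
adj = np.array([(gain[tid == j] - slope * x[tid == j]).mean()
                for j in range(n_teachers)])

top_raw = set(np.argsort(raw)[-10:])
top_adj = set(np.argsort(adj)[-10:])
print("teachers ranked top-10 under both specifications:", len(top_raw & top_adj))
```

In runs of this toy setup, the uncontrolled specification tends to crowd its top group with teachers serving fewer disadvantaged students, echoing the systematic pattern the authors report.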

Citation: Ehlert, M., Koedel, C., Parsons, E., & Podgursky, M. (2013, November). The sensitivity of value-added estimates to specification adjustments: Evidence from school- and teacher-level models in Missouri. Statistics and Public Policy, 1(1), 19-27. doi: 10.1080/2330443X.2013.856152

The Houston Chronicle, also on the TVAAS in Tennessee

The Houston Chronicle also recently published an article on the Tennessee Value-Added Assessment System (TVAAS) and what is still unfolding in Tennessee. This is certainly interesting and worth a read, particularly because the Houston Independent School District is using essentially the same value-added model (i.e., the EVAAS) critiqued in this piece about Tennessee.

From the top of the article: “When Tennessee was competing for a half-billion dollars in federal education money, teachers agreed to allow the state to ramp up its use of student test scores for evaluating educators. But since winning the $500 million Race to the Top competition in 2010, teachers say the state has gone too far in using student test scores to assess their performance. They are now calling for legislation to place a moratorium on the use of so-called TVA[A]S scores until a special committee can review them.”

Maybe what is being argued and debated in Tennessee will have some carry-over effects in Houston as well. We shall see.

Also worth pointing out, though, is another trend. As explained by a teacher in the article: “She said she’s actually benefited from changes to the teacher evaluation system, such as more constructive feedback because of the increased number of observations.” Almost always, when a counterpoint is needed for an article such as this, a teacher says he or she sees “value” in the system; but almost if not every time, it is because of the increased professional observations of teacher practice, not because of the value-added component or the value-added data derived from it. The “formative” or “informative” aspects of these systems have yet to be realized.

Please Refrain from “Think[ing] of VAMs Like an Oak Tree”

It happened again. In the Tampa Bay Times, a journalist encouraged his readers to, as per the title of his article, “Think of VAMs Like an Oak Tree,” as folks in Florida begin to interpret and consume Florida teachers’ “value-added” data. It even seems that folks there are “pass[ing] around” the University of Wisconsin’s “Oak Tree Analogy” to help others understand what is, unfortunately, a very over-simplistic and over-optimistic version of the very complex realities surrounding VAMs.

He, and others, obviously missed the memo.

So, I am redirecting current and future readers to Stanford Professor Edward Haertel’s deconstruction of the “Oak Tree Analogy,” so that we all might better spread the word about this faulty analogy.

I have also re-pasted Professor Haertel’s critique below:

The Value-Added Research Center’s ‘Oak Tree’ analogy is helpful in conveying the theory [emphasis added] behind value-added models. To compare the two gardeners, we adjust away various influences that are out of the gardeners’ control, and then, as with value added, we just assume that whatever is left over must have been due to the gardener.  But, we can draw some important lessons from this analogy in addition to those highlighted in the presentation.

In the illustration, the overall effect of rainfall was an 8-inch difference in annual growth (+3 inches for one gardener’s location; -5 for the other). Effects of soil and temperature, in one direction or the other, were 5 inches and 13 inches. But the estimated effect of the gardeners themselves was only a 4-inch difference. 

As with teaching, the value-added model must sort out a small “signal” from a much larger amount of “noise” in estimating the effects of interest. It follows that the answer obtained may depend critically on just what influences are adjusted for. Why adjust for soil condition? Couldn’t a skillful gardener aerate the soil or amend it with fertilizer? If we adjust only for rainfall and temperature then Gardener B wins. If we add in the soil adjustment, then Gardener A wins. Teasing apart precisely those factors for which teachers justifiably should be held accountable versus those beyond their control may be well-nigh impossible, and if some adjustments are left out, the results will change. 

Another message comes from the focus on oak tree height as the outcome variable.  The savvy gardener might improve the height measure by removing lower limbs to force growth in just one direction, just as the savvy teacher might improve standardized test scores by focusing instruction narrowly on tested content. If there are stakes attached to these gardener comparisons, the oak trees may suffer.

The oak tree height analogy also highlights another point. Think about the problem of measuring the exact height of a tree—not a little sketch on a PowerPoint slide, but a real tree. How confidently could you say how tall it was to the nearest inch?  Where, exactly, would you put your tape measure? Would you measure to the topmost branch, the topmost twig, or the topmost leaf? On a sunny day, or at a time when the leaves and branches were heavy with rain?

The oak tree analogy does not discuss measurement error. But one of the most profound limitations of value-added models, when used for individual decision making, is their degree of error, referred to technically as low reliability. Simply put, if we compare the same two gardeners again next year, it’s anyone’s guess which of the two will come out ahead.
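To make the arithmetic behind Haertel’s adjustment point concrete, here is a toy calculation. The raw growth figures are invented; only the per-factor effect sizes echo the ones he quotes (a +3/-5 rainfall split, and 5- and 13-inch soil and temperature differences), so treat this as an illustration of the sensitivity, not a reconstruction of the original slides:

```python
# Invented numbers showing Haertel's point: the "winning" gardener flips
# depending on which environmental factors the model adjusts for.
raw_growth = {"A": 20, "B": 42}  # hypothetical observed annual growth, inches

env_effects = {  # assumed effect of each factor on growth, in inches
    "A": {"rain": -5, "temp": -7, "soil": -4},
    "B": {"rain": +3, "temp": +6, "soil": +1},
}

def value_added(gardener, controls):
    """Growth credited to the gardener after removing the chosen factors."""
    return raw_growth[gardener] - sum(env_effects[gardener][f] for f in controls)

for controls in (("rain", "temp"), ("rain", "temp", "soil")):
    a, b = value_added("A", controls), value_added("B", controls)
    print(f"adjusting for {controls}: A = {a}, B = {b} -> Gardener "
          f"{'A' if a > b else 'B'} wins")
```

With only rainfall and temperature removed, Gardener B comes out ahead (33 vs. 32); add the soil adjustment, and Gardener A wins (36 vs. 32), a swing that mirrors the small 4-inch “gardener effect” riding on much larger environmental adjustments.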

A Tennessee Teacher, On the TVAAS and Other Issues of Concern

Check out this 5-minute video to hear from a teacher in Tennessee, the state recognized for bringing value-added models and VAM-based teacher accountability to the country, as she explains how things are going in her state of Tennessee.

Diane Ravitch, in her call for all of us to share this and other videos/stories such as these, writes that we should help this video, now with over 100,000 views, reach every parent and teacher across the country. “We can be the change,” and social media can help us counter the nonsense this teacher calls out so well herein.