One Score and Seven Policy Iterations Ago…

I just read what might be one of the best articles I’ve read in a long time on using test scores to measure teacher effectiveness, and why this is such a bad idea. Not surprisingly, and unfortunately, this article was written nearly 30 years ago (i.e., in 1986) by Edward Haertel, National Academy of Education member and recently retired Professor at Stanford University. If the name sounds familiar, it should, as Professor Emeritus Haertel is one of the best on the topic of, and the history behind, VAMs (see prior posts about his related scholarship here, here, and here). To access the full article, please scroll to the reference at the bottom of this post.

Haertel wrote this article at a time when policymakers were, as they still are now, trying to hold teachers accountable for their students’ learning as measured by states’ standardized tests. Although this article deals with minimum competency tests, which were in policy fashion at the time, about seven policy iterations ago, its contents still have much relevance given where we are today: investing in “new and improved” Common Core tests and still riding on unsinkable beliefs that this is the way to reform the schools that have been in despair, and (still) in need of major repair, for 20+ years.

Here are some of the points I found of most “value”:

  • On isolating teacher effects: “Inferring teacher competence from test scores requires the isolation of teaching effects from other major influences on student test performance,” while “the task is to support an interpretation of student test performance as reflecting teacher competence by providing evidence against plausible rival hypotheses or interpretation.” Meanwhile, “student achievement depends on multiple factors, many of which are out of the teacher’s control,” and many of which cannot, and likely never will, be “controlled.” In terms of home supports, “students enjoy varying levels of out-of-school support for learning. Not only may parental support and expectations influence student motivation and effort, but some parents may share directly in the task of instruction itself, reading with children, for example, or assisting them with homework.” In terms of school supports, “[s]choolwide learning climate refers to the host of factors that make a school more than a collection of self-contained classrooms. Where the principal is a strong instructional leader; where schoolwide policies on attendance, drug use, and discipline are consistently enforced; where the dominant peer culture is achievement-oriented; and where the school is actively supported by parents and the community.” All of this makes isolating the teacher effect nearly, if not wholly, impossible.
  • On the difficulties with defining the teacher effect: “Does it include homework? Does it include self-directed study initiated by the student? How about tutoring by a parent or an older sister or brother? For present purposes, instruction logically refers to whatever the teacher being evaluated is responsible for, but there are degrees of responsibility, and it is often shared. If a teacher informs parents of a student’s learning difficulties and they arrange for private tutoring, is the teacher responsible for the student’s improvement? Suppose the teacher merely gives the student low marks, the student informs her parents, and they arrange for a tutor? Should teachers be credited with inspiring a student’s independent study of school subjects? There is no time to dwell on these difficulties; others lie ahead. Recognizing that some ambiguity remains, it may suffice to define instruction as any learning activity directed by the teacher, including homework….The question also must be confronted of what knowledge counts as achievement. The math teacher who digresses into lectures on beekeeping may be effective in communicating information, but for purposes of teacher evaluation the learning outcomes will not match those of a colleague who sticks to quadratic equations.” Much, if not all, of this cannot, and likely never will, be “controlled” or “factored” in or out, either.
  • On standardized tests: The best of standardized tests will (likely) always be too imperfect and not up to the teacher evaluation task, no matter the extent to which they are pitched as “new and improved.” While it might appear that these “problem[s] could be solved with better tests,” they cannot. Ultimately, all that these tests provide is “a sample of student performance. The inference that this performance reflects educational achievement [not to mention teacher effectiveness] is probabilistic [emphasis added], and is only justified under certain conditions.” Likewise, these tests “measure only a subset of important learning objectives, and if teachers are rated on their students’ attainment of just those outcomes, instruction of unmeasured objectives [is also] slighted.” As was true then and is still true today, “it has become a commonplace that standardized student achievement tests are ill-suited for teacher evaluation.”
  • On the multiple-choice formats of such tests: “[A] multiple-choice item remains a recognition task, in which the problem is to find the best of a small number of predetermined alternatives and the criteria for comparing the alternatives are well defined. The nonacademic situations where school learning is ultimately applied rarely present problems in this neat, closed form. Discovery and definition of the problem itself and production of a variety of solutions are called for, not selection among a set of fixed alternatives.”
  • On students and the scores they are to contribute to the teacher evaluation formula: “Students varying in their readiness to profit from instruction are said to differ in aptitude. Not only general cognitive abilities, but relevant prior instruction, motivation, and specific interactions of these and other learner characteristics with features of the curriculum and instruction will affect academic growth.” In other words, one cannot simply assume all students will learn or grow at the same rate with the same teacher. Rather, they will learn at different rates given their aptitudes, their “readiness to profit from instruction,” the teachers’ instruction, and sometimes despite the teachers’ instruction or what the teacher teaches.
  • And on the formative nature of such tests, as it was then: “Teachers rarely consult standardized test results except, perhaps, for initial grouping or placement of students, and they believe that the tests are of more value to school or district administrators than to themselves.”

Sound familiar?

Reference: Haertel, E. (1986). The valid use of student performance measures for teacher evaluation. Educational Evaluation and Policy Analysis, 8(1), 45-60.

Help Florida Teacher Luke Flint “Tell His Story” about His VAM Scores

This is a great (although unfortunate) YouTube video capturing Indian River County, Florida, teacher Luke Flint’s “Story” about the VAM scores he just received from the state, as based on the state’s value-added formula.

This is a must-watch, and a must-share, as his “Story” has the potential to “add value” in the best of ways; that is, in terms of further informing debates about how these VAMs actually “work” in practice.

American Statistical Association (ASA) Position Statement on VAMs

Included in my most recent post, about the Top 14 research-based articles about VAMs, was a great research-based statement released just last week by the American Statistical Association (ASA), titled the “ASA Statement on Using Value-Added Models for Educational Assessment.”

It is short, accessible, easy to understand, and hard to dispute, so I wanted to be sure nobody missed it, as this is certainly a must-read for all of you following this blog, not to mention everybody else dealing/working with VAMs and their related educational policies. Likewise, this represents the current, research-based evidence and thinking of probably 90% of the educational researchers and econometricians (still) conducting research in this area.

Again, the ASA is the best statistical organization in the U.S. and likely one of, if not the, best statistical associations in the world. Some of the most important parts of their statement, taken directly from the full statement, follow, as I see them:

  1. VAMs are complex statistical models, and high-level statistical expertise is needed to develop the models and [emphasis added] interpret their results.
  2. Estimates from VAMs should always be accompanied by measures of precision and a discussion of the assumptions and possible limitations of the model. These limitations are particularly relevant if VAMs are used for high-stakes purposes.
  3. VAMs are generally based on standardized test scores, and do not directly measure potential teacher contributions toward other student outcomes.
  4. VAMs typically measure correlation, not causation: Effects – positive or negative – attributed to a teacher may actually be caused by other factors that are not captured in the model.
  5. Under some conditions, VAM scores and rankings can change substantially when a different model or test is used, and a thorough analysis should be undertaken to evaluate the sensitivity of estimates to different models.
  6. VAMs should be viewed within the context of quality improvement, which distinguishes aspects of quality that can be attributed to the system from those that can be attributed to individual teachers, teacher preparation programs, or schools.
  7. Most VAM studies find that teachers account for about 1% to 14% of the variability in test scores, and that the majority of opportunities for quality improvement are found in the system-level conditions. Ranking teachers by their VAM scores can have unintended consequences that reduce quality. [A toy simulation illustrating this variance point, and point 2’s call for measures of precision, follows this list.]
  8. Attaching too much importance to a single item of quantitative information is counter-productive—in fact, it can be detrimental to the goal of improving quality.
  9. When used appropriately, VAMs may provide quantitative information that is relevant for improving education processes…[but only if used for descriptive/description purposes]. Otherwise, using VAM scores to improve education requires that they provide meaningful information about a teacher’s ability to promote student learning…[and they just do not do this at this point, as there is no research evidence to support this ideal].
  10. A decision to use VAMs for teacher evaluations might change the way the tests are viewed and lead to changes in the school environment. For example, more classroom time might be spent on test preparation and on specific content from the test at the exclusion of content that may lead to better long-term learning gains or motivation for students. Certain schools may be hard to staff if there is a perception that it is harder for teachers to achieve good VAM scores when working in them. Overreliance on VAM scores may foster a competitive environment, discouraging collaboration and efforts to improve the educational system as a whole.
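To make points 2 and 7 above concrete, here is a minimal toy simulation. All of the numbers (class size, noise levels) are illustrative assumptions of mine, not the ASA’s and not any state’s actual model; it simply generates test-score gains in which true teacher effects account for about 10% of the variance, squarely within the ASA’s 1% to 14% range, and shows how noisy the resulting per-teacher estimates and rankings are.

```python
# Toy simulation (illustrative assumptions only; not any state's actual VAM).
# Score gains = a small true "teacher effect" + much larger student-level
# noise, mirroring the ASA's point that teachers account for only about
# 1%-14% of the variability in test scores.
import numpy as np

rng = np.random.default_rng(0)

n_teachers = 500
class_size = 25        # assumed students per teacher
teacher_sd = 1.0       # spread of true teacher effects (assumed)
student_sd = 3.0       # student-level noise, much larger (assumed)

# Teacher share of gain variance: 1 / (1 + 9) = 10%, inside the 1%-14% range.
share = teacher_sd**2 / (teacher_sd**2 + student_sd**2)
print(f"Teacher share of variance: {share:.0%}")

# True (unobservable) effects and observed classroom gains, one row per teacher.
true_effects = rng.normal(0.0, teacher_sd, n_teachers)
gains = true_effects[:, None] + rng.normal(0.0, student_sd, (n_teachers, class_size))

# A naive "VAM estimate" is each teacher's mean classroom gain.
estimates = gains.mean(axis=1)

# How well do the estimated rankings recover the true ones? (Spearman rank
# correlation, computed with numpy alone.) Well below 1.0, so many teachers
# are misordered purely by chance.
def ranks(x):
    return np.argsort(np.argsort(x))

rho = np.corrcoef(ranks(true_effects), ranks(estimates))[0, 1]
print(f"Rank correlation, true vs. estimated: {rho:.2f}")
```

Even in this best case, where the toy model is correctly specified and students are effectively randomly assigned, the estimated rankings misorder many teachers; real data, with nonrandom assignment and omitted factors, can only do worse.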

Also important to point out is that, in the report, the ASA makes recommendations regarding the “key questions states and districts [yes, practitioners!] should address regarding the use of any type of VAM.” These include, but are not limited to, questions about reliability (consistency), validity, the tests on which VAM estimates are based, and the major statistical errors that always accompany VAM estimates but are often buried and not reported with results (i.e., in terms of confidence intervals or standard errors). A brief sketch of what reporting such intervals might look like follows.
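Continuing the same toy numbers from the simulation above (again, every figure is an assumed illustration, not any state’s reported output), here is what attaching those measures of precision could look like:

```python
# Attach the measures of precision that, per the ASA, should always
# accompany VAM estimates (toy numbers continued from the sketch above).
import numpy as np

rng = np.random.default_rng(1)
n_teachers, class_size = 500, 25
teacher_sd, student_sd = 1.0, 3.0

# Each estimate = true effect + sampling noise of the classroom mean.
std_error = student_sd / np.sqrt(class_size)   # 0.6 per teacher
estimates = (rng.normal(0.0, teacher_sd, n_teachers)
             + rng.normal(0.0, std_error, n_teachers))

# 95% confidence interval: estimate +/- 1.96 * standard error.
half_width = 1.96 * std_error
lower, upper = estimates - half_width, estimates + half_width

# Share of teachers statistically distinguishable from "average" (zero),
# i.e., whose interval excludes zero: a minority, in this toy setup.
distinguishable = np.mean((lower > 0) | (upper < 0))
print(f"95% CI half-width: +/-{half_width:.2f}")
print(f"Teachers distinguishable from average: {distinguishable:.0%}")
```

With intervals this wide relative to the spread of true effects, most teachers cannot be statistically distinguished from average, which is precisely why publishing point estimates or rankings without their standard errors misleads.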

Also important is the purpose of the ASA’s statement, as written by them: “As the largest organization in the United States representing statisticians and related professionals, the American Statistical Association (ASA) is making this statement to provide guidance, given current knowledge and experience, as to what can and cannot reasonably be expected from the use of VAMs. This statement focuses on the use of VAMs for assessing teachers’ performance but the issues discussed here also apply to their use for school or principal accountability. The statement is not intended to be prescriptive. Rather, it is intended to enhance general understanding of the strengths and limitations of the results generated by VAMs and thereby encourage the informed use of these results.”

Do give the position statement a read and use it as needed!

Another Lawsuit in Tennessee

As per Diane Ravitch’s blog, “The Tennessee Education Association filed a second lawsuit against the use of value-added assessment (called TVAAS in Tennessee), this time including extremist Governor Haslam and ex-TFA state commissioner Huffman in their suit.”

As per a more detailed post about this lawsuit, “The state’s largest association for teachers filed a second lawsuit on behalf of a Knox County teacher, calling the use of the Tennessee Value-Added Assessment System (TVAAS), which uses students’ growth on state assessments to evaluate teachers, unconstitutional.

Farragut Middle School eighth-grade science teacher Mark Taylor believes he was unfairly denied a bonus after his value-added estimate was based on the standardized test scores of 22 of his 142 students [roughly 15 percent]. “Mr. Taylor teaches four upper-level physical science courses and one regular eighth grade science class,” said Richard Colbert, TEA general counsel, in a press release. “The students in the upper-level course take a locally developed end-of-course test in place of the state’s TCAP assessment. As a result, those high-performing students were not included in Mr. Taylor’s TVAAS estimate.”

Taylor received ‘exceeding expectations’ classroom observation scores, but a low value-added estimate reduced his final evaluation score below the requirement to receive the bonus.

The lawsuit includes six counts against the governor, commissioner and local school board.

TEA’s general counsel argues the state has violated Taylor’s 14th Amendment right to equal protection from “irrational state-imposed classifications” by using a small fraction of his students to determine his overall effectiveness.

TEA filed its first TVAAS lawsuit last month on behalf of Knox County teacher Lisa Trout, who was denied the district’s bonus. The lawsuit also cites the arbitrariness of TVAAS estimates that use test results of only a small segment of a teacher’s students to estimate her overall effectiveness.

TEA says it expects additional lawsuits to be filed so long as the state continues to tie more high-stakes decisions to TVAAS estimates.”

A VAM Shame, again, from Florida

Another teacher from Florida wrote a blog post for Diane Ravitch; I just came across it and am re-posting it here. Be sure to give it a good read, as you will see what is happening in her state right now and why it is a VAM shame!

She writes:

I conducted a very unscientific study and concluded that I might possibly have the worst VAM score at my school. Today I conducted a slightly more scientific analysis and now I can confidently proclaim myself to be the worst teacher at my school, the 14th worst teacher in Dade County, and the 146th worst (out of 120,000) in the state of Florida! There were 4,800 pages of teachers ranked highest to lowest on the Florida Times Union website and my VAM was on page 4,795. Gosh damn! That’s a bad VAM! I always feared I might end up at the low end of the spectrum due to the fact that I teach gifted students who score high already and have no room to grow, but 146th out of 120,000?!?! That’s not “needs improvement.” That’s “you really stink and should immediately have your teaching license revoked before you do any more harm to innocent children” bad. That’s “your odds are so bad you better hope you don’t get eaten by a shark or struck by lightning” bad. This is the reason I don’t play the lotto or gamble in Vegas. And to think some other Florida teacher had the nerve to write a blog post declaring herself to be one of the worst teachers in the state and her VAM was only -3%! Negative 3 percent is the best you got, honey? I’ll meet your negative 3 percent and raise you another negative 146 percentage points! (Actually I enjoyed her blog post [see also our coverage of this teacher’s story here] and I hope more teachers come out of their VAM closets soon).

Speaking of coming out of the VAM closet, I managed to hunt down the emails of about ten other bottom dwellers as posted by the Florida Times Union. I was attempting to conduct a minor survey of what types of teachers end up getting slammed by VAM. Did they have anything in common? What types of students did they teach? As of this moment, none of them have returned my emails. I really wanted to get in touch with “The Worst Teacher in the State of Florida” according to VAM. After a little cyber stalking, it turns out she’s my teaching twin. She also teaches ninth grade world history to gifted students in a pre-IB program. The runner-up for “Worst Teacher in the State of Florida” teaches at an arts magnet school. Are we really to believe that teachers selected to teach in an IB program or magnet school are the very worst the state of Florida has to offer? Let me tell you a little something about teaching gifted students. They are the first kids to nark out a bad teacher because they don’t think anyone is good enough to teach them. First they’ll let you know to your face that they’re smarter than you and you stink at teaching. Then they’ll tell their parents and the gifted guidance counselor who will nark you out to the principal. If you suck as a gifted teacher, you won’t last long.

I don’t want to ignore the poor teachers who get slammed by VAM on the opposite end of the spectrum either. Although there appeared to be many teachers of high achievers who scored poorly under VAM, there also seemed to be an abundance of special education teachers. These poor educators are often teaching children with horrible disabilities who will never show any learning gains on a standardized test. Do we really want to create a system that penalizes and fires the teachers whose positions we struggle the hardest to fill? Is it any wonder that teachers who teach the very top performers and teachers who teach the lowest performers would come out looking the worst in an algorithm measuring learning gains? I suck at math and this was immediately obvious to me.

Another interesting fact garnered from my amateur and cursory analysis of Florida VAM data is that high school teachers overwhelmingly populated the bottom of the VAM rankings. Of the 148 teachers who scored lower than me, 136 were high school teachers, ten were middle school teachers, and only two were elementary school teachers. All of this directly contradicts the testimony of Ms. Kathy Hebda, Deputy Chancellor for Educator Quality, in front of Florida lawmakers last year regarding the Florida VAM.

“Hebda presented charts to the House K-12 Education Subcommittee that show almost zero correlation between teachers’ evaluation scores and the percentages of their students who are poor, nonwhite, gifted, disabled or English language learners. Teachers similarly didn’t get any advantage or disadvantage based on what grade levels they teach.

“Those things didn’t seem to factor in,” Hebda said. “You can’t tell for a teacher’s classroom by the way the value-added scores turned out whether she had zero percent students on free and reduced price lunch or 100 percent.”

Hebda’s 2013 testimony in two public hearings was intended to assure policymakers that everything was just swell with VAM, as an affirmation that the merit pay provision of the 2011 Student Success Act (SB736) was going to be ready for prime time in the scheduled 2015 roll-out. No wonder the FLDOE didn’t want the actual VAM data released, as the data completely contradict Hebda’s assurances that “the model did its job.”

I certainly have been a little disappointed with the media coverage of the FLDOE losing its lawsuit and being forced to release Florida teacher VAM data this week. The Florida Times Union considers this data to be a treasure trove of information, but they haven’t dug very deep into the data they fought so hard to procure. The Miami Herald barely acknowledged that anything noteworthy happened in education news this week. You would think some other journalist would have thought to cover a story about “The Worst Teacher in Florida.” I write this blog to cover teacher stories that major media outlets don’t seem interested in telling (that, and I am trying to stave off early dementia while on maternity leave). One journalist bothered to dig up the true story behind the top ten teachers in Florida. But no one has bothered telling the stories of the bottom ten. Those are the teachers who are most likely to be fired and have their teaching licenses revoked by the state. Let those stories be told. Let the public see what kinds of teachers they are at risk of losing to this absurd excuse for an “objective measure of teacher effectiveness” before it’s too late.

Why VAMs & Merit Pay Aren’t Fair

An “oldie” (i.e., published about one year ago), but a goodie! This one is already posted in the video gallery of this site, but it recently came up again: a good, short (three-minute) video that captures some of the main issues.
Check it out and share as (so) needed!

Six Reasons Why VAMs and Merit Pay Aren’t Fair