Correction: Make the “Top 13” VAM Articles the “Top 14”

As per my most recent post earlier today about the Top 13 research-based articles on VAMs, lo and behold, another great research-based statement was released just this week by the American Statistical Association (ASA), titled the “ASA Statement on Using Value-Added Models for Educational Assessment.”

So, let’s make the Top 13 the Top 14 and call it a day. I say “day” deliberately; this is such a hot and controversial topic that it is often hard to keep up with the literature in this area on a daily basis.

As per this outstanding statement released by the ASA – the premier statistical organization in the U.S. and one of, if not the, best statistical associations in the world – the most important parts of the statement, taken directly from the full text as I see them, follow:

  1. VAMs are complex statistical models, and high-level statistical expertise is needed to develop the models and [emphasis added] interpret their results.
  2. Estimates from VAMs should always be accompanied by measures of precision and a discussion of the assumptions and possible limitations of the model. These limitations are particularly relevant if VAMs are used for high-stakes purposes.
  3. VAMs are generally based on standardized test scores, and do not directly measure potential teacher contributions toward other student outcomes.
  4. VAMs typically measure correlation, not causation: Effects – positive or negative – attributed to a teacher may actually be caused by other factors that are not captured in the model.
  5. Under some conditions, VAM scores and rankings can change substantially when a different model or test is used, and a thorough analysis should be undertaken to evaluate the sensitivity of estimates to different models.
  6. VAMs should be viewed within the context of quality improvement, which distinguishes aspects of quality that can be attributed to the system from those that can be attributed to individual teachers, teacher preparation programs, or schools.
  7. Most VAM studies find that teachers account for about 1% to 14% of the variability in test scores, and that the majority of opportunities for quality improvement are found in the system-level conditions [see the sketch after this list]. Ranking teachers by their VAM scores can have unintended consequences that reduce quality.
  8. Attaching too much importance to a single item of quantitative information is counter-productive—in fact, it can be detrimental to the goal of improving quality.
  9. When used appropriately, VAMs may provide quantitative information that is relevant for improving education processes…[but only if used for descriptive purposes]. Otherwise, using VAM scores to improve education requires that they provide meaningful information about a teacher’s ability to promote student learning…[and they just do not do this at this point, as there is no research evidence to support this ideal].
  10. A decision to use VAMs for teacher evaluations might change the way the tests are viewed and lead to changes in the school environment. For example, more classroom time might be spent on test preparation and on specific content from the test at the exclusion of content that may lead to better long-term learning gains or motivation for students. Certain schools may be hard to staff if there is a perception that it is harder for teachers to achieve good VAM scores when working in them. Overreliance on VAM scores may foster a competitive environment, discouraging collaboration and efforts to improve the educational system as a whole.
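
To put some numbers behind the statement in #7, here is a minimal simulation sketch of my own (not the ASA’s, and with all figures invented for illustration). It simply builds test scores in which teachers account for about 10% of the variability – a hypothetical value inside the ASA’s quoted 1% to 14% range – and then recovers that share:

```python
# A minimal, hypothetical sketch (my illustration, not the ASA's analysis):
# simulating test scores in which teachers account for ~10% of the variance.
import random

random.seed(2)
TEACHER_SHARE = 0.10          # hypothetical teacher share of score variance
N_TEACHERS, CLASS_SIZE = 200, 25

def variance(xs):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

scores, teacher_part = [], []
for _ in range(N_TEACHERS):
    teacher_effect = random.gauss(0, TEACHER_SHARE ** 0.5)
    for _ in range(CLASS_SIZE):
        # Everything else (students, homes, schools, measurement error)
        # carries the remaining ~90% of the variance.
        other = random.gauss(0, (1 - TEACHER_SHARE) ** 0.5)
        scores.append(teacher_effect + other)
        teacher_part.append(teacher_effect)

share = variance(teacher_part) / variance(scores)
print(f"share of score variance attributable to teachers: {share:.0%}")
```

Under these assumptions, even moving a student from an average teacher to one a full standard deviation better in the teacher distribution shifts the expected score by only about 0.3 student-level standard deviations; the other roughly 90% of the variability lies outside any individual teacher’s control.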

Also important to point out is that, included in the report, the ASA makes recommendations regarding the “key questions states and districts [yes, practitioners!] should address regarding the use of any type of VAM.” These include, although they are not limited to, questions about reliability (consistency), validity, the tests on which VAM estimates are based, and the major statistical errors that always accompany VAM estimates but are often buried and not reported with results (i.e., in terms of confidence intervals or standard errors).
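
To make the ASA’s point about precision concrete, here is a minimal, hypothetical sketch (mine, not the ASA’s) of the kind of reporting they call for: every VAM estimate accompanied by its standard error and a confidence interval. All numbers are invented for illustration only:

```python
# A minimal, hypothetical sketch of reporting a VAM estimate with its
# uncertainty; the figures below come from no real model or dataset.

def report_vam_estimate(effect, standard_error, z=1.96):
    """Return a 95% confidence interval for a teacher-effect estimate."""
    lower = effect - z * standard_error
    upper = effect + z * standard_error
    distinguishable_from_zero = lower > 0 or upper < 0
    return lower, upper, distinguishable_from_zero

# A hypothetical teacher effect of +0.05 student-level standard deviations,
# estimated with a standard error of 0.10.
low, high, signal = report_vam_estimate(0.05, 0.10)
print(f"estimate: 0.05, 95% CI: [{low:.2f}, {high:.2f}]")
print("statistically distinguishable from zero?", signal)  # False: CI spans zero
```

When the interval spans zero, as in this invented example, the estimate cannot even distinguish the teacher from an average teacher – exactly the kind of information that is often buried and not reported with results.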

Also important is the purpose for ASA’s statement, as written by them: “As the largest organization in the United States representing statisticians and related professionals, the American Statistical Association (ASA) is making this statement to provide guidance, given current knowledge and experience, as to what can and cannot reasonably be expected from the use of VAMs. This statement focuses on the use of VAMs for assessing teachers’ performance but the issues discussed here also apply to their use for school or principal accountability. The statement is not intended to be prescriptive. Rather, it is intended to enhance general understanding of the strengths and limitations of the results generated by VAMs and thereby encourage the informed use of these results.”

If you’re going to choose one article to read and review this week or this month – one that is thorough and to the point – this is the one I recommend you read…at least for now!

“Evaluation Systems that are Byzantine at Best and At Worst, Draconian”

As per the New Year, Valerie Strauss at The Washington Post recently released the top 11 education-related articles of the year from “The Answer Sheet,” the top article focused on a letter to the Post from a teacher explaining why he finally decided to leave the teaching “profession.” For multiple reasons, he writes of both his sadness and discontent, and most pertinently here, given the nature of this blog, he writes of (as highlighted in the title of this post) “Evaluation Systems that are Byzantine at Best and At Worst, Draconian.”

He writes: “My profession is being demeaned by a pervasive atmosphere of distrust, dictating that teachers cannot be permitted to develop and administer their own quizzes and tests (now titled as generic “assessments”) or grade their own students’ examinations. The development of plans, choice of lessons and the materials to be employed are increasingly expected to be common [i.e., the common core] to all teachers in a given subject. This approach not only strangles creativity, it smothers the development of critical thinking in our students and assumes a one-size-fits-all mentality more appropriate to the assembly line than to the classroom [i.e., value-added and its inputs and outputs].”

He continues: “‘[D]ata driven’ education seeks only conformity, standardization, testing and a zombie-like adherence to the shallow and generic Common Core…Creativity, academic freedom, teacher autonomy, experimentation and innovation are being stifled in a misguided effort to fix what is not broken in our system of public education.”

He then concludes: “After writing all of this I realize that I am not leaving my profession, in truth, it has left me. It no longer exists.”

Take a read for yourself, as there is much more in his letter not directly related to the teacher evaluation systems he protests. And perhaps read it with a New Year’s resolution to help this not [continue to] happen to others.

Stanford Professor, Dr. Edward Haertel, on VAMs

In a recent speech and subsequent paper, Dr. Edward Haertel – National Academy of Education member and Professor at Stanford University – writes about VAMs and the extent to which VAMs, being based on student test scores, can be used to make reliable and valid inferences about teachers and teacher effectiveness. This is a must-read, particularly for those out there who are new to the research literature in this area. Dr. Haertel is certainly an expert here, actually one of the best we have, and in this piece he captures the major issues well.

Some of the issues highlighted include concerns about the tests used to model value-added and how their scales (falsely assumed to be as objective and equal as units on a measuring stick) complicate and distort VAM-based estimates. He also discusses the general issues with the tests almost always, if not always, used when modeling value-added (i.e., the state-level tests mandated as per No Child Left Behind in 2002).

He discusses why VAM estimates are least trustworthy, and most volatile and error prone, when used to compare teachers who work in very different schools with very different student populations – students who do not attend schools in randomized patterns and who are rarely if ever randomly assigned to classrooms. The issues with bias, as highlighted by Dr. Haertel and also in a recent VAMboozled! post with a link to a new research article here, are probably the most serious VAM-related problems/issues going. As captured in his words, “VAMs will not simply reward or penalize teachers according to how well or poorly they teach. They will also reward or penalize teachers according to which students they teach and which schools they teach in” (Haertel, 2013, pp. 12-13).

He reiterates issues with reliability, or a lack thereof. As per one research study he cites, researchers found that “a minimum of 10% of the teachers in the bottom fifth of the distribution one year were in the top fifth the next year, and conversely. Typically, only about a third of 1 year’s top performers were in the top category again the following year, and likewise, only about a third of 1 year’s lowest performers were in the lowest category again the following year. These findings are typical [emphasis added]…[While a] few studies have found reliabilities around .5 or a little higher…this still says that only half the variation in these value-added estimates is signal, and the remainder is noise [and/or error, which makes VAM estimates entirely invalid about half of the time]” (Haertel, 2013, p. 18).
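
To see what a reliability of around .5 implies in practice, here is a minimal simulation sketch of my own (not Dr. Haertel’s analysis): teacher scores built as half stable “signal” and half noise, and the year-to-year quintile churn that follows:

```python
# A minimal simulation sketch (my illustration, not Haertel's): with equal
# parts stable effectiveness and noise, year-to-year reliability is ~0.5.
import random

random.seed(1)
N = 10_000  # hypothetical teachers

def quintiles(scores):
    """Assign each score to a fifth of the distribution (0 = bottom, 4 = top)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    fifths = [0] * len(scores)
    for rank, i in enumerate(order):
        fifths[i] = min(4, rank * 5 // len(scores))
    return fifths

# Stable effectiveness plus independent year-specific noise, equal variances,
# so the correlation between the two years' scores is roughly 0.5.
true_effect = [random.gauss(0, 1) for _ in range(N)]
year1 = [t + random.gauss(0, 1) for t in true_effect]
year2 = [t + random.gauss(0, 1) for t in true_effect]

q1, q2 = quintiles(year1), quintiles(year2)
top = [i for i in range(N) if q1[i] == 4]
bottom = [i for i in range(N) if q1[i] == 0]
print(f"top fifth still in top fifth next year: "
      f"{sum(q2[i] == 4 for i in top) / len(top):.0%}")
print(f"bottom fifth jumping to top fifth: "
      f"{sum(q2[i] == 4 for i in bottom) / len(bottom):.0%}")
```

Under these assumptions, only roughly a third of one year’s top performers land in the top fifth again the next year, and some bottom-fifth teachers jump all the way to the top – churn driven entirely by noise, since each simulated teacher’s “true” effectiveness never changes between years.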

Dr. Haertel also discusses the correlations between VAM estimates and teacher observational scores, between VAM estimates and student evaluation scores, and between VAM estimates taken from the same teachers at the same time but using different tests, all of which are also abysmally (and unfortunately) low, similar to those mentioned above.

His bottom line? “VAMs are complicated, but not nearly so complicated as the reality they are intended to represent” (Haertel, 2013, p. 12). They just do not measure well what so many believe they measure so very well.

Again, for more reasons and more in-depth explanations as to why, click here for the full speech and subsequent paper.

VAMmunition

While “Top Ten Lists” have become a recurrent trend in periodicals, magazines, blogs, and the like, one “Top Ten List,” presented here, should satisfy the needs of this blog’s readers, and hopefully other educators, for VAMmunition, or rather, the ammunition practitioners need to protect themselves against the unfair implementation and use of VAMs (i.e., value-added models).

Likewise, as “Top Ten Lists” typically serve reductionistic purposes, in the sense that they often reduce highly complex phenomena into easy-to-understand, easy-to-interpret, and easy-to-use strings of information, this approach is more than suitable here, given that those who are trying to ward off the unfair implementation and use of VAMs often do not have the VAMmunition they need to defend themselves in research-based ways. Hopefully this list will satisfy at least some of these needs.

Accordingly, I present here the “Top Ten Bits of VAMmunition”: research-based reasons, listed in no particular order, that all public school educators should be able to use to defend themselves against VAMs.

  1. VAM estimates should not be used to assess teacher effectiveness. The standardized achievement tests on which VAM estimates are based have always been, and continue to be, developed to assess levels of student achievement, not growth in student achievement, nor growth in achievement that can be attributed to teacher effectiveness. The tests on which VAM estimates are based (among other issues) were never designed to estimate teachers’ causal effects.
  2. VAM estimates are often unreliable. Teachers who should be (more or less) consistently effective are being classified in sometimes highly inconsistent ways over time. A teacher classified as “adding value” has a 25 to 50% chance of being classified as “subtracting value” the following year(s), and vice versa. This sometimes makes the probability of a teacher being identified as effective no different than the flip of a coin.
  3. VAM estimates are often invalid. Without adequate reliability, as reliability is a qualifying condition for validity, valid VAM-based interpretations are even more difficult to defend. Likewise, very limited evidence exists to support that teachers who post high or low value-added scores are indeed effective or ineffective per at least one other correlated criterion (e.g., teacher observational scores, teacher satisfaction surveys). The correlations being demonstrated across studies are not nearly high enough to support valid interpretation or use.
  4. VAM estimates can be biased. Teachers of certain students who are almost never randomly assigned to classrooms have more difficulties demonstrating value-added than their comparably effective peers. Estimates for teachers who teach inordinate proportions of English Language Learners (ELLs), special education students, students who receive free or reduced lunches, and students retained in grade, are more adversely impacted by bias. While bias can present itself in terms of reliability (e.g., when teachers post consistently high or low levels of value-added over time), the illusion of consistency can sometimes be due, rather, to teachers being consistently assigned more homogenous sets of students.
  5. Related, VAM estimates are fraught with measurement errors that undermine their reliability and validity and contribute to issues of bias. These errors are caused by inordinate amounts of inaccurate or missing data that cannot be easily replaced or disregarded; variables that cannot be statistically “controlled for;” differential summer learning gains and losses and prior teachers’ residual effects that also cannot be “controlled for;” the effects of teaching in non-traditional, non-isolated, and non-insular classrooms; and the like.
  6. VAM estimates are unfair. Issues of fairness arise when test-based indicators and their inference-based uses impact some more than others in consequential ways. With VAMs, only teachers of mathematics and reading/language arts with pre- and post-test data in certain grade levels (e.g., grades 3-8) are typically being held accountable. Across the nation, this leaves approximately 60-70% of teachers, including entire campuses of teachers (e.g., early elementary and high school teachers), VAM-ineligible.
  7. VAM estimates are non-transparent. Estimates must be made transparent in order to be understood, so that they can ultimately be used to “inform” change and progress in “[in]formative” ways. However, the teachers and administrators who are to use VAM estimates accordingly do not typically understand the VAMs or VAM estimates being used to evaluate them, particularly enough so to promote such change.
  8. Related, VAM estimates are typically of no informative, formative, or instructional value. No research to date suggests that VAM use has improved teachers’ instruction or student learning and achievement.
  9. VAM estimates are being used inappropriately to make consequential decisions. VAM estimates do not have enough consistency, accuracy, or depth to satisfy the purposes for which VAMs are increasingly being tasked – for example, to help make high-stakes decisions about whether teachers receive merit pay, are awarded or denied tenure, or are retained or, inversely, terminated. While proponents argue that, because of VAMs’ imperfections, VAM estimates should not be used in isolation of other indicators, the fact of the matter is that VAMs are so imperfect they should not be used for much of anything unless largely imperfect decisions are desired.
  10. The unintended consequences of VAM use continuously go unrecognized, although research suggests they continue to exist. For example, teachers are choosing not to teach certain students, including those whom teachers deem most likely to hinder their potential to demonstrate value-added. Principals are stacking classes to make sure certain teachers are more likely to demonstrate “value-added,” or vice versa, to protect or penalize certain teachers, respectively. Teachers are leaving/refusing assignments to grades in which VAM-based estimates matter most, and some teachers are leaving teaching altogether out of discontent or in protest. About the seriousness of these and other unintended consequences, weighed against VAMs’ intended consequences or the lack thereof, proponents and others simply do not seem to give a VAM.