US Secretary of Education Duncan “Loves Him Some VAM Sauce”

US Secretary of Education “Arne [Duncan] loves him some VAM sauce, and it is a love that simply refuses to die,” writes Peter Greene in a recent Huffington Post post. Duncan’s (simple) mind loves it because, indeed, the plan is (too) simplistic. All that the plan requires are two simple ingredients: “1) A standardized test that reliably and validly measures how much students know, [and] 2) A super-sciency math algorithm that will reliably and validly strip out all influences except that of the teacher.”

Sticking with the cooking metaphor, however, Greene writes: “VAM is no spring chicken, and perhaps when it was fresh and young some affection for it could be justified. After all, lots of folks, including non-reformy folks, like the idea of recognizing and rewarding teachers for being excellent. But how would we identify these pillars of excellence? That was the puzzler for ages until VAM jumped up to say, ‘We can do it! With Science!!’ We’ll give some tests and then use super-sciency math to filter out every influence that’s Not a Teacher and we’ll know exactly how much learnin’ that teacher poured into that kid.”

“Unfortunately, we don’t have either,” and we likely never will. Why this is the case is also highlighted in this post, with Greene explicitly citing three main sources for support: the recent oppositional statement released by the National Association of Secondary School Principals, the oppositional statement released this past summer by the American Statistical Association, and our mostly-oppositional blog Vamboozled! (by Audrey Amrein-Beardsley). Hopefully getting the research into the hands of educational practitioners, school board members, the general public, and the like is indeed “adding value” in the purest sense of this phrase’s meaning. I sure hope so!

Anyhow, in this post Greene also illustrates and references a nice visual (with a side of sarcasm) explaining the complexity behind VAMs in pretty clear terms. I also paste this illustration here, which Greene references as originally coming from a blog post from Daniel Katz, Ph.D. but I have seen similar versions elsewhere and prior (e.g., a New York Times article here).


Greene ultimately asks why Duncan stays so fixated on a policy that is disproportionately loaded against teachers and ever more widely rejected and unsupported.

Greene’s answer: “[I]f Duncan were to admit that his beloved VAM is a useless tool…then all his other favorite [reform-based] programs would collapse [around him]…Why do we give the Big Test? To measure teacher effectiveness. How do we rank and evaluate our schools? By looking at teacher effectiveness. How do we find the teachers that we are going to move around so that every classroom has a great teacher? With teacher effectiveness ratings. How do we institute merit pay and a career ladder? By looking at teacher effectiveness. How do we evaluate every single program instituted in any school? By checking to see how it affects teacher effectiveness. How do we prove that centralized planning (such as Common Core) is working? By looking at teacher effectiveness. How do we prove that corporate involvement at every stage is a Good Thing? By looking at teacher effectiveness. And by ‘teacher effectiveness,’ we always mean VAM (because we [i.e., far-removed educational reformers] don’t know any other way, at all).”

If Duncan’s “magic VAM sauce is a sham and a delusion and a big bowl of nothing,” his career would collapse in on itself.

To read more from Greene, do click here to read his post in full.

On Rating The Effectiveness of Colleges of Education Using VAMs

A VAMboozled! follower – Terry Ward (El Paso, TX), a retired writer and statistician married to a veteran and current Title I music teacher – sent this to me via email after reading a recent article in The New York Times. The article, released around Thanksgiving, was about how the “U.S. Wants Teacher Training Programs to Track How [College of Education] Graduates’ Students Perform.” This tracking would, of course, rely in part on value-added models (VAMs).

I thought it important to share with you all what Terry wrote in response, with his permission:

It has recently been proposed that colleges of education be rated and evaluated on the basis of how the students of their graduates perform on standardized tests. As they say, however, the devil is in the details. Let’s look at how this might work in the case of my wife — a teacher with some 40+ years’ experience who graduated from a college of education over 40 years ago.

Problem 1: Standardized student scores are problematic enough for individual teachers. Remember, the American Statistical Association (ASA) estimates that teacher influence explains somewhere between one and fourteen percent of the variation in test scores. The college is even further removed from the student taking the test, so the question becomes: how small must the college’s contribution be?

Problem 2: What is the decay function for college influence? Simply put, my wife graduated with her initial teaching degree over forty years ago. Any influence of the college upon her teaching is, therefore, minimal to non-existent. One assumes such an influence fades over time, so what is the shape of this decay, and how will US Department of Education (DOE) evaluators measure it? What is the half-life of educational influence?
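Terry’s half-life question can be made concrete with a toy model. The sketch below is purely hypothetical (no agency or researcher has specified such a decay function; the initial share and half-life are invented parameters), but it shows how any plausible exponential decay leaves almost nothing to attribute to a college after a 40-year career:

```python
def college_influence(initial_share: float, years_out: float,
                      half_life_years: float) -> float:
    """Hypothetical exponential decay of a college's share of a
    teacher's measured effect, given an assumed half-life."""
    return initial_share * 0.5 ** (years_out / half_life_years)

# Even granting the college a (generous) 50% initial share and a
# 10-year half-life, four half-lives later only ~3% remains:
remaining = college_influence(0.5, 40, 10)  # 0.5 * 0.5**4 = 0.03125
```

In other words, under almost any decay assumption, the signal the DOE would be trying to measure for a veteran teacher’s alma mater is vanishingly small.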

Problem 3: If we assume that the easiest year to measure college influence is the first year of teaching, how might the DOE extract college of education factors from the basic issues of first-year inexperience in real-world teaching?

Problem 4: What happens when additional schooling is factored in? My wife has a Master’s degree. Does that influence her teaching, and how is the DOE evaluation to split credit between her very distant B.A. degree (40 years ago) and the slightly more recent Master’s degree (30 years ago)?

Problem 5: What of non-degree but still credentialed education? My wife is also a graduate of a National Writing Project summer writing camp, with graduate hours in writing and writing pedagogy. Who is to get credit if this has (also) improved her students’ test scores? And how is the DOE to determine who should get the appropriate credit?

Problem 6: When a teacher changes schools (perhaps to teach impoverished youth), her student scores are likely to change dramatically. Does the DOE propose to re-evaluate such a teacher’s college of education and downgrade it as well, based on where she decides to teach?

I am sure the reader can come up with other absurd problems with the DOE proposal. I am simply reminded of the old saying that “for every complex problem, there exists a good-sounding simple solution that is completely wrong!” This certainly seems to be the case here.

Rethinking Value-Added Models (VAMs): A (Short) YouTube Version

Following up on a recent post about a newly released review of my book “Rethinking Value-Added Models in Education: Critical Perspectives on Tests and Assessment-Based Accountability,” I thought it important to share with you all a condensed, video- and cartoon-based version of the (very general) points highlighted within my book.

This YouTube video, also titled “Rethinking Value-Added Models in Education,” was created by one of my former doctoral students and one of the most amazing artists I know – Dr. Taryl Hargens.

Do give it a watch, and of course feel free to share out with others!!

VAM Updates from An Important State to Watch: Tennessee

The state of Tennessee, the state in which our beloved education-based VAMs were born (see here and here), is one we have been following closely on VAMboozled! throughout the last year of blog posts.

Throughout this period we have heard from concerned administrators and teachers in Tennessee (see, for example, here and here). We have written about how something called subject-area bias also exists (unheard of in the VAM-related literature until a Tennessee administrator sent us a lead), and we analyzed Tennessee’s data accordingly (see here and here, as well as an article written by my former graduate student, now Dr. Jessica Holloway-Libell, forthcoming in the esteemed Teachers College Record). We followed closely the rise and recent fall of the career of Tennessee’s Education Commissioner Kevin Huffman (see, for example, here and here, respectively). And we have watched how the Tennessee Board of Education and other leaders in the state have met, attempted to rescind, and actually rescinded some of the policy requirements that tie teachers’ evaluations to their VAM scores, again as determined by teachers’ students’ performance as calculated by the familiar Tennessee Value-Added Assessment System (TVAAS) and its all-too-familiar mother ship, the Education Value-Added Assessment System (EVAAS).

Now, following the (in many ways celebrated) exit of Commissioner Huffman, it seems the state is taking an even more reasonable stance towards VAMs and their use(s) for teacher accountability…at least for now.

As per a recent article in the Tennessee Education Report (see also an article in The Tennessean here), Governor Bill Haslam announced this week that “he will be proposing changes to the state’s teacher evaluation process in the 2015 legislative session,” the most significant change being “to reduce the weight of value-added data on teacher evaluations during the transition [emphasis added] to a new test for Tennessee students.” New tests are to be developed in 2016; they are unlikely to be part of the Common Core and will instead be significantly informed by teachers in consultation with Measurement Inc.

Anyhow, as per Governor Haslam’s press release (as also cited in this article), he intends to do the following three things:

  1. Adjust the weighting of student growth data in a teacher’s evaluation so that the new state assessments in ELA [English/language arts] and math will count 10 percent of the overall evaluation in the first year of administration (2016), 20 percent in year two (2017) and 35 percent in year three (2018). Currently 35 percent of an educator’s evaluation is comprised of student achievement data based on student growth;
  2. Lower the weight of student achievement growth for teachers in non-tested grades and subjects from 25 percent to 15 percent;
  3. And make explicit local school district discretion in both the qualitative teacher evaluation model that is used for the observation portion of the evaluation as well as the specific weight student achievement growth in evaluations will play in personnel decisions made by the district.
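The phase-in above can be reduced to simple arithmetic. This sketch (the weights come from the press release; the component scores and 1–5 scale are invented for illustration) shows how a teacher’s composite evaluation would shift as the growth-data weight ramps up:

```python
# Weight of growth data from the new assessments in the overall
# evaluation, per Governor Haslam's proposed phase-in.
GROWTH_WEIGHT = {2016: 0.10, 2017: 0.20, 2018: 0.35}

def composite(year: int, growth_score: float, other_score: float) -> float:
    """Weighted composite on a hypothetical 1-5 scale."""
    w = GROWTH_WEIGHT[year]
    return w * growth_score + (1 - w) * other_score

# A teacher rated 4.0 on everything else but 2.0 on growth data sees
# the composite fall each year as the growth weight rises:
scores = {year: composite(year, growth_score=2.0, other_score=4.0)
          for year in (2016, 2017, 2018)}
# scores ≈ {2016: 3.8, 2017: 3.6, 2018: 3.3}
```

Note that by 2018 the schedule simply restores the current 35% weight, which is exactly the concern raised below.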

Obviously, the latter two points (i.e., #2 and #3) are steps in the right direction: #2 is a bit more reasonable about whether teachers who don’t actually teach students in the tested subject areas should be held accountable for out-of-subject scores (although I’d vote for a 0% weight here), and #3 hands over to districts more local discretion and control over how their teacher evaluations are conducted (although VAMs must still be a part).

I’m less optimistic about the first intended change, however, as “the proposal does not go as far as some have proposed” (e.g., the American Statistical Association (ASA), as per some of the key points in its position statement on VAMs). This first change still rests on the “heroic” assumption that VAMs do work, and that in this case they will get better over time (i.e., 2016, 2017, 2018) with “new and improved tests,” so that the 35% weight in place now might be more appropriate then, and hence reached, or simply reached regardless of its appropriateness at that time…

The Nation’s High School Principals (Working) Position Statement on VAMs

The Board of Directors of the National Association of Secondary School Principals (NASSP) officially released a working position statement on VAMs, which was also recently referenced in an article in Education Week here (“Principals’ Group Latest to Criticize ‘Value Added’ for Teacher Evaluations“) and a Washington Post post here (“Principals Reject ‘Value-Added’ Assessment that Links Test Scores to Educators’ Jobs“).

I have pasted this statement below, but also link to it here as well. The position’s highlights follow, as also summarized in the above links and the position statement itself:

  • “[T]est-score-based algorithms for measuring teacher quality aren’t appropriate.”
  • “[T]he timing for using [VAMs] comes at a terrible time, just as schools adjust to demands from the Common Core State Standards and other difficult new expectations for K-12 students.”
  • “Principals are concerned that the new evaluation systems are eroding trust and are detrimental to building a culture of collaboration and continuous improvement necessary to successfully raise student performance to college and career-ready levels.”
  • “Value-added systems, the statement concludes, should be used to measure school improvement and help determine the effectiveness of some programs and instructional methods; they could even be used to tailor professional development. But they shouldn’t be used to make ‘key personnel decisions’ about individual teachers.”
  • “[P]rincipals often don’t use value-added data even where it exists, largely because a lot of them don’t trust it.”
  • The position statement also quotes Mel Riddile, a former National Principal of the Year and chief architect of the NASSP statement, who says: “We are using value-added measurement in a way that the science does not yet support. We have to make it very clear to policymakers that using a flawed measurement both misrepresents student growth and does a disservice to the educators who live the work each day.”

See also two other great blog posts re: the potential impact the NASSP’s working statement might/should have on America’s current VAM situation. The first external post comes from the blog “curmudgucation” and discusses in great detail the highlights of the NASSP’s post. The second external post is a guest post on Diane Ravitch’s blog.

Below, again, is the full post as per the website of the NASSP:


Purpose

To determine the efficacy of the use of data from student test scores, particularly in the form of Value-Added Measures (VAMs), to evaluate and to make key personnel decisions about classroom teachers.


Background

Currently, a number of states either are adopting or have adopted new or revamped teacher evaluation systems, which are based in part on data from student test scores in the form of value-added measures (VAM). Some states mandate that up to fifty percent of the teacher evaluation must be based on data from student test scores. States and school districts are using the evaluation systems to make key personnel decisions about retention, dismissal and compensation of teachers and principals.

At the same time, states have also adopted and are implementing new, more rigorous college- and career-ready standards. These new standards are intended to raise the bar from having every student earn a high school diploma to the much more ambitious goal of having every student be on target for success in post-secondary education and training.

The assessments accompanying these new standards depart from the old, much less expensive, multiple-choice-style tests to assessments that include constructed responses. These new assessments demand higher-order thinking and up to a two-year increase in expected reading and writing skills. Not surprisingly, the newness of the assessments combined with increased rigor has resulted in significant drops in the number of students reaching “proficient” levels on assessments aligned to the new standards.

Herein lies the challenge for principals and school leaders. New teacher evaluation systems demand the inclusion of student data at a time when scores on new assessments are dropping. The fears accompanying any new evaluation system have been magnified by the inclusion of data that will get worse before it gets better. Principals are concerned that the new evaluation systems are eroding trust and are detrimental to building a culture of collaboration and continuous improvement necessary to successfully raise student performance to college and career-ready levels.

Specific questions have arisen about using value-added measurement (VAM) to retain, dismiss, and compensate teachers. VAMs are statistical measures of student growth. They employ complex algorithms to figure out how much teachers contribute to their students’ learning, holding constant factors such as demographics. And so, at first glance, it would appear reasonable to use VAMs to gauge teacher effectiveness. Unfortunately, policy makers have acted on that impression over the consistent objections of researchers who have cautioned against this inappropriate use of VAM.

In a 2014 report, the American Statistical Association urged states and school districts against using VAM systems to make personnel decisions. A statement accompanying the report pointed out the following:

  • “VAMs are generally based on standardized test scores, and do not directly measure potential teacher contributions toward other student outcomes.
  • VAMs typically measure correlation, not causation: Effects – positive or negative – attributed to a teacher may actually be caused by other factors that are not captured in the model.
  • Under some conditions, VAM scores and rankings can change substantially when a different model or test is used, and a thorough analysis should be undertaken to evaluate the sensitivity of estimates to different models.
  • VAMs should be viewed within the context of quality improvement, which distinguishes aspects of quality that can be attributed to the system from those that can be attributed to individual teachers, teacher preparation programs, or schools.
  • Most VAM studies find that teachers account for about 1% to 14% of the variability in test scores, and that the majority of opportunities for quality improvement are found in the system-level conditions. Ranking teachers by their VAM scores can have unintended consequences that reduce quality.”

Another peer-reviewed study funded by the Gates Foundation and published by the American Educational Research Association (AERA) stated emphatically, “Value-Added Performance Measures Do Not Reflect the Content or Quality of Teachers’ Instruction.” The study found that “state tests and these measures of evaluating teachers don’t really seem to be associated with the things we think of as defining good teaching.” It further found that some teachers who were highly rated on student surveys, classroom observations by principals and other indicators of quality had students who scored poorly on tests. The opposite also was true. “We need to slow down or ease off completely for the stakes for teachers, at least in the first few years, so we can get a sense of what do these things measure, what does it mean,” the researchers admonished. “We’re moving these systems forward way ahead of the science in terms of the quality of the measures.”

Researcher Bruce Baker cautions against using VAMs even when test scores count less than fifty percent of a teacher’s final evaluation.  Using VAM estimates in a parallel weighting system with other measures like student surveys and principal observations “requires that VAM be considered even in the presence of a likely false positive. NY legislation prohibits a teacher from being rated highly if their test-based effectiveness estimate is low. Further, where VAM estimates vary more than other components, they will quite often be the tipping point – nearly 100% of the decision even if only 20% of the weight.”
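Baker’s “tipping point” claim is a statistical one about variance, and a quick simulation can illustrate it. In this sketch (my own illustration, not Baker’s code; all distributions are invented), observation scores cluster tightly while VAM estimates are widely spread, so the VAM component ends up driving the composite ranking far beyond its nominal 20% weight:

```python
import random

random.seed(1)
N = 50_000

# Tightly clustered observation ratings vs. widely spread VAM estimates.
obs = [random.gauss(3.0, 0.1) for _ in range(N)]
vam = [random.gauss(3.0, 1.0) for _ in range(N)]

# Composite: 80% observation, 20% VAM.
comp = [0.8 * o + 0.2 * v for o, v in zip(obs, vam)]

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return sxy / (sx * sy)

r_vam = pearson(comp, vam)  # ~0.93: the 20%-weight component...
r_obs = pearson(comp, obs)  # ~0.37: ...dominates the 80%-weight one
```

The higher-variance component correlates with the composite far more strongly than its weight suggests, which is exactly the “nearly 100% of the decision even if only 20% of the weight” dynamic Baker describes.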

Stanford’s Edward Haertel takes the objection for using VAMs for personnel decisions one step further: “Teacher VAM scores should emphatically not be included as a substantial factor with a fixed weight in consequential teacher personnel decisions. The information they provide is simply not good enough to use in that way. It is not just that the information is noisy. Much more serious is the fact that the scores may be systematically biased for some teachers and against others, and major potential sources of bias stem from the way our school system is organized. No statistical manipulation can assure fair comparisons of teachers working in very different schools, with very different students, under very different conditions.”

Still other researchers believe that VAM is flawed at its very foundation. Linda Darling-Hammond et al. point out that the use of test scores via VAMs assumes “that student learning is measured by a given test, is influenced by the teacher alone, and is independent from the growth of classmates and other aspects of the classroom context. None of these assumptions is well supported by current evidence.” Other factors including class size, instructional time, home support, peer culture, summer learning loss impact student achievement. Darling-Hammond points out that VAMs are inconsistent from class to class and year to year. VAMs are based on the false assumption that students are randomly assigned to teachers. VAMs cannot account for the fact that “some teachers may be more effective at some forms of instruction…and less effective in others.”

Guiding Principles

  • As instructional leader, “the principal’s role is to lead the school’s teachers in a process of learning to improve teaching, while learning alongside them about what works and what doesn’t.”
  • The teacher evaluation system should aid the principal in creating a collaborative culture of continuous learning and incremental improvement in teaching and learning.
  • Assessment for learning is critical to continuous improvement of teachers.
  • Data from student test scores should be used by schools to move students to mastery and a deep conceptual understanding of key concepts as well as to inform instruction, target remediation, and to focus review efforts.
  • NASSP supports recommendations for the use of “multiple measures” to evaluate teachers as indicated in the 2014 “Standards for Educational and Psychological Testing” measurement standards released by leading professional organizations in the area of educational measurement, including the American Educational Research Association (AERA), American Psychological Association (APA), and National Council on Measurement in Education (NCME).


Recommendations

  • Successful teacher evaluation systems should employ “multiple classroom observations across the year by expert evaluators looking to multiple sources of data, and they provide meaningful feedback to teachers.”
  • Districts and States should encourage the use of Peer Assistance and Review (PAR) programs, which use expert mentor teachers to support novice teachers and struggling veteran teachers, and which have been proven to be an effective system for improving instruction.
  • States and Districts should allow the use of teacher-constructed portfolios of student learning, which are being successfully used as a part of teacher evaluation systems in a number of jurisdictions.
  • VAMs should be used by principals to measure school improvement and to determine the effectiveness of programs and instructional methods.
  • VAMs should be used by principals to target professional development initiatives.
  • VAMs should not be used to make key personnel decisions about individual teachers.
  • States and Districts should provide ongoing training for Principals in the appropriate use of student data and VAMs.
  • States and Districts should make student data and VAMs available to principals at a time when decisions about school programs are being made.
  • States and Districts should provide the resources and time principals need in order to make the best use of data.

A (Great) Review of My Book on Value-Added Models (VAMs)…by Rachael Gabriel

My book, “Rethinking Value-Added Models in Education: Critical Perspectives on Tests and Assessment-Based Accountability,” was just externally reviewed by Rachael Gabriel, an Assistant Professor at the University of Connecticut, in Education Review: A Multilingual Journal of Book Reviews. To read the (fantastic!!) review, click here or read what I’ve pasted below directly from Dr. Gabriel’s review. For those still interested in the book, you can order it on Amazon here. To see also other reviews, click here.


Dr. Amrein-Beardsley’s recent book, Rethinking Value-Added Models in Education (Routledge, 2014), is the single most comprehensive resource on the uses and abuses of Value-added Measurement (VAM) in U.S. education policy. As the centerpiece of several new generation teacher evaluation policies, VAM has been the subject of a firestorm of media attention, position statements, journal articles, policy briefs and a wide range of scholarly debate.

Oddly, or perhaps thankfully, for all its dramatic heralding, only twelve states have recently adopted VAM as part of their teacher evaluation or compensation policies – due in no small part to the criticism Amrein-Beardsley and others have made available to the voting public (e.g., Amrein-Beardsley & Collins, 2012; Baker et al., 2010; Baker, Oduwale, & Greene, 2013). Though only twelve states have VAM written into their teacher evaluation policies, others use it at the district or school level, for purposes as varied as program evaluation, teacher compensation, educational research and legal challenges. Therefore, this book isn’t just for the hundreds of thousands of teachers who live and work in those twelve unlucky states (e.g., New York, Florida, Tennessee, Ohio, etc.); it’s about a measurement tool that captured America’s imagination – convincing us that we could accurately and reliably measure a teacher’s impact on his or her students – against her better judgment.

There is something for everyone in this text: from definitions for those with no background in statistics to thorough discussions of the reigning positions and debates between researchers. It therefore serves as a primer on education policy as well as a deep dive into the specifics of VAM. The text rises far above other options for understanding VAM because it successfully combines technical detail with social activism. This not only displays the range of Amrein-Beardsley’s thinking about the subject, but also the depth of her understanding of VAM’s technical merits and social implications.

For example, Amrein-Beardsley uses the introductory chapter to establish the place of VAM in the history of social policy and recent education policy. VAM is framed as one example in a long history of social engineering experiments, in line with what she calls the “Measure & Punish (M&P) Theory of Change” that characterizes most contemporary education reform efforts. Within the introduction, Amrein-Beardsley also crafts a refreshingly unapologetic statement of her own position on the issue – linking it to her background as a mathematics teacher and educational researcher. This transparency and context allows her to relinquish claims to neutral objectivity and replace them with clearly argued but passionate outrage, which makes the story of VAM both compelling and urgent.

Historicizing VAM and its surrounding controversies allows the reader to assume some analytic distance, though these debates are still very real and very raw. Within the 211 pages, she addresses VAM as both a statistical tool with specific features and assumptions, and a policy tool with socially constructed meanings and implications. This dual treatment, and the belief that data does not and never could “speak for itself” constructs a version of VAM in which the tool takes on a life of its own, with people and policies positioned and defined in response to its divisive construction (Gabriel & Lester, 2013). This characterization of VAM is also what makes this text so readable. VAM itself becomes a fascinating character in a larger story about education policy. Its role in recent policies, public debates and lawsuits is nothing short of operatic in quality. The sweeping nature of the story of VAM mixed with details on its technical merits creates a text that is encyclopedic in scope: with entire chapters devoted to discussions of VAM’s assumptions, reliability and biases.

Amrein-Beardsley’s emotion as a writer (sometimes anger, sometimes passion) simmers tangibly below the surface of her prose, and explodes once in a while in a long, complex, exclamation-pointed sentence. It seems that when you know as much as she does about VAM, and have to face the general ignorance and perverted rhetoric on the subject, neutrality isn’t an option. This treatment of the topic as one of urgency and consequence is not only rooted in a moral sense of outrage, but a deeply personal sense that the public and its teachers are being misled and mischaracterized. As Amrein-Beardsley points out, the fact that the public has been “VAMboozled” is not unique in the history of education or other social policy. It is, however, avoidable, if the public has access to a fuller understanding of what’s involved and what’s at stake when policies include VAM. This book represents one of many steps towards such access.

As such, each chapter reads like the transcript of a TED Talk, with multi-part lists (e.g., “the eight things that are now evident about VAM”) explained with tight examples and logic checks along the way. The use of multi-part lists, extensive citations and the added feature of a list of “top ten assertions” at the end of each chapter allow readers to pick and choose the sections of greatest interest. This is important because the combined depth and breadth of information about VAM can make some chapters feel skimmable. For example, those who are most interested in how sorting bias occurs are not likely to need the explicit definitions of terms like reliability and validity. Those who are interested in the history of VAM may not need the context in the larger scheme of education reform, or the history of government involvement in social services. Still, since so many arguments about VAM are based on popular, vague and/or mixed understandings of its foundational concepts and tools of logic, this thorough treatment is a welcome addition to the literature. The end-of-chapter assertion boxes also demonstrate Amrein-Beardsley’s unique ability to lay it all out there: the good, the bad and the normally obfuscated “truth” about VAM.

The crowning feature of this book is its final chapter, in which alternatives and solutions to VAM are presented. Though Amrein-Beardsley buys into the current fascination with using “multiple measures” for determining teacher effectiveness, she is quick to point out that combining the strengths and weaknesses of imperfect indicators “does not increase overall levels of reliability and validity” (p. 211). Instead, she advocates for the strategic pursuit of face validity, in the absence of any plausible construct validity, by suggesting that individuals’ professional judgment and local definitions of effectiveness be weighed as heavily as statistical markers. Her unique solution, one that has not yet been empirically validated or even widely discussed in policy circles, is essentially a panel of supervisors and peers evaluating and rating based on local definitions of effectiveness, and informed by local and/or research-based measures. She argues that multiple local stakeholders are the most powerful tool for interpreting observable indicators of effectiveness.

For those things that we cannot see – the outcomes or outputs of effective teaching (e.g., test scores, graduation rates, etc.) – both local and externally validated tools should be used. This proposal is unique in its insistence on multiplicity and its faith in the importance of human judgment. People (not a single representative of states or districts as collectives) should be responsible for selecting and designing (not one or the other) the measures of effectiveness. And people (not observation tools or even single observers) should be responsible for discussing and analyzing the inputs and processes of effectiveness in teaching.

By way of a summary of this suggested solution, Amrein-Beardsley writes:

This solution does not rely solely on mathematics and the allure of numbers or grandeur of objectivity that too often comes along with numerical representation, especially in the social sciences. This solution does not trust the test scores too often (and wrongly) used to assess teacher quality, simply because the test output is already available (and paid for) and these data can be represented numerically, mathematically, and hence objectively. This solution does not marginalize human judgment, but rather embraces human judgment for what it is worth, as positioned and operationalized within a more professional, democratically-based, and sound system of judgment, decision-making, and support.

Current teacher evaluation policies that claim to rely on “multiple measures of effectiveness” may still only require one observer’s opinion, and the combination of one-of-each type of measure for student achievement, student growth, and/or other outcomes. This combination of single probes stands in contrast to Amrein-Beardsley’s proposed combination of multiple probes for each aspect of an evaluation (inputs, processes, outputs).

Also unique to Amrein-Beardsley’s solution is the faith in individual teachers and supervisors to act as professionals – a faith that many researchers question as study after study points out the weak correlations between principal ratings and student achievement scores. For the most part, media outlets and even researchers have chalked up discrepancies to human error – blaming individual judgment for any discrepancy between observation scores and VAM ratings. Far from being surprised or discouraged by these discrepancies, Amrein-Beardsley suggests just the opposite: that VAM ratings are the data point that needs revision and oversight, and the professional judgment of educators is what should be held as the gold standard.

This faith that individuals act in the best interest of students and the profession is also reflected in how Amrein-Beardsley recounts the stories of Houston area teachers who were fired based on VAM ratings. Their value as professionals is never called into question. Rather, the value of VAM as a policy tool is put on trial, and it loses miserably. This respect for teachers and belief in people-not-numbers pervades other examples and arguments across the eight chapters. It is clear that Amrein-Beardsley is writing to an audience she believes in, and one that deserves to be equipped with all the facts.

The depth and breadth that characterizes this text is in itself a testament to Amrein-Beardsley’s belief in the power of people in plural. She sets out to define and explain a topic that is known for being shrouded in political spectacle (Gabriel & Allington, 2011). Despite the incredible volume of reporting and editorializing VAM has seen in the news, she examines still-unexamined assumptions at length, assuming that there are those in the world who want and need to have them explained. In other words, she dares to believe that people want and need to know the whole story, and that knowing this could change something about how we view teachers, testing, measurement and policy.

Finally, this belief in people is underscored by the book’s dedication page. A page in the front matter shows a picture of the Cambodian orphanage to which proceeds from the book will be donated. If nothing else good comes of VAM’s ignominious presence in education research literature, it is good that we have the whole story laid out here. And it is good that what we invest in reading and sharing this text will mean something beyond its audience of US readers. From Cambodia to Houston, this text represents one more step in a mission to bring collective intelligence, compassion, and truth to bear when issues of policy become threats to equity, despite our better judgment.

  • Amrein-Beardsley, A., & Collins, C. (2012). The SAS Education Value-Added Assessment System (SAS® EVAAS®) in the Houston Independent School District (HISD): Intended and unintended consequences. Education Policy Analysis Archives, 20(12). Retrieved from
  • Baker, E., Barton, P., Darling-Hammond, L., Haertel, E., Ladd, H., Linn, R., et al. (2010). Problems with the use of student test scores to evaluate teachers. Washington, DC: Economic Policy Institute. Retrieved from
  • Baker, B., Oluwole, J., & Greene, P. (2013). The legal consequences of mandating high stakes decisions based on low quality information: Teacher evaluation in the race-to-the-top era. Education Policy Analysis Archives, 21. Retrieved from
  • Gabriel, R., & Lester, J. N. (2013). The romance quest of education reform: A discourse analysis of the LA Times’ reports on value-added measurement teacher effectiveness. Teachers College Record, 115(12). Retrieved from
  • Gabriel, R., & Allington, R. (2011, April). Teacher effectiveness research and the spectacle of effectiveness policy. Paper presented at the annual convention of the American Educational Research Association (AERA), New Orleans, LA.

The 24 Articles Published about VAMs in All American Educational Research Association (AERA) Journals


For some time now on VAMboozled!, we have made available a set of reading lists for you all to read and consume as you wish. These lists include what I consider to be the “Top 15” suggested research articles (here), the “Top 25” suggested research articles (here), all suggested research articles, books, etc. (here), and also, as pertinent for this post, a list of all 24 VAM articles ever published in all peer-reviewed journals sponsored by the esteemed American Educational Research Association (AERA) here.

It seems this idea has caught on…

AERA recently released their list of articles, with links to said articles, here. However, they only referenced the VAM articles published in AERA journals since 2009. VAMboozled!’s list of 24 (again here) includes all articles ever published in AERA journals without a time constraint, given that the first articles published on this topic were released in 2003.

Here’s how AERA justified their post: “Over the past decade, the use of value‐added models (VAM) in teacher and administrator evaluation has grown nationally, while becoming one of education’s most controversial issues. Research evidence on the reliability and validity of VAM, and the consequences of using such indicators in educator evaluation, is still accumulating. In recent years, AERA’s journals have examined many aspects of VAM…” and these articles are, again, published here.

Enjoy (or not) should you take the time to peruse.


A New Paradigm for Accountability


Diane Ravitch recently published a really nice piece in the Huffington Post about what she views as a much better paradigm for accountability — one based on much better indicators than large-scale standardized test scores. This does indeed offer a more positive and supportive accountability alternative to the one with which we have been “dealing” for the last, really, 30 years.

The key components of this new paradigm, taken from the full post titled “A New Paradigm for Accountability: The Joy of Learning,” are pasted below. I would still recommend giving the article a full read, however, as the way Diane frames her reasoning around this list is also important to understand. Click here to see the full article on the Huffington Post website. Otherwise, here’s her paradigm:

The new accountability system would be called No Child Left Out. The measures would be these:

  • How many children had the opportunity to learn to play a musical instrument?
  • How many children had the chance to play in the school band or orchestra?
  • How many children participated in singing, either individually or in the chorus or a glee club or other group?
  • How many public performances did the school offer?
  • How many children participated in dramatics?
  • How many children produced documentaries or videos?
  • How many children engaged in science experiments? How many started a project in science and completed it?
  • How many children learned robotics?
  • How many children wrote stories of more than five pages, whether fiction or nonfiction?
  • How often did children have the chance to draw, paint, make videos, or sculpt?
  • How many children wrote poetry? Short stories? Novels? History research papers?
  • How many children performed service in their community to help others?
  • How many children were encouraged to design an invention or to redesign a common item?
  • How many students wrote research papers on historical topics?

Can you imagine an accountability system whose purpose is to encourage and recognize creativity, imagination, originality, and innovation? Isn’t this what we need more of?

Well, you can make up your own metrics, but you get the idea. Setting expectations in the arts, in literature, in science, in history, and in civics can change the nature of schooling. It would require far more work and self-discipline than test prep for a test that is soon forgotten.

My paradigm would dramatically change schools from Gradgrind academies to halls of joy and inspiration, where creativity, self-discipline, and inspiration are nurtured, honored, and valued.

This is only a start. Add your own ideas. The sky is the limit. Surely we can do better than this era of soul-crushing standardized testing.


“Dear Teacher, You Are Not the Most Important Thing in the Universe”

Gene Glass (Regents’ Professor Emeritus from ASU) posted this piece below this morning, here, and I thought it important to share with all of you.
The Arizona Republic has a very conservative Editorial Board for a very conservative newspaper in a very conservative state. So when they address the subject of teacher preparation, it’s no surprise that they parrot folk wisdom about schools and teachers. In addressing Arne Duncan’s new guidelines on teachers colleges, the Editorial Board strikes its closing notes by perpetrating one of the more pernicious myths about teachers and schools.

Plenty of research has come to a common-sense conclusion: Nothing is more important to the success of a student than a highly qualified teacher. But we don’t have enough of them, nor will we as long as teacher colleges are not held accountable.

Now that’s a statement that packs a big load of deceit into just 43 words. First, it’s highly doubtful that the Arizona Republic Editorial Board has made itself familiar with “plenty of research” about education. Second, in their review of “plenty of research,” apparently their faith in the ability of test scores to hold teachers colleges “accountable” was never shaken?* But worst of all is the repeat of that tired wheeze that nothing is more important than a teacher.

What makes the All-Important-Teacher myth so pernicious is that teachers themselves occasionally, and the general public usually, take it as a compliment when in fact it is an attack on teacher tenure and professional autonomy.

The facts of the matter are that teachers are not the most important thing determining what a child gets out of school. What a child brings to school is much more important. Jim Coleman showed this in 1966 in Equality of Educational Opportunity, and though he softened his position slightly in 1972, when he accorded a bit more importance to schooling than he had six years prior, out-of-school influences remained dominant in determining how much kids learned during their years in school. Parents, home and neighborhood conditions, physical health, language use and language complexity in the home, whether the student lives in a psychologically and physically healthy environment with access to competent medical care, access to books, games, and activities that prepare the student for school, and even genetic endowment can greatly contribute to or restrict a child’s development. What walks in the door on Day #1 has more to do with what leaves on Day #2340 (180 x 13) than what transpires during the few hours of students’ lives that they are in the classroom, attentive, and capable of absorbing what that teacher is talking about.

Teachers are wonderful human beings. For many children, teachers are the most caring and competent individual whom they will encounter during their lifetime. But teachers cannot undo the damage inflicted on youngsters by a society in which nearly half of all births are to unwed mothers and in which more than 20% of children live below the poverty level (income below $23,000 for a family of 4).

So, my fellow teachers, beware. Don’t fall for the false compliment that you are so important — so important that you should be fired if your students’ test scores are lagging behind, so important that your school’s graduation rate is a moral and a civil rights issue, so important that you should be replaced by an inexperienced liberal arts major on a two-year resume building junket.

*Just take a look at Bruce Baker’s analysis of the absurdity of judging teachers by their students’ test scores.


Surveys + Observations for Measuring Value-Added


Following up on a recent post about the promise of Using Student Surveys to Evaluate Teachers via a more holistic definition of a teacher’s value-added, I just read a chapter written by Ronald Ferguson — the creator of the Tripod student survey instrument and Tripod’s lead researcher — along with Charlotte Danielson — the creator of the Framework for Teaching and founder of The Danielson Group (see a prior post about this instrument here). Both instruments are “research-based,” both are used nationally and internationally, both are (increasingly being) used as key indicators to evaluate teachers across the U.S., and both were used throughout the Bill & Melinda Gates Foundation’s ($43 million worth of) Measures of Effective Teaching (MET) studies.

The chapter, titled “How Framework for Teaching and Tripod 7Cs Evidence Distinguish Key Components of Effective Teaching,” was recently published in a book all about the MET studies, titled “Designing Teacher Evaluation Systems: New Guidance from the Measures of Effective Teaching Project,” edited by Thomas Kane, Kerri Kerr, and Robert Pianta. The chapter is about whether and how data derived via the Tripod student survey instrument (i.e., as built on 7Cs: challenging students, control of the classroom, teacher caring, teachers confer with students, teachers captivate their students, teachers clarify difficult concepts, teachers consolidate students’ concerns) align with the data derived via Danielson’s Framework for Teaching, to collectively capture teacher effectiveness.

Another purpose of this chapter is to examine how both indicators align with teacher-level value-added. Ferguson (and Danielson) find that:

  • Their two measures (i.e., the Tripod and the Framework for Teaching) are more reliable (and likely more valid) than value-added measures. The over-time, teacher-level classroom correlations, cited in this chapter, are r = 0.38 for value-added (which is comparable with the correlations noted in plentiful studies elsewhere), r = 0.42 for the Danielson Framework, and r = 0.61 for the Tripod student survey component. These “clear correlations,” while not particularly strong in the case of value-added, do indicate there is some common signal that the indicators are capturing, some stronger than others (as should be obvious given the above numbers).
  • Contrary to what some (softies) might think, classroom management, not caring (i.e., the extent to which teachers care about their students and what their students learn and achieve), is the strongest predictor of a teacher’s value-added. However, the correlation (i.e., the strongest of the bunch) is still quite “weak” at an approximate r = 0.26, even though it is statistically significant. Caring, rather, is the strongest predictor of whether students are happy in their classrooms with their teachers.
  • In terms of “predicting” teacher-level value-added, and of the aforementioned 7Cs, the things that also matter “most” next to classroom management (although none of the coefficients are as strong as we might expect [i.e., r < 0.26]) include: the extent to which teachers challenge their students and have control over their classrooms.
  • Value-added in general is more highly correlated with teachers at the extremes in terms of their student survey and observational composite indicators.
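For readers unfamiliar with how over-time stability coefficients like those above are computed, the sketch below pairs scores for the same teachers from two consecutive years and computes the Pearson correlation between them. The scores here are made up purely for illustration; they are not drawn from the MET data.

```python
def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical teacher-level scores for the same six teachers in two
# consecutive years; the year-to-year r is what "over-time" stability
# estimates of this kind describe.
year1 = [0.2, -0.1, 0.5, 0.0, -0.3, 0.4]
year2 = [0.1, 0.0, 0.3, -0.2, -0.1, 0.5]
print(round(pearson_r(year1, year2), 2))
```

The lower this year-to-year correlation, the less a teacher's rating in one year tells us about the rating she will receive the next, which is why the r = 0.38 figure for value-added is cause for concern.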

In the end, the authors of this chapter do not disclose the actual correlations between their two measures and value-added (although from the appendix one can infer that the correlation between value-added and Tripod output is around r = 0.45, as based on an unadjusted r-squared). I should mention this is a HUGE shortcoming of this chapter, one that would not have passed peer review had this chapter been submitted to a journal for publication. The authors do mention that “the conceptual overlap between the frameworks is substantial and that empirical patterns in the data show similarities.” Unfortunately, however, they do not quantify the strength of said “similarities.” This leaves us to assume that, since they were not reported, the actual strength of the empirically observed similarities was likely low (as is also evidenced in many other studies, although not as often with student survey indicators as with observational indicators).
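The inference from the appendix is simple arithmetic: in a one-predictor setting, the magnitude of the Pearson correlation equals the square root of the unadjusted R-squared. Assuming a hypothetical appendix value of roughly R² ≈ 0.20 (a figure consistent with r ≈ 0.45, not a number reported in the chapter itself), the back-calculation looks like this:

```python
import math

def r_from_r_squared(r_squared: float) -> float:
    """In a simple one-predictor regression, |r| = sqrt(R^2)."""
    return math.sqrt(r_squared)

# A hypothetical unadjusted R^2 of about 0.20 implies r of about 0.45.
print(round(r_from_r_squared(0.20), 2))  # → 0.45
```

Note that this recovers only the magnitude of r; the sign must come from the direction of the relationship itself.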

The final conclusion the authors of this chapter make is that educators should “cross-walk” the two frameworks (i.e., the Tripod and the Danielson Framework) and use both when reflecting on teaching. I must say I’m concerned about this recommendation as well, mainly because it will cost states and districts more $$$, and the returns or “added value” (using the grandest definition of this term) of engaging in such an approach do not have the evidence necessary to adequately justify it.
