The Gates Foundation’s Expensive ($335 Million) Teacher Evaluation Missteps

The headline of an Education Week article released last week was that “[t]he Bill & Melinda Gates Foundation’s multi-million-dollar, multi-year effort aimed at making teachers more effective largely fell short of its goal to increase student achievement, including among low-income and minority students.”

An evaluation of the Gates Foundation’s Intensive Partnerships for Effective Teaching initiative, funded at $290 million and an extension of its Measures of Effective Teaching (MET) project, funded at $45 million, was the focus of this article. The MET project was led by Thomas Kane (Professor of Education and Economics at Harvard, and an expert witness for the defense in the ongoing lawsuit over New Mexico’s MET-project-esque statewide teacher evaluation system), and both projects were primarily meant to hold teachers accountable for their students’ test scores, via growth or value-added models (VAMs) and financial incentives. Both projects were also tangentially meant to improve staffing and professional development opportunities, to improve the retention of teachers of “added value,” and ultimately to lead to more effective teaching and higher student achievement, especially in low-income schools and schools with higher relative proportions of racial minority students. The six-year evaluation featured in this Education Week article was conducted by the RAND Corporation and the American Institutes for Research, and it was also funded by the Gates Foundation (see below for the full citation of this study).

The evaluators’ key finding was that the Intensive Partnerships for Effective Teaching district/school sites implemented new measures of teaching effectiveness and modified personnel policies, but they did not achieve their goals for students.

The evaluators also found:

  • The sites succeeded in implementing measures of effectiveness to evaluate teachers and made use of the measures in a range of human-resource decisions.
  • Every site adopted an observation rubric that established a common understanding of effective teaching. Sites devoted considerable time and effort to training and certifying classroom observers and to observing teachers on a regular basis.
  • Every site implemented a composite measure of teacher effectiveness that included scores from direct classroom observations of teaching and a measure of growth in student achievement.
  • Every site used the composite measure to varying degrees to make decisions about human resource matters, including recruitment, hiring, placement, tenure, dismissal, professional development, and compensation.

Overall, the initiative did not achieve its goals for student achievement or graduation, especially for low-income and racial minority students. With minor exceptions, student achievement, access to effective teaching, and dropout rates were also not dramatically better than they were for similar sites that did not participate in the intensive initiative.

The evaluators’ recommendations were as follows:

  • Reformers should not underestimate the resistance that could arise if changes to teacher-evaluation systems have major negative consequences.
  • A near-exclusive focus on teacher evaluation systems such as these might be insufficient to improve student outcomes. Many other factors might also need to be addressed, ranging from early childhood education, to students’ social and emotional competencies, to the school learning environment, to family support. Dramatic improvement in outcomes, particularly for low-income and racial minority students, will likely require attention to many of these factors as well.
  • In change efforts such as these, it is important to measure the extent to which each of the new policies and procedures is implemented in order to understand how the specific elements of the reform relate to outcomes.


Stecher, B. M., Holtzman, D. J., Garet, M. S., Hamilton, L. S., Engberg, J., Steiner, E. D., Robyn, A., Baird, M. D., Gutierrez, I. A., Peet, E. D., de los Reyes, I. B., Fronberg, K., Weinberger, G., Hunter, G. P., & Chambers, J. (2018). Improving teaching effectiveness: Final report. The Intensive Partnerships for Effective Teaching through 2015–2016. Santa Monica, CA: The RAND Corporation. Retrieved from

  1. What a waste of time and money. The concept of effectiveness in teaching was tied to one-size-fits-all data gathering, scores on flawed observation rubrics, achievement measures tied to standardized tests, and student surveys loaded to reward teachers who conformed to a narrow vision of effective teaching (e.g., as sage on the stage and giver and checker of homework assignments), implicitly rewarding students for correct answers to questions asked by others, and so on. Teacher effectiveness, as measured by the economists in charge of the MET project, is a variant of measures of worker productivity in a factory. The whole venture also negates the value of studies in subjects for which correct answers are less important than nuanced judgments. It pains me to know that the Gates Foundation is now tinkering with curriculum and supporting governance structures designed to suppress public voice in public education.

  2. I’d add this, from p. 73 of the report: “Although sites designed and implemented their observation systems differently, most used Danielson’s FFT as a starting point. All but one of the sites developed rubrics based on the Danielson framework, which meant that these sites emphasized a constructivist approach to pedagogy that involves high levels of student engagement and communication.”

    To me, this report is just as damning of the Danielson framework and its efforts to push engagement over other elements of teaching/learning. It no longer deserves the label of “gold standard” in teacher eval rubrics.

  3. Ditto (what Laura said)

    I guess (?) you always need research, but it is more than a little frustrating when one considers the millions of wasted dollars (that I could have used 🙂) and when the results turn out exactly the way teachers on the front line, in the trenches, have been saying for years: it [these so-called “reforms”] won’t work! (because the issues go beyond effective teaching; they are multifaceted, dynamic and extremely complex).

    I think there definitely are solutions (not pie-in-the-sky 100% student achievement solutions, mind you), but solutions nonetheless, informed by cognitive science.

    But unfortunately, few people seem willing to face the consequences and hard work involved. For example, the only scientifically known way to “catch up” students who are behind and to narrow the achievement gaps is the same as for anyone who is behind: more hours of work and study than their peers. Equal classroom time only widens the achievement gap (as explained below).

    I did action research to narrow the special ed/regular ed achievement gap, but at the high price of extreme resistance, along with the obligatory blood, sweat and tears.

    I used a combination of blended learning (not the only way to accomplish this) and mastery learning done the right way (a key component), holding students accountable for learning (another key component) via an online learning system that required students to repeat the content until they demonstrated mastery….

    The average student could master one chapter’s content online in about 2 hours, while some special ed students took as long as 10 hours. It was brutal, but, barring irreparable organic disorders that severely limit learning progress, the students who stuck with it through blood, sweat and literal tears progressively improved from 10 to 6 to 4 to 2 hours per chapter. Some even ended up excelling beyond their “regular” ed peers, and the gains they made in classroom participation, understanding, interconnections, critical thinking and personal confidence were worth it.

    It’s all about the learning curve: cognitively, the more you know about a subject (for example, a foreign language, or even a skill such as playing a musical instrument), the easier it is to learn, acquire new domain-specific knowledge and think critically about that same subject (due to freed-up “space” in working memory, among other things, which we can empirically and quantitatively measure). Someone brand new or behind, by contrast, has to struggle through the steep learning curve just to get jump-started with the “basics,” a barrier at which, unfortunately, many students give up and/or experience initial failure that reinforces feelings of “why try” inadequacy…

    Many students simply don’t understand how truly hard you have to work, especially in the beginning, and the students who most need to spend this enormous amount of extra learning time are also the ones least likely to put in the time (e.g., reinforced failure, per above; demographics and Maslow’s hierarchy of needs: some students, such as those in poverty, are understandably less concerned about education, which is seen as a luxury even if it is a potential long-term solution, and more concerned about day-to-day practical survival)….

    Unless… there is an assured system of accountability in place AND support for students/families* (but see the note below).

    But we lack adequate support systems, and few people, at least in the US (teachers, parents, admin), seem to want to hold students to such high levels of accountability (teachers and unions make easier scapegoats); some even think it’s “cruel” and unreasonable to do so. Plus, teachers who try but lack sufficient support will find themselves exhausted by the student resistance they encounter, by parent complaints (which turn to praise from those who stick it out and trust you and what you say needs to be done), by admin breathing down their necks, if only because admin don’t like to be bothered by parent complaints (who does?), and, in extreme cases, by job loss. So who wants to risk that? And on top of this are the many important philosophical, pedagogical and practical questions involved, ranging from change agents and the psychology of introducing status-quo-challenging changes in ways that promote buy-in, to student choice in learning, to traditional education, to work ethic and more.

    Anything worth having, though, usually comes at the cost of much blood, sweat and tears (ask any PhD-earner or world-class expert in anything how they got there). That’s often what gives it value and makes it worth having.

    *Note/caveat: you won’t get 100% achievement, though, because of the same learning curve, because of differing individual priorities, and because students who outright refuse or simply don’t want to learn can find ways around it. I remember one student who still found a way to avoid learning even with parent and teacher employing “stand-over-your-shoulder” accountability.

