This one (should) explain itself (i.e., not reliable, and accordingly, not valid either). When high-stakes consequences are to be attached, this matters even more.
The state of Florida is another one of our states to watch in that, even since the passage of the Every Student Succeeds Act (ESSA) last January, the state is still moving forward with using its VAMs for high-stakes accountability reform. See my most recent post about one district in Florida here, after the state ordered it to dismiss a good number of its teachers as per their low VAM scores when this school year started. After realizing this also caused or contributed to a teacher shortage in the district, the district scrambled to hire Kelly Services contracted substitute teachers to replace them, after which the district also put administrators back into the classroom to help alleviate the bad situation turned worse.
In a recent post released by The Ledger, teachers from the same Polk County School District (size = 100K students) added much-needed details and also voiced concerns about all of this in the article that author Madison Fantozzi titled “Polk teachers: We are more than value-added model scores.”
Throughout this piece Fantozzi covers the story of Elizabeth Keep, a teacher who was “plucked from” the middle school in which she taught for 13 years, after which she was involuntarily placed at a district high school “just days before she was to report back to work.” She was one of 35 teachers moved from five schools in need of reform as based on schools’ value-added scores, although this was clearly done with no real concern or regard for the disruption this would cause these teachers, not to mention the students on the exiting and receiving ends. Likewise, and according to Keep, “If you asked students what they need, they wouldn’t say a teacher with a high VAM score…They need consistency and stability.” Apparently not. In Keep’s case, she “went from being the second most experienced person in [her middle school’s English] department…where she was department chair and oversaw the gifted program, to a [new, and never before] 10th- and 11th-grade English teacher” at the new high school to which she was moved.
As background, when Polk County School District officials presented turnaround plans to the State Board of Education last July, school board members “were most critical of their inability to move ‘unsatisfactory’ teachers out of the schools and ‘effective’ teachers in.” One board member, for example, expressed finding it “horrendous” that the district was “held hostage” by the extent to which the local union was protecting teachers from being moved as per their value-added scores. Referring to the union, and its interference in this “reform,” he accused the union of “shackling” the district and preventing its intended reforms. Note that the “effective” teachers who are to replace the “ineffective” ones can earn up to $7,500 in bonuses per year to help “turn around” the schools into which they enter.
Likewise, the state’s Commissioner of Education concurred saying that she also “wanted ‘unsatisfactory’ teachers out and ‘highly effective’ teachers in,” again, with effectiveness being defined by teachers’ value-added or lack thereof, even though (1) the teachers targeted only had one or two years of the three years of value-added data required by state statute, and even though (2) the district’s senior director of assessment, accountability and evaluation noted that, in line with a plethora of other research findings, teachers being evaluated using the state’s VAM have a 51% chance of changing their scores from one year to the next. This lack of reliability, as we know it, should outright prevent any such moves in that without some level of stability, valid inferences from which valid decisions are to be made cannot be drawn. It’s literally impossible.
Nonetheless, state board of education members “unanimously… threatened to take [all of the district’s poor performing schools] over or close them in 2017-18 if district officials [didn’t] do what [the Board said].” See also other tales of similar districts in the article available, again, here.
In Keep’s case, “her ‘unsatisfactory’ VAM score [that caused the district to move her, as] paired with her ‘highly effective’ in-class observations by her administrators brought her overall district evaluation to ‘effective’…[although she also notes that]…her VAM scores fluctuate because the state has created a moving target.” Regardless, Keep was notified “five days before teachers were due back to their assigned schools Aug. 8 [after which she was] told she had to report to a new school with a different start time that [also] disrupted her 13-year routine and family that shares one car.”
VAM-based chaos reigns, especially in Florida.
For those of you interested, and perhaps close to Houston, Texas, I will be presenting my research on the Houston Independent School District’s (now hopefully past) use of the Education Value-Added Assessment System for more high-stakes, teacher-level consequences than anywhere else in the nation.
As you may recall from prior posts (see, for example, here, here, and here), seven teachers in the district, with the support of the Houston Federation of Teachers (HFT), are taking the district to federal court over how their value-added scores are/were being used, and allegedly abused. The case, Houston Federation of Teachers, et al. v. Houston ISD, is still ongoing; although, also as per a prior post, the school board just this past June, in a 3:3 split vote, elected to no longer pay an annual $680K to SAS Institute Inc. to calculate the district’s EVAAS estimates. Hence, by non-renewing this contract it appears, at least for the time being, that the district is free from its prior history using the EVAAS for high-stakes accountability. See also this post here for an analysis of Houston’s test scores post EVAAS implementation, as compared to other districts in the state of Texas. Apparently, all of the time and energy invested did not pay off for the district, or more importantly its teachers and students located within its boundaries.
Anyhow, those presenting and attending the conference–the Houston Education and Civil Rights Summit, as also sponsored and supported by United Opt Out National–will prioritize and focus on the “continued challenges of public education and the teaching profession [that] have only been exacerbated by past and current policies and practices,” as well as “the shifting landscape of public education and its impact on civil and human rights and civil society.”
As mentioned, I will be speaking, alongside two featured speakers: Samuel Abrams–the Director of the National Center for the Study of Privatization in Education (NCSPE) and an instructor in Columbia’s Teachers College, and Julian Vasquez Heilig–Professor of Educational Leadership and Policy Studies at California State University, Sacramento and creator of the blog Cloaking Inequality. For more information about these and other speakers, many of whom are practitioners, see the conference website available, again, here.
When is it? Friday, October 14, 2016 at 4:00 PM through to Saturday, October 15, 2016 at 8:00 PM (CDT).
Where is it? Houston Hilton Post Oak – 2001 Post Oak Blvd, Houston, TX 77056
Hope to see you there!
You might recognize the title of this post from one of my all-time favorite Broadway shows: The Phantom Of The Opera – Masquerade/Why So Silent. I thought I would use it here, to explain my recent and notable silence on the topic of value-added models (VAMs).
First, I recently returned from summer break, although I still occasionally released blog posts when important events related to VAMs and their (ab)uses for teacher evaluation purposes occurred. More importantly, though, the frequency with which said important events have happened has, relatively, fortunately, and significantly declined.
Yes, the so-far-so-good news is that schools, school districts, and states are apparently not nearly as active in pursuing the use of VAMs for stronger teacher accountability purposes for educational reform. Likewise, schools, school districts, and states are not nearly as prone to make really silly (and stupid) decisions with these models, especially without the research supporting such decisions.
This is very much due to the federal government’s recent (January 1, 2016) passage of the Every Student Succeeds Act (ESSA), which no longer requires teachers to be evaluated by their students’ test scores, for example, using VAMs (see prior posts on this here and here).
While there are states, districts, and schools that are still moving forward with VAMs and their original high-stakes teacher evaluation plans as largely based on VAMs (e.g., New Mexico, Tennessee, Texas), many others have really begun to rethink the importance and vitality of VAMs as part of their teacher evaluation systems for educational reform (e.g., Alabama, Georgia, Oklahoma). This, of course, is primarily at the state level. Certainly, there are districts out there representing both sides of the same continuum.
Accordingly, however, I have had multiple conversations with colleagues and others regarding what I might do with this blog should people stop seriously investing and riding their teacher/educational reform efforts on VAMs. While I don’t think that this will ever happen, there is honestly nothing I would like more (as an academic) than to close this blog down, should educational policymakers, politicians, philanthropists, and others focus on new and entirely different, non-Draconian ways to reform America’s public schools. We shall see how it goes.
But for now, why have I been relatively so silent? The VAM as we currently know it, in use and implementation, might very well be turning into our VAMtom of the Profession 😉
For those of you who might recall, just over two years ago my book titled “Rethinking Value-Added Models in Education: Critical Perspectives on Tests and Assessment-Based Accountability,” was officially released by my publisher – Routledge, New York. The book has since been reviewed twice – once by Rachael Gabriel, an Assistant Professor at the University of Connecticut, in Education Review: A Multilingual Journal of Book Reviews (click here for the full review), and another time by Lauren Bryant, Research Scholar at North Carolina State University, in Teachers College Record (although the full review is no longer available for free).
It was just reviewed again, this time by Natalia Guzman, a doctoral student at the University of Maryland. This review was published, as well, in Education Review: A Multilingual Journal of Book Reviews (click here for the full review). Here are some of the highlights and key sections, especially important for those of you who might not yet have read the book, or know others who should.
- “Throughout the book, author Audrey Amrein-Beardsley synthesizes and critiques numerous studies and cases from both academic and popular outlets. The main themes that organize the content of book involve the development, implementation, consequences, and future of valued-added methods for teacher accountability: 1) the use of social engineering in American educational policy; 2) the negative impact on the human factor in schools; 3) the acceptance of unquestioned theoretical and methodological assumptions in VAMs; and 4) the availability of conventional alternatives and solutions to a newly created problem.”
- “The book’s most prominent theme, the use of social engineering in American educational policy, emerges in the introductory chapters of the book. The author argues that U.S. educational policy is predicated on the concept of social engineering—a powerful instrument that influences attitudes and social behaviors to promote the achievement of idealized political ends. In the case of American educational policy, the origins and development of VAMs is connected to the goal of improving student achievement and solving the problem of America’s failing public school system.”
- “The human factor involved in the implementation of VAMs emerges as a prominent theme…Amrein-Beardsley uses powerful examples of research-based accounts of how VAMs affected teachers and school districts, important aspects of the human factor involved in the implementation of these models.”
- “This reader appreciated the opportunity to learn about research that directly questions similar statistical and methodological assumptions in a way that was highly accessible, surprisingly, since discussions about VAM methodology tends to be highly technical.”
- “The book closes with an exploration of some traditional and conventional alternatives to VAMs…The virtue of [these] proposal[s] is that it contextualizes teacher evaluation, offering multiple perspectives of the complexity of teaching, and it engages different members of the school community, bringing in the voices of teacher colleagues, parents and/or students.”
- “Overall, this book offers one of the most comprehensive critiques of what we know about VAMs in the American public education system. The author contextualizes her critique to added-value methods in education within a larger socio-political discussion that revisits the history and evolution of teacher accountability in the US. The book incorporates studies from academic sources as well as summarizes cases from popular outlets such as newspapers and blogs. This author presents all this information using nontechnical language, which makes it suitable for the general public as well as academic readers. Another major contribution of this book is that it gives voice to the teachers and school administrators that were affected by VAMs, an aspect that has not yet been…”
Thanks go out to Natalia for such a great review, and also for effectively summarizing what she sees (and others have also seen) as the “value-added” in this book.
The American Prospect — a self-described “liberal intelligence” magazine — featured last week a question and answer, interview-based article with Jesse Rothstein — Professor of Economics at University of California – Berkeley — on “The Economic Consequences of Denying Teachers Tenure.” Rothstein is a great choice for this one in that indeed he is an economist, but one of a few, really, who is deep into the research literature and who, accordingly, has a balanced set of research-based beliefs about value-added models (VAMs), their current uses in America’s public schools, and what they can and cannot do (theoretically) to support school reform. He’s probably most famous for a study he conducted in 2009 about how the non-random, purposeful sorting of students into classrooms indeed biases (or distorts) value-added estimations, pretty much despite the sophistication of the statistical controls meant to block (or control for) such bias (or distorting effects). You can find this study referenced here, and a follow-up to this study here.
In this article, though, the interviewer — Rachel Cohen — interviews Jesse primarily about how in California a higher court recently reversed the Vergara v. California decision that would have weakened teacher employment protections throughout the state (see also here). “In 2014, in Vergara v. California, a Los Angeles County Superior Court judge ruled that a variety of teacher job protections worked together to violate students’ constitutional right to an equal education. This past spring, in a 3–0 decision, the California Court of Appeals threw this ruling out.”
Here are the highlights in my opinion, by question and answer, although there is much more information in the full article here:
Cohen: “Your research suggests that even if we got rid of teacher tenure, principals still wouldn’t fire many teachers. Why?”
Rothstein: “It’s basically because in most cases, there’s just not actually a long list of [qualified] people lining up to take the jobs; there’s a shortage of qualified teachers to hire. If you deny tenure to someone, that creates a new job opening. But if you’re not confident you’ll be able to fill it with someone else, that doesn’t make you any better off. Lots of schools recognize it makes more sense to keep the teacher employed, and incentivize them with tenure.” In addition, “I’ve studied this, and it’s basically economics 101. There is evidence that you get more people interested in teaching when the job is better, and there is evidence that firing teachers reduces the attractiveness of the job.”
Cohen: “Aren’t most teachers pretty bad their first year? Are we denying them a fair shot if we make tenure decisions so soon?”
Rothstein: “Even if they’re struggling, you can usually tell if things will turn out to be okay. There is quite a bit of evidence for someone to look at.”
Cohen: “Value-added models (VAM) played a significant role in the Vergara trial. You’ve done a lot of research on these tools. Can you explain what they are?”
Rothstein: “[The] value-added model is a statistical tool that tries to use student test scores to come up with estimates of teacher effectiveness. The idea is that if we define teacher effectiveness as the impact that teachers have on student test scores, then we can use statistics to try to then tell us which teachers are good and bad. VAM played an odd role in the trial. The plaintiffs were arguing that now, with VAM, we have these new reliable measures of teacher effectiveness, so we should use them much more aggressively, and we should throw out the job statutes. It was a little weird that the judge took it all at face value in his decision.”
Cohen: “When did VAM become popular?”
Rothstein: “I would say it became a big deal late in the [George W.] Bush administration. That’s partly because we had new databases that we hadn’t had previously, so it was possible to estimate on a large scale. It was also partly because computers had gotten better. And then VAM got a huge push from the Obama administration.”
Cohen: “So you’re skeptical of VAM.”
Rothstein: “I think the metrics are not as good as the plaintiffs made them out to be. There are bias issues, among others.”
Cohen: “During the Vergara trials you testified against some of Harvard economist Raj Chetty’s VAM research, and the two of you have been going back and forth ever since. Can you describe what you two are arguing about?”
Rothstein: “Raj’s testimony at the trial was very focused on his work regarding teacher VAM. After the trial, I really dug in to understand his work, and I probed into some of his assumptions, and found that they didn’t really hold up. So while he was arguing that VAM showed unbiased results, and VAM results tell you a lot about a teacher’s long-term outcomes, I concluded that what his approach really showed was that value-added scores are moderately biased, and that they don’t really tell us one way or another about a teacher’s long-term outcomes” (see more about this debate here).
Cohen: “Could VAM be improved?”
Rothstein: “It may be that there is a way to use VAM to make a better system than we have now, but we haven’t yet figured out how to do that. Our first attempts have been trying to use them in not very intelligent ways.”
Cohen: “It’s been two years since the Vergara trial. Do you think anything’s changed?”
Rothstein: “I guess in general there’s been a little bit of a political walk-back from the push for VAM. And this retreat is not necessarily tied to the research evidence; sometimes these things just happen. But I’m not sure the trial court opinion would have come out the same if it were held today.”
Again, see more from this interview, also about teacher evaluation systems in general, job protections, and the like in the full article here.
Citation: Cohen, R. M. (2016, August 4). Q&A: The economic consequences of denying teachers tenure. The American Prospect. Retrieved from http://prospect.org/article/qa-economic-consequences-denying-teachers-tenure
As we all likely know well by now, speakers for both parties during last and this weeks’ Republican and Democratic Conventions, respectively, spoke and in many cases spewed a number of exaggerated, misleading, and outright false claims about multiple areas of American public policy…educational policy included. Hence, many fact-checking journalists, websites, social mediaists, and the like, have since been trying to hold both parties accountable for their facts and make “the actual facts” more evident. For a funny video about all of this, actually, see HBO’s John Oliver’s most recent bit on “last week’s unsurprisingly surprising Republican convention” here (11 minutes) and some of their expressions of “feelings” as “facts.”
Fittingly, The 74, an (allegedly) non-partisan, honest, and fact-based news site (ironically) covering America’s education system “in crisis,” publishing articles “backed by investigation, expertise, and experience,” and backed by Editor-in-Chief Campbell Brown, took on such a fact-checking challenge in an article senior staff writer Matt Barnum wrote: “Researchers: No Consensus Against Using Test Scores in Teacher Evaluations, Contra Democratic Platform.”
Apparently, what author Barnum actually did to justify the title and contents of his article, however, was (1) take the claim written into the 55-page “2016 Democratic Party Platform” document that: “We [the Democratic Party] oppose…the use of student test scores in teacher and principal evaluations, a practice which has been repeatedly rejected by researchers” (p. 33); then (2) generalize what being “repeatedly rejected by researchers” means, to inferring that a “consensus,” “wholesale,” and “categorical rejection” among researchers “that such scores should not be used whatsoever in evaluation” exists; then (3) proceed to ask a non-random, non-representative sample of nine researchers on the topic about whether, indeed, his deduced conclusion was true; to (4) ultimately claim that “the [alleged] suggestion that there is a scholarly consensus against using test scores in teacher evaluation is misleading.”
Misleading, rather, is Barnum’s framing of his entire piece, as Barnum twisted the original statement into something more alarmist, which apparently warranted his fact-checking, after which he engaged in a weak convenience-based investigation, with unsubstantiated findings ultimately making the headline of his subsequent article. It seems that those involved in reporting “the actual facts” also need some serious editing and fact-checking themselves in that “The 74’s poll of just nine researchers [IS NOT, rather than ‘may not be’] a representative sample of expert opinion,” whatsoever.
Nonetheless, the nine respondents (also without knowledge of who was contacted but did not respond, i.e., a response rate) included: Dan Goldhaber — Adjunct Professor of Education and Economics at the University of Washington, Bothell; Kirabo Jackson — Associate Professor of Education and Economics at Northwestern University; Cory Koedel — Associate Professor of Economics and Public Policy at the University of Missouri; Matthew Kraft — Assistant Professor of Education and Economics at Brown University; Susan Moore Johnson — Professor of Teacher Policy at Harvard University; Jesse Rothstein — Professor of Public Policy and Economics at the University of California, Berkeley; Matthew Steinberg — Assistant Professor of Educational Policy at the University of Pennsylvania; Katharine Strunk — Associate Professor of Educational Policy at the University of Southern California; Jim Wyckoff — Professor of Educational Policy at the University of Virginia. You can see what appear to be these researchers’ full responses to Barnum’s undisclosed solicitation at the bottom of this article, available again here, noting that the opinions of these nine are individually important as I too would value some of these nine as among (but not representative of) the experts in the area of research (see a fuller list of 37 such experts here, 2/3rds of whom are listed above).
Regardless, and assuming that Barnum’s original misinterpretation was correct, I think how Katharine Strunk put it is likely more representative of the group of researchers on this topic as a whole as based on the research: “I think the research suggests that we need multiple measures — test scores [depending on the extent to which evidence supports low- and more importantly high-stakes use], observations, and others – to rigorously and fairly evaluate teachers.” Likewise, how Jesse Rothstein framed his response, in my opinion, is another takeaway for those looking for what is more likely a more accurate and representative statement on this hypothetical consensus: “the weight of the evidence, and the weight of expert opinion, points to the conclusion that we haven’t figured out ways to use test scores in teacher evaluations that yield benefits greater than costs.”
With that being said, what is most likely the “fact” desired in this particular instance is that “the use of student test scores in teacher and principal evaluations, [IS] a practice which has been repeatedly rejected by researchers.” But it has also been disproportionately promoted by researchers with disciplinary backgrounds in economics (although this is not always the case), and disproportionately rejected by those with disciplinary backgrounds in education, educational policy, educational measurement and statistics, and the like (although this is not always the case). The bottom line is that reaching a consensus in this area of research is much more difficult than Barnum and others might otherwise assume.
Should one really want to “factually” answer such a question, (s)he would have to more carefully: (1) define the problem and subsequent research question (e.g., the platform never claimed in the first place that said “consensus” existed), (2) engage in background research to (3) methodically define the population of researchers from which (4) the research sample is to be drawn to adequately represent the population, after which (5) an appropriate response rate is to be secured. If there are methodological weaknesses in any of these steps, the research exercise should likely stop, as Barnum should have during step #1 in this case here.
As per a recent article by Chalkbeat Colorado, “Denver Public Schools [is] Set to Strip Nearly 50 Teachers of Tenure Protections after [two consecutive years of] Poor Evaluations.” This will make Denver Public Schools, Colorado’s largest school district, the district with the highest relative proportion of teachers to lose tenure, which demotes teachers to probationary status and also causes them to lose their due process rights.
- The majority of the 47 teachers — 26 of them — are white. Another 14 are Latino, four are African-American, two are multi-racial and one is Asian.
- Thirty-one of the 47 teachers set to lose tenure — or 66 percent — teach in “green” or “blue” schools, the two highest ratings on Denver’s color-coded School Performance Framework. Only three — or 6 percent — teach in “red” schools, the lowest rating.
- Thirty-eight of the 47 teachers — or 81 percent — teach at schools where more than half of the students qualify for federally subsidized lunches, an indicator of poverty.
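For those who like to check the arithmetic, here is a minimal Python sketch that recomputes the shares quoted in the figures above. The counts are the ones reported in the article; the variable and function names are my own and purely illustrative.

```python
# Counts as reported in the Chalkbeat Colorado figures quoted above.
TOTAL_TEACHERS = 47

# Racial/ethnic breakdown of the teachers set to lose tenure.
by_race = {
    "white": 26,
    "Latino": 14,
    "African-American": 4,
    "multi-racial": 2,
    "Asian": 1,
}

# Subgroup counts by type of school in which the teachers work.
by_school = {
    "green or blue schools": 31,
    "red schools": 3,
    "schools with majority subsidized-lunch students": 38,
}


def percent_of_total(count, total=TOTAL_TEACHERS):
    """Return count as a whole-number percentage of total."""
    return round(100 * count / total)


# The racial/ethnic counts should add up to the reported total.
race_total = sum(by_race.values())
print(f"Sum of breakdown: {race_total} of {TOTAL_TEACHERS}")

for label, count in by_school.items():
    print(f"{label}: {count} of {TOTAL_TEACHERS} = {percent_of_total(count)}%")
```

The rounded shares match the 66, 6, and 81 percent figures quoted above, and the racial/ethnic counts sum to the 47 teachers reported.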
Elsewhere, in Douglas County 24, in Aurora 12, in Cherry Creek one, and in Jefferson County, the state’s second largest district, zero teachers are set to lose their tenure status. This all follows from a sweeping educator effectiveness law, Senate Bill 191, passed throughout Colorado six years ago. As per this law, “at least 50 percent of a teacher’s evaluation [must] be based on student academic growth.”
“Because this is the first year teachers can lose that status…[however]…officials said it’s difficult to know why the numbers differ from district to district.” This, of course, is an issue with fairness whereby a court, for example, could find that if a teacher is teaching in District X versus District Y, and (s)he had a different probability of losing tenure due only to the District in which (s)he taught, this could be quite easily argued as an arbitrary component of the law, not to mention an arbitrary component of its implementation. If I were advising these districts on these matters, I would certainly advise them to tread lightly.
However, apparently many districts throughout Colorado use a state-developed and endorsed model to evaluate their teachers, but Denver uses its own model; hence, this would likely take some of the pressure off of the state, should this end up in court, and place it more so upon the district. That is, the burden of proof would likely rest on Denver Public School officials to evidence that they are not only complying with the state law but that they are doing so in sound, evidence-based, and rational/reasonable ways.
Citation: Asmar, M. (2016, July 15). Denver Public Schools set to strip nearly 50 teachers of tenure protections after poor evaluations. Chalkbeat Colorado. Retrieved from http://www.chalkbeat.org/posts/co/2016/07/14/denver-public-schools-set-to-strip-nearly-50-teachers-of-tenure-protections-after-poor-evaluations/#.V5Yryq47Tof
For those of you looking for a good read, you may want to check out this new book: “Learning from the Federal Market‐Based Reforms: Lessons for ESSA [the Every Student Succeeds Act]” here.
As Larry Cuban put it, the book’s editors have a “cast of all-star scholars” in this volume, and in Gloria Ladson-Billings’ words, the editors “assembled some of the nation’s best minds” to examine the evidence on today’s market-based reforms as well as more promising, equitable ones. For full disclosure, I have a chapter in this book about using value-added models (VAMs) to measure and evaluate teacher education programs (see below), although I am not making any royalties from book sales.
If interested, you can purchase the book at a reduced price of $30 (from $40) per paperback thru 7/31/17, using the following discount code at checkout: LFMBR30350. Here, again, is the link.
ABOUT THE BOOK: Over the past twenty years, educational policy has been characterized by top‐down, market‐focused policies combined with a push toward privatization and school choice. The new Every Student Succeeds Act continues along this path, though with decision‐making authority now shifted toward the states. These market‐based reforms have often been touted as the most promising response to the challenges of poverty and educational disenfranchisement. But has this approach been successful? Has learning improved? Have historically low‐scoring schools “turned around” or have the reforms had little effect? Have these narrow conceptions of schooling harmed the civic and social purposes of education in a democracy?
This book presents the evidence. Drawing on the work of the nation’s most prominent researchers, the book explores the major elements of these reforms, as well as the social, political, and educational contexts in which they take place. It examines the evidence supporting the most common school improvement strategies: school choice; reconstitutions, or massive personnel changes; and school closures. From there, it presents the research findings cutting across these strategies by addressing the evidence on test score trends, teacher evaluation, “miracle” schools, the Common Core State Standards, school choice, the newly emerging school improvement industry, and re‐segregation, among others.
The weight of the evidence indisputably shows little success and no promise for these reforms. Thus, the authors counsel strongly against continuing these failed policies. The book concludes with a review of more promising avenues for educational reform, including the necessity of broader societal investments for combatting poverty and adverse social conditions. While schools cannot single‐handedly overcome societal inequalities, important work can take place within the public school system, with evidence‐based interventions such as early childhood education, detracking, adequate funding and full‐service community schools—all intended to renew our nation’s commitment to democracy and equal educational opportunity.
CONTENTS BY SECTION AND CHAPTER
Foreword, Jeannie Oakes
SECTION I: INTRODUCTION: THE FOUNDATIONS OF MARKET BASED REFORM
- Purposes of Education: The Language of Schooling, Mike Rose.
- The Political Context, Janelle Scott.
- Historical Evolution of Test‐Based Reforms, Harvey Kantor and Robert Lowe.
- Predictable Failure of Test‐Based Accountability, Heinrich Mintrop and Gail Sunderman.
SECTION II: TEST‐BASED SANCTIONS: WHAT THE EVIDENCE SAYS
- Transformation & Reconstitution, Betty Malen and Jennifer King Rice.
- Turnarounds, Tina Trujillo and Michelle Valladares.
- Restart/Conversion, Gary Miron and Jessica Urschel.
- Closures, Ben Kirshner, Erica Van Steenis, Kristen Pozzoboni, and Matthew Gaertner.
SECTION III: FALSE PROMISES
- Miracle School Myth, P. L. Thomas.
- Has Test‐Based Accountability Worked? Committee on Incentives and Test‐Based Accountability in Public Education (Michael Hout & Stuart Elliott, Eds.).
- The Effectiveness of Test‐Based Reforms, Kevin Welner and William Mathis.
- Value Added Models: Teacher, Principal and School Evaluations, American Statistical Association.
- The Problems with the Common Core, Stan Karp.
- Reform and Re‐Segregation, Gary Orfield.
- English Language Learners, Angela Valenzuela and Brendan Maxcy.
- Racial Disproportionality: Discipline, Anne Gregory, Russell Skiba, and Pedro Noguera.
- School Choice, Christopher Lubienski and Sarah Theule Lubienski.
- The Privatization Industry, Patricia Burch and Jahni Smith.
- Virtual Education, Michael Barbour.
SECTION IV: EFFECTIVE REFORMS
- Addressing Poverty, David Berliner.
- Racial Segregation & Achievement, Richard Rothstein.
- Adequate Funding, Michael Rebell.
- Early Childhood Education, Steven Barnett.
- De‐Tracking, Kevin Welner and Carol Corbett Burris.
- Class Size, Diane Whitmore Schanzenbach.
- School–Community Partnerships, Linda Valli, Amanda Stefanski, and Reuben Jacobson.
- Community Organizing for Grassroots Support, Mark Warren.
- Teacher Education, Audrey Amrein‐Beardsley, Joshua Barnett, and Tirupalavanam Ganesh.
SECTION V: CONCLUSION
Thomas Toch — education policy expert and research fellow at Georgetown University, and founding director of the Center on the Future of American Education — just released, as part of the Center, a report titled: Grading the Graders: A Report on Teacher Evaluation Reform in Public Education. He sent this to me for my thoughts, and I decided to summarize my thoughts here, with thanks and all due respect to the author, as clearly we are on different sides of the spectrum in terms of the literal “value” America’s new teacher evaluation systems might in fact “add” to the reformation of America’s public schools.
While this is quite a long and meaty report, here are some of the points that I think are important to address publicly:
First, is it true that using prior teacher evaluation systems (which were almost if not entirely based on teacher observational systems) yielded for “nearly every teacher satisfactory ratings”? Indeed, this is true. However, what we have seen since 2009, when states began to adopt what were then (and in many ways still are) viewed as America’s “new and improved” or “strengthened” teacher evaluation systems, is that for 70% of America’s teachers, these teacher evaluation systems are still based only on the observational indicators being used prior, because for only 30% of America’s teachers are value-added estimates calculable. As also noted in this report, it is for these 70% that “the superficial teacher [evaluation] practices of the past” (p. 2) will remain the same, although I disagree with this particular adjective, especially when these measures are used for formative purposes. While certainly imperfect, these are not simply “flimsy checklists” of no use or value. There is, indeed, much empirical research to support this assertion.
Likewise, these observational systems have not really changed since 2009, or since 1999 for that matter (not that they could change all that much), so they are hardly in their “early stages” (p. 2) of development. This includes the Danielson Framework, explicitly propped up in this piece as an exemplar, even though it has been used across states and districts for decades and is still not functioning as intended, especially when summative decisions about teacher effectiveness are to be made (see, for example, here).
Hence, in some states and districts (sometimes via educational policy), principals or other observers are now being asked or required to deliberately assign teachers to lower observational categories, or to assign approximate proportions of teachers to each observational category used. Where the instrument might not distribute scores “as currently needed,” one way to game the system is to tell principals, for example, that they should only allot X% of teachers to each of the three-to-five categories most often used across said instruments. In fact, in an article one of my doctoral students and I have forthcoming, we have termed this, with empirical evidence, the “artificial deflation” of observational scores, as externally persuaded or required. Worse is that this sometimes signals to the greater public that these “new and improved” teacher evaluation systems are being used for more discriminatory purposes (i.e., to actually differentiate between good and bad teachers on some sort of discriminating continuum), or that, indeed, there is a normal distribution of teachers, as per their levels of effectiveness. While certainly there is some type of distribution, no evidence exists whatsoever to suggest that those who fall on the wrong side of the mean are, in fact, ineffective, and vice versa. It’s all relative, seriously, and unfortunately.
Relatedly, the goal here is really not to “thoughtfully compare teacher performances,” but to evaluate teachers as per a set of criteria against which they can be evaluated and judged (i.e., whereby criterion-referenced inferences and decisions can be made). Conversely, comparing teachers in norm-referenced ways, however (socially) Darwinian and however resonant with many, does not necessarily work, either. This is precisely what the authors of The Widget Effect report did, after which they argued for wide-scale system reform, so that increased discrimination among teachers, and reduced indifference on the part of evaluating principals, could occur. However, as also evidenced in this aforementioned article, the increasing presence of normal curves illustrating “new and improved” teacher observational distributions does not necessarily mean anything normal.
And were these systems used too “rarely” in the past to fire teachers? Perhaps, although there are no data to support such assertions, either. This very argument was at the heart of the Vergara v. California case (see, for example, here): that teacher tenure laws, as well as laws protecting teachers’ due process rights, were keeping “grossly ineffective” teachers teaching in the classroom. Again, while no expert on either side could produce for the Court any hard numbers regarding how many “grossly ineffective” teachers were in fact being protected by such archaic rules and procedures, I would estimate (as based on my years of experience as a teacher) that this number is much lower than many believe (and perhaps perpetuate) it to be. In fact, there was only one teacher I recall, who taught with me in a highly urban school, whom I would have classified as grossly ineffective, and also tenured. He was ultimately fired, and was quite easy to fire, as he also knew that he just didn’t have it.
Now to be clear, here, I do think that not just “grossly ineffective” but also simply “bad teachers” should be fired, but the indicators used to do this must yield valid inferences, as based on the evidence, as critically and appropriately consumed by the parties involved, after which valid and defensible decisions can and should be made. Whether one calls this due process in a proactive sense, or a wrongful termination suit in a retroactive sense, what matters most, though, is that the evidence supports the decision. This is the very issue at the heart of many of the lawsuits currently ongoing on this topic, as many of you know (see, for example, here).
Finally, where is the evidence, I ask, for many of the declarations included within and throughout this report? A review of the 133 endnotes included, for example, reveals only a very small handful of references to the larger literature on this topic (see very comprehensive lists of this literature here, here, and here). This is also highly problematic in this piece, as only the usual suspects (e.g., Sandi Jacobs, Thomas Kane, Bill Sanders) are cited to support the assertions advanced.
Take, for example, the following declaration: “a large and growing body of state and local implementation studies, academic research, teacher surveys, and interviews with dozens of policymakers, experts, and educators all reveal a much more promising picture: The reforms have strengthened many school districts’ focus on instructional quality, created a foundation for making teaching a more attractive profession, and improved the prospects for student achievement” (p. 1). Where is the evidence? There is no such evidence, and no such evidence published in high-quality, scholarly peer-reviewed journals of which I am aware. Again, publications released by the National Council on Teacher Quality (NCTQ) and from the Measures of Effective Teaching (MET) studies, which are still not externally reviewed and are still considered internal technical reports with “issues,” don’t necessarily count. Accordingly, no such evidence has been introduced, by either side, in any court case in which I am involved, likely because such evidence does not exist, again, empirically and at some unbiased, vetted, and/or generalizable level. While Thomas Kane has introduced some of his MET study findings in the cases in Houston and New Mexico, these might be some of the easiest pieces of evidence to target, accordingly, given the issues.
Otherwise, the only thing in this piece with which I agree, and which I view as true and good given the research literature, is that teachers are now being observed more often, by more people, in more depth, and in perhaps some cases with better observational instruments. Accordingly, teachers, also as per the research, seem to appreciate and enjoy the additional and more frequent/useful feedback and discussions about their practice, as increasingly offered. This, I would agree, is something very positive that has come out of the nation’s policy-based focus on its “new and improved” teacher evaluation systems, again, as largely required by the federal government, especially pre-Every Student Succeeds Act (ESSA).
Overall, and in sum, “the research reveals that comprehensive teacher-evaluation models are stronger than the sum of their parts.” Unfortunately again, however, this is untrue in that systems based on multiple measures are entirely limited by the indicator that, in educational measurement terms, performs the worst. While such a holistic view is ideal, in measurement terms the sum of the parts is entirely limited by the weakest part. This is currently the value-added indicator (i.e., with the lowest levels of reliability and, related, issues with validity and bias), the indicator at issue within this particular blog, and the indicator of the most interest, as it is this indicator that has truly changed our overall approaches to the evaluation of America’s teachers. It has yet to deliver, however, especially if it is to be used for high-stakes consequential decision-making purposes (e.g., incentives, getting rid of “bad apples”).
Feel free to read more here, as publicly available: Grading the Graders: A Report on Teacher Evaluation Reform in Public Education. See also other claims regarding the benefits of said systems within (e.g., these systems as foundations for new teacher roles and responsibilities, smarter employment decisions, prioritizing classrooms, increased focus on improved standards). See also the recommendations offered, some with which I agree on the observational side (e.g., ensuring that teachers receive multiple observations during a school year by multiple evaluators), and none with which I agree on the value-added side (e.g., use at least two years of student achievement data in teacher evaluation ratings; rather, researchers agree that three years of value-added data are needed, as based on at least four years of student-level test data). There are, of course, many other recommendations included. You all can be the judges of those.