The Gates Foundation’s Expensive ($335 Million) Teacher Evaluation Missteps

The header of an Education Week article released last week (click here) was that “[t]he Bill & Melinda Gates Foundation’s multi-million-dollar, multi-year effort aimed at making teachers more effective largely fell short of its goal to increase student achievement-including among low-income and minority students.”

An evaluation of Gates Foundation’s Intensive Partnerships for Effective Teaching initiative funded at $290 million, an extension of its Measures of Effective Teaching (MET) project funded at $45 million, was the focus of this article. The MET project was lead by Thomas Kane (Professor of Education and Economics at Harvard, former leader of the MET project, and expert witness on the defendant’s side of the ongoing lawsuit supporting New Mexico’s MET project-esque statewide teacher evaluation system; see here and here), and both projects were primarily meant to hold teachers accountable using their students test scores via growth or value-added models (VAMs) and financial incentives. Both projects were tangentially meant to improve staffing, professional development opportunities, improve the retention of the teachers of “added value,” and ultimately lead to more-effective teaching and student achievement, especially in low-income schools and schools with higher relative proportions of racial minority students. The six-year evaluation of focus in this Education Week article was conducted by the RAND Corporation and the American Institutes for Research, and the evaluation was also funded by the Gates Foundation (click here for the evaluation report, see below for the full citation of this study).

Their key finding was that Intensive Partnerships for Effective Teaching district/school sites (see them listed here) implemented new measures of teaching effectiveness and modified personnel policies, but they did not achieve their goals for students.

Evaluators also found (see also here):

  • The sites succeeded in implementing measures of effectiveness to evaluate teachers and made use of the measures in a range of human-resource decisions.
  • Every site adopted an observation rubric that established a common understanding of effective teaching. Sites devoted considerable time and effort to train and certify classroom observers and to observe teachers on a regular basis.
  • Every site implemented a composite measure of teacher effectiveness that included scores from direct classroom observations of teaching and a measure of growth in student achievement.
  • Every site used the composite measure to varying degrees to make decisions about human resource matters, including recruitment, hiring, placement, tenure, dismissal, professional development, and compensation.

Overall, the initiative did not achieve its goals for student achievement or graduation, especially for low-income and racial minority students. With minor exceptions, student achievement, access to effective teaching, and dropout rates were also not dramatically better than they were for similar sites that did not participate in the intensive initiative.

Their recommendations were as follows (see also here):

  • Reformers should not underestimate the resistance that could arise if changes to teacher-evaluation systems have major negative consequences.
  • A near-exclusive focus on teacher evaluation systems such as these might be insufficient to improve student outcomes. Many other factors might also need to be addressed, ranging from early childhood education, to students’ social and emotional competencies, to the school learning environment, to family support. Dramatic improvement in outcomes, particularly for low-income and racial minority students, will likely require attention to many of these factors as well.
  • In change efforts such as these, it is important to measure the extent to which each of the new policies and procedures is implemented in order to understand how the specific elements of the reform relate to outcomes.

Reference:

Stecher, B. M., Holtzman, D. J., Garet, M. S., Hamilton, L. S., Engberg, J., Steiner, E. D., Robyn, A., Baird, M. D., Gutierrez, I. A., Peet, E. D., de los Reyes, I. B., Fronberg, K., Weinberger, G., Hunter, G. P., & Chambers, J. (2018). Improving teaching effectiveness: Final report. The Intensive Partnerships for Effective Teaching through 2015–2016. Santa Monica, CA: The RAND Corporation. Retrieved from https://www.rand.org/pubs/research_reports/RR2242.html

A North Carolina Teacher’s Guest Post on His/Her EVAAS Scores

A teacher from the state of North Carolina recently emailed me for my advice regarding how to help him/her read and understand his/her recently received Education Value-Added Assessment System (EVAAS) value added scores. You likely recall that the EVAAS is the model I cover most on this blog, also in that this is the system I have researched the most, as well as the proprietary system adopted by multiple states (e.g., Ohio, North Carolina, and South Carolina) and districts across the country for which taxpayers continue to pay big $. Of late, this is also the value-added model (VAM) of sole interest in the recent lawsuit that teachers won in Houston (see here).

You might also recall that the EVAAS is the system developed by the now late William Sanders (see here), who ultimately sold it to SAS Institute Inc. that now holds all rights to the VAM (see also prior posts about the EVAAS here, here, here, here, here, and here). It is also important to note, because this teacher teaches in North Carolina where SAS Institute Inc. is located and where its CEO James Goodnight is considered the richest man in the state, that as a major Grand Old Party (GOP) donor “he” helps to set all of of the state’s education policy as the state is also dominated by Republicans. All of this also means that it is unlikely EVAAS will go anywhere unless there is honest and open dialogue about the shortcomings of the data.

Hence, the attempt here is to begin at least some honest and open dialogue herein. Accordingly, here is what this teacher wrote in response to my request that (s)he write a guest post:

***

SAS Institute Inc. claims that the EVAAS enables teachers to “modify curriculum, student support and instructional strategies to address the needs of all students.”  My goal this year is to see whether these claims are actually possible or true. I’d like to dig deep into the data made available to me — for which my state pays over $3.6 million per year — in an effort to see what these data say about my instruction, accordingly.

For starters, here is what my EVAAS-based growth looks like over the past three years:

As you can see, three years ago I met my expected growth, but my growth measure was slightly below zero. The year after that I knocked it out of the park. This past year I was right in the middle of my prior two years of results. Notice the volatility [aka an issue with VAM-based reliability, or consistency, or a lack thereof; see, for example, here].

Notwithstanding, SAS Institute Inc. makes the following recommendations in terms of how I should approach my data:

Reflecting on Your Teaching Practice: Learn to use your Teacher reports to reflect on the effectiveness of your instructional delivery.

The Teacher Value Added report displays value-added data across multiple years for the same subject and grade or course. As you review the report, you’ll want to ask these questions:

  • Looking at the Growth Index for the most recent year, were you effective at helping students to meet or exceed the Growth Standard?
  • If you have multiple years of data, are the Growth Index values consistent across years? Is there a positive or negative trend?
  • If there is a trend, what factors might have contributed to that trend?
  • Based on this information, what strategies and instructional practices will you replicate in the current school year? What strategies and instructional practices will you change or refine to increase your success in helping students make academic growth?

Yet my growth index values are not consistent across years, as also noted above. Rather, my “trends” are baffling to me.  When I compare those three instructional years in my mind, nothing stands out to me in terms of differences in instructional strategies that would explain the fluctuations in growth measures, either.

So let’s take a closer look at my data for last year (i.e., 2016-2017).  I teach 7th grade English/language arts (ELA), so my numbers are based on my students reading grade 7 scores in the table below.

What jumps out for me here is the contradiction in “my” data for achievement Levels 3 and 4 (achievement levels start at Level 1 and top out at Level 5, whereas levels 3 and 4 are considered proficient/middle of the road).  There is moderate evidence that my grade 7 students who scored a Level 4 on the state reading test exceeded the Growth Standard.  But there is also moderate evidence that my same grade 7 students who scored Level 3 did not meet the Growth Standard.  At the same time, the number of students I had demonstrating proficiency on the same reading test (by scoring at least a 3) increased from 71% in 2015-2016 (when I exceeded expected growth) to 76% in school year 2016-2017 (when my growth declined significantly). This makes no sense, right?

Hence, and after considering my data above, the question I’m left with is actually really important:  Are the instructional strategies I’m using for my students whose achievement levels are in the middle working, or are they not?

I’d love to hear from other teachers on their interpretations of these data.  A tool that costs taxpayers this much money and impacts teacher evaluations in so many states should live up to its claims of being useful for informing our teaching.

The Late Stephen Jay Gould on IQ Testing (with Implications for Testing Today)

One of my doctoral students sent me a YouTube video I feel compelled to share with you all. It is an interview with one of my all time favorite and most admired academics — Stephen Jay Gould. Gould, who passed away at age 60 from cancer, was a paleontologist, evolutionary biologist, and scientist who spent most of his academic career at Harvard. He was “one of the most influential and widely read writers of popular science of his generation,” and he was also the author of one of my favorite books of all time: The Mismeasure of Man (1981).

In The Mismeasure of Man Gould examined the history of psychometrics and the history of intelligence testing (e.g., the methods of nineteenth century craniometry, or the physical measures of peoples’ skulls to “objectively” capture their intelligence). Gould examined psychological testing and the uses of all sorts of tests and measurements to inform decisions (which is still, as we know, uber-relevant today) as well as “inform” biological determinism (i.e., “the view that “social and economic differences between human groups—primarily races, classes, and sexes—arise from inherited, inborn distinctions and that society, in this sense, is an accurate reflection of biology). Gould also examined in this book the general use of mathematics and “objective” numbers writ large to measure pretty much anything, as well as to measure and evidence predetermined sets of conclusions. This book is, as I mentioned, one of the best. I highly recommend it to all.

In this seven-minute video, you can get a sense of what this book is all about, as also so relevant to that which we continue to believe or not believe about tests and what they really are or are not worth. Thanks, again, to my doctoral student for finding this as this is a treasure not to be buried, especially given Gould’s 2002 passing.

Houston Lawsuit Update, with Summary of Expert Witnesses’ Findings about the EVAAS

Recall from a prior post that a set of teachers in the Houston Independent School District (HISD), with the support of the Houston Federation of Teachers (HFT) are taking their district to federal court to fight for their rights as professionals, and how their value-added scores, derived via the Education Value-Added Assessment System (EVAAS), have allegedly violated them. The case, Houston Federation of Teachers, et al. v. Houston ISD, is to officially begin in court early this summer.

More specifically, the teachers are arguing that EVAAS output are inaccurate, the EVAAS is unfair, that teachers are being evaluated via the EVAAS using tests that do not match the curriculum they are to teach, that the EVAAS system fails to control for student-level factors that impact how well teachers perform but that are outside of teachers’ control (e.g., parental effects), that the EVAAS is incomprehensible and hence very difficult if not impossible to actually use to improve upon their instruction (i.e., actionable), and, accordingly, that teachers’ due process rights are being violated because teachers do not have adequate opportunities to change as a results of their EVAAS results.

The EVAAS is the one value-added model (VAM) on which I’ve conducted most of my research, also in this district (see, for example, here, here, here, and here); hence, I along with Jesse Rothstein – Professor of Public Policy and Economics at the University of California – Berkeley, who also conducts extensive research on VAMs – are serving as the expert witnesses in this case.

What was recently released regarding this case is a summary of the contents of our affidavits, as interpreted by authors of the attached “EVAAS Litigation UPdate,” in which the authors declare, with our and others’ research in support, that “Studies Declare EVAAS ‘Flawed, Invalid and Unreliable.” Here are the twelve key highlights, again, as summarized by the authors of this report and re-summarized, by me, below:

  1. Large-scale standardized tests have never been validated for their current uses. In other words, as per my affidavit, “VAM-based information is based upon large-scale achievement tests that have been developed to assess levels of student achievement, but not levels of growth in student achievement over time, and not levels of growth in student achievement over time that can be attributed back to students’ teachers, to capture the teachers’ [purportedly] causal effects on growth in student achievement over time.”
  2. The EVAAS produces different results from another VAM. When, for this case, Rothstein constructed and ran an alternative, albeit sophisticated VAM using data from HISD both times, he found that results “yielded quite different rankings and scores.” This should not happen if these models are indeed yielding indicators of truth, or true levels of teacher effectiveness from which valid interpretations and assertions can be made.
  3. EVAAS scores are highly volatile from one year to the next. Rothstein, when running the actual data, found that while “[a]ll VAMs are volatile…EVAAS growth indexes and effectiveness categorizations are particularly volatile due to the EVAAS model’s failure to adequately account for unaccounted-for variation in classroom achievement.” In addition, volatility is “particularly high in grades 3 and 4, where students have relatively few[er] prior [test] scores available at the time at which the EVAAS scores are first computed.”
  4. EVAAS overstates the precision of teachers’ estimated impacts on growth. As per Rothstein, “This leads EVAAS to too often indicate that teachers are statistically distinguishable from the average…when a correct calculation would indicate that these teachers are not statistically distinguishable from the average.”
  5. Teachers of English Language Learners (ELLs) and “highly mobile” students are substantially less likely to demonstrate added value, as per the EVAAS, and likely most/all other VAMs. This, what we term as “bias,” makes it “impossible to know whether this is because ELL teachers [and teachers of highly mobile students] are, in fact, less effective than non-ELL teachers [and teachers of less mobile students] in HISD, or whether it is because the EVAAS VAM is biased against ELL [and these other] teachers.”
  6. The number of students each teacher teaches (i.e., class size) also biases teachers’ value-added scores. As per Rothstein, “teachers with few linked students—either because they teach small classes or because many of the students in their classes cannot be used for EVAAS calculations—are overwhelmingly [emphasis added] likely to be assigned to the middle effectiveness category under EVAAS (labeled “no detectable difference [from average], and average effectiveness”) than are teachers with more linked students.”
  7. Ceiling effects are certainly an issue. Rothstein found that in some grades and subjects, “teachers whose students have unusually high prior year scores are very unlikely to earn high EVAAS scores, suggesting that ‘ceiling effects‘ in the tests are certainly relevant factors.” While EVAAS and HISD have previously acknowledged such problems with ceiling effects, they apparently believe these effects are being mediated with the new and improved tests recently adopted throughout the state of Texas. Rothstein, however, found that these effects persist even given the new and improved.
  8. There are major validity issues with “artificial conflation.” This is a term I recently coined to represent what is happening in Houston, and elsewhere (e.g., Tennessee), when district leaders (e.g., superintendents) mandate or force principals and other teacher effectiveness appraisers or evaluators, for example, to align their observational ratings of teachers’ effectiveness with value-added scores, with the latter being the “objective measure” around which all else should revolve, or align; hence, the conflation of the one to match the other, even if entirely invalid. As per my affidavit, “[t]o purposefully and systematically endorse the engineering and distortion of the perceptible ‘subjective’ indicator, using the perceptibly ‘objective’ indicator as a keystone of truth and consequence, is more than arbitrary, capricious, and remiss…not to mention in violation of the educational measurement field’s Standards for Educational and Psychological Testing” (American Educational Research Association (AERA), American Psychological Association (APA), National Council on Measurement in Education (NCME), 2014).
  9. Teaching-to-the-test is of perpetual concern. Both Rothstein and I, independently, noted concerns about how “VAM ratings reward teachers who teach to the end-of-year test [more than] equally effective teachers who focus their efforts on other forms of learning that may be more important.”
  10. HISD is not adequately monitoring the EVAAS system. According to HISD, EVAAS modelers keep the details of their model secret, even from them and even though they are paying an estimated $500K per year for district teachers’ EVAAS estimates. “During litigation, HISD has admitted that it has not performed or paid any contractor to perform any type of verification, analysis, or audit of the EVAAS scores. This violates the technical standards for use of VAM that AERA specifies, which provide that if a school district like HISD is going to use VAM, it is responsible for ‘conducting the ongoing evaluation of both intended and unintended consequences’ and that ‘monitoring should be of sufficient scope and extent to provide evidence to document the technical quality of the VAM application and the validity of its use’ (AERA Statement, 2015).
  11. EVAAS lacks transparency. AERA emphasizes the importance of transparency with respect to VAM uses. For example, as per the AERA Council who wrote the aforementioned AERA Statement, “when performance levels are established for the purpose of evaluative decisions, the methods used, as well as the classification accuracy, should be documented and reported” (AERA Statement, 2015). However, and in contrast to meeting AERA’s requirements for transparency, in this district and elsewhere, as per my affidavit, the “EVAAS is still more popularly recognized as the ‘black box’ value-added system.”
  12. Related, teachers lack opportunities to verify their own scores. This part is really interesting. “As part of this litigation, and under a very strict protective order that was negotiated over many months with SAS [i.e., SAS Institute Inc. which markets and delivers its EVAAS system], Dr. Rothstein was allowed to view SAS’ computer program code on a laptop computer in the SAS lawyer’s office in San Francisco, something that certainly no HISD teacher has ever been allowed to do. Even with the access provided to Dr. Rothstein, and even with his expertise and knowledge of value-added modeling, [however] he was still not able to reproduce the EVAAS calculations so that they could be verified.”Dr. Rothstein added, “[t]he complexity and interdependency of EVAAS also presents a barrier to understanding how a teacher’s data translated into her EVAAS score. Each teacher’s EVAAS calculation depends not only on her students, but also on all other students with- in HISD (and, in some grades and years, on all other students in the state), and is computed using a complex series of programs that are the proprietary business secrets of SAS Incorporated. As part of my efforts to assess the validity of EVAAS as a measure of teacher effectiveness, I attempted to reproduce EVAAS calculations. I was unable to reproduce EVAAS, however, as the information provided by HISD about the EVAAS model was far from sufficient.”

Houston’s “Split” Decision to Give Superintendent Grier $98,600 in Bonuses, Pre-Resignation

States of attention on this blog, and often of (dis)honorable mention as per their state-level policies bent on value-added models (VAMs), include Florida, New York, Tennessee, and New Mexico. As for a quick update about the latter state of New Mexico, we are still waiting to hear the final decision from the judge who recently heard the state-level lawsuit still pending on this matter in New Mexico (see prior posts about this case here, here, here, here, and here).

Another locale of great interest, though, is the Houston Independent School District. This is the seventh largest urban school district in the nation, and the district that has tied more high-stakes consequences to their value-added output than any other district/state in the nation. These “initiatives” were “led” by soon-to-resign/retire Superintendent Terry Greir who, during his time in Houston (2009-2015), implemented some of the harshest consequences ever attached to teacher-level value-added output, as per the district’s use of the Education Value-Added Assessment System (EVAAS) (see other posts about the EVAAS here, here, and here; see other posts about Houston here, here, and here).

In fact, the EVAAS is still used throughout Houston today to evaluate all EVAAS-eligible teachers, to also “reform” the district’s historically low-performing schools, by tying teachers’ purported value-added performance to teacher improvement plans, merit pay, nonrenewal, and termination (e.g., 221 Houston teachers were terminated “in large part” due to their EVAAS scores in 2011). However, pending litigation (i.e., this is the district in which the American and Houston Federation of Teachers (AFT/HFT) are currently suing the district for their wrongful use of, and over-emphasis on this particular VAM; see here), Superintendent Grier and the district have recoiled on some of the high-stakes consequences they formerly attached to the EVAAS  This particular lawsuit is to commence this spring/summer.

Nonetheless, my most recent post about Houston was about some of its future school board candidates, who were invited by The Houston Chronicle to respond to Superintendent Grier’s teacher evaluation system. For the most part, those who responded did so unfavorably, especially as the evaluation systems was/is disproportionately reliant on teachers’ EVAAS data and high-stakes use of these data in particular (see here).

Most recently, however, as per a “split” decision registered by Houston’s current school board (i.e., 4:3, and without any new members elected last November), Superintendent Grier received a $98,600 bonus for his “satisfactory evaluation” as the school district’s superintendent. See more from the full article published in The Houston Chronicle. As per the same article, Superintendent “Grier’s base salary is $300,000, plus $19,200 for car and technology allowances. He also is paid for unused leave time.”

More importantly, take a look at the two figures below, taken from actual district reports (see references below), highlighting Houston’s performance (declining, on average, in blue) as compared to the state of Texas (maintaining, on average, in black), to determine for yourself whether Superintendent Grier, indeed, deserved such a bonus (not to mention salary).

Another question to ponder is whether the district’s use of the EVAAS value-added system, especially since Superintendent Grier’s arrival in 2009, is actually reforming the school district as he and other district leaders have for so long now intended (e.g., since his Superintendent appointment in 2009).

Figure 1

Figure 1. Houston (blue trend line) v. Texas (black trend line) performance on the state’s STAAR tests, 2012-2015 (HISD, 2015a)

Figure 2

Figure 2. Houston (blue trend line) v. Texas (black trend line) performance on the state’s STAAR End-of-Course (EOC) tests, 2012-2015 (HISD, 2015b)

References:

Houston Independent School District (HISD). (2015a). State of Texas Assessments of Academic Readiness (STAAR) performance, grades 3-8, spring 2015. Retrieved here.

Houston Independent School District (HISD). (2015b). State of Texas Assessments of Academic Readiness (STAAR) end-of-course results, spring 2015. Retrieved here.

“Efficiency” as a Constitutional Mandate for Texas’s Educational System

The Texas Constitution requires that the state “establish and make suitable provision for the support and maintenance of an efficient system of public free schools,” as the “general diffusion of knowledge [is]…essential to the preservation of the liberties and rights of the people.” Following this notion, The George W. Bush Institute’s Education Reform Initiative  recently released its first set of reports as part of its The Productivity for Results Series: “A Legal Lever for Enhancing Productivity.” The report was authored by an affiliate of The New Teacher Project (TNTP) – the non-profit organization founded by the controversial former Chancellor of Washington DC’s public schools Michelle Rhee; an unknown and apparently unaffiliated “education researcher” named Krishanu Sengupta; and Sandy Kress, the “key architect of No Child Left Behind [under the presidential leadership of George W. Bush] who later became a lobbyist for Pearson, the testing company” (see, for example, here).

Authors of this paper review the economic and education research (although if you look through the references the strong majority of pieces come from economics research, which makes sense as this is an economically driven venture) to identify characteristics that typify enterprises that are efficient. More specifically, the authors use the principles of x-efficiency set out in the work of the highly respected Henry Levin that require efficient organizations, in this case as (perhaps inappropriately) applied to schools, to have: 1) Clear objective outcomes with measurable outcomes; 2) Incentives that are linked to success on the objective function; 3) Efficient access to useful information for decisions; 4) Adaptability to meet changing conditions; and 5) Use of the most productive technology consistent with cost constraints.

The authors also advance another series of premises, as related to this view of x-efficiency and its application to education/schools in Texas: (1) that “if Texas is committed to diffusing knowledge efficiently, as mandated by the state constitution, it should ensure that the system for putting effective teachers in classrooms and effective materials in the hands of teachers and students is characterized by the principles that undergird an efficient enterprise, such as those of x-efficiency;” (2) this system must include value-added measurement systems (i.e., VAMs), as deemed throughout this paper as not only constitutional but also rational and in support of x-efficiency; (3) given “rational policies for teacher training, certification, evaluation, compensation, and dismissal are key to an efficient education system;” (4) “the extent to which teacher education programs prepare their teachers to achieve this goal should [also] be [an] important factor;”  (5) “teacher evaluation systems [should also] be properly linked to incentives…[because]…in x-efficient enterprises, incentives are linked to success in the objective function of the organization;” (6) which is contradictory with current, less x-efficient teacher compensation systems that link incentives to time on the job, or tenure, rather than to “the success of the organization’s function; (6), in the end, “x-efficient organizations have efficient access to useful information for decisions, and by not linking teacher evaluations to student achievement, [education] systems [such as the one in Texas will] fail to provide the necessary information to improve or dismiss teachers.”

The two districts highlighted as being most x-efficient in Texas, and in this report include, to no surprise: “Houston [which] adds a value-added system to reward teachers, with student performance data counting for half of a teacher’s overall rating. HISD compares students’ academic growth year to year, under a commonly used system called EVAAS.” We’ve discussed not only this system but also its use in Houston often on this blog (see, for example, here, here, and here). Teachers in Houston who consistently perform poorly can be fired for “insufficient student academic growth as reflected by value added scores…In 2009, before EVAAS became a factor in terminations, 36 of 12,000 teachers were fired for performance reasons, or .3%, a number so low the Superintendent [Terry Grier] himself called the dismissal system into question. From 2004-2009, the district
fired or did not renew 365 teachers, 140 for “performance reasons,” including poor discipline management, excessive absences, and a lack of student progress. In 2011, 221 teacher contracts were not renewed, multiple for “significant lack of student progress attributable to the educator,” as well as “insufficient student academic growth reflected by [SAS EVAAS] value-added scores….In the 2011-12 school year, 54% of the district’s low-performing teachers were dismissed.” That’s “progress,” right?!?

Anyhow, for those of you who have not heard, this same (controversial) Superintendent, who pushed this system throughout his district is retiring (see, for example, here).

The other district of (dis)honorable mention was Dallas Independent School district; it also uses a VAM called the Classroom Effectiveness Index (CIE), although I know less about this system as I have never examined or researched it myself, nor have I read really anything about it. But in 2012, the district’s Board “decided not to renew 259 contracts due to poor performance, five times more than the previous year.” The “progress” such x-efficiency brings…

What is still worrisome to the authors, though, is that “[w]hile some districts appear to be increasing their efforts to eliminate ineffective teachers, the percentage of teachers dismissed for any reason, let alone poor performance, remains well under one percent in the state’s largest districts.” Related, and I preface this one noting that this next argument is one of the most over-cited and hyper-utilized by organizations backing “initiatives” or “reforms” such as these, that this “falls well below the five to eight percent that Hanushek calculates would elevate achievement to internationally competitive levels” “Calculations by Eric Hanushek of Stanford University show that removing the bottom five percent of teachers in the United States and replacing them with teachers of average effectiveness would raise student achievement in the U.S. 0.4 standard deviations, to the level of student achievement in Canada. Replacing the bottom eight percent would raise student achievement to the level of Finland, a top performing country on international assessments.” As Linda Darling-Hammond, also of Stanford would argue, we cannot simply “fire our way to Finland.” Sorry Eric! But this is based on econometric predictions, and no evidence exists whatsoever that this is in fact a valid inference. Nontheless, it is cited over and over again by the same/similar folks (such as the authors of this piece) to justify their currently trendy educational reforms.

The major point here, though, is that “if Texas wanted to remove (or improve) the bottom five to eight percent of its teachers, the current evaluation system would not be able to identify them;” hence, the state desperately needs a VAM-based system to do this. Again,  no research to counter this or really any claim is included in this piece; only the primarily economics-based literatures were selected in support.

In the end, though, the authors conclude that “While the Texas Constitution has established a clear objective function for the state school system and assessments are in place to measure the outcome, it does not appear that the Texas education system shares the other four characteristics of x-efficient enterprises as identified by Levin. Given the constitutional mandate for efficiency and the difficult economic climate, it may be a good time for the state to remedy this situation…[Likewise] the adversity and incentives may now be in place for Texas to focus on improving the x-efficiency of its school system.”

As I know and very much respect Henry Levin (see, for example, an interview I conducted with him a few years ago, with the shorter version here and the longer version here), I’d be curious to know what his response might be to the authors’ use of his x-efficiency framework to frame such neo-conservative (and again trendy) initiatives and reforms. Perhaps I will email him…

EVAAS, Value-Added, and Teacher Branding

I do not think I ever shared this video out, and now following up on another post, about the potential impact these videos should really have, I thought now is an appropriate time to share. “We can be the change,” and social media can help.

My former doctoral student and I put together this video, after conducting a study with teachers in the Houston Independent School District and more specifically four teachers whose contracts were not renewed due in large part to their EVAAS scores in the summer of 2011. This video (which is really a cartoon, although it certainly lacks humor) is about them, but also about what is happening in general in their schools, post the adoption and implementation (at approximately $500,000/year) of the SAS EVAAS value-added system.

To read the full study from which this video was created, click here. Below is the abstract.

The SAS Educational Value-Added Assessment System (SAS® EVAAS®) is the most widely used value-added system in the country. It is also self-proclaimed as “the most robust and reliable” system available, with its greatest benefit to help educators improve their teaching practices. This study critically examined the effects of SAS® EVAAS® as experienced by teachers, in one of the largest, high-needs urban school districts in the nation – the Houston Independent School District (HISD). Using a multiple methods approach, this study critically analyzed retrospective quantitative and qualitative data to better comprehend and understand the evidence collected from four teachers whose contracts were not renewed in the summer of 2011, in part given their low SAS® EVAAS® scores. This study also suggests some intended and unintended effects that seem to be occurring as a result of SAS® EVAAS® implementation in HISD. In addition to issues with reliability, bias, teacher attribution, and validity, high-stakes use of SAS® EVAAS® in this district seems to be exacerbating unintended effects.

Economists Declare Victory for VAMs

On a popular economics site, fivethirtyeight.com, authors use “hard numbers” to tell compelling stories, and this time the compelling story told is about value-added models and all of the wonders, thanks to the “hard numbers” derived via model output, they are working to reform the way “we” evaluate and hold teachers accountable for their effects.

In an article titled “The Science Of Grading Teachers Gets High Marks,” this site’s “quantitative editor” (?!?) – Andrew Flowers – writes about how “the science” behind using “hard numbers” to evaluate teachers’ effects is, fortunately for America and thanks to the efforts of (many/most) econometricians, gaining much-needed momentum.

Not to really anyone’s surprise, the featured economics study of this post is…wait for it…the Chetty et al. study at focus of much controversy and many prior posts on this blog (see for example here, here, here, and here). This is the study cited in President Obama’ 2012 State of the Union address when he said that, “We know a good teacher can increase the lifetime income of a classroom by over $250,000,” and this study was more recently the focus of attention when the judge in Vergara v. California cited Chetty et al.’s study as providing evidence that “a single year in a classroom with a grossly ineffective teacher costs students $1.4 million in lifetime earnings per classroom.”

These are the “hard numbers” that have since been duly critiqued by scholars from California to New York since (see, for example, here, here, here, and here), but that’s not mentioned in this post. What is mentioned, however, is the notable work of economist Jesse Rothstein, whose work I have also cited in prior posts (see, for example, here, here, here, and here,), as he has also countered Chetty et al.’s claims, not to mention added critical research to the topic on VAM-based bias.

What is also mentioned, not to really anyone’s surprise again, though, is that Thomas Kane – a colleague of Chetty’s at Harvard who has also been the source of prior VAMboozled! posts (see, for example, here, here, and here), who also replicated Chetty’s results as notably cited/used during the Vergara v. California case last summer, endorses Chetty’s work throughout this same article. Article author “reached out” to Kane “to get more perspective,” although I, for one, question how random this implied casual reach really was… Recall a recent post about our “(Unfortunate) List of VAMboozlers?” Two of our five total honorees include Chetty and Kane – the same two “hard number” economists prominently featured in this piece.

Nonetheless, this article’s “quantitative editor” (?!?) Flowers sides with them (i.e., Chetty and Kane), and ultimately declares victory for VAMs, writing that VAMs ultimately and “accurately isolate a teacher’s impact on students”…”[t]he implication[s] being, school administrators can legitimately use value-added scores to hire, fire and otherwise evaluate teacher performance.”

This “cutting-edge science,” as per a quote taken from Chetty’s co-author Friedman (Brown University), captures it all: “It’s almost like we’re doing real, hard science…Well, almost. But by the standards of empirical social science — with all its limitations in experimental design, imperfect data, and the hard-to-capture behavior of individuals — it’s still impressive….[F]or what has been called the “credibility revolution” in empirical economics, it’s a win.”

Five Nashville Teachers Face Termination

Three months ago, Tennessee Schools Director Jesse Register announced he was to fire 63 Tennessean teachers, of 195 total who for two consecutive years scored lowest (i.e., a 1 on a scale of 1 to 5) in terms of their overall “value-added” (as based on 35% EVAAS, 15% related “student achievement,” and 50% observational data). These 63 were the ones who three months ago were still “currently employed,” given the other 132 apparently left voluntarily or retired (which is a nice-sized cost-savings, so let’s be sure not to forget about the economic motivators behind all of this as well). To see a better breakdown of these numbers, click here.

This was to be the first time in Tennessee that its controversial, and “new and improved” teacher evaluation system would be used to take deliberate action against whom they deemed their “lowest-performing” teachers, as “objectively” identified in the classroom; although, officials at that time did not expect to have a “final number” to be terminated until fall.

Well, fall is here, and it seems this final number is officially five: three middle school teachers, one elementary school teacher, and one high school teacher, all teaching in metro Nashville.

The majority of these teachers come from Neely’s Bend: “one of 14 Nashville schools on the state’s priority list for operating at the bottom 5 percent in performance statewide.” Some of these teachers were evaluated even though their principal who evaluated them is  “no longer there.” Another is a computer instructor being terminated as based on this school’s overall “school-level value-added.” This is problematic in and of itself given teacher-level and in this case school-level bias seem to go hand in hand with the use of these models, and grossly interfere with accusations that these teachers “caused” low performance (see a recent post about this here).

It’s not to say these teachers were not were indeed the lowest performing; maybe they were. But I for one would love to talk to these teachers and take a look at their actual data, EVAAS and observational data included. Based on prior experiences working with such individuals, there may be more to this than what it seems. Hence, if anybody knows these folks, do let them know I’d like to better understand their stories.

Otherwise, all of this effort to ultimately attempt to terminate five of a total 5,685 certified teachers in the district (0.09%) seems awfully inefficient, and costly, and quite frankly absurd given this is a “new and improved” system meant to be much better than a prior system that likely yielded a similar termination rate, not including, however, those who left voluntarily prior.

Perhaps an ulterior motive is, indeed, the cost-savings realized given the mere “new and improved” threat.

Gene Glass v. Chetty and Kane

Following the Vergara v. California decision now two weeks ago (see three recent VAMboozled posts about this court decision here, here, and here), one of my mentors and still friends/colleagues – Gene Glass, Regents’ Professor Emeritus at ASU – wrote the following on his twitter and Facebook account about Raj Chetty’s and Tom Kane’s testimonies, testimonies that played key roles in the judge’s (unfortunate) decision ruling teacher tenure as unconstitutional.

Testimony in Vergara by Harvard profs. Does anybody–other than Judge Treu–really believe these guys?! Amazing! — These dudes must be in love with their production functions. How anyone but a know-nothing judge could buy this stuff is amazing to me. A bad teacher costs a class of 28 kids $1.4 Million in life-time earnings. Really? What do a couple of lousy economists cost society?

Embedded image permalink

Glass then followed up with a comment: “Aristotle in Nichomachean Ethics: “We must not expect more precision than the subject-matter admits.” (Chp. 3)…after which a Facebook friend replied…”Aristotle was 2.13 times more effective than Plato at teaching Aristotle’s ethics.” Funny!

Anyhow, somebody else at The Becoming Radical seemed to come across this post and wrote another blog post around it titled, “The Very Disappointing Teacher Impact Numbers from Chetty.” To read more about what this author has titled the now “very famous” but still “mostly hypothetical” Chetty et al. study and what their numbers ACTUALLY mean, even if true and accurate in the real world, click here.