Two Questions Every Presidential Candidate Needs to Answer about Education

Author Paul Thomas recently released an article via Truthout titled “Five Questions Every Presidential Candidate Needs to Answer About Education.” You can read all five questions framed by Thomas here, but I want to highlight the two (of the five) questions that most closely relate to the issues with which we deal via this blog.

Here are the two questions about education reform in this particular arena that every candidate should have to answer, and a few of Thomas’s words about why these questions matter.

1. While state and national leaders in education have repeatedly noted the importance of teacher quality – while also misrepresenting that importance [emphasis added] – increasing standards-based teaching, high-stakes testing and value-added methods of teacher evaluation, along with the dismantling of unions, have de-professionalized teaching and discouraged young people from entering the field. How will you work to return professionalism and autonomy to teachers?

The teacher quality debate is the latest phase of accountability linked to test scores that started with school and student accountability in the 1980s and ’90s. While everyone can agree that teacher quality is important, the real issues are how we measure that impact and how we separate teacher and even school quality effects from the much larger and more powerful impact of out-of-school factors – that account for about 60% to 86% of measurable student learning.

The misguided focus on teacher quality – linking evaluations significantly to test scores – has begun to have a negative impact on teacher quality, precisely because such measures de-professionalize educators. Teachers typically want autonomy, administrative and parental support, and conditions (such as appropriate materials and smaller class sizes) that increase their professionalism and effectiveness – not raises [although I think, given the sweeping cuts to teacher salaries that have accompanied all of this craziness, raises would certainly also help, at least to bring many teachers back to what they were making before such reforms were used to justify salary cuts].

2. Since the era of high-stakes accountability initiated in the early 1980s has not, in fact, closed the achievement gap, can you commit to ending accountability-based education reform, including a significant reduction in high-stakes testing, and then detail reform based on equity of opportunities for all students?

Andre Perry, former founding dean of urban education at Davenport University, has noted that accountability has failed educational equity because “having the ability to compare performances among groups hasn’t brought educational justice to black and brown students.”

More useful models do exist. The National Education Policy Center has called for identifying and recognizing school reform that addresses equity for all students through their Schools of Opportunity project. This model shifts the focus of reform away from narrow measures, test data, and punitive policies and toward creating the opportunities that affluent students tend to have but poor students and racial minorities are denied, including access to experienced and certified teachers, rich and diverse courses, small class sizes, and well-funded safe schools. [I too would certainly support a paradigm shift away from accountability and toward opportunities, no doubt!]

***

Citation: Thomas, P. (2015). Five questions every presidential candidate needs to answer about education. Truthout. Retrieved from http://www.truth-out.org/opinion/item/32472-five-questions-every-presidential-candidate-needs-to-answer-about-education

National Opinion Poll on Testing and Test-Based Teacher Accountability

In an article titled “Testing Doesn’t Measure Up for Americans,” just released by Phi Delta Kappa (PDK) International in partnership with the flagship professional polling organization Gallup, you can read the results of the annual poll that PDK/Gallup conduct regarding the public’s thoughts on America’s public education system. Pollsters surveyed a nationally representative sample of 3,499 Americans over 18 via the web and 1,001 Americans over 18 via phone. Most pertinent to our purposes here, these are the key findings in which you all might be most interested.

“Americans agree that there is too much testing in schools, but few parents report that their children are complaining about excessive testing. Most Americans believe parents should have the right to opt out of standardized testing, but few said they would exercise that option themselves.”

  • 55% of Americans and 61% of public school parents oppose including student scores on standardized tests as part of teacher evaluations.
  • As a way to measure the effectiveness of the public schools, testing came in last, with just 14% of public school parents rating test scores as very important for such measurements.
  • 64% of Americans and a similar proportion of public school parents said there is too much emphasis on standardized testing in the public schools in their community, with just 7% believing there’s not enough.
  • When asked what ideas were most important for improving public schools in their community from a list of five options, testing ranked last in importance once again.
  • Americans split on whether parents should be allowed to excuse their child from taking one or more standardized tests: 41% said yes, 44% said no.
  • A majority of public school parents said they would not excuse their own child from taking a standardized test; nearly one-third said they would excuse their own child.
  • Lack of financial support is the biggest problem facing American schools, according to respondents to the PDK/Gallup poll. That’s been a consistent message from the public for the past 10 years. Having sufficient money to spend would improve the quality of the public schools, according to a sizeable portion of American adults.

*****

Citation: Phi Delta Kappa (PDK) International/Gallup. (2015). The 47th Annual PDK/Gallup poll of the public’s attitudes toward the public schools. Phi Delta Kappan, 97(1). Retrieved from http://pdkpoll2015.pdkintl.org/wp-content/uploads/2015/08/pdkpoll47_2015.pdf

New Research Study: Controlling for Student Background Variables Matters

An article about the “Sensitivity of Teacher Value-Added Estimates to Student and Peer Control Variables” was recently published in the peer-reviewed Journal of Research on Educational Effectiveness. While this article is not open-access, here is a link to an earlier version released by Mathematica, which nearly mirrors the final published article.

In this study, researchers Matthew Johnson, Stephen Lipscomb, and Brian Gill, all of whom are associated with Mathematica, examined the sensitivity and precision of various VAMs, the extent to which their estimates vary depending on whether modelers include student- and peer-level background characteristics as control variables, and the extent to which their estimates vary depending on whether modelers include one or more years of students’ prior achievement scores, also as control variables. They did this using statewide data, as compared with data from what they called District X – a district within the state with three times as many African-American students, twice as many students receiving free or reduced-price lunch, and generally more special education students than the state average. While the state data included more students, the district data included additional control variables, supporting the researchers’ analyses.

Here are the highlights, with thanks to lead author Matthew Johnson for edits and clarifications.

  • Different VAMs produced similar results overall [when using the same data], almost regardless of specifications. “[T]eacher estimates are highly correlated across model specifications. The correlations [they] observe[d] in the state and district data range[d] from 0.90 to 0.99 relative to [their] baseline specification.”

This has been evidenced in the literature before, when the same models are applied to the same datasets taken from the same sets of tests at the same time. Hence, many critics argue that similar results come about when using different VAMs on the same data because, when using the same fallible, large-scale standardized test data, even the most sophisticated models are processing “garbage in” and “garbage out.” When the tests inserted into the same VAM vary, however, even if the tests used are measuring the same constructs (e.g., mathematics learning in grade X), things go haywire. For more information about this, please see Papay (2010) here.

  • However, “even correlation coefficients above 0.9 do not preclude substantial amounts of potential misclassification of teachers across performance categories.” The researchers also found that, even with such consistencies, 26% of teachers rated in the bottom quintile were placed in higher performance categories under an alternative model (see the simulation sketch just after this list).
  • Modeling choices impacted the rankings of teachers in District X in “meaningful” ways, given District X’s varying student demographics compared with those in the state overall. In other words, VAMs that do not include all relevant student characteristics can penalize teachers in districts that serve students who are more disadvantaged than statewide averages.
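
To see how seemingly high correlations and meaningful misclassification can coexist, here is a minimal simulation sketch – my own illustration, not the authors’ analysis – assuming two standardized sets of teacher estimates correlated at 0.90 and a bottom-quintile (“lowest 20%”) cutoff under each specification:

```python
# Illustrative only: two teacher rankings correlated at ~0.90 can still
# disagree substantially about who lands in the bottom quintile.
import numpy as np

rng = np.random.default_rng(0)
n_teachers = 10_000
r = 0.90  # assumed correlation between the two model specifications

# Generate two standardized estimates with the target correlation.
baseline = rng.standard_normal(n_teachers)
alternative = r * baseline + np.sqrt(1 - r**2) * rng.standard_normal(n_teachers)

# Flag the bottom quintile under each specification.
bottom_base = baseline <= np.quantile(baseline, 0.20)
bottom_alt = alternative <= np.quantile(alternative, 0.20)

# Share of baseline "bottom quintile" teachers placed in a higher category
# under the alternative specification.
reclassified = np.mean(~bottom_alt[bottom_base])
print(f"Correlation: {np.corrcoef(baseline, alternative)[0, 1]:.2f}")
print(f"Bottom-quintile teachers reclassified upward: {reclassified:.0%}")
```

Run with these assumed numbers, roughly a quarter of the “bottom quintile” changes categories – on par with the 26% the researchers report – even though the two sets of estimates correlate at 0.90.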

See an article related to whether VAMs can include all relevant student characteristics, also given the non-random assignment of students to teachers (and teachers to classrooms) here.

*****

Original Version Citation: Johnson, M., Lipscomb, S., & Gill, B. (2013). Sensitivity of teacher value-added estimates to student and peer control variables. Princeton, NJ: Mathematica Policy Research.

Published Version Citation: Johnson, M., Lipscomb, S., & Gill, B. (2015). Sensitivity of teacher value-added estimates to student and peer control variables. Journal of Research on Educational Effectiveness, 8(1), 60-83. doi:10.1080/19345747.2014.967898

Why Gene Glass is No Longer a Measurement Specialist

One of my mentors – Dr. Gene Glass (formerly at ASU and now at Boulder) wrote a letter earlier this week on his blog, titled “Why I Am No Longer a Measurement Specialist.” This is a must read for all of you following the current policy trends not only surrounding teacher-level accountability, but also high-stakes testing in general.

Gene – one of the most well-established and well-known measurement specialists in and outside the field of education, and world renowned for developing “meta-analysis” – writes:

I was introduced to psychometrics in 1959. I thought it was really neat. By 1960, I was programming a computer on a psychometrics research project funded by the Office of Naval Research. In 1962, I entered graduate school to study educational measurement under the top scholars in the field.

My mentors – both those I spoke with daily and those whose works I read – had served in WWII. Many did research on human factors — measuring aptitudes and talents and matching them to jobs. Assessments showed who were the best candidates to be pilots or navigators or marksmen. We were told that psychometrics had won the war; and of course, we believed it.

The next wars that psychometrics promised it could win were the wars on poverty and ignorance. The man who led the Army Air Corps effort in psychometrics started a private research center. (It exists today, and is a beneficiary of the millions of dollars spent on Common Core testing.) My dissertation won the 1966 prize in Psychometrics awarded by that man’s organization. And I was hired to fill the slot recently vacated by the world’s leading psychometrician at the University of Illinois. Psychometrics was flying high, and so was I.

Psychologists of the 1960s & 1970s were saying that just measuring talent wasn’t enough. Talents had to be matched with the demands of tasks to optimize performance. Measure a learning style, say, and match it to the way a child is taught. If Jimmy is a visual learner, then teach Jimmy in a visual way. Psychometrics promised to help build a better world. But twenty years later, the promises were still unfulfilled. Both talent and tasks were too complex to yield to this simple plan. Instead, psychometricians grew enthralled with mathematical niceties. Testing in schools became a ritual without any real purpose other than picking a few children for special attention.

Around 1980, I served for a time on the committee that made most of the important decisions about the National Assessment of Educational Progress. The project was under increasing pressure to “grade” the NAEP results: Pass/Fail; A/B/C/D/F; Advanced/Proficient/Basic. Our committee held firm: such grading was purely arbitrary, and worse, would only be used politically. The contract was eventually taken from our organization and given to another that promised it could give the nation a grade, free of politics. It couldn’t.

Measurement has changed along with the nation. In the last three decades, the public has largely withdrawn its commitment to public education. The reasons are multiple: those who pay for public schools have less money, and those served by the public schools look less and less like those paying taxes.

The degrading of public education has involved impugning its effectiveness, cutting its budget, and busting its unions. Educational measurement has been the perfect tool for accomplishing all three: cheap and scientific looking.

International tests have purported to prove that America’s schools are inefficient or run by lazy incompetents. Paper-and-pencil tests seemingly show that kids in private schools – funded by parents – are smarter than kids in public schools. We’ll get to the top, so the story goes, if we test a teacher’s students in September and June and fire that teacher if the gains aren’t great enough.

There has been resistance, of course. Teachers and many parents understand that children’s development is far too complex to capture with an hour or two taking a standardized test. So resistance has been met with legislated mandates. The test company lobbyists convince politicians that grading teachers and schools is as easy as grading cuts of meat. A huge publishing company from the UK has spent $8 million in the past decade lobbying Congress. Politicians believe that testing must be the cornerstone of any education policy.

The results of this cronyism between corporations and politicians have been chaotic. Parents see the stress placed on their children and report them sick on test day. Educators, under pressure they see as illegitimate, break the rules imposed on them by governments. Many teachers put their best judgment and best lessons aside and drill children on how to score high on multiple-choice tests. And too many of the best teachers exit the profession.

When measurement became the instrument of accountability, testing companies prospered and schools suffered. I have watched this happen for several years now. I have slowly withdrawn my intellectual commitment to the field of measurement. Recently I asked my dean to switch my affiliation from the measurement program to the policy program. I am no longer comfortable being associated with the discipline of educational measurement.

Gene V Glass
Arizona State University
National Education Policy Center
University of Colorado Boulder

Stanford Professor Haertel: Short Video about VAM Reliability and Bias

I just came across this 3-minute video that you all might/should find of interest (click here for a direct link to this video on YouTube; click here to view the video’s original posting on Stanford’s Center for Opportunity Policy in Education (SCOPE)).

Featured is Stanford’s Professor Emeritus – Dr. Edward Haertel – describing what he sees as two major flaws in the use of VAMs for teacher evaluation and accountability. These are two flaws serious enough, he argues, to prevent others from using VAM scores to make high-stakes decisions about really any of America’s public school teachers. “Like all measurements, these scores are imperfect. They are appropriate and useful for some purposes, but not for others. Viewed from a measurement perspective, value-added scores have limitations that make them unsuitable for high-stakes personnel decisions.”

The first problem is the unreliability of VAM scores, which is attributable to noise in the data. The effect of a teacher is important, but weak when all of the other contributing factors are taken into account. Separating the effect of a teacher from all of the other effects is very difficult. This isn’t a flaw that can be fixed by more sophisticated statistical models; it is innate to the data collected.
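
To make this first point concrete, here is a small, purely hypothetical sketch – my own illustration, not Haertel’s analysis – in which a stable “true” teacher effect is assumed to be weak relative to the noise surrounding it:

```python
# Hypothetical illustration: when the true teacher effect explains only a
# modest share of the variation in measured growth, a single year's VAM
# score is a noisy estimate of that effect.
import numpy as np

rng = np.random.default_rng(1)
n_teachers = 5_000
teacher_effect = rng.normal(0.0, 1.0, n_teachers)  # stable "true" effect
noise_sd = 2.0  # assumed: class composition, test error, etc. dominate

# Two yearly VAM scores for the same teachers, with independent noise.
year1 = teacher_effect + rng.normal(0.0, noise_sd, n_teachers)
year2 = teacher_effect + rng.normal(0.0, noise_sd, n_teachers)

# Reliability = true-score variance / observed-score variance, which here
# also equals the expected year-to-year correlation of the scores.
reliability = 1.0 / (1.0 + noise_sd**2)
print(f"Theoretical reliability: {reliability:.2f}")
print(f"Observed year-to-year correlation: {np.corrcoef(year1, year2)[0, 1]:.2f}")
```

Under these assumed variances the reliability works out to about 0.2 – a single year’s score is mostly noise – and, as Haertel notes, no added statistical sophistication can fix this, because the limitation sits in the data themselves.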

The second problem is that the models must account for bias. The bias arises from the difference in circumstances faced by a teacher in a strong school and a teacher in a high-needs school. The instructional history of a student includes out-of-school support, peer support, and the academic learning climate of the school, and VAMs do not take these important factors into account.

NY Teacher Lederman’s Day in Court

Do you recall the case of Sheri Lederman? The Long Island teacher who, apparently by all accounts other than her composite growth (or value-added) score, is a terrific 4th-grade teacher and 18-year veteran, and who received a score of 1 out of 20 after scoring a 14 out of 20 the year prior (see prior posts here, here, and here; see also here and here)?

With her husband, attorney Bruce Lederman, leading her case, she is suing the state of New York (the state in which Governor Cuomo is now pushing to have teachers’ value-added scores count for approximately 50% of their total evaluations) to challenge the state’s teacher evaluation system. She is also being fully supported by her students, her principal, her superintendent, and a series of VAM experts including Linda Darling-Hammond (Stanford), Aaron Pallas (Columbia University Teachers College), Carol Burris (Educator and Principal of the Year from New York), Brad Lindell (Long Island Research Consultant), and me (Arizona State University) (see their/our expert witness affidavits here). See also an affidavit more recently submitted by Jesse Rothstein (Berkeley) here, as well as the full document explaining the entire case – the Memorandum of Law – here.

Well, the Ledermans had their day in court this past Wednesday (August 12, 2015).

It was apparent in the hearing that the Judge had carefully read all the papers beforehand and was fully familiar with the issues. As per Bruce Lederman, “[t]he issue that seemed to catch the Judge’s attention the most was whether it was rational to have a system which decides in advance that 7% of teachers will be ineffective, regardless of actual results. The Judge asked numerous questions about whether it was fair to use a bell curve,” whereby, when a bell curve is used to distribute teachers’ growth or value-added scores, there will always be a set of “ineffective” teachers, regardless of whether in fact they are truly “ineffective.” This occurs not naturally but through the statistical manipulation needed to fit all scores within a normal distribution – spreading out the scores in order to make relative distinctions and categorizations (e.g., highly effective, effective, ineffective) – the validity of which is highly uncertain (see, for example, a prior post here). Hence, “[t]he Judge pressed the lawyer representing New York’s Education Department very hard on this particular issue,” but the state’s lawyer did not (most likely because she could not) give the Judge a satisfactory explanation, justification, or rationale.
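
For readers who want to see that forced-distribution logic laid bare, here is a simple sketch – hypothetical numbers of my own, not New York’s actual model – in which every teacher produces genuinely solid growth, yet a fixed share is labeled “ineffective” anyway because the labels are pinned to the bottom of the curve by construction:

```python
# Sketch of forced ranking: if the bottom 7% of the score distribution is
# labeled "ineffective" in advance, some teachers get that label every year
# regardless of how well everyone actually taught.
import numpy as np

rng = np.random.default_rng(2)
n_teachers = 1_000

# Suppose every teacher produces genuinely solid growth (all well above
# zero), differing only slightly from one another.
true_growth = rng.normal(10.0, 0.5, n_teachers)

# Scores are ranked and the bottom 7% is labeled "ineffective" by fiat.
cutoff = np.quantile(true_growth, 0.07)
ineffective = true_growth <= cutoff

print(f"Lowest growth in the group: {true_growth.min():.1f}")
print(f"Teachers labeled ineffective: {ineffective.sum()} "
      f"({ineffective.mean():.0%}), no matter the absolute results")
```

The particular numbers do not matter; the mechanism does. When categories are pinned to percentiles of a normal curve, some share of teachers must be deemed “ineffective” every year, regardless of the absolute results.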

For more information on the case, see the video that I feel best captures it here, thanks to CBS news in Albany. For another video, see here, compliments of NBC news in Albany. See also two additional articles, here and here, with the latter including the photo of Sheri and Bruce Lederman below.

[Photo: Sheri and Bruce Lederman]

Individual-Level VAM Scores Over Time: “Less Reliable than Flipping a Coin”

In case you missed it (I did), an article authored by Stuart Yeh (Associate Professor at the University of Minnesota) titled “A Reanalysis of the Effects of Teacher Replacement Using Value-Added Modeling” was (recently) published in the esteemed, peer-reviewed journal: Teachers College Record. While the publication suggests a 2013 publication date, my understanding is that the actual article was more recently released.

Regardless, its contents are important to share, particularly in terms of VAM-based levels of reliability, where reliability is positioned as follows: “The question of stability [reliability/consistency] is not a question about whether average teacher performance rises, declines, or remains flat over time. The issue that concerns critics of VAM is whether individual teacher performance fluctuates over time in a way that invalidates inferences that an individual teacher is “low-” or “high-” performing. This distinction is crucial because VAM is increasingly being applied such that individual teachers who are identified as low-performing are to be terminated. From the perspective of individual teachers, it is inappropriate and invalid to fire a teacher whose performance is low this year but high the next year, and it is inappropriate to retain a teacher whose performance is high this year but low next year. Even if average teacher performance remains stable over time, individual teacher performance may fluctuate wildly from year to year” (p. 7).

Yeh’s conclusion, then (as based on the evidence presented in this piece), is that “VAM is less reliable than flipping a coin for the purpose of categorizing high- and low-performing teachers” (p. 19). More specifically, VAMs have an estimated, overall error rate of 59% (see Endnote 2, p. 26 for further explanation).

That being said, not only is the assumption that teacher quality is a fixed characteristic (i.e., that a high-performing teacher this year will be a high-performing teacher next year, and a low-performing teacher this year will be a low-performing teacher next year) false and unsupported by the available data, albeit continuously assumed by many VAM proponents (including Chetty et al.; see, for example, here, here, and here), but prior estimates that using VAMs to identify teachers is no different than the flip of a coin may actually be underestimates given current reliability estimates (see also Table 2, p. 19; see also pp. 25-26).
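
For a rough sense of what such instability means in practice, here is an illustrative sketch – my own, not Yeh’s calculation – assuming a year-to-year correlation of 0.35 between individual teachers’ VAM scores (a value I am assuming purely for illustration, in line with the modest stability coefficients commonly reported) and a bottom-20% “low-performing” cutoff each year:

```python
# Illustrative only: with modest year-to-year stability, a "low-performing"
# label this year is a weak predictor of the same label next year.
import numpy as np

rng = np.random.default_rng(3)
n_teachers = 50_000
r = 0.35  # assumed year-to-year correlation of individual VAM scores

year1 = rng.standard_normal(n_teachers)
year2 = r * year1 + np.sqrt(1 - r**2) * rng.standard_normal(n_teachers)

# Label the bottom 20% "low-performing" in each year.
low1 = year1 <= np.quantile(year1, 0.20)
low2 = year2 <= np.quantile(year2, 0.20)

# How often does a year-1 "low performer" escape that label in year 2?
flip_rate = np.mean(~low2[low1])
print(f"Year-1 'low performers' no longer low-performing in year 2: {flip_rate:.0%}")
```

Under these assumptions, well over half of the teachers labeled “low-performing” in year one shed that label in year two – precisely the kind of fluctuation that, per Yeh, makes termination decisions based on a single year’s score so difficult to defend.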

In section two of this article, for those of you following the Chetty et al. debates, Yeh also critiques other assumptions supporting and behind the Chetty et al. studies (see other, similar critiques here, here, here, and here). For example, Yeh critiques the VAM-based proposals to raise student achievement by (essentially) terminating low-value-added teachers. Here, the assumption is that “the use of VAM to identify and replace the lowest-performing 5% of teachers with average teachers would increase student achievement and would translate into sizable gains in the lifetime earnings of their students” (p. 2). However, because this also assumes that “there is an adequate supply of unemployed teachers who are ready and willing to be hired and would perform at a level that is 2.04 standard deviations above the performance of teachers who are fired based on value-added rankings [and] Chetty et al. do not justify this assumption with empirical data” (p. 14), this too proves much more idealistic than realistic in the grand scheme of things.

In section three of this article, for those of you generally interested in better and, in this case, more cost-effective solutions, Yeh discusses a number of cost-effectiveness analyses comparing 22 leading approaches for raising student achievement, the results of which suggest that “the most efficient approach—rapid performance feedback (RPF)—is approximately 5,700 times as efficient as the use of VAM to identify and replace low-performing teachers” (p. 25; see also pp. 23-24).

***

Citation: Yeh, S. S. (2013). A re-analysis of the effects of teacher replacement using value-added modeling. Teachers College Record, 115(12), 1-35. Retrieved from http://www.tcrecord.org/Content.asp?ContentID=16934

Special Issue of “Educational Researcher” Examines Value-Added Measures (Paper #1 of 9)

A few months ago, the flagship journal of the American Educational Research Association (AERA) – the peer-reviewed journal titled Educational Researcher (ER) – published a “Special Issue” including nine articles examining value-added measures (VAMs) (i.e., one introduction (reviewed below), four feature articles, one essay, and three commentaries). I will review each of these pieces separately over the next few weeks or so, although if any of you want an advance preview, do click here, as AERA has made each of these articles free and accessible.

In this “Special Issue” editors Douglas Harris – Associate Professor of Economics at Tulane University – and Carolyn Herrington – Professor of Educational Leadership and Policy at Florida State University – solicited “[a]rticles from leading scholars cover[ing] a range of topics, from challenges in the design and implementation of teacher evaluation systems, to the emerging use of teacher observation information by principals as an alternative to VAM data in making teacher staffing decisions.” They challenged authors “to participate in the important conversation about value-added by providing rigorous evidence, noting that successful policy implementation and design are the product of evaluation and adaption” (assuming “successful policy implementation and design” exist, but I digress).

More specifically, in the co-editors’ Introduction to the Special Issue, Harris and Herrington note that in this special issue they “pose dozens of unanswered questions [see below], not only about the net effects of these policies on measurable student outcomes, but about the numerous, often indirect ways in which [unintended] and less easily observed effects might arise.” This section is, in my opinion, of the most “added value.”

Here are some of their key assertions:

  • “[T]eachers and principals trust classroom observations more than value added.”
  • “Teachers—especially the better ones—want to know what exactly they are doing well and doing poorly. In this respect, value-added measures are unhelpful.”
  • “[D]istrust in value-added measures may be partly due to [or confounded with] frustration with high-stakes testing generally.”
  • “Support for value added also appears stronger among administrators than teachers…But principals are still somewhat skeptical.”
  • “[T]he [pre-VAM] data collection process may unintentionally reduce the validity and credibility of value-added measures.”
  • “[I]t seems likely that support for value added among educators will decrease as the stakes increase.”
  • “[V]alue-added measures suffer from much higher missing data rates than classroom observation[s].”
  • “[T]he timing of value-added measures—that they arrive only once a year and during the middle of the school year when it is hard to adjust teaching assignments—is a real concern among teachers and principals alike.”
  • “[W]e cannot lose sight of the ample evidence against the traditional model [i.e., based on snapshot measures examined once per year, as was done for decades past, or pre-VAM].” This does not make VAMs “better,” but most researchers would agree with this statement.

Conversely, here are some points or assertions that should give one pause:

  • “The issue is not whether value-added measures are valid but whether they can be used in a way that improves teaching and learning.” I would strongly argue that validity is a pre-condition to use, as we do not want educators using invalid data to even attempt to improve teaching and learning. I’m actually surprised this statement was published, as it is so scientifically and pragmatically off-base.
  • We know “very little” about how educators “actually respond to policies that use value-added measures.” Clearly, the co-editors are not followers of this blog, other similar outlets (e.g., The Washington Post’s Answer Sheet), or other articles published in the media as well as in scholarly journals about educator use, interpretation, opinion, response, and the like regarding VAMs. (For example articles published in scholarly journals, see here, here, and here.)
  • “[I]n the debate about these policies, perspective has taken over where the evidence trail ends.” Rather, the evidence trail is already quite saturated in many respects, as study after study continues to evidence the same things (e.g., inconsistencies in teacher-level ratings over time, mediocre correlations between VAM and observational output, all of which matter most if high-stakes decisions are to be tied to value-added output).
  • “[T]he best available evidence [is just] beginning to emerge on the use of value added.” Perhaps the authors of this piece are correct if focusing only on use, or the lack thereof, as we have a good deal of evidence that much use is not happening given issues with transparency, accessibility, comprehensibility, relevance, fairness, and the like.

In the end, and also of high value, the authors of this piece offer others (e.g., graduate students, practitioners, or academics looking to take note) some interesting points for future research, given that VAMs are likely here to stay, at least for a while. Some, although not all, of their suggested questions for future research are included here:

  • How do educators’ perceptions impact their behavioral responses with and towards VAMs or VAM output? Does administrator skepticism affect how they use these measures?
  • Does the use of VAMs actually lead to more teaching to the test and shallow instruction, aimed more at developing basic skills than at critical thinking and creative problem solving?
  • Does the approach further narrow the curriculum and lead to more gaming of the system or, following Campbell’s law, distort the measures in ways that make them less informative?
  • In the process of sanctioning and dismissing low-performing teachers, do value-added-based accountability policies sap the creativity and motivation of our best teachers?
  • Does the use of value-added measures reduce trust and undermine collaboration among educators and principals, thus weakening the organization as a whole?
  • Are these unintended effects made worse by the common misperception that when teachers help their colleagues within the school, they could reduce their own value-added measures?
  • Aside from these incentives, does the general orientation toward individual performance lead teachers to think less about their colleagues and organizational roles and responsibilities?
  • Are teacher and administrator preparation programs helping to prepare future educators for value-added-based accountability by explaining the measures and their potential uses and misuses?
  • Although not explicitly posed as a question in this piece, but important nonetheless: What are the benefits of VAMs and VAM output, intended or otherwise?

Some of these questions are discussed or answered at least in part in the eight articles included in this “Special Issue” also to be highlighted in the next few weeks or so, one by one. Do stay tuned.

*****

Article #1 Reference: Harris, D. N., & Herrington, C. D. (2015). Editors’ introduction: The use of teacher value-added measures in schools: New evidence, unanswered questions, and future prospects. Educational Researcher, 44(2), 71-76. doi:10.3102/0013189X15576142

EVAAS, Value-Added, and Teacher Branding

I do not think I ever shared this video, and following up on another post about the potential impact these videos should really have, I thought now was an appropriate time to share it. “We can be the change,” and social media can help.

My former doctoral student and I put together this video after conducting a study with teachers in the Houston Independent School District – more specifically, four teachers whose contracts were not renewed in the summer of 2011, due in large part to their EVAAS scores. This video (which is really a cartoon, although it certainly lacks humor) is about them, but also about what is happening in general in their schools following the adoption and implementation (at approximately $500,000/year) of the SAS EVAAS value-added system.

To read the full study from which this video was created, click here. Below is the abstract.

The SAS Educational Value-Added Assessment System (SAS® EVAAS®) is the most widely used value-added system in the country. It is also self-proclaimed as “the most robust and reliable” system available, with its greatest benefit to help educators improve their teaching practices. This study critically examined the effects of SAS® EVAAS® as experienced by teachers, in one of the largest, high-needs urban school districts in the nation – the Houston Independent School District (HISD). Using a multiple methods approach, this study critically analyzed retrospective quantitative and qualitative data to better comprehend and understand the evidence collected from four teachers whose contracts were not renewed in the summer of 2011, in part given their low SAS® EVAAS® scores. This study also suggests some intended and unintended effects that seem to be occurring as a result of SAS® EVAAS® implementation in HISD. In addition to issues with reliability, bias, teacher attribution, and validity, high-stakes use of SAS® EVAAS® in this district seems to be exacerbating unintended effects.