New Mexico Teachers Burn Their State-Based Teacher Evaluations

More than three dozen teachers, “including many who [had] just been rated ‘highly effective’ by the New Mexico Public Education Department,” working in the Albuquerque Public School District – the largest public school district in the state of New Mexico – turned to a burning bin this week, tossing their state-developed teacher evaluations into the fire in protest in front of district headquarters.

See the full article (with picture below) in The Albuquerque Journal here.

Linnea Montoya, a kindergarten teacher at Montezuma Elementary, drops her teacher evaluation into a waste basket with other burning evaluations in front of Albuquerque Public Schools headquarters, Wednesday, May 20, 2015, in Albuquerque, N.M. A group of teachers filled the entrance to APS to participate in the teacher evaluation protest. "It insulted my fellow teachers who mentored me and scored lower," Montoya said. (Marla Brose/Albuquerque Journal)

“Courtney Hinman ignited the blaze by taking a lighter to his “effective” evaluation. He was quickly followed by a “minimally effective” special education teacher from Albuquerque High School, then by a “highly effective” teacher from Monte Vista Elementary School. Wally Walstrom, also of Monte Vista Elementary, told the crowd of 60 or 70 people that his “highly effective” rating was “meaningless,” before tossing it into the fire. One after another, teachers used the words “meaningless” and “unfair” to describe the evaluations and the process used to arrive at those judgments…Another teacher said the majority of his autistic, special-needs students failed the SBA – a mandatory assessment test – yet he was judged “highly effective.” ‘How can that be?’ he asked as he dropped his evaluation into the fire.”

“An English teacher said he was judged on student progress – in algebra and geometry.
Another said she had taught a mere two months, yet was evaluated as if she had been in the classroom for an entire school year. Several said their scores were lowered only because they were sick and stayed away from school. One woman said parents routinely say she’s the best teacher their children have ever had, yet she was rated ‘minimally effective.’ An Atrisco Heritage teacher said most of the math teachers there had been judged ‘minimally effective.’ And a teacher of gifted children who routinely scored at the top in assessment testing asked, ‘How could they advance?’ before tossing his “highly effective” evaluation into the blaze.”

Education Secretary Hanna Skandera, the master creator of New Mexico’s teacher evaluation system (built with support from New Mexico’s Governor Susana Martinez), could not be reached for comment.

Read the full article, again, here, and read more about what else is going on in New Mexico in prior posts on VAMboozled! here, here, here, and here.

Is this Thing On? Amplifying the Call to Stop the Use of Test Data for Educator Evaluations (At Least for Now)

I invited a colleague of mine and now member of the VAMboozled! team – Kimberly Kappler Hewitt (Assistant Professor, University of North Carolina, Greensboro) – to write another guest post for you all (see her first post here). This time she captures what three leading professional organizations have to say on the use of VAMs, and tests in general, for purposes of teacher accountability. Here’s what she wrote:

Within the last year, three influential organizations—reflecting researchers, practitioners, and philanthropic sectors—have called for a moratorium on the current use of student test score data for educator evaluations, including the use of value-added models (VAMs).

In April of 2014, the American Statistical Association (ASA) released a position statement that was highly skeptical of the use of VAMs for educator evaluation. ASA declared that “Attaching too much importance to a single item of quantitative information is counterproductive—in fact, it can be detrimental to the goal of improving quality.” To be clear, the ASA stopped short of outright condemning the use of VAM for educator evaluation, and declared that its statement was designed to provide guidance, not prescription. Instead, ASA outlined the possibilities and limitations of VAM and called into question how it is currently being (mis)used for educator evaluation.

In June of 2014, the Gates Foundation, the largest American philanthropic education funder, released “A Letter to Our Partners: Let’s Give Students and Teachers Time.” This was written by Vicki Phillips, Director of Education, College Ready, in which she (on behalf of the Foundation) called for a two-year moratorium on the use of test scores for educator evaluation. She explained that “teachers need time to develop lessons, receive more training, get used to the new tests, and offer their feedback.”

Similarly, the Association for Supervision and Curriculum Development (ASCD), arguably the leading international educator organization, with 125,000 members in more than 130 nations, recently released a policy brief that also calls for a two-year moratorium on high-stakes use of state tests, including their use for educator evaluations. ASCD also explicitly acknowledged that “reliance on high-stakes standardized tests to evaluate students, educators, or schools is antithetical to a whole child education. It is also counter to what constitutes good educational practice.”

While the call to halt the current use of test scores for educator evaluation is echoed across all three of these organizations, there are important nuances to their messages. The Gates Foundation, for example, makes it clear that the foundation supports the use of student test data for educator evaluation even as it declares the need for a two-year moratorium, the purpose of which is to allow teachers the time to adjust to the new Common Core Standards and related tests:

The Gates Foundation is an ardent supporter of fair teacher feedback and evaluation systems that include measures of student gains. We don’t believe student assessments should ever be the sole measure of teaching performance, but evidence of a teacher’s impact on student learning should be part of a balanced evaluation that helps all teachers learn and improve.

The Gates Foundation cautions, though, against the risk of moving too quickly to tie test scores to teacher evaluation:

Applying assessment scores to evaluations before these pieces are developed would be like measuring the speed of a runner based on her time—without knowing how far she ran, what obstacles were in the way, or whether the stopwatch worked!

I wonder what the stopwatch symbolizes in the simile: Does the Gates Foundation have questions about the measurement mechanism itself (VAM or another student growth measure), or is Gates simply arguing for more time in order for educators to be “ready” for the race they are expected to run?

While the Gates call for a moratorium is oriented toward realizing the positive potential of policies on the use of student test data for educator evaluation, by giving educators more time to prepare for them, ASA is concerned about the potential negative effects of such policies. The ASA, in its attempt to provide guidance, identified problems with the current use of VAM for educator evaluation and raised important questions about the potential effects of high-stakes use of VAM for educator evaluation:

A decision to use VAMs for teacher evaluations might change the way the tests are viewed and lead to changes in the school environment. For example, more classroom time might be spent on test preparation and on specific content from the test at the exclusion of content that may lead to better long-term learning gains or motivation for students. Certain schools may be hard to staff if there is a perception that it is harder for teachers to achieve good VAM scores when working in them. Over-reliance on VAM scores may foster a competitive environment, discouraging collaboration and efforts to improve the educational system as a whole.

Similarly to ASA, ASCD is concerned with the negative effects of current accountability practices, including “over testing, a narrowing of the curriculum, and a de-emphasis of untested subjects and concepts—the arts, civics, and social and emotional skills, among many others.” While ASCD is clear that it is not calling for a moratorium on testing, it is calling for a moratorium on accountability consequences linked to state tests: “States can and should still administer standardized assessments and communicate the results and what they mean to districts, schools, and families, but without the threat of punitive sanctions that have distorted their importance.” ASCD goes further than ASA and Gates in calling for a complete revamp of accountability practices, including policies regarding teacher accountability:

We need a pause to replace the current system with a new vision. Policymakers and the public must immediately engage in an open and transparent community decision-making process about the best ways to use test scores and to develop accountability systems that fully support a broader, more accurate definition of college, career, and citizenship readiness that ensures equity and access for all students.

So…are policymakers listening? Are these influential organizations able to amplify the voices of researchers and practitioners across the country who also want a moratorium on misguided teacher accountability practices? Let’s hope so.

Teacher Won’t be Bullied by Alhambra (AZ) School Officials

Lisa Elliott, a National Board Certified Teacher (NBCT) who has devoted her 18-year professional career to the Alhambra Elementary School District, a Title I school district (i.e., having at least 40% of the student population from low-income families) located in the Phoenix/Glendale area, expresses in this video how she refuses to be bullied by her district’s misuse of standardized test scores.

Approximately nine months ago she was asked to resign her teaching position by the district’s interim superintendent – Dr. Michael Rivera – due to her students’ low test scores for the 2013-2014 school year, and despite her students exceeding expectations on other indicators of learning and achievement. She “respectfully declined” to submit her resignation letter for a number of reasons, including that her “children are more than a test score.” Unfortunately, however, other excellent teachers in her district just left…

Yong Zhao’s Stand-Up Speech

Yong Zhao — Professor in the Department of Educational Methodology, Policy, and Leadership at the University of Oregon — was a featured speaker at the recent annual conference of the Network for Public Education (NPE). He spoke about “America’s Suicidal Quest for Outcomes,” as in, test-based outcomes.

I strongly recommend you take almost an hour (i.e., 55 minutes) out of your busy days and sit back and watch what is the closest thing to a stand-up speech I’ve ever seen. Zhao offers a poignant but also very entertaining and funny take on America’s public schools, surrounded by America’s public school politics and situated in America’s pop culture. The full transcription of Zhao’s speech is also available here, as made available by Mercedes Schneider, for any and all who wish to read it: Yong_Zhao NPE Transcript

Zhao speaks of democracy, and embraces his freedom of speech in America (v. China) that permits him to speak out. He explains why he pulled his son out of public school, thanks to No Child Left Behind (NCLB), yet he criticizes G. W. Bush for causing his son to (since college graduation) live in his basement. Hence, for Zhao, his son’s “readiness” to leave the basement is much more important than any other performance “readiness” measure being written into the plethora of educational policies surrounding “readiness” (e.g., career and college readiness, pre-school readiness).

Zhao uses what happened to Easter Island’s Rapa Nui civilization that led to their extinction as an analogy for what may happen to us post Race to the Top, given both sets of people are/were driven by false hopes of the gods raining prosperity down on them, should they successfully compete for success and praise. The Rapa Nui built monumental statues in their race to “the top” (literally), and the unintended consequences that came about as a result (e.g., the exploitation of their natural resources) destroyed their civilization. Zhao argues the same thing is happening in our country, with test scores being the most sought-after monuments, again, despite the consequences.

Zhao calls for mandatory lists of side effects that come along with standardized testing, similar to something I wrote years ago in an article titled “Buyer, Be Aware: The Value-Added Assessment Model is One Over-the-Counter Product that May Be Detrimental to Your Health.” In this article I pushed for a Food and Drug Administration (FDA) approach to educational research that would serve as a model to protect the intellectual health of the U.S.: a simple approach that legislators and education leaders would have to follow when they passed legislation or educational policies whose benefits and risks are known, or unknown.

Otherwise, he calls on all educators (and educational policymakers) to continuously ask themselves one question when test scores rise: “What did you give up to achieve this rise in scores?” When you choose something, what do you lose?

Do give it a watch!

Statistical Wizardry in New Mexico UnEnchanted

As per the Merriam-Webster dictionary, the word wizardry is defined as something that is “very impressive in a way that seems magical.” It includes the “magical things done by a wizard.” While educational statisticians of all sorts have certainly engaged in statistical wizardry in one form or another, across many states for many years past, especially when it comes to working VAM magic, the set of statistical wizards in the land of enchantment — New Mexico — are at it again (see prior posts about this state here, here, and here).

The Albuquerque Journal recently released an article titled “Teacher Evaluations Show Dip in ‘Effective’ Rating.” The full headline should have read more explicitly that across the state “the percentage of effective teachers decreased while the percentage of highly effective and exemplary teachers rose.”

What is highlighted and advanced (as a causal conjecture) is that the state’s teacher evaluation system for the 2014-15 academic year, given its “overhaul” (i.e., on average teachers are now to be evaluated 50% using student test scores, 40% using observational scores, and 10% using other “multiple measures,” including attendance), was the cause of the aforementioned decrease and both increases.

That is, the state system not only helped with (1) the more accurate identification and labeling of even more ineffective teachers, it also helped with, albeit in contradiction, (2) the improvement of other teachers who were otherwise accurately positioned the year prior. The teachers on the left side of the bell curve (see below) were more accurately identified this year, and the teachers on the “right” side became more effective due to the new and improved teacher evaluation system constructed by the state…and what might be renamed the Hogwarts Department of Education, led by Hanna Skandera – the state’s Voldemort – who, in this article pointed out that these results evidence (and I use that term loosely) “that the system is doing a better job of pointing out good teachers.”

But is this really the reality, oh wise one of the dark arts?

Here is the primary figure of interest: [Figure: NormCurve]

What is illustrated are New Mexico’s teachers by proportion and by score (i.e., Ineffective to Exemplary) covering the 2013-2014 and 2014-2015 years. More importantly what is evidenced here, though, is yet another growing trend across the country, although New Mexico is one state taking the lead in this regard, especially in terms of publicity.

The trend is that instead of having such figures with 99% of teachers being rated as satisfactory or above (see “The Widget Effect” report here), these new and improved teacher evaluation systems are to distribute teachers’ evaluation scores around a normal curve, which is presumed to be closer to the truth, so that many more teachers are identified as ineffective.

Apparently, it’s working! Or is it…

This can occur, regardless of what is actually happening in terms of actual effectiveness across America’s classrooms, when the purported value that teachers add to or detract from student learning (i.e., 50% of the state’s model) is to substantively count, because VAM output is not calculated in absolute terms, but rather in relative or normative terms. Herein lies the potion to produce the policy results so desired.

VAM-based scores can be easily constructed and manufactured by those charged with constructing such figures and graphs, also because tests themselves are also constructed to fit normal curves; hence, it is actually quite easy to distribute such scores around a bell curve, even if the data do not look nearly as clean from the beginning (they never do) and even if these figures do not reflect reality.

Regardless, such figures are often used because they give the public easy-to-understand illustrations, that lead to commonsensical perceptions that teachers are not only widely varying in terms of their effectiveness, but also that new and improved evaluation systems are helping to better differentiate and identify teachers in terms of their variation in (in)effectiveness.

However, before we accept these figures and the text around them as truth, we must agree that such a normal curve is actually a reflection of reality. We must also question whether for every high performing teacher, we must have another teacher performing equally badly, and vice versa. Generalizing upwards, we must also question whether 50% of all of America’s public school teachers are actually effective as compared to the other 50% who are not. Where some teachers get better, must other teachers get worse? For every one who succeeds must we have one who fails? For those of you who might be familiar, recall the debate surrounding The Bell Curve, as this is precisely what we are witnessing here.

By statistical design, in such cases, there will always be some teachers who will appear relatively less effective simply because they fall on the wrong side of the mean, and vice versa, but nothing here (or elsewhere as per similar graphs and figures) is actually a “true” indicator of the teacher’s actual effectiveness. This is yet another assumption that must be kept in check, especially when grand wizards claim that the new teacher evaluation systems they put in place caused such magical moments.
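The point about relative (normative) scoring can be illustrated with a small simulation. This is only a sketch under stated assumptions: the "effectiveness" numbers are made up, and a plain z-score stands in for whatever norming a real VAM applies (the state's actual model is not described here). It shows that if every teacher improves by the same amount, the normed scores do not move, and roughly half of the teachers still land below the mean.

```python
import random
import statistics


def normed_scores(raw):
    """Convert raw scores to z-scores: distance from the group mean in SD units."""
    mean = statistics.fmean(raw)
    sd = statistics.pstdev(raw)
    return [(x - mean) / sd for x in raw]


def share_below_average(scores):
    """Fraction of teachers whose normed score falls below zero (the group mean)."""
    return sum(s < 0 for s in scores) / len(scores)


random.seed(0)

# Hypothetical "true" effectiveness for 1,000 teachers (arbitrary units).
baseline = [random.gauss(50, 10) for _ in range(1000)]

# The same teachers after every single one improves by 20 points.
everyone_better = [x + 20 for x in baseline]

z_before = normed_scores(baseline)
z_after = normed_scores(everyone_better)

# The share "below average" is about one half in both cases, even though
# every teacher in the second scenario is better in absolute terms.
print(share_below_average(z_before))
print(share_below_average(z_after))
```

Under these assumptions, a chart of the normed scores looks the same in both scenarios, which is why a bell-shaped figure alone says little about how effective teachers actually are.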

New York’s VAM, by the American Institutes for Research (AIR)

A colleague of mine, Stephen Caldas (Professor of Educational Leadership at Manhattanville College, one of the “heavyweights” who recently visited New York to discuss the state’s teacher evaluation system, and who, according to Chalkbeat New York, “once called New York’s evaluation system ‘psychometrically indefensible’”), wrote me with a critique of New York’s VAM, which I decided to post for you all here.

His critique is of the 2013-2014 Growth Model for Educator Evaluation Technical Report, produced by the American Institutes for Research (AIR), which “describes the models used to measure student growth for the purpose of educator evaluation in New York State for the 2013-2014 School Year” (p. 1).

Here’s what he wrote:

I’ve analyzed this tech report, which for many would be a great sedative prior to sleeping. It’s the latest in a series of three reports by AIR paid for by the New York State Education Department. The truth of how good the growth models used by AIR really are, though, is buried deep in the report in Table 11 (p. 31) and Table 20 (p. 44), both of which are recreated here.

[Table 11] [Table 20]

These tables give us indicators of how good the growth models are at predicting current-year student English/language arts (ELA) and mathematics (MATH) scores by grade level and subject (i.e., the dependent variables). At the secondary level, an additional outcome, or dependent variable, predicted is the number of Regents Exams a student passed for the first time in the current year. The unadjusted models only included prior academic achievement as predictor variables, and are shown for comparison purposes only. The adjusted models were the models that were actually used by the state to make predictions that fed into teacher and principal effectiveness scores. In addition to using prior student achievement as a predictor, the adjusted prediction models included these additional predictor variables: student and school-level poverty status, student and school-level socio-economic status (SES), student and school-level English language learner (ELL) status, and scores on the New York State English as a Second Language Achievement Test (the NYSESLAT). The tables above report a statistic called “Pseudo R-squared” or just “R-squared,” and this statistic shows us the predictive power of the overall models.

To help interpret these numbers, if one observes a “1.0” (which one won’t), it would mean that the model was “100%” perfect (with no prediction error). One would obtain the “percentage of perfect” (if you will) by moving the decimal point two places to the right. Otherwise, the difference between the percentage perfect and 100 is called the “error” or “e.”

With this knowledge, one can see in the adjusted ELA 8th grade model (Table 11) that the predictor variables altogether explain “74%” of the variance of current year student ELA 8th grade scores (R-squared = 0.74). Conversely, this same model has 26% error (and this is one of the best ones illustrated in the report). In other words, this particular prediction model cannot account for 26% of the cause of current ELA 8th grade scores, “all other things considered” (i.e., the predictor variables that are so highly correlated with test scores in the first place).

The prediction models at the secondary level are much, MUCH worse. If one is to look at Table 20, one would see that in the worst model (adjusted ELA Common Core) the predictor variables together only explain 45% of student ELA Common Core test scores. Thus, this prediction model cannot account for 55% of the causes of these scores!!

While not terrible R-squared values for social science research, these are horrific values for a model used to make individual level predictions at the teacher or school level with any degree of precision. Quite frankly, they simply cannot be precise given these huge quantities of error. The chances that these models would precisely (with no error) predict a teacher’s or school’s ACTUAL student test scores is slim to none. Yet, the results of these imprecise growth models can contribute up to 40% of a teacher’s effectiveness rating.
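The arithmetic behind the "percentage explained" and "error" figures above can be sketched in a few lines. The sketch assumes an ordinary R-squared (one minus residual variation over total variation); the report's "Pseudo R-squared" may be computed differently, and the five test scores below are invented purely for illustration. Only the 0.74 and 0.45 values come from the text.

```python
def r_squared(actual, predicted):
    """Ordinary R-squared: share of variation in `actual` explained by `predicted`."""
    mean = sum(actual) / len(actual)
    ss_total = sum((y - mean) ** 2 for y in actual)
    ss_residual = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    return 1 - ss_residual / ss_total


def unexplained_share(r2):
    """Share of variation the model leaves unexplained (the "error" in the text)."""
    return 1 - r2


# R-squared values quoted in the text for two of the adjusted models.
quoted = [
    ("adjusted ELA grade 8 (Table 11)", 0.74),
    ("adjusted ELA Common Core (Table 20)", 0.45),
]
for label, r2 in quoted:
    print(f"{label}: explained {r2:.0%}, unexplained {unexplained_share(r2):.0%}")

# Invented scores for five students, to show where an R-squared comes from.
actual = [300, 310, 320, 330, 340]
predicted = [305, 305, 320, 335, 335]
print(round(r_squared(actual, predicted), 2))
```

Even in the invented example, where the model explains most of the variation, four of the five individual predictions are off by five points, which is the distinction between a respectable overall fit and precise individual-level predictions.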

This high level of imprecision would explain why teachers like Sheri Lederman of Long Island, who is apparently a terrific fourth grade educator based on all kinds of data besides her most recent VAM scores, received an “ineffective” rating based on this flawed growth model (see prior posts here and here). She clearly has a solid basis for her lawsuit against the state of New York in which she claims her score was “arbitrary and capricious.”

This kind of information on all the prediction error in these growth models needs to be in an executive summary in front of these technical reports. The interpretation of this error should be in PLAIN LANGUAGE for the tax payers who foot the bill for these reports, the policy makers who need to understand the findings in these reports, and the educators who suffer the consequences of such imprecision in measurement.

“Insanity Reigns” in New York

As per an article in Capitol Confidential, two weeks ago New York’s Governor Cuomo – the source of many posts, especially lately (see, for example, here, here, here, here, and here) — was questioned about the school districts that throughout New York were requesting delays in implementing the state’s new teacher evaluation program. Cuomo was also questioned about students in his state who were opting out of the state’s tests.

In response, Cuomo “stressed that the tests used in the evaluations don’t affect the students’ grades.” In his direct words, “[t]he grades are meaningless to the students.”

Yet the tests are to be used to evaluate how effective New York’s teachers are? So, the tests are meaningless to students throughout the state, but the state is to use them to evaluate the effectiveness of students’ teachers throughout the state regardless? The tests won’t count for measuring student knowledge (ostensibly what the tests are designed to measure) but they will be used to evaluate teachers (which the tests were not designed to measure)?

In fact, the tests as per Cuomo, “won’t count at all for the students…for at least the next five years.” Hence, students “can opt out if they want to.” Inversely, if a student decides to take the test the student should consider it “practice” because, again, “the score doesn’t count.” Nor will it count for some time.

In other words, those of a colleague who sent me this article: “Cuomo’s answer to parents who are on the fence about opting out, ‘oh, it’s just practice.’ He expects that when parents hear that testing is low stakes for their kids they will not opt out, but once kids hear that the tests don’t count for them, how hard do you think they are going to try? Low stakes for students, high stakes for the teacher. Insanity reigns!”

This all brings into light the rarely questioned assumption about how the gains that students make on “meaningless” tests actually indicate how much “value” a teacher “adds” to or detracts from his/her students.

What is interesting to point out here is that with No Child Left Behind (NCLB), Governor turned President George W. Bush’s brainchild, the focus was entirely on student-level accountability (i.e., a student must pass a certain test or face the consequences). The goal was that 100% of America’s public school students would be academically proficient in reading and mathematics by 2014 – yes, last year.

When that clearly did not work as politically intended, the focus changed to teacher accountability, thanks to President Obama, his U.S. Secretary of Education Arne Duncan, and their 2009 Race to the Top competition. Approximately $4.35 billion in taxpayer revenues later, we now have educational policies focused on teacher, but no longer student, accountability, with similar results (or the lack thereof).

The irony here is that for the most part the students taking these tests are no longer to be held accountable for their performance, but their teachers are to be held accountable for their students’ performance instead, and regardless. Accordingly, across the country we now have justifiably nervous teachers who could face serious consequences because their students, who as per Cuomo don’t have to care about their test performance (e.g., for five years), have no stake in the results. The teachers’ only alternatives are to tell their students that their teachers’ professional lives are on the line (which is true in many cases) or to lie to their students (e.g., your grades on these tests will be used to place you into college), which is false in all cases.

While VAMs certainly have a number of serious issues with which we must contend, this is another that is not often mentioned, made transparent, or discussed. But teachers across the country are living out this reality, in practice, every time they prepare their students for these tests.

So I suppose, within the insanity, we have Cuomo to thank for his comments here, as these alone make yet another reality behind VAMs all too apparent.

My Book on HBO’s “Last Week Tonight with John Oliver”

My book on “Rethinking Value-Added Models…” was featured last night, on HBO’s Last Week Tonight with John Oliver.


Holy cow!! Literally!!

Perhaps more importantly, though [insert smiley face emoji here], is the 18-minute segment in which the book is mentioned (at the 8:20 point), all about Standardized Testing. Click on the YouTube video below to watch the whole show.

These 18 minutes include information on the educational policies supporting the history of high-stakes standardized tests in the U.S., how educational policymakers (including U.S. Presidents G.W. Bush and Obama) have unwaveringly “advanced” this history, how our nation’s over-reliance on such test-based policies has done nothing for our nation for the past ten years (as cited in this clip, even though they have really done little to nothing for now more than 30 years), how and why the opt-out movement is still sweeping the nation, and the like. Also prominent is Florida Teacher Luke Flint’s “Story” about his VAM Scores (also covered here).

This is a must watch, and funny!! Funny as it can be, of course, given the currently serious situation. The video’s content is a bit, let’s say edgy too though, so please be advised.

VAM “Heavyweights” Headed to New York

Last week I received an email from the New York State Education Commissioner inviting me to participate on a panel, to be held during a Learning Summit on Teacher and Principal Evaluation this Thursday, May 7, 2015, in the state’s capital – Albany. This event is being hosted by the New York State Board of Regents, in conjunction with the New York State Education Department. Unfortunately, given the two-week notice and a prior engagement, I had to decline (as did others including Stanford’s Linda Darling-Hammond and New York University’s Sean Corcoran, to name a few).

Regardless, recall that New York is one of our states to watch, especially since the election of Governor Cuomo and the state’s collective efforts to overhaul the state’s teacher evaluation system (see prior posts about New York here, here, here, and here). Accordingly, this event will also be one “to watch.” Also interesting “to watch” will be how much influence this panel of experts will actually have on New York’s teacher evaluation policy, for better or worse.

I write “for better or worse” in that, as per a recent article released by Chalkbeat New York — a news site covering educational change in New York schools — the “heavyweights” who will actually be making the trip to Albany “represent many sides of the contentious debates around how to rate [and evaluate] teachers.”

The experts on the pro-Cuomo side include:

  • Thomas Kane — the Professor of Economics from Harvard University who directed the $45 million worth of Measures of Effective Teaching (MET) studies for the Bill & Melinda Gates Foundation, who has been the source of many prior (and not so positive) posts on this blog (see, for example, here and here), and who within the last couple of weeks wrote an op-ed piece praising Governor Cuomo for his state’s new teacher evaluation plans (see a prior post about this here). He in and of himself is “one to watch” in that he is not often one to support his pro-VAM claims with much evidence. Hopefully the panel will take note.
  • Catherine Brown — Vice President of Educational Policy at the Center for American Progress, which in general supports and endorses the use of VAMs.
  • Sandi Jacobs — former charter corps member with Teach For America (TFA) and current Vice President at the National Council on Teacher Quality (NCTQ), which, according to Chalkbeat New York, is “an organization that has pushed states to adopt stringent evaluation systems that rely more on student learning measures.”
  • Leslie Guggenheim — Vice President overseeing New Teacher Effectiveness at The New Teacher Project (TNTP), the advocacy organization whose influential “The Widget Effect” report asserted that all teachers are being rated good or great (which should be impossible in a country not academically competing with Finland and Shanghai) because current teacher evaluation systems are preventing “us” from truly distinguishing exceptional teachers from their not-so-exceptional colleagues.

The experts on “the other side” include:

  • Jesse Rothstein — the University of California, Berkeley Associate Professor of Economics who has researched and written extensively about VAMs, who is most well-known for his work on how the non-random sorting of students into teachers’ classrooms biases VAM output (and accordingly distorts the validity of the inferences to be made as based on VAM output; see, for example, here, here, and here), who also analyzed Kane’s MET data and “found substantial differences in value-added scores for the same teacher when different tests were used,” who has engaged in other debates with Thomas Kane (see, for example, here), and who has also been the source of many prior posts on this blog (see, for example, here and here).
  • Aaron Pallas — Professor of Sociology and Education at Teachers College, Columbia University who, according to Chalkbeat New York, has “been critical of New York’s system and served as an expert witness in lawsuits brought by teachers unions challenging low teacher ratings.”
  • Stephen Caldas — professor at Manhattanville College who, according to Chalkbeat New York, “once called New York’s evaluation system ‘psychometrically indefensible.’”

Nonetheless, “[i]t’s unclear what influence the researchers and policy analysts will have on the state’s work, given that much of the evaluation system has [already] been prescribed….[yet]…[o]fficials are required by law to collect public comment on how to design the regulations that will govern how a teacher’s performance in the classroom gets graded, a process that must be complete by the end of June.”

Do stay tuned.

The (Relentless) Will to Quantify

An article was just published in the esteemed, peer-reviewed journal Teachers College Record titled, “The Will to Quantify: The “Bottom Line” in the Market Model of Education Reform” and authored by Leo Casey – Executive Director of the Albert Shanker Institute. I will summarize its key points here (1) for those of you without the time to read the whole article (albeit worth reading) and (2) just in case the link above does not work for those of you out there without subscriptions to Teachers College Record.

In this article, Casey reviews the case of New York and the state department of education’s policy attempts to use New York teachers’ value-added data to reform the state’s public schools, “in the image and likeness of competitive businesses.” Casey interrogates this state’s history given the current, market-based, corporate reform environment surrounding (and swallowing) America’s public schools within New York, but also beyond.

Recall that New York is one of our states to watch, especially since the election of Governor Cuomo (see prior posts about New York here, here, and here). According to Casey, this is the state to use to demonstrate how “[t]he market model of education reform has become a prisoner to a Nietzschean will to quantify, in which the validity and reliability of the actual numbers is irrelevant.”

In New York, using the state’s large-scale standardized tests in English/language arts and mathematics, grades 3 through 8, teachers’ value-added data reports were first developed for approximately 18,000 teachers throughout the state for three school years: 2007-2010. The scores were constructed with all assurances that these scores “would not be used for [teacher] evaluation purposes,” while the state department specifically identified tenure decisions and annual rating processes as two areas where teachers’ value-added scores “would play no role.” At that time the department of education also took a “firm position” that these reports would not be disclosed or shared outside of the school community (i.e., with the public).

Soon thereafter, however, the department of education, “acting unilaterally,” began to use the scores in tenure decisions and, after a series of Freedom of Information requests, began to release the scores to the media, who in turn released the scores to the public at large. By February of 2012, teachers’ value-added scores were published by all major New York media.

Recall these articles, primarily about the worst teachers in New York (see, for example, here, here, and here), and recall the story of Pascale Mauclair – a sixth-grade teacher in Queens who was “pilloried” in the New York Post as the city’s “worst teacher” based solely on her value-added reports. After a more thorough investigation, however, “Mauclair proved to be an excellent teacher who had the unqualified support of her school, one of the best in the city: her principal declared without hesitation or qualification that she would put her own child in Mauclair’s class, and her colleagues met Mauclair with a standing ovation when she returned to the school after the Post’s attack. Mauclair’s undoing had been her own dedication to teaching students with the greatest needs. As a teacher of English as a Second Language, she had taken on the task of teaching small self-contained classes of recent immigrants for the last five years.”

Nonetheless, the state department of education continued (and continues) to produce data for New York teachers “with a single year of test score data, and sample sizes as low as 10…When students did not have a score in a previous year, scores were statistically “imputed” to them in order to produce a basis for making a growth measure.”

These scores had, and often continue to have (also across states), “average confidence intervals of 60 to 70 percentiles for a single-year estimate. On a distribution that went from 0 to 99, the average margins of error in the [New York scores] were, by the [state department of education’s] own calculations, 35 percentiles for Math and 53 percentiles for English Language Arts. One-third of all [scores], the [department] conceded, were completely unreliable—that is, so imprecise as to not warrant any confidence in them. The sheer magnitude of these numbers takes us into the realm of the statistically surreal.” Yet the state continues to this day in its efforts to use these data despite the gross statistical and consequential human errors present.
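
To make the arithmetic behind these margins concrete, here is a minimal Python sketch. It rests on my own assumptions, not on the department's stated method: that a margin of error can be read as a symmetric half-width around a teacher's percentile rank, and that the teacher in question sits at the 50th percentile (a hypothetical case). The function name is likewise just illustrative.

```python
def percentile_interval(estimate, margin, lo=0, hi=99):
    """Return the (lower, upper) range implied by a point estimate and a
    margin of error, clipped to the 0-99 percentile scale."""
    return max(lo, estimate - margin), min(hi, estimate + margin)


# Average margins of error as quoted above, in percentile points.
MARGINS = {"Math": 35, "English Language Arts": 53}

# Hypothetical teacher ranked at the 50th percentile (an assumption).
for subject, margin in MARGINS.items():
    low, high = percentile_interval(50, margin)
    print(f"{subject}: plausible range runs from percentile {low} to {high}")
```

Under these assumptions, the Math range runs from the 15th to the 85th percentile, and the English Language Arts range covers the entire 0 to 99 scale, which is to say the score tells us next to nothing about where that teacher actually stands.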

This, in the words of Casey, is “a demonstration of [extreme] professional malpractice in the realm of testing.” Yet for educational reformers like Governor Cuomo, as well as “Michael Bloomberg, Joel Klein, and a cohort of similarly minded education reformers across the United States, the fundamental problem with American public education is that it has been organized as a monopoly that is not subject to the discipline of the marketplace. The solution to all that ails public schools, therefore, is to remake them in the image and likeness of a competitive business. Just as private businesses rise and fall on their ability to compete in the marketplace, as measured by the ‘bottom line’ of their profit balance sheet, schools need to live or die on their ability to compete with each other, based on an educational ‘bottom line.’ If ‘bad’ schools die and new ‘good’ schools are created in their stead, the productivity of education improves. But to undertake this transformation and to subject schools to market discipline, an educational ‘bottom line’ must be established. Standardized testing and value-added measures of performance based on standardized testing provide that ‘bottom line.’”

Otherwise, some of the key findings taken from other studies Casey cited in this piece are also good to keep in mind:

  • “A 2010 U.S. Department of Education study found that value-added measures in general have disturbingly high rates of error, with the use of three years of test data producing a 25% error rate in classifying teachers as above average, average, and below average and one year of test data yielding a 35% error rate.” Nothing much has changed in terms of error rates here, so this study still stands as one of the capstone pieces on this topic.
  • “New York University Professor Sean Corcoran has shown that it is hard to isolate a specific teacher effect from classroom factors and school effects using value-added measures, and that in a single year of test scores, it is impossible to distinguish the teacher’s impact. The fewer the years of data and the smaller the sample (the number of student scores), the more imprecise the value-added estimates.”
  • Also recall that “the tests in question [are/were] designed for another purpose: the measure of student performance, not teacher or school performance.” That is, these tests were designed and validated to measure student achievement, BUT they were never designed or validated for their current purposes/uses: to measure teacher effects on student achievement.
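
The point about sample size in the second finding above can be illustrated with a small simulation. This is only a sketch under loud assumptions of my own: it is not a value-added model, the true teacher effect is fixed at zero, student-level noise is drawn from a normal distribution with a hypothetical standard deviation of 1.0, and the class sizes of 10, 25, and 100 are chosen for illustration. It simply shows how much an average of student scores bounces around by chance alone.

```python
import random
import statistics


def spread_of_estimates(n_students, n_trials=2000, noise_sd=1.0, seed=0):
    """Standard deviation of the estimated teacher effect across simulated
    classes when the true effect is zero and only student noise varies."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    estimates = []
    for _ in range(n_trials):
        # One simulated class: every score is pure noise around zero.
        scores = [rng.gauss(0.0, noise_sd) for _ in range(n_students)]
        estimates.append(statistics.fmean(scores))
    return statistics.pstdev(estimates)


# Hypothetical numbers of student scores behind one teacher's estimate.
spreads = {n: spread_of_estimates(n) for n in (10, 25, 100)}
for n, spread in spreads.items():
    print(f"{n:>3} student scores: spread of estimates = {spread:.3f}")
```

In this toy setup the spread shrinks roughly in proportion to the square root of the number of scores, so an estimate built on 10 students is about three times as noisy as one built on 100, even though every simulated teacher is identical.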