New Mexico’s Teacher Evaluation Lawsuit: Four Teachers’ Individual Cases

Regarding a prior post about a recently filed “Lawsuit in New Mexico Challenging State’s Teacher Evaluation System,” filed by the American Federation of Teachers (AFT) and charging that the state’s current teacher evaluation system is unfair, error-ridden, harming teachers, and depriving students of high-quality educators (see the actual lawsuit here), the author of an article recently released in The Washington Post takes “A closer look at four New Mexico teachers’ evaluations.”

Emma Brown writes that the state believes this system supports the “aggressive changes” needed “to produce real change for students” and that “these evaluations are an essential tool to support the teachers and students of New Mexico.” Teachers, on the other hand (and in general terms), believe that the new evaluations “are arbitrary and offer little guidance as to how to improve.”

Highlighted further in this piece, though, are four specific teachers’ evaluations taken from this state’s system, along with each teacher’s explanation of the problems as he or she sees them. The first, a veteran teacher with 36 years of “excellent evaluations,” scored ineffective for missing too much work, although she had been approved for and put on a six-month leave after a serious injury caused by a fall. She took four of the six months, but her “teacher attendance” score dropped her to the bottom of the teacher rankings. She has since retired.

The second, a 2nd-grade teacher and also a veteran with 27 years of experience, received 50% of her “teacher attendance” points, in her case because of a family-related illness, but she also received 8 out of 50 “student achievement” points. She argues that her students, because most of them are well above average, had difficulties demonstrating growth. In other words, her argument rests on the concern (a very real one in terms of the current research) that “ceiling effects” were preventing her students from growing upwards enough when compared to other “similar” students who are also expected to demonstrate “a full year’s worth of growth.” She is also retiring in a few months “in part because she is so frustrated with the evaluation system.”
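To make the ceiling-effect concern concrete, here is a minimal sketch in Python. It is only an illustration: the 100-point test cap, the prior scores, and the 10-point underlying gain are hypothetical numbers, not figures from New Mexico’s system, and the function names are made up for this example.

```python
# Hypothetical illustration of a test "ceiling effect".
# MAX_SCORE, TRUE_GAIN, and both classrooms are assumptions, not New Mexico data.

MAX_SCORE = 100   # highest score the (hypothetical) test can record
TRUE_GAIN = 10    # points every student would gain if the test had no ceiling


def observed_gain(prior_score, true_gain=TRUE_GAIN, max_score=MAX_SCORE):
    """Return the gain the test can actually register for one student."""
    post_score = min(prior_score + true_gain, max_score)  # scores are capped
    return post_score - prior_score


def mean_observed_gain(prior_scores):
    """Average measured gain across a classroom."""
    gains = [observed_gain(p) for p in prior_scores]
    return sum(gains) / len(gains)


average_class = [55, 60, 65, 70, 75]   # room to grow below the ceiling
high_class = [92, 94, 96, 98, 100]     # already near or at the ceiling

print(mean_observed_gain(average_class))  # the full gain is measurable
print(mean_observed_gain(high_class))     # the gain is truncated by the cap
```

Under these assumed numbers, both classrooms learn the same amount, yet the high-scoring classroom registers less than half the measured growth simply because the test cannot record scores above its maximum.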

The third teacher, a middle-school teacher, scored 23 out of 70 “value-added” points, even though he had switched from teaching language arts to teaching social studies at the middle-school level. This teacher apparently did not have the three years needed (not to mention in the same subject area) to calculate his “value-added,” nor does he have “any idea” where his score came from or how it was calculated. Accordingly, his score “doesn’t give him any information about how to get better,” which falls under the general issue that these scores apparently offer teachers little guidance as to how to improve. This is an issue familiar across most if not all such models.

The fourth teacher, an alternative high school mathematics and science teacher of pregnant and parenting teens, many of whom have learning or emotional disabilities, received 24 of 70 “student achievement” points, which, she argues, are based on tests that are “unvetted and unreliable,” especially given the types of students she teaches. As per her claim: “There are things I am being evaluated on which I do not and cannot control…Each year my school graduates 30 to 60 students, each of whom is either employed, or enrolled in post-secondary training/education. This is the measure of our success, not test scores.”

This is certainly a state to watch, as the four New Mexico teachers highlighted in this article certainly have unique and important cases, all of which may be used to help set precedent in this state as well as others. Do stay tuned…

Teacher Evaluation and Accountability Alternatives, for A New Year

At the beginning of December I wrote a post about Diane Ravitch’s really nice piece published in the Huffington Post about what she views as a much better paradigm for teacher evaluation and accountability. Diane Ravitch has since posted another piece on similar alternatives, although this one was written by teachers themselves.

I thought this was more than appropriate, especially given a New Year is upon us, and while it might very well be wishful thinking, perhaps at least some of our state policy makers might be willing to think in new ways about what really could be new and improved teacher evaluation systems. Cheers to that!

The main point here, though, is that alternatives do, indeed, exist. Likewise, it’s not that teachers do not want to be held accountable for, and evaluated on, that which they do, but they do want whatever systems are in place (formal or informal) to be appropriate, professional, and fair. How about that for a policy-based resolution?

This is from Diane’s post: The Wisdom of Teachers: A New Vision of Accountability.

Anyone who criticizes the current regime of test-based accountability is inevitably asked: What would you replace it with? Test-based accountability fails because it is based on a lack of trust in professionals. It fails because it confuses measurement with instruction. No doctor ever said to a sick patient, “Go home, take your temperature hourly, and call me in a month.” Measurement is not a treatment or a cure. It is measurement. It doesn’t close gaps: it measures them.

Here is a sound alternative approach to accountability, written by a group of teachers whose collective experience is 275 years in the classroom. Over 900 teachers contributed ideas to the plan. It is a new vision that holds all actors responsible for the full development and education of children, acknowledging that every child is a unique individual.

Its key features:

  • Shared responsibility, not blame
  • Educate the whole child
  • Full and adequate funding for all schools, with less emphasis on standardized testing
  • Teacher autonomy and professionalism
  • A shift from evaluation to support
  • Recognition that in education one size does not fit all

Houston, We Have A Problem: New Research Published about the EVAAS

New VAM research was recently published in the peer-reviewed Education Policy Analysis Archives journal, titled “Houston, We Have a Problem: Teachers Find No Value in the SAS Education Value-Added Assessment System (EVAAS®).” This article was published by a former doctoral student of mine, turned researcher now at a large non-profit, Clarin Collins. I asked her to write a guest post for you all summarizing the full study (linked again here). Here is what she wrote.

As someone who works in the field of philanthropy, completed a doctoral program more than two years ago, and recently became a new mom, you might question why I worked on an academic publication and am writing about it here as a guest blogger? My motivation is simple: the teachers. Teachers continue to be at the crux of the national education reform efforts as they are blamed for the nation’s failing education system and student academic struggles. National and state legislation has been created and implemented as believed remedies to “fix” this problem by holding teachers accountable for student progress as measured by achievement gains.

While countless researchers have highlighted the faults of teacher accountability systems and growth models (unfortunately to fall on the deaf ears of those mandating such policies), very rarely are teachers asked how such policies play out in practice, or for their opinions, as representing their voices in all of this. The goal of this research, therefore, was first, to see how one such teacher evaluation policy is playing out in practice and second, to give voice to marginalized teachers, those who are at the forefront of these new policy initiatives. That being said, while I encourage you to check out the full article [linked again here], I highlight key findings in this summary, using the words of teachers as often as possible to permit them, really, to speak for themselves.

In this study I examined the SAS Education Value-Added Assessment System (EVAAS) in practice, as perceived and experienced by teachers in the Southwest School District (SSD). SSD [a pseudonym] is using EVAAS for high-stakes consequences more than any other district or state in the country. I used a mixed-method design including a large-scale electronic survey to investigate the model’s reliability and validity; to determine whether teachers used the EVAAS data in formative ways as intended; to gather teachers’ opinions on EVAAS’s claimed benefits and statements; and to understand the unintended consequences that might have also occurred as a result of EVAAS use in SSD.

Results revealed that the reliability of the EVAAS model produced split and inconsistent results among teacher participants regardless of subject or grade-level taught. As one teacher stated, “In three years, I was above average, below average and average.” Teachers indicated that it was the students and their varying background demographics who biased their EVAAS results, and much that was demonstrated via their scores was beyond the control of teachers. “[EVAAS] depends a lot on home support, background knowledge, current family situation, lack of sleep, whether parents are at home, in jail, etc. [There are t]oo many outside factors – behavior issues, etc.” that apparently are not controlled or accounted for in the model.

Teachers reported dissimilar EVAAS and principal observation scores, reducing the criterion-related validity of both measures of teacher quality. Some even reported that principals changed their observation scores to match their EVAAS scores; “One principal told me one year that even though I had high [state standardized test] scores and high Stanford [test] scores, the fact that my EVAAS scores showed no growth, it would look bad to the superintendent.” Added another teacher, “I had high appraisals but low EVAAS, so they had to change the appraisals to match lower EVAAS scores.”

The majority of teachers disagreed with SAS’s marketing claims such as EVAAS reports are easy to use to improve instruction, and EVAAS will ensure growth opportunities for all students. Teachers called the reports “vague” and “unclear” and were “not quite sure how to interpret” and use the data to inform their instruction. As one teacher explained, she looked at her EVAAS report “only to guess as to what to do for the next group in my class.”

Many unintended consequences associated with the high-stakes use of EVAAS emerged through teachers’ responses, which revealed, among others, that teachers felt heightened pressure and competition, which they believed reduced morale and collaboration and encouraged cheating or teaching to the test in an attempt to raise EVAAS scores. Teachers made comments such as, “To gain the highest EVAAS score, drill and kill and memorization yields the best results, as does teaching to the test,” and “When I figured out how to teach to the test, the scores went up,” as well as, “EVAAS leaves room for me to teach to the test and appear successful.”

Teachers realized this emphasis on test scores was detrimental for students, as one teacher wrote, “As a result of the emphasis on EVAAS, we teach less math, not more. Too much drill and kill and too little understanding [for the] love of math… Raising a generation of children under these circumstances seems best suited for a country of followers, not inventors, not world leaders.”

Teachers also admitted they are not collaborating to share best practices as much anymore: “Since the inception of the EVAAS system, teachers have become even more distrustful of each other because they are afraid that someone might steal a good teaching method or materials from them and in turn earn more bonus money. This is not conducive to having a good work environment, and it actually is detrimental to students because teachers are not willing to share ideas or materials that might help increase student learning and achievement.”

While I realize this body of work could simply add to “the shelves” along with those findings of other researchers striving to deflate and demystify this latest round of education reform, if nothing else, I hope the teachers who participated in this study know I am determined to let their true experiences, perceptions of their experiences, and voices be heard.

—–

Again, to find out more information including the statistics in support of the above assertions and findings, please click here to read the full study.

Thomas Kane On Educational Reform

You might recall from a prior post, the name of Thomas Kane, an economics professor from Harvard University who also directed the $45 million worth of Measures of Effective Teaching (MET) studies for the Bill & Melinda Gates Foundation. Not surprisingly as a VAM advocate, he advanced then, and continues to advance now, a series of highly false claims about the wonderful potentials of VAMs.

As highlighted in the piece Kane wrote, and Brookings released on its website, on “The Case for Combining Teacher Evaluation and the Common Core,” Kane continues to advance a series of highly false claims and assumptions in terms of how “better teacher evaluation systems will be vital for any broad [educational] reform effort, such as implementing the Common Core.” Asserting a series of “heroic assumptions” without evidence seems to be a recurring theme, which I cannot figure out, given that Kane is an academic and, quite honestly, should know better.

Here are some examples of what I speak (and protest):

  • Educational reform is “a massive adult behavior change exercise…[U]nless we change what adults do every day inside their classrooms, we cannot expect student outcomes to improve.” Enter teachers as the new and popular (thanks to folks like Kane) sources of blame. We are to accept Kane’s assumption, here, that teachers have not previously been motivated to change their adult behaviors and teach their students well, help their students learn, improve their students’ outcomes, and the like.
  • Hence, when “current attempts to implement new teacher evaluations fall short—as they certainly will, given the long history of box-checking—we must improve them.” We are to accept Kane’s assumption here, despite the fact that little to no research evidence exists supporting that teacher evaluation systems improve much of anything including “improved student outcomes,” that new teacher evaluation systems based on carrot and stick measures are going to do this.
  • Positioning new and improved teacher evaluation systems against another educational reform approach (which I have never seen positioned as a reform approach, but nonetheless), Kane argues that “professional development hasn’t worked in the past,” so we must go with new teacher evaluation systems? Nobody I know who conducts research on educational reform has ever suggested that professional development was, or could ever be, proposed as a way to reform America’s public schools. Rather, professional development is simply a standard of a profession, and the teaching profession (at least to many of us) is still meant to be just that. If we are to talk about research-based ways to reform our schools, there are indeed other solutions. These other solutions, however, are unfortunately more expensive and, hence, less popular among those who continue to advance cheap and “logical” or “rational” solutions such as those advanced by Kane.
  • Ironically, Kane cites and links to two external studies when arguing that “[b]etter teacher evaluation systems have been shown to be related to better outcomes for students.” While the first piece Kane references might have something to do with this (as per reading the abstract, but not the full piece), the second piece cited and linked to by Kane, rather, is about how professional or teacher development focused on supporting teacher and student interactions actually increased student learning and achievement. But “professional development hasn’t worked in the past?” Funny…
  • Kane also asserts that “The Common Core is more likely to succeed in sites that are implementing better teacher evaluation and feedback as well.” Where’s the evidence on that one…
  • There is really only one thing written into this piece on which we agree: the use of student surveys to provide teachers with student-based feedback (this was the source of a recent post I wrote here).

Thereafter, Kane goes into a series of suggestions for administrators and teachers on how they should, for example, conduct “side-by-side comparison[s] of the new and old standards and identify a few standards—no more than two or three in each grade and subject—to focus on during the upcoming year” — and — how administrators should “schedule classroom observations for the days when the new standards are to be taught.” Indeed, “[e]ven one successful cycle will lay the foundation for the next round of instructional improvement.”

I do have to say, though, as a former teacher, I would advise others not to heed the advice of a person who has conducted a heck of a lot of research “on” education but who has, as far as I can tell or find on the internet (see his full resume or curriculum vitae here), never been a teacher “in” education himself, much less set foot in a classroom. I’m sorry, practitioners, on behalf of my colleague for doing this from (as you sometimes criticize us as a whole) atop his ivory tower post.

Kane concludes with the following: “The norm of autonomous, self-made, self-directed instruction—with no outside feedback or intervention—is long-standing and makes the U.S. education system especially resistant to change. In most high-performing countries, teachers have no such expectations. The lesson study in Japan is a good example. Teachers do not bootstrap their own instruction. They do not expect to be left alone. They expect standards, they expect feedback from peers and supervisors and they expect to be held accountable—for the quality of their delivery as well as for student results. Therefore, a better system for teacher evaluation and feedback is necessary to support individual behavior change, and it’s a tool for collective culture change as well.”

So much of what he wrote here, really in every single sentence, could not be further from the truth, so much so I care not to dissect each point and waste your time further.

As I also said in my prior post, if I were to make a list of VAMboozlers, Kane would still be near the top of the list. All of the reasons for my nomination are highlighted yet again here, unfortunately, but this time as per what Kane wrote himself. Again, though, you can be the judges and read this piece for yourselves, or not.

Can Today’s Tests Yield Instructionally Useful Data?

The answer is no, or at best not yet.

Some heavy hitters in the academy just released an article that might be of interest to you all. In the article the authors discuss whether “today’s standardized achievement tests [actually] yield instructionally useful data.”

The authors include W. James Popham, Professor Emeritus from the University of California, Los Angeles; David Berliner, Regents’ Professor Emeritus at Arizona State University; Neal Kingston, Professor at the University of Kansas; Susan Fuhrman, current President of Teachers College, Columbia University; Steven Ladd, Superintendent of Elk Grove Unified School District in California; Jeffrey Charbonneau, National Board Certified Teacher in Washington and the 2013 US National Teacher of the Year; and Madhabi Chatterji, Associate Professor at Teachers College, Columbia University.

These authors explored some of the challenges and promises in terms of using and designing standardized achievement tests and other educational tests that are “instructionally useful.” This was the focus of a recent post about whether Pearson’s tests are “instructionally sensitive” and what University of Texas – Austin’s Associate Professor Walter Stroup versus Pearson’s Senior Vice President had to say on this topic.

In this study, the authors deliberate more specifically on the consequences of using inappropriately designed tests for decision-making purposes, particularly when tests are insensitive to instruction. Here, the authors underscore serious issues related to validity, ethics, and consequences, all of which they use and appropriately elevate to speak out, particularly against the use of current, large-scale standardized achievement tests for evaluating teachers and schools.

The authors also make recommendations for local policy contexts, offering recommendations to support (1) the design of more instructionally sensitive large-scale tests as well as (2) the design of other smaller scale tests that can also be more instructionally sensitive, and just better. These include but are not limited to classroom tests as typically created, controlled, and managed by teachers, as well as district tests as sometimes created, controlled, and managed by district administrators.

Such tests might help to create more, and also better, comprehensive educational evaluation systems, the authors ultimately argue, although this, of course, would require more professional development to help teachers (and others, including district personnel) develop more instructionally sensitive, and accordingly more useful, tests. As they also note, this would also require that “validation studies…be undertaken to ensure validity in interpretations of results within the larger accountability policy context where schools and teachers are evaluated.”

This is especially important if tests are to be used for low and high-stakes decision-making purposes. Yet this is something that is way too often forgotten when it comes to test use, and in particular test abuse. All should really take heed here.

Reference: Popham, W. J., Berliner, D. C., Kingston, N. M., Fuhrman, S. H., Ladd, S. M., Charbonneau, J., & Chatterji, M. (2014). Can today’s standardized achievement tests yield instructionally useful data? Quality Assurance in Education, 22(4), 303-318. doi:10.1108/QAE-07-2014-0033. Retrieved from http://www.tc.columbia.edu/aeri/publications/QAE1.pdf

The Arbitrariness Inherent in Teacher Observations

In a recent article released in The Journal News, a newspaper serving many suburban New York counties, another common problem is highlighted: districts that have adopted the same teacher observational system (in this case as mandated by the state) are scoring what are likely to be very similar teachers very differently. Whereas one of the best school districts not only in the state but in the nation apparently has no “highly effective” teachers on staff, a neighboring district apparently has a staff 99% filled with “highly effective” teachers.

The “believed to be” model developer, Charlotte Danielson, is cited as stating that “Saying 99 percent of your teachers are highly effective is laughable.” I don’t know if I completely agree with her statement, and I do have to admit I question her perspective on this one, and all of her comments throughout this article for that matter, as she is the one who is purportedly offering up her “valid” Framework for Teaching for such observational purposes. Perhaps she’s displacing blame and arguing that it’s the subjectivity of the scorers rather than the subjectivity inherent in her system that should be to blame for the stark discrepancies.

As per Danielson: “The local administrators know who they are evaluating and are often influenced by personal bias…What it also means is that they might have set the standards too low.” As per the Superintendent of the District with 99% highly effective teachers: The state’s “flawed” evaluation model forced districts to “bump up” the scores so “effective” teachers wouldn’t end up with a rating of “developing.” The Superintendent adds that it is possible under the state’s system to be rated “effective” across domains and still end up rated as “developing,” which means teachers may be in need of intervention/improvement, or may be eligible for an expedited hearing process that could lead to their termination. Rather it may have been the case that the scores were inflated to save effective teachers from what the district viewed as an ineffective set of consequences attached to the observational system (i.e., intervention or termination).

Danielson is also cited as saying that “teachers should live in ‘effective’ and only [occasionally] visit ‘highly effective.’” She also notes that if her system contradicts teachers’ value-added scores, this too should “raise red flags” about the quality of the teacher, although she does not (in this article) pay any respect or regard to the issues inherent not only in value-added measures but also in her observational system.

What is most important in this article, though, is that reading through it illustrates well the arbitrariness of how all of the measures being mandated and used to evaluate teachers are actually being used. Take, for example, the other note herein that the state department’s intent seems to be that 70%-80% of teachers should “fall in the middle” as “developing” or “effective.” While this is mathematically impossible (i.e., to have 70%-80% hang around average), this could not be more arbitrary.

In the end, teacher evaluation systems are highly flawed, highly subjective, highly prone to error, and the like, and for people who just don’t “get it” to be passing policies to the contrary is nonsensical and absurd. These flaws are not as important when evaluation system data are used for formative, or informative, purposes, where data consumers have more freedom to take the data for what they are worth. When summary, or summative, decisions are to be made based on these data, regardless of whether low or high stakes are attached to the decision, this is where things really go awry.

Principals’ Perspectives on Value-Added

Principals are not using recent teacher evaluation data, including data from value-added assessment systems, student surveys, and other student achievement indicators, to inform decisions about hiring, placements, and professional development, according to findings from a research study recently released by researchers at Vanderbilt University.

The data most often used by principals? Data collected via their direct observations of their teachers in practice.

Education Week’s Denisa Superville also covered this study here, writing that principals are most likely to use classroom-observation data to inform such decisions, rather than the data yielded via VAMs and other student test scores. Of least relevance were data derived via parent surveys.

Reasons for not using value-added data specifically? “[A]ccess to the data, the availability of value-added measures when decisions are being made, a lack of understanding of the statistical models used in the evaluation systems, and the absence of training in using [value-added] data.”

Moving forward, “the researchers recommend that districts clarify their expectations for how principals should use data and what data sources should be used for specific human-resources decisions. They recommend training for principals on using value-added estimates, openly encouraging discussions about data use, and clarifying the roles of value-added estimates and observation scores.”

If this is to happen, hopefully such efforts will be informed by the research community, in order to help districts and administrators more critically consume value-added data in particular, for that which they can and cannot do.

Note: This study is not yet peer-reviewed, so please consume this information for yourself with that being known.

Texas Hangin’ its Hat on its New VAM System

A fellow blogger, James Hamric and author of Hammy’s Education Reform Blog, emailed a few weeks ago connecting me with a recent post he wrote about teacher evaluations in Texas, titling them and his blog post “The good, the bad and the ridiculous.”

It seems that the Texas Education Agency (TEA), which serves a similar role in Texas as a state department of education elsewhere, recently posted details about the state’s new Teacher Evaluation and Support System (TESS) that the state submitted to the U.S. Department of Education to satisfy the conditions of its No Child Left Behind (NCLB) waiver, excusing Texas from not meeting NCLB’s prior goal that all students in the state (and all other states) would be 100% proficient in mathematics and reading/language arts by 2014.

While “80% of TESS will be rubric based evaluations consisting of formal observations, self assessment and professional development across six domains…The remaining 20% of TESS ‘will be reflected in a student growth measure at the individual teacher level that will include a value-add score based on student growth as measured by state assessments.’ These value added measures (VAMs) will only apply to approximately one quarter of the teachers [however, and as is the case more or less throughout the country] – those [who] teach testable subjects/grades. For all the other teachers, local districts will have flexibility for the remaining 20% of the evaluation score.” This “flexibility” will include options that include student learning objectives (SLOs), portfolios or district-level pre- and post-tests.
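To see what an 80/20 split can mean in practice, here is a rough sketch in Python. The 80% and 20% weights come from the TESS description above; everything else (the 0-100 scale for each component, the function name, and the example scores) is an assumption for illustration only, as TEA’s actual scoring rules are not given here.

```python
# Hypothetical sketch of an 80/20 weighted composite, following the TESS description.
# The 0-100 component scale, the function name, and the example scores are assumptions.

RUBRIC_WEIGHT = 0.80   # rubric-based share stated in the TESS description
GROWTH_WEIGHT = 0.20   # student growth share (VAM, SLOs, portfolios, or pre/post tests)


def composite_score(rubric_score, growth_score):
    """Combine two component scores, each assumed to be on a 0-100 scale."""
    for score in (rubric_score, growth_score):
        if not 0 <= score <= 100:
            raise ValueError("component scores are assumed to lie in 0-100")
    return RUBRIC_WEIGHT * rubric_score + GROWTH_WEIGHT * growth_score


# Same rubric score, very different growth scores.
high_growth = composite_score(rubric_score=85, growth_score=90)
low_growth = composite_score(rubric_score=85, growth_score=30)

print(round(high_growth, 1), round(low_growth, 1))
print(round(high_growth - low_growth, 1))  # 20% weight times the 60-point gap
```

Under these assumptions, two teachers with identical rubric scores end up 12 composite points apart purely because of the growth component, which is why even a 20% weight can matter if cut scores between rating categories sit close together.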

Hamric then goes on to review his concerns about the VAM-based component. While we have highlighted these issues and concerns many times prior on this blog, I do recommend reading them as summarized by others besides those of us who write here on this blog. This may just help to saturate our minds, and also prepare us to defend ourselves against the “good, bad, and the ridiculous” and perhaps work towards better systems of teacher evaluation, as is really the goal. Click here, again, to read this post in full.

Related, Hamric concludes with the following, “the vast majority of educators want constructive feedback, almost to a fault. As long as the administrator is well trained and qualified, a rubric based evaluation should be sufficient to assess the effectiveness of a teacher. While the mathematical validity of value added models are accepted in more economic and concrete realms, they should not be even a small part of educator evaluations and certainly not any part of high-stakes decisions as to continuing employment. It is my hope that, as Texas rolls out TESS in pilot districts in the 2014-2015 school year, serious consideration will be given to removing the VAM component completely.”

VAMs and the “Dummies In Charge:” A Clever “Must Read”

Peter Greene wrote a very clever, poignant, and to-the-point article about VAMs, titled “VAMs for Dummies,” in The Blog of The Huffington Post. While I have already tweeted, facebooked, and shared it, and done everything else short of printing this one out for my files, I thought it imperative I also share it with you. Also, Greene gave us at VAMboozled a wonderful shout-out directing readers here to find out more. So Peter, even though I’ve never met you, thanks for the kudos and keep it coming. This is a fabulous piece!

Click here to read his piece in full. I’ve also pasted it below, mainly because this one is a keeper. See also a link to a “nifty” 3-minute video on VAMs below.

If you don’t spend every day with your head stuck in the reform toilet, receiving the never-ending education swirly that is school reformy stuff, there are terms that may not be entirely clear to you. One is VAM — Value-Added Measure.

VAM is a concept borrowed from manufacturing. If I take one dollar’s worth of sheet metal and turn it into a lovely planter that I can sell for ten dollars, I’ve added nine dollars of value to the metal.

It’s a useful concept in manufacturing management. For instance, if my accounting tells me that it costs me ten dollars in labor to add five dollars of value to an object, I should plan my going-out-of-business sale today.

And a few years back, when we were all staring down the NCLB law requiring that 100 percent of our students be above average by this year, it struck many people as a good idea — let’s check instead to see if teachers are making students better. Let’s measure if teachers have added value to the individual student.

There are so many things wrong with this conceptually, starting with the idea that a student is like a piece of manufacturing material and continuing on through the reaffirmation of the school-is-a-factory model of education. But there are other problems as well.

1) Back in the manufacturing model, I knew how much value my piece of metal had before I started working my magic on it. We have no such information for students.

2) The piece of sheet metal, if it just sits there, will still be a piece of sheet metal. If anything, it will get rusty and less valuable. But a child, left to its own devices, will still get older, bigger, and smarter. A child will add value on its own, out of thin air. Almost like it was some living, breathing sentient being and not a piece of raw manufacturing material.

3) All pieces of sheet metal are created equal. Any that are too not-equal get thrown in the hopper. On the assembly line, each piece of metal is as easy to add value to as the last. But here we have one more reformy idea predicated on the idea that children are pretty much identical.

How to solve these three big problems? Call the statisticians!

This is the point at which that horrifying formula that pops up in these discussions appears. Or actually, a version of it, because each state has its own special sauce when it comes to VAM. In Pennsylvania, our special VAM sauce is called PVAAS [i.e., the EVAAS in Pennsylvania]. I went to a state training session about PVAAS in 2009 and wrote about it for my regular newspaper gig. Here’s what I said about how the formula works at the time:

PVAAS uses a thousand points of data to project the test results for students. This is a highly complex model that three well-paid consultants could not clearly explain to seven college-educated adults, but there were lots of bars and graphs, so you know it’s really good. I searched for a comparison and first tried “sophisticated guess;” the consultant quickly corrected me–“sophisticated prediction.” I tried again–was it like a weather report, developed by comparing thousands of instances of similar conditions to predict the probability of what will happen next? Yes, I was told. That was exactly right. This makes me feel much better about PVAAS, because weather reports are the height of perfect prediction.

Here’s how it’s supposed to work. The magic formula will factor in everything from your socio-economics through the trends over the past X years in your classroom, throw in your pre-testy thing if you like, and will spit out a prediction of how Johnny would have done on the test in some neutral universe where nothing special happened to Johnny. Your job as a teacher is to get your real Johnny to do better on The Test than Alternate Universe Johnny would.

See? All that’s required for VAM to work is believing that the state can accurately predict exactly how well your students would have done this year if you were an average teacher. How could anything possibly go wrong??
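
To make the logic described above concrete, here is a minimal sketch in Python of the comparison at the heart of a value-added estimate: each student's actual score is set against a model-predicted score, and the teacher's "value added" is the average difference. The function name, the made-up scores, and the simple averaging are all illustrative assumptions; real state models such as PVAAS are far more complex, and nothing here reproduces them.

```python
from statistics import mean


def value_added(actual_scores, predicted_scores):
    """Average gap between actual and predicted scores.

    Hypothetical, simplified illustration; not any state's actual formula.
    """
    if len(actual_scores) != len(predicted_scores):
        raise ValueError("each student needs one actual and one predicted score")
    if not actual_scores:
        raise ValueError("at least one student is required")
    # Positive residual: the student beat the prediction; negative: fell short.
    residuals = [a - p for a, p in zip(actual_scores, predicted_scores)]
    return mean(residuals)


# Made-up numbers for three students (assumed, for illustration only).
predicted = [70, 85, 60]
actual = [74, 83, 66]
print(value_added(actual, predicted))
```

Note that the result depends entirely on the predicted scores: if the prediction for "Alternate Universe Johnny" is off, the teacher's number is off by the same amount.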

And it should be noted — all of these issues occur in the process before we add refinements such as giving VAM scores based on students that the teacher doesn’t even teach. There is no parallel for this in the original industrial VAM model, because nobody anywhere could imagine that it’s not insanely ridiculous.

If you want to know more, the interwebs are full of material debunking this model, because nobody — I mean nobody — believes in it except politicians and corporate privateers. So you can look at anything from this nifty three minute video to the awesome blog Vamboozled by Audrey Amrein-Beardsley.

This is one more example of a feature of reformy stuff that is so top-to-bottom stupid that it’s hard to understand. But whether you skim the surface, look at the philosophical basis, or dive into the math, VAM does not hold up. You may be among the people who feel like you don’t quite get it, but let me reassure you — when I titled this “VAM for Dummies,” I wasn’t talking about you. VAM is always and only for dummies; it’s just that right now, those dummies are in charge.

 

The Intersection of Standards and their Assessments: From an AZ Teacher

In January, I wrote a post about “An AZ Teacher’s Perspective on Her ‘Value-Added.’” Valerie Strauss covered the same story in her blog, The Answer Sheet, for The Washington Post, validating for me that readers appreciate stories from the field that explain, in better terms than I can, what is actually happening as these VAM-based teacher accountability and evaluation systems are being “lived out” in practice.

Well, the same AZ teacher wrote me another story that I encourage you all to read, about the intersection and alignment of standards and their assessments or, more specifically, the lack thereof.

She writes:

A fundamental principle in education is the precise alignment of the teaching of learning objectives (standards) with the assessment of learning objectives (tests). Research has demonstrated that when an educator plans lessons that begin with an analysis of what students need to learn, coupled with how a student will demonstrate the learning, achievement tends to happen. This is a “best practice” in education.

Enter: standards’ reform. My school district saw the writing on the wall: Common Core implementation was going to be massive. Beyond a shift in the philosophical underpinnings of standards (college and career readiness vs. every state for themselves), Common Core implementation meant, in some cases, a shift in instructional approaches (inquiry vs. modeling). And frequently, Common Core implementation meant changes in what got taught and in which grade levels.

Like any well-meaning, responsible school district, my district realized these changes were going to take time. And so they began Common Core implementation earlier than others, in the 2012-2013 school year. And based on what we know about best practices, when the standards change, the assessments should change as well. But they didn’t, at least not yet. For the past two years, I (and many, many others) have been teaching standards that are NOT fully aligned to the state assessment system. Instead, we’ve been frantically (and some may say schizophrenically!) trying to teach two sets of standards: the old (aligned with Arizona’s current state assessment) and the new Common Core.

Enter: value-added measures. Value-added measures are statistical tools aimed at capturing a teacher’s impact on student achievement through student performance on standardized tests. A few years ago, Arizona passed a law that mandates that up to 50% of a teacher’s evaluation be composed of student test scores. And again, my school district did what any responsible, law-abiding district would do: implement a teacher evaluation system that complies with state law, in which 50% of a teacher’s evaluation is composed of student test scores and 50% is composed of classroom observations.
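
As a rough illustration of the 50/50 split described above, the following Python sketch blends a test-score component and an observation component into a single rating. The function name, the 0-100 scale, and the example values are assumptions made for illustration only; the district's actual scoring rules are not given here.

```python
def composite_evaluation(test_score_component, observation_component,
                         test_weight=0.5):
    """Weighted blend of two components, each on an assumed 0-100 scale.

    Hypothetical helper; the default weight mirrors the 50/50 split above.
    """
    if not 0.0 <= test_weight <= 1.0:
        raise ValueError("test_weight must be between 0 and 1")
    observation_weight = 1.0 - test_weight
    return (test_weight * test_score_component
            + observation_weight * observation_component)


# Hypothetical teacher: strong observations, weak test-score component.
print(composite_evaluation(40, 90))  # 65.0
```

With equal weights, a weak test-score component pulls the overall rating down as much as a strong observation component pulls it up, which is why misaligned tests matter so much under this kind of split.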

The intersection of these two policies is a problem for teachers (and students!). If the state assessment is not designed to precisely test the standards that are being taught, it cannot be legitimately claimed that value-added measures (or any other measure using student test scores!) are capturing a teacher’s impact on student achievement. One problem for teachers is that their employment status may hinge on the outcome. One problem for students is that what they are learning may not be what they are ultimately held accountable for on the state assessment.

In this case, compliance with law has superseded the use of best practices. Let us hope this doesn’t happen in another field–say, healthcare?