The Multiple Teacher Evaluation System(s) in New Mexico, from a Concerned New Mexico Parent

A “concerned New Mexico parent” who wrote a prior post for this blog here has written another, below, about the sheer number of different teacher evaluation systems, or variations, now in place in his/her state of New Mexico. (S)he writes:

Readers of this blog are well aware of the limitations of VAMs for evaluating teachers. However, many readers may not be aware that there are actually many system variations used to evaluate teachers. In the state of New Mexico, for example, 217 different variations are used to evaluate the many and diverse types of teachers teaching in the state [and likely all other states].

But. Is there any evidence that they are valid? NO. Is there any evidence that they are equivalent? NO. Is there any evidence that this is fair? NO.

The New Mexico Public Education Department (NMPED) provides a framework for teacher evaluations, and the final teacher evaluation should be weighted as follows: Improved Student Achievement (50%), Teacher Observations (25%), and Multiple Measures (25%).
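As a rough illustration of how a weighted framework like this combines its parts, here is a minimal sketch. The 50-25-25 weights come from the framework above; the 0-100 component scale, the function name, and the sample scores are assumptions made purely for illustration, not NMPED's actual scoring rules.

```python
# Weights from the NMPED framework described above (50-25-25).
WEIGHTS = {
    "student_achievement": 0.50,
    "observations": 0.25,
    "multiple_measures": 0.25,
}


def composite_score(components, weights=WEIGHTS):
    """Weighted sum of component scores (each assumed to be on a 0-100 scale)."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * components[name] for name in weights)


# Hypothetical teacher: these component scores are invented for illustration.
example = {
    "student_achievement": 60.0,
    "observations": 90.0,
    "multiple_measures": 80.0,
}
print(composite_score(example))  # 0.5*60 + 0.25*90 + 0.25*80 = 72.5
```

Passing a different set of weights (for instance 60-20-20) with the same hypothetical inputs gives a different final score, which is why the choice of split matters.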

Every school district in New Mexico is required to submit a detailed evaluation plan of specifically what measures will be used to satisfy the overall NMPED 50-25-25 percentage framework, after which NMPED approves all plans.

The exact details of any district’s educator effectiveness plan can be found on the NMTEACH website, as every public and charter school plan is posted here.

There are massive differences between how groups of teachers are graded between districts, however, which distorts most everything about the system(s), including the extent to which similar (and different) teachers might be similarly (and fairly) evaluated and assessed.

Even within districts, there are massive differences in how grade level (elementary, middle, high school) teachers are evaluated.

And, even something as seemingly simple as evaluating K-2 teachers requires 42 different variations in scoring.

Table 1 below shows the number of different scales used to calculate teacher effectiveness for each group of teachers and each grade level, for example, at the state level.

New Mexico divides all teachers into three categories — group A teachers have scores based on the statewide test (mathematics, English/language arts (ELA)), group B teachers (e.g. music or history) do not have a corresponding statewide test, and group C teachers teach grades K-2. Table 1 shows the number of scales used by New Mexico school districts for each teacher group. It is further broken down by grade-level. For example, as illustrated, there are 42 different scales used to evaluate Elementary-level Group A teachers in New Mexico. The column marked “Unique (one-offs)” indicates the number of scales that are completely unique for a given teacher group and grade-level. For example, as illustrated, there are 11 unique scales used to grade Group B High School teachers, and for each of these eleven scales, only one district, one grade-level, and one teacher group is evaluated within the entire state.

Based on the size of the school district, a unique scale may be grading as few as a dozen teachers! In addition, there are 217 scales used statewide, with 99 of these scales being unique (by teacher)!

Table 1: New Mexico Teacher Evaluation System(s)

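Group                                                | Grade | Scales Used  | Unique (one-offs)
-----------------------------------------------------|-------|--------------|------------------
Group A (SBA-based; e.g. 5th grade English teacher)  | All   | 58           | 15
                                                     | Elem  | 42           | 10
                                                     | MS    | 37           | 2
                                                     | HS    | 37           | 3
Group B (non-SBA; e.g. Elem music teacher)           | All   | 117          | 56
                                                     | Elem  | 67           | 37
                                                     | MS    | 62           | 8
                                                     | HS    | 61           | 11
Group C (grades K-2)                                 | All   | 42           | 28
                                                     | Elem  | 42           | 28
TOTAL                                                |       | 217 variants | 99 one-offs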
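The totals in Table 1 can be checked with a few lines of arithmetic. This sketch only re-adds the "All" rows of the table; the variable names are illustrative, and the share of one-offs is a derived figure that does not appear in the original table.

```python
# Counts copied from the "All" rows of Table 1: (scales used, unique one-offs).
table_1 = {
    "Group A (SBA-based)": (58, 15),
    "Group B (non-SBA)": (117, 56),
    "Group C (grades K-2)": (42, 28),
}

total_scales = sum(scales for scales, _ in table_1.values())
total_one_offs = sum(unique for _, unique in table_1.values())
share_one_offs = total_one_offs / total_scales

print(total_scales, total_one_offs)  # 217 99
print(f"{share_one_offs:.1%}")       # 45.6%
```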

The table above highlights the spectacular absurdity of the New Mexico Teacher Evaluation System.

(The complete listings of all variants for the three groups are contained here (in Table A for Group A), here (in Table B for Group B), and here (in Table C for Group C). The abbreviations and notes for these tables are listed here (in Table D).)

By approving all of these different formulas, all things considered, NMPED is also making the following nonsensical claims:

NMPED Claim: The prototype 50-25-25 percentage split has some validity.

There is no evidence to support this division between student achievement measures, observation, and multiple measures at all. It simply represents what NMPED could politically “get away with” in terms of a formula. Why not 60-20-20 or 57-23-20 or 46-18-36, etcetera? The NMPED prototype scale has no proven validity, whatsoever.

NMPED Claim: All 217 formulas are equivalent to evaluate teachers.

This claim by NMPED is absurd on its face and every other part of its… Is there any evidence that they have cross-validated the tests? There is no evidence that any of these scales are valid or accurate measures of “teacher effectiveness.” Also, there is no evidence whatsoever that they are equivalent.

Further, if the formulas are equivalent (as NMPED claims), why is New Mexico wasting money on technology for administering SBA tests or End-of-Course exams? Why not use an NMPED-approved formula that includes tests like Discovery, MAPS, DIBELS, or Star that are already being used?

NMPED Claim: Teacher Attendance and Student Surveys are interchangeable.

According to the approved plans, many districts assign 10% to Teacher Attendance while other districts assign 10% to Student Surveys. Both variants have been approved by NMPED.

Mathematically (i.e., in terms of the proportions either is to be allotted), they appear to be interchangeable. If that is so, why is NMPED also specifically trying to enforce Teacher Attendance as an element of the evaluation scale? Why did Hanna Skandera proclaim to the press that this measure improved New Mexico education? (For typical news coverage on this topic, see here.)

The use of teacher attendance appears to be motivated by union-busting rather than any mathematical rationale.

NMPED Claim: All observation methods are equivalent.

NMPED allows for three very different observation methods to be used for 40% of the final score. Each method is somewhat complicated and involves different observers.

There is no indication that NMPED has evaluated the reliability or validity of these three very different observation methods, or tested their results for equivalence. They simply assert that they are equivalent.

NMPED Claim: These formulas will be used to rate teachers.

These formulas are the worst kind of statistical jiggery-pokery (to use a newly current phrase). NMPED presents a seemingly rational, scientific number to the public using invalid and unvalidated mathematical manipulations and then determines teachers’ careers based on the completely bogus New Mexico teacher evaluation system(s).

Conclusion: Not only is the emperor naked, he has a closet containing 217 equivalent outfits at home!

Splits, Rotations, and Other Consequences of Teaching in a High-Stakes Environment in an Urban School

An Arizona teacher who teaches in a very urban, high-needs school writes about the realities of teaching in her school, under the pressures that come along with high-stakes accountability and with a teacher workforce and an administration that are both operating in chaos. This is a must read, as she also talks about two unintended consequences of educational reform in her school about which I had never heard before: splits and rotations. Both seem to occur at all costs simply to stay afloat during “rough” times, but both also likely have deleterious effects on students in such schools, as well as on teachers being held accountable for the students “they” teach.

She writes:

Last academic year (2012-2013) a new system for evaluating teachers was introduced into my school district. And it was rough. Teachers were dropping like flies. Some were stressed to the point of requiring medical leave. Others were labeled ineffective based on a couple classroom observations and were asked to leave. By mid-year, the school was down five teachers. And there were a handful of others who felt it was just a matter of time before they were labeled ineffective and asked to leave, too.

The situation became even worse when the long-term substitutes who had been brought in to cover those teacher-less classrooms began to leave also. Those students with no contracted teacher and no substitute began getting “split”. “Splitting” is what the administration of a school does in a desperate effort to put kids somewhere. And where the students go doesn’t seem to matter. A class roster is printed, and the first five students on the roster go to teacher A. The second five students go to teacher B, and so on. Grade-level isn’t even much of a consideration. Fourth graders get split to fifth grade classrooms. Sixth graders get split to 5th and 7th grade classrooms. And yes, even 7th and 8th graders get split to 5th grade classrooms. Was it difficult to have another five students in my class? Yes. Was it made more difficult that they weren’t even of the same grade level I was teaching? Yes. This went on for weeks…

And then the situation became even worse. As it became more apparent that the revolving door of long-term substitutes was out of control, the administration began “The Rotation.” “The Rotation” was a plan that used the contracted teachers (who remained!) as substitutes in those teacher-less classrooms. And so once or twice a week, I (and others) would get an email from the administration alerting me that it was my turn to substitute during prep time. Was it difficult to sacrifice 20-40% of weekly prep time (that is used to do essential work like plan lessons, gather materials, grade, call parents, etc.)? Yes. Was it difficult to teach in a classroom that had a different teacher, literally, every hour without coordinated lessons? Yes.

Despite this absurd scenario, in October 2013, I received a letter from my school district indicating how I fared in this inaugural year of the teacher evaluation system. It wasn’t good. Fifty percent of my performance label was based on school test scores (not on the test scores of my homeroom students). How well can students perform on tests when they don’t have a consistent teacher?

So when I think about accountability, I wonder now what it is I was actually held accountable for? An ailing, urban school? An ineffective leadership team who couldn’t keep a workforce together? Or was I just held accountable for not walking away from a no-win situation?

Coincidentally, this 2013-2014 academic year has, in many ways, mirrored the 2012-2013 year. The upside is that this year, only 10% of my evaluation is based on school-wide test scores (the other 40% will be my homeroom students’ test scores). This year, I have a fighting chance to receive a good label. One more year of an unfavorable performance label and the district will have to, by law, do something about me. Ironically, if it comes to that point, the district can replace me with a long-term substitute, who is not subject to the same evaluation system that I am. Moreover, that long-term substitute doesn’t have to hold a teaching certificate. Further, that long-term substitute will cost the district a lot less money in benefits (i.e., healthcare, retirement system contributions).

I should probably start looking for a job—maybe as a long-term substitute.

Another Teacher’s Contract Not Renewed — Another Teacher Speaks Out

As per a video linked within this article just released by ABC News in Tennessee, another teacher, this time coming from Knox County, Tennessee, speaks to her school board about her teaching contract not being renewed.


As per the article, teacher “Christina Graham taught kindergarten for three years at Copper Ridge Elementary. Last year she spoke out against SAT-10 testing for kindergartners. After her speech to the school board, Graham was called in to talk to her principal about being a representative for Knox County Schools. Graham said that was only time she was ever pulled in to talk about an issue, and she was told then that it wasn’t a disciplinary meeting.”

The only reason for her non-renewal? “She no longer fit the vision for that school.”

Many parents and teachers are speaking out on her behalf, arguing her non-renewal is the district’s way of retaliating against a teacher who spoke out. See, also, the district’s official response here.

One School’s Legitimately, “New and Improved” Teacher Evaluation System: In TIME Magazine

In an article featured this week in TIME Magazine titled “How Do You Measure a Teacher’s Worth?” author Karen Hunter Quartz – research director at the UCLA Community School and a faculty member in the UCLA Graduate School of Education – describes the legitimately, “new and improved” teacher evaluation system co-constructed by teachers, valued as professionals, in Los Angeles.

Below are what I read as the highlights, and also some comments re: the highlights, but please do click here for the full read as this whole article is in line with what many who research teacher evaluation systems support (see, for example, Chapter 8 in my Rethinking Value-Added Models in Education…).

“For the past five years, teachers at the UCLA Community School, in Koreatown, have been mapping out their own process of evaluation based on multiple measures — and building both a new system and their faith in it…this school is the only one trying to create its own teacher evaluation infrastructure, building on the district’s groundwork…[with] the evaluation process [fully] owned by the teachers themselves.”

“Indeed, these teachers embrace their individual and collective responsibility to advance exemplary teaching practices and believe that collecting and using multiple measures of teaching practice will increase their professional knowledge and growth. They are tough critics of the measures under development, with a focus on making sure the measures help make teachers better at their craft.”

Their new and improved system is based on three different kinds of data — student surveys, observations, and portfolio assessments. The latter includes an assignment teachers gave students, how teachers taught this assignment, and samples of the student work produced during/post the assignment given. Teachers’ portfolios were then scored by “educators trained at UCLA to assess teaching quality on several dimensions, including academic rigor and relevance. Teachers then completed a reflection on the scores they received, what they learned from the data, and how they planned to improve their practice.”

Hence, the “legitimate” part of the title of this post, in that this section is being externally vetted. As for the “new and improved” part of the title of this post, this comes from data indicating that “almost all teachers reported in a survey that they appreciated receiving multiple measures of their practice. Most teachers reported that the measures were a fair assessment of the quality of their teaching, and that the evaluation process helped them grow as educators.”

However, there was also “consensus that more information was needed to help them improve their scores. For example, some teachers wanted to know how to make assignments more relevant to students’ lives; others asked for more support reflecting on their observation transcripts.”

In the end, though, “[p]erhaps the most important accomplishment of this new system was that it restored teachers’ trust in the process of evaluation. Very few teachers trust that value-added measures — which are based on tests that are far removed from their daily work — can inform their improvement. This is an issue explored by researchers who are probing the unintended consequences of teacher accountability systems tied to value-added measures.”

New Mexico Teachers Burn Their State-Based Teacher Evaluations

More than three dozen teachers, “including many who [had] just been rated ‘highly effective’ by the New Mexico Public Education Department,” working in the Albuquerque Public School District – the largest public school district in the state of New Mexico – turned to a burning bin this week, tossing their state-developed teacher evaluations into the fire in protest in front of district headquarters.

See the full article (with picture below) in The Albuquerque Journal here.

Linnea Montoya, a kindergarten teacher at Montezuma Elementary, drops her teacher evaluation into a waste basket with other burning evaluations in front of Albuquerque Public Schools headquarters, Wednesday, May 20, 2015, in Albuquerque, N.M. A group of teachers filled the entrance to APS to participate in the teacher evaluation protest. "It insulted my fellow teachers who mentored me and scored lower," Montoya said. (Marla Brose/Albuquerque Journal)

“Courtney Hinman ignited the blaze by taking a lighter to his “effective” evaluation. He was quickly followed by a “minimally effective” special education teacher from Albuquerque High School, then by a “highly effective” teacher from Monte Vista Elementary School. Wally Walstrom, also of Monte Vista Elementary, told the crowd of 60 or 70 people that his “highly effective” rating was “meaningless,” before tossing it into the fire. One after another, teachers used the words “meaningless” and “unfair” to describe the evaluations and the process used to arrive at those judgments…Another teacher said the majority of his autistic, special-needs students failed the SBA – a mandatory assessment test – yet he was judged “highly effective.” ‘How can that be?’ he asked as he dropped his evaluation into the fire.”

“An English teacher said he was judged on student progress – in algebra and geometry. Another said she had taught a mere two months, yet was evaluated as if she had been in the classroom for an entire school year. Several said their scores were lowered only because they were sick and stayed away from school. One woman said parents routinely say she’s the best teacher their children have ever had, yet she was rated ‘minimally effective.’ An Atrisco Heritage teacher said most of the math teachers there had been judged ‘minimally effective.’ And a teacher of gifted children who routinely scored at the top in assessment testing asked, ‘How could they advance?’ before tossing his “highly effective” evaluation into the blaze.”

With support from New Mexico’s Governor Susana Martinez, the master creator of New Mexico’s teacher evaluation system(s) – Education Secretary Hanna Skandera – could not be reached for comment.

Read the full article, again, here, and read more about what else is going on in New Mexico in prior posts on VAMboozled! here, here, here, and here.

New York’s VAM, by the American Institutes for Research (AIR)

A colleague of mine, Stephen Caldas (Professor of Educational Leadership at Manhattanville College, one of the “heavyweights” who recently visited New York to discuss the state’s teacher evaluation system, and who, according to Chalkbeat New York, “once called New York’s evaluation system ‘psychometrically indefensible’”), wrote me with a critique of New York’s VAM, which I decided to post for you all here.

His critique is of the 2013-2014 Growth Model for Educator Evaluation Technical Report, produced by the American Institutes for Research (AIR), that “describes the models used to measure student growth for the purpose of educator evaluation in New York State for the 2013-2014 School Year” (p. 1).

Here’s what he wrote:

I’ve analyzed this tech report, which for many would be a great sedative prior to sleeping. It’s the latest in a series of three reports by AIR paid for by the New York State Education Department. The truth of how good the growth models used by AIR really are, however, is buried deep in the report in Table 11 (p. 31) and Table 20 (p. 44), both of which are recreated here.

[Table 11] [Table 20]

These tables give us indicators of how good the growth models are at predicting current year student English/language arts (ELA) and mathematics (MATH) scores by grade level and subject (i.e., the dependent variables). At the secondary level, an additional outcome, or dependent variable, predicted is the number of Regents Exams a student passed for the first time in the current year. The unadjusted models only included prior academic achievement as predictor variables, and are shown for comparison purposes only. The adjusted models were the models that were actually used by the state to make predictions that fed into teacher and principal effectiveness scores. In addition to using prior student achievement as a predictor, the adjusted prediction models included these additional predictor variables: student and school-level poverty status, student and school-level socio-economic status (SES), student and school-level English language learner (ELL) status, and scores on the New York State English as a Second Language Achievement Test (the NYSESLAT). The tables above report a statistic called “Pseudo R-squared” or just “R-squared,” and this statistic shows us the predictive power of the overall models.

To help interpret these numbers, if one observes a “1.0” (which one won’t), it would mean that the model was “100%” perfect (with no prediction error). One would obtain the “percentage of perfect” (if you will) by moving the decimal point two places to the right. Otherwise, the difference between the percentage perfect and 100 is called the “error” or “e.”

With this knowledge, one can see in the adjusted ELA 8th grade model (Table 11) that the predictor variables altogether explain “74%” of the variance of current year student ELA 8th grade scores (R-squared = 0.74). Conversely, this same model has 26% error (and this is one of the best ones illustrated in the report). In other words, this particular prediction model cannot account for 26% of the cause of current ELA 8th grade scores, “all other things considered” (i.e., the predictor variables that are so highly correlated with test scores in the first place).

The prediction models at the secondary level are much, MUCH worse. If one is to look at Table 20, one would see that in the worst model (adjusted ELA Common Core) the predictor variables together only explain 45% of student ELA Common Core test scores. Thus, this prediction model cannot account for 55% of the causes of these scores!!
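The arithmetic behind these “error” figures can be sketched in a few lines. This is only an illustration that treats the reported (pseudo) R-squared as the proportion of variance explained, as the passage above does; the function name and the dictionary labels are hypothetical, and the two R-squared values are the ones quoted above.

```python
def unexplained_percent(r_squared):
    """Percentage of variance the model leaves unexplained: (1 - R^2) * 100."""
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("R-squared is expected to lie between 0 and 1")
    # Round to one decimal place to hide floating-point noise.
    return round((1.0 - r_squared) * 100, 1)


# R-squared values quoted above from the AIR technical report.
models = {
    "Adjusted ELA grade 8 (Table 11)": 0.74,
    "Adjusted ELA Common Core (Table 20)": 0.45,
}

for name, r2 in models.items():
    print(f"{name}: explains {r2:.0%}, leaves {unexplained_percent(r2)}% unexplained")
```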

While not terrible R-squared values for social science research, these are horrific values for a model used to make individual level predictions at the teacher or school level with any degree of precision. Quite frankly, they simply cannot be precise given these huge quantities of error. The chances that these models would precisely (with no error) predict a teacher’s or school’s ACTUAL student test scores are slim to none. Yet, the results of these imprecise growth models can contribute up to 40% of a teacher’s effectiveness rating.

This high level of imprecision would explain why teachers like Sheri Lederman of Long Island, who is apparently a terrific fourth grade educator based on all kinds of data besides her most recent VAM scores, received an “ineffective” rating based on this flawed growth model (see prior posts here and here). She clearly has a solid basis for her lawsuit against the state of New York in which she claims her score was “arbitrary and capricious.”

This kind of information on all the prediction error in these growth models needs to be in an executive summary in front of these technical reports. The interpretation of this error should be in PLAIN LANGUAGE for the taxpayers who foot the bill for these reports, the policy makers who need to understand the findings in these reports, and the educators who suffer the consequences of such imprecision in measurement.

A Pragmatic Position on VAMs

Bennett Mackinney, a local administrator and former doctoral student in one of my research methods courses, recently emailed me asking the following: “Is a valid and reliable VAM theoretically possible? If so, is the issue that no state is taking the time and energy to develop it correctly? Or, is it just impossible? I need to land on a pragmatic position on VAMs. Part of the problem is that for the past decade all of us over here at Title I schools [i.e., schools with disproportionate numbers of disadvantaged students] have been saying ‘[our state test] is not fair…. our kids come to us with so many issues… you need to measure us on growth not final performance….’ I feel like VAM advocates will come to us and say, ‘fine, here’s a model that will meet your request to be fairly measured on growth…’”

I responded with the following: “In my research-based opinion, we are searching and always will be searching for a type of utopia in this area, one that will likely be out of our reach forever UNLESS these PARCC, etc. tests come through with miracles [which, as history is likely to repeat itself, is highly unlikely]. However, at the end of the day, [we] can be confident that this is better than the snapshot measures used before (i.e., before growth measures), particularly for analyses of large-scale trends, but certainly not for teacher evaluation and especially not for high-stakes teacher evaluation purposes. Regardless of the purpose, though, NEVER should [VAMs] be used in isolation of other measures and NEVER should they be used without a great deal of human judgment re: what the VAM estimates in reality and in context demonstrate, in light of what they can and just cannot do.”

For more on this, see my prior post about the position statement recently released by the American Statistical Association (ASA), or the actual statement itself.

Doug Harris on the (Increased) Use of Value-Added in Louisiana

Thus far, there have been four total books written about value-added models (VAMs) in education: one (2005) scholarly, edited book that was published prior to our heightened policy interest in VAMs; one (2012) that is less scholarly but more of a field guide on how to use VAM-based data; my recent (2014) scholarly book; and another recent (2011) scholarly book written by Doug Harris. Doug is an Associate Professor of Economics at Tulane in Louisiana. He is also, as I’ve written prior, “a ‘cautious’ but quite active proponent of VAMs.”

There is quite an interesting history surrounding these latter two books, given Harris and I have quite different views on VAMs and their potentials in education. To read more about our differing opinions you can read a review of Harris’s book I wrote for Teachers College Record, and another review a former doctoral student and I wrote for Education Review, to which he responded in his (and his book’s) defense, to which we also responded (with a “rebuttal to a rebuttal,” if you will). What was ultimately titled a “Value-Added Smackdown” in a blog post featured in Education Week, let’s just say, got a little out of hand, with the “smackdown” ending up focused almost solely on our claim that Harris believed, and we did not, that “value-added [was and still is] good enough to be used for [purposes of] educational accountability.” We asserted then, and I continue to assert now, that “value-added is not good enough to be attaching any sort of consequences much less any such decisions to its output. Value-added may not even be good enough even at the most basic, pragmatic level.”

Harris continues to disagree…

Just this month he released a technical report to his state’s school board (i.e., the Louisiana Board of Elementary and Secondary Education (BESE)), in which he (unfortunately) evidenced that he has not (yet) changed his scholarly stripes… even given the most recent research about the increasingly apparent, methodological, statistical, and pragmatic limitations (see, for example, here, here, here, here, and here), and the recent position statement released by the American Statistical Association underscoring the key points being evidenced across (most) educational research studies. See also the 24 articles published about VAMs in all American Educational Research Association (AERA) Journals here, along with open-access links to the actual articles.

In this report Harris makes “Recommendations to Improve the Louisiana System of Accountability for Teachers, Leaders, Schools, and Districts,” the main one being that the state focus “more on student learning or growth—[by] specifically, calculating the predicted test scores and rewarding schools based on how well students do compared with those predictions.” The recommendations in more detail, in support, and that also pertain to our interests here include the following five (of six recommendations total):

1. “Focus more on student growth [i.e., value-added] in order to better measure the performance of schools.” Not that there is any research evidence in support, but “The state should [also] aim for a 50-50 split between growth and achievement levels [i.e., not based on value-added].” Doing this at the school accountability level “would also improve alignment with teacher accountability, which includes student growth [i.e., teacher-level value-added] as 50% of the evaluation.”

2. “Reduce uneven incentives and avoid “incentive cliffs” by increasing [school performance score] points more gradually as students move to higher performance levels,” notwithstanding the fact that no research to date has evidenced that such incentives incentivize much of anything intended, at least in education. Regardless, and despite the research, “Giving more weight to achievement growth [will help to create] more even [emphasis added] incentives (see Recommendation #1).”

3. Related, “Create a larger number of school letter grades [to] create incentives for all schools to improve,” by adding +/- extensions to the school letter grades, because “[i]f there were more categories, the next [school letter grade] level would always be within reach….provide. This way all schools will have an incentive to improve, whereas currently only those who are at the high end of the B-D categories have much incentive.” If only the real world of education worked as informed by simple idioms, like those simplifying the theories supporting incentives (e.g., the carrot just in front of the reach of the mule will make the mule draw the cart harder).

5. “Eliminate the first over-ride provision in the teacher accountability system, which automatically places teachers who are “Ineffective” on either measure in the “Ineffective” performance category.” With this recommendation, I fully agree, as Louisiana is one of the most extreme states when it comes to attaching consequences to problematic data, although I don’t think Harris would agree with my “problematic” classification. But this would mean that “teachers who appear highly effective on one measure could not end up in the “Ineffective” category,” which for this state would certainly be a step in the right direction. Harris’s assertion that doing this would also help prevent principals from saving truly ineffective teachers (e.g., by countering teachers’ value-added scores with artificially inflated or allegedly fake observational scores), however, I find insulting on behalf of principals as professionals.

6. “Commission a full-scale third party evaluation of the entire accountability system focused on educator responses and student outcomes.” With this recommendation, I also fully agree under certain conditions: (1) the external evaluator is indeed external to the system and has no conflicts of interest, including financial (even prior to payment for the external review), (2) that which the external evaluator is to investigate is informed by the research in terms of the framing of the questions that need to be asked, (3) as also recommended by Harris, that perspectives of those involved (e.g., principals and teachers) are included in the evaluation design, and (4) all parties formally agree to releasing all data regardless of what (positive or negative) the external evaluator might evidence and find.

Harris’s additional details and “other, more modest recommendations” include the following:

  • Keep “value-added out of the principal [evaluation] measure,” but “the state should consider calculating principal value-added measures and issuing reports that describe patterns of variation (e.g., variation in performance overall [in] certain kinds of schools) both for the state as a whole and specific districts.” This reminds me of the time that value-added measures for teachers were to be used only for descriptive purposes. While noble as a recommendation, we know from history what policymakers can do once the data are made available.
  • “Additional Teacher Accountability Recommendations” start on page 11 of this report, although all of these (unfortunately, again) focus on value-added model twists and tweaks (e.g., how to adjust for ceiling effects for schools and teachers with disproportionate numbers of gifted/high-achieving students, how to watch and account for bias) to make the teacher value-added model even better.

Harris concludes that “With these changes, Louisiana would have one of the best accountability systems in the country. Rather than weakening accountability, these recommendations [would] make accountability smarter and make it more likely to improve students’ academic performance.” Following these recommendations would “make the state a national leader.” While Harris cites 20 years of failed attempts in Louisiana and in states across the country as the reason America’s public education system has not improved its public school students’ academic performance, I’d argue it’s more like 40 years of failed attempts because Harris’s (and so many others’) accountability-bent logic is seriously flawed.

Policy Idiocy in New York: Teacher Value-Added to Trump All Else

A Washington Post post, recently released by Valerie Strauss and written by award-winning (e.g., New York’s 2013 High School Principal of the Year) Carol Burris of South Side High School in New York, details how in New York their teacher evaluation system is “going from bad to worse.”

It seems that New York state’s education commissioner – John King – recently resigned, thanks to a new job working as a top assistant directly under US Secretary of Education Arne Duncan. But that is not where the “going from bad to worse” phrase applies.

Rather, it seems the state’s Schools Chancellor Merryl Tisch, with the support and prodding of New York Governor Andrew Cuomo, wants to move from the state’s current teacher evaluation system requirement, that 20% of an educator’s evaluation be based on “locally selected measures of achievement,” to a system in which teachers’ value-added, as based on growth on the state’s (Common Core) standardized test scores, will be set at 40%.

In addition, she (along with the support and prodding of Cuomo) is pushing for a system in which these scores would “trump all,” and in which a teacher rated as ineffective in the growth score portion would be rated ineffective overall. A teacher with two ineffective ratings would “not return to the classroom.”

This is not only “going from bad to worse,” but going from bad to idiotic.

All of this is happening despite the research studies that, by this point, should have literally buried such policy initiatives. Many of these research studies are highlighted on this blog here, here, here, and here, and are also summarized in two position statements on value-added models (VAMs) released by the American Statistical Association last April (to read about their top 10 concerns/warnings, click here) and the National Association of Secondary School Principals (NASSP) last month (to read about their top 6 concerns/warnings, click here).

All of this is also happening despite the fact that this flies in the face of the 2014 “Standards for Educational and Psychological Testing” also released this year by the leading professional organizations in the area of educational measurement, including the American Educational Research Association (AERA), American Psychological Association (APA), and National Council on Measurement in Education (NCME).

And all of this is happening despite the fact that teachers and principals in the state of New York already, and democratically, created sound evaluation plans to which the majority had already agreed, given that the system they created to meet state and federal policy mandates was much more defensible, and much less destructive.

I, for one, find this policy idiocy infuriating.

To read the full post, one that is definitely worth a full read, click here.

Teacher Evaluation and Accountability Alternatives, for A New Year

At the beginning of December I wrote a post about Diane Ravitch’s really nice piece published in the Huffington Post about what she views as a much better paradigm for teacher evaluation and accountability. Diane Ravitch has since posted another piece on similar alternatives, although this one was written by teachers themselves.

I thought this was more than appropriate, especially given a New Year is upon us, and while it might very well be wishful thinking, perhaps at least some of our state policy makers might be willing to think in new ways about what really could be new and improved teacher evaluation systems. Cheers to that!

The main point here, though, is that alternatives do, indeed, exist. Likewise, it’s not that teachers do not want to be held accountable for, and evaluated on, that which they do, but they do want whatever systems are in place (formal or informal) to be appropriate, professional, and fair. How about that for a policy-based resolution?

This is from Diane’s post: The Wisdom of Teachers: A New Vision of Accountability.

Anyone who criticizes the current regime of test-based accountability is inevitably asked: What would you replace it with? Test-based accountability fails because it is based on a lack of trust in professionals. It fails because it confuses measurement with instruction. No doctor ever said to a sick patient, “Go home, take your temperature hourly, and call me in a month.” Measurement is not a treatment or a cure. It is measurement. It doesn’t close gaps: it measures them.

Here is a sound alternative approach to accountability, written by a group of teachers whose collective experience is 275 years in the classroom. Over 900 teachers contributed ideas to the plan. It is a new vision that holds all actors responsible for the full development and education of children, acknowledging that every child is a unique individual.

Its key features:

  • Shared responsibility, not blame
  • Educate the whole child
  • Full and adequate funding for all schools, with less emphasis on standardized testing
  • Teacher autonomy and professionalism
  • A shift from evaluation to support
  • Recognition that in education one size does not fit all