A “concerned New Mexico parent,” who wrote a prior post for this blog here, has written another post, below, about the sheer number of different teacher evaluation systems, or variations thereof, now in place in his/her state of New Mexico. (S)he writes:
Readers of this blog are well aware of the limitations of VAMs for evaluating teachers. However, many readers may not be aware that there are actually many system variations used to evaluate teachers. In the state of New Mexico, for example, 217 different variations are used to evaluate the many and diverse types of teachers teaching in the state [and likely all other states].
But. Is there any evidence that they are valid? NO. Is there any evidence that they are equivalent? NO. Is there any evidence that this is fair? NO.
The New Mexico Public Education Department (NMPED) provides a framework for teacher evaluations, and the final teacher evaluation should be weighted as follows: Improved Student Achievement (50%), Teacher Observations (25%), and Multiple Measures (25%).
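In concrete terms, the 50-25-25 split is a simple weighted average. The sketch below illustrates the arithmetic, assuming each component has already been normalized to a common 0-100 scale; the component names and scores are purely illustrative, not NMPED's actual variables:

```python
# Illustrative sketch of NMPED's 50-25-25 weighting. The post does not
# specify the underlying score scale, so a 0-100 scale is assumed here
# and all names and values are hypothetical.

WEIGHTS = {
    "improved_student_achievement": 0.50,
    "teacher_observations": 0.25,
    "multiple_measures": 0.25,
}

def final_evaluation(scores):
    """Combine component scores into one weighted rating."""
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)

# Example: a teacher scoring 70, 80, and 90 on the three components
example = {
    "improved_student_achievement": 70,
    "teacher_observations": 80,
    "multiple_measures": 90,
}
print(final_evaluation(example))  # 0.5*70 + 0.25*80 + 0.25*90 = 77.5
```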
Every school district in New Mexico is required to submit a detailed evaluation plan specifying exactly which measures will be used to satisfy the overall NMPED 50-25-25 percentage framework, after which NMPED approves all plans.
The exact details of any district’s educator effectiveness plan can be found on the NMTEACH website, as every public and charter school plan is posted here.
There are massive differences, however, in how groups of teachers are graded across districts, which distorts almost everything about the system(s), including the extent to which similar (and different) teachers might be similarly (and fairly) evaluated and assessed.
Even within districts, there are massive differences in how grade level (elementary, middle, high school) teachers are evaluated.
And, even something as seemingly simple as evaluating K-2 teachers requires 42 different variations in scoring.
Table 1 below shows, at the state level, the number of different scales used to calculate teacher effectiveness for each group of teachers at each grade level.
New Mexico divides all teachers into three categories — group A teachers have scores based on the statewide test (mathematics, English/language arts (ELA)), group B teachers (e.g. music or history) do not have a corresponding statewide test, and group C teachers teach grades K-2. Table 1 shows the number of scales used by New Mexico school districts for each teacher group. It is further broken down by grade-level. For example, as illustrated, there are 42 different scales used to evaluate Elementary-level Group A teachers in New Mexico. The column marked “Unique (one-offs)” indicates the number of scales that are completely unique for a given teacher group and grade-level. For example, as illustrated, there are 11 unique scales used to grade Group B High School teachers, and for each of these eleven scales, only one district, one grade-level, and one teacher group is evaluated within the entire state.
Depending on the size of the school district, a unique scale may be grading as few as a dozen teachers! In all, 217 scales are used statewide, 99 of which are one-offs!
Table 1: New Mexico Teacher Evaluation System(s)
| Group | Grade | Scales Used | Unique (one-offs) |
|---|---|---|---|
| Group A (SBA-based) (e.g. 5th-grade English teacher) | All | 58 | 15 |
| | Elem | 42 | 10 |
| | MS | 37 | 2 |
| | HS | 37 | 3 |
| Group B (non-SBA) (e.g. Elem music teacher) | All | 117 | 56 |
| | Elem | 67 | 37 |
| | MS | 62 | 8 |
| | HS | 61 | 11 |
| Group C (grades K-2) | All | 42 | 28 |
| | Elem | 42 | 28 |
| TOTAL | | 217 variants | 99 one-offs |
The table above highlights the spectacular absurdity of the New Mexico Teacher Evaluation System.
(The complete listings of all variants for the three groups are contained here (in Table A for Group A), here (in Table B for Group B), and here (in Table C for Group C). The abbreviations and notes for these tables are listed here (in Table D).)
By approving all of these different formulas, NMPED is, all things considered, also making the following nonsensical claims:
NMPED Claim: The prototype 50-25-25 percentage split has some validity.
There is no evidence at all to support this division between student achievement measures, observations, and multiple measures. It simply represents what NMPED could politically “get away with” in terms of a formula. Why not 60-20-20, 57-23-20, or 46-18-36, et cetera? The NMPED prototype scale has no proven validity whatsoever.
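The choice of split is not inconsequential: under different weightings, the same two teachers can swap rank order. The sketch below makes this concrete; all teacher scores and both alternative splits are invented for illustration and assume a common 0-100 scale per component:

```python
# Hypothetical illustration: the same two teachers rank differently
# depending solely on which percentage split is chosen. All scores
# are invented; components are (achievement, observation, multiple
# measures) on an assumed 0-100 scale.

def rate(scores, weights):
    """Weighted sum of component scores."""
    return sum(w * s for w, s in zip(weights, scores))

teacher_x = (90, 50, 50)   # strong test-score growth, weaker elsewhere
teacher_y = (60, 90, 90)   # weaker test-score growth, strong elsewhere

nmped_split = (0.50, 0.25, 0.25)
alt_split   = (0.60, 0.20, 0.20)

print(rate(teacher_x, nmped_split), rate(teacher_y, nmped_split))  # 70.0 75.0 -> Y ranks higher
print(rate(teacher_x, alt_split),   rate(teacher_y, alt_split))    # 74.0 72.0 -> X ranks higher
```

In other words, whether teacher X or teacher Y is rated "more effective" depends entirely on an arbitrary choice of weights, which is why the lack of any validity evidence for the 50-25-25 split matters.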
NMPED Claim: All 217 formulas are equivalent for evaluating teachers.
This claim by NMPED is absurd on its face. Is there any evidence that NMPED has cross-validated the tests? There is no evidence that any of these scales are valid or accurate measures of “teacher effectiveness,” and there is no evidence whatsoever that they are equivalent.
Further, if the formulas are equivalent (as NMPED claims), why is New Mexico wasting money on technology for administering SBA tests or End-of-Course exams? Why not use an NMPED-approved formula that includes tests like Discovery, MAPS, DIBELS, or Star that are already being used?
NMPED Claim: Teacher Attendance and Student Surveys are interchangeable.
According to the approved plans, many districts assign 10% to Teacher Attendance while other districts assign 10% to Student Surveys. Both variants have been approved by NMPED.
Mathematically (i.e., in terms of the proportion either is to be allotted), they appear to be interchangeable. If that is so, why is NMPED also specifically trying to enforce Teacher Attendance as an element of the evaluation scale? Why did Hanna Skandera proclaim to the press that this measure improved New Mexico education? (For typical news coverage on this topic, for example, see here.)
The use of teacher attendance appears to be motivated by union-busting rather than any mathematical rationale.
NMPED Claim: All observation methods are equivalent.
NMPED allows for three very different observation methods to be used for 40% of the final score. Each method is somewhat complicated and involves different observers.
There is no indication that NMPED has evaluated the reliability or validity of these three very different observation methods, or tested their results for equivalence. They simply assert that they are equivalent.
NMPED Claim: These formulas will be used to rate teachers.
These formulas are the worst kind of statistical jiggery-pokery (to use a newly current phrase). NMPED presents a seemingly rational, scientific number to the public using invalid and unvalidated mathematical manipulations and then determines teachers’ careers based on the completely bogus New Mexico teacher evaluation system(s).
Conclusion: Not only is the emperor naked, he has a closet containing 217 equivalent outfits at home!
The post states that there are massive differences in how grade-level (elementary, middle, and high school) teachers are evaluated. Having different evaluation criteria for a kindergarten teacher and a high school teacher seems appropriate to me, because the skills required to be an excellent kindergarten teacher are different from the skills required to be an excellent high school teacher.
The problem is not that different grade-levels are judged by different evaluation criteria.
Rather, the problem is that 217 different formulas are used, all purporting to measure the same thing: teacher effectiveness. There is no evidence that even one of these formulas actually measures teacher effectiveness.
If you examine Table A and Table B, you will see many instances in which Elementary, Middle and High School teachers are all evaluated by the same formula. Any row of data that has numbers in all three Elem, MS and HS columns falls into this category.
If you examine Tables A, B and C, you can count 217 different formulas for grading teachers. Since teachers are stack-ranked and compared across districts and the state, this presents a major problem of equity.
Finally, if you look at Table C, you discover that a relatively homogeneous group of teachers (namely only K-2 teachers) still uses 42 different formulas to rate the teachers.
No doubt the skills for successful Elementary and High School teaching differ.
NMPED does not address this problem; they have simply created a menagerie of random formulas that supposedly measure teacher effectiveness.
Concerned Parent,
If having different criteria for different grades were not a problem, your original post presumably would have skipped over it.
It seems to me that decentralized control over education inevitably results in differing standards of evaluation for teachers. They are all trying something different.
Perhaps if you were to explain the teacher evaluation method that has been shown to accurately measure teacher effectiveness you could convince the districts to use that evaluation method. It would also have been useful in your post so that we can see just how far these methods are from the systems that are backed by evidence.
The problem is neither local control nor departure from scientific norms.
First, all plans are approved and decreed as equivalent by NMPED. This is not a grass-roots or local movement.
Second, there is not one shred of evidence for the validity of any of these formulas.
According to the VAMboozled website’s analysis of the $45 million Gates Foundation study on Measuring the Effectiveness of Teachers (MET): “When the MET researchers studied the separate and combined effects of teacher observations, value-added test scores, and student surveys, they found correlations so weak that no common attribute or characteristic of teacher-quality could be found. Even with 45 million dollars and a crackerjack team of researchers, they could not define an ‘effective teacher.’” (See http://vamboozled.com/the-2013-bunkum-awards-the-gates-foundations-45-met-studies/ or the National Education Policy Center’s analysis at http://nepc.colorado.edu/thinktank/review-MET-final-2013.)
Thus, my conclusion remains — the emperor remains naked, and there is no validity to the New Mexico Teacher Effectiveness Evaluations.
Concerned,
Again, it would be helpful to specify the valid methods of evaluating teachers so we can see how far off these methods are from the valid method(s).
The Albuquerque Journal is reporting that the New Mexico Public Education Department (NMPED) has revamped the evaluation system for Group B and new teachers.
Their evaluations will now be based 50% on observations rather than the previous reliance on student test scores. Student test scores will now comprise only 25% of the evaluation score.
Group A teachers are unaffected. See the link for details:
http://www.abqjournal.com/622769/news/nm-teacher-evals-see-change.html
Is kindergarten teachers’ effectiveness based on student scores?
Not typically, but tests such as the MEAP (and perhaps others) are increasingly being used to evaluate teachers. It is simply more difficult, however, to measure growth over time, as kindergarten is the base year.