Florida’s VAM-Based Evaluation System Ruled “Unfair but Not Unconstitutional”

From Diane Ravitch’s blog comes an important update on the lawfulness of VAM-based systems that I want to be sure readers of this blog don’t miss.

She writes: “A federal judge in Florida dismissed a lawsuit against the state evaluation system, declaring that it was unfair to rate teachers based on the scores of students they never taught, but not unconstitutional.

The evaluation system may be stupid; it may be irrational; it may be unfair; but it does not violate the Constitution. So says the judge.”

An article in the Florida Education Association newsletter described the ruling:

“The federal lawsuit, known as Cook v. Stewart, was filed last year by the FEA, the National Education Association and seven accomplished teachers and the local education associations in Alachua, Escambia and Hernando counties. The lawsuit challenged the evaluation of teachers based on the standardized test scores of students they do not teach or from subjects they do not teach. They brought suit against the Florida commissioner of education, the State Board of Education and the school boards of those three counties, who have implemented the evaluation system to comply with 2011’s Senate Bill 736.

“On Tuesday afternoon, U.S. District Judge Mark Walker dismissed FEA’s challenges to the portions of SB 736 that call for teachers to be evaluated based upon students and/or subjects the teachers do not teach, though he expressed reservations on the practice.

We are disheartened by the judge’s ruling. Judge Walker acknowledged the many problems with this evaluation system, but he ruled that they did not meet the standard to be declared unconstitutional. We are evaluating what further steps we might take in this legal process.

Judge Walker indicated his discomfort with the evaluation process in his order.

“The unfairness of the evaluation system as implemented is not lost on this Court,” he wrote. “We have a teacher evaluation system in Florida that is supposed to measure the individual effectiveness of each teacher. But as the Plaintiffs have shown, the standards for evaluation differ significantly. FCAT teachers are being evaluated using an FCAT VAM that provides an individual measurement of a teacher’s contribution to student improvement in the subjects they teach.” He noted that the FCAT VAM has been applied to teachers whose students are tested in a subject that teacher does not teach and to teachers who are measured on students they have never taught, writing that “the FCAT VAM has been applied as a school-wide composite score that is the same for every teacher in the school. It does not contain any measure of student learning growth of the … teacher’s own students.”

In his ruling, Judge Walker indicated there were other problems.

“To make matters worse, the Legislature has mandated that teacher ratings be used to make important employment decisions such as pay, promotion, assignment, and retention,” he wrote. “Ratings affect a teacher’s professional reputation as well because they are made public — they have even been printed in the newspaper. Needless to say, this Court would be hard-pressed to find anyone who would find this evaluation system fair to non-FCAT teachers, let alone be willing to submit to a similar evaluation system.”

“This case, however, is not about the fairness of the evaluation system,” Walker wrote. “The standard of review is not whether the evaluation policies are good or bad, wise or unwise; but whether the evaluation policies are rational within the meaning of the law. The legal standard for invalidating legislative acts on substantive due process and equal protection grounds looks only to whether there is a conceivable rational basis to support them,” even though this basis might be “unsupported by evidence or empirical data.”

American Statistical Association (ASA) Position Statement on VAMs

In my most recent post, on the Top 14 research-based articles about VAMs, I highlighted a statement released just last week by the American Statistical Association (ASA), titled the “ASA Statement on Using Value-Added Models for Educational Assessment.”

It is short, accessible, easy to understand, and hard to dispute, so I wanted to be sure nobody missed it, as it is certainly a must-read for all of you following this blog, not to mention everybody else dealing/working with VAMs and their related educational policies. It also represents the current, research-based thinking of probably 90% of the educational researchers and econometricians (still) conducting research in this area.

Again, the ASA is the best statistical organization in the U.S. and likely one of, if not the, best statistical associations in the world. Some of the most important parts of their statement, as I see them, taken directly from the full statement, follow:

  1. VAMs are complex statistical models, and high-level statistical expertise is needed to develop the models and [emphasis added] interpret their results.
  2. Estimates from VAMs should always be accompanied by measures of precision and a discussion of the assumptions and possible limitations of the model. These limitations are particularly relevant if VAMs are used for high-stakes purposes.
  3. VAMs are generally based on standardized test scores, and do not directly measure potential teacher contributions toward other student outcomes.
  4. VAMs typically measure correlation, not causation: Effects – positive or negative – attributed to a teacher may actually be caused by other factors that are not captured in the model.
  5. Under some conditions, VAM scores and rankings can change substantially when a different model or test is used, and a thorough analysis should be undertaken to evaluate the sensitivity of estimates to different models.
  6. VAMs should be viewed within the context of quality improvement, which distinguishes aspects of quality that can be attributed to the system from those that can be attributed to individual teachers, teacher preparation programs, or schools.
  7. Most VAM studies find that teachers account for about 1% to 14% of the variability in test scores, and that the majority of opportunities for quality improvement are found in the system-level conditions. Ranking teachers by their VAM scores can have unintended consequences that reduce quality.
  8. Attaching too much importance to a single item of quantitative information is counter-productive—in fact, it can be detrimental to the goal of improving quality.
  9. When used appropriately, VAMs may provide quantitative information that is relevant for improving education processes…[but only if used for descriptive/description purposes]. Otherwise, using VAM scores to improve education requires that they provide meaningful information about a teacher’s ability to promote student learning…[and they just do not do this at this point, as there is no research evidence to support this ideal].
  10. A decision to use VAMs for teacher evaluations might change the way the tests are viewed and lead to changes in the school environment. For example, more classroom time might be spent on test preparation and on specific content from the test at the exclusion of content that may lead to better long-term learning gains or motivation for students. Certain schools may be hard to staff if there is a perception that it is harder for teachers to achieve good VAM scores when working in them. Overreliance on VAM scores may foster a competitive environment, discouraging collaboration and efforts to improve the educational system as a whole.
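ASA points 2 and 5 above lend themselves to a quick illustration. The following is a minimal, hypothetical Python sketch – all numbers are invented, and this is not any state’s actual VAM – showing how rankings built from noisy, score-based estimates can reshuffle substantially from one estimation run (or test) to the next:

```python
import random

random.seed(1)

# Hypothetical illustration: each teacher has a "true" effect, but any one
# year's VAM estimate adds estimation error. Here the error is as large as
# the spread of true effects, roughly in keeping with studies finding that
# teachers account for only about 1% to 14% of score variability.
n_teachers = 20
true_effect = [random.gauss(0, 1) for _ in range(n_teachers)]

def noisy_estimate(effects, noise_sd=1.0):
    """One run's VAM estimates: true effect plus estimation error."""
    return [e + random.gauss(0, noise_sd) for e in effects]

def rank(scores):
    """Rank teachers from highest (rank 1) to lowest estimated score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for position, i in enumerate(order, start=1):
        ranks[i] = position
    return ranks

# Two estimation runs over the SAME true effects, differing only in noise.
ranks_year1 = rank(noisy_estimate(true_effect))
ranks_year2 = rank(noisy_estimate(true_effect))

moved = sum(1 for r1, r2 in zip(ranks_year1, ranks_year2) if abs(r1 - r2) >= 3)
print(f"{moved} of {n_teachers} teachers moved 3+ rank positions between runs")
```

Shrinking `noise_sd` (i.e., more precise estimates) makes the rankings far more stable, which is exactly why the ASA insists that measures of precision accompany any VAM estimate.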

Also important to point out is that, in the report, the ASA makes recommendations regarding the “key questions states and districts [yes, practitioners!] should address regarding the use of any type of VAM.” These include, but are not limited to, questions about reliability (consistency), validity, the tests on which VAM estimates are based, and the major statistical errors that always accompany VAM estimates but are often buried and not reported with results (i.e., as confidence intervals or standard errors).

Also important is the purpose for ASA’s statement, as written by them: “As the largest organization in the United States representing statisticians and related professionals, the American Statistical Association (ASA) is making this statement to provide guidance, given current knowledge and experience, as to what can and cannot reasonably be expected from the use of VAMs. This statement focuses on the use of VAMs for assessing teachers’ performance but the issues discussed here also apply to their use for school or principal accountability. The statement is not intended to be prescriptive. Rather, it is intended to enhance general understanding of the strengths and limitations of the results generated by VAMs and thereby encourage the informed use of these results.”

Do give the position statement a read and use it as needed!

Correction: Make the “Top 13” VAM Articles the “Top 14”

As per my most recent post earlier today, about the Top 13 research-based articles about VAMs, lo and behold, another great research-based statement was released just this week by the American Statistical Association (ASA), titled the “ASA Statement on Using Value-Added Models for Educational Assessment.”

So, let’s make the Top 13 the Top 14 and call it a day. I say “day” deliberately; this is such a hot and controversial topic that it is often hard to keep up with the literature in this area on a literally daily basis.

The ten key points of this outstanding statement, the ASA’s recommendations regarding the key questions states and districts should address, and the ASA’s own statement of purpose are all excerpted in full in the post above, so I won’t repeat them here.

If you’re going to choose one article to read and review this week or this month – one that is thorough and gets right to the key points – this is the one I recommend…at least for now!

One of the First (of X) Lawsuits to Come?

From another blog post, titled “TEA files lawsuit on behalf of Knox County teacher”:

“Tennessee Education Association has filed a lawsuit on behalf of a Knox County teacher who believes she was unfairly denied a bonus as a result of her value-added test scores. The statewide organization expects this TVAAS (Tennessee Value-Added Assessment System) lawsuit to be the first of many as more districts tie high-stakes decisions to students’ achievement and growth scores on standardized tests.

Knox County teacher Lisa Trout was denied a bonus after her value-added score was calculated. “After being told she would receive the system-wide TVAAS estimate because of her position in an alternative school, a guidance counselor incorrectly claimed 10 of Ms. Trout’s students for her TVAAS score without her knowledge,” said TEA general counsel Richard Colbert in a press release. “As a result, Ms. Trout ultimately received a lower TVAAS estimate than she should have and was denied the APEX bonus she had earned.”

TEA’s lawsuit also contests the arbitrariness of TVAAS estimates that use the test results of only a small segment of a teacher’s students to estimate her overall effectiveness.

“Ms. Trout’s situation illustrates the fundamental problem with using statistical estimates for high-stakes decisions that affect teacher pay,” Colbert said. “Her case raises great concerns over the constitutionality of such practices.”

A Florida Media Arts Teacher on Her “VAM” Score

A media arts teacher from the state of Florida wrote a piece for The Washington Post’s The Answer Sheet, by Valerie Strauss, about her recently and publicly released VAM score – even though, as a media arts teacher, she does not teach the tested subject areas and did not teach many of the students whose test scores are being used to hold her accountable.

Bizarre, right? Not really, as this too is a reality facing many teachers who teach out-of-subject areas – or, more specifically, subject areas that “don’t count” – and who teach some of the tested students a lot and others never. They are being assigned “school-level” VAM scores, and these estimates, regardless of teachers’ actual contributions, are being used to make consequential decisions (e.g., in this case, about her merit pay).

She writes about “What it feels like to be evaluated on test scores of students I don’t have,” noting, more specifically, about what others “need to know about [her] VAM score.” For one, she writes, “As a media specialist, [her] VAM is determined by the reading scores of all the students in [her] school, whether or not [she] teach[es] them. [Her] support of the other academic areas is not reflected in this number.” Secondly, she writes, “Like most teachers, [she has] no idea what [her] score means. [She] know[s] that [her] VAM is related to school-wide reading scores but [she] do[es]n’t understand how it’s calculated or exactly what data is [sic] used. This number does not give [her] feedback about what [she] did for [her] students to support their academic achievement last year or how to improve [her] instruction going forward.” She also writes about issues with her school being evaluated differently from the state system given they are involved in a Gates Foundation grant, and she writes about her concerns about the lack of consistency in teacher-level scores over time, as based on her knowledge of the research. See the full article linked again here to read more.

Otherwise, she concludes with a very real question also being faced by many. She writes, “[W]hy do I even care about my VAM score? Because it counts. My VAM score is a factor in determining if I am eligible for a merit pay bonus, whether I have a job in the next few years, and how many times I’ll be evaluated this year.” Of course she cares, as she and many others are being forced to care about their professional livelihoods under a system that is out of her control, out of thousands of teachers’ control, and in so many ways simply out of control.

See also what she offers in terms of what she frames as a much better evaluation system – one that would truly take into account her effectiveness, along with the numbers that are much more indicative of her work as a teacher. If we continue to fixate on the quantification of effectiveness, these are the numbers that should actually count.


ASU Regents’ Professor Emeritus David Berliner at Vergara v. California

As you (hopefully) recall from a prior post, nine “students” from the Los Angeles School District are currently suing the state of California, “arguing that their right to a good education is [being] violated by job protections that make it too difficult to fire bad [teachers].” This case is called Vergara v. California, and it is meant to challenge “the laws that handcuff schools from giving every student an equal opportunity to learn from effective teachers.” Behind these nine students stand David Welch, a Silicon Valley technology magnate who is financing the case; an all-star cast of lawyers; and Students Matter, the organization founded by Welch.

This past Tuesday (March 18, 2014 – “Vergara Trial Day 28“), David C. Berliner, Regents’ Professor Emeritus here at Arizona State University (ASU), who also just happens to be my forever mentor and academic luminary, took the stand. He spoke primarily about the out-of-school factors that impact student performance in schools, and about how these factors bias all estimates based on test scores (often regardless of the controls used – see a recent post about this evidence of bias here).

As per a recent post by Diane Ravitch (thanks to an insider at the trial) Berliner said:

“The public and politicians and parents overrate the in-school effects on their children and underrate the power of out-of-school effects on their children.” He noted that in-school factors account for just 20 percent of the variation we see in student achievement scores.

He also discussed value-added models and the problems with solely relying on these models for teacher evaluation. He said, “My experience is that teachers affect students incredibly. Probably everyone in this room has been affected by a teacher personally. But the effect of the teacher on the score, which is what’s used in VAM’s, or the school scores, which is used for evaluation by the Feds — those effects are rarely under the teacher’s control…Those effects are more often caused by or related to peer-group composition…”

Now, Students Matter has taken an interesting (and not surprising) take on Berliner’s testimony, given their slant/biases as supporters of this case, which can also be found at Vergara Trial Day 28. But please read this with caution, as the author(s) of this summary, let’s say, twisted some of the truths in Berliner’s testimony.

Berliner’s reaction? “Boy did they twist it. Dirty politics.” Hmm…

Data Quality and VAM Evaluations, by “A Concerned New Mexico Parent”

“A Concerned New Mexico Parent” wrote in the following. Be sure to follow along, as this parent demonstrates some of the very real, real-world issues with the data being used to calculate VAMs, given their missingness, incompleteness, arbitrariness, and the like. Yet VAMs “spit out” mathematical estimates that, because they are based on advanced statistics, are to be trusted – even though those resultant “estimates” mask all of the chaos (demonstrated below) behind the sophisticated statistical scene.

(S)he writes:

As a parent, I have been concerned with New Mexico’s implementation of the trifecta of bad education policies – stack-ranking of teachers, the Common Core curriculum, and the use of value-added models (VAMs) for teacher evaluation.

The Vamboozled blog has done a great job in educating teachers and the public about the pitfalls of the VAM approach to teacher evaluations. The recent blog post by an Arizona teacher about her “value-added” contribution caused me to investigate the issue closer to home.

Hanna Skandera (the acting head of the New Mexico Public Education Department and a Jeb Bush protégé) recently posted information on how data are to be handled for the New Mexico VAM system. Click here to review Skandera’s post.

With these two internet postings in mind, I decided to investigate the quality of data available for evaluating teachers at my local school.

According to researchers (and basic common sense), the more data you have, the better. According to the VAM literature, everyone (critics and proponents alike) seems to agree that at least three years of test scores are the minimum necessary for a legitimate [valid] calculation. We should have that much data per teacher as well.

As you will soon see, even this seemingly simple requirement is often not met in a real life school.

To calculate any VAM-type score, data are needed. Specifically, the underlying student test scores and teacher data on which everything else depends are crucial.

One type of “problem data” are data that scramble several people together into one number. For example, if you and I team-teach a class of students, it is not possible to tell how much I contributed versus how much you contributed to the final student score [regardless of what the statisticians say they can do in terms of fractional effects]. Anyone who has ever worked on a team project knows that not everyone contributes equally. Any division of score is purely arbitrary (aka “made up”) and indefensible. This is the situation referenced by the Arizona teacher who loses her students to math tutors for months at a time.

A second type of “problem data” are data that change kind mid-stream. Suppose we have a teacher who teaches 3rd grade one year and is then switched to 6th grade the next. No one who has taught would ever believe that teaching 3rd graders is the same as teaching 6th graders. Just mentally imagine using a 3rd grade approach with 6th grade boys, and I think you can see the problems. So, any teacher who switches grades is likely to have questionable data if all of her scores are considered the same.

A third type of “problem data” are not really a problem but are simply problematic given the absence of information. Teachers who leave the school, who are missing data for certain students [which occurs in non-random patterns], and the like often do not have the data needed to support accurate VAM calculations.

A fourth and final type of “problem data” are data limited by too few observations. A teacher with exactly one year of experience would have only one set of test scores, which is too few for any meaningful calculation. The New Mexico approach, as explained in the Skandera posting, is to “fill in” the missing data with surrogate [observational] data.

If observational data can stand in for test data, why not just use the observational data alone? We already have teacher observations and evaluations without the added expense of VAM calculations and specialized software. And how does using these less precise data help make a VAM valid? It is like the difference between taking your child’s temperature with a precise in-ear thermometer and simply putting the back of your hand against their cheek. Both measurements can probably tell you whether your child is sick, but they are not equally accurate.

Regardless, it appears that if we want to have a good statistical calculation, we need to make sure we have the following:

  1. At least three years of teaching data.
  2. No team teaching or sharing of classes or students.
  3. No grade changing (the most recent three years should be at the same grade).
  4. Data must include the current year.
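For illustration only, these four checks can be sketched in a few lines of Python; the record format and field names here are hypothetical, not drawn from any actual state data system:

```python
# A minimal sketch of the four data-quality checks above, using a
# hypothetical record format; field names are illustrative only.
CURRENT_YEAR = 2014

def has_minimally_good_data(record):
    """Return True only if a teacher's record meets all four requirements."""
    years = record["years_tested"]      # e.g., [2012, 2013, 2014], oldest first
    grades = record["grades_taught"]    # grade level taught in each of those years
    return (
        len(years) >= 3                      # 1. at least three years of data
        and not record["team_teaches"]       # 2. no team teaching / shared students
        and len(set(grades[-3:])) == 1       # 3. same grade for the last three years
        and CURRENT_YEAR in years            # 4. data include the current year
    )

# A record like the one teacher with valid data passes; a two-year record fails.
ok = has_minimally_good_data(
    {"years_tested": [2012, 2013, 2014], "grades_taught": [6, 6, 6],
     "team_teaches": False}
)
bad = has_minimally_good_data(
    {"years_tested": [2013, 2014], "grades_taught": [3, 3],
     "team_teaches": False}
)
print(ok, bad)  # prints: True False
```

As the tables below show, very few real-world records survive even these modest checks.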

With these four very modest assumptions for ensuring at least minimally good data, how do we fare in real life?

I decided to chart the information of my local school, in light of the VAM data requirements, and I was shocked by what I found.

The results of real world teacher churn are shown below. Each line shows a teacher, their grade-level, the number of years teaching at that grade level, and their data status for a VAM calculation in school-year 2013-2014.

The chart includes all teachers from one school in grades 3 through 6. These are the only grades that currently take the New Mexico state-wide standardized test. The data are for the time period Aug 2010 to Feb 2014.

The “Teacher” and “Grade” columns are self-explanatory. The “Years Active” column shows the dates when the teacher taught at the school. The “Years at Same Grade Level” column shows the number of years a teacher has taught at a consistent grade level.

The “Data Status” column explains briefly why a teacher’s data may be invalid. More specifically, “Team teacher” means that the teacher shares duties with another teacher for the same set of students; “No longer teaching” means the teacher may have taught 3rd-6th grades in the past but is not currently teaching any of these grades; “Insufficient data” means the teacher has taught for fewer than three years and so lacks sufficient data for valid statistical calculations; “Data not current” means the teacher has taught 3rd-6th grade but not in the year of the VAM calculation; “Grade change” means the teacher changed grade levels sometime during the 2010-2014 school years; and “Valid data” means the data for the teacher appear to be reliable.

As you might guess, all of the teachers’ names have been changed; all of the other data are correct.

Table of Teacher Data Quality

| Teacher (n=28) | Grade | Years Active | Years at Same Grade Level | Data Status |
| --- | --- | --- | --- | --- |
| Govan | 3 | 2010-2014 | 4 | Team teacher |
| Grubb | 3 | 2010-2014 | 4 | Team teacher |
| Durling | 3 | 2010-2014 | 4 | Team teacher |
| Jen | 3 | 2010-2012 | 2 | No longer teaching 3rd-6th |
| Bohanon | 3 | 2012-2014 | 2 | Insufficient data |
| Mcanulty | 3 | 2013-2014 | 1 | Insufficient data |
| Saum | 4 | 2010-2012 | 2 | No longer teaching 3rd-6th |
| Wirtz | 4 | 2010-2013 | 3 | No longer teaching 3rd-6th |
| Mccaslin | 4 | 2010-2011 | 1 | No longer teaching 3rd-6th |
| Finamore | 4 | 2010-2012 | 2 | No longer teaching 3rd-6th |
| Sharrow | 4 | 2011-2012 | 1 | Grade change |
| Sharrow | 5 | 2012-2014 | 2 | Insufficient data |
| Kime | 4 | 2012-2014 | 2 | Insufficient data |
| Blish | 4 | 2012-2014 | 2 | Insufficient data |
| Obregon | 4 | 2012-2014 | 2 | Insufficient data |
| Fraise | 4 | 2013-2014 | 1 | Insufficient data |
| Franzoni | 4 | 2013-2014 | 1 | Grade change |
| Franzoni | 5 | 2010-2013 | 3 | Grade change |
| Franzoni | 6 | 2012-2013 | 1 | Grade change |
| Henderson | 5 | 2010-2012 | 2 | Insufficient data |
| Regan | 5 | 2010-2014 | 4 | Team teacher |
| Kalis | 5 | 2011-2013 | 2 | No longer teaching 3rd-6th |
| Combest | 5 | 2013-2014 | 1 | Insufficient data |
| Meister | 5 | 2013-2014 | 1 | Insufficient data |
| Treacy | 6 | 2010-2011 | 1 | No longer teaching 3rd-6th |
| Sprayberry | 6 | 2010-2014 | 4 | Team teacher |
| Locust | 6 | 2010-2011 | 1 | No longer teaching 3rd-6th |
| Condron | 6 | 2011-2014 | 3 | Valid data |
| Monteiro | 6 | 2011-2014 | 3 | Team teacher |
| Arnwine | 6 | 2011-2012 | 1 | No longer teaching 3rd-6th |
| Sebree | 6 | 2013-2014 | 1 | Insufficient data |

As demonstrated in the Table above, 28 teachers taught 65 classes of 3rd through 6th grade. Note that two teachers (Sharrow and Franzoni) taught several grades during this time period. Remember, these are the only grades (3rd – 6th) that are given the New Mexico standardized test.

Once we exclude questionable data, we are left with exactly one teacher (Condron; 1/28 = 3.6%) who taught three classes of 6th graders during the last four years and for whom valid VAM data were available. All of the other data required for the VAM calculation would have to be provided by surrogate data.

According to the Skandera posting, the missing data would be replaced with the much more subjective and less reliable observational data. It is unclear how the “team teacher” data would be assessed. There is no scientific means to disaggregate the results for the two teachers. Any assignment of “value-added” for team teachers would be [almost] completely arbitrary [as typically based on teacher self-reported proportions of time taught].

Thus, we can see that for the 28 teachers listed in the above table, the calculations would then be based on the following data substitutions:

Table of Surrogate Data Substitutions

| Data Status | Number of Teachers | Surrogate Data Information | Notes on Data Quality |
| --- | --- | --- | --- |
| No longer teaching 3rd-6th | 9 (32.1%) | Excluded from calculation | Teachers who no longer teach the students in question would be excluded from the VAM calculation. |
| Team teacher | 6 (21.4%) | Arbitrary division of credit for VAM score | Arbitrary (aka “self-reported”) division of responsibility |
| Insufficient data (missing one year of test data) | 6 (21.4%) | The missing data would be replaced by observational data. | These teachers would have 1/3 of their VAM calculation based on observational data. |
| Insufficient data (missing two years of test data) | 5 (17.9%) | The missing data would be replaced by observational data. | These teachers would have 2/3 of their VAM calculation based on observational data. |
| Grade change | 1 (3.6%) | One teacher (Franzoni) is still teaching 3rd-6th after a grade change. | Either her 4th and 5th grade data would be combined in some fashion, or the data for her most recent teaching (6th grade) would be based on two years of observational data. In either case, the quality of her data would be very suspect. |
| Valid data | 1 (3.6%) | There are at least three years of valid data. | The data would be valid for the VAM calculation. |
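As a quick check on the arithmetic, the counts and percentages above can be reproduced with a short script (statuses transcribed from the tables in this post, with one net status per teacher; Sharrow’s and Franzoni’s multi-grade rows reduce to “Insufficient data” and “Grade change,” respectively):

```python
from collections import Counter

# Per-teacher data statuses transcribed from the Table of Teacher Data Quality.
statuses = (
    ["No longer teaching 3rd-6th"] * 9
    + ["Team teacher"] * 6
    + ["Insufficient data (missing one year)"] * 6
    + ["Insufficient data (missing two years)"] * 5
    + ["Grade change"] * 1
    + ["Valid data"] * 1
)

counts = Counter(statuses)
total = len(statuses)

# Reproduce each row's count and percentage of the 28 teachers.
for status, n in counts.most_common():
    print(f"{status}: {n} ({100 * n / total:.1f}%)")

print(f"Teachers with valid data: {counts['Valid data']}/{total} "
      f"= {100 * counts['Valid data'] / total:.1f}%")
```

Running it confirms the table: 32.1%, 21.4%, 21.4%, 17.9%, 3.6%, and 3.6%, with exactly one teacher in 28 (3.6%) holding valid data.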

Again, only one teacher, as demonstrated, has valid data. I can guarantee that these major problems with the integrity and quality of these data were NOT publicized when the teacher VAM scores were released. Teachers were still held responsible, regardless.

Remember that for all of the other teachers on campus (K-2, computer, fine arts, PE, etc.), their VAM scores would be derived from the campus average. This average, as well, would be contaminated by the very problematic data revealed in these tables.

Does this approach of calculating VAM scores with such questionable and shaky data seem fair or equitable to the teachers or to this school? I believe not.

An AZ Teacher’s Perspective on Her “Value-Added”

This came to me from a teacher in my home state – Arizona. Read not only what is becoming a too familiar story, but also her perspective about whether she is the only one who is “adding value” (and I use that term very loosely here) to her students’ learning and achievement.

She writes:

Initially, the focus of this note was going to be my six-year experience with a seemingly ever-changing educational system. I was going to list, with some detail, all the changes that I have seen in my brief time as a K-6 educator, the end user of educational policy and budget cuts. Changes like (in no significant order):

  • Math standards (2008?)
  • Common Core implementation and associated instructional shifts (2010?)
  • State accountability system (2012?)
  • State requirements related to ELD classrooms (2009?)
  • Teacher evaluation system (to include a new formula of classroom observation instrument and value-added measures) (2012-2014)
  • State laws governing teacher evaluation/performance, labeling and contracts (2010?)

have happened in a span of not much more than three years. And all these changes have happened against a backdrop of budget cuts severe enough, in my school district, to render librarians, counselors, and data coordinators extinct. In this note, I was going to ask, rhetorically: “What other field or industry has seen this much change this quickly, and why?” or “How can any field or industry absorb this much change effectively?”

But then I had a flash of focus just yesterday during a meeting with my school administrators, and I knew immediately the simple message I wanted to relay about the interaction of high-stakes policies and the real world of a school.

At my school, we have entered what is known as “crunch time” – the three-month-long period leading up to state testing. The purpose of the meeting was to roll out a plan, commonly used by my school district, to significantly increase test scores in math via a strategy of leveled grouping. The plan dictates that my homeroom students will be assigned to groups based on benchmark testing data and will then be sent out of my homeroom to other teachers for math instruction for the next three months. In effect, I will be teaching someone else’s students, and another teacher will be teaching my students.

But, wearisomely, sometime after this school year, a formula will be applied to my homeroom students’ state test scores in order to determine close to 50% of my performance. And then another formula (to include classroom observations) will be applied to convert this performance into a label (ineffective, developing, effective, highly effective) that is then reported to the state. And so my question now is (not rhetorically!), “Whose performance is really being measured by this formula – mine, or that of the teachers who taught my students math for three months of the school year?” At best, professional reputations are at stake; at worst, employment is.