Mapping America’s Teacher Evaluation Plans Under ESSA

One of my doctoral students (Kevin Close), one of my former doctoral students (Clarin Collins), and I just had a study published in the practitioner journal Phi Delta Kappan that I wanted to share out with all of you, especially before the study is no longer open-access or free (see the full article as currently available here). As the title of this post (which is the same as the title of the article) indicates, the study is about research the three of us conducted, by surveying every state (or interviewing leaders at every state’s department of education), about how each state has changed its teacher evaluation systems since the passage of the Every Student Succeeds Act (ESSA).

In short, we found that states have reduced their use of growth or value-added models (VAMs) within their teacher evaluation systems. In addition, states that are still using such models are using them in much less consequential ways, and many states are offering more alternatives for measuring the relationships between student achievement and teacher effectiveness. State teacher evaluation plans also contain more language supporting formative teacher feedback (i.e., a noteworthy change from states’ prior summative and often highly consequential teacher evaluation systems). Finally, state departments of education seem to be allowing districts to develop and implement more flexible teacher evaluation systems, while simultaneously acknowledging the challenges that come with increased local control, especially when varied local systems make it difficult to support each system and to compare data across schools and districts.

Again, you can read more here. See also the longer version of this study, if interested, here.

Litigating Algorithms, Beyond Education

This past June, I presented at a conference at New York University (NYU) called Litigating Algorithms. Most attendees were lawyers, law students, and the like, all of whom were there to discuss the multiple ways that they have collectively and independently been challenging governmental uses of algorithm-based decision-making systems (e.g., VAMs) across disciplines. I was there to present on how VAMs have been used by states and school districts in education, as well as on the key issues with VAMs as litigated via the lawsuits in which I have been engaged (e.g., Houston, New Mexico, New York, Tennessee, and Texas). The conference was sponsored by the AI Now Institute, also at NYU, which has as its mission to examine the social implications of artificial intelligence (AI), in collaboration with the Center on Race, Inequality, and the Law, affiliated with the NYU School of Law.

Anyhow, they just released their report from this conference, and I thought it important to share it out with all of you, in that it details the extent to which similar AI systems are being used across disciplines beyond education, and how such uses (as well as misuses and abuses) are being litigated in court.

See the press release below, and see the full report here.

—–

Litigating Algorithms 2019 U.S. Report – New Challenges to Government Use of Algorithmic Decision Systems

Today the AI Now Institute and NYU Law’s Center on Race, Inequality, and the Law published new research on the ways litigation is being used as a tool to hold government accountable for using algorithmic tools that produce harmful results.

Algorithmic decision systems (ADS) are often sold as offering a number of benefits, from mitigating human bias and error, to cutting costs and increasing efficiency, accuracy, and reliability. Yet proof of these advantages is rarely offered, even as evidence of harm increases. Within health care, criminal justice, education, employment, and other areas, the implementation of these technologies has resulted in numerous problems with profound effects on millions of peoples’ lives.

More than 19,000 Michigan residents were incorrectly disqualified from food-assistance benefits by an errant ADS. A similar system automatically and arbitrarily cut Oregonians’ disability benefits. And an ADS falsely labeled 40,000 workers in Michigan as having committed unemployment fraud. These are a handful of examples that make clear the profound human consequences of the use of ADS, and the urgent need for accountability and validation mechanisms. 

In recent years, litigation has become a valuable tool for understanding the concrete and real impacts of flawed ADS and holding government accountable when it harms us. 

The Report picks up where our 2018 report left off, revisiting the first wave of U.S. lawsuits brought against government use of ADS, and examining what progress, if any, has been made.  We also explore a new wave of legal challenges that raise significant questions, including:

  1. What access, if any, criminal defense attorneys should have to law enforcement ADS in order to challenge allegations leveled by the prosecution; 
  2. The profound human consequences of erroneous or vindictive uses of governmental ADS; and 
  3. The evolution of the Illinois Biometric Information Privacy Act, America’s most powerful biometric privacy law, and what its potential impact on ADS accountability might be. 

This report offers concrete insights from actual cases involving plaintiffs and lawyers seeking justice in the face of harmful ADS. These cases illuminate many ways that ADS are perpetuating concrete harms, and the ways ADS companies are pushing against accountability and transparency.

The report also outlines several recommendations for advocates and other stakeholders interested in using litigation as a tool to hold government accountable for its use of ADS.

Citation: Richardson, R., Schultz, J. M., & Southerland, V. M. (2019). Litigating algorithms 2019 US report: New challenges to government use of algorithmic decision systems. New York, NY: AI Now Institute. Retrieved from https://ainowinstitute.org/litigatingalgorithms-2019-us.html

More on the VAM (Ab)Use in Florida

In my most recent post, about it being “Time to Organize in Florida” (see here), I wrote about how teachers in Florida were being removed from teaching in their schools (two or so weeks after the start of the school year) if their state-calculated, teacher-level VAM scores deemed them as teachers who “needed improvement” or were “unsatisfactory.” Yes: they were being removed from teaching in low-performing schools IF their VAM scores, and VAM scores alone, deemed them as not adding value.

A reporter from the Tampa Bay Times with whom I spoke on this story just published his article all about this, titled “Florida’s ‘VAM Score’ for Rating Teachers is Still Around, and Still Hated” (see the full article here, and see the article’s full reference below). This piece captures the situation in Florida better than my prior post; hence, please give it a read.

Again, click here to read.

Full citation: Solochek, J. S. (2019). Florida’s ‘VAM score’ for rating teachers is still around, and still hated. Tampa Bay Times. Retrieved from https://www.tampabay.com/news/gradebook/2019/09/23/floridas-vam-score-for-rating-teachers-is-still-around-and-still-hated/

Time to Organize in Florida

A few weeks ago, a Florida reporter reached out to me for information about value-added models (VAMs) across the nation, but also as specific to the state of Florida. It seems that teachers in Florida were (and perhaps still are) being removed from teaching in Florida schools if their state-calculated, teacher-level VAM scores deemed them as teachers who “needed improvement” or were “unsatisfactory.”

More specifically, the state of Florida is using its state-level VAM to rate teachers’ VAM-based performance, using state exams in mathematics and language arts. If the teachers ultimately deemed in need of improvement or unsatisfactory teach in one of the state’s “turnaround” schools (i.e., a school that is required by the state to have a turnaround plan in place), those teachers are to be removed from the school and placed elsewhere. This is happening under a state law that dictates that no turnaround school may have a higher percentage of low value-added teachers than the district as a whole, which the state has apparently interpreted to mean that these schools may have no low value-added teachers at all.
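To make the statutory logic above concrete, here is a minimal sketch of the rule as written, i.e., comparing a turnaround school’s share of low value-added teachers against its district’s share. It is illustrative only (the function name and numbers are hypothetical, not the state’s actual implementation), and recall that the state has apparently applied the stricter reading that any low value-added teacher must be removed.

```python
# Hypothetical illustration of the rule as written in statute: a turnaround
# school may not have a higher percentage of low value-added teachers than
# its district as a whole. Names and numbers are made up.
def violates_rule(school_low_vam: int, school_total: int,
                  district_low_vam: int, district_total: int) -> bool:
    """Return True if the school's share of low value-added teachers
    exceeds the district-wide share."""
    school_share = school_low_vam / school_total
    district_share = district_low_vam / district_total
    return school_share > district_share

# Example: 6 of 40 teachers rated low at the school (15%), versus
# 300 of 3,000 teachers rated low district-wide (10%) -> rule violated.
print(violates_rule(6, 40, 300, 3000))  # True
```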

Of course, some of the issues being raised throughout the state are not only about the VAMs themselves and the teachers being displaced, but also about how all of this has caused other disruptions (e.g., students losing their teachers two weeks or so after the beginning of the school year). Relatedly, many principals have rejected these goings-on, expressly noting that they want to keep many if not most or all of the teachers being moved from their schools, whom they “value.” I have also heard directly from a few Florida principals/school administrators about these same matters. See other articles about this here and here.

Hence, I’m writing this blog post, first, to let others know about what is going on in Florida right now, despite the fact that the rest of the nation is (overall) taking some major steps back from the uses (and abuses) of VAMs, especially in such high-stakes ways.

But I’m also writing this blog post to (hopefully) inspire those in Florida (including teachers, principals, etc.) to organize. Organize yourselves, perhaps with the assistance and guidance of your unions, professional organizations, legal groups (perhaps, also as affiliated), and the like. What is happening in Florida, as per state law, can very likely be legally challenged.

Overall, we (including many others in similar court cases in New Mexico, New York, and Texas) did quite well in the courts fighting the unjustifiable and indefensible uses of VAMs for similar purposes. Hence, I truly believe it is just a matter of time, with some organizing, before the teachers in Florida also realize some relief. There are also many of us out there who are more than ready and willing to help.

The Utility of Student Perception Surveys to Give Teachers Feedback: An Introduction to the My Teacher Questionnaire

This is another guest post for the followers of this blog.

In short, Rikkert van der Lans of the University of Groningen’s Department of Teacher Education emailed me a few months ago about an article I published with one of my PhD students titled “Student perception surveys for K-12 teacher evaluation in the United States: A survey of surveys.” In this piece, he was interested in our review of the “many untested [student] questionnaires that are applied by schools [to evaluate teachers],” and “thought [I] might also be interested [in his and his colleagues’] work around the ‘My Teacher’ questionnaire.” Apparently, it has been applied globally across 15 different countries and, importantly, it is not only research-based but has also been researched, with psychometric characteristics actually warranting its use. Hence, I asked him to write a guest post, particularly for those of you who, following the U.S.’s passage of the Every Student Succeeds Act (ESSA; see prior posts about ESSA here and here), are looking to implement a researched and validated student survey instrument for teacher evaluation purposes. Below is his post.

*****

Thank you, Audrey Amrein-Beardsley, for inviting me to write this blog post. I live in the Netherlands, and despite living across the Atlantic, I recognize many of the issues identified by you and your coauthor with student surveys, including their increased use, their novelty, and the small knowledge base about how to use them (reliably and validly). In my writing, I mentioned our own validated survey: the “My Teacher” questionnaire (MTQ), which is currently in use in 15 countries [1], with English and Spanish [2] versions also having been developed.

In many ways, the MTQ is similar to other survey instruments, which is a good thing but not much of a selling point. So, let me introduce some evidence of validity unique to the MTQ and related to the topics of (1) formative feedback and (2) the use of multiple measures. Unique to the MTQ is the evidence in support of an interpretation of scores in terms of teachers’ stage of development (for details see these publications [3, 4, 5]). I have myself used the MTQ to give feedback (face-to-face) to over 200 teachers, and what they generally appreciate most about the MTQ is that the outcomes can tell them: 1) what they have already achieved (e.g., “you are skilled in classroom management and in structuring front-of-class explanations”); 2) where they are now (e.g., “your skill in interactive teaching methods is currently developing”); and 3) what, according to our evidence, is the most logical next step for improvement (e.g., focus on training and/or ask advice from colleagues about how to promote classroom interaction on the subject matter, such as collaborative group work or having students explain topics to each other).

The MTQ was developed to complement the International Comparative Analysis of Learning and Teaching (ICALT) observation instrument. The MTQ provides reliable information about teachers’ teaching quality [6, 7], but it is less sensitive to lesson-to-lesson fluctuations in teaching quality. Therefore, the MTQ is valuable for setting professional development goals for teachers, and it is advised to use the ICALT observation instrument to coach and train teachers.

The most recent evidence indicates that the MTQ can be used to inform ICALT observers [8]. For example, if the MTQ outcomes in class A suggest that a beginning teacher A “is skilled in classroom management and is currently developing skill in front-of-class explanation,” then observers visiting teacher A in class A can be prompted to attend only to issues with front-of-class explanations. This type of use is only warranted when both instruments are administered within the same class, however.

Both the MTQ and the ICALT instruments are freely available for use. The ICALT is published open access here [9]. The MTQ is available upon request by mailing directielo@rug.nl. Any questions can also be sent to this email address.

Again, thank you for this opportunity.

Rikkert van der Lans

University of Groningen

Department of Teacher Education

The Netherlands

Research gate profile: https://www.researchgate.net/profile/Rikkert_Van_Der_Lans2

LinkedIn profile: https://www.linkedin.com/in/rikkert-van-der-lans-986a2910/

Twitter: @RikkertvdLans

States’ Math and Reading Performance After the Implementation of School A-F Letter Grade Policies

It’s been a while! Thanks to the passage of the Every Student Succeeds Act (ESSA; see prior posts about ESSA here, here, and here), the chaos surrounding states’ teacher evaluation systems has exponentially declined. Hence, my posts have declined as well. As I have written previously, this is good news!

However, there seems to be a new form of test-based accountability on the rise. Some states are now being pressed to move forward with school letter grade policies, also known as A-F policies, that help states define and then label school quality, in order to better hold schools and school districts accountable for their students’ test scores. These reform-based policies are being pushed by what was formerly known as the Foundation for Excellence in Education, which was launched while Jeb Bush was Florida’s governor and has since been rebranded as ExcelinEd. With Jeb Bush still in ExcelinEd’s presidential seat, the organization describes itself as a “501(c)(3) nonprofit organization focused on state education reform” that operates on approximately $12 million per year in donations from the Bill & Melinda Gates Foundation, Michael Bloomberg Philanthropies, the Walton Family Foundation, and the Pearson, McGraw-Hill, Northwest Evaluation Association, ACT, College Board, and Educational Testing Service (ETS) testing corporations, among others.

I happened to be on a technical advisory committee for the state of Arizona, advising the state board of education on its A-F policies, when I came to really understand all that was at play, including the politics at play. Because of this role, I decided to examine, with two PhD students (Tray Geiger and Kevin Winn), what was just put out via an American Educational Research Association (AERA) press release. Our study, titled “States’ Performance on NAEP Mathematics and Reading Exams After the Implementation of School Letter Grades,” is currently under review for publication, but below are some of its important findings, as also highlighted by AERA. These findings are especially critical for states currently using, or considering, A-F policies to hold schools and school districts accountable for their students’ achievement, especially given that these policies clearly (as per the evidence) do not work as intended.

More specifically, 13 states currently use a school letter grade accountability system, with Florida being the first to implement a school letter grade policy in 1998. The other 12 states, with their years of implementation, are Alabama (2013), Arkansas (2012), Arizona (2010), Indiana (2011), Mississippi (2012), New Mexico (2012), North Carolina (2013), Ohio (2014), Oklahoma (2011), Texas (2015), Utah (2013), and West Virginia (2015). These 13 states have fared no better or worse than other states in terms of increasing student achievement on the National Assessment of Educational Progress (NAEP), the nation’s report card and widely considered the nation’s “best” test, post policy implementation. Put differently, we found mixed results as to whether there was a clear, causal relationship between implementation of an A-F accountability system and increased student achievement. There was no consistent positive or negative relationship between policy implementation and NAEP scores in grade 4 and grade 8 mathematics and reading.

More explicitly:

  • For NAEP grade 4 mathematics exams, five of the 13 states (38.5 percent) had net score increases after their A-F systems were implemented; seven states (53.8 percent) had net score decreases after A-F implementation; and one state (7.7 percent) demonstrated no change.
  • On grade 4 mathematics, eight of the 13 states (61.5 percent) demonstrated growth over time greater than that of the national average; three states (23.1 percent) demonstrated less growth; and two states (15.4 percent) had comparable growth.
  • For grade 8 mathematics exams, five of the 13 states (38.5 percent) had net score increases after their A-F systems were implemented, yet eight states (61.5 percent) had net score decreases after A-F implementation.
  • Grade 8 mathematics growth compared to the national average varied more than that of grade 4 mathematics. Six of the 13 states (46.2 percent) demonstrated greater growth over time compared to that of the national average; six other states (46.2 percent) demonstrated less growth; and one state (7.7 percent) had comparable growth.
  • For grade 4 reading exams, eight of the 13 states (61.5 percent) had net score increases after A-F implementation; three states (23.1 percent) demonstrated net score decreases; and two states (15.4 percent) showed no change.
  • Grade 4 reading evidenced a pattern similar to that of grade 4 mathematics in that eight of the 13 states (61.5 percent) had greater growth over time compared to the national average, while five of the 13 states (38.5 percent) had less growth.
  • For grade 8 reading, eight states (61.5 percent) had net score increases after their A-F systems were implemented; two states (15.4 percent) had net score decreases; and three states (23.1 percent) showed no change.
  • In grade 8 reading, states evidenced a pattern similar to that of grade 8 mathematics in that most states did not demonstrate greater growth than the nation’s average growth. Five of 13 states (38.5 percent) had greater growth over time compared to the national average, while six states (46.2 percent) had less growth, and two states (15.4 percent) exhibited comparable growth.

In sum, the NAEP data slightly favored A-F states on grade 4 mathematics and grade 4 reading; half of the states increased and half of the states decreased in achievement post A-F implementation on grade 8 mathematics; and a plurality of states decreased in achievement post A-F implementation on grade 8 reading. See more study details and results here.
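For readers who want to see the kind of arithmetic behind the bullets above, here is a minimal sketch using hypothetical NAEP averages (not our study’s actual data). It computes a state’s net score change after its A-F implementation year and compares that change to the national average’s change over the same period, which is essentially the comparison reported in each bullet.

```python
# Hypothetical grade 4 mathematics NAEP averages for one state and the nation.
# These numbers are made up for illustration; they are not the study's data.
import pandas as pd

naep = pd.DataFrame({
    "year":   [2009, 2011, 2013, 2015, 2017],
    "state":  [238,  240,  239,  241,  240],
    "nation": [239,  240,  241,  240,  239],
})
implementation_year = 2011  # e.g., Oklahoma implemented its A-F policy in 2011

post = naep[naep["year"] >= implementation_year]
state_change = int(post["state"].iloc[-1] - post["state"].iloc[0])
nation_change = int(post["nation"].iloc[-1] - post["nation"].iloc[0])

print(f"Net state change post-implementation: {state_change:+d}")
print(f"Net national change over the same period: {nation_change:+d}")
print("Greater growth than the national average" if state_change > nation_change
      else "Not greater growth than the national average")
```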

In reality, how these states performed post-implementation is not much different from random, or a flip of the coin. As such, these results should speak directly to other states already investing, or considering investing, human and financial resources in such state-level, test-based accountability policies.

 

LA Times Value-Added Reporters: Where Are They Now

In two of my older posts (here and here), I wrote about the Los Angeles Times and its controversial move to solicit Los Angeles Unified School District (LAUSD) students’ test scores via an open-records request, calculate LAUSD teachers’ value-added scores themselves, and then publish thousands of LAUSD teachers’ value-added scores along with their “effectiveness” classifications on their Los Angeles Teacher Ratings website. They did this repeatedly, beginning in 2010, and they did this despite the major research-based issues surrounding teachers’ value-added estimates (that hopefully followers of this blog know at least somewhat well).

This has also been a source of frustration for me, since the authors of the initial articles (Jason Song and Jason Felch) contacted me back in 2011 regarding whether what they were doing was appropriate, valid, and fair. Despite about one hour’s worth of strong warnings against doing so, Felch and Song thanked me for my time and moved forward regardless. See also others’ concerns about them doing this here, here, here, and here, for example.

Well, Jason Song now works as communications director for Eli Broad’s Great Public Schools Now, which has as its primary goal to grow charter schools and get 50% of LA students into charters (see here). Jason Felch was fired in 2014 for writing a story about unreported sexual assault violations at Occidental College and having an “inappropriate relationship” with a source for this story (see here).

So, Jason Song and Jason Felch humiliated thousands of LA teachers and possibly contributed to the suicide of one: fifth-grade teacher Rigoberto Ruelas, who jumped off a bridge after they publicly labeled him mediocre.

What goes around, comes around…

“You Are More Than Your EVAAS Score!”

Justin Parmenter is a seventh-grade language arts teacher in Charlotte, North Carolina. Via his blog, Notes from the Chalkboard, he writes “musings on public education.” You can subscribe to his blog at the bottom of any of his blog pages; I copied and pasted one of his posts below for all of you following this blog (now at 43K followers!!).

His recent post is titled “Take heart, NC teachers. You are more than your EVAAS score!” and serves as a solid reminder of what teachers’ value-added scores (in this case, teachers’ Education Value-Added Assessment System (EVAAS) scores) cannot tell you, us, or pretty much anybody about anyone’s worth as a teacher. Do give it a read, and do give him a shout-out by sharing this with others.

*****

Last night an email from the SAS corporation hit the inboxes of teachers all across North Carolina.  I found it suspicious and forwarded it to spam.

EVAAS is a tool that SAS claims shows how effective educators are by measuring precisely what value each teacher adds to their students’ learning.  Each year teachers board an emotional roller coaster as they prepare to find out whether they are great teachers, average teachers, or terrible teachers–provided they can figure out their logins.

NC taxpayers spend millions of dollars for this tool, and SAS founder and CEO James Goodnight is the richest person in North Carolina, worth nearly $10 billion. However, over the past few years, more and more research has shown that value-added ratings like EVAAS are highly unstable and are unable to account for the many factors that influence our students and their progress. Lawsuits have sprung up from Texas to Tennessee, charging, among other things, that use of this data to evaluate teachers and make staffing decisions violates teachers’ due process rights, since SAS refuses to reveal the algorithms it uses to calculate scores.

By coincidence, the same day I got the email from SAS, I also got this email from the mother of one of my 7th grade students:

Photos attached provided evidence that the student was indeed reading at the dinner table.

The student in question had never thought of himself as a reader.  That has changed this year–not because of any masterful teaching on my part, but just because he had the right book in front of him at the right time.

Here’s my point:  We need to remember that EVAAS can’t measure the most important ways teachers are adding value to our students’ lives.  Every day we are turning students into lifelong independent readers. We are counseling them through everything from skinned knees to school shootings.  We are mediating their conflicts. We are coaching them in sports. We are finding creative ways to inspire and motivate them. We are teaching them kindness and empathy.  We are doing so much more than helping them pass a standardized test at the end of the year.

So if you figure out your EVAAS login today, NC teachers, take heart.  You are so much more than your EVAAS score!

Learning from What Doesn’t Work in Teacher Evaluation

One of my doctoral students — Kevin Close — and I just had a study published in the practitioner journal Phi Delta Kappan that I wanted to share out with all of you, especially before the study is no longer open-access or free (see full study as currently available here). As the title indicates, the study is about how states, school districts, and schools can “Learn from What Doesn’t Work in Teacher Evaluation,” given an analysis that the two of us conducted of all documents pertaining to the four teacher evaluation and value-added model (VAM)-centered lawsuits in which I have been directly involved, and that I have also covered in this blog. These lawsuits include Lederman v. King in New York (see here), American Federation of Teachers et al. v. Public Education Department in New Mexico (see here), Houston Federation of Teachers v. Houston Independent School District in Texas (see here), and Trout v. Knox County Board of Education in Tennessee (see here).

Via this analysis we set out to comb through the legal documents to identify the strongest objections, as also recognized by the courts in these lawsuits, to VAMs as teacher measurement and accountability strategies. “The lessons to be learned from these cases are both important and timely” given that “[u]nder the Every Student Succeeds Act (ESSA), local education leaders once again have authority to decide for themselves how to assess teachers’ work.”

The most pertinent and also common issues as per these cases were as follows:

(1) Inconsistencies in teachers’ VAM-based estimates from one year to the next that are sometimes “wildly different.” Across these lawsuits, issues with reliability were very evident, whereby teachers classified as “effective” one year were either theorized or demonstrated to have around a 25%-59% chance of being classified as “ineffective” the next year, or vice versa, with other permutations also possible (a simple simulation of how such instability can arise appears after this list). As per our profession’s Standards for Educational and Psychological Testing, reliability should rather be observed, whereby VAM estimates of teacher effectiveness are more or less consistent over time, from one year to the next, regardless of the types of students and, perhaps, the subject areas that teachers teach.

(2) Bias in teachers’ VAM-based estimates was also of note, whereby documents suggested or evidenced that biased estimates of teachers’ actual effects do indeed exist (although this was also the area of most contention and dispute). Specific to VAMs, because teachers are not randomly assigned the students they teach, whether their students are more or less motivated, smart, knowledgeable, or capable can bias students’ test-based data, and teachers’ test-based data when aggregated. Court documents, although again not without counterarguments, suggested that VAM-based estimates are sometimes biased, especially when relatively homogeneous sets of students (e.g., English Language Learners (ELLs), gifted and special education students, free-or-reduced-lunch-eligible students) are non-randomly concentrated into schools, purposefully placed into classrooms, or both. Research suggests that this sometimes happens regardless of the sophistication of the statistical controls used to block said bias.

(3) The gaming mechanisms in play within teacher evaluation systems in which VAMs play a key role, or carry significant evaluative weight, were also of legal concern and dispute. That administrators sometimes inflate the observational ratings of teachers whom they want to protect, thereby offsetting the weight the VAMs sometimes carry, was of note; so too was the inverse, whereby administrators sometimes lower teachers’ observational ratings to better align them with their “more objective” VAM counterparts. “So argued the plaintiffs in the Houston and Tennessee lawsuits, for example. In those systems, school leaders appear to have given precedence to VAM scores, adjusting their classroom observations to match them. In both cases, administrators admitted to doing so, explaining that they sensed pressure to ensure that their ‘subjective’ classroom ratings were in sync with the VAM’s ‘objective’ scores.” Both sets of behavior distort the validity (or “truthfulness”) of any teacher evaluation system and are in violation of the same, aforementioned Standards for Educational and Psychological Testing, which call for VAM scores and observation ratings to be kept separate. One indicator should never be adjusted to offset or to fit the other.

(4) Transparency, or the lack thereof, was also a common issue across cases. Transparency, which can be defined as the extent to which something is accessible and readily capable of being understood, pertains here to whether VAM-based estimates are accessible and make sense to those on the receiving end. “Not only should [teachers] have access to [their VAM-based] information for instructional purposes, but if they believe their evaluations to be unfair, they should be able to see all of the relevant data and calculations so that they can defend themselves.” In no case was this more legally pertinent than in Houston Federation of Teachers v. Houston Independent School District in Texas. Here, the presiding judge ruled that teachers did have “legitimate claims to see how their scores were calculated. Concealing this information, the judge ruled, violated teachers’ due process protections under the 14th Amendment (which holds that no state — or in this case organization — shall deprive any person of life, liberty, or property, without due process). Given this precedent, it seems likely that teachers in other states and districts will demand transparency as well.”
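Returning to the reliability issue described in point (1) above, and as promised after that list, here is a minimal simulation sketch (my own illustration, not drawn from the court documents) of how even a moderate year-to-year correlation between VAM estimates produces large classification flips. The correlation value and the cut points used to label teachers are hypothetical.

```python
# Simulate two years of VAM estimates with an assumed year-to-year correlation,
# then count how often a teacher labeled "effective" in year 1 is labeled
# "ineffective" in year 2. All values here are hypothetical.
import numpy as np

rng = np.random.default_rng(1)
n_teachers = 100_000
r = 0.4  # assumed correlation between a teacher's year-1 and year-2 estimates

year1 = rng.standard_normal(n_teachers)
year2 = r * year1 + np.sqrt(1 - r**2) * rng.standard_normal(n_teachers)

# Hypothetical cut points: top 50% labeled "effective," bottom 20% "ineffective"
effective_y1 = year1 >= np.quantile(year1, 0.50)
ineffective_y2 = year2 <= np.quantile(year2, 0.20)

flip_rate = (effective_y1 & ineffective_y2).mean() / effective_y1.mean()
print(f"Year-1 'effective' teachers labeled 'ineffective' in year 2: {flip_rate:.0%}")
```

The exact flip rate depends entirely on the assumed correlation and cut points, but the general pattern is the point: unless year-to-year estimates are highly correlated, a nontrivial share of teachers will swing between the extreme labels purely by chance.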

In the main article (here) we also discuss what states are now doing to (hopefully) improve upon their teacher evaluation systems, in terms of using multiple measures to help evaluate teachers more holistically. We emphasize the (in)formative versus the summative and high-stakes functions of such systems, as well as the importance of allowing teachers to take ownership over the development and implementation of such systems. I will leave you all to read the full article (here) for these details.

In sum, though, when rethinking states’ teacher evaluation systems, especially given the new liberties afforded to states via the Every Student Succeeds Act (ESSA), educators, education leaders, policymakers, and the like would do well to look to the past for guidance on what not to do — and what to do better. These legal cases can certainly inform such efforts.

Reference: Close, K., & Amrein-Beardsley, A. (2018). Learning from what doesn’t work in teacher evaluation. Phi Delta Kappan, 100(1), 15-19. Retrieved from http://www.kappanonline.org/learning-from-what-doesnt-work-in-teacher-evaluation/

Can More Teachers Be Covered Using VAMs?

Some researchers continue to explore the potential worth of value-added models (VAMs) for measuring teacher effectiveness. While I do not endorse the perpetual tweaking of this or twisting of that to explore how VAMs might be made “better” for such purposes, especially given the decades of research we now have evidencing the plethora of problems with using VAMs for such purposes, I do try to write about current events, including current research published on this topic, for this blog. Hence, I write here about a study that researchers from Mathematica Policy Research released last month about whether more teachers might be VAM-eligible (download the full study here).

One of the main issues with VAMs is that they can typically be used to measure the effects of only approximately 30% of all public school teachers. The other 70%, which sometimes includes entire campuses of teachers (e.g., early elementary and high school teachers) or teachers who do not teach the core subject areas assessed using large-scale standardized tests (e.g., mathematics and reading/language arts), cannot be evaluated or held accountable using VAM data. This is more generally termed an issue with fairness, defined by our profession’s Standards for Educational and Psychological Testing as the impartiality of “test score interpretations for intended use(s) for individuals from all [emphasis added] relevant subgroups” (p. 219). Issues of fairness arise when a test, or a test-based inference or use, impacts some more than others in unfair or prejudiced, yet often consequential, ways.

Accordingly, in this study researchers explored whether students’ scores on other, consecutively administered subject area tests (i.e., mathematics and reading/language arts) can be used as proxy pretests to help isolate teachers’ contributions to students’ achievement in subject areas that are tested only occasionally and in non-consecutive grade levels (e.g., science and social studies in grades 4 and 7, or 5 and 8). Indeed, it is true that “states and districts have little information about how value-added models [VAMs] perform in grades when tests in the same subject are not available from the previous year.” Yet states (e.g., New Mexico) continue to do this without evidence that it works. This is also one point of contention in the ongoing lawsuit there. Hence, the purpose of this study was to explore (using state-level data from Oklahoma) how well doing this works, given that the use of such proxy pretests “could allow states and districts to increase the number of teachers for whom value-added models [could] be used” (i.e., increase fairness).
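To illustrate what the researchers were testing, below is a minimal simulation sketch. It is my own toy example, not Mathematica’s code or model, and every variable name and parameter value is hypothetical. It fits a simple VAM that regresses a science post-test on teacher indicators plus either a same-subject (science) pretest or a mathematics “proxy” pretest, and then checks how well each specification recovers the simulated teacher effects.

```python
# Toy comparison of a same-subject-pretest VAM versus a proxy-pretest VAM.
# All data are simulated; this is an illustration, not the study's model.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_teachers, class_size = 100, 25
teacher = np.repeat(np.arange(n_teachers), class_size)
ability = rng.normal(size=teacher.size)              # latent student ability
teacher_effect = rng.normal(0, 0.2, n_teachers)      # simulated "true" value added

science_pre = ability + rng.normal(0, 0.5, teacher.size)      # same-subject pretest
math_pre = 0.7 * ability + rng.normal(0, 0.7, teacher.size)   # proxy pretest
science_post = ability + teacher_effect[teacher] + rng.normal(0, 0.5, teacher.size)

df = pd.DataFrame({"teacher": teacher, "science_pre": science_pre,
                   "math_pre": math_pre, "science_post": science_post})

# Same-subject VAM: control for the science pretest
same = smf.ols("science_post ~ science_pre + C(teacher)", data=df).fit()
# Proxy-pretest VAM: only the math pretest is available as a control
proxy = smf.ols("science_post ~ math_pre + C(teacher)", data=df).fit()

def recovered_effects(fit):
    # Teacher 0 is the reference category, so its dummy coefficient is 0
    return np.array([0.0] + [fit.params[f"C(teacher)[T.{t}]"]
                             for t in range(1, n_teachers)])

for label, fit in [("same-subject pretest", same), ("proxy pretest", proxy)]:
    corr = np.corrcoef(recovered_effects(fit), teacher_effect)[0, 1]
    print(f"{label}: correlation with simulated true effects = {corr:.2f}")
```

In this toy setup the proxy-pretest specification will typically track the simulated effects less closely, because the math pretest is a weaker control for student ability; that is loosely analogous to, though of course far simpler than, the credibility, bias, and precision concerns the authors report.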

However, researchers found that when doing just this, (1) VAM estimates that do not account for same-subject pretests may be less credible than estimates that use same-subject pretests from prior and adjacent grade levels (note that the authors do not explicitly define what they mean by credible, but they infer the term to be synonymous with valid). In addition, (2) doing this may subsequently lead to relatively more biased VAM estimates, even more so than changing some other features of VAMs, and (3) doing this may make VAM estimates less precise, or reliable. Put more succinctly, using mathematics and reading/language arts scores as proxy pretests to help measure (e.g., science and social studies) teachers’ value-added effects yields VAM estimates that are less credible (aka less valid), more biased, and less precise (aka less reliable).

The authors caution, however, that interpreting these findings as firm evidence against using value-added estimates that rely on proxy pretests “[may be] too strong. The choice between different evaluation measures always involves trade-offs, and alternatives to value-added estimates [e.g., classroom observations and student learning objectives (SLOs)] also have important limitations.”

Their suggestion, rather, is for “[p]olicymakers [to] reduce the weight given to value-added estimates from models that rely on proxy pretests relative to the weight given to those of other teachers in subjects with pretests.” With all of this, I disagree. Using this or that statistical adjustment, or shrinkage approach, or adjusted weights, or the like is, as I said before, at this point frivolous.

Reference: Walsh, E., Dotter, D., & Liu, A. Y. (2018). Can more teachers be covered? The accuracy, credibility, and precision of value-added estimates with proxy pre-tests. Washington, DC: Mathematica Policy Research. Retrieved from https://www.mathematica-mpr.com/our-publications-and-findings/publications/can-more-teachers-be-covered-the-accuracy-credibility-and-precision-of-value-added-estimates