A new report on current teacher evaluation systems throughout the US was just released by the Network for Public Education. The report is titled, “Teachers Talk Back: Educators on the Impact of Teacher Evaluation,” and below are their findings, followed by a condensed version of their six recommendations (as taken from the Executive Summary, although you can read the full 17-page report, again, here):
FINDINGS
Teachers and principals believe that evaluations based on student test scores, especially Value Added Measures (VAMs), are neither valid nor reliable measures of their work. They believe that VAM scores punish teachers who work with the most vulnerable students. Of the respondents, 83% indicated that the use of test scores in evaluations has had a negative impact on instruction, and 88% said that more time is spent on test prep than ever before. Evaluations based on frameworks and rubrics, such as those created by Danielson and Marzano, have resulted in wasting far too much time. This is damaging the very work evaluation is supposed to improve, as valuable time is diverted to engage in related compliance exercises and paperwork. Of the respondents, 84% reported a significant increase in teacher time spent on evaluations.
The emphasis on improving test scores has overwhelmed every aspect of teachers’ work, forcing them to spend precious collaborative time poring over student data rather than having conversations about students and instruction. Sixty-six percent of respondents reported a negative impact on relationships with their students as a result of the pressure to focus on test scores.
Over half of the respondents (52.08%) reported witnessing evidence of bias against veteran educators. This supports evidence that evaluations are having a disparate impact, contributing to a decline in teachers of color, veteran teachers, and those serving students in poverty. A recent study (ASI, 2015) found that changes to evaluation practices have coincided with a precipitous drop in the number of black teachers in nine major cities.
Teacher professional development tied to the evaluation process is having a stifling effect on teachers, by undermining their sense of autonomy, and limiting their capacity for real professional growth. 85% of respondents indicated that high quality professional development is not connected to their evaluations, and 84% reported a negative effect on conversations between teachers and supervisors. Collegial relationships have also been affected, with 81% of respondents reporting negative changes in conversations with colleagues.
SIX RECOMMENDATIONS
We recommend an immediate halt to the use of test scores as any part of teacher evaluation.
We recommend that teacher collaboration not be tied to evaluation but instead be a teacher-led cooperative process that focuses on their students’ and their own professional learning.
We recommend that the observation process focus on improving instruction, resulting in reflection and dialogue between teacher and observer; the result should be a narrative, not a number.
We recommend that evaluations require less paperwork and documentation so that more time can be spent on reflection and improvement of instruction.
We recommend an immediate review of the impact that evaluations have had on teachers of color and veteran teachers.
We recommend that teachers not be “scored” on professional development activities and that professional development be dictated by teacher needs rather than by evaluation scores.
Again, to read more, please see the full article, as also cited here: The Network for Public Education. (2020). Teachers talk back: Educators on the impact of teacher evaluation. https://networkforpubliceducation.org/6468/
David Berliner and Gene Glass, both dear mentors of mine while I was a PhD student at Arizona State University (ASU) and beyond, are scholar-leaders in the area of educational policy, also with specializations in test-based policies and test uses, misuses, and abuses. Today, via Diane Ravitch’s blog, they published their “thoughts about the value of annual testing in 2021.” Pasted below and also linked to here is their “must read!”
Gene V Glass and David C. Berliner
At a recent Education Writers Association seminar, Jim Blew, an assistant to Betsy DeVos at the Department of Education, opined that the Department is inclined not to grant waivers to states seeking exemptions from the federally mandated annual standardized achievement testing. States like Michigan, Georgia, and South Carolina were seeking a one-year moratorium. Blew insisted that “even during a pandemic [tests] serve as an important tool in our education system.” He said that the Department’s “instinct” was to grant no waivers. What system he was referring to and important to whom are two questions we seek to unravel here.
Without question, the “system” of the U.S. Department of Education has a huge stake in enforcing annual achievement testing. It’s not just that the Department’s relationship with Pearson Education, the U.K. corporation that is the major contractor for state testing, with annual revenues of nearly $5 billion, is at stake. The Department’s image as a “get tough” defender of high standards is also at stake. Pandemic be damned! We can’t let those weak-kneed blue states get away with covering up the incompetence of those teacher unions.
To whom are the results of these annual tests important? Governors? District superintendents? Teachers?
How the governors feel about the test results depends entirely on where they stand on the political spectrum. Blue state governors praise the findings when they are above the national average, and they call for increased funding when they are below. Red state governors, whose states’ scores are generally below average, insist that the results are a clear call for vouchers and more charter schools – in a word, choice. District administrators and teachers live in fear that they will be blamed for bad scores; and they will be.
Fortunately, all the drama and politicking about the annual testing is utterly unnecessary. Last year’s district or even schoolhouse average almost perfectly predicts this year’s average. Give us the average Reading score for Grade Three for any medium or larger size district for the last year and we’ll give you the average for this year within a point or two. So at the very least, testing every year is a waste of time and money – money that might ultimately help cover the salary of executives like John Fallon, Pearson Education CEO, whose total compensation in 2017 was more than $4 million.
But we wouldn’t even need to bother looking up a district’s last year’s test scores to know where their achievement scores are this year. We can accurately predict those scores from data that cost nothing. It is well known and has been for many years – just Google “Karl R. White” 1982 – that a school’s average socio-economic status (SES) is an accurate predictor of its achievement test average. “Accurate” here means a correlation exceeding .80. Even though a school’s racial composition overlaps considerably with the average wealth of the families it serves, adding Race to the prediction equation will improve the prediction of test performance. Together, SES and Race tell us much about what is actually going on in the school lives of children: the years of experience of their teachers; the quality of the teaching materials and equipment; even the condition of the building they attend.
Don’t believe it? Think about this. In a recent year the free and reduced lunch rate (FRL) at the 42 largest high schools in Nebraska was correlated with the school’s average score in Reading, Math, and Science on the Nebraska State Assessments. The correlations obtained were FRL & Reading r = -.93, FRL & Science r = -.94, and FRL & Math r = -.92. Correlation coefficients cannot exceed 1.00 in absolute value.
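For readers who want to see the arithmetic behind such a claim, here is a minimal sketch, in Python and with invented numbers (not the Nebraska data), of how a school-level correlation such as FRL & Reading is computed:

```python
# Minimal sketch (not the authors' analysis): computing a school-level
# Pearson correlation between poverty rate and test scores. The numbers
# below are made up purely for illustration.
import numpy as np

# Hypothetical data: free/reduced-price lunch rate (FRL, %) and mean
# reading scale score for a handful of schools.
frl = np.array([12.0, 25.0, 40.0, 55.0, 70.0, 85.0])
reading = np.array([262.0, 255.0, 248.0, 241.0, 233.0, 224.0])

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal
# entry is Pearson's r between the two variables.
r = np.corrcoef(frl, reading)[0, 1]
print(f"FRL & Reading: r = {r:.2f}")  # strongly negative, as in the Nebraska example
```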
If you can know the schools’ test scores from their poverty rate, why give the test?
In fact, Chris Tienken answered that very question in New Jersey. With data on household income, % single parent households, and parent education level in each township, he predicted a township’s rates of scoring “proficient” on the New Jersey state assessment. In Maple Shade Township, 48.71% of the students were predicted to be proficient in Language Arts; the actual proficiency rate was 48.70%. In Mount Arlington Township, 61.4% were predicted proficient; 61.5% were actually proficient. And so it went. Demographics may not be destiny for individuals, but when you want a reliable, quick, inexpensive estimate of how a school, township, or district is doing in terms of their achievement scores on a standardized test of achievement, demographics really are destiny, until governments at many levels get serious about addressing the inequities holding back poor and minority schools!
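As a rough illustration of the kind of demographic prediction Tienken performed, here is a minimal Python sketch using ordinary least squares; the townships, predictor values, and proficiency rates below are entirely hypothetical, not his data or his model:

```python
# Minimal sketch, not Tienken's actual model: predicting a township's
# "percent proficient" from a few demographic variables via ordinary
# least squares. All numbers are invented for illustration only.
import numpy as np

# Hypothetical township-level predictors:
# median household income ($1,000s), % single-parent households,
# % of parents with a bachelor's degree or higher.
X = np.array([
    [45.0, 32.0, 18.0],
    [60.0, 25.0, 28.0],
    [75.0, 18.0, 40.0],
    [90.0, 12.0, 55.0],
    [110.0, 8.0, 65.0],
])
y = np.array([42.0, 51.0, 62.0, 71.0, 83.0])  # hypothetical % proficient

# Add an intercept column and fit by least squares.
X1 = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)

# Predict for a new (hypothetical) township.
new_township = np.array([1.0, 55.0, 28.0, 24.0])
print(f"Predicted % proficient: {new_township @ coef:.1f}")
```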
There is one more point to consider here: a school can more easily “fake” its achievement scores than it can fake its SES and racial composition. Test scores can be artificially raised by paying a test prep company, by giving just a tiny bit more time on the test, by looking the other way as students whip out their cell phones during the test, by looking at the test beforehand and sharing some “ideas” with students about how they might do better on the tests, or by examining the tests after they are given and changing an answer or two here and there. These are not hypothetical examples; they go on all the time.
However, don’t the principals and superintendents need the test data to determine which teachers are teaching well and which ones ought to be fired? That seems logical but it doesn’t work. Our colleague Audrey Amrein-Beardsley and her students have addressed this issue in detail on the blog VAMboozled. In just one study, a Houston teacher was compared to other teachers in other schools sixteen different times over four years. Her students’ test scores indicated that she was better than the other teachers 8 times and worse than the others 8 times. So, do achievement tests tell us whether we have identified a great teacher, or a bad teacher? Or do the tests merely reveal who was in that teacher’s class that particular year? Again, the makeup of the class – demographics like social class, ethnicity, and native language – is a powerful determiner of test scores.
But wait. Don’t the teachers need the state standardized test results to know how well their students are learning, what they know and what is still to be learned? Not at all. By Christmas, but certainly by springtime when most of the standardized tests are given, teachers can accurately tell you how their students will rank on those tests. Just ask them! Furthermore, they almost never get the information about their students’ achievement until the fall following the year they had those students in class, making the information value of the tests nil!
In a pilot study by our former ASU student Annapurna Ganesh, a dozen 2nd and 3rd grade teachers ranked their students in terms of their likely scores on their upcoming Arizona state tests. Correlations were uniformly high – as high in one class as +.96! In a follow-up study, with a larger sample, eight third-grade teachers predicted the ranking of their students on that year’s state of Arizona standardized tests. In this third-grade sample, the lowest rank-order coefficient between a teacher’s ranking of the students and the students’ ranking on the state Math or Reading test was +.72! Berliner took these results to the Arizona Department of Education, informing them that they could get the information they wanted about how children are doing in about 10 minutes and for no money! He was told that he was “lying,” and shown out of the office. The abuse must go on. Contracts must be honored.
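For readers unfamiliar with rank-order coefficients like the +.72 and +.96 reported above, here is a minimal Python sketch with made-up rankings (not the Ganesh data) showing how such a Spearman correlation is computed:

```python
# Minimal sketch (invented numbers, not the ASU pilot data): a rank-order
# (Spearman) correlation between a teacher's predicted ranking of students
# and the students' actual ranking on a state test.
import numpy as np

# Hypothetical ranks for 10 students (1 = highest predicted/actual rank).
teacher_rank = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
test_rank    = np.array([2, 1, 3, 5, 4, 6, 8, 7, 10, 9])

# With no tied ranks, Spearman's rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)),
# where d is the difference between the two ranks for each student.
n = len(teacher_rank)
d = teacher_rank - test_rank
rho = 1 - 6 * np.sum(d**2) / (n * (n**2 - 1))
print(f"Spearman rho = {rho:.2f}")  # high agreement, as in the numbers reported above
```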
Predicting rank can’t tell you the national percentile of this child or that, but that information is irrelevant to teachers anyway. Teachers usually know which child is struggling, which is soaring, and what both of them need. That is really the information that they need!
Thus far, as we argue against the desire of our federal Department of Education to reinstitute achievement testing in each state, we have neglected to mention a test’s most important characteristic: its validity. We mention here, briefly, just one type of validity, content validity. To have content validity, students in each state have to be exposed to/taught the curriculum for which the test is appropriate. The US Department of Education seems not to have noticed that since March 2020 public schooling has been in a bit of an upheaval! The assumption that each district, in each state, has provided equal access to the curriculum on which a state’s test is based is fraught under normal circumstances. In a pandemic it is a remarkably stupid assumption! We assert that no state achievement test will be content valid if given in the 2020-2021 school year. Furthermore, those who help in administering and analyzing such tests are likely in violation of the testing standards of the American Psychological Association, the American Educational Research Association, and the National Council on Measurement in Education. In addition to our other concerns with state standardized tests, there is no defensible use of an invalid test. Period.
We are not opposed to all testing, just to stupid testing. The National Assessment Governing Board voted 12 to 10 in favor of administering NAEP in 2021. There is some sense to doing so. NAEP tests fewer than 1 in 1,000 students in grades 4, 8, and 12. As a valid longitudinal measure, the results could tell us the extent of the devastation of the coronavirus.
We end this essay with some good news. The DeVos Department of Education position on Spring 2021 testing is likely to be utterly irrelevant. She and assistant Blew are likely to be watching the operation of the Department of Education from the sidelines after January 21, 2021. We can only hope that members of a new administration read this and understand that some of the desperately needed money for American public schools can come from the huge federal budget for standardized testing. Because in seeking the answer to the question “Why bother testing in 2021?” we have necessarily confronted the more important question: “Why ever bother to administer these mandated tests?”
We hasten to add that we are not alone in this opinion. Among measurement experts competent to opine on such things, our colleagues at the National Education Policy Center likewise question the wisdom of federally mandated testing in 2021.
At the beginning of this week, in the esteemed, online, open-access, and peer-reviewed journal of which I am the Lead Editor — Education Policy Analysis Archives — a special issue for which I also served as the Guest Editor was published. The special issue is about Policies and Practices of Promise in Teacher Evaluation and, more specifically, about how after the federal passage of the Every Student Succeeds Act (ESSA) in 2015, state leaders have (or have not) changed their teacher evaluation systems, potentially for the better. Changing for the better is defined throughout this special issue as aligning with the theoretical and empirical research currently available in the literature base surrounding contemporary teacher evaluation systems, as well as the theoretical and empirical research that is presented in the ten pieces included in this special issue.
The pieces include: one introduction, a set of two peer-reviewed theoretical commentaries, and seven empirical articles, via which authors present or discuss teacher evaluation policies and practices that may help us move (hopefully, well) beyond high-stakes teacher evaluation systems, especially as solely or primarily based on teachers’ impacts on growing their students’ standardized test scores over time (e.g., via the use of SGMs or VAMs). Below are all of the articles included.
In February of 2017, the controversial National Council on Teacher Quality (NCTQ) — created by the conservative Thomas B. Fordham Institute and funded (in part) by the Bill & Melinda Gates Foundation as “part of a coalition for ‘a better orchestrated agenda’ for accountability, choice, and using test scores to drive the evaluation of teachers” (see here) — issued a report about states’ teacher evaluation systems titled: “Running in Place: How New Teacher Evaluations Fail to Live Up to Promises.” See another blog post about a similar (and also slanted) study NCTQ conducted two years prior here. The NCTQ recently published another — a “State of the States: Teacher & Principal Evaluation Policy.” Like I did in those two prior posts, I summarize this report, only as per their teacher evaluation policy findings and assertions, below.
In 2009, only 15 states required objective measures of student growth (e.g., VAMs) in teacher evaluations; by 2015 this number had increased nearly threefold to 43 states. However, as swiftly as states moved to make these changes, many of them have made a hasty retreat. Now there are 34 states requiring such measures. According to the NCTQ, these nine states’ modifications to their evaluation systems are “poorly supported by research literature,” which, of course, is untrue. Of note, as well, is that no literature is cited to support this very statement.
For an interesting and somewhat interactive chart capturing what states are doing in the areas of their teacher and principal evaluation systems, however, you might want to look at NCTQ’s Figure 3 (again, within the full report, here). Not surprisingly, NCTQ subtotals these indicators by state and essentially categorizes states by the extent to which they have retreated from such “research-backed policies.”
You can also explore states’ laws, rules, and regulations, which range from data about teacher preparation, licensing, and evaluation to data about teacher compensation, professional development, and dismissal policies, via NCTQ’s State Teacher Policy Database here.
Do states use data from state standardized tests to evaluate their teachers? See the (promising) evidence of states backing away from such “research-backed” policies here (as per NCTQ’s Figure 5):
Also of interest is the number of states in which student surveys are being used to evaluate teachers, which is something reportedly trending across states, but perhaps not so much as currently thought (as per NCTQ’s Figure 9).
The NCTQ also backs the “research-backed benefits” of using such surveys, primarily (and again not surprisingly) in that they correlate (albeit at very weak-to-weak magnitudes) with the more objective measures (e.g., VAMs) still being pushed by the NCTQ. The NCTQ also entirely overlooks the necessary conditions required to make the data derived from student surveys, as well as their use, “reliable and valid” as over-simplistically claimed.
The rest of the report includes the NCTQ’s findings and assertions regarding states’ principal evaluation systems. If these are of interest, please scroll to the lower part of the document, again, available here.
Citation: Ross, E. & Walsh, K. (2019). State of the States 2019: Teacher and Principal Evaluation Policy. Washington, DC: National Council on Teacher Quality (NCTQ).
In a recent blog (see here), I posted about a teacher evaluation brief written by Alyson Lavigne and Thomas Good for Division 15 of the American Psychological Association (see here). There, Lavigne and Good voiced their concerns about inadequate teacher evaluation practices that did not help teachers improve instruction, and they described in detail the weaknesses of testing and observation practices used in current teacher evaluation practices.
In their book, Enhancing Teacher Education, Development, and Evaluation, they discuss other factors which diminish the value of teachers and teaching. They note that for decades, federal documents, special commissions, summits, and foundation reports have periodically and blatantly asserted (with limited or no evidence) that American schools and teachers are tragically flawed, and at times the finger has even been pointed at our students (e.g., A Nation at Risk chided students for their poor effort and performance). These reports, ranging from the Sputnik fear to the Race to the Top crisis, have pointed to an immediate and dangerous crisis. The cause of the crisis: our inadequate schools that place America at scientific, military, or economic peril.
Given the plethora of media reports that follow these pronouncements of school crises (and pending doom), citizens are taught, at least implicitly, that schools are a mess but the solutions are easy…if only teachers worked hard enough. Thus, when reforms fail, many policy makers scapegoat teachers as inadequate or uncaring. Lavigne and Good contend that these sweeping reforms (and their failures) reinforce the notion that teachers are inadequate. As the authors note, most teachers do an excellent job in supporting student growth and should be recognized for this accomplishment. In contrast, and unfortunately, teachers are scapegoated for conditions (e.g., poverty) that they cannot control.
They reiterate (and effectively emphasize) that an unexplored collateral damage (beyond the enormous cost and wasted resources of teachers and administrators) is the impact that sweeping and failed reform has upon citizens’ willingness to invest in public education. Policy makers and the media must recognize that teachers are competent and hard working and accomplish much despite the inadequate conditions in which they work.
Recently, the Educational Psychology Division of the American Psychological Association (APA) endorsed a set of recommendations, captured within a research brief for policymakers, pertaining to best practices when evaluating teachers. The brief, that can be accessed here, was authored by Alyson Lavigne, Assistant Professor at Utah State, and Tom Good, Professor Emeritus at the University of Arizona.
In general, they recommend that states’/districts’ teacher evaluation efforts emphasize improving teaching in informed and formative ways versus categorizing and stratifying teachers in terms of their effectiveness in outcome-based and summative ways. As per recent evidence (see, for example, here), post the passage of the Every Student Succeeds Act (ESSA) in 2015, it seems states and districts are already heading in this direction.
Otherwise, they note that the prior emphasis on using teachers’ students’ test scores via, for example, the use of value-added models (VAMs) to hold teachers accountable for their effects on student achievement, while simultaneously using observational systems (the two most common measures of teacher evaluation’s recent past), is “problematic and [has] not improved student achievement” as a result of states’ and districts’ past efforts in these regards. Both teacher evaluation measures “fail to recognize the complexity of teaching or how to measure it.”
More specifically in terms of VAMs: (1) VAM scores do not adequately compare teachers given the varying contexts in which teachers teach and the varying factors that influence teaching and student learning; (2) Teacher effectiveness often varies over time, making it difficult to achieve the reliability (i.e., consistency) needed to justify VAM use, especially for high-stakes decision-making purposes; (3) VAMs can only attempt to capture effects for approximately 30% of all teachers, raising serious issues with fairness and uniformity; (4) VAM scores do not help teachers improve their instruction, also in that teachers and their administrators often have no access to, have only late access to, or simply do not understand their VAM-based data well enough to use them in formative ways; and (5) Using VAMs discourages collegial exchange and sharing of ideas and resources.
More specifically in terms of observations: (1) Given classroom teaching is so complex, dynamic, and contextual, these measures are problematic in that no systems currently available capture all aspects of good teaching; (2) Observing and providing teachers with feedback warrants significant time, attention, and resources but often receives little of each; (3) Principals have still not been prepared well enough to observe or provide useful feedback to teachers; and (4) The common practice of three formal observations per year per teacher does not adequately account for the fact that teacher practice and performance varies over time, across subject areas and students, and the like. I would add here a (5) in that these observational systems have also been evidenced as biased in that, for example, teachers representing certain racial and ethnic backgrounds might be more likely than others to receive lower observational scores (see prior posts on these studies here, here, and here).
In consideration of the above, what they recommend in terms of moving teacher evaluation systems forward follows:
Eliminate high-stakes teacher evaluations based only on student achievement data and especially limited observations (all should consider if and how additional observers, beyond just principals, might be leveraged);
Provide opportunities for teachers to be heard, for example, in terms of when and how they might be evaluated and to what ends;
Improve teacher evaluation systems in fundamental ways using technology, collaboration, and other innovations to transform teaching practice;
Emphasize formative feedback within and across teacher evaluation systems in that “improving instruction should be at least as important as evaluating instruction.”
You can see more of their criticisms of the current and recommendations for the future, again, in the full report here.
In December 2015 in New Mexico, via a preliminary injunction set forth by state District Judge David K. Thomson, all consequences attached to teacher-level value-added model (VAM) scores (e.g., flagging the files of teachers with low VAM scores) were suspended throughout the state until the state (and/or others external to the state) could prove to the state court that the system was reliable, valid, fair, uniform, and the like. The trial during which this evidence was to be presented was set, and re-set, and re-set again, never to actually occur. More specifically, after December 2015 and through 2018, multiple depositions and hearings occurred. In April 2019, the case was reassigned to a new judge (via a mass reassignment state policy), again, while the injunction was still in place.
Thereafter, teacher evaluation was a hot policy issue during the state’s 2018 gubernatorial election. The now-prior state governor, Republican Susana Martinez, who essentially ordered and helped shape the state’s teacher evaluation system at issue during this lawsuit, had reached the maximum number of terms served and could not run again. All candidates running to replace her had grave concerns about the state’s teacher evaluation system. Democrat Michelle Lujan Grisham ended up winning.
Two days after Grisham was sworn in, she signed an Executive Order for the entire state system to be amended, including no longer using value-added data to evaluate teachers. Her Executive Order also stipulated that the state department was to work with teachers, administrators, parents, students, and the like, to determine more appropriate methods of measuring teacher effectiveness. While the education task force charged with this task is still in the process of finalizing the state’s new system, it is important to note that, beginning in the 2018-2019 school year, teachers are now being evaluated (primarily) via classroom observations and student/family surveys. The value-added component and a teacher attendance component (also a source of contention during this lawsuit) were removed entirely from the state’s teacher evaluation framework.
Likewise, the plaintiffs (the lawyers, teachers, and administrators with whom I worked on this case) are no longer moving forward with the 2015 lawsuit, as Grisham’s Executive Order also rendered this lawsuit moot.
The preliminary victory that we achieved in 2015 ultimately yielded a final victory. Way to go, New Mexico!
*This post was co-authored by one of my PhD students – Tray Geiger – who is finishing up his dissertation about this case.
The title of this post captures the key findings of a study that has come across my desk now over 25 times during the past two weeks; hence, I decided to summarize and share it out, as it is significant to our collective understandings about value-added models (VAMs).
The study — “Teacher Effects on Student Achievement and Height: A Cautionary Tale” — was recently published by the National Bureau of Economic Research (NBER) (Note 1) and authored by Marianne Bitler (Professor of Economics at the University of California, Davis), Sean Corcoran (Associate Professor of Public Policy and Education at Vanderbilt University), Thurston Domina (Professor at the University of North Carolina at Chapel Hill), and Emily Penner (Assistant Professor at the University of California, Irvine).
In short, study researchers used administrative data from New York City Public Schools to estimate the “value” teachers “add” to student achievement, and (also in comparison) to student height. The assumption herein, of course, is that teachers cannot plausibly or literally “grow” their students’ heights. If they were found to do so using a VAM (also oft-referred to as “growth” models, hereafter referred to more generally as VAMs), this would threaten the overall validity of the output derived via any such VAM, given VAMs’ sole purpose is to measure teacher effects on “growth” in student achievement, and only student achievement, over time. Put differently, if a VAM was found to “grow” students’ height, this would ultimately negate the validity of any such VAM given the very purposes for which VAMs have been adopted, implemented, and used, misused, and abused across states, especially over the last decade.
Notwithstanding, study researchers found that “the standard deviation of teacher effects on height is nearly as large as that for math and reading achievement” (Abstract). More specifically, they found that the “estimated teacher ‘effects’ on height [were] comparable in magnitude to actual teacher effects on math and ELA achievement, 0.22 [standard deviations] compared to 0.29 [standard deviations] and 0.26 [standard deviations], respectively” (p. 24).
Put differently, teacher effects, as measured by a commonly used VAM, were about the same in terms of the extent to which teachers “added value” to their students’ growth in achievement over time and to their students’ physical heights. Clearly, this raises serious questions about the overall validity of this (and perhaps all) VAMs in terms of not only what they are intended to do but also what they actually did (at least in this study). To yield such spurious results (i.e., results that are nonsensical and more likely due to noise than anything else) threatens the overall validity of the output derived via these models, as well as the extent to which their output can or should be trusted. This is clearly an issue with validity, or rather the validity of the inferences to be drawn from this (and perhaps/likely any other) VAM.
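To see how noisy estimates can masquerade as teacher “effects,” here is a minimal sketch in Python of the general kind of placebo comparison described above. It uses simulated data and a deliberately naive residual-averaging “VAM,” not the authors’ data or their far more sophisticated models:

```python
# Minimal sketch of the kind of placebo check the paper describes -- not the
# authors' model. We simulate classrooms where teachers truly affect test
# scores but (by construction) have zero effect on height, then estimate
# "teacher effects" on both outcomes the same way and compare their spread.
import numpy as np

rng = np.random.default_rng(0)
n_teachers, class_size = 200, 25

rows = []
for t in range(n_teachers):
    true_effect = rng.normal(0, 0.15)          # real teacher effect on scores only
    prior = rng.normal(0, 1, class_size)       # prior-year achievement (standardized)
    score = 0.7 * prior + true_effect + rng.normal(0, 0.7, class_size)
    height = rng.normal(0, 1, class_size)      # standardized height: no teacher effect
    rows.append((t, prior, score, height))

def estimated_effect_sd(outcome_index):
    """Naive 'VAM': regress the outcome on prior score, then take each
    teacher's mean residual as that teacher's estimated effect."""
    prior_all = np.concatenate([r[1] for r in rows])
    y_all = np.concatenate([r[outcome_index] for r in rows])
    X = np.column_stack([np.ones_like(prior_all), prior_all])
    beta, *_ = np.linalg.lstsq(X, y_all, rcond=None)
    resid = y_all - X @ beta
    per_teacher = resid.reshape(n_teachers, class_size).mean(axis=1)
    return per_teacher.std()

print("SD of estimated teacher effects on scores:", round(estimated_effect_sd(2), 3))
print("SD of estimated teacher effects on height:", round(estimated_effect_sd(3), 3))
# With noisy outcomes and small classes, the height "effects" have a
# non-trivial spread even though the true effect is exactly zero.
```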
Ultimately, the authors conclude that the findings from their paper should “serve as a cautionary tale” for the use of VAMs in practice. With all due respect to my colleagues, in my opinion their findings are much more serious than those that might merely warrant caution. Only one other study of which I am aware (Note 2), as akin to the study conducted here, could be as damning to the validity of VAMs and their too often “naïve application[s]” (p. 24).
Citation: Bitler, M., Corcoran, S., Domina, T., & Penner, E. (2019, November). Teacher effects on student achievement and height: A cautionary tale. National Bureau of Economic Research (NBER) Working Paper No. 26480. Retrieved from https://www.nber.org/papers/w26480.pdf
Note 1: As I have oft-commented in prior posts about papers published by the NBER, it is important to note that NBER papers such as these (i.e., “working papers”) have not been internally reviewed (e.g., by NBER Board Directors), nor have they been peer-reviewed or vetted. Rather, such “working papers” are widely circulated for discussion and comment, prior to what the academy of education would consider appropriate vetting. While listed in the front matter of this piece are highly respected scholars who helped critique and likely improve this paper, this is not the same as putting any such piece through a double-blind, peer-review process. Hence, caution is also warranted here when interpreting study results.
Note 2: Rothstein (2009, 2010) conducted a falsification test by which he tested, also counter-intuitively, whether a teacher in the future could cause, or have an impact on his/her students’ levels of achievement in the past. Rothstein demonstrated that given non-random student placement (and tracking) practices, VAM-based estimates of future teachers could be used to predict students’ past levels of achievement. More generally, Rothstein demonstrated that both typical and complex VAMs demonstrated counterfactual effects and did not mitigate bias because students are consistently and systematically grouped in ways that explicitly bias value-added estimates. Otherwise, the backwards predictions Rothstein demonstrated could not have been made.
Citations: Rothstein, J. (2009). Student sorting and bias in value-added estimation: Selection on observables and unobservables. Education Finance and Policy, 4(4), 537-571. doi:http://dx.doi.org/10.1162/edfp.2009.4.4.537
Rothstein, J. (2010). Teacher quality in educational production: Tracking, decay, and student achievement. Quarterly Journal of Economics, 125(1), 175-214. doi:10.1162/qjec.2010.125.1.175
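To make the falsification logic in Note 2 concrete, here is a minimal Python sketch with simulated data (not Rothstein’s models or data) showing how non-random, tracked placement lets next year’s teacher assignment “predict” this year’s scores:

```python
# Illustrative sketch (not Rothstein's models): when students are sorted to
# next year's teachers on the basis of current achievement, the *future*
# teacher "predicts" *past* test scores -- a logically impossible causal
# effect that signals selection bias rather than teacher quality.
import numpy as np

rng = np.random.default_rng(1)
n_students, n_future_teachers = 3000, 30

past_score = rng.normal(0, 1, n_students)

# Non-random placement: rank students by past score and assign them to
# future teachers in blocks (a crude form of tracking).
order = np.argsort(past_score)
future_teacher = np.empty(n_students, dtype=int)
future_teacher[order] = np.repeat(np.arange(n_future_teachers),
                                  n_students // n_future_teachers)

# "Effect" of each future teacher on past scores = mean past score of the
# students later assigned to that teacher.
effects = np.array([past_score[future_teacher == t].mean()
                    for t in range(n_future_teachers)])
print("SD of future-teacher 'effects' on past scores:", round(effects.std(), 2))
# Under random assignment this SD would be near zero; with tracking it is large.
```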
One of my doctoral students (Kevin Close), one of my former doctoral students (Clarin Collins), and I just had a study published in the practitioner journal Phi Delta Kappan that I wanted to share out with all of you, especially before the study is no longer open-access or free (see the full article as currently available here). As the title of this post (which is the same as the title of the article) indicates, the study is about research the three of us conducted, by surveying every state (or interviewing leaders at every state’s department of education), about how each state has changed its teacher evaluation system post the passage of the Every Student Succeeds Act (ESSA).
In short, we found states have reduced their use of growth or value-added models (VAMs) within their teacher evaluation systems. In addition, states that are still using such models are using them in much less consequential ways, while many states are offering more alternatives for measuring the relationships between student achievement and teacher effectiveness. Additionally, state teacher evaluation plans also contain more language supporting formative teacher feedback (i.e., a noteworthy change from states’ prior summative and oft-highly consequential teacher evaluation systems). State departments of education also seem to be allowing districts to develop and implement more flexible teacher evaluation systems, while simultaneously acknowledging the challenges of supporting increased local control and of comparing data across schools and districts when localized systems vary.
Again, you can read more here. See also the longer version of this study, if interested, here.
This past June, I presented at a conference at New York University (NYU) called Litigating Algorithms. Most attendees were lawyers, law students, and the like, all of whom were there to discuss the multiple ways that they have collectively and independently been challenging governmental uses of algorithm-based, decision-making systems (i.e., like VAMs) across disciplines. I was there to present about how VAMs have been used by states and school districts in education, as well as to present the key issues with VAMs as litigated via the lawsuits in which I have been engaged (e.g., Houston, New Mexico, New York, Tennessee, and Texas). The conference was sponsored by the AI Now Institute, also at NYU, whose mission is to examine the social implications of artificial intelligence (AI), in collaboration with the Center on Race, Inequality, and the Law at the NYU School of Law.
Anyhow, they just released their report from this conference and I thought it important to share out with all of you, also in that it details the extent to which similar AI systems are being used across disciplines beyond education, and it details how such uses (misuses and abuses) are being litigated in court.
See the press release below, and see the full report here.
—–
Litigating Algorithms 2019 U.S. Report – New Challenges to Government Use of Algorithmic Decision Systems
Today the AI Now Institute and NYU Law’s Center on Race, Inequality, and the Law published new research on the ways litigation is being used as a tool to hold government accountable for using algorithmic tools that produce harmful results.
Algorithmic decision systems (ADS) are often sold as offering a number of benefits, from mitigating human bias and error, to cutting costs and increasing efficiency, accuracy, and reliability. Yet proof of these advantages is rarely offered, even as evidence of harm increases. Within health care, criminal justice, education, employment, and other areas, the implementation of these technologies has resulted in numerous problems with profound effects on millions of people’s lives.
More than 19,000 Michigan residents were incorrectly disqualified from food-assistance benefits by an errant ADS. A similar system automatically and arbitrarily cut Oregonians’ disability benefits. And an ADS falsely labeled 40,000 workers in Michigan as having committed unemployment fraud. These are a handful of examples that make clear the profound human consequences of the use of ADS, and the urgent need for accountability and validation mechanisms.
In recent years, litigation has become a valuable tool for understanding the concrete and real impacts of flawed ADS and holding government accountable when it harms us.
The Report picks up where our 2018 report left off, revisiting the first wave of U.S. lawsuits brought against government use of ADS, and examining what progress, if any, has been made. We also explore a new wave of legal challenges that raise significant questions, including:
What access, if any, criminal defense attorneys should have to law enforcement ADS in order to challenge allegations leveled by the prosecution;
The profound human consequences of erroneous or vindictive uses of governmental ADS; and
The evolution of the Illinois Biometric Information Privacy Act, America’s most powerful biometric privacy law, and what its potential impact on ADS accountability might be.
This report offers concrete insights from actual cases involving plaintiffs and lawyers seeking justice in the face of harmful ADS. These cases illuminate many ways that ADS are perpetuating concrete harms, and the ways ADS companies are pushing against accountability and transparency.
The report also outlines several recommendations for advocates and other stakeholders interested in using litigation as a tool to hold government accountable for its use of ADS.
Citation: Richardson, R., Schultz, J. M., & Southerland, V. M. (2019). Litigating algorithms 2019 US report: New challenges to government use of algorithmic decision systems. New York, NY: AI Now Institute. Retrieved from https://ainowinstitute.org/litigatingalgorithms-2019-us.html
The views expressed herein and throughout all pages associated with vamboozled.com are solely those of the authors and may not reflect those of Arizona State University (ASU) or Mary Lou Fulton Teachers College (MLFTC). While the authors and others associated with vamboozled.com are affiliated with ASU and MLFTC, all opinions, views, original entries, errors, and the like should be attributable to the authors and content developers of this blog, not whatsoever to ASU or MLFTC.