Last Saturday’s Saturday Night Live VAM-Related Skit

For those of you who may have missed it last Saturday, Melissa McCarthy portrayed Sean Spicer — President Trump’s new White House Press Secretary and Communications Director — in one of the funniest of a very funny set of skits recently released on Saturday Night Live. You can watch the full video, compliments of YouTube, here:

In one of the sections of the skit, though, “Spicer” introduces “Betsy DeVos” — portrayed by Kate McKinnon and also just today confirmed as President Trump’s Secretary of Education — to answer some very simple questions about today’s public schools, which she, well, very simply could not answer. See this section of the clip starting at about 6:00 (of the above 8:00-minute skit).

In short, “the man” reporter asks “DeVos” how she values “growth versus proficiency in [sic] measuring progress in students.” Literally at a loss for words, “DeVos” responds that she really doesn’t “know anything about school.” She rambles on until “Spicer” pushes her off the stage 40-or-so seconds later.

Humor aside, this was the one question Saturday Night Live writers wrote into this skit, and it reminds us that what we know more generally as the purpose of VAMs is still alive and well in our educational rhetoric as well as our popular culture. As background, this question apparently came from a similar question Minnesota Sen. Al Franken had asked during DeVos’s confirmation hearing.

Notwithstanding, Steve Snyder — the editorial director of The 74, an (allegedly) non-partisan, honest, and fact-based news site backed by Editor-in-Chief Campbell Brown (see prior posts about this news site here and here) — took the opportunity to write a “featured” piece about this section of the script (see here). The purpose of the piece was, as the title illustrates, to help us “understand” the skit, as well as its important meaning for all of “us.”

Snyder notes that Saturday Night Live writers, with their humor, might have consequently (and perhaps mistakenly) “made their viewers just a little more knowledgeable about how their child’s school works,” or rather should work, as “[g]rowth vs. proficiency is a key concept in the world of education research.” Thereafter, Snyder falsely asserts that more than two-thirds of educational researchers agree that VAMs are a good way to measure school quality. If you visit the actual statistic cited in this piece, however — as “non-partisan, honest, and fact-based” as it is supposed to be — you would find (here) that this “two-thirds” consists of the 57% of responding American Education Finance Association (AEFA) members, and AEFA members alone, who are certainly not representative of “educational researchers” as claimed.

Regardless, Snyder asks: “Why are researchers…so in favor of [these] growth measures?” Because, Snyder, this disciplinary subset does not represent educational researchers writ large, but only a subset.

As it is with politics today, many educational researchers who align themselves with the disciplines of education finance or education econometrics are substantively more in favor of VAMs than those who align with the more general disciplines of educational research and educational measurement, methods, and statistics. While this is somewhat of a sweeping generalization, which I acknowledge is unwise even as I make it, there is certainly more to be said about the validity of the inferences being drawn here, and (too) often driven via the “media” like The 74.

The bottom line is to question and critically consume everything — including the work of anyone who feels qualified to write about particular things without enough expertise, in this case in good and professional journalism, in this area of educational research, and in what it means to make valid inferences and then responsibly share them with the public.

The Late Stephen Jay Gould on IQ Testing (with Implications for Testing Today)

One of my doctoral students sent me a YouTube video I feel compelled to share with you all. It is an interview with one of my all-time favorite and most admired academics — Stephen Jay Gould. Gould, who passed away at age 60 from cancer, was a paleontologist, evolutionary biologist, and scientist who spent most of his academic career at Harvard. He was “one of the most influential and widely read writers of popular science of his generation,” and he was also the author of one of my favorite books of all time: The Mismeasure of Man (1981).

In The Mismeasure of Man Gould examined the history of psychometrics and the history of intelligence testing (e.g., the methods of nineteenth-century craniometry, or the physical measurement of peoples’ skulls to “objectively” capture their intelligence). Gould examined psychological testing and the uses of all sorts of tests and measurements to inform decisions (which is still, as we know, uber-relevant today) as well as to “inform” biological determinism (i.e., the view that “social and economic differences between human groups—primarily races, classes, and sexes—arise from inherited, inborn distinctions and that society, in this sense, is an accurate reflection of biology”). Gould also examined in this book the general use of mathematics and “objective” numbers writ large to measure pretty much anything, as well as to measure and evidence predetermined sets of conclusions. This book is, as I mentioned, one of the best. I highly recommend it to all.

In this seven-minute video, you can get a sense of what this book is all about, as it remains so relevant to what we continue to believe or not believe about tests and what they really are or are not worth. Thanks, again, to my doctoral student for finding this; it is a treasure not to be buried, especially given Gould’s passing in 2002.

Why Standardized Tests Should Not Be Used to Evaluate Teachers (and Teacher Education Programs)

David C. Berliner, Regents’ Professor Emeritus here at Arizona State University (ASU), who also just happens to be my former albeit forever mentor, recently took up research on the use of test scores to evaluate teachers, for example, using value-added models (VAMs). While David is world-renowned for his research in educational psychology — and, more specifically relevant in this case, for his expertise on effective teaching behaviors and how to capture and observe them — he has also now ventured into the VAM-related debates.

Accordingly, he recently presented his newest, soon-to-be-published research on using standardized tests to evaluate teachers, something he aptly termed in the title of his presentation “A Policy Fiasco.” He delivered the speech to an audience in Melbourne, Australia, and you can click here for the full videotaped presentation. The whole presentation takes about one hour to watch — well worth it, I must say — but below I highlight his key points. These should certainly be of interest to you all as followers of this blog, and hopefully to others.

Of main interest are his 14 reasons, “big and small,” for his “judgment that assessing teacher competence using standardized achievement tests is nearly worthless.”

Here are his fourteen reasons:

  1. “When using standardized achievement tests as the basis for inferences about the quality of teachers, and the institutions from which they came, it is easy to confuse the effects of sociological variables on standardized test scores” and the effects teachers have on those same scores. Sociological variables (e.g., chronic absenteeism) continue to distort others’ even best attempts to disentangle them from the very instructional variables of interest. These, what we also term biasing variables, are important not to inappropriately dismiss as purportedly statistically “controlled for.”
  2. In law, we do not hold people accountable for the actions of others, for example, when a child kills another child and the parents are not charged as guilty. Hence, “[t]he logic of holding [teachers and] schools of education responsible for student achievement does not fit into our system of law or into the moral code subscribed to by most western nations.” Relatedly, should medical schools or doctors, for that matter, be held accountable for the health of their patients? One of the best parts of his talk, in fact, is about the medical field and the corollaries Berliner draws between doctors and medical schools, and teachers and colleges of education, respectively (around the 19–25 minute mark of his video presentation).
  3. Professionals are often held harmless for their lower success rates with clients who have observable difficulties in meeting the demands and the expectations of the professionals who attend to them. In medicine again, for example, when working with impoverished patients, “[t]here is precedent for holding [doctors] harmless for their lowest success rates with clients who have observable difficulties in meeting the demands and expectations of the [doctors] who attend to them, but the dispensation we offer to physicians is not offered to teachers.”
  4. There are other quite acceptable sources of data, besides tests, for judging the efficacy of teachers and teacher education programs. “People accept the fact that treatment and medicine may not result in the cure of a disease. Practicing good medicine is the goal, whether or not the patient gets better or lives. It is equally true that competent teaching can occur independent of student learning or of the achievement test scores that serve as proxies for said learning. A teacher can literally ‘save lives’ and not move the metrics used to measure teacher effectiveness.”
  5. Reliance on standardized achievement test scores as the source of data about teacher quality will inevitably promote confusion between “successful” instruction and “good” instruction. “Successful” instruction gets test scores up. “Good” instruction leaves lasting impressions, fosters further interest by the students, makes them feel competent in the area, etc. Good instruction is hard to measure, but remains the goal of our finest teachers.
  6. Related, teachers affect individual students greatly, but affect standardized achievement test scores very little. All can think of how their own teachers impacted their lives in ways that cannot be captured on a standardized achievement test. Standardized achievement test scores are much more related to home, neighborhood, and cohort than they are to teachers’ instructional capabilities. In more contemporary terms, this is also due to the fact that large-scale standardized tests have (still) never been validated to measure student growth over time, nor have they been validated to attribute that growth to teachers. “Teachers have huge effects, it’s just that the tests are not sensitive to them.”
  7. Teachers’ effects on standardized achievement test scores fade quickly, becoming barely discernible after a few years. So we might not want to overly worry about most teachers’ effects on their students—good or bad—as they are hard to detect on tests after two or so years. To use these ephemeral effects to then hold teacher education programs accountable seems even more problematic.
  8. Observational measures of teacher competency and achievement tests of teacher competency do not correlate well. This suggests nothing more than that one or both of these measures, and likely the latter, are malfunctioning in their capacities to measure the teacher effectiveness construct. See other Vamboozled posts about this here, here, and here.
  9. Different standardized achievement tests, both purporting to measure reading, mathematics, or science at the same grade level, will give different estimates of teacher competency. That is because different test developers have different visions of what it means to be competent in each of these subject areas. Thus one achievement test in these subject areas could find a teacher exemplary, but another test of those same subject areas would find the teacher lacking. What then? Have we an unstable teacher or an ill-defined subject area?
  10. Tests can be administered early or late in the fall, early or late in the spring, and the dates they are given influence the judgments about whether a teacher is performing well or poorly. Teacher competency should not be determined by minor differences in the date of testing, but that happens frequently.
  11. No standardized achievement tests have provided proof that their items are instructionally sensitive. If test items do not, because they cannot “react to good instruction,” how can one make a claim that the test items are “tapping good instruction?”
  12. Teacher effects show up more dramatically on teacher-made tests than on standardized achievement tests because the former are based on the enacted curriculum, while the latter are based on the desired curriculum. The closer the test is to the classroom (i.e., teacher-made tests), the more instructionally sensitive it is — by roughly a factor of seven.
  13. The opt-out testing movement invalidates inferences about teachers and schools that can be made from standardized achievement test results. It’s not bad to remove these kids from taking these tests, and perhaps it is even necessary in our over-tested schools, but the tests, and the VAM estimates derived via these tests, are far less valid when that happens. This is because the students who opt out are likely different in significant ways from those who do take the tests. This severely limits the validity claims that can be made.
  14. Assessing new teachers with standardized achievement tests is likely to yield many false negatives. That is, the assessments would identify teachers early in their careers as ineffective in improving test scores, which is, in fact, often the case for new teachers. Two or three years later that could change. Perhaps the last thing we want to do in a time of teacher shortage is discourage new teachers while they acquire their skills.

Stanford Professor Haertel: Short Video about VAM Reliability and Bias

I just came across this 3-minute video that you all might/should find of interest (click here for direct link to this video on YouTube; click here to view the video’s original posting on Stanford’s Center for Opportunity Policy in Education (SCOPE)).

Featured is Stanford’s Professor Emeritus – Dr. Edward Haertel – describing what he sees as two major flaws in the use of VAMs for teacher evaluation and accountability. These are two flaws serious enough, he argues, to prevent others from using VAM scores to make high-stakes decisions about really any of America’s public school teachers. “Like all measurements, these scores are imperfect. They are appropriate and useful for some purposes, but not for others. Viewed from a measurement perspective, value-added scores have limitations that make them unsuitable for high-stakes personnel decisions.”

The first problem is the unreliability of VAM scores, which is attributable to noise in the data. The effect of a teacher is important but weak once all of the other contributing factors are taken into account, and separating the effect of a teacher from all of those other effects is very difficult. This is not a flaw that can be fixed by more sophisticated statistical models; it is innate to the data collected.

The second problem is that the models must account for bias — that is, the systematic difference in circumstances faced by a teacher in a strong school versus a teacher in a high-needs school. The instructional history of a student includes out-of-school support, peer support, and the academic learning climate of the school, and VAMs do not take these important factors into account.
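The bias problem can be sketched just as simply (again a hypothetical illustration with made-up magnitudes and names of my own): give two teachers identical true effects but different school contexts, and a naive mean-gain estimate misattributes the context difference to the teachers themselves.

```python
import random
import statistics

random.seed(0)

def class_gains(teacher_effect, context_boost, n_students=30, noise_sd=0.5):
    # Observed gain = teacher effect + school/home context + student noise.
    return [teacher_effect + context_boost + random.gauss(0, noise_sd)
            for _ in range(n_students)]

# Two teachers with IDENTICAL true effects (0.2), in different contexts.
gains_a = class_gains(teacher_effect=0.2, context_boost=0.3)   # well-supported school
gains_b = class_gains(teacher_effect=0.2, context_boost=-0.3)  # high-needs school

# A naive value-added estimate is just the classroom's mean gain, so the
# contextual difference shows up as an apparent gap in teacher quality.
est_a = statistics.mean(gains_a)
est_b = statistics.mean(gains_b)
print(f"teacher A estimate: {est_a:.2f}  teacher B estimate: {est_b:.2f}")
```

Unless the model can observe and adjust for `context_boost` — and Haertel’s point is that out-of-school supports, peers, and school climate are largely unobserved — this gap cannot simply be statistically “controlled” away.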

EVAAS, Value-Added, and Teacher Branding

I do not think I ever shared this video out before but, following up on another post about the potential impact these videos can really have, I thought now is an appropriate time to share it. “We can be the change,” and social media can help.

My former doctoral student and I put together this video after conducting a study with teachers in the Houston Independent School District — more specifically, four teachers whose contracts were not renewed in the summer of 2011, due in large part to their EVAAS scores. This video (which is really a cartoon, although it certainly lacks humor) is about them, but also about what is happening in general in their schools post the adoption and implementation (at approximately $500,000/year) of the SAS EVAAS value-added system.

To read the full study from which this video was created, click here. Below is the abstract.

The SAS Educational Value-Added Assessment System (SAS® EVAAS®) is the most widely used value-added system in the country. It is also self-proclaimed as “the most robust and reliable” system available, with its greatest benefit to help educators improve their teaching practices. This study critically examined the effects of SAS® EVAAS® as experienced by teachers, in one of the largest, high-needs urban school districts in the nation – the Houston Independent School District (HISD). Using a multiple methods approach, this study critically analyzed retrospective quantitative and qualitative data to better comprehend and understand the evidence collected from four teachers whose contracts were not renewed in the summer of 2011, in part given their low SAS® EVAAS® scores. This study also suggests some intended and unintended effects that seem to be occurring as a result of SAS® EVAAS® implementation in HISD. In addition to issues with reliability, bias, teacher attribution, and validity, high-stakes use of SAS® EVAAS® in this district seems to be exacerbating unintended effects.

The Flaws of Using Value-Added Models for Teacher Assessment: Video By Edward Haertel

Within this blog, my team and I have tried to make available various resources for our followers (who, by the way, now number over 13,000; see, for example, here).

These resources include, but are not limited to, our lists of research articles (see the “Top 15” articles here, the “Top 25” articles here, all articles published in AERA journals here, and all suggested research articles, books, etc. here), our list of VAM scholars (whom I/we most respect, even if their research-based opinions differ), VAMboozlers (who represent the opposite, with my/our level of respect questionable), and internet videos, all housed and archived here. This includes, for example, what I still believe is the best video yet on all of these issues combined — HBO’s Last Week Tonight with John Oliver on Standardized Testing (which also includes a decent section on value-added models).

But one video we have included in our collection we have not explicitly made public. It was, however, just posted on another website, and this reminded us that it indeed deserves special attention, and a special post.

The video featured here, as well as on this blog here, shows Dr. Edward Haertel — National Academy of Education member and Professor Emeritus at Stanford University — talking about “The Flaws of Using Value-Added Models for Teacher Assessment.” The video is just over three minutes; do give it a watch here.

Teacher Won’t be Bullied by Alhambra (AZ) School Officials

Lisa Elliott, a National Board Certified Teacher (NBCT) and 18-year veteran teacher who has devoted her 18-year professional career to the Alhambra Elementary School District — a Title I school district (i.e., having at least 40% of the student population from low-income families) located in the Phoenix/Glendale area — expresses in this video how she refuses to be bullied by her district’s misuse of standardized test scores.

Approximately nine months ago she was asked to resign her teaching position by the district’s interim superintendent – Dr. Michael Rivera – due to her students’ low test scores for the 2013-2014 school year, despite her students exceeding expectations on other indicators of learning and achievement. She “respectfully declined” to submit her resignation letter for a number of reasons, including that her “children are more than a test score.” Unfortunately, however, other excellent teachers in her district have simply left…

Yong Zhao’s Stand-Up Speech

Yong Zhao — Professor in the Department of Educational Methodology, Policy, and Leadership at the University of Oregon — was a featured speaker at the recent annual conference of the Network for Public Education (NPE). He spoke about “America’s Suicidal Quest for Outcomes,” as in, test-based outcomes.

I strongly recommend you take almost an hour (i.e., 55 minutes) out of your busy days and sit back and watch what is the closest thing to a stand-up speech I’ve ever seen. Zhao offers a poignant but also very entertaining and funny take on America’s public schools, surrounded by America’s public school politics and situated in America’s pop culture. The full transcription of Zhao’s speech is also available here, as made available by Mercedes Schneider, for any and all who wish to read it: Yong_Zhao NPE Transcript

Zhao speaks of democracy, and embraces his freedom of speech in America (v. China) that permits him to speak out. He explains why he pulled his son out of public school thanks to No Child Left Behind (NCLB), yet he jokingly blames G. W. Bush for his son (since college graduation) living in his basement. Hence, his son’s “readiness” to leave the basement is, for Zhao, much more important than any other performance “readiness” measure being written into the plethora of educational policies surrounding “readiness” (e.g., career and college readiness, pre-school readiness).

Zhao uses what happened to Easter Island’s Rapa Nui civilization, and what led to its extinction, as an analogy for what may happen to us post Race to the Top, given both sets of people are/were driven by false hopes that the gods would rain prosperity down on them should they successfully compete for success and praise. As the Rapa Nui built monumental statues in their race to “the top” (literally), the unintended consequences that came about as a result (e.g., the exploitation of their natural resources) destroyed their civilization. Zhao argues the same thing is happening in our country, with test scores being the most sought-after monuments — again, despite the consequences.

Zhao calls for mandatory lists of the side effects that come along with standardized testing, similar to something I wrote years ago in an article titled “Buyer, Be Aware: The Value-Added Assessment Model is One Over-the-Counter Product that May Be Detrimental to Your Health.” In this article I pushed for a Food and Drug Administration (FDA)-style approach to educational research that would serve as a model to protect the intellectual health of the U.S. — a simple approach that legislators and education leaders would have to follow when passing legislation or educational policies whose benefits and risks are known, or unknown.

Relatedly, he calls on all educators (and educational policymakers) to continuously ask themselves one question whenever test scores rise: “What did you give up to achieve this rise in scores?” When you choose something, what do you lose?

Do give it a watch!

Help Florida Teacher Luke Flint “Tell His Story” about His VAM Scores

This is a great (although unfortunate) YouTube video capturing Indian River County, Florida teacher Luke Flint’s “Story” about the VAM scores he just received from the state as based on the state’s value-added formula.

This is a must watch, and a must share, as his “Story” has potential to “add value” in the best of ways, that is, in terms of further informing debates about how these VAMs actually “work” in practice.