“Virginia SGP” Overruled

You might recall from a post I released approximately 1.5 years ago a story about how a person who self-identifies as “Virginia SGP,” who is also now known as Brian Davison — a parent of two public school students in the affluent Loudoun, Virginia area (hereafter referred to as Virginia SGP), sued the state of Virginia in an attempt to force the release of teachers’ student growth percentile (SGP) data for all teachers across the state.

More specifically, Virginia SGP “pressed for the data’s release because he thinks parents have a right to know how their children’s teachers are performing, information about public employees that exists but has so far been hidden. He also want[ed] to expose what he sa[id was] Virginia’s broken promise to begin [to use] the data to evaluate how effective the state’s teachers are.” The “teacher data should be out there,” especially if taxpayers are paying for it.

In January of 2016, a Richmond, Virginia judge ruled in Virginia SGP’s favor. The following April, a Richmond Circuit Court judge ruled that the Virginia Department of Education was to also release Loudoun County Public Schools’ SGP scores by school and by teacher, including teachers’ identifying information. Accordingly, the judge noted that the department of education and the Loudoun school system failed to “meet the burden of proof to establish an exemption’ under Virginia’s Freedom of Information Act [FOIA]” preventing the release of teachers’ identifiable information (i.e., beyond teachers’ SGP data). The court also ordered VDOE to pay Davison $35,000 to cover his attorney fees and other costs.

As per an article published last week, the Virginia Supreme Court overruled this former ruling, noting that the department of education did not have to provide teachers’ identifiable information along with teachers’ SGP data, after all.

See more details in the actual article here, but ultimately the Virginia Supreme Court concluded that the Richmond Circuit Court “erred in ordering the production of these documents containing teachers’ identifiable information.” The court added that “it was [an] error for the circuit court to order that the School Board share in [Virginia SGP’s] attorney’s fees and costs,” pushing that decision (i.e., the decision regarding how much to pay, if anything at all, in legal fees) back down to the circuit court.

Virginia SGP plans to ask for a rehearing of this ruling. See also his comments on this ruling here.

Last Saturday Night Live’s VAM-Related Skit

For those of you who may have missed it last Saturday, Melissa McCarthy portrayed Sean Spicer — President Trump’s new White House Press Secretary and Communications Director — in one of the funniest of a very funny set of skits recently released on Saturday Night Live. You can watch the full video, compliments of YouTube, here:

In one of the sections of the skit, though, “Spicer” introduces “Betsy DeVos” — portrayed by Kate McKinnon and also just today confirmed as President Trump’s Secretary of Education — to answer some very simple questions about today’s public schools which she, well, very simply could not answer. See this section of the clip starting at about 6:00 (of the above 8:00 minute total skit).

In short, “the man” reporter asks “DeVos” how she values “growth versus proficiency in [sic] measuring progress in students.” Literally at a loss of words, “DeVos” responds that she really doesn’t “know anything about school.” She rambles on, until “Spicer” pushes her off of the stage 40-or-so seconds later.

Humor set aside, this was the one question Saturday Night Live writers wrote into this skit, which reminds us that what we know more generally as the purpose of VAMs is still alive and well in our educational rhetoric as well as popular culture. As background, this question apparently came from Minnesota Sen. Al Franken’s prior, albeit similar question during DeVos’s confirmation hearing.

Notwithstanding, Steve Snyder – the editorial director of The 74 — an (allegedly) non-partisan, honest, and fact-based backed by Editor-in-Chief Campbell Brown (see prior posts about this news site here and here) — took the opportunity to write a “featured” piece about this section of the script (see here). The purpose of the piece was, as the title illustrates, to help us “understand” the skit, as well as it’s important meaning for all of “us.”

Snyder notes that Saturday Night Live writers, with their humor, might have consequently (and perhaps mistakenly) “made their viewers just a little more knowledgeable about how their child’s school works,” or rather should work, as “[g]rowth vs. proficiency is a key concept in the world of education research.” Thereafter, Snyder falsely asserts that more than 2/3rds of educational researchers agree that VAMs are a good way to measure school quality. If you visit the actual statistic cited in this piece, however, as “non-partison, honest, and fact-based” that it is supposed to be, you would find (here) that this 2/3rds consists of 57% of responding American Education Finance Association (AEFA) members, and AEFA members alone, who are certainly not representative of “educational researchers” as claimed.

Regardless, Snyder asks: “Why are researchers…so in favor of [these] growth measures?” Because this disciplinary subset does not represent educational researchers writ large, but only a subset, Snyder.

As it is with politics today, many educational researchers who define themselves as aligned with the disciplines of educational finance or educational econometricians are substantively more in favor of VAMs than those who align more with the more general disciplines of educational research and educational measurement, methods, and statistics, in general. While this is somewhat of a sweeping generalization, which is not wise as I also argue and also acknowledge in this piece, there is certainly more to be said here about the validity of the inferences drawn here, and (too) often driven via the “media” like The 74.

The bottom line is to question and critically consume everything, and everyone who feels qualified to write about particular things without enough expertise in most everything, including in this case good and professional journalism, this area of educational research, and what it means to make valid inferences and then responsibly share them out with the public.

Ohio Rejects Subpar VAM, for Another VAM Arguably Less Subpar?

From a prior post coming from Ohio (see here), you may recall that Ohio state legislators recently introduced a bill to review its state’s value-added model (VAM), especially as it pertains to the state’s use of their VAM (i.e., the Education Value-Added Assessment System (EVAAS); see more information about the use of this model in Ohio here).

As per an article published last week in The Columbus Dispatch, the Ohio Department of Education (ODE) apparently rejected a proposal made by the state’s pro-charter school Ohio Coalition for Quality Education and the state’s largest online charter school, all of whom wanted to add (or replace) this state’s VAM with another, unnamed “Similar Students” measure (which could be the Student Growth Percentiles model discussed prior on this blog, for example, here, here, and here) used in California.

The ODE charged that this measure “would lower expectations for students with different backgrounds, such as those in poverty,” which is not often a common criticism of this model (if I have the model correct), nor is it a common criticism of the model they already have in place. In fact, and again if I have the model correct, these are really the only two models that do not statistically control for potentially biasing factors (e.g., student demographic and other background factors) when calculating teachers’ value-added; hence, their arguments about this model may be in actuality no different than that which they are already doing. Hence, statements like that made by Chris Woolard, senior executive director of the ODE, are false: “At the end of the day, our system right now has high expectations for all students. This (California model) violates that basic principle that we want all students to be able to succeed.”

The models, again if I am correct, are very much the same. While indeed the California measurement might in fact consider “student demographics such as poverty, mobility, disability and limited-English learners,” this model (if I am correct on the model) does not statistically factor these variables out. If anything, the state’s EVAAS system does, even though EVAAS modelers claim they do not do this, by statistically controlling for students’ prior performance, which (unfortunately) has these demographics already built into them. In essence, they are already doing the same thing they now protest.

Indeed, as per a statement made by Ron Adler, president of the Ohio Coalition for Quality Education, not only is it “disappointing that ODE spends so much time denying that poverty and mobility of students impedes their ability to generate academic performance…they [continue to] remain absolutely silent about the state’s broken report card and continually defend their value-added model that offers no transparency and creates wild swings for schools across Ohio” (i.e., the EVAAS system, although in all fairness all VAMs and the SGP yield the “wild swings’ noted). See, for example, here.

What might be worse, though, is that the ODE apparently found that, depending on the variables used in the California model, it produced different results. Guess what! All VAMs, depending on the variables used, produce different results. In fact, using the same data and different VAMs for the same teachers at the same time also produce (in some cases grossly) different results. The bottom line here is if any thinks that any VAM is yielding estimates from which valid or “true” statements can be made are fooling themselves.

New Mexico Lawsuit Update

The ongoing lawsuit in New Mexico has, once again (see here and here), been postponed to October of 2017 due to what are largely (and pretty much only) state (i.e., Public Education Department (PED)) delays. Whether the delays are deliberate are uncertain but being involved in this case… The (still) good news is that the preliminary injunction granted to teachers last fall (see here) still holds so that teachers cannot (or are not to) face consequences as based on the state’s teacher evaluation system.

For more information, this is the email the American Federation of Teachers – New Mexico (AFT NM) and the Albuquerque Teachers Federation (ATF) sent out to all of their members yesterday:

Yesterday, both AFT NM/ATF and PED returned to court to address the ongoing legal battle against the PED evaluation system. Our lawyer proposed that we set a court date ASAP. The PED requested a date for next fall citing their busy schedule as the reason. As a result, the court date is now late October 2017.

While we are relieved to have a final court date set, we are dismayed at the amount of time that our teachers have to wait for the final ruling.

In a statement to the press, ATF President Ellen Bernstein reflected on the current state of our teachers in regards to the evaluation system, “Even though they know they can’t be harmed in their jobs right now, it bothers them in the core of their being, and nothing I can say can take that away…It’s a cloud over everybody.”

AFT NM President Stephanie Ly, said, “It is a shame our educators still don’t have a legitimate evaluation system. The PED’s previous abusive evaluation system was thankfully halted through an injunction by the New Mexico courts, and the PED has yet to create an evaluation system that uniformly and fairly evaluates educators, and have shown no signs to remedy this situation. The PED’s actions are beyond the pale, and it is simply unjust.”

While we await trial, we want to thank our members who sent in their evaluation data to help our case. Remind your colleagues that they may still advance in licensure by completing a dossier; the PED’s arbitrary rating system cannot negatively affect a teacher’s ability to advance thanks to the injunction won by AFT NM/ATF last fall. That injunction will stay in place until a ruling is issued on our case next October.

In Solidarity,

Stephanie Ly

New Mexico Is “At It Again”

“A Concerned New Mexico Parent” sent me yet another blog entry for you all to stay apprised of the ongoing “situation” in New Mexico and the continuous escapades of the New Mexico Public Education Department (NMPED). See “A Concerned New Mexico Parent’s” prior posts here, here, and here, but in this one (s)he writes what follows:

Well, the NMPED is at it again.

They just released the teacher evaluation results for the 2015-2016 school year. And, the report and media press releases are a something.

Readers of this blog are familiar with my earlier documentation of the myriad varieties of scoring formulas used by New Mexico to evaluate its teachers. If I recall, I found something like 200 variations in scoring formulas [see his/her prior post on this here with an actual variation count at n=217].

However, a recent article published in the Albuquerque Journal indicates that, now according to the NMPED, “only three types of test scores are [being] used in the calculation: Partnership for Assessment of Readiness for College and Careers [PARCC], end-of-course exams, and the [state’s new] Istation literacy test.” [Recall from another article released last January that New Mexico’s Secretary of Education Hanna Skandera is also the head of the governing board for the PARCC test].

Further, the Albuquerque Journal article author reports that the “PED also altered the way it classifies teachers, dropping from 107 options to three. Previously, the system incorporated many combinations of criteria such as a teacher’s years in the classroom and the type of standardized test they administer.”

The new state-wide evaluation plan is also available in more detail here. Although I should also add that there has been no published notification of the radical changes in this plan. It was just simply and quietly posted on NMPED’s public website.

Important to note, though, is that for Group B teachers (all levels), the many variations documented previously have all been replaced by end-of-course (EOC) exams. Also note that for Group A teachers (all levels) the percentage assigned to the PARCC test has been reduced from 50% to 35%. (Oh, how the mighty have fallen …). The remaining 15% of the Group A score is to be composed of EOC exam scores.

There are only two small problems with this NMPED simplification.

First, in many districts, no EOC exams were given to Group B teachers in the 2015-2016 school year, and none were given in the previous year either. Any EOC scores that might exist were from a solitary administration of EOC exams three years previously.

Second, for Group A teachers whose scores formerly relied solely on the PARCC test for 50% of their score, no EOC exams were ever given.

Thus, NMPED has replaced their policy of evaluating teachers on the basis of students they don’t teach to this new policy of evaluating teachers on the basis of tests they never administered!

Well done, NMPED (not…)

Luckily, NMPED still cannot make any consequential decisions based on these data, again, until NMPED proves to the court that the consequential decisions that they would still very much like to make (e.g., employment, advancement and licensure decisions) are backed by research evidence. I know, interesting concept…

Using VAMs “In Not Very Intelligent Ways:” A Q&A with Jesse Rothstein

The American Prospect — a self-described “liberal intelligence” magazine — featured last week a question and answer, interview-based article with Jesse Rothstein — Professor of Economics at University of California – Berkeley — on “The Economic Consequences of Denying Teachers Tenure.” Rothstein is a great choice for this one in that indeed he is an economist, but one of a few, really, who is deep into the research literature and who, accordingly, has a balanced set of research-based beliefs about value-added models (VAMs), their current uses in America’s public schools, and what they can and cannot do (theoretically) to support school reform. He’s probably most famous for a study he conducted in 2009 about how the non-random, purposeful sorting of students into classrooms indeed biases (or distorts) value-added estimations, pretty much despite the sophistication of the statistical controls meant to block (or control for) such bias (or distorting effects). You can find this study referenced here, and a follow-up to this study here.

In this article, though, the interviewer — Rachel Cohen — interviews Jesse primarily about how in California a higher court recently reversed the Vergara v. California decision that would have weakened teacher employment protections throughout the state (see also here). “In 2014, in Vergara v. California, a Los Angeles County Superior Court judge ruled that a variety of teacher job protections worked together to violate students’ constitutional right to an equal education. This past spring, in a 3–0 decision, the California Court of Appeals threw this ruling out.”

Here are the highlights in my opinion, by question and answer, although there is much more information in the full article here:

Cohen: “Your research suggests that even if we got rid of teacher tenure, principals still wouldn’t fire many teachers. Why?”

Rothstein: “It’s basically because in most cases, there’s just not actually a long list of [qualified] people lining up to take the jobs; there’s a shortage of qualified teachers to hire.” In addition, “Lots of schools recognize it makes more sense to keep the teacher employed, and incentivize them with tenure…”I’ve studied this, and it’s basically economics 101. There is evidence that you get more people interested in teaching when the job is better, and there is evidence that firing teachers reduces the attractiveness of the job.”

Cohen: Your research suggests that even if we got rid of teacher tenure, principals still wouldn’t fire many teachers. Why?

Rothstein: It’s basically because in most cases, there’s just not actually a long list of people lining up to take the jobs; there’s a shortage of qualified teachers to hire. If you deny tenure to someone, that creates a new job opening. But if you’re not confident you’ll be able to fill it with someone else, that doesn’t make you any better off. Lots of schools recognize it makes more sense to keep the teacher employed, and incentivize them with tenure.

Cohen: “Aren’t most teachers pretty bad their first year? Are we denying them a fair shot if we make tenure decisions so soon?”

Rothstein: “Even if they’re struggling, you can usually tell if things will turn out to be okay. There is quite a bit of evidence for someone to look at.”

Cohen: “Value-added models (VAM) played a significant role in the Vergara trial. You’ve done a lot of research on these tools. Can you explain what they are?”

Rothstein: “[The] value-added model is a statistical tool that tries to use student test scores to come up with estimates of teacher effectiveness. The idea is that if we define teacher effectiveness as the impact that teachers have on student test scores, then we can use statistics to try to then tell us which teachers are good and bad. VAM played an odd role in the trial. The plaintiffs were arguing that now, with VAM, we have these new reliable measures of teacher effectiveness, so we should use them much more aggressively, and we should throw out the job statutes. It was a little weird that the judge took it all at face value in his decision.”

Cohen: “When did VAM become popular?”

Rothstein: “I would say it became a big deal late in the [George W.] Bush administration. That’s partly because we had new databases that we hadn’t had previously, so it was possible to estimate on a large scale. It was also partly because computers had gotten better. And then VAM got a huge push from the Obama administration.”

Cohen: “So you’re skeptical of VAM.”

Rothstein: “I think the metrics are not as good as the plaintiffs made them out to be. There are bias issues, among others.”

Cohen: “During the Vergara trials you testified against some of Harvard economist Raj Chetty’s VAM research, and the two of you have been going back and forth ever since. Can you describe what you two are arguing about?”

Rothstein: “Raj’s testimony at the trial was very focused on his work regarding teacher VAM. After the trial, I really dug in to understand his work, and I probed into some of his assumptions, and found that they didn’t really hold up. So while he was arguing that VAM showed unbiased results, and VAM results tell you a lot about a teacher’s long-term outcomes, I concluded that what his approach really showed was that value-added scores are moderately biased, and that they don’t really tell us one way or another about a teacher’s long-term outcomes” (see more about this debate here).

Cohen: “Could VAM be improved?”

Rothstein: “It may be that there is a way to use VAM to make a better system than we have now, but we haven’t yet figured out how to do that. Our first attempts have been trying to use them in not very intelligent ways.”

Cohen: “It’s been two years since the Vergara trial. Do you think anything’s changed?”

Rothstein: “I guess in general there’s been a little bit of a political walk-back from the push for VAM. And this retreat is not necessarily tied to the research evidence; sometimes these things just happen. But I’m not sure the trial court opinion would have come out the same if it were held today.”

Again, see more from this interview, also about teacher evaluation systems in general, job protections, and the like in the full article here.

Citation: Cohen, R. M. (2016, August 4). Q&A: The economic consequences of eenying teachers tenure. The American Prospect. Retrieved from http://prospect.org/article/qa-economic-consequences-denying-teachers-tenure

“Virginia SGP” Wins in Court Against State

Virginia SGP, also known as Brian Davison — a parent of two public school students in the affluent Loudoun, Virginia area (hereafter referred to as Virginia SGP) — has been an avid (and sometimes abrasive) commentator about value-added models (VAMs), defined generically, on this blog (see, for example, here, here, and here), on Diane Ravitch’s blog (see, for example, here, here, and here), and elsewhere (e.g., Virginia SGP’s Facebook page here). He is an advocate and promoter of the use of VAMs (which are in this particular case Student Growth Percentiles (SGPs); see differences between VAMs and SGPs here and here) to evaluate teachers, and he is an advocate and promoter of the release of teachers’ SGP scores to parents and the general public for their consumption and use.

Related, and as described in a Washington Post article published in March of 2016, Virginia SGP “…Pushed [Virginia] into Debate of Teacher Privacy vs. Transparency for Parents” as per teachers’ SPG data. This occurred via a lawsuit Virginia SGP filed against the state, attempting to force the release of teachers’ SGP data for all teachers across the state. More specifically, and akin to what happened in 2010 when the Los Angeles Times published the names and VAM-based ratings of thousands of teachers teaching in the Los Angeles Unified School District (LAUSD), Virginia SGP “pressed for the data’s release because he thinks parents have a right to know how their children’s teachers are performing, information about public employees that exists but has so far been hidden. He also wants to expose what he says is Virginia’s broken promise to begin using the data to evaluate how effective the state’s teachers are.” He thinks that “teacher data should be out there,” especially if taxpayers are paying for it.

In January, a Richmond, Virginia judge ruled in Virginia SGP’s favor, despite the state’s claims that Virginia school districts, despite the state’s investments, had reportedly not been using the SGP data, “calling them flawed and unreliable measures of a teacher’s effectiveness.” And even though this ruling was challenged by state officials and the Virginia Education Association thereafter, Virginia SGP posted via his Facebook page the millions of student records the state released in compliance with the court, with teacher names and other information redacted.

This past Tuesday, however, and despite the challenges to the court’s initial ruling, came another win for Virginia SGP, as well as another loss for the state of Virginia. See the article “Judge Sides with Loudoun Parent Seeking Teachers’ Names, Student Test Scores,” published yesterday in a local Loudon, Virginia news outlet.

The author of this article, Danielle Nadler, explains more specifically that, “A Richmond Circuit Court judge has ruled that [the] VDOE [Virginia Department of Education] must release Loudoun County Public Schools’ Student Growth Percentile [SGP] scores by school and by teacher…[including] teacher identifying information.” The judge noted that “that VDOE and the Loudoun school system failed to ‘meet the burden of proof to establish an exemption’ under Virginia’s Freedom of Information Act [FOIA].” The court also ordered VDOE to pay Davison $35,000 to cover his attorney fees and other costs. This final order was dated April 12, 2016.

“Davison said he plans to publish the information on his ‘Virginia SGP’ Facebook page. Students will not be identified, but some of the teachers will. ‘I may mask the names of the worst performers when posting rankings/lists but other members of the public can analyze the data themselves to discover who those teachers are,” Virginia SGP said.

I’ve exchanged messages with Virginia SGP prior to this ruling and since, and since I’ve explicitly invited him to also comment via this blog. While with this objective and subsequent ruling I disagree, although I do believe in transparency, it is nonetheless newsworthy in the realm of VAMs and for followers/readers of this blog. Comment now and/or do stay tuned for more.

Kane Is At It, Again: “Statistically Significant” Claims Exaggerated to Influence Policy

In a recent post, I critiqued a fellow academic and value-added model (VAM) supporter — Thomas Kane, an economics professor from Harvard University who also directed the $45 million worth of Measures of Effective Teaching (MET) studies for the Bill & Melinda Gates Foundation. Kane has been the source of multiple posts on this blog (see also here, here, and here) as he is a very public figure, very often backing, albeit often not in non-peer-reviewed technical reports and documents, series of exaggerated, and “research-based” claims. In this prior post, I more specifically critiqued the overstated claims he made in a recent National Public Radio (NPR) interview titled: “There Is No FDA For Education. Maybe There Should Be.”

Well, a colleague recently emailed me another such document authored by Kane (and co-written with four colleagues), titled: “Teaching Higher: Educators’ Perspectives on Common Core Implementation.” While this one is quite methodologically sound (i.e., as assessed via a thorough read of the main text of the document, including all footnotes and appendices), it is Kane’s set of claims, again, that are of concern, especially knowing that this report, even though it too has not yet been externally vetted or reviewed, will likely have a policy impact. The main goal of this report is clearly (although not made explicit) to endorse, promote, and in many ways save the Common Core State Standards (CCSS). I emphasize the word save in that clearly, and especially since the passage of the Every Student Succeeds Act (ESSA), many states have rejected the still highly controversial Common Core. I also should note that researchers in this study clearly conducted this study with similar a priori conclusions in mind (i.e., that the Common Core should be saved/promoted); hence, future peer review of this piece may be out of the question as the bias evident in the sets of findings would certainly be a “methodological issue,” again, likely preventing a peer-reviewed publication (see, for example, the a priori conclusion that “[this] study highlights an important advantage of having a common set of standards and assessments across multiple states,” in the abstract (p. 3).

First I will comment on the findings regarding the Common Core, as related to value-added models (VAMs). Next, I will comment on Section III of the report, about “Which [Common Core] Implementation Strategies Helped Students Succeed?” (p. 17). This is where Kane and colleagues “link[ed] teachers’ survey responses [about the Common Core] to their students’ test scores on the 2014–2015 PARCC [Partnership for Assessment of Readiness for College and Careers] and SBAC [Smarter Balanced Assessment Consortium] assessments [both of which are aligned to the Common Core Standards]… This allowed [Kane et al.] to investigate which strategies and which of the [Common Core-related] supports [teachers] received were associated with their performance on PARCC and SBAC,” controlling for a variety of factors including teachers’ prior value-added (p. 17).

With regards to the Common Core sections, Kane et al. lay claims like: “Despite the additional work, teachers and principals in the five states [that have adopted the Common Core = Delaware, Maryland, Massachusetts, New Mexico, and Nevada] have largely embraced [emphasis added] the new standards” (p. 3). They mention nowhere, however, the mediating set of influences interfering with such a claim, that likely lead to this claim entirely or at least in part – that many teachers across the nation have been forced, by prior federal and current state mandates (e.g., in New Mexico), to “embrace the new standards.” Rather, Kane et al. imply throughout the document that this “embracement” is a sure sign that teachers and principals are literally taking the Common Core into and within their open arms. The same interference is at play with their similar claim that “Teachers in the five study states have made major changes [emphasis in the original] in their lesson plans and instructional materials to meet the CCSS” (p. 3). Compliance is certainly a intervening factor, again, likely contaminating and distorting the validity of both of these claims (which are two of the four total claims highlighted throughout the document’s (p. 3)).

Elsewhere, Kane et al. claim that “The new standards and assessments represent a significant challenge for teachers and students” (p. 6), along with an accompanying figure they use to illustrate how proficiency (i.e., the percent of students labeled as proficient) on these five states’ prior tests has decreased, indicating more rigor or a more “significant challenge for teachers and students” thanks to the Common Core. What they completely ignore again, however, is that the cut scores used to define “proficiency” are arbitrary per state, as was their approach to define “proficiency” across states in comparison (see footnote four). What we also know from years of research on such tests is that whenever a state introduces a “new and improved” test (e.g., the PARCC and SBAC tests), which is typically tied to “new and improved standards” (e.g., the Common Core), lower “proficiency” rates are observed. This has happened countless times across states, and certainly prior to the introduction of the PARCC and SBAC tests. Thereafter, the state typically responds with the same types of claims, that “The new standards and assessments represent a significant challenge for teachers and students.” These claims are meant to signal to the public that at last “we” are holding our teachers and students accountable for their teaching and learning, but thereafter, again, proficiency cut scores are arbitrarily redefined (among other things), and then five or ten years later “new and improved” tests and standards are needed again. In other words, this claim is nothing new and it should not be interpreted as such, but it should rather be interpreted as aligned with Einstein’s definition of insanity (i.e., repeating the same behaviors over and over again in the hopes that different results will ultimately materialize) as this is precisely what we as a nation have been doing since the minimum competency era in the early 1980s.

Otherwise, Kane et al.’s other two claims were related to “Which [Common Core] Implementation Strategies Helped Students Succeed” (p. 17), as mentioned. They assert first that “In mathematics, [they] identified three markers of successful implementation: more professional development days, more classroom observations with explicit feedback tied to the Common Core, and the inclusion of Common Core-aligned student outcomes in teacher evaluations. All were associated with statistically significantly [emphasis added] higher student performance on the PARCC and [SBAC] assessments in mathematics” (p. 3, see also p. 20). They assert second that “In English language arts, [they] did not find evidence for or against any particular implementation strategies” (p. 3, see also p. 20).

What is highly problematic about these claims is that the three correlated implementation strategies noted, again as significantly associated with teachers’ students’ test-based performance on the PARCC and SBAC mathematics assessments, were “statistically significant” (determined by standard p or “probability” values under which findings that may have happened due to chance are numerically specified). But, they were not really practically significant, at all. There IS a difference whereby “statistically significant” findings may not be “practically significant,” or in this case “policy relevant,” at all. While many misinterpret “statistical significance” as an indicator of strength or importance, it is not. Practical significance is.

As per the American Statistical Association’s (ASA) recently released “Statement on P-Values,” statistical significance “is not equivalent to scientific, human, or economic significance…Any effect, no matter how tiny, can produce a small p-value [i.e., “statistical significance”] if the sample size or measurement precision is high enough” (p. 10); hence, one must always check for practical significance when making claims about statistical significance, like Kane et al. actually do here, but do here in a similar inflated vein.

As their Table 6 shows (p. 20), the regression coefficients related to these three areas of “statistically significant” influence on teachers’ students’ test-based performance on the new PARCC and SBAC mathematics tests (i.e., more professional development days, more classroom observations with explicit feedback tied to the Common Core, and the inclusion of Common Core-aligned student outcomes in teacher evaluations) yielded the following coefficients, respectively: 0.045 (p < 0.01), 0.044 (p < 0.05), and 0.054 (p < 0.01). They then use as an example the 0.044 (p < 0.05) coefficient (as related to more classroom observations with explicit feedback tied to the Common Core) and explain that “a difference of one standard deviation in the observation and feedback index was associated with an increase of 0.044 standard deviations in students’ mathematics test scores—roughly the equivalent of 1.4 scale score points on the PARCC assessment and 4.1 scale score points on the SBAC.”

In order to generate sizable and policy relevant improvement in test scores, (e.g., by half of a standard deviation), the observation and feedback index should jump up by 11 standard deviations! In addition, given that scale score points do not equal raw or actual test items (e.g., scale score-to-actual test item relationships are typically in the neighborhood of 4 or 5 scale scores points to 1 actual test item), this likely also means that Kane’s interpretations (i.e., mathematics scores were roughly the equivalent of 1.4 scale score points on the PARCC and 4.1 scale score points on the SBAC) actually mean 1/4th or 1/5th of a test item in mathematics on the PARCC and 4/5th of or 1 test item on the SBAC. This hardly “Provides New Evidence on Strategies Related to Improved Student Performance,” unless you define improved student performance as something as little as 1/5th of a test item.

This is also not what Kane et al. claim to be “a moderately sizeable effect!” (p. 21). These numbers should not even be reported, much less emphasized as policy relevant/significant, unless perhaps equivalent to at least 0.25 standard deviations on the test (as a quasi-standard/accepted minimum). Likewise, the same argument can be made about the other three coefficients derived via these mathematics tests. See also similar claims that they asserted (e.g., that “students perform[ed] better when teachers [were] being evaluated based on student achievement” (p. 21).

Because the abstract (and possibly conclusions) section are the main sections of this paper that are likely to have the most educational/policy impact, especially when people do not read all of the text, footnotes, and abstract contents of this entire document, this is irresponsible, and in many ways contemptible. This is also precisely the reason why, again, Kane’s calls for a Federal Drug Administration (FDA) type of entity for education are also so ironic (as explained in my prior post here).

Kane’s (Ironic) Calls for an Education Equivalent to the Federal Drug Administration (FDA)

Thomas Kane, an economics professor from Harvard University who also directed the $45 million worth of Measures of Effective Teaching (MET) studies for the Bill & Melinda Gates Foundation, has been the source of multiple posts on this blog (see, for example, here, here, and here). He is consistently backing, with his Harvard affiliation in tow, VAMs, as per a series of exaggerated, but as he argues, research-based claims.

However, much of the work that Kane cites, he himself has written on the topic (sometimes with coauthors). Likewise, the strong majority of these works have not been published in peer reviewed journals, but rather as technical reports or book chapters, often in his own books (see for example Kane’s curriculum vitae (CV) here). This includes the technical reports derived via his aforementioned MET studies, now in technical report and book form, but now completed three years ago in 2013, and still not externally vetted and published in any said journal. Lack of publication of the MET studies might be due to some of the methodological issues within these particular studies, however (see published and unpublished, (in)direct criticisms of these studies here, here, and here). Although Kane does also cite some published studies authored by others, again, in support of VAMs, the studies Kane cites are primarily/only authored by econometricians (e.g., Chetty, Friedman, and Rockoff) and, accordingly, largely unrepresentative of the larger literature surrounding VAMs.

With that being said, and while I do try my best to stay aboveboard as an academic who takes my scholarly and related blogging activities seriously, sometimes it is hard to, let’s say, not “go there” when more than deserved. Now is one of those times for what I believe is a fair assessment of one example of Kane’s unpublished and externally un-vetted works.

Last week on National Public Radio (NPR), Kane was interviewed by Eric Westervelt in a series titled “There Is No FDA For Education. Maybe There Should Be.” Ironically, in 2009 I made this claim in an article that I authored and that was published in Education Leadership. I began the piece noting that “The value-added assessment model is one over-the-counter product that may be detrimental to your health.” I ended the article noting that “We need to take our education health as seriously as we take our physical health…[hence, perhaps] the FDA approach [might] also serve as a model to protect the intellectual health of the United States. [It] might [also] be a model that legislators and education leaders follow when they pass legislation or policies whose benefits and risks are unknown” (Amrein-Beardsley, 2009).

Never did I foresee, however, how much my 2009 calls for such an approach similar to Kane’s recent calls during this interview would ironically apply, now, and precisely because of academics like Kane. In other words, ironic is that Kane is now calling for “rigorous vetting” of educational research, as he claims is being done with medical research, but those rules apparently do not apply to his own work, and the scholarly works he perpetually cites in favor of VAMs.

Take also, for example, some of the more specific claims Kane expressed in this particular NPR interview.

Kane claims that “The point of education research is to identify effective interventions for closing the achievement gap.” Although elsewhere in the same interview Kane claims that “American education research [has also] mostly languished in an echo chamber for much of the last half century.” While NPR’s Westervelt criticizes Kane for making a “pretty scathing and strong indictment” of America’s education system, what Kane does not understand writ large is that the very solutions for which Kane advocates – using VAM-based measurements to fire and hire bad and good teachers, respectively – are really no different than the “stronger accountability” measures upon which we have relied for the last 40 years (since the minimum competency testing era) within this alleged “echo chamber.” Kane himself, then, IS one of the key perpetrators and perpetuators of the very “echo chamber” of which he speaks, and also scorns and indicts. Kane simply does not realize (or acknowledge) that this “echo chamber” continues in its persistence because of those who continue to preserve a stronger accountability for educational reform logic, and who continue to reinvent “new and improved” accountability measures and policies to support this logic, accordingly.

Kane also claims that he does not “point fingers at the school officials out there” for not being able to reform our schools for the better, and in this particular case, for closing the achievement gap. But unfortunately, Kane does. Recall the article Kane authored and that the New York Daily News published, titled “Teachers Must Look in the Mirror,” in which Kane insults New York’s administrators, writing that the fact that 96% of teachers in New York were given the two highest ratings last year – “effective” or “highly effective” – was “a sure sign that principals have not been honest to date.” Enough said.

The only thing with which I do agree with Kane as per this NPR interview is that we as an academic community can certainly do a better job of translating research into action, although I would add that only externally-reviewed published research is that which is worthy of actually turning into action, in practice. Hence, my repeated critiques of research studies on this blog, that are sometimes not even internally vetted or reviewed before being released to the public, and then the media, and then the masses as “truth.” Recall, for example, when the Chetty et al. studies (mentioned prior as also often cited by Kane) were cited in President Obama’s 2012 State of the Union address, regardless of the fact that the Chetty et al. studies had not even been internally reviewed by their sponsoring agency – the National Bureau of Education Research (NBER) – prior to their public release? This is also regardless of the fact that the federal government’s “What Works Clearinghouse”- also positioned in this NPR interview by Kane as the closest entity we have so far to an educational FDA – noted grave concerns about this study at the same time (as have others since; see, for example, here, here, here, and here).

Now this IS bad practice, although on this, Kane is also guilty.

To put it lightly, and in sum, I’m confused by Kane’s calls for such an external monitoring/quality entity akin to the FDA. While I could not agree more with his statement of need, as I also argued in my 2009 piece mentioned prior, it is because of educational academics like Kane that I have to continue to make such claims and calls for such monitoring and regulation.

Reference: Amrein-Beardsley, A. (2009). Buyers be-aware: What you don’t know can hurt you. Educational Leadership, 67(3), 38-42.

Is Alabama the New, New Mexico?

In Alabama, the Grand Old Party (GOP) has put forth a draft bill to be entitled as an act and ultimately called the Rewarding Advancement in Instruction and Student Excellence (RAISE) Act of 2016. The purpose of the act will be to…wait for it…use test scores to grade and pay teachers annual bonuses (i.e., “supplements”) as per their performance. More specifically, the bill is to “provide a procedure for observing and evaluating teachers” to help make “significant differentiation[s] in pay, retention, promotion, dismissals, and other staffing decisions, including transfers, placements, and preferences in the event of reductions in force, [as] primarily [based] on evaluation results.” Related, Alabama districts may no longer use teachers’ “seniority, degrees, or credentials as a basis for determining pay or making the retention, promotion, dismissal, and staffing decisions.” Genius!

Accordingly, Larry Lee whose blog is based on the foundation that “education is everyone’s business,” sent me this bill to review, and critique, and help make everyone’s business. I attach it here for others who are interested, but I also summarize and critique it’s most relevant (but also contemptible) issues below.

For the Alabama teachers who are eligible, they are (after a staggered period of time) to be primarily evaluated (i.e., for up to 45% of a teacher’s total evaluation score) on the extent to which they purportedly cause student growth in achievement, with student growth being defined as the teachers’ purported impacts on “[t]he change in achievement for an individual student between two or more points in time.” Teachers are also to be observed at least twice per year (i.e., for up to 45% of a teacher’s total evaluation score), by their appropriate and appropriately trained evaluators/supervisors, and an unnamed and undefined set of parent and student surveys are to be used to evaluate the teachers (i.e., up to 15% of a teacher’s total evaluation score).

Again, no real surprises here as the adoption of such measures is common among states like Alabama (and New Mexico), but when these components are explained in more detail is where things really go awry.

“For grade levels and subjects for which student standardized assessment data is not available and for teachers for whom student standardized assessment data is not available, the [state’s] department [of education] shall establish a list of preapproved options for governing boards to utilize to measure student growth.” This is precisely what has gotten the whole state of New Mexico wrapped up in, and currently losing their ongoing lawsuit (see my most recent post on this here). While providing districts with menus of preapproved assessment options might make sense to policymakers, any self respecting researcher or even assessment commoner should know why this is entirely inappropriate. To read more about this, the best research study explaining why doing just this will set any state up for lawsuits comes from Brown University’s John Papay in his highly esteemed and highly cited “Different tests, different answers: The stability of teacher value-added estimates across outcome measures” article. The title of this research article alone should explain enough why simply positioning and offering up such tests in such casual (and quite careless) ways makes way for legal recourse.

Otherwise, the only test mentioned that is also to be used to measure teachers’ purported impacts on student growth is the ACT Aspire – the ACT test corporation’s “college and career readiness” test that is aligned to and connected with their more familiar college-entrance ACT. This, too, was one of the sources of the aforementioned lawsuit in New Mexico in terms of what we call content validity, in that states cannot simply pull in tests that are not adequately aligned with a state’s curriculum (e.g., I could find no information about the alignment of the ACT Aspire to Alabama’s curriculum here, which is also highly problematic as this information should definitely be available) and that have not been validated for such purposes (i.e., to measure teachers’ impacts on student growth).

Regardless of the tests, however, all of the secondary measures to be used to evaluate Alabama teachers (e.g., student and parent survey scores, observational scores) are also to be “correlated with impacts on student achievement results.” We’ve also increasingly seen this becoming the case across the nation, whereas state/district leaders are not simply assessing whether these indicators are independently correlated, which they should be if they all, in fact, help to measure our construct of interest = teacher effectiveness, but state/district leaders are rather manufacturing and forcing these correlations via what I have termed “artificial conflation” strategies (see also a recent post here about how this is one of the fundamental and critical points of litigation in Houston).

The state is apparently also set on going “all in” on evaluating their principals in many of the same ways, although I did not critique those sections for this particular post.

Most importantly, though, for those of you who have access to such leaders in Alabama, do send them this post so they might be a bit more proactive, and appropriately more careful and cautious, before going down this poor educational policy path. While I do embrace my professional responsibility as a public scholar to be called to court to testify about all of this when such high-stakes consequences are ultimately, yet inappropriately based upon invalid inferences, I’d much rather be proactive in this regard and save states and states’ taxpayers their time and money, respectively.

Accordingly, I see the state is also to put out a request for proposals to retain an external contractor to help them measure said student growth and teachers’ purported impacts on it. I would also be more than happy to help the state negotiate this contract, much more wisely than so many other states and districts have negotiated similar contracts thus far (e.g., without asking for reliability and validity evidence as a contractual deliverable)…should this poor educational policy actually come to fruition.