Oklahoma Eliminates VAM, and Simultaneously Increases Focus on Professional Development

Approximately two weeks ago, House leaders in the state of Oklahoma unanimously passed House Bill 2957, in which the state’s prior requirement to use value-added model (VAM) based estimates for teacher evaluation and accountability purposes, as written into the state’s prior Teacher and Leader Effectiveness (TLE) evaluation system, was eliminated. The new bill has been sent to Oklahoma’s Governor Fallin for her final signature.

As per the State’s Superintendent of Public Instruction, Joy Hofmeister: “Amid this difficult budget year when public education has faced a variety of challenges, House Bill 2957 is a true bright spot of this year’s legislative session…By giving districts the option of removing the quantitative portion of teacher evaluations, we not only increase local control but lift outcomes by supporting our teachers while strengthening their professional development and growth in the classroom.”

As per the press release issued by one of the bill’s sponsors, State Representative Michael Rogers, the bill is to “retain the qualitative measurements, which evaluate teachers based on classroom instruction and learning environment. The measure also creates a professional development component to be used as another qualitative tool in the evaluation process. The Department of Education will create the professional development component to be introduced during the 2018-2019 school year.” The release continues: “Local school boards are in the best position to evaluate what tools their districts should be using to evaluate teachers and administrators,” he said. “This bill returns that to our local schools and removes the ‘one-size-fits-all’ approach dictated by government bureaucrats. This puts the focus back to the education of our students where it belongs.” School districts that choose to do so will, however, still have the option of continuing to use VAMs or other numerically-based student growth measures when evaluating teachers, provided they agree to pay for the related expenses.

Oklahoma State Representative Scooter Park said that “HB2957 is a step in the right direction – driven by the support of Superintendents across the state, we can continue to remove the costly and time-consuming portions of the TLE system such as unnecessary data collection requirements as well as open the door for local school districts to develop their own qualitative evaluation system for their teachers according to their choice of a valid, reliable, research based and evidence-based qualitative measure.”

Oklahoma State Senator John Ford added that this bill was proposed, and this decision was made, “After gathering input from a variety of stakeholders through a lengthy and thoughtful review process.”

I am happy to say that I was a contributor during this review process, presenting twice to legislators, educators, and others at the Oklahoma State Capitol this past fall. See one picture of these presentations here.

[Image: OK_Picture]

See more here, and a related post on Diane Ravitch’s blog here. See more information about the actual House Bill 2957 here. See also a post about Hawaii recently passing similar legislation in the blog “Curmudgucation” here. See another post about other states moving in similar directions here.

Special Issue of “Educational Researcher” (Paper #9 of 9): Amidst the “Blooming Buzzing Confusion”

Recall that the peer-reviewed journal Educational Researcher (ER) published a “Special Issue” including nine articles examining value-added measures (VAMs). I have reviewed the last of nine articles (#9 of 9), which is actually a commentary titled “Value Added: A Case Study in the Mismatch Between Education Research and Policy.” This commentary is authored by Stephen Raudenbush – Professor of Sociology and Public Policy Studies at the University of Chicago.

As with the last two commentaries reviewed here and here, Raudenbush writes of the “Special Issue” that, in this topical area, “[r]esearchers want their work to be used, so we flirt with the idea that value-added research tells us how to improve schooling…[Luckily, perhaps] this volume has some potential to subdue this flirtation” (p. 138).

Raudenbush positions the research covered in this “Special Issue,” as well as the research on teacher evaluation and education in general, as being conducted amidst the “blooming buzzing confusion” (p. 138) surrounding the messy world through which we negotiate life. This is why “specific studies don’t tell us what to do, even if they sometimes have large potential for informing expert judgment” (p. 138).

With that being said, “[t]he hard question is how to integrate the new research on teachers with other important strands of research [e.g., effective schools research] in order to inform rather than distort practical judgment” (p. 138). Echoing Susan Moore Johnson’s sentiments, reviewed as article #6 here, this is appropriately hard if we are to augment versus undermine “our capacity to mobilize the “social capital” of the school to strengthen the human capital of the teacher” (p. 138).

On this note, and “[i]n sum, recent research on value added tells us that, by using data from student perceptions, classroom observations, and test score growth, we can obtain credible evidence [albeit weakly related evidence, referring to the Bill & Melinda Gates Foundation’s MET studies] of the relative effectiveness of a set of teachers who teach similar kids [emphasis added] under similar conditions [emphasis added]…[Although] if a district administrator uses data like that collected in MET, we can anticipate that an attempt to classify teachers for personnel decisions will be characterized by intolerably high error rates [emphasis added]. And because districts can collect very limited information, a reliance on district-level data collection systems will [also] likely generate…distorted behavior[s]…in which teachers attempt to “game” the comparatively simple indicators,” or system (pp. 138-139).

Accordingly, “[a]n effective school will likely be characterized by effective ‘distributed’ leadership, meaning that expert teachers share responsibility for classroom observation, feedback, and frequent formative assessments of student learning. Intensive professional development combined with classroom follow-up generates evidence about teacher learning and teacher improvement. Such local data collection efforts [also] have some potential to gain credibility among teachers, a virtue that seems too often absent” (p. 140).

This might be at least a significant part of the solution.

“If the school is potentially rich in information about teacher effectiveness and teacher improvement, it seems to follow that key personnel decisions should be located firmly at the school level…This sense of collective efficacy [accordingly] seems to be a key feature of…highly effective schools” (p. 140).

*****

If interested, see the Review of Article #1 – the introduction to the special issue here; see the Review of Article #2 – on VAMs’ measurement errors, issues with retroactive revisions, and (more) problems with using standardized tests in VAMs here; see the Review of Article #3 – on VAMs’ potentials here; see the Review of Article #4 – on observational systems’ potentials here; see the Review of Article #5 – on teachers’ perceptions of observations and student growth here; see the Review of Article (Essay) #6 – on VAMs as tools for “egg-crate” schools here; see the Review of Article (Commentary) #7 – on VAMs situated in their appropriate ecologies here; and see the Review of Article #8, Part I – on a more research-based assessment of VAMs’ potentials here and Part II on “a modest solution” provided to us by Linda Darling-Hammond here.

Article #9 Reference: Raudenbush, S. W. (2015). Value added: A case study in the mismatch between education research and policy. Educational Researcher, 44(2), 138-141. doi:10.3102/0013189X15575345

Special Issue of “Educational Researcher” (Paper #8 of 9, Part II): A Modest Solution Offered by Linda Darling-Hammond

One of my prior posts was about the peer-reviewed journal Educational Researcher (ER)’s “Special Issue” on VAMs and the commentary titled “Can Value-Added Add Value to Teacher Evaluation?” contributed to the “Special Issue” by Linda Darling-Hammond – Professor of Education, Emeritus, at Stanford University.

In this post, I noted that Darling-Hammond “added” a lot of “value” in one particular section of her commentary, in which she offered a very sound set of solutions, using VAMs for teacher evaluations or not. Given that it’s rare in this area of research to focus on actual solutions, and that this section is a must read, I paste this small section here for you all to read (and bookmark, especially if you are currently grappling with how to develop good evaluation systems that must meet external mandates requiring VAMs).

Here is Darling-Hammond’s “Modest Proposal” (pp. 135-136):

What if, instead of insisting on the high-stakes use of a single approach to VAM as a significant percentage of teachers’ ratings, policymakers were to acknowledge the limitations that have been identified and allow educators to develop more thoughtful approaches to examining student learning in teacher evaluation? This might include sharing with practitioners honest information about imprecision and instability of the measures they receive, with instructions to use them cautiously, along with other evidence that can help paint a more complete picture of how students are learning in a teacher’s classroom. An appropriate warning might alert educators to the fact that VAM ratings based on state tests are more likely to be informative for students already at grade level, and least likely to display the gains of students who are above or below grade level in their knowledge and skills. For these students, other measures will be needed.

What if teachers could create a collection of evidence about their students’ learning that is appropriate for the curriculum and students being taught and targeted to goals the teacher is pursuing for improvement? In a given year, one teacher’s evidence set might include gains on the vertically scaled Developmental Reading Assessment she administers to students, plus gains on the English language proficiency test for new English learners, and rubric scores on the beginning and end of the year essays her grade level team assigns and collectively scores.

Another teacher’s evidence set might include the results of the AP test in Calculus with a pretest on key concepts in the course, plus pre- and posttests on a unit regarding the theory of limits which he aimed to improve this year, plus evidence from students’ mathematics projects using trigonometry to estimate the distance of a major landmark from their home. VAM ratings from a state test might be included when appropriate, but they would not stand alone as though they offered incontrovertible evidence about teacher effectiveness.

Evaluation ratings would combine the evidence from multiple sources in a judgment model, as Massachusetts’ plan does, using a matrix to combine and evaluate several pieces of student learning data, and then integrate that rating with those from observations and professional contributions. Teachers receive low or high ratings when multiple indicators point in the same direction. Rather than merely tallying up disparate percentages and urging administrators to align their observations with inscrutable VAM scores, this approach would identify teachers who warrant intervention while enabling pedagogical discussions among teachers and evaluators based on evidence that connects what teachers do with how their students learn. A number of studies suggest that teachers become more effective as they receive feedback from standards-based observations and as they develop ways to evaluate their students’ learning in relation to their practice (Darling-Hammond, 2013).

If the objective is not just to rank teachers and slice off those at the bottom, irrespective of accuracy, but instead to support improvement while providing evidence needed for action, this modest proposal suggests we might make more headway by allowing educators to design systems that truly add value to their knowledge of how students are learning in relation to how teachers are teaching.

*****

If interested, see the Review of Article #1 – the introduction to the special issue here; see the Review of Article #2 – on VAMs’ measurement errors, issues with retroactive revisions, and (more) problems with using standardized tests in VAMs here; see the Review of Article #3 – on VAMs’ potentials here; see the Review of Article #4 – on observational systems’ potentials here; see the Review of Article #5 – on teachers’ perceptions of observations and student growth here; see the Review of Article (Essay) #6 – on VAMs as tools for “egg-crate” schools here; see the Review of Article (Commentary) #7 – on VAMs situated in their appropriate ecologies here; and see the Review of Article #8, Part I – on a more research-based assessment of VAMs’ potentials here.

Article #8, Part II Reference: Darling-Hammond, L. (2015). Can value-added add value to teacher evaluation? Educational Researcher, 44(2), 132-137. doi:10.3102/0013189X15575346

Pennsylvania Governor Rejects “Teacher Performance” v. Teacher Seniority Bill

Yesterday, the Governor of Pennsylvania vetoed the “Protecting Excellent Teachers Act” bill that would lessen the role of seniority for teachers throughout the state. Simultaneously, the bill would increase the role of “observable” teacher effects, via teachers’ “performance ratings” as determined at least in part via the use of value-added model (VAM) estimates (i.e., using the popular Education Value-Added Assessment System (EVAAS)). These “performance ratings” at issue are to be used for increased consequential purposes (e.g., teacher terminations/layoffs, even if solely for economic reasons).

Governor Wolf is reported as saying that “the state should spend its time investing in improving teachers and performance standards, not paving the way for layoffs. In his veto message, he noted that the evaluation system was designed to identify a teacher’s weaknesses and then provide the opportunity to improve.” He is quoted as adding, “Teachers who do not improve after being given the opportunity and tools to do so are the ones who should no longer be in the classroom…This [emphasis added] is the system we should be using to remove ineffective teachers.”

The bill, passed by both the House and Senate and supported by the state School Boards Association, among others, is apparently bound to resurface, however, in part because Republicans are charging the Governor with “resisting reform at the same time he wants more funding for education.” According to Republican leaders, increased funding is not going to happen without increased accountability.

Read more here, as per the article originally printed in The Philadelphia Inquirer.

Special Issue of “Educational Researcher” (Paper #8 of 9, Part I): A More Research-Based Assessment of VAMs’ Potentials

Recall that the peer-reviewed journal Educational Researcher (ER) published a “Special Issue” including nine articles examining value-added measures (VAMs). I have reviewed the next of nine articles (#8 of 9), which is actually a commentary titled “Can Value-Added Add Value to Teacher Evaluation?” This commentary is authored by Linda Darling-Hammond – Professor of Education, Emeritus, at Stanford University.

As with the last commentary reviewed here, Darling-Hammond reviews some of the key points taken from the five feature articles in the aforementioned “Special Issue.” More specifically, though, Darling-Hammond “reflect[s] on [these five] articles’ findings in light of other work in this field, and [she] offer[s her own] thoughts about whether and how VAMs may add value to teacher evaluation” (p. 132).

She starts her commentary with VAMs “in theory,” in that VAMs COULD accurately identify teachers’ contributions to student learning and achievement IF (and this is a big IF) the following three conditions were met: (1) “student learning is well-measured by tests that reflect valuable learning and the actual achievement of individual students along a vertical scale representing the full range of possible achievement measures in equal interval units;” (2) “students are randomly assigned to teachers within and across schools—or, conceptualized another way, the learning conditions and traits of the group of students assigned to one teacher do not vary substantially from those assigned to another;” and (3) “individual teachers are the only contributors to students’ learning over the period of time used for measuring gains” (p. 132).

None of these things are actually true (or near to true, nor will they likely ever be true) in educational practice, however. Hence the errors we continue to observe, which continue to prevent VAMs from being used for their intended purposes, even with the sophisticated statistics meant to mitigate errors and account for the above-mentioned, let’s call them, “less than ideal” conditions.

Other pervasive and perpetual issues surrounding VAMs, as highlighted by Darling-Hammond, map onto each of the three conditions above. With respect to (1), the tests used to measure value-added are very narrow, focus on lower-level skills, and are manipulable. These tests in their current form cannot effectively measure the learning gains of a large share of students who are above or below grade level given a lack of sufficient coverage and stretch. As per Haertel (2013, as cited in Darling-Hammond’s commentary), this “translates into bias against those teachers working with the lowest-performing or the highest-performing classes”…and “those who teach in tracked school settings.” It is also important to note here that the new tests created by the Partnership for Assessment of Readiness for College and Careers (PARCC) and Smarter Balanced multistate consortia “will not remedy this problem…Even though they will report students’ scores on a vertical scale, they will not be able to measure accurately the achievement or learning of students who started out below or above grade level” (p. 133).

With respect to (2) above, on the equivalence (or rather non-equivalence) of the groups of students across the classrooms of teachers whose VAM scores are compared relative to one another, the main issue here is that “the U.S. education system is one of the most segregated and unequal in the industrialized world…[likewise]…[t]he country’s extraordinarily high rates of childhood poverty, homelessness, and food insecurity are not randomly distributed across communities…[Add] the extensive practice of tracking to the mix, and it is clear that the assumption of equivalence among classrooms is far from reality” (p. 133). Accordingly, whether sophisticated statistics can control for all of this variation is one of the most debated issues surrounding VAMs and their levels of outcome bias.

And as per (3) above, “we know from decades of educational research that many things matter for student achievement aside from the individual teacher a student has at a moment in time for a given subject area. A partial list includes the following [that are also supposed to be statistically controlled for in most VAMs, but are also clearly not controlled for effectively enough, if even possible]: (a) school factors such as class sizes, curriculum choices, instructional time, availability of specialists, tutors, books, computers, science labs, and other resources; (b) prior teachers and schooling, as well as other current teachers—and the opportunities for professional learning and collaborative planning among them; (c) peer culture and achievement; (d) differential summer learning gains and losses; (e) home factors, such as parents’ ability to help with homework, food and housing security, and physical and mental support or abuse; and (f) individual student needs, health, and attendance” (p. 133).

“Given all of these influences on [student] learning [and achievement], it is not surprising that variation among teachers accounts for only a tiny share of variation in achievement, typically estimated at under 10%” (see, for example, highlights from the American Statistical Association’s (ASA’s) Position Statement on VAMs here). “Suffice it to say [these issues]…pose considerable challenges to deriving accurate estimates of teacher effects…[A]s the ASA suggests, these challenges may have unintended negative effects on overall educational quality” (p. 133). “Most worrisome [for example] are [the] studies suggesting that teachers’ ratings are heavily influenced [i.e., biased] by the students they teach even after statistical models have tried to control for these influences” (p. 135).

Other “considerable challenges” include: VAM output is grossly unstable given the swings and variations observed in teacher classifications across time, and VAM output is “notoriously imprecise” (p. 133) given the other errors observed as caused, for example, by varying class sizes (e.g., Sean Corcoran (2010) documented with New York City data that the “true” effectiveness of a teacher ranked in the 43rd percentile could have had a range of possible scores from the 15th to the 71st percentile, qualifying as “below average,” “average,” or close to “above average”). In addition, practitioners including administrators and teachers are skeptical of these systems, and their (appropriate) skepticisms are impacting the extent to which they use and value their value-added data, noting that they value their observational data (and the professional discussions surrounding them) much more. Also important is that another likely unintended effect exists (i.e., citing Susan Moore Johnson’s essay here) when statisticians’ efforts to parse out learning to calculate individual teachers’ value-added cause “teachers to hunker down and focus only on their own students, rather than working collegially to address student needs and solve collective problems” (p. 134). Relatedly, “the technology of VAM ranks teachers against each other relative to the gains they appear to produce for students, [hence] one teacher’s gain is another’s loss, thus creating disincentives for collaborative work” (p. 135). This is what Susan Moore Johnson termed the egg-crate model, or rather the egg-crate effects.

Darling-Hammond’s conclusions are that VAMs have “been prematurely thrust into policy contexts that have made it more the subject of advocacy than of careful analysis that shapes its use. There is [good] reason to be skeptical that the current prescriptions for using VAMs can ever succeed in measuring teaching contributions well” (p. 135).

Darling-Hammond also “adds value” in one whole section (highlighted in another post forthcoming here), offering a very sound set of solutions, using VAMs for teacher evaluations or not. Given that it’s rare in this area of research to focus on actual solutions, this section is a must read. If you don’t want to wait for the next post, read Darling-Hammond’s “Modest Proposal” (pp. 135-136) within her larger article here.

In the end, Darling-Hammond writes that, “Trying to fix VAMs is rather like pushing on a balloon: The effort to correct one problem often creates another one that pops out somewhere else” (p. 135).

*****

If interested, see the Review of Article #1 – the introduction to the special issue here; see the Review of Article #2 – on VAMs’ measurement errors, issues with retroactive revisions, and (more) problems with using standardized tests in VAMs here; see the Review of Article #3 – on VAMs’ potentials here; see the Review of Article #4 – on observational systems’ potentials here; see the Review of Article #5 – on teachers’ perceptions of observations and student growth here; see the Review of Article (Essay) #6 – on VAMs as tools for “egg-crate” schools here; and see the Review of Article (Commentary) #7 – on VAMs situated in their appropriate ecologies here.

Article #8, Part I Reference: Darling-Hammond, L. (2015). Can value-added add value to teacher evaluation? Educational Researcher, 44(2), 132-137. doi:10.3102/0013189X15575346

“Arbitrary and Capricious:” Sheri Lederman Wins Lawsuit in NY’s State Supreme Court

Recall the New York lawsuit pertaining to Long Island teacher Sheri Lederman? She just won in New York’s State Supreme Court, and boy did she win big, also for the cause!

By all accounts other than her 2013-2014 “ineffective” growth score of 1 out of 20, Sheri is a terrific 4th grade teacher and 18-year veteran. However, after receiving her “ineffective” growth rating and score, she, along with her attorney and husband Bruce Lederman, sued the state of New York to challenge the state’s growth-based teacher evaluation system and Sheri’s individual score. See prior posts about Sheri’s case here, here, here, and here.

The more specific goal of her case was to seek a judgment: (1) setting aside or vacating Sheri’s individual growth score and her “ineffective” rating, and (2) declaring that the New York endorsed and implemented growth measures in use were/are “arbitrary and capricious.” The “overall gist” was that Sheri contended that the system unfairly penalized teachers whose students consistently scored well and could not demonstrate further growth upwards (e.g., teachers of gifted or other high achieving students). This concern/complaint is common elsewhere.

As per a State Supreme Court ruling released today (May 10, 2016), written by Acting Supreme Court Justice Roger McDonough, 15 pages in length, and available in full here, Sheri won her case. She won it against John King, then the New York State Education Department Commissioner and now the US Secretary of Education (having recently replaced Arne Duncan). The Court concluded that Sheri (with her husband, her team of experts, and other witnesses) effectively established that her growth score and rating for 2013-2014 were “arbitrary and capricious,” with “arbitrary and capricious” being defined as actions “taken without sound basis in reason or regard to the facts.”

More specifically, the Court’s conclusion was founded upon: “(1) the convincing and detailed evidence of VAM bias against teachers at both ends of the spectrum (e.g. those with high-performing students or those with low-performing students); (2) the disproportionate effect of petitioner’s small class size and relatively large percentage of high-performing students; (3) the functional inability of high-performing students to demonstrate growth akin to lower-performing students; (4) the wholly unexplained swing in petitioner’s growth score from 14 [i.e., her growth score the year prior] to 1, despite the presence of statistically similar scoring students in her respective classes; and, most tellingly, (5) the strict imposition of rating constraints in the form of a “bell curve” that places teachers in four categories via pre-determined percentages regardless of whether the performance of students dramatically rose or dramatically fell from the previous year.”
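
To make point (5) concrete, here is a minimal Python sketch of a forced-distribution ("bell curve") rating rule. Everything specific in it is an assumption for illustration only: the category shares (10/20/50/20), the sample scores, and the function name forced_ratings are hypothetical, and none of it is New York's actual growth model or its cut points.

```python
def forced_ratings(scores, shares_pct):
    """Assign labels purely by rank, using fixed (hypothetical) category shares.

    scores: one growth score per teacher.
    shares_pct: list of (label, percent) pairs, lowest category first, summing to 100.
    """
    # Teacher indices ordered from lowest to highest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    labels = [None] * len(scores)
    start, cumulative = 0, 0
    for label, pct in shares_pct:
        cumulative += pct
        # Integer arithmetic keeps the category boundaries exact.
        end = cumulative * len(scores) // 100
        for i in order[start:end]:
            labels[i] = label
        start = end
    return labels


# Hypothetical shares; not the state's real percentages.
SHARES = [("ineffective", 10), ("developing", 20),
          ("effective", 50), ("highly effective", 20)]

year1 = [3, 5, 8, 9, 10, 11, 12, 13, 15, 18]
year2 = [s + 5 for s in year1]  # every teacher's score rises by 5 points

labels1 = forced_ratings(year1, SHARES)
labels2 = forced_ratings(year2, SHARES)
print(labels1 == labels2)  # True: same labels despite across-the-board gains
```

Because the labels depend only on rank order, one teacher in ten is still labeled "ineffective" in the second year even though every score went up, which is the pattern the Court singled out.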

As per an email I received earlier today from Bruce (i.e., Sheri’s husband/attorney who prosecuted her case), the Court otherwise “declined to make an overall ruling on the [New York growth] rating system in general because of new regulations in effect [e.g., that the state’s growth model is currently under review]…[Nonetheless, t]he decision should qualify as persuasive authority for other teachers challenging growth scores throughout the County [and Country]. [In addition, the] Court carefully recite[d] all our expert affidavits [i.e., from Professors Darling-Hammond, Pallas, Amrein-Beardsley, Sean Corcoran and Jesse Rothstein as well as Drs. Burris and Lindell].” Noted as well were the “absence of any meaningful” challenge to [Sheri’s] experts’ conclusions, especially about the dramatic swings noticed between her, and potentially others’, scores, and the other “litany of expert affidavits submitted on [Sheri’s] behalf.”

“It is clear that the evidence all of these amazing experts presented was a key factor in winning this case since the Judge repeatedly said both in Court and in the decision that we have a “high burden” to meet in this case.” [In addition,] [t]he Court wrote that the court “does not lightly enter into a critical analysis of this matter … [and] is constrained on this record, to conclude that [the] petitioner [i.e., Sheri] has met her high burden.”

To Bruce’s/our knowledge, this is the first time a judge has set aside an individual teacher’s VAM rating based upon such a presentation in court.

Thanks to all who helped in this endeavor. Onward!

Louisiana: Another State “Rethinking” Its Teacher Evaluation System

Recall from a prior post that lawmakers, particularly in the southern states, are beginning to reconsider the roles that test scores and value-added measures should play in their states’ teacher evaluation systems. The tides do seem to be turning. See also a related post about lawmakers in Alabama “shelving” their new teacher accountability system here, and a related article about how Georgia’s new teacher evaluation system is being “overhauled” here.

As per a news article released out of Louisiana just this week, it seems Louisiana is following suit. Apparently, the Louisiana Senate Education Committee is advancing a bill (i.e., Senate Bill 342) to “revise” its teacher evaluation system as well (see here). This “heavily negotiated” bill, backed by Louisiana Governor John Bel Edwards, is to also seriously “tweak” the way teachers are to be evaluated throughout the state. The bill has already won “easy approval,” passing in the Louisiana Senate Education Committee without objection; it next faces the full Louisiana Senate.

Under current rules, 50% of a teacher’s evaluation is to be based on growth or value-added in student achievement over time. Under the new rules this is to be cut back to 35%, as currently applied to approximately 20,000 of Louisiana’s 50,000 teachers (40% of the state’s teacher population).

While some might like the percentage reduced even further than 35%, this still seems to be at least one step in the right direction. Whether high-stakes consequences will still be attached to such output, along with the observational and other testing data that account for the other 65%, remains to be seen. Attaching such consequences would certainly be a step in the wrong direction, unless the state can first demonstrate system reliability, validity, fairness, and the like (see definitions here).

In that regard, sometimes it doesn’t matter what (arbitrary) weights are applied to this or that; it is what is done with the output overall that matters the most. This also seems to be increasingly “true” in legal terms.
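
For readers who want to see the arithmetic behind the 50% versus 35% weighting described above, here is a rough Python sketch. The 0-100 scale, the two sample component scores, and the function name composite_score are hypothetical assumptions for illustration; this is not Louisiana's actual scoring formula.

```python
def composite_score(growth, other, growth_pct):
    """Weighted evaluation score on a hypothetical 0-100 scale.

    growth: the growth/value-added component score.
    other: the combined observational and other components.
    growth_pct: the weight on growth, as a whole-number percentage.
    """
    return (growth_pct * growth + (100 - growth_pct) * other) / 100


# Hypothetical teacher: weak growth score, strong observation score.
growth, other = 20, 80

current = composite_score(growth, other, growth_pct=50)
proposed = composite_score(growth, other, growth_pct=35)
print(current, proposed)  # 50.0 59.0

# Share of teachers the growth weight applies to, per the figures in the post.
affected_share = 20_000 / 50_000
print(affected_share)  # 0.4
```

In this made-up case the lower weight moves the composite by nine points. Whether that changes anything for the teacher depends entirely on the cut scores and consequences attached to the result, which is the point made above.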

Virginia SGP’s Side of the Story

In one of my most recent posts I wrote about how Virginia SGP, aka parent Brian Davison, won in court against the state of Virginia, requiring them to release teachers’ Student Growth Percentile (SGP) scores. Virginia SGP is a very vocal promoter of the use of SGPs to evaluate teachers’ value-added (although many do not consider the SGP model to be a value-added model (VAM); see general differences between VAMs and SGPs here). Regardless, he sued the state of Virginia to release teachers’ SGP scores so he could make them available to all via the Internet. He did this, more specifically, so parents and perhaps others throughout the state would be able to access and then potentially use the scores to make choices about who should and should not teach their kids. See other posts about this story here and here.

Those of us who are familiar with Virginia SGP and the research literature writ large know that, unfortunately, there’s much that Virginia SGP does not understand about the now large body of research surrounding VAMs as defined more broadly (see multiple research article links here). Likewise, Virginia SGP, as evidenced below, rests most of his research-based arguments on select sections of a small handful of research studies (e.g., those written by economists Raj Chetty and colleagues, and Thomas Kane as part of Kane’s Measures of Effective Teaching (MET) studies) that do not represent the general research on the topic. He simultaneously ignores/rejects the research studies that empirically challenge his research-based claims (e.g., that there is no bias in VAM-based estimates, and that because Chetty, Friedman, and Rockoff “proved this,” it must be true, despite the research studies that have presented evidence otherwise; see, for example, here, here, and here).

Nonetheless, given that his winning this case in Virginia is still noteworthy, and followers of this blog should be aware of this particular case, I invited Virginia SGP to write a guest post so that he could tell his side of the story. As we have exchanged emails in the past, which I must add have become less abrasive/inflamed as time has passed, I recommend that readers read and also critically consume what is written below. Let’s hope that we might have some healthy and honest dialogue on this particular topic in the end.

From Virginia SGP:

I’d like to thank Dr. Amrein-Beardsley for giving me this forum.

My school district recently announced its teacher of the year. John Tuck teaches in a school with 70%+ FRL students compared to a district average of ~15% (don’t ask me why we can’t even those #’s out). He graduated from an ordinary school with a degree in liberal arts. He only has a Bachelors and is not a National Board Certified Teacher (NBCT). He is in his ninth year of teaching specializing in math and science for 5th graders. Despite the ordinary background, Tuck gets amazing student growth. He mentors, serves as principal in the summer, and leads the school’s leadership committees. In Dallas, TX, he could have risen to the top of the salary scale already, but in Loudoun County, VA, he only makes $55K compared to a top salary of $100K for Step 30 teachers. Tuck is not rewarded for his talent or efforts largely because Loudoun eschews all VAMs and merit-based promotion.

This is largely why I enlisted the assistance of Arizona State law school graduate Lin Edrington in seeking the Virginia Department of Education’s (VDOE) VAM (SGP) data via a Freedom of Information Act (FOIA) suit (see pertinent files here).

VAMs are not perfect. There are concerns about validity when switching from paper to computer tests. There are serious concerns about reliability when VAMs are computed with small sample sizes or are based on classes not taught by the rated teacher (as appeared to occur in New Mexico, Florida, and possibly New York). Improper uses of VAMs give reformers a bad name. This was not the case in Virginia. SGPs were only to be used when appropriate with 2+ years of data and 40+ scores recommended.

I am a big proponent of VAMs based on my reviews of the research. We have the Chetty/Friedman/Rockoff (CFR) studies, of course, including their recent paper showing virtually no bias (Table 6). The following briefing presented by Professor Friedman at our trial gives a good layman’s overview of their high level findings. When teachers are transferred to a completely new school but their VAMs remain consistent, that is very convincing to me. I understand some point to the cautionary statement of the ASA suggesting districts apply VAMs carefully and explicitly state their limitations. But the ASA definitely recommends VAMs for analyzing larger samples including schools or district policies, and CFR believe their statement failed to consider updated research.

To me, the MET studies provided some of the most convincing evidence. Not only are high VAMs on state standardized tests correlated to higher achievement on more open-ended short-answer and essay-based tests of critical thinking, but students of high-VAM teachers are more likely to enjoy class (Table 14). This points to VAMs measuring inspiration, classroom discipline, the ability to communicate concepts, subject matter knowledge and much more. If a teacher engages a disinterested student, their low scores will certainly rise along with their VAMs. CFR and others have shown this higher achievement carries over into future grades and success later in life. VAMs don’t just measure the ability to identify test distractors, but the ability of teachers to inspire.

So why exactly did the Richmond City Circuit Court force the release of Virginia’s SGPs? VDOE applied for and received a No Child Left Behind (NCLB) waiver like many other states. But in court testimony provided in December of 2014, VDOE acknowledged that districts were not complying with the waiver by not providing the SGP data to teachers or using SGPs in teacher evaluations despite “assurances” to the US Department of Education (USDOE). When we initially received a favorable verdict in January of 2015, instead of trying to comply with NCLB waiver requirements, my district of Loudoun County Public Schools (LCPS) laughed. LCPS refused to implement SGPs or even discuss them.

There was no dispute that the largest Virginia districts had committed fraud when I discussed these facts with the US Attorney’s office and lawyers from the USDOE in January of 2016, but the USDOE refused to support a False Claims Act suit. And while nearly every district stridently refused to use VAMs [i.e., SGPs], the Virginia Secretary of Education was falsely claiming in high profile op-eds that Virginia was using “progress and growth” in the evaluation of schools. Yet, VDOE never used the very measure (SGPs) that the ESEA [i.e., NCLB] waivers required to measure student growth. The irony is that if these districts had used SGPs for just 1% of their teachers’ evaluations after the December of 2014 hearing, their teachers’ SGPs would be confidential today. I could only find one county that utilized SGPs, and their teachers’ SGPs are exempt. Sometimes fraud doesn’t pay.

My overall goals are threefold:

  1. Hire more Science Technology Engineering and Mathematics (STEM) majors to get kids excited about STEM careers and effectively teach STEM concepts
  2. Use growth data to evaluate policies, administrators, and teachers. Share the insights from the best teachers and provide professional development to ineffective ones
  3. Publish private sector equivalent pay so young people know how much teachers really earn (pensions often add 15-18% to their salaries). We can then recruit more STEM teachers and better overall teaching candidates

What has this lawsuit and activism cost me? A lot. I ate $5K of the cost of the VDOE SGP suit even after the award[ing] of fees. One local school board member has banned me from commenting on his “public figure” Facebook page (which I see as a free speech violation), both because I questioned his denial of SGPs and some other conflicts of interest I saw, although indirectly related to this particular case. The judge in the case even sanctioned me $7K just for daring to hold him accountable. And after criticizing LCPS for violating the Family Educational Rights and Privacy Act (FERPA) by coercing kids who fail Virginia’s Standards of Learning tests (SOLs) to retake them, I was banned from my kids’ school for being a “safety threat.”

Note that I am a former Naval submarine officer and have held Department of Defense (DOD) clearances for 20+ years. I attended a meeting this past Thursday with LCPS officials in which they [since] acknowledged I was no safety threat. I served in the military, and along with many I have fought for the right to free speech.

Accordingly, I am no shrinking violet. Despite having LCPS attorneys sanction perjury, the Republican Commonwealth Attorney refused to prosecute and then illegally censored me in public forums. So the CA will soon have to sign a consent order acknowledging violating my constitutional rights (he effectively admitted as much already). And a federal civil rights complaint against the schools for their retaliatory ban is being drafted as we speak. All of this resulted from my efforts to have public data released and hold LCPS officials accountable to state and federal laws. I have promised that the majority of any potential financial award will be used to fund other whistle blower cases, [against] both teachers and reformers. I have a clean background and administrators still targeted me. Imagine what they would do to someone who isn’t willing to bear these costs!

In the end, I encourage everyone to speak out based on your beliefs. Support your case with facts not anecdotes or hastily conceived opinions. And there are certainly efforts we can all support like those of Dr. Darling-Hammond. We can hold an honest debate, but please remember that schools don’t exist to employ teachers/principals. Schools exist to effectively educate students.

Ohio Bill to Review Value-Added Component of Schools’ A-F Report Cards

Ohio state legislators just last week introduced a bill to review the value-added measurements required when evaluating schools as per the state’s A-F school report cards (as based on Florida’s A-F school report card model). The bill comes from Republican members of the House who, more specifically, want officials and/or others to review how the state comes up with its school report card grades, with emphasis on the state’s specific value-added (i.e., Education Value-Added Assessment System (EVAAS)) component.

According to one article here, “especially confusing” with Ohio’s school report cards is the school-level value-added section. At the school level, value-added means essentially the same thing as it does at the teacher level: the measurement of how well a school purportedly grew its students from one year to the next, when students’ growth in test scores over time is aggregated beyond the classroom and to the school-wide level. While value-added estimates are still to count for 35-50% of a teacher’s individual evaluation throughout the state, this particular bill has to do with school-level value-added only.

While most in the House, Democrats included, seem to be in favor of the idea of reviewing the value-added component (e.g., citing parent/user confusion, lack of transparency, common questions posed to the state and others about this specific component that they cannot answer), at least one Democrat is questioning Republicans’ motives (e.g., charging that Republicans might have ulterior motives to not hold charter schools accountable using VAMs and to simultaneously push conservative agendas further).

Regardless, that lawmakers in at least the state of Ohio are now admitting that they have too little understanding of how the value-added system works, and also works in practice, seems to be a step in the right direction. Let’s just hope the intentions of those backing the bill are in the right place, as also explained here. Perhaps the fact that the whole bill is one paragraph in length speaks to the integrity and forthrightness of the endeavor — perhaps not.

Otherwise, the Vice President for Ohio policy and advocacy for the Thomas B. Fordham Institute, a strong supporter of value added, is quoted as saying that “it makes sense to review the measurement…There are a lot of myths and misconceptions out there, and the more people know, the more people will understand the important role looking at student growth plays in the accountability system.” One such “myth” he cites is that value added correlates with demographics: “[t]here are measures on our state report card that correlate with demographics, but value added isn’t one of them.” In fact, we have evidence directly from the state of Ohio contradicting his claim and supporting what he calls a “myth”: bias is indeed alive and well in Ohio (as well as elsewhere), especially when VAM-based estimates are aggregated at the school level (see a post with figures illustrating bias in Ohio here).

On that note, I just hope that whoever they invite for this forthcoming review, if the bill is passed, is well-informed, very knowledgeable about the literature surrounding value-added, in both breadth and depth, and is not representing a vendor or any particular think tank, philanthropic, or other entity with a clear agenda. Balance, at minimum for this review, is key.