Massachusetts Also Moving To Remove Growth Measures from State’s Teacher Evaluation Systems

Since the passage of the Every Student Succeeds Act (ESSA) in December of 2015, in which the federal government handed back to states the authority to decide whether to evaluate teachers with or without students’ test scores, states have been dropping the value-added model (VAM) or growth components (e.g., those based on the Student Growth Percentiles (SGP) package) of their teacher evaluation systems, as formerly required under President Obama’s Race to the Top initiative. See my most recent post here, for example, about how legislators in Oklahoma recently removed VAMs from their state-level teacher evaluation system while simultaneously increasing the state’s focus on the professional development of all teachers. Hawaii recently did the same.

Now, it seems, Massachusetts is the next state moving in this same direction.

As per a recent article in The Boston Globe (here), similar test-based teacher accountability efforts are facing increased opposition, primarily from school district superintendents and teachers throughout the state. At issue is whether all of this is simply “becoming a distraction,” whether the data can be impacted or “biased” by other factors that cannot be statistically controlled, and whether all teachers can be evaluated in similar ways, which is an issue of “fairness.” Also at issue is “reliability”: a 2014 study released by the Center for Educational Assessment at the University of Massachusetts Amherst, in which researchers examined student growth percentiles, found that the “amount of random error was substantial.” Stephen Sireci, a UMass professor and one of the study’s authors, noted that, instead of relying upon such volatile results, “You might as well [just] flip a coin.”

Damian Betebenner, a senior associate at the National Center for the Improvement of Educational Assessment Inc. in Dover, N.H., who developed the SGP model in use in Massachusetts, added that, “Unfortunately, the use of student percentiles has turned into a debate for scapegoating teachers for the ills.” Isn’t this the truth, to the extent that policymakers got hold of these statistical tools and then much too swiftly and carelessly singled out teachers for unmerited treatment and blame?

Regardless, and recently, stakeholders in Massachusetts lobbied the Senate to approve an amendment to the budget that would no longer require such test-based ratings in teachers’ professional evaluations, while also passing a policy statement urging the state to scrap these ratings entirely. “It remains unclear what the fate of the Senate amendment will be,” however. “The House has previously rejected a similar amendment, which means the issue would have to be resolved in a conference committee as the two sides reconcile their budget proposals in the coming weeks.”

Not surprisingly, Mitchell Chester, Massachusetts Commissioner of Elementary and Secondary Education, continues to defend the requirement. It seems that Chester, like others, is still holding tight to the default (yet still unsubstantiated) logic that helped advance these systems in the first place, arguing, “Some teachers are strong, others are not…If we are not looking at who is getting strong gains and those who are not we are missing an opportunity to upgrade teaching across the system.”

“Arbitrary and Capricious:” Sheri Lederman Wins Lawsuit in NY’s State Supreme Court

Recall the New York lawsuit pertaining to Long Island teacher Sheri Lederman? She just won in New York’s State Supreme Court, and boy did she win big, also for the cause!

Sheri is, by all accounts other than her 2013-2014 “ineffective” growth score of 1 out of 20, a terrific 4th-grade, 18-year veteran teacher. However, after receiving her “ineffective” growth rating and score, she, along with her attorney and husband Bruce Lederman, sued the state of New York to challenge the state’s growth-based teacher evaluation system and her individual score. See prior posts about Sheri’s case here, here, here, and here.

The more specific goals of her case were to seek a judgment: (1) setting aside or vacating Sheri’s individual growth score and her rating of “ineffective,” and (2) declaring that the growth measures New York endorsed and implemented were/are “arbitrary and capricious.” The “overall gist” was that Sheri contended that the system unfairly penalized teachers whose students consistently scored well and therefore could not demonstrate much upward growth (e.g., teachers of gifted or other high-achieving students). This concern/complaint is common elsewhere.

As per the State Supreme Court ruling, written by Acting Supreme Court Justice Roger McDonough and just released today (May 10, 2016), 15 pages in length and available in full here, Sheri won her case. She won it against John King — the then New York State Education Department Commissioner who has since replaced Arne Duncan as US Secretary of Education. The Court concluded that Sheri (with her husband, her team of experts, and other witnesses) effectively established that her growth score and rating for 2013-2014 were “arbitrary and capricious,” with “arbitrary and capricious” being defined as actions “taken without sound basis in reason or regard to the facts.”

More specifically, the Court’s conclusion was founded upon: “(1) the convincing and detailed evidence of VAM bias against teachers at both ends of the spectrum (e.g. those with high-performing students or those with low-performing students); (2) the disproportionate effect of petitioner’s small class size and relatively large percentage of high-performing students; (3) the functional inability of high-performing students to demonstrate growth akin to lower-performing students; (4) the wholly unexplained swing in petitioner’s growth score from 14 [i.e., her growth score the year prior] to 1, despite the presence of statistically similar scoring students in her respective classes; and, most tellingly, (5) the strict imposition of rating constraints in the form of a ‘bell curve’ that places teachers in four categories via pre-determined percentages regardless of whether the performance of students dramatically rose or dramatically fell from the previous year.”

As per an email I received earlier today from Bruce (i.e., Sheri’s husband and the attorney who argued her case), the Court otherwise “declined to make an overall ruling on the [New York growth] rating system in general because of new regulations in effect” [e.g., the state’s growth model is currently under review]. “[Nonetheless, t]he decision should qualify as persuasive authority for other teachers challenging growth scores throughout the County [and Country]. [In addition, the] Court carefully recite[d] all our expert affidavits [i.e., from Professors Darling-Hammond, Pallas, Amrein-Beardsley, Sean Corcoran, and Jesse Rothstein, as well as Drs. Burris and Lindell].” Noted as well were the “absence of any meaningful challenge” to [Sheri’s] experts’ conclusions, especially about the dramatic swings noticed between her, and potentially others’, scores, and the “litany of expert affidavits submitted on [Sheri’s] behalf.”

“It is clear that the evidence all of these amazing experts presented was a key factor in winning this case since the Judge repeatedly said both in Court and in the decision that we have a “high burden” to meet in this case.” [In addition,] [t]he Court wrote that the court “does not lightly enter into a critical analysis of this matter … [and] is constrained on this record, to conclude that [the] petitioner [i.e., Sheri] has met her high burden.”

To Bruce’s/our knowledge, this is the first time a judge has set aside an individual teacher’s VAM rating based upon such a presentation in court.

Thanks to all who helped in this endeavor. Onward!

“Virginia SGP” Wins in Court Against State

Virginia SGP, also known as Brian Davison — a parent of two public school students in the affluent Loudoun County, Virginia, area (hereafter referred to as Virginia SGP) — has been an avid (and sometimes abrasive) commentator about value-added models (VAMs), defined generically, on this blog (see, for example, here, here, and here), on Diane Ravitch’s blog (see, for example, here, here, and here), and elsewhere (e.g., Virginia SGP’s Facebook page here). He is an advocate and promoter of the use of VAMs (which are, in this particular case, Student Growth Percentiles (SGPs); see differences between VAMs and SGPs here and here) to evaluate teachers, and he is an advocate and promoter of the release of teachers’ SGP scores to parents and the general public for their consumption and use.

Related, and as described in a Washington Post article published in March of 2016, Virginia SGP “…Pushed [Virginia] into Debate of Teacher Privacy vs. Transparency for Parents” as per teachers’ SGP data. This occurred via a lawsuit Virginia SGP filed against the state, attempting to force the release of teachers’ SGP data for all teachers across the state. More specifically, and akin to what happened in 2010 when the Los Angeles Times published the names and VAM-based ratings of thousands of teachers teaching in the Los Angeles Unified School District (LAUSD), Virginia SGP “pressed for the data’s release because he thinks parents have a right to know how their children’s teachers are performing, information about public employees that exists but has so far been hidden. He also wants to expose what he says is Virginia’s broken promise to begin using the data to evaluate how effective the state’s teachers are.” He thinks that “teacher data should be out there,” especially if taxpayers are paying for it.

In January, a Richmond, Virginia, judge ruled in Virginia SGP’s favor, despite the state’s claims that, notwithstanding its investments, Virginia school districts had reportedly not been using the SGP data, “calling them flawed and unreliable measures of a teacher’s effectiveness.” And even though this ruling was challenged thereafter by state officials and the Virginia Education Association, Virginia SGP posted via his Facebook page the millions of student records the state released in compliance with the court, with teacher names and other information redacted.

This past Tuesday, however, and despite the challenges to the court’s initial ruling, came another win for Virginia SGP, as well as another loss for the state of Virginia. See the article “Judge Sides with Loudoun Parent Seeking Teachers’ Names, Student Test Scores,” published yesterday in a local Loudoun, Virginia, news outlet.

The author of this article, Danielle Nadler, explains more specifically that, “A Richmond Circuit Court judge has ruled that [the] VDOE [Virginia Department of Education] must release Loudoun County Public Schools’ Student Growth Percentile [SGP] scores by school and by teacher…[including] teacher identifying information.” The judge noted that “VDOE and the Loudoun school system failed to ‘meet the burden of proof to establish an exemption’ under Virginia’s Freedom of Information Act [FOIA].” The court also ordered VDOE to pay Davison $35,000 to cover his attorney fees and other costs. This final order was dated April 12, 2016.

“Davison said he plans to publish the information on his ‘Virginia SGP’ Facebook page. Students will not be identified, but some of the teachers will. ‘I may mask the names of the worst performers when posting rankings/lists but other members of the public can analyze the data themselves to discover who those teachers are,’ Virginia SGP said.”

I’ve exchanged messages with Virginia SGP both prior to this ruling and since, and I’ve explicitly invited him to comment via this blog as well. While I disagree with his objective and with this subsequent ruling, although I do believe in transparency, the ruling is nonetheless newsworthy in the realm of VAMs and for followers/readers of this blog. Comment now and/or do stay tuned for more.

Educator Evaluations (and the Use of VAM) Unlikely to be Mandated in Reauthorization of ESEA

I invited a colleague of mine – Kimberly Kappler Hewitt (Assistant Professor, University of North Carolina, Greensboro) – to write a guest post for you all, and she did, offering her thoughts on what is currently occurring on Capitol Hill regarding the reauthorization of the Elementary and Secondary Education Act (ESEA). Here is what she wrote:

Amidst what is largely a bitterly partisan culture on Capitol Hill, Republicans and Democrats agree that teacher evaluation is unlikely to be mandated in the reauthorization of the Elementary and Secondary Education Act (ESEA), the most recent iteration of which is No Child Left Behind (NCLB), signed into law in January of 2002. See here for an Education Week article by Lauren Camera on the topic.

In another piece on the topic (here), the same author Camera explains: “Republicans, including Chairman Lamar Alexander, R-Tenn., said Washington shouldn’t mandate such policies, while Democrats, including ranking member Patty Murray, D-Wash., were wary of increasing the role student test scores play in evaluations and how those evaluations are used to compensate teachers.” However, under draft legislation introduced by Senator Lamar Alexander (R-Tenn.), Chairman of the Senate Health, Education, Labor, and Pensions Committee, Title II funding would turn into federal block grants, which could be used by states for educator evaluation. Regardless, excluding a teacher evaluation mandate from ESEA reauthorization may undermine efforts by the Obama administration to incorporate student test score gains as a significant component of educator evaluation.

Camera further explains: “Should Congress succeed in overhauling the federal K-12 law, the lack of teacher evaluation requirements will likely stop in its tracks the Obama administration’s efforts to push states to adopt evaluation systems based in part on student test scores and performance-based compensation systems.”

Under the Obama administration, in order for states to obtain a waiver from NCLB penalties and to receive a Race to the Top Grant, they had to incorporate—as a significant component—student growth data in educator evaluations. Influenced by these powerful policy levers, forty states and the District of Columbia require objective measures of student learning to be included in educator evaluations—a sea change from just five years ago (Doherty & Jacobs/National Council on Teacher Quality, 2013). Most states use either some type of value-added model (VAM) or student growth percentile (SGP) model to calculate a teacher’s contribution to student score changes.

The Good, the Bad, and the Ugly

As someone who is skeptical about the use of VAMs and SGPs for evaluating educators, I have mixed feelings about the idea that educator evaluation will be left out of ESEA reauthorization. I believe that student growth measures such as VAMs and SGPs should be used not as a calculable component of an educator’s evaluation but as a screener to flag educators who may need further scrutiny or support, a recommendation made by a number of student growth measure (SGM) experts (e.g., Baker et al., 2010; Hill, Kapitula, & Umland, 2011; IES, 2010; Linn, 2008).

Here are two thoughts about the consequences of not incorporating policy on educator evaluation in the reauthorization of ESEA:

  1. Lack of a clear federal vision for educator evaluation devolves the debate to the states. There are strong debates about what the nature of educator evaluation can and should be, and education luminaries such as Linda Darling-Hammond and James Popham have weighed in on the issue (see here and here, respectively). If Congress does not address educator evaluation in ESEA legislation, the void will be filled by disparate state policies. This in itself is neither good nor bad. It does, however, call into question the longevity of the efforts the Obama administration has made to leverage educator evaluation as a way to increase teacher quality. Essentially, the lack of action on the part of Congress regarding educator evaluation devolves the debates to the state level, which means that heated—and sometimes vitriolic—debates about educator evaluation will endure, shifting attention away from other efforts that could have a more powerful and more positive effect on student learning.
  2. Possibility of increases in inequity. ESEA was first passed in 1965 as part of President Johnson’s War on Poverty. ESEA was intended to promote equity for students from poverty by providing federal funding to districts serving low-income students. The idea was that the federal government could help to level the playing field, so to speak, for students who lacked the advantages of higher income students. My own research suggests that the use of VAM for educator evaluation potentially exacerbates inequity in that some teachers avoid working with certain groups of students (e.g., students with disabilities, gifted students, and students who are multiple grade levels behind) and at certain schools, especially high-poverty schools, based on the perception that teaching such students and in such schools will result in lower value-added scores. Without federal legislation that provides clear direction to states that student test score data should not be used for high-stakes evaluation and personnel decisions, states may continue to use data in this manner, which could exacerbate the very inequities that ESEA was originally designed to address.

While it is a good thing, in my mind, that ESEA reauthorization will not mandate educator evaluation that incorporates student test score data, it is a bad (or at least ugly) thing that Congress is abdicating the role of promoting sound educator evaluation policy.

References

Baker, E. L., Barton, P. E., Darling-Hammond, L., Haertel, E., Ladd, H. F., Linn, R. L., . . . Shepard, L. A. (2010). Problems with the use of student test scores to evaluate teachers (EPI Briefing Paper). Washington, DC: Economic Policy Institute.

Doherty, K. M., & Jacobs, S. (2013). State of the states 2013: Connect the dots: Using evaluation of teacher effectiveness to inform policy and practice. Washington, DC: National Council on Teacher Quality.

Hill, H. C., Kapitula, L., & Umland, K. (2011). A validity argument approach to evaluating teacher value-added scores. American Educational Research Journal, 48(3), 794-831.

Institute of Education Sciences. (2010). Error rates in measuring teacher and school performance based on students’ test score gains. Washington, DC: U.S. Department of Education.

Linn, R. L. (2008). Methodological issues in achieving school accountability. Journal of Curriculum Studies, 40(6), 699-711.

Research Study: Missing Data and VAM-Based Bias

A new Assistant Professor here at ASU, housed outside the College of Education in the College of Mathematical and Natural Sciences, also specializes in value-added modeling (and statistics). Her name is Jennifer Broatch; she is a rising star in this area of research, and she just sent me an article I had missed, just read, and certainly found worth sharing with you all.

The peer-reviewed article, published in Statistics and Public Policy this past November, is fully cited and linked below so that you all can read it in full. But in terms of its CliffsNotes version, researchers evidenced the following two key findings:

First, researchers found that “VAMs that include shorter test score histories perform fairly well compared to those with longer score histories.” The current thinking is that we need at least two if not three years of data to yield reliable estimates, or estimates that are consistent over time (which they should be). These authors argue, however, that when three years of data are required, so many students’ records go missing that the longer score history is not worth shooting for; rather, again they argue, this is an issue of trade-offs. This is certainly something to consider, as long as we continue to understand that all of this is about “tinkering towards a utopia” (Tyack & Cuban, 1997) that I’m not at all certain exists in terms of VAMs and VAM-based accuracy.

Second, researchers found that, “the decision about whether to control for student covariates [or background/demographic variables] and schooling environments, and how to control for this information, influences [emphasis added] which types of schools and teachers are identified as top and bottom performers. Models that are less aggressive in controlling for student characteristics and schooling environments systematically identify schools and teachers that serve more advantaged students as providing the most value-added, and correspondingly, schools and teachers that serve more disadvantaged students as providing the least.”

This certainly adds evidence to the research on VAM-based bias. While many researchers still claim that controlling for student background variables is unnecessary when using VAMs, or even that it is bad practice because such controls can cause perverse effects (e.g., teachers focusing relatively less on the students who are given such statistical accommodations or boosts), this study adds more evidence that not controlling for such demographics does indeed yield biased estimates. The authors do not disclose, however, how much bias is still “left over” after the controls are used; hence, this is still a very serious point of contention. Whether the controls, even when used, function appropriately is still something to be taken in earnest, particularly when consequential decisions are to be tied to VAM-based output (see also “The Random Assignment of Students into Elementary Classrooms: Implications for Value-Added Analyses and Interpretations”).

Citation: Ehlert, M., Koedel, C., Parsons, E., & Podgursky, M. (2013, November). The sensitivity of value-added estimates to specification adjustments: Evidence from school- and teacher-level models in Missouri. Statistics and Public Policy, 1(1), 19-27. doi: 10.1080/2330443X.2013.856152
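To make this specification point more concrete, below is a minimal simulation sketch of my own, written in Python with entirely made-up numbers; it is not the authors’ model or data. It illustrates why a model that adjusts only for prior scores can make teachers who serve more advantaged students look more “effective” than they truly are, while adding the demographic control largely removes that pattern.

```python
# Hypothetical illustration only; NOT the Ehlert et al. model or data.
# It simulates why omitting a student background control can bias simple
# "value-added"-style estimates toward teachers of advantaged students.
import numpy as np

rng = np.random.default_rng(0)
n_teachers, class_size = 40, 25

# Assumed: classrooms differ in their share of "advantaged" students, while
# true teacher contributions are unrelated to classroom composition.
advantage_share = np.linspace(0.1, 0.9, n_teachers)
true_effect = rng.normal(0, 2, n_teachers)

teacher_ids, adv, prior, current = [], [], [], []
for t in range(n_teachers):
    a = rng.random(class_size) < advantage_share[t]
    p = rng.normal(50, 10, class_size) + 5 * a                      # prior score
    c = p + true_effect[t] + 8 * a + rng.normal(0, 5, class_size)   # current score
    teacher_ids += [t] * class_size
    adv += a.tolist(); prior += p.tolist(); current += c.tolist()

teacher_ids = np.array(teacher_ids)
adv = np.array(adv, dtype=float)
prior, current = np.array(prior), np.array(current)

def estimated_effects(controls):
    """Mean residual gain per teacher after regressing current scores on controls."""
    X = np.column_stack([np.ones_like(prior)] + controls)
    beta, *_ = np.linalg.lstsq(X, current, rcond=None)
    resid = current - X @ beta
    return np.array([resid[teacher_ids == t].mean() for t in range(n_teachers)])

less_aggressive = estimated_effects([prior])        # prior score only
more_aggressive = estimated_effects([prior, adv])   # prior score + demographic control

# How strongly the estimates track classroom advantage (ideally, not at all):
print("prior score only     :", round(float(np.corrcoef(less_aggressive, advantage_share)[0, 1]), 2))
print("plus demographic ctrl:", round(float(np.corrcoef(more_aggressive, advantage_share)[0, 1]), 2))
```

With these assumed parameters, the first correlation should come out clearly positive and the second close to zero; the point is only to make the “less aggressive versus more aggressive controls” argument concrete, not to replicate the study.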

Student Learning Objectives, aka Student Growth Objectives, aka Another Attempt to Quantify “High Quality” Teaching

After a previous post about VAMs v. Student Growth Percentiles (SGPs) (see also VAMs v. SGPs Part II) a reader posted a comment asking for more information about the utility of SGPs, but also about the difference between SGPs and Student Growth Objectives.

“Student Growth Objectives” is a newer term for an older concept, one that is being increasingly integrated into educational accountability systems nationwide and that is also under scrutiny (see one of Diane Ravitch’s recent posts about this here). The concept underlying Student Growth Objectives (SGOs) is essentially that of Student Learning Objectives (SLOs). Why proponents insist on using the term “growth” in place of the term “learning” is unclear; perhaps it is yet another fad. It also likely has something to do with various legislative requirements (e.g., Race to the Top terminologies), although evidence in support of this terminological shift is likewise lacking.

Regardless, and put simply, an SGO/SLO is an annual goal for measuring the growth/learning of the students instructed by teachers (or principals, for school-level evaluations) who are not eligible to participate in a school’s or district’s value-added or student growth model. This includes the vast majority of teachers in most schools or districts (e.g., 70+%), because only those teachers who instruct reading/language arts or mathematics in state-achievement-tested grade levels, typically grades 3-8, are eligible to participate in the VAM or SGP evaluation system. Hence, SGOs/SLOs were developed because administrators and others were either unwilling to let these exclusions continue or were forced to establish a mechanism for including all other teachers in order to meet some legislative mandate.

New Jersey, for example, defines an SGO as “a long-term academic goal that teachers set for groups of students and [that] must be: Specific and measurable; Aligned to New Jersey’s curriculum standards; Based on available prior student learning data; A measure of what a student has learned between two points in time; Ambitious and achievable” (for more information click here).

Denver Public Schools has been using SGOs for many years; their 2008-2009 Teacher Handbook states that an SGO must be “focused on the expected growth of [a teacher’s] students in areas identified in collaboration with their principal,” as well as that the objectives must be “Job-based; Measurable; Focused on student growth in learning; Based on learning content and teaching strategies; Discussed collaboratively at least three times during the school year; May be adjusted during the school year; Are not directly related to the teacher evaluation process; [and] Recorded online” (for more information click here).

That being said, and in sum, SGOs/SLOs, like VAMs, are not supported by empirical work. As Jersey Jazzman summarized very well in his post about this, the correlational evidence is very weak, the conclusions drawn by outside researchers are a stretch, and the rush to implement these measures is just as unfounded as the rush to implement VAMs for educator evaluation. We don’t know that SGOs/SLOs make a difference in distinguishing “good” from “poor” teachers; in fact, some could argue (as Jersey Jazzman does) that they don’t actually do much of anything at all. They’re just another metric being used in the attempt to quantify “high quality” teaching.

Thanks to Dr. Sarah Polasky for this post.

VAMs v. Student Growth Percentiles (SGPs) – Part II

A few weeks ago, a reader posted the following question: “What is the difference [between] VAM and Student Growth Percentiles (SGP) and do SGPs have any usefulness[?]”

In response, I invited a scholar and colleague who knows a lot about SGPs to weigh in. This is the first of two posts to help others understand the distinctions and similarities. Thanks to our Guest Blogger – Sarah Polasky – for writing the following:

“First, I direct the readers to the VAMboozled! glossary and the information provided that contrasts VAMs and Student Growth Models, if they haven’t visited that section of the site yet. Second, I hope to build upon this by highlighting key terms and methodological differences between traditional VAMs and SGPs.

A Value-Added Model (VAM) is a multivariate (multiple-variable) student growth model that attempts to account for, or statistically control for, all potential student, teacher, school, district, and external influences on outcome measures (i.e., growth in student achievement over time). The most well-known example of this model is the SAS Education Value-Added Assessment System (EVAAS)[1]. The primary goal of this model is to estimate teachers’ causal effects on student performance over time. Put differently, the purpose of this model is to measure groups of students’ academic gains over time and then attribute those gains (or losses) back to teachers as key indicators of the teachers’ effectiveness.

In contrast, the Student Growth Percentiles (SGP)[2] model uses students’ level(s) of past performance to determine students’ normative growth (i.e., as compared to their peers). As explained by Castellano & Ho[3], “SGPs describe the relative location of a student’s current score compared to the current scores of students with similar score histories” (p. 89). Students are compared to themselves (i.e., students serve as their own controls) over time; therefore, the need to control for other variables (e.g., student demographics) is lessened. The SGP model was developed as a “better” alternative to existing models, with the goal of providing clearer, more accessible, and more understandable results to both internal and external education stakeholders and consumers. The primary goal of this model is to provide growth indicators for individual students, groups of students, schools, and districts.

The utility of the SGP model lies in reviewing, particularly by subject area, growth histories for individual students and aggregate measures for groups of students (e.g., English language learners) to track progress over time and to examine group differences, respectively. The model’s developer admits that, on its own, SGPs should not be used to make causal interpretations, such as attributing high growth in one classroom to the teacher as the sole source of that growth[4]. However, when paired with additional indicators supporting concurrent-related evidence of validity (see the glossary), such inferences may be more appropriate.”

 


[1] Sanders, W. L., & Horn, S. P. (1994). The Tennessee value-added assessment system (TVAAS): Mixed-model methodology in educational assessment. Journal of Personnel Evaluation in Education, 8(3), 299-311.

[2] Betebenner, D.W. (2013). Package ‘SGP’. Retrieved from http://cran.r-project.org/web/packages/SGP/SGP.pdf.

[3] Castellano, K.E. & Ho, A.D. (2013). A Practitioner’s Guide to Growth Models. Council of Chief State School Officers.

[4] Betebenner, D. W. (2009). Norm- and criterion-referenced student growth. Educational Measurement: Issues and Practice, 28(4), 42-51. doi:10.1111/j.1745-3992.2009.00161.x
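To make Sarah’s contrast more concrete, below is a minimal sketch of my own in Python. It uses entirely hypothetical data, and it is not the EVAAS model or the estimation routine in Betebenner’s SGP package (which relies on quantile regression rather than the crude decile binning used here); it only illustrates the two framings: a regression-style teacher effect versus a norm-referenced conditional percentile.

```python
# Hypothetical sketch of the two framings described above; the data, names,
# and methods are illustrative assumptions, not EVAAS or the SGP package.
import numpy as np

rng = np.random.default_rng(1)
n_students, n_teachers = 2000, 20
prior = rng.normal(50, 10, n_students)                 # last year's score
teacher = rng.integers(0, n_teachers, n_students)      # hypothetical assignments
teacher_effect = rng.normal(0, 2, n_teachers)
current = 0.8 * prior + teacher_effect[teacher] + rng.normal(0, 5, n_students)

# VAM-style framing: estimate each teacher's average contribution to current
# scores after conditioning on prior achievement (a highly simplified model).
X = np.column_stack([np.ones(n_students), prior])
beta, *_ = np.linalg.lstsq(X, current, rcond=None)
resid = current - X @ beta
vam_estimates = np.array([resid[teacher == t].mean() for t in range(n_teachers)])

# SGP-style framing: for each student, the percentile of his or her current
# score among students with similar score histories (deciles of the prior
# score stand in for "similar score histories" here).
edges = np.quantile(prior, np.linspace(0, 1, 11))[1:-1]
band = np.digitize(prior, edges)                       # decile band, 0..9
sgp = np.empty(n_students)
for b in range(10):
    idx = band == b
    ranks = np.argsort(np.argsort(current[idx]))       # within-band rank order
    sgp[idx] = 100.0 * (ranks + 0.5) / idx.sum()

print("VAM-style teacher estimates (first 5):", np.round(vam_estimates[:5], 2))
print("Median SGP by teacher (first 5):",
      [round(float(np.median(sgp[teacher == t])), 1) for t in range(5)])
```

Note the interpretive difference: the VAM-style numbers are teacher-level quantities intended to carry a causal reading, whereas the SGPs are student-level, norm-referenced descriptions that only become a teacher “measure” once someone chooses to aggregate them (here, by taking each teacher’s median), which is exactly the inferential leap Sarah cautions against.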

VAMs v. Student Growth Percentiles (SGPs)

Yesterday (11/4/2013) a reader posted a question, the first part of which I am partially addressing here: “What is the difference [between] VAM[s] and Student Growth Percentiles (SGP[s]) and do SGPs have any usefulness[?]” One of my colleagues, soon to be a “Guest Blogger” on TheTeam, but already an SGP expert, is helping me work on a more nuanced response, but for the time being please check out the Glossary section of this blog:

VAMs v. Student Growth Models: The main similarities between VAMs and student growth models are that they all use students’ large-scale standardized test score data from current and prior years to calculate students’ growth in achievement over time. In addition, they all use students’ prior test score data to “control for” the risk factors that impact student learning and achievement both at singular points in time and over time. The main differences between VAMs and student growth models lie in how precisely estimates are made, as related to whether, how, and how many control variables are included in the statistical models to account for these risk factors and other extraneous variables (e.g., other teachers’ simultaneous effects and prior teachers’ residual effects). The best and most popular example of a student growth model is the Student Growth Percentiles (SGP) model. It is not a VAM by traditional standards and definitions, mainly because the SGP model does not use as many sophisticated controls as do its VAM counterparts.
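For readers who prefer to see this distinction in notation, here is one highly simplified way to write it out. This is my own illustrative notation, not any state’s or vendor’s operational specification, and it glosses over many implementation details (multiple prior years, shrinkage, and the quantile-regression machinery behind SGPs, for example).

```latex
% Illustrative notation only; not any state's or vendor's operational model.
% A covariate-adjusted VAM: student i's current score is modeled as a function
% of the prior score, student/classroom covariates X, and a teacher effect
% \theta_{j(i)}, which is the quantity attributed back to teacher j.
y_{it} = \beta_0 + \beta_1\, y_{i,t-1} + \gamma' X_{it} + \theta_{j(i)} + \varepsilon_{it}

% An SGP, by contrast, is a conditional percentile rather than a model
% coefficient: student i's current score is located within the distribution of
% current scores for students sharing the same score history, with no
% covariates and no explicit teacher term.
\mathrm{SGP}_i = 100 \cdot \Pr\!\left(Y_t \le y_{it} \,\middle|\, Y_{t-1} = y_{i,t-1},\; Y_{t-2} = y_{i,t-2},\; \ldots\right)
```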

See more forthcoming…