Nevada (Potentially) Dropping Students’ Test Scores from Its Teacher Evaluation System

This week in Nevada, “Lawmakers Mull[ed] Dropping Student Test Scores from Teacher Evaluations,” as per a recent article in The Nevada Independent (see here). This would be quite a reversal of 2011, when the state (backed by state Republicans, inspired by Michelle Rhee, but not supported by federal Race to the Top funds) wrote into policy a requirement that 50% of every Nevada teacher’s evaluation rely on such data. The weight currently rests at 20%, but it is set to double to 40% next year.

Nevada is one of a still uncertain number of states looking to reduce the weight, and reconsider the purported “value added,” of such measures. Note also that last week Connecticut dropped some of the test-based components of its teacher evaluation system (see here). All of this is occurring, of course, following the federal passage of the Every Student Succeeds Act (ESSA), under which states are no longer required to build teacher evaluation systems based in significant part on their students’ test scores.

Accordingly, Nevada’s “Democratic lawmakers are trying to eliminate — or at least reduce — the role [students’] standardized tests play in evaluations of teachers, saying educators are being unfairly judged on factors outside of their control.” The Democratic Assembly Speaker, for example, said that “he’s always been troubled that teachers are rated on standardized test scores,” more specifically noting: “I don’t think any single teacher that I’ve talked to would shirk away from being held accountable…[b]ut if they’re going to be held accountable, they want to be held accountable for things that … reflect their actual work.” I’ve never met a teacher who would disagree with this statement.

Anyhow, this past Monday the state’s Assembly Education Committee heard public testimony on these matters and on three bills “that would alter the criteria for how teachers’ effectiveness is measured.” These three bills are as follows:

  • AB212 would prohibit the use of student test scores in evaluating teachers.
  • AB320 would eliminate statewide [standardized] test results as a measure but allow local assessments to account for 20 percent of the total evaluation.
  • AB312 would ensure that teachers in overcrowded classrooms are not penalized on certain evaluation metrics deemed out of their control given the student-to-teacher ratio.

Many presented testimony in support of these bills, over an extended period of time, on Tuesday. I was also invited to speak, during which time I “cautioned lawmakers against being ‘mesmerized’ by the promised objectivity of standardized tests. They have their own flaws, [I] argued, estimating that 90-95 percent of researchers who are looking at the effects of high-stakes testing agree that they’re not moving the dial [really whatsoever] on teacher performance.”

Lawmakers have until the end of tomorrow (i.e., Friday) to pass these bills out of committee. Otherwise, they will die.

Of course, I will keep you posted, but things are currently looking “very promising,” especially for AB320.

The Tripod Student Survey Instrument: Its Factor Structure and Value-Added Correlations

The Tripod student perception survey is a “research-based” instrument increasingly being used by states to round out teacher evaluation systems based on “multiple measures.” While other instruments are also in use, and student survey instruments are being developed by states and local districts, this one in particular is gaining in popularity, not least because it was used throughout the Bill & Melinda Gates Foundation’s ($43 million worth of) Measures of Effective Teaching (MET) studies. A current estimate (as per the study discussed in this post) is that during the 2015–2016 school year approximately 1,400 schools purchased and administered the Tripod. See also a prior post (here) about this instrument, or more specifically about a book chapter authored by the instrument’s developer and the lead researcher in the research surrounding it, Ronald Ferguson.

In a study recently released in the esteemed American Educational Research Journal (AERJ), titled “What Can Student Perception Surveys Tell Us About Teaching? Empirically Testing the Underlying Structure of the Tripod Student Perception Survey,” researchers found that the Tripod’s factor structure did not “hold up.” That is, the Tripod’s 7Cs (i.e., seven constructs: Care, Confer, Captivate, Clarify, Consolidate, Challenge, and Classroom Management; see more information about the 7Cs here) and the 36 items positioned across them did not fit the 7C framework as theorized by the instrument’s developer(s).

Rather, using the MET database (N = 1,049 middle school math class sections; N = 25,423 students), researchers found that an alternative bi-factor structure (i.e., two constructs rather than seven) best fit the Tripod items. These two factors were (1) a general responsivity dimension, which includes (more or less) all items and is essentially unrelated to (2) a classroom management dimension, which governs responses to the items about teachers’ classroom management. Researchers were unable to distinguish seven separate dimensions across the items.
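To make the competing structures concrete, below is a minimal sketch of the two measurement models in standard confirmatory factor analysis notation; the notation and the simplifications (e.g., identification constraints are omitted) are mine, not the authors’.

% Correlated seven-factor ("7C") model: item j belongs to one construct c(j),
% loads only on that construct, and the seven factors are allowed to correlate.
\[
  x_{ij} = \lambda_{j}\, F_{i,\,c(j)} + \varepsilon_{ij},
  \qquad \operatorname{Cov}\!\left(F_{ic}, F_{ic'}\right) \text{ freely estimated.}
\]

% Bi-factor alternative reported to fit better: every item loads on a general
% (responsivity) factor G; classroom-management items also load on a specific
% factor S; G and S are constrained to be orthogonal.
\[
  x_{ij} = \lambda^{G}_{j}\, G_{i} + \lambda^{S}_{j}\, S_{i} + \varepsilon_{ij},
  \qquad \operatorname{Cov}\!\left(G_{i}, S_{i}\right) = 0,
\]
% with \lambda^{S}_{j} = 0 for items outside the classroom-management set.

Finding that the second specification fits the data better is, in essence, what it means for the 7C structure not to “hold up”: students’ responses discriminate a general dimension and a classroom-management dimension, but not seven distinct Cs.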

Researchers also found that the two alternative factors noted — general responsivity and classroom management — were positively associated with teachers’ value-added scores. More specifically, results suggested that these two factors were positively and statistically significantly associated with teachers’ value-added measures based on state mathematics tests (standardized coefficients were .25 and .25, respectively), although for undisclosed reasons results apparently said nothing about these two factors’ (cor)relationships with value-added estimates based on state English/language arts (ELA) tests. As per the authors’ findings in the area of mathematics, prior researchers have also found low to moderate agreement between teacher ratings and student perception ratings; hence, this particular finding simply adds another source of convergent evidence.
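For a rough sense of magnitude (my back-of-the-envelope reading, not the authors’ analysis): if a standardized coefficient of .25 is treated as if it were a simple correlation, an approximation since the reported coefficients come from a model containing both factors, the implied shared variance between either survey factor and teachers’ mathematics value-added is about

\[
  r^{2} \approx (.25)^{2} = .0625,
\]

or roughly 6%, which squares with the “low to moderate agreement” characterization above.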

The authors give multiple reasons and plausible explanations for these findings, which you can read in more depth in the full article, linked to above and fully cited below. The authors also note that “It is unclear whether the original 7Cs that describe the Tripod instrument were intended to capture seven distinct dimensions on which students can reliably discriminate among teachers or whether the 7Cs were merely intended to be more heuristic domains that map out important aspects of teaching” (p. 1859); hence, this is also important to keep in mind given the study’s findings.

As per study authors, and to their knowledge, “this study [was] the first to systematically investigate the multidimensionality of the Tripod student perception survey” (p. 1863).

Citation: Wallace, T. L., Kelcey, B., & Ruzek, E. (2016). What can student perception surveys tell us about teaching? Empirically testing the underlying structure of the Tripod student perception survey. American Educational Research Journal, 53(6), 1834–1868. doi:10.3102/0002831216671864 Retrieved from http://journals.sagepub.com/doi/pdf/10.3102/0002831216671864

New (Unvetted) Research about Washington DC’s Teacher Evaluation Reforms

In November of 2013, I published a blog post about a “working paper” released by the National Bureau of Economic Research (NBER) and written by Thomas Dee, Professor of Economics and Educational Policy at Stanford, and James Wyckoff, Professor of Economics and Educational Policy at the University of Virginia. In the study, titled “Incentives, Selection, and Teacher Performance: Evidence from IMPACT,” Dee and Wyckoff (2013) analyzed the controversial IMPACT educator evaluation system that was put into place in Washington DC Public Schools (DCPS) under then-Chancellor Michelle Rhee. In this paper, Dee and Wyckoff (2013) presented what they termed “novel evidence” to suggest that the “uniquely high-powered incentives” linked to “teacher performance” via DC’s IMPACT initiative worked to improve the performance of high-performing teachers, and that dismissal threats worked to increase the voluntary attrition of low-performing teachers as well as to improve the performance of the students of the teachers who replaced them.

I critiqued this study in full (see both short and long versions of this critique here), and ultimately asserted that the study had “fatal flaws” which compromised the (exaggerated) claims Dee and Wyckoff (2013) advanced. These flaws included, but were not limited to, the fact that only 17% of the teachers included in this study (i.e., teachers of reading and mathematics in grades 4 through 8) were actually evaluated under the value-added component of the IMPACT system. Put inversely, 83% of the teachers included in this study about teachers’ “value-added” did not have student test scores available to determine whether they were indeed of “added value.” That is, 83% of the teachers evaluated were instead assessed on their overall levels of effectiveness, or subsequent increases/decreases in effectiveness, as per only the subjective observational and other self-report data included within the IMPACT system. Hence, while the authors’ findings were presented as hard fact, their (exaggerated) conclusions did not generalize across teachers, given these sample limitations and despite what they claimed.

In short, the extent to which Dee and Wyckoff (2013) oversimplified very complex data, and with it a very complex context and policy situation, and then exaggerated questionable findings was at issue, and should have been reconciled prior to the study’s release. I should add that this study was published in 2015 in the (economics-oriented, not educational-policy-specific) Journal of Policy Analysis and Management (see here), although I have not since revisited the piece to compare (e.g., via a content analysis) the original 2013 version to the final 2015 version.

Anyhow, they are at it again. Just this past January (2017) they published another report, this time alongside two additional authors: Melinda Adnot, a Visiting Assistant Professor at the University of Virginia, and Veronica Katz, an Educational Policy PhD student, also at the University of Virginia. This study, titled “Teacher Turnover, Teacher Quality, and Student Achievement in DCPS,” was also (prematurely) released as a “working paper” by the same NBER, again without any internal or external vetting, but (irresponsibly) released “for discussion and comment.”

Hence, I provide my “discussion and comments” below, all the while underscoring how this practice continues to be problematic, also given the fact that I was contacted by the media for comment. Frankly, no media reports should be released about these (or, for that matter, any other) “working papers” until they are not only internally but also externally reviewed (e.g., in press or published, post vetting). Unfortunately, as it too commonly does, NBER released this report clearly without such concern. Now we, the public, are responsible for consuming this study with much critical caution, while also advocating that others do the same (and helping them to do so). Hence, I write into this post my critiques of this particular study.

First, the primary assumption (i.e., the “conceptual model”) driving this Adnot, Dee, Katz, and Wyckoff (2016) piece is that low-performing teachers should be identified and replaced with more effective teachers. This is akin to the assumption noted in the first Dee and Wyckoff (2013) piece. It should be noted here that in DCPS, teachers rated as “Ineffective,” or rated as “Minimally Effective” in consecutive years, are “separated” from the district; hence, DCPS has adopted educational policies that align with this “conceptual model” as well. Interesting to note is how researchers, purportedly external to DCPS, entered into this study with the same a priori “conceptual model.” This, in and of itself, is an indicator of researcher bias (more on this below).

Nonetheless, Adnot et al.’s (2016) highlighted finding was that “on average, DCPS replaced teachers who left with teachers who increased student achievement by 0.08 SD [standard deviations] in math.” Buried further into the report, they also found that DCPS replaced teachers who left with teachers who increased student achievement by 0.05 SD in reading (at not a 5% but a 10% statistical significance level). These findings, in simpler but also more realistic terms, mean that (if actually precise and correct, also given all of the problems with how teacher classifications were determined at the DCPS level) “effective” mathematics teachers who replaced “ineffective” mathematics teachers increased student achievement by approximately 2.7%, and “effective” reading teachers who replaced “ineffective” reading teachers increased student achievement by approximately 1.7% (again, at not a 5% but a 10% statistical significance level). These are hardly groundbreaking results, as these proportional movements likely represented one or maybe two total test items on the large-scale standardized tests used to assess DCPS’s policy impacts.
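To illustrate the “one or maybe two test items” point, here is a back-of-the-envelope conversion under purely hypothetical assumptions; the test length and raw-score standard deviation below are illustrative placeholders, not figures from the study or from DCPS.

% Suppose, hypothetically, a 60-item test with a student-level raw-score
% standard deviation of roughly 12 items correct. An effect of d standard
% deviations then converts to raw items as follows:
\[
  \text{items gained} \approx d \times SD_{\text{raw}}
  = 0.08 \times 12 \approx 1.0 \text{ items (math)}, \qquad
  0.05 \times 12 = 0.6 \text{ items (reading)}.
\]

Under these (again, hypothetical) assumptions, the reported effects amount to roughly one additional item answered correctly per student in mathematics, and less than one in reading.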

Interesting to also note is that not only were these “small effects” exaggerated to mean much more than what they are actually worth (more on this below), but also that only the larger of the two findings, the mathematics finding, is highlighted in the abstract. The complementary and smaller reading effect is buried in the text. Also buried is that these findings pertain only to general education teachers in grades four through eight who were value-added eligible, akin to Dee and Wyckoff’s (2013) earlier piece (e.g., typically 30% of a school’s population, although Dee and Wyckoff’s (2013) piece marked this percentage at 17%).

As mentioned, none of this would likely have happened had this piece been internally and/or externally reviewed prior to its release.

Regardless, Adnot et al. (2016) also found that the attrition of relatively higher-performing teachers (e.g., “Effective” or “Highly Effective”) had a negative, but statistically insignificant, effect on student achievement.

Hence, the only “finding” highlighted in the abstract of this piece was not, in fact, the only finding; other findings were (perhaps purposefully) buried in the text. It is possible, in other words, that because these other findings did not support the researchers’ a priori conclusions and claims, the researchers chose not to draw attention to them (e.g., in the abstract).

Relatedly, I should note that in a few places the authors exaggerate how tangible teachers’ effects on their students’ achievement supposedly are, without any mention of contrary reports, namely the statement published by the American Statistical Association (ASA), in which the ASA noted that these (oft-exaggerated) teacher effects account for no more than 1%-14% of the variance in students’ growth scores (see more information here). In fact, teacher effectiveness is very likely not “qualitatively large,” as Adnot et al. (2016) argue without evidence in this piece, and also imply throughout as a foundational part of their aforementioned “conceptual model.”
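As a rough point of reference (my arithmetic, not the ASA’s or the authors’), variance shares of 1% to 14% translate, in correlation terms, into associations of about

\[
  r = \sqrt{0.01} = 0.10 \quad \text{to} \quad r = \sqrt{0.14} \approx 0.37,
\]

i.e., weak to moderate associations, which are difficult to square with characterizations of teacher effects as “qualitatively large.”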

Likewise, while most everyone would likely agree that there are “stark inequities” in students’ access to effective teachers, how to improve this condition is certainly a matter of great debate, a point neither explicitly nor implicitly acknowledged throughout this piece. Rather, much disagreement and debate still exist regarding whether inducing teacher turnover will get “us” anywhere, really, in terms of school reform, as also related to how big (or small) teachers’ effects on students’ measurable performance actually are, as discussed prior. Accordingly, and perhaps not surprisingly, Adnot et al. (2016) cite only the articles of other researchers who are, rather, members of their proverbial choir (e.g., Eric Hanushek, who, without actual evidence, has been hypothesizing for nearly a decade now about how replacing “ineffective” teachers with “effective” teachers will reform America’s schools) to support these same a priori conclusions. Consequently, the professional integrity of the researchers must be called into question given these simple (albeit biased) errors.

Taking all of this into consideration, I would hardly call the findings advanced in this piece (and emphasized in the abstract) solid indicators of the “overall positive effects of teacher turnover,” given only one statistically, but not practically, significant finding of note in mathematics (i.e., a 2.7% increase, if accurate). None of this could or should, accordingly, lead anyone to conclude that “the supply of entering teachers appears to be of sufficient quality to sustain a relatively high turnover rate.”

Hence, this is yet another case of these authors oversimplifying very complex data, and with it a very complex context and policy situation, and then exaggerating negligible findings while dismissing others.

Relatedly, would we not expect greater results, given that teachers deemed highly effective are to receive one-time bonuses of up to $25,000 and permanent increases to their base salaries of up to $27,000 per year? This bang, or lack thereof, may not be worth the buck, either.

Additionally, is an annual attrition rate of “low-performing teachers” (e.g., those classified as such for one year, or for two consecutive years) in the district, currently hovering at around 46%, worth these diminutive results?

Did they also actually find, overall, that “high-poverty schools actually improve as a result of teacher turnover”? I don’t think so, but do give this study a full read to test their conclusions, as well as mine, for yourself (see, again, the full study here).

In the end, Adnot et al. (2016) do conclude that they found “that the overall effect of teacher turnover in DCPS conservatively had no effect on achievement and, under reasonable assumptions, improved achievement.” This is a MUCH more balanced interpretation of this study, although I would certainly question their “reasonable assumptions” (see prior). Moreover, it is curious why we had to wait until the end for the study’s actual headline. This is especially important given that others, including members of the media, the public, and the policymaking community, might not make it that far (i.e., trusting only what is in the abstract).

Citations:

Adnot, M., Dee, T., Katz, V., & Wyckoff, J. (2016). Teacher turnover, teacher quality, and student achievement in DCPS [Washington DC Public Schools]. Cambridge, MA: National Bureau of Economic Research (NBER). Retrieved from http://www.nber.org/papers/w21922.pdf

Dee, T., & Wyckoff, J. (2013). Incentives, selection, and teacher performance: Evidence from IMPACT. Cambridge, MA: National Bureau of Economic Research (NBER). Retrieved from http://www.nber.org/papers/w19529.pdf