“Virginia SGP” Wins in Court Against State


Virginia SGP, also known as Brian Davison — a parent of two public school students in the affluent Loudoun, Virginia area (hereafter referred to as Virginia SGP) — has been an avid (and sometimes abrasive) commentator about value-added models (VAMs), defined generically, on this blog (see, for example, here, here, and here), on Diane Ravitch’s blog (see, for example, here, here, and here), and elsewhere (e.g., Virginia SGP’s Facebook page here). He is an advocate and promoter of the use of VAMs (which are in this particular case Student Growth Percentiles (SGPs); see differences between VAMs and SGPs here and here) to evaluate teachers, and he is an advocate and promoter of the release of teachers’ SGP scores to parents and the general public for their consumption and use.

Related, and as described in a Washington Post article published in March of 2016, Virginia SGP “…Pushed [Virginia] into Debate of Teacher Privacy vs. Transparency for Parents” as per teachers’ SGP data. This occurred via a lawsuit Virginia SGP filed against the state, attempting to force the release of teachers’ SGP data for all teachers across the state. More specifically, and akin to what happened in 2010 when the Los Angeles Times published the names and VAM-based ratings of thousands of teachers teaching in the Los Angeles Unified School District (LAUSD), Virginia SGP “pressed for the data’s release because he thinks parents have a right to know how their children’s teachers are performing, information about public employees that exists but has so far been hidden. He also wants to expose what he says is Virginia’s broken promise to begin using the data to evaluate how effective the state’s teachers are.” He thinks that “teacher data should be out there,” especially if taxpayers are paying for it.

In January, a Richmond, Virginia judge ruled in Virginia SGP’s favor, despite the state’s claims that Virginia school districts, notwithstanding the state’s investments, had reportedly not been using the SGP data, “calling them flawed and unreliable measures of a teacher’s effectiveness.” And even though this ruling was challenged by state officials and the Virginia Education Association thereafter, Virginia SGP posted via his Facebook page the millions of student records the state released in compliance with the court, with teacher names and other information redacted.

This past Tuesday, however, and despite the challenges to the court’s initial ruling, came another win for Virginia SGP, as well as another loss for the state of Virginia. See the article “Judge Sides with Loudoun Parent Seeking Teachers’ Names, Student Test Scores,” published yesterday in a local Loudoun, Virginia, news outlet.

The author of this article, Danielle Nadler, explains more specifically that, “A Richmond Circuit Court judge has ruled that [the] VDOE [Virginia Department of Education] must release Loudoun County Public Schools’ Student Growth Percentile [SGP] scores by school and by teacher…[including] teacher identifying information.” The judge noted “that VDOE and the Loudoun school system failed to ‘meet the burden of proof to establish an exemption’” under Virginia’s Freedom of Information Act [FOIA]. The court also ordered VDOE to pay Davison $35,000 to cover his attorney fees and other costs. This final order was dated April 12, 2016.

As the article reports, “Davison said he plans to publish the information on his ‘Virginia SGP’ Facebook page. Students will not be identified, but some of the teachers will.” “I may mask the names of the worst performers when posting rankings/lists, but other members of the public can analyze the data themselves to discover who those teachers are,” Virginia SGP said.

I have exchanged messages with Virginia SGP both before and since this ruling, and I have explicitly invited him to comment via this blog. While I disagree with this objective and the subsequent ruling, although I do believe in transparency, the ruling is nonetheless newsworthy in the realm of VAMs and for followers/readers of this blog. Comment now and/or do stay tuned for more.


Alabama’s “New” Teacher Accountability System “Shelved”


I wrote in two prior posts (here and here) about how in Alabama, just this past January, the Grand Old Party (GOP) put forth a bill titled the Rewarding Advancement in Instruction and Student Excellence (RAISE) Act. Its purpose was to promote the use of students’ test scores to grade and pay teachers annual bonuses (i.e., “supplements”) as per their performance, and “provide a procedure for observing and evaluating teachers” to help make other “significant differentiation[s] in pay, retention, promotion, dismissals, and other staffing decisions, including transfers, placements, and preferences in the event of reductions in force, [as] primarily [based] on evaluation results.” The bill was also to have extended the time to obtain tenure from three to five years, and all teachers’ test-based measures were to account for up to 25% of their yearly evaluations. Likewise, all of this was to happen at the state level, regardless of the fact that the state was no longer required to move forward with such a teacher evaluation system following the passage of the Every Student Succeeds Act (ESSA; see prior posts about the ESSA here, here, and here).

Now it looks like this bill has been officially “shelved.”

As per an Associated Press release here, the GOP leader of the Alabama Senate, Del Marsh, says that “he is shelving his education bill for the session after continued resistance to the idea of tying teacher evaluations to test score improvement…he was frustrated by the pushback from some education groups.”

This is great news for educators in Alabama, as well as potentially other states (e.g., Georgia, see for example here) regarding trends to come. See also a prior post on what other states in the south are similarly doing here.


Teachers’ “Similar” Value-Added Estimates Yield “Different” Meanings Across “Different” Contexts


Some, particularly educational practitioners, might respond with a sense of “duh”-like sarcasm to the title of this post, but in a new study recently released in the highly reputable, peer-reviewed American Educational Research Journal (AERJ), researchers evidenced this very headline via an extensive study conducted in the northeastern United States. Hence, the title has now been substantiated with empirical evidence.

Researchers David Blazar (Doctoral Candidate at Harvard), Erica Litke (Assistant Professor at University of Delaware), and Johanna Barmore (Doctoral Candidate at Harvard) examined (1) the comparability of teachers’ value-added estimates within and across four urban districts and (2), given the extent of the variations observed, how and whether said value-added estimates consistently captured differences in teachers’ observed, videotaped, and scored classroom practices.

Regarding their first point of investigation, they found that teachers were categorized differently when compared within versus across districts (i.e., when compared to other similar teachers within districts versus across districts, which is a methodological choice that value-added modelers often make). The researchers did not find or assert that either approach yielded more valid interpretations, however. Rather, they evidenced that the differences they observed within and across districts were notable, and that these differences had notable implications for validity, whereby a teacher classified as adding X value in one context could be categorized as adding Y value in another, given the context in which (s)he was teaching. In other words, the validity of the inferences to be drawn about potentially any teacher depended greatly on the context in which the teacher taught, in that his/her value-added estimate did not necessarily generalize across contexts. Put in their words, “it is not clear whether the signal of teachers’ effectiveness sent by their value-added rankings retains a substantive interpretation across contexts” (p. 326). Inversely put, “it is clear that labels such as highly effective or ineffective based on value-added scores do not have fixed meaning” (p. 351).
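
To make the within- versus across-district issue concrete, here is a minimal, hypothetical sketch (in Python, not taken from the study itself, and using entirely simulated numbers) of how standardizing teachers’ value-added estimates within their own district versus pooling across districts can shift how the same teacher is categorized.

```python
# A minimal, hypothetical sketch (not the authors' code): the same teacher's
# value-added estimate can be categorized differently when standardized within
# her own district versus pooled across districts. All data are simulated.
import numpy as np

rng = np.random.default_rng(0)

# Simulated raw value-added estimates for two districts with different means.
district_a = rng.normal(loc=0.10, scale=0.20, size=500)   # higher-scoring district
district_b = rng.normal(loc=-0.10, scale=0.20, size=500)  # lower-scoring district

def zscore(x):
    return (x - x.mean()) / x.std()

# Within-district standardization: each teacher is compared only to district peers.
within_b = zscore(district_b)

# Across-district (pooled) standardization: all teachers are compared together.
pooled = zscore(np.concatenate([district_a, district_b]))
pooled_b = pooled[len(district_a):]

# Take a District B teacher at roughly the 80th percentile of her own district.
teacher = np.argsort(within_b)[int(0.80 * len(within_b))]
print("Within-district z-score:", round(float(within_b[teacher]), 2))
print("Pooled z-score:         ", round(float(pooled_b[teacher]), 2))
# The same estimate looks clearly "above average" within the district but much
# closer to average when pooled, illustrating the labeling problem described above.
```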

Regarding their second point of investigation, they found “stark differences in instructional practices across districts among teachers who received similar within-district value-added rankings” (p. 324). In other words, “when comparing [similar] teachers within districts, value-added rankings signaled differences in instructional quality in some but not all instances” (p. 351); that is, similarly ranked teachers did not necessarily display similarly effective or ineffective instruction. This has also been more loosely evidenced by those who have investigated the correlations between teachers’ value-added and observational scores and have found weak to moderate correlations (see prior posts on this here, here, here, and here). In the simplest of terms, “value-added categorizations did not signal common sets of instructional practices across districts” (p. 352).

The bottom line, then, is that those in charge of making consequential decisions about teachers based, even in part, on teachers’ value-added estimates need to be cautious when attaching high-stakes decisions to those estimates. Based on the evidence presented in this particular study, a teacher could logically, and also legally, argue that had (s)he been teaching in a different district, even within the same state and using the same assessment instruments, (s)he could have received a substantively different value-added score, given the teachers to whom (s)he would have been compared when his/her value-added was estimated elsewhere. Hence, the validity of inferences and statements asserting that a teacher was effective or not, as based on his/her value-added estimates, is suspect, again depending on the context in which the teacher taught and on whichever other comparable teachers were used when estimating teacher-level value-added. “Here, the instructional quality of the lowest ranked teachers was not particularly weak and in fact was as strong as the instructional quality of the highest ranked teachers in other districts” (p. 353).

This has serious implications, not only for practice but also for the lawsuits ongoing across the nation, especially in terms of those pertaining to teachers’ wrongful terminations, as charged.

Citation: Blazar, D., Litke, E., & Barmore, J. (2016). What does it mean to be ranked a “high” or “low” value-added teacher? Observing differences in instructional quality across districts. American Educational Research Journal, 53(2), 324–359. doi:10.3102/0002831216630407


VAMs in Higher Ed: Goose and Gander


A good one not to miss, from Diane Ravitch’s blog over the weekend:

Reader Chiara has a great idea. It will turn [pro-VAM] professors against VAM.

She writes:

“I’m wondering if there’s ever been any discussion of ranking teachers by student test scores in higher ed.

“It would obviously be more difficult to do, but one could use the tests that are used for graduate schools, right? How much “value” did undergrad professors add?

“If you back this approach in K-12, why wouldn’t you back it in higher ed?”

Let us see the VAM scores for Raj Chetty, Jonah Rockoff, John Friedman, Thomas Kane, and all the other professors who endorse VAM. Be sure their ratings are posted in public. And while we are at it, all professors who testified against teacher tenure should give up their own tenure. On principle.

Goose and gander.

Even though the professors listed above likely do not teach many undergraduate courses, the logic is much the same, not only about using such tests to theoretically hold these (and other) professors accountable, but also regarding tenure and the protections tenure brings, also in higher education. Hence, this certainly is cause for pause.


The “Vergara v. California” Decision Reversed: Another (Huge) Victory in Court


In June of 2014, defendants in “Vergara v. California” in Los Angeles, California lost their case. As a reminder, plaintiffs included nine public school students (backed by some serious corporate reformer funds as per Students Matter) who challenged five California state statutes that supported the state’s “ironclad [teacher] tenure system.” The plaintiffs’ argument was that students’ rights to a good education were being violated by teachers’ job protections…protections that were making it too difficult to fire “grossly ineffective” teachers. The plaintiffs’ suggested replacement for the “old” way of doing this, of course, was to use value-added scores to make “better” decisions about which teachers to fire and whom to keep around.

In February of 2016, “Vergara v. California” was appealed, back in Los Angeles.

Released, yesterday, was the Court of Appeal’s decision reversing the trial court’s earlier decision. As per an email I received also yesterday from one of the lawyers involved, “The unanimous decision holds that the plaintiffs did not establish their equal protection claim because they did not show that the challenged [“ironclad” tenure] laws themselves cause harm to poor students or students of color.” Accordingly, the Court of Appeal “ordered that judgment be entered for the defendants (the state officials and teachers’ unions)…[and]…this should end the case, and copycat cases in other parts of the country [emphasis added].” However, plaintiffs have already announced their intent to appeal this ruling to the California Supreme Court.

Please find attached here, as certified for publication, the actual Court of Appeal decision. See also a post here about this reversal authored by California teachers’ unions. See also here more information released by the California Teachers Association.

See also the amicus brief that a large set of deans and professors across the country contributed to/signed to help in this reversal.


The Marzano Observational Framework’s Correlations with Value-Added


“Good news for schools using the Marzano framework,”…according to Marzano researchers. “One of the largest validation studies ever conducted on [the] observation framework shows that the Marzano model’s research-based structure is correlated with state VAMs.” See this claim, via the full report here.

The more specific claim is as follows: Study researchers found a “strong [emphasis added, see discussion forthcoming] correlation between Dr. Marzano’s nine Design Questions [within the model’s Domain 1] and increased student achievement on state math and reading scores.” All correlations were positive, with the highest correlation just below r = 0.4 and the lowest correlation just above r = 0.0. See the actual correlations illustrated here:

[Figure: correlations between the nine Design Questions and state mathematics and reading scores, as reported in the Marzano study]

See also a standard model to categorize such correlations, albeit out of any particular context, below. Using this, one can see that the correlations observed were indeed small to moderate, but not “strong” as claimed. Elsewhere, as also cited in this report, other observed correlations from similar studies on the same model ranged from r = 0.13 to 0.15, r = 0.14 to 0.21, and r = 0.21 to 0.26. While these are also noted as statistically significant, using the table below one can determine that statistical significance does not necessarily mean that such “very weak” to “weak” correlations are of much practical significance, especially if and when high-stakes decisions about teachers and their effects are to be attached to such evidence.

[Figure: a standard table categorizing the strength of correlation coefficients, from very weak to very strong]

Likewise, if such results (i.e., 0.0 < r < 0.4) sound familiar, they should, as a good number of researchers have set out to explore similar correlations in the past, using different value-added and observational data, and these researchers have also found similar zero-to-moderate (i.e., 0.0 < r < 0.4), but not (and dare I say never) “strong” correlations. See prior posts about such studies, for example, here, here, and here. See also the authors’ Endnote #1 in their report, again, here.
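
For readers who want to see just how little these correlations explain in practical terms, here is a small illustrative calculation (using only the correlation values quoted above) converting each r into the proportion of shared variance (r squared).

```python
# Converting the correlations quoted above into shared variance (r^2) shows how
# little of teachers' value-added these observation scores actually explain.
correlations = [0.13, 0.15, 0.21, 0.26, 0.40]  # 0.40 = approximate upper bound reported

for r in correlations:
    print(f"r = {r:.2f}  ->  shared variance = {r ** 2:.1%}")
# Even at r = 0.40, only about 16% of the variance is shared; at r = 0.13, under 2%.
```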

As the authors write: “When evaluating the validity of observation protocols, studies [sic] typically assess the correlations between teacher observation scores and their value-added scores.” This is true, but this is true only in that such correlations offer only one piece of validity evidence.

Validity, or rather evidencing that something from which inferences are drawn is in fact valid, is MUCH more complicated than simply running these types of correlations. Rather, the type of evidence that these authors are exploring is called convergent-related evidence of validity; however, for something to actually be deemed valid, MUCH more validity evidence is needed (e.g., content-, consequence-, predictive-, etc.- related evidence of validity). See, for example, some of the Educational Testing Service (ETS)’s Michael T. Kane’s work on validity here. See also The Standards for Educational and Psychological Testing developed by the American Educational Research Association (AERA), American Psychological Association (APA), and National Council on Measurement in Education (NCME) here.

Instead, in this report the authors write that “Small to moderate correlations permit researchers to claim that the framework is validated (Kane, Taylor, Tyler, & Wooten, 2010).” This is false. It also unfortunately demonstrates a very naive conception and unsophisticated treatment of validity. This is further illustrated in that the authors use one external citation, authored by Thomas Kane (NOT the aforementioned validity expert Michael Kane), to allege that validity can be (and is being) claimed. Here is the actual Thomas Kane et al. article the Marzano authors reference to support their validity claim, also noting that nowhere in this piece do Thomas Kane et al. make this actual claim. In fact, a search for “small” or “moderate” correlations yields zero total hits.

In the end, what can be more fairly and appropriately asserted from this research report is that the Marzano model is indeed correlated with value-added estimates, and their correlation coefficients fall right in line with all other correlation coefficients evidenced via other current studies on this topic, again, whereby researchers have correlated multiple observational models with multiple value-added estimates. These correlations are small to moderate, and certainly not “strong,” and definitely not “strong” enough to warrant high-stakes decisions (e.g., teacher termination) given everything (i.e., the unexplained variance) that is still not captured among these multiple measures…and that still threatens the validity of the inferences to be drawn from these measures combined.


Kane Is At It, Again: “Statistically Significant” Claims Exaggerated to Influence Policy


In a recent post, I critiqued a fellow academic and value-added model (VAM) supporter — Thomas Kane, an economics professor from Harvard University who also directed the $45 million worth of Measures of Effective Teaching (MET) studies for the Bill & Melinda Gates Foundation. Kane has been the source of multiple posts on this blog (see also here, here, and here) as he is a very public figure, very often backing, albeit often only in non-peer-reviewed technical reports and documents, series of exaggerated “research-based” claims. In this prior post, I more specifically critiqued the overstated claims he made in a recent National Public Radio (NPR) interview titled: “There Is No FDA For Education. Maybe There Should Be.”

Well, a colleague recently emailed me another such document authored by Kane (and co-written with four colleagues), titled: “Teaching Higher: Educators’ Perspectives on Common Core Implementation.” While this one is quite methodologically sound (i.e., as assessed via a thorough read of the main text of the document, including all footnotes and appendices), it is Kane’s set of claims, again, that is of concern, especially knowing that this report, even though it too has not yet been externally vetted or reviewed, will likely have a policy impact. The main goal of this report is clearly (although not made explicit) to endorse, promote, and in many ways save the Common Core State Standards (CCSS). I emphasize the word save in that clearly, and especially since the passage of the Every Student Succeeds Act (ESSA), many states have rejected the still highly controversial Common Core. I should also note that the researchers clearly conducted this study with such a priori conclusions in mind (i.e., that the Common Core should be saved/promoted); hence, future peer review of this piece may be out of the question, as the bias evident in the sets of findings would certainly be a “methodological issue,” again, likely preventing a peer-reviewed publication (see, for example, the a priori conclusion that “[this] study highlights an important advantage of having a common set of standards and assessments across multiple states,” in the abstract (p. 3)).

First I will comment on the findings regarding the Common Core, as related to value-added models (VAMs). Next, I will comment on Section III of the report, about “Which [Common Core] Implementation Strategies Helped Students Succeed?” (p. 17). This is where Kane and colleagues “link[ed] teachers’ survey responses [about the Common Core] to their students’ test scores on the 2014–2015 PARCC [Partnership for Assessment of Readiness for College and Careers] and SBAC [Smarter Balanced Assessment Consortium] assessments [both of which are aligned to the Common Core Standards]… This allowed [Kane et al.] to investigate which strategies and which of the [Common Core-related] supports [teachers] received were associated with their performance on PARCC and SBAC,” controlling for a variety of factors including teachers’ prior value-added (p. 17).

With regards to the Common Core sections, Kane et al. make claims like: “Despite the additional work, teachers and principals in the five states [that have adopted the Common Core = Delaware, Maryland, Massachusetts, New Mexico, and Nevada] have largely embraced [emphasis added] the new standards” (p. 3). They mention nowhere, however, the mediating set of influences interfering with such a claim, which likely led to this claim entirely or at least in part: that many teachers across the nation have been forced, by prior federal and current state mandates (e.g., in New Mexico), to “embrace the new standards.” Rather, Kane et al. imply throughout the document that this “embracement” is a sure sign that teachers and principals are literally taking the Common Core into their open arms. The same interference is at play with their similar claim that “Teachers in the five study states have made major changes [emphasis in the original] in their lesson plans and instructional materials to meet the CCSS” (p. 3). Compliance is certainly an intervening factor, again, likely contaminating and distorting the validity of both of these claims (which are two of the four total claims highlighted throughout the document (p. 3)).

Elsewhere, Kane et al. claim that “The new standards and assessments represent a significant challenge for teachers and students” (p. 6), along with an accompanying figure they use to illustrate how proficiency (i.e., the percent of students labeled as proficient) on these five states’ prior tests has decreased, indicating more rigor or a more “significant challenge for teachers and students” thanks to the Common Core. What they completely ignore again, however, is that the cut scores used to define “proficiency” are arbitrary per state, as was their approach to define “proficiency” across states in comparison (see footnote four). What we also know from years of research on such tests is that whenever a state introduces a “new and improved” test (e.g., the PARCC and SBAC tests), which is typically tied to “new and improved standards” (e.g., the Common Core), lower “proficiency” rates are observed. This has happened countless times across states, and certainly prior to the introduction of the PARCC and SBAC tests. Thereafter, the state typically responds with the same types of claims, that “The new standards and assessments represent a significant challenge for teachers and students.” These claims are meant to signal to the public that at last “we” are holding our teachers and students accountable for their teaching and learning, but thereafter, again, proficiency cut scores are arbitrarily redefined (among other things), and then five or ten years later “new and improved” tests and standards are needed again. In other words, this claim is nothing new and it should not be interpreted as such, but it should rather be interpreted as aligned with Einstein’s definition of insanity (i.e., repeating the same behaviors over and over again in the hopes that different results will ultimately materialize) as this is precisely what we as a nation have been doing since the minimum competency era in the early 1980s.

Otherwise, Kane et al.’s other two claims were related to “Which [Common Core] Implementation Strategies Helped Students Succeed” (p. 17), as mentioned. They assert first that “In mathematics, [they] identified three markers of successful implementation: more professional development days, more classroom observations with explicit feedback tied to the Common Core, and the inclusion of Common Core-aligned student outcomes in teacher evaluations. All were associated with statistically significantly [emphasis added] higher student performance on the PARCC and [SBAC] assessments in mathematics” (p. 3, see also p. 20). They assert second that “In English language arts, [they] did not find evidence for or against any particular implementation strategies” (p. 3, see also p. 20).

What is highly problematic about these claims is that the three correlated implementation strategies noted, again as significantly associated with teachers’ students’ test-based performance on the PARCC and SBAC mathematics assessments, were “statistically significant” (determined by standard p or “probability” values under which findings that may have happened due to chance are numerically specified). But, they were not really practically significant, at all. There IS a difference whereby “statistically significant” findings may not be “practically significant,” or in this case “policy relevant,” at all. While many misinterpret “statistical significance” as an indicator of strength or importance, it is not. Practical significance is.

As per the American Statistical Association’s (ASA) recently released “Statement on P-Values,” statistical significance “is not equivalent to scientific, human, or economic significance…Any effect, no matter how tiny, can produce a small p-value [i.e., “statistical significance”] if the sample size or measurement precision is high enough” (p. 10); hence, one must always check for practical significance when making claims about statistical significance, as Kane et al. make such claims here, albeit in a similarly inflated vein.
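
A small simulated sketch (not drawn from Kane et al.’s data; all values here are made up) illustrates the ASA’s point: with a large enough sample, even a trivially small correlation comes out “statistically significant” while explaining almost nothing.

```python
# A simulated sketch of the ASA's point: with a large enough sample, a trivially
# small correlation is "statistically significant" without being practically so.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 100_000  # very large sample

x = rng.normal(size=n)
y = 0.03 * x + rng.normal(size=n)  # true relationship is tiny (r is roughly 0.03)

r, p = stats.pearsonr(x, y)
print(f"r = {r:.3f}, p = {p:.2e}")        # p falls far below conventional thresholds
print(f"shared variance = {r ** 2:.2%}")  # yet x explains well under 1% of y
```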

As their Table 6 shows (p. 20), the regression coefficients related to these three areas of “statistically significant” influence on teachers’ students’ test-based performance on the new PARCC and SBAC mathematics tests (i.e., more professional development days, more classroom observations with explicit feedback tied to the Common Core, and the inclusion of Common Core-aligned student outcomes in teacher evaluations) yielded the following coefficients, respectively: 0.045 (p < 0.01), 0.044 (p < 0.05), and 0.054 (p < 0.01). They then use as an example the 0.044 (p < 0.05) coefficient (as related to more classroom observations with explicit feedback tied to the Common Core) and explain that “a difference of one standard deviation in the observation and feedback index was associated with an increase of 0.044 standard deviations in students’ mathematics test scores—roughly the equivalent of 1.4 scale score points on the PARCC assessment and 4.1 scale score points on the SBAC.”

In order to generate a sizable and policy relevant improvement in test scores (e.g., by half of a standard deviation), the observation and feedback index would have to jump up by more than 11 standard deviations! In addition, given that scale score points do not equal raw or actual test items (e.g., scale-score-to-actual-test-item relationships are typically in the neighborhood of 4 or 5 scale score points to 1 actual test item), this likely also means that Kane’s interpretations (i.e., that mathematics scores improved by roughly the equivalent of 1.4 scale score points on the PARCC and 4.1 scale score points on the SBAC) actually amount to roughly a quarter to a third of a test item in mathematics on the PARCC and roughly four-fifths of, or about one, test item on the SBAC. This hardly “Provides New Evidence on Strategies Related to Improved Student Performance,” unless you define improved student performance as something as little as a fraction of a single test item.
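
For those who want to check the arithmetic, here is a quick back-of-the-envelope calculation using only the figures quoted above; the conversion of roughly 4 or 5 scale score points per actual test item is the assumption noted in the text, not a figure taken from the report itself.

```python
# Back-of-the-envelope check of the magnitudes discussed above, using only the
# figures quoted in the text (the 4-to-5 scale-score-points-per-item conversion
# is the assumption noted above, not a figure from the report itself).
effect_per_sd = 0.044  # SD gain in math scores per 1 SD of the observation/feedback index
target_gain = 0.50     # a half-standard-deviation gain, as a "policy relevant" benchmark

print(round(target_gain / effect_per_sd, 1), "SDs of the index needed")  # ~11.4

points_per_item = (4, 5)  # assumed scale-score points per actual test item
for test, scale_points in [("PARCC", 1.4), ("SBAC", 4.1)]:
    low = scale_points / max(points_per_item)
    high = scale_points / min(points_per_item)
    print(f"{test}: roughly {low:.2f} to {high:.2f} of a single test item")
```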

Nor is this, as Kane et al. claim, “a moderately sizeable effect” (p. 21). These numbers should not even be reported, much less emphasized as policy relevant/significant, unless perhaps they were equivalent to at least 0.25 standard deviations on the test (as a quasi-standard/accepted minimum). Likewise, the same argument can be made about the other coefficients derived via these mathematics tests. See also the similar claims they asserted (e.g., that “students perform[ed] better when teachers [were] being evaluated based on student achievement” (p. 21)).

Because the abstract (and possibly the conclusions) are the sections of this paper likely to have the most educational/policy impact, especially when people do not read all of the text, footnotes, and appendix contents of the entire document, this is irresponsible, and in many ways contemptible. This is also precisely the reason why, again, Kane’s calls for a Food and Drug Administration (FDA) type of entity for education are so ironic (as explained in my prior post here).


What ESSA Means for Teacher Evaluation and VAMs


Within a prior post, I wrote in some detail about what the Every Student Succeeds Act (ESSA) means for the U.S., as well as for states’ teacher evaluation systems and their previously federally mandated adoption and use of growth and value-added models (VAMs), after President Obama signed the ESSA into law in December.

Diane Ravitch recently covered, in her own words, what ESSA means for teacher evaluation systems as well, in what she called Part II of a nine-part series on all key sections of ESSA (see Parts I-IX here). I thought Part II was important to share with you all, especially given that this particular post captures what followers of this blog are most interested in, although I do recommend that you also see what ESSA means for other areas of educational progress and reform (e.g., the Common Core, teacher education, charter schools) in her Parts I-IX.

Here is what she captured in her Part II post, however, copied and pasted here from her original post:

The stakes attached to testing: will teachers be evaluated by test scores, as Duncan demanded and as the American Statistical Association rejected? Will teachers be fired because of ratings based on test scores?

Short Answer:

The federal mandate on teacher evaluation linked to test scores, as created in the waivers, is eliminated in ESSA.

States are allowed to use federal funds to continue these programs, if they choose, or completely change their strategy, but they will no longer be required to include these policies as a condition of receiving federal funds. In fact, the Secretary is explicitly prohibited from mandating any aspect of a teacher evaluation system, or mandating a state conduct the evaluation altogether, in section 1111(e)(1)(B)(iii)(IX) and (X), section 2101(e), and section 8401(d)(3) of the new law.

Long Answer:

Chairman Alexander has been a longtime advocate of the concept, as he calls it, of “paying teachers more for teaching well.” As governor of Tennessee he created the first teacher evaluation system in the nation, and believes to this day that the “Holy Grail” of education reform is finding fair ways to pay teachers more for teaching well.

But he opposed the idea of creating or continuing a federal mandate and requiring states to follow a Washington-based model of how to establish these types of systems.

Teacher evaluation is complicated work and the last thing local school districts and states need is to send their evaluation system to Washington, D.C., to see if a bureaucrat in Washington thinks they got it right.

ESSA ends the waiver requirements in August 2016, so states or districts that choose to end their teacher evaluation systems may do so. Otherwise, states can make changes to their teacher evaluation systems, or start over and start a new system. The decision is left to states and school districts to work out.

The law does continue a separate, competitive funding program, the Teacher and School Leader Incentive Fund, to allow states, school districts, or non-profits or for-profits in partnership with a state or school district to apply for competitive grants to implement teacher evaluation systems to see if the country can learn more about effective and fair ways of linking student performance to teacher performance.


The ACT Testing Corporation (Unsurprisingly) Against America’s Opt-Out Movement


The Research and Evaluation section/division of the ACT testing corporation — ACT, Inc., the nonprofit also famously known for developing the college-entrance ACT test — recently released a policy issue brief titled “Opt-Outs: What Is Lost When Students Do Not Test.” What an interesting read, especially given ACT’s position and perspective as a testing company, one also likely being impacted by America’s opt-out-of-testing movement. Should it not be a rule that people writing on policy issues disclose all potential conflicts of interest? They did not here…

Regardless, last year throughout the state of New York, approximately 20% of students opted out of statewide testing. In the state of Washington more than 25% of students opted out. Large and significant numbers of students also opted out in Colorado, Florida, Oregon, Maine, Michigan, New Jersey, and New Mexico. Students are opting out, primarily because of community, parent, and student concerns about the types of tests being administered, the length and number of the tests administered, the time that testing and testing preparation takes away from classroom instruction, and the like.

Because many states also rely on ACT tests for statewide purposes, not just as college entrance exams, clearly this is of concern to ACT, Inc. But rather than the corporation rightfully positioning itself on this matter as a company with clear vested interests, ACT Issue Brief author Michelle Croft frames the piece as a genuine plea to help others understand why they should reject the opt-out movement, not opt out their own children, and generally help to curb and reduce the nation’s opt-out movement, given the movement’s purportedly negative effects.

Here are some of the reasons ACT’s Croft gives in support of not opting out, along with my research-informed commentaries per reason:

  • Scores on annual statewide achievement tests can provide parents, students, educators, and policymakers with valuable information—but only if students participate. What Croft does not note here is that such large-scale standardized test scores, without taking into account growth over time (an argument that actually works in favor of VAMs), are so highly correlated with student test-takers’ demographics that the scores do not often tell us much that we would not have known otherwise from student demographics alone. This is a very true, and also very unfortunate, reality, whereby with a small set of student demographics we can actually predict, with great (albeit imperfect) accuracy, students’ test scores without students taking the tests (see the sketch following this list). In other words, if 100% of students opted out, we could still use some of our most rudimentary statistical techniques to estimate what students’ scores would have been regardless; hence, this claim is false.
  • Statewide test scores are one of the most readily available forms of data used by educators to help inform instruction. This is also patently false. Teachers, on average and as per the research, do not use the “[i]ndividual student data [derived via these tests] to identify general strengths and weaknesses, [or to] identify students who may need additional support” for many reasons, including the fact that test scores often come back to teachers after their tested students have moved on to the next grade level. This is especially true given that tests administered at the district, school, or classroom levels, as compared to those administered at the state level, yield data that are much more instructionally useful. What Croft does not note is that many research studies, and researchers, have evidenced that the types of tests at the source of the opt-out movement are also the least instructionally useful (see a prior post on this topic here). Accordingly, Croft’s claim here also contradicts recent research written by some of the luminaries in the field of educational measurement, who collectively support the design of more instructionally useful and sensitive tests in general, to combat the perpetual claims like these surrounding large-scale standardized tests (see here).
  • Statewide test scores allow parents and educators to see how students measure up to statewide academic standards intended for all students in the state…[by providing] information about a student’s, school’s, or district’s standing compared to others in the state (or across states, if the assessment is used by more than one). See my first argument about student-level demographics, as the same holds true here. Whether these tests are better indicators of what students learned or of students’ demographics is certainly up for debate, and unfortunately most of the research evidence supports the latter (unless, perhaps, VAMs or growth models are used to measure large-scale growth over time).
  • Another benefit…is that the data gives parents an indicator of school quality that can help in selecting a school for their children. See my prior argument, again, especially in that test scores are also highly correlated with property/house values; hence, with great certainty one can pick a school just by picking a home one can afford or a neighborhood in which one would like to live, regardless of test scores, as the test scores of the surrounding schools will ultimately reveal themselves to match said property/house values.
  • While grades are important, they [are not as objective as large-scale test scores because they] can also be influenced by a variety of factors unrelated to student achievement, such as grade inflation, noncognitive factors separate from achievement (such as attendance and timely completion of assignments), unintentional bias, or unawareness of performance expectations in subsequent grades (e.g., what it means to be prepared for college). Large-scale standardized tests, of course, are not subject to such biases and unrelated influences, we are to assume and accept as an objective truth.
  • Opt-outs threaten the overall accuracy—and therefore the usefulness—of the data provided. Indeed, this is true, and it is also one of the arguably positive side effects of the opt-out movement, whereby without large enough samples of students participating in such tests, the extent to which test companies and others can generalize results to, in this case, larger student populations is statistically limited. Given that we have been relying on large-scale standardized tests to reform America’s education system for over 30 years now, yet we continue to face an “educational crisis” across America’s public schools, perhaps test-based reform policies are not the solution that testing companies like ACT, Inc. continue to argue they are. While perpetuating this argument in favor of reform is financially wise and lucrative, all at the taxpayer’s expense, little to no research exists to support that using such large-scale test-based information helps to reform or improve much of anything.
  • Student assessment data allows for rigorous examination of programs and policies to ensure that resources are allocated towards what works. The one thing large-scale standardized tests do help us do, especially as researchers and program evaluators, is examine and assess large-scale programs’ and other reform efforts’ impacts. Whether students should have to take tests for just this purpose, however, may also not be worth the nation’s and states’ financial and human resources and investments. With this, most scholars also agree, but more so now when VAMs are used for such large-scale research and evaluation purposes. VAMs are, indeed, a step in the right direction when we are talking about large-scale research.
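
As referenced in the first bullet above, here is a minimal, fully simulated sketch (hypothetical data and predictors, not ACT’s or any state’s actual data) of how a simple regression on school-level demographics alone can predict a large share of the variation in status test scores.

```python
# A fully simulated sketch (hypothetical data, not ACT's or any state's) of the
# first bullet's point: school-level demographics alone can predict a large share
# of the variation in status test scores.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
n_schools = 400

# Assumed demographic predictors, simulated for illustration only.
pct_free_lunch = rng.uniform(0.0, 1.0, n_schools)
pct_ell = rng.uniform(0.0, 0.4, n_schools)

# Simulated mean scores driven largely by demographics, plus noise.
scores = 80 - 25 * pct_free_lunch - 10 * pct_ell + rng.normal(0, 3, n_schools)

X = np.column_stack([pct_free_lunch, pct_ell])
model = LinearRegression().fit(X, scores)
print(f"R^2 from demographics alone: {model.score(X, scores):.2f}")  # roughly 0.85 here
```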

Author Croft, on behalf of ACT, then makes a series of recommendations to states regarding such large scale testing, again, to help curb the opt out movement. Here are their four recommendations, again, alongside my research-informed commentaries per recommendation:

  • Districts should reduce unnecessary testing. Interesting, here, is that states are not listed as an additional entity that should reduce unnecessary testing. See my prior comments, especially the one regarding the most instructionally useful tests being at the classroom, school, and/or district levels.
  • Educators and policymakers should improve communication with parents about the value gained from having all students take the assessments. Unfortunately, I would not start with the list provided in this piece. Perhaps this blog post will, however, help present a fairer interpretation of their recommendations and the research-based truths surrounding them.
  • Policymakers should discourage opting out…States that allow opt-outs should avoid creating laws, policies, or communications that suggest an endorsement of the practice. Such a recommendation is remiss, in my opinion, given the vested interests of the company making this recommendation.
  • Policymakers should support appropriate uses of test scores. I think we can all agree with this one, although large-scale test scores should not be used and promoted for accountability purposes, as also suggested herein, given that the research does not support that doing so actually works either. For a great, recent post on this, click here.

In the end, all of these recommendations, as well as the reasons that the opt-out movement should be thwarted, come via an Issue Brief authored and sponsored by a large-scale testing company. This fact, in and of itself, calls into question everything positioned here as a set of disinterested recommendations and reasons. This is unfortunate for ACT, Inc., given its role as the author and sponsor of this piece.


Some Lawmakers Reconsidering VAMs in the South


A few weeks ago in Education Week, Stephen Sawchuk and Emmanuel Felton wrote a post in its Teacher Beat blog about lawmakers, particularly in southern states, who are beginning to reconsider, via legislation, the role of test scores and value-added measures in their states’ teacher evaluation systems. Perhaps the tides are turning.

I tweeted this one out, but I also pasted this (short) one below to make sure you all, especially those of you teaching and/or residing in states like Georgia, Oklahoma, Louisiana, Tennessee, and Virginia, did not miss it.

Southern Lawmakers Reconsidering Role of Test Scores in Teacher Evaluations

After years of fierce debates over the effectiveness and fairness of the methodology, several southern lawmakers are looking to minimize the weight placed on so-called value-added measures, derived from how much students’ test scores changed, in teacher-evaluation systems.

In part because these states are home to some of the weakest teachers unions in the country, southern policymakers were able to push past arguments that the state tests were ill suited for teacher-evaluation purposes and that the system would punish teachers for working in the toughest classrooms. States like Louisiana, Georgia, and Tennessee became some of the earliest and strongest adopters of the practice. But in the past few weeks, lawmakers from Baton Rouge, La., to Atlanta have introduced bills to limit the practice.

In February, the Georgia Senate unanimously passed a bill that would reduce the student-growth component from 50 percent of a teacher’s evaluation down to 30 percent. Earlier this week, nearly 30 individuals signed up to speak on behalf of the bill at a State House hearing.

Similarly, Louisiana House Bill 479 would reduce student-growth weight from 50 percent to 35 percent. Tennessee House Bill 1453 would reduce the weight of student-growth data through the 2018-2019 school year and would require the state Board of Education to produce a report evaluating the policy’s ongoing effectiveness. Lawmakers in Florida, Kentucky, and Oklahoma have introduced similar bills, according to the Southern Regional Education Board’s 2016 educator-effectiveness bill tracker.

By and large, states adopted these test-score-centric teacher-evaluation systems to attain waivers from No Child Left Behind’s requirement that all students be proficient by 2014. To get a waiver, states had to adopt systems that evaluated teachers “in significant part, based on student growth.” That has looked very different from state to state, ranging from 20 percent in Utah to 50 percent in states like Alaska, Tennessee, and Louisiana.

No Child Left Behind’s replacement, the Every Student Succeeds Act, doesn’t require states to have a teacher-evaluation system at all, but, as my colleague Stephen Sawchuk reported, the nation’s state superintendents say they remain committed to maintaining systems that regularly review teachers.

But, as Sawchuk reported, Steven Staples, Virginia’s state superintendent, signaled that his state may move away from its current system where student test scores make up 40 percent of a teacher’s evaluation:

“What we’ve found is that through our experience [with the NCLB waivers], we have had some unintended outcomes. The biggest one is that there’s an over-reliance on a single measure; too many of our divisions defaulted to the statewide standardized test … and their feedback was that because that was a focus [of the federal government], they felt they needed to emphasize that, ignoring some other factors. It also drove a real emphasis on a summative, final evaluation. And it resulted in our best teachers running away from our most challenged.”

Some state lawmakers appear to be absorbing a similar message.
