Identifying Effective Teacher Preparation Programs Using VAMs Does Not Work

A new study [does not] show “Why It’s So Hard to Improve Teacher Preparation” Programs (TPPs). More specifically, it shows why using value-added models (VAMs) to evaluate TPPs, and then ideally improving them using the value-added data derived, is nearly if not entirely impossible.

This is precisely why yet another perhaps commonsensical but highly improbable federal policy move, one meant to imitate great teacher education programs and shut down ineffective ones based on their graduates’ students’ test-based performance over time (i.e., value-added), continues to fail.

Accordingly, in another, although not-yet peer-reviewed or published study referenced in the article above, titled “How Much Does Teacher Quality Vary Across Teacher Preparation Programs? Reanalyzing Estimates from [Six] States,” authors Paul T. von Hippel, from the University of Texas at Austin, and Laura Bellows, a PhD student at Duke University, investigated “whether the teacher quality differences between TPPs are large enough to make [such] an accountability system worthwhile” (p. 2). More specifically, using a meta-analysis technique, they reanalyzed the results of such evaluations in six of the approximately 16 states doing this (i.e., New York, Louisiana, Missouri, Washington, Texas, and Florida), each of which ultimately yielded a peer-reviewed publication, and they found “that teacher quality differences between most TPPs [were] negligible [at approximately] 0-0.04 standard deviations in student test scores” (p. 2).

They also highlight some of the statistical practices that exaggerated the “true” differences noted between TPPs in each of these studies, and in these types of studies in general, and consequently conclude that the “results of TPP evaluations in different states may vary not for substantive reasons, but because of the[se] methodological choices” (p. 5). Likewise, as is the case with value-added research in general, when “[f]aced with the same set of results, some authors may [also] believe they see intriguing differences between TPPs, while others may believe there is not much going on” (p. 6). With that being said, I will not cover these statistical/technical issues further here. Do read the full study for these details, though, as they are also important.

Relatedly, they found that in every state, the variation they statistically observed was greater among relatively small TPPs than among large ones. They suggest that this occurs due to estimation or statistical methods that may be inadequate for the task at hand. However, if this is true, it also means that because there is relatively less variation observed among large TPPs, it may be much more difficult “to single out a large TPP that is significantly better or worse than average” (p. 30). Accordingly, there are several ways to mistakenly single out a TPP as exceptional, or not, merely given TPP size. This is obviously problematic.
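The size point is worth pausing on, and a small simulation can illustrate it (this is my own hedged sketch, not an analysis from the study; all program sizes and numbers below are hypothetical). Even if every TPP were truly identical, a program-level value-added average computed from only a handful of graduates bounces around far more than one computed from hundreds, so small programs are much more likely to look exceptionally good or bad purely by chance:

```python
import random
import statistics

random.seed(42)

# Assume every TPP truly has the same (zero) effect on student test
# scores, and each graduate's estimated value-added is pure noise
# (SD = 1, in student-test-score standard deviation units).

def program_estimate(n_graduates):
    """Average the noisy value-added estimates of one program's graduates."""
    return statistics.mean(random.gauss(0, 1) for _ in range(n_graduates))

# Simulate 1,000 small programs (5 graduates each) and
# 1,000 large programs (500 graduates each).
small = [program_estimate(5) for _ in range(1000)]
large = [program_estimate(500) for _ in range(1000)]

# The spread of program-level estimates shrinks like 1/sqrt(n):
print(statistics.stdev(small))  # roughly 1/sqrt(5), i.e., near 0.45
print(statistics.stdev(large))  # roughly 1/sqrt(500), i.e., near 0.045
```

Under these assumptions, a nontrivial share of the small programs would land far from zero and appear “significantly” better or worse than average, even though all of the true differences here are exactly zero, which is consistent with the authors’ caution about singling out TPPs by size.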

Nonetheless, the authors also note that before they began this study, in Missouri, Texas, and Washington, “the differences between TPPs appeared small or negligible” (p. 29), while in Louisiana and New York “they appeared more substantial” (p. 29). After their (re)analyses, however, they found that the results from and across these six different states were “more congruent” (p. 29), as also noted prior (i.e., differences between TPPs of around 0 to 0.04 SDs in student test scores).

“In short,” they conclude, “TPP evaluations may have some policy value, but the value is more modest than was originally envisioned. [Likewise, it] is probably not meaningful to rank all the TPPs in a state; the true differences between most TPPs are too small to matter, and the estimated differences consist mostly of noise” (p. 29). As per the article cited prior, they added: “It appears that differences between [programs] are rarely detectable, and that if they could be detected they would usually be too small to support effective policy decisions.”

To see a similar study that colleagues and I conducted in Arizona, recently published in Teaching Education, see “An Elusive Policy Imperative: Data and Methodological Challenges When Using Growth in Student Achievement to Evaluate Teacher Education Programs’ ‘Value-Added,’” summarized and referenced here.

New Article Published on Using Value-Added Data to Evaluate Teacher Education Programs

A former colleague, a current PhD student, and I just had an article released about using value-added data to evaluate (or rather, not to evaluate) teacher education/preparation programs in higher education. The article is titled “An Elusive Policy Imperative: Data and Methodological Challenges When Using Growth in Student Achievement to Evaluate Teacher Education Programs’ ‘Value-Added,’” and the abstract of the article is included below.

If there is anyone out there who might be interested in this topic, please note that the journal in which this piece was published (online first and to be published in its paper version later) – Teaching Education – has made the article free for its first 50 visitors. Hence, I thought I’d share this with you all first.

If you’re interested, do access the full piece here.

Happy reading…and here’s the abstract:

In this study researchers examined the effectiveness of one of the largest teacher education programs, located within one of the largest research-intensive universities in the US. They did this using a value-added model as per current federal educational policy imperatives to assess the measurable effects of teacher education programs on their teacher graduates’ students’ learning and achievement as compared to other teacher education programs. Correlational and group comparisons revealed little to no relationship between value-added scores and teacher education program, regardless of subject area or position on the value-added scale. These findings are discussed within the context of several very important data and methodological challenges researchers also made transparent, as these are likely common across many efforts to evaluate teacher education programs using value-added approaches. Such transparency and clarity might assist in the creation of more informed value-added practices (and more informed educational policies) surrounding teacher education accountability.

Deep Pockets, Corporate Reform, and Teacher Education

A colleague whom I have never formally met, but with whom I’ve had some interesting email exchanges over the past few months — James D. Kirylo, Professor of Teaching and Learning in Louisiana — recently sent me an email I read and appreciated; hence, I asked him to turn it into a blog post. He responded with a guest post he has titled “Deep Pockets, Corporate Reform, and Teacher Education,” pasted below. Do give this a read, and a social media share, as this one is deserving of some legs.

Here is what he wrote:

Money is power. Money is influence. Money shapes direction. Notwithstanding its influence in the electoral process, one only needs to see how bags of dough from the mega-rich one-percenters—largely led by Bill Gates—have bought their way into their attempt to corporatize K-12 education (see, for example, here).

This corporatization works to defund public education, grossly blames teachers for all that ails society, is obsessed with testing, and aims to privatize.  And next on the corporatized docket: teacher education programs.

In a recent piece, “Gates Foundation Puts Millions of Dollars into New Education Focus: Teacher Preparation,” Valerie Strauss sketches how Gates is awarding $35 million to a three-year project called the Teacher Preparation Transformation Centers, funneled through five different projects, one of which is the Texas Tech-based University-School Partnerships for the Renewal of Educator Preparation (U.S. Prep) National Center.

A framework that will guide this “renewal” of educator preparation comes from the National Institute for Excellence in Teaching (NIET), along with the peddling of its programs, The System for Teacher and Student Advancement (TAP) and the Best Practices Center (BPC). Yet again coming from another guy with oodles of money, leading the charge at NIET is Lowell Milken, its chairman and TAP’s founder (see, for example, here).

The state of Louisiana serves as an example of how NIET is already working overtime, chipping its way into K-12 education. One can spend hours at the Louisiana Department of Education (LDE) website and view the various links on how TAP is applying a full-court press in hyping its brand (see, for example, here).

And now that TAP has entered the K-12 door in Louisiana, the brand is squiggling its way into teacher education preparation programs, namely through the Texas Tech-based U.S. Prep National Center. This Gates Foundation-backed project involves five teacher education programs across the country (Southern Methodist University, the University of Houston, Jackson State University, the University of Memphis, and one in Louisiana: Southeastern Louisiana University) (see more information about this here).

Therefore, teacher educators must be “trained” to use TAP in order to “rightly” inculcate the prescription to teacher candidates.

TAP: Four Elements of Success

TAP principally plugs four Elements of Success: Multiple Career Paths (for educators as career, mentor, and master teachers); Ongoing Applied Professional Growth (through weekly cluster meetings, follow-up support in the classroom, and coaching); Instructionally Focused Accountability (through multiple classroom observations and evaluations utilizing a research-based instrument and rubric that identifies effective teaching practices); and Performance-Based Compensation (based on multiple measures of performance, including student achievement gains and teachers’ instructional practices).

And according to the TAP literature, the elements of success “…were developed based upon scientific research, as well as best practices from the fields of education, business, and management” (see, for example, here). Recall, perhaps, that No Child Left Behind (NCLB) was also based on “scientific-based” research. Enough said. It is also interesting to note their use of the words “business” and “management” when referring to educating our children. Regardless, “The ultimate goal of TAP is to raise student achievement” so students will presumably be better equipped to compete in the global society (see, for example, here). 

While each element is worthy of discussion, a brief comment is in order on the first element, Multiple Career Paths, and the fourth, Performance-Based Compensation. Regarding the former, TAP has created a mini-hierarchy within already-hierarchical school systems (which most are) by identifying three potential sets of teachers, to reiterate from the above: a “career” teacher, a “mentor” teacher, and a “master” teacher. A “career” teacher as opposed to what? As opposed to a “temporary” teacher, a Teach For America (TFA) teacher, a substitute teacher? But, of course, according to TAP, as opposed to a “mentor” teacher and a “master” teacher.

This certainly begs the question: Why in the world would any parent want their child to be taught by a “career” teacher as opposed to a “mentor” teacher or better yet a “master” teacher? Wouldn’t we want “master” teachers in all our classrooms? To analogize, I would rather have a “master” doctor performing heart surgery on me than a “lowly” career doctor. Indeed, words, language, and concepts matter.

With respect to the latter, the notion of making student achievement the ultimate goal is perhaps a euphemism for raising test scores, cultivating a test-centric way of doing things.

Achievement and VAM

That is, instead of focusing on learning, opportunity, developmentally appropriate practices, and falling in love with learning, “achievement” is the goal of TAP. Make no mistake, this is far from an argument over semantics. And this “achievement,” linked through student growth to merit pay, relies heavily on a VAM-aligned rubric.

Yet there are multiple problems with VAM, an instrument that has been used in K-12 education since 2011. Among many other outstanding sources, one may simply want to check out this cleverly named blog, “VAMboozled,” here, or see what Diane Ravitch has said about VAMs (among other places, see, for example, here), not to mention the well-visited site produced by Mercedes Schneider here. Finally, see the 2015 position statement issued by the American Educational Research Association (AERA) regarding VAMs here, as well as a similar statement issued by the American Statistical Association (ASA) here.

Back to the Gates Foundation and the Texas Tech-based U.S. Prep National Center, though. To restate, at the aforementioned university in Louisiana (though likely in the other four recruited institutions as well), TAP will be the chief vehicle driving this process, and teacher education programs will be used as the host to prop up the brand.

With presumably some very smart, well-educated, talented, and experienced professionals at the respective teacher education sites, how is it possible that they capitulated to serve as samples in the petri dish that will only work to enculturate the continuation of corporate reform, which will predictably lead to what Hofstra University professor Alan Singer calls the “McDonaldization of Teacher Education“?

Strauss puts the question this way, “How many times do educators need to attempt to reinvent the wheel just because someone with deep pockets wants to try when the money could almost certainly be more usefully spent somewhere else?” I ask this same question, in this case, here.

National Council on Teacher Quality (NCTQ) Report on States’ Teacher Evaluation Systems

The controversial National Council on Teacher Quality (NCTQ) — created by the conservative Thomas B. Fordham Institute, funded (in part) by the Bill & Melinda Gates Foundation, and “part of a coalition for ‘a better orchestrated agenda’ for accountability, choice, and using test scores to drive the evaluation of teachers” (see here; see also other instances of controversy here and here) — recently issued a 106-page report titled “State of the States 2015: Evaluating Teaching, Leading and Learning.” In this report, they present “the most comprehensive and up-to-date policy trends on how states [plus DC] are evaluating teachers” (p. i). The report also provides similar information about how principals are being evaluated across states, but given the focus of this blog, I focus only on the information they offer regarding states’ teacher evaluation systems.

I want to underscore that this is, indeed, the most comprehensive and up-to-date report capturing what states are currently doing in terms of their teacher evaluation policies and systems; however, I would not claim that all of the data included within are entirely accurate, although this is understandable given how very difficult it is to be comprehensive and remain up-to-date on this topic, especially across all 50 states (plus DC). Therefore, do consume the factual data included within this report with caution, as potentially incorrect and certainly imperfect.

I also want to bring attention to the many figures included within this report. Many should find these of interest and use, again, despite likely/understandable errors and inconsistencies. The language around these figures, as well as much of the other text littered throughout this document, however, should be more critically consumed, in context, and primarily given the original source of the report (i.e., the NCTQ — see above). In other words, while the figures may prove to be useful, the polemics around them are likely of less value…unless, of course, you want to read/analyze a good example of the non-research-based rhetoric and assumptions advanced by “the other side.”

For example, their Figure B (p. i), titled Figure 1 below, illustrates that as of 2015 there were “just five states – California, Iowa, Montana, Nebraska and Vermont – that still have [emphasis added] no formal state policy requiring that teacher evaluations take objective measures of student achievement [emphasis added] into account in evaluating teacher effectiveness. Only three states – Alabama, New Hampshire and Texas – have evaluation policies that exist only in waiver requests to the federal government” (p. ii).


Figure 1. Teacher effectiveness state policy trends (2009-2015)

In addition, “27 states [now] require annual evaluations for all teachers, compared to just 15 states in 2009;” “17 states include student growth as the preponderant criterion in teacher evaluations, up from only four states in 2009…An additional 18 states include growth measures as a “significant” criterion in teacher evaluations;” “23 states require that evidence of teacher performance be used in tenure decisions [whereas no] state had such a policy in 2009;” “19 states require that teacher performance is considered in reduction in force decisions;” and the “majority of states (28) now articulate that ineffectiveness is grounds for teacher dismissal” (p. 6). These are all indicative of “progress,” as per the NCTQ.

Nonetheless, here is another figure that should be of interest (see their Figure 27, p. 37, titled Figure 2 below), capturing when states adopted their (more or less) current teacher evaluation policies.


Figure 2. Timeline for state adoption of teacher evaluation policies

One of the best/worst parts of their report is one of their conclusions: that there’s “a real downside for states that indulge critics [emphasis added] by delaying implementation, adopting hold harmless policies or reducing the weight of student achievement in evaluations. These short-term public relations solutions [emphasis added] reinforce the idea that there are a lot of immediate punitive consequences coming for teachers when performance-based evaluations are fully implemented, which is simply not the case” (p. iii).

Ironic here is that they, immediately thereafter, insert their Figure D (p. v titled Figure 3 below), that they explicitly title “Connecting the Dots,” to illustrate all of the punitive consequences already at play across the nation (giving Delaware, Florida, and Louisiana (dis)honorable mentions for leading the nation when it comes to their policy efforts to connect the dots). Some of these consequences/connectors are also at the source of the now 15 lawsuits occurring across the nation because of the low-to-high stakes consequences being attached to these data (see information about these 15 lawsuits here).


Figure 3. “Connecting the dots”

To what “dots” are they explicitly referring, and arguably hoping that states better “connect?” See their Figure 23 (p. 29 titled Figure 4) below.


Figure 4. “Connecting the dots” conceptual framework

Delaware, Florida, and Louisiana, as mentioned prior, “lead the nation when it comes to using teacher effectiveness data to inform other policies. Each of these states connects evaluation to nine of 11 related areas” (p. 29) illustrated above.

Relatedly, the NCTQ advances a set of not-at-all-research-based claims that “there has been some good progress on connecting the dots in the states, [but] unless pay scales change [to increase merit pay initiatives, abort traditional salary schedules, and abandon traditional bonus pay systems, which research also does not support], evaluation is only going to be a feedback tool [also currently false] when it could be so much more [also currently false]” (p. vi). They conclude that “too few states are willing to take on the issue of teacher pay and lift the teaching profession [emphasis added] by rewarding excellence” (p. vi).

Relatedly, NCTQ also highlights state-level trends in tying teacher effectiveness to dismissal policies, again as progress or movement in the right direction (see their Figure 6, p. 8, titled Figure 5 below).


Figure 5. Trends in state policy tying teacher effectiveness to dismissal policies

Here they make reference to this occurring despite the current opt-out movement (see, for example, here) that has been taken up by teacher unions, which they claim is set to undermine teacher evaluations, protect teachers, and put students at risk — especially poor and minority students — by stripping states, districts, and schools “of any means of accountability [emphasis added] for ensuring that all children learn” (p. 10).

Finally, they also note that “States could [should also] do a lot more to use evaluation data to better prepare future teachers. Only 14 states with evaluations of effectiveness (up from eight in 2013) have adopted policies connecting the performance of students to their teachers and the institutions where their teachers were trained [i.e., colleges of education]” (p. 31). While up from years prior, more should also be done in this potential area of “value-added” accountability, as well.

See also what Diane Ravitch called “a hilarious summary” of this same NCTQ report. It is written by Peter Greene, and posted on his Curmudgucation blog, here. See also here, here, and here for prior posts by Peter Greene.