One School’s Legitimately, “New and Improved” Teacher Evaluation System: In TIME Magazine

In an article featured this week in TIME Magazine titled “How Do You Measure a Teacher’s Worth?” author Karen Hunter Quartz – research director at the UCLA Community School and a faculty member in the UCLA Graduate School of Education – describes the legitimately, “new and improved” teacher evaluation system co-constructed by teachers, valued as professionals, in Los Angeles.

Below are what I read as the highlights, and also some comments re: the highlights, but please do click here for the full read as this whole article is in line with what many who research teacher evaluation systems support (see, for example, Chapter 8 in my Rethinking Value-Added Models in Education…).

“For the past five years, teachers at the UCLA Community School, in Koreatown, have been mapping out their own process of evaluation based on multiple measures — and building both a new system and their faith in it…this school is the only one trying to create its own teacher evaluation infrastructure, building on the district’s groundwork…[with] the evaluation process [fully] owned by the teachers themselves.”

“Indeed, these teachers embrace their individual and collective responsibility to advance exemplary teaching practices and believe that collecting and using multiple measures of teaching practice will increase their professional knowledge and growth. They are tough critics of the measures under development, with a focus on making sure the measures help make teachers better at their craft.”

Their new and improved system is based on three different kinds of data — student surveys, observations, and portfolio assessments. The latter includes an assignment teachers gave students, how teachers taught this assignment, and samples of the student work produced during/post the assignment given. Teachers’ portfolios were then scored by “educators trained at UCLA to assess teaching quality on several dimensions, including academic rigor and relevance. Teachers then completed a reflection on the scores they received, what they learned from the data, and how they planned to improve their practice.”

Hence, the “legitimate” part of the title of this post, in that this section is being externally vetted. As for the “new and improved” part of the title of this post, this comes from data indicating that “almost all teachers reported in a survey that they appreciated receiving multiple measures of their practice. Most teachers reported that the measures were a fair assessment of the quality of their teaching, and that the evaluation process helped them grow as educators.”

However, there was also “consensus that more information was needed to help them improve their scores. For example, some teachers wanted to know how to make assignments more relevant to students’ lives; others asked for more support reflecting on their observation transcripts.”

In the end, though, “[p]erhaps the most important accomplishment of this new system was that it restored teachers’ trust in the process of evaluation. Very few teachers trust that value-added measures — which are based on tests that are far removed from their daily work — can inform their improvement. This is an issue explored by researchers who are probing the unintended consequences of teacher accountability systems tied to value-added measures.”

Mirror, Mirror on the Wall…

No surprise, again, but Thomas Kane, an economics professor from Harvard University who also directed the $45 million worth of Measures of Effective Teaching (MET) studies for the Bill & Melinda Gates Foundation, is publicly writing in support of VAMs, again (redundancy intended). I just posted about one of his recent articles published on the website of the Brookings Institution titled “Do Value-Added Estimates Identify Causal Effects of Teachers and Schools?” after which I received another of his articles, this time published by the New York Daily News titled “Teachers Must Look in the Mirror.”

Embracing a fabled metaphor, while not to position teachers as the wicked queens or to position Kane as Snow White, let us ask ourselves the classic question: “Who is the fairest one of all?” as we critically review yet another fairytale authored by Harvard’s Kane. He has, after all, “carefully studied the best systems for rating teachers” (see other prior posts about Kane’s public perspectives on VAMs here and here).

In this piece, Kane continues to advance a series of phantasmal claims about the potentials of VAMs, this time in the state of New York where Governor Andrew Cuomo intends to take the state’s teacher evaluation system up to a system based 50% on teachers’ value-added, or 100% on value-added in cases where a teacher rated as “ineffective” in his/her value-added score can be rated as “ineffective” overall. Here, value-added could be used to trump all else (see prior posts about this here and here).

According to Kane, Governor Cuomo “picked the right fight.” The state’s new system “will finally give schools the tools they need to manage and improve teaching.” Perhaps the magic mirror would agree with such a statement, but research would evidence it vain.

As I have noted prior, there is absolutely no evidence, thus far, indicating that such systems have any (in)formative use or value. These data are first and foremost designed for summative, or summary, purposes; they are not designed for formative use. Accordingly, the data that come from such systems (besides the data that come from the observational components still being built into these systems, which have existed and been used for decades) are not transparent, are difficult to understand, and are therefore challenging to use. Likewise, such data are not instructionally sensitive, and they are untimely in that test-based results typically come back to teachers well after their students have moved on to subsequent grade levels.

What about Kane’s claims against tenure? “The tenure process is the place to start. It’s the most important decision a principal makes. One poor decision can burden thousands of future students, parents, colleagues and supervisors.” This is quite an effect considering that these new and improved teacher evaluation systems, as based (in this case largely) on VAMs, typically hold accountable only teachers at the elementary level who teach mathematics and reading/language arts. Even an elementary teacher with a career spanning 40 years with an average of 30 students per class would directly impact (or burden) 1,200 students, maximum. This is not to say this is inconsequential, but is it as consequential as Kane’s sensational numbers imply? What about the thousands of parents, colleagues, and supervisors also to be burdened by one poor decision? Fair and objective? This particular mirror thinks not.
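As a quick check on the arithmetic above, here is a minimal sketch of the calculation. The function and parameter names are hypothetical, and the assumption of one class per year is mine, chosen to match the elementary-level example.

```python
def students_directly_taught(career_years: int,
                             students_per_class: int,
                             classes_per_year: int = 1) -> int:
    """Upper bound on students one teacher directly teaches over a career.

    Assumes every student is new each year (no repeats), which is why
    the result is a maximum rather than an estimate.
    """
    return career_years * students_per_class * classes_per_year


# Figures from the paragraph above: a 40-year career, 30 students per class.
career_total = students_directly_taught(career_years=40, students_per_class=30)
print(f"{career_total:,} students, maximum")
```

Changing classes_per_year would approximate a secondary teacher with several sections, but such teachers are largely outside the tested grades and subjects discussed here.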

Granted, I am not making any claims about tenure as I think all would agree that sometimes tenure can support, keeping with the metaphor, bad apples. Rather, I take issue with the exaggerations, including also that “Traditionally, principals have used much too low a standard, promoting everyone but the very worst teachers.” We must all check our assumptions here about how we define “the very worst teachers” and how many of them really lurk in the shadows of America’s now not-so-enchanted forests. There is no evidence to support this claim, either, just conjecture.

As for the solution, “Under the new law, the length of time it will take to earn tenure will be lengthened from three to four years.” Yes, that arbitrary, one-year extension will certainly help… Likewise, tenure decisions will now be made better using classroom observations (the data that have, according to Kane in this piece, been used for years to make all of these aforementioned bad decisions) and our new fair and objective, test-based measures, which, not according to Kane, can only be used for about 30% of all teachers in America’s public schools. Nonetheless, “Student achievement gains [are to serve as] the bathroom scale, [and] classroom observations [are to serve] as the mirror.”

Kane continues, scripting, “Although the use of test scores has received all the attention, [one of] the most consequential change[s] in the law has been overlooked: One of a teacher’s observers must now be drawn from outside his or her school — someone whose only role is to comment on teaching.” Those from inside the school were only commenting on one’s beauty and fairness prior, I suppose, as “The fact that 96% of teachers were given the two highest ratings last year — being deemed either “effective” or “highly effective” — is a sure sign that principals have not been honest to date.”

All in all, perhaps somebody else should be taking a long hard “Look in the Mirror,” as this new law will likely do everything but “[open] the door to a renewed focus on instruction and excellence in teaching” despite the best efforts of “union leadership,” although I might add to Kane’s list many adorable little researchers who have also “carefully studied the best systems for rating teachers” and more or less agree on their intended and unintended results in…the end.

Educator Evaluations (and the Use of VAM) Unlikely to be Mandated in Reauthorization of ESEA

I invited a colleague of mine – Kimberly Kappler Hewitt (Assistant Professor, University of North Carolina, Greensboro) – to write a guest post for you all, and she did, sharing her thoughts on what is currently occurring on Capitol Hill regarding the reauthorization of the Elementary and Secondary Education Act (ESEA). Here is what she wrote:

Amidst what is largely a bitterly partisan culture on Capitol Hill, Republicans and Democrats agree that teacher evaluation is unlikely to be mandated in the reauthorization of the Elementary and Secondary Education Act (ESEA), the most recent iteration of which is No Child Left Behind (NCLB), signed into law in 2001. See here for an Education Week article by Lauren Camera on the topic.

In another piece on the topic (here), the same author Camera explains: “Republicans, including Chairman Lamar Alexander, R-Tenn., said Washington shouldn’t mandate such policies, while Democrats, including ranking member Patty Murray, D-Wash., were wary of increasing the role student test scores play in evaluations and how those evaluations are used to compensate teachers.” However, under draft legislation introduced by Senator Lamar Alexander (R-Tenn.), Chairman of the Senate Health, Education, Labor, and Pensions Committee, Title II funding would turn into federal block grants, which could be used by states for educator evaluation. Regardless, excluding a teacher evaluation mandate from ESEA reauthorization may undermine efforts by the Obama administration to incorporate student test score gains as a significant component of educator evaluation.

Camera further explains: “Should Congress succeed in overhauling the federal K-12 law, the lack of teacher evaluation requirements will likely stop in its tracks the Obama administration’s efforts to push states to adopt evaluation systems based in part on student test scores and performance-based compensation systems.”

Under the Obama administration, in order for states to obtain a waiver from NCLB penalties and to receive a Race to the Top Grant, they had to incorporate—as a significant component—student growth data in educator evaluations. Influenced by these powerful policy levers, forty states and the District of Columbia require objective measures of student learning to be included in educator evaluations—a sea change from just five years ago (Doherty & Jacobs/National Council on Teacher Quality, 2013). Most states use either some type of value-added model (VAM) or student growth percentile (SGP) model to calculate a teacher’s contribution to student score changes.

The Good, the Bad, and the Ugly

As someone who is skeptical about the use of VAMs and SGPs for evaluating educators, I have mixed feelings about the idea that educator evaluation will be left out of ESEA reauthorization. I believe that student growth measures such as VAMs and SGPs should be used not as a calculable component of an educator’s evaluation but as a screener to flag educators who may need further scrutiny or support, a recommendation made by a number of student growth measure (SGM) experts (e.g., Baker et al., 2010; Hill, Kapitula, & Umland, 2011; IES, 2010; Linn, 2008).

Here are two thoughts about the consequences of not incorporating policy on educator evaluation in the reauthorization of ESEA:

  1. Lack of clear federal vision for educator evaluation devolves the debate to states. There are strong debates about what the nature of educator evaluation can and should be, and education luminaries such as Linda Darling-Hammond and James Popham have weighed in on the issue (see here and here, respectively). If Congress does not address educator evaluation in ESEA legislation, the void will be filled by disparate state policies. This in itself is neither good nor bad. It does, however, call into question the longevity of the efforts the Obama administration has made to leverage educator evaluation as a way to increase teacher quality. Essentially, the lack of action on the part of Congress regarding educator evaluation devolves the debates to the state level, which means that heated (and sometimes vitriolic) debates about educator evaluation will endure, shifting attention away from other efforts that could have a more powerful and more positive effect on student learning.
  2. Possibility of increases in inequity. ESEA was first passed in 1965 as part of President Johnson’s War on Poverty. ESEA was intended to promote equity for students from poverty by providing federal funding to districts serving low-income students. The idea was that the federal government could help to level the playing field, so to speak, for students who lacked the advantages of higher income students. My own research suggests that the use of VAM for educator evaluation potentially exacerbates inequity in that some teachers avoid working with certain groups of students (e.g., students with disabilities, gifted students, and students who are multiple grade levels behind) and at certain schools, especially high-poverty schools, based on the perception that teaching such students and in such schools will result in lower value-added scores. Without federal legislation that provides clear direction to states that student test score data should not be used for high-stakes evaluation and personnel decisions, states may continue to use data in this manner, which could exacerbate the very inequities that ESEA was originally designed to address.

While it is a good thing, in my mind, that ESEA reauthorization will not mandate educator evaluation that incorporates student test score data, it is a bad (or at least ugly) thing that Congress is abdicating the role of promoting sound educator evaluation policy.

References

Baker, E. L., Barton, P. E., Darling-Hammond, L., Haertel, E., Ladd, H. F., Linn, R. L., . . . Shepard, L. A. (2010). Problems with the use of student test scores to evaluate teachers. EPI Briefing Paper. Washington, D.C.

Doherty, K. M., & Jacobs, S./National Council on Teacher Quality (2013). State of the states 2013: Connect the dots: Using evaluation of teacher effectiveness to inform policy and practice. Washington, D.C.: National Council on Teacher Quality.

Hill, H. C., Kapitula, L., & Umland, K. (2011). A validity argument approach to evaluating teacher value-added scores. American Educational Research Journal, 48(3), 794-831.

Institute of Education Sciences. (2010). Error rates in measuring teacher and school performance based on students’ test score gains. Washington, D.C.: U.S. Department of Education.

Linn, R. L. (2008). Methodological issues in achieving school accountability. Journal of Curriculum Studies, 40(6), 699-711.

Student Learning Objectives (SLOs): What (Little) We Know about Them Besides We Are to Use Them

Following up on a recent post, a VAMboozled! follower – Laura Chapman – wrote the comment below about Student Learning Objectives (SLOs) that I found important to share with you all. SLOs are objectives that are teacher-developed and administrator-approved to help hold teachers accountable for their students’ growth, although growth in this case is individually and loosely defined, which makes SLOs about as subjective as it gets. Ironically, SLOs serve as alternatives to VAMs when teachers who are VAM-ineligible need to be held accountable for “growth.”

Laura commented about how I need to write more about SLOs as states are increasingly adopting these, but states are doing this without really any research evidence in support of the concept, much less the practice. That might seem more surprising than it really is, but there is not a lot of research being conducted on SLOs, yet. One research document of which I am aware I reviewed here, with the actual document written by Mathematica and published by the US Department of Education here: “Alternative student growth measures for teacher evaluation: Profiles of early-adopting districts.”

Conducting a search on ERIC, I found only two additional pieces also contracted out and published by the US Department of Education, although the first piece is more about describing what states are doing in terms of SLOs versus researching the actual properties of the SLOs. The second piece better illustrates the fact that “very little of the literature on SLOs addresses their statistical properties.”

What little we do know about SLOs at this point, however, is two-fold: (1) “no studies have looked at SLO reliability” and (2) “[l]ittle is known about whether SLOs can yield ratings that correlate with other measures of teacher performance” (i.e., one indicator of validity). The very few studies in which researchers have examined this found “small but positive correlations” between SLOs and VAM-based ratings (i.e., not a strong indicator of validity).

With that being said, if any of you are aware of research I should review or if any of you have anything to say or write about SLOs in your states, districts, or schools, feel free to email me at audrey.beardsley@asu.edu.

In the meantime, do also read what Laura wrote about SLOs here:

I appreciate your work on the VAM problem. Equal attention needs to be given to the use of SLOs for evaluating teacher education in so-called untested and non-tested subjects. It has been estimated that about 65-69% of teachers have job assignments for which there are not state-wide tests. SLOs (and variants) are the proxy of choice for VAM. This writing exercise is required in at least 27 states, with pretest-posttest and/or baseline to post-test reports on student growth. Four reports from USDE (2014) [I found three] show that there is no empirical research to support the use of the SLO process (and associated district-devised tests and cut-off scores) for teacher evaluation.

The template for SLOs originated in Denver in 1999. It has been widely copied and promoted via publications from USDE’s “Reform Support Network,” which operates free of any need for evidence and few constraints other than marketing a deeply flawed product. SLO templates in wide use have no peer reviewed evidence to support their use for teacher evaluation…not one reliability study, not one study addressing their validity for teacher evaluation.

SLO templates in Ohio and other states are designed to fit the teacher-student data link project (funded by Gates and USDE since 2005). This means that USDE’s proposed evaluations of specific teacher education programs (e.g., art education at Ohio State University) will be aided by the use of extensive “teacher of record” data routinely gathered by schools and districts, including personnel files that typically require the teacher’s college transcripts, degree earned, certifications, scores on tests for any teacher license and so on.

There are technical questions galore, but a big chunk of the data of interest to the promoters of this latest extension of the Gates/USDE’s rating game are in place.

I have written about the use of SLOs as a proxy for VAM in an unpublished paper titled The Marketing of Student Learning Objectives (SLOs): 1999-2014. A pdf with references can be obtained by request at chapmanLH@aol.com

Teacher Evaluation and Accountability Alternatives, for A New Year

At the beginning of December I wrote a post about Diane Ravitch’s really nice piece published in the Huffington Post about what she views as a much better paradigm for teacher evaluation and accountability. Diane Ravitch posted another piece on similar alternatives, although this one was written by teachers themselves.

I thought this was more than appropriate, especially given a New Year is upon us, and while it might very well be wishful thinking, perhaps at least some of our state policy makers might be willing to think in new ways about what really could be new and improved teacher evaluation systems. Cheers to that!

The main point here, though, is that alternatives do, indeed, exist. Likewise, it’s not that teachers do not want to be held accountable for, and evaluated on, that which they do, but they do want whatever systems are in place (formal or informal) to be appropriate, professional, and fair. How about that for a policy-based resolution?

This is from Diane’s post: The Wisdom of Teachers: A New Vision of Accountability.

Anyone who criticizes the current regime of test-based accountability is inevitably asked: What would you replace it with? Test-based accountability fails because it is based on a lack of trust in professionals. It fails because it confuses measurement with instruction. No doctor ever said to a sick patient, “Go home, take your temperature hourly, and call me in a month.” Measurement is not a treatment or a cure. It is measurement. It doesn’t close gaps: it measures them.

Here is a sound alternative approach to accountability, written by a group of teachers whose collective experience is 275 years in the classroom. Over 900 teachers contributed ideas to the plan. It is a new vision that holds all actors responsible for the full development and education of children, acknowledging that every child is a unique individual.

Its key features:

  • Shared responsibility, not blame
  • Educate the whole child
  • Full and adequate funding for all schools, with less emphasis on standardized testing
  • Teacher autonomy and professionalism
  • A shift from evaluation to support
  • Recognition that in education one size does not fit all

A New Paradigm for Accountability

Diane Ravitch recently published in the Huffington Post a really nice piece about what she views as a much better paradigm for accountability — one based on much better indicators than large scale standardized test scores. This does indeed offer a much better and much more positive and supportive accountability alternative to that with which we have been “dealing” for the last, really, 30 years.

The key components of this new paradigm, as taken from the full post titled, “A New Paradigm for Accountability: The Joy of Learning,” are pasted below. I would recommend giving this article a full read, instead or in addition, though, as the way Diane frames her reasoning around this list is also important to understand. Click here to see the full article on the Huffington Post website. Otherwise, here’s her paradigm:

The new accountability system would be called No Child Left Out. The measures would be these:

  • How many children had the opportunity to learn to play a musical instrument?
  • How many children had the chance to play in the school band or orchestra?
  • How many children participated in singing, either individually or in the chorus or a glee club or other group?
  • How many public performances did the school offer?
  • How many children participated in dramatics?
  • How many children produced documentaries or videos?
  • How many children engaged in science experiments? How many started a project in science and completed it?
  • How many children learned robotics?
  • How many children wrote stories of more than five pages, whether fiction or nonfiction?
  • How often did children have the chance to draw, paint, make videos, or sculpt?
  • How many children wrote poetry? Short stories? Novels? History research papers?
  • How many children performed service in their community to help others?
  • How many children were encouraged to design an invention or to redesign a common item?
  • How many students wrote research papers on historical topics?

Can you imagine an accountability system whose purpose is to encourage and recognize creativity, imagination, originality, and innovation? Isn’t this what we need more of?

Well, you can make up your own metrics, but you get the idea. Setting expectations in the arts, in literature, in science, in history, and in civics can change the nature of schooling. It would require far more work and self-discipline than test prep for a test that is soon forgotten.

My paradigm would dramatically change schools from Gradgrind academies to halls of joy and inspiration, where creativity, self-discipline, and inspiration are nurtured, honored, and valued.

This is only a start. Add your own ideas. The sky is the limit. Surely we can do better than this era of soul-crushing standardized testing.

Surveys + Observations for Measuring Value-Added

Following up on a recent post about the promise of Using Student Surveys to Evaluate Teachers using a more holistic definition of a teacher’s value added, I just read a chapter written by Ronald Ferguson (the creator of the Tripod student survey instrument and Tripod’s lead researcher) along with Charlotte Danielson (the creator of the Framework for Teaching and founder of The Danielson Group; see a prior post about this instrument here). Both instruments are “research-based,” both are used nationally and internationally, both are (increasingly being) used as key indicators to evaluate teachers across the U.S., and both were used throughout the Bill & Melinda Gates Foundation’s ($45 million worth of) Measures of Effective Teaching (MET) studies.

The chapter, titled “How Framework for Teaching and Tripod 7Cs Evidence Distinguish Key Components of Effective Teaching,” was recently published in a book all about the MET studies, titled “Designing Teacher Evaluation Systems: New Guidance from the Measures of Effective Teaching Project,” edited by Thomas Kane, Kerri Kerr, and Robert Pianta. The chapter is about whether and how data derived via the Tripod student survey instrument (i.e., as built on 7Cs: challenging students, control of the classroom, teacher caring, teachers confer with students, teachers captivate their students, teachers clarify difficult concepts, teachers consolidate students’ concerns) align with the data derived via Danielson’s Framework for Teaching, to collectively capture teacher effectiveness.

Another purpose for this chapter is to examine how both indicators also align with teacher-level value-added. Ferguson (and Danielson) find that:

  • Their two measures (i.e., the Tripod and the Framework for Teaching) are more reliable (and likely more valid) than value-added measures. The over-time, teacher-level classroom correlations, cited in this chapter, are r = 0.38 for value-added (which is comparable with the correlations noted in plentiful studies elsewhere), r = 0.42 for the Danielson Framework, and r = 0.61 for the Tripod student survey component. These “clear correlations,” while not strong particularly in terms of value-added, do indicate there is some common signal that the indicators are capturing, some stronger than the others (as should be obvious given the above numbers).
  • Contrary to what some (softies) might think, classroom management, not caring (i.e., the extent to which teachers care about their students and what their students learn and achieve), is the strongest predictor of a teacher’s value-added. However, the correlation (i.e., the strongest of the bunch) is still quite “weak” at an approximate r = 0.26, even though it is statistically significant. Caring, rather, is the strongest predictor of whether students are happy in their classrooms with their teachers.
  • In terms of “predicting” teacher-level value-added, and of the aforementioned 7Cs, the things that also matter “most” next to classroom management (although none of the coefficients are as strong as we might expect [i.e., r < 0.26]) include: the extent to which teachers challenge their students and have control over their classrooms.
  • Value-added in general is more highly correlated with teachers at the extremes in terms of their student survey and observational composite indicators.
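To put the correlations reported above on a common footing, here is a minimal sketch that squares each one. Reading r-squared as the share of variance two sets of scores have in common is a standard convention, not something the chapter authors report, and the variable and function names are hypothetical.

```python
# Correlations as quoted in the bullets above.
reported_correlations = {
    "value-added (over time)": 0.38,
    "Danielson Framework (over time)": 0.42,
    "Tripod student survey (over time)": 0.61,
    "classroom management vs. value-added": 0.26,
}


def shared_variance(r: float) -> float:
    """Square a correlation coefficient to get the proportion of shared variance."""
    return r ** 2


for label, r in reported_correlations.items():
    print(f"{label}: r = {r:.2f}, r^2 = {shared_variance(r):.2f}")
```

Read this way, even the strongest over-time figure (Tripod) leaves well over half of the variance unexplained from one time point to the next, and the “strongest predictor” of value-added accounts for under a tenth of it.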

In the end, the authors of this chapter do not disclose the actual correlations between their two measures and value-added, specifically (although from the appendix one can infer that the correlation between value-added and Tripod output is around r = 0.45 as based on an unadjusted r-squared). I should mention this is a HUGE shortcoming of this chapter (one that would not have passed peer review should this chapter have been submitted to a journal for publication). The authors do mention that “the conceptual overlap between the frameworks is substantial and that empirical patterns in the data show similarities.” Unfortunately again, however, they do not quantify the strength of said “similarities.” This only leaves us to assume that, since they were not reported, the actual strength of the similarities empirically observed between them was likely low (as is also evidenced in many other studies, although not as often with student survey indicators as opposed to observational indicators).
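The inference from an unadjusted r-squared to a correlation can be sketched as follows. This assumes a model with a single predictor, where r-squared is simply the correlation squared. The input of 0.2025 is not a figure from the chapter; it is only what r = 0.45 implies when squared, and the function name is hypothetical.

```python
import math


def correlation_magnitude(r_squared: float) -> float:
    """Return |r| implied by an unadjusted r-squared from a one-predictor model.

    The sign of r cannot be recovered from r-squared alone.
    """
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("r-squared must be between 0 and 1")
    return math.sqrt(r_squared)


# Assumed input: 0.45 squared, used only to illustrate the back-calculation.
implied_r = correlation_magnitude(0.2025)
print(f"|r| = {implied_r:.2f}")
```

With more than one predictor in the model, the square root of r-squared is a multiple correlation, not the pairwise correlation between value-added and the Tripod, which is one more reason the missing figures matter.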

The final conclusion the authors of this chapter make is that educators should “cross-walk” the two frameworks (i.e., the Tripod and the Danielson Framework) and use both frameworks when reflecting on teaching. I must say I’m concerned about this recommendation as well, mainly given that it will cost states and districts more $$$, and the returns or “added value” (using the grandest definition of this term) of engaging in such an approach do not have the evidence one would need to adequately justify such a recommendation.

“Accountability that Sticks” v. “Accountability with Sticks”

Michael Fullan, Professor Emeritus from the University of Toronto and former Dean of the Ontario Institute for Studies in Education (OISE), gave a presentation he titled “Accountability that Sticks,” which is “a preposition away from accountability with sticks.”

In his speech he said: “Firing teachers and closing schools if student test scores and graduation rates do not meet a certain bar is not an effective way to raise achievement across a district or a state…Linking student achievement to teacher appraisal, as sensible as it might seem on the surface, is a non-starter…It’s a wrong policy [emphasis added]…[and] Its days are numbered.”

He noted that teacher evaluation is “the biggest factor that most policies get wrong…Teacher appraisal, even if you get it right – which the federal government doesn’t do – is the wrong driver. It will never be intensive enough.”

He then spoke about how, at least in the state of California, things look more promising than they have in the past, from his view working with local districts throughout the state. He noted that “Growing numbers of superintendents, teachers and parents…are rejecting punitive measures…in favor of what he called more collaborative, humane and effective approaches to supporting teachers and improving student achievement.”

If the goal is to improve teaching, then, what’s the best way to do this according to Fullan? “[A] culture of collaboration is the most powerful tool for improving what happens in classrooms and across districts…This is the foundation. You reinforce it with selective professional development and teacher appraisal.”

In addition, “[c]ollaboration requires a positive school climate – teachers need to feel respected and listened to, school principals need to step back, and the tone has to be one of growth and improvement, not degradation.” Accordingly, “New Local Control and Accountability Plans [emphasis added], created individually by districts, could be used by teachers and parents to push for ways to create collaborative cultures” and cultures of community-based and community-respected accountability.

This will help “talented schools” improve “weak teachers” and further prevent the attrition of “talented teachers” from “weak schools.”

To read more about his speech, as highlighted by Jane Meredith Adams on EdSource, click here.

Does A “Statistically Sound” Alternative Exist?

A few weeks ago a follower posed the following question on our website, and I thought it imperative to share.

Following the post about “The Arbitrariness Inherent in Teacher Observations,” he wrote: “Have you written about a statistically sound alternative proposal?”

My reply? “Nope. I do not believe such a thing exists. I do have a sound alternative proposal though, that has sound statistics to support it. It serves as the core of chapter 8 of my recent book.”

Essentially, this is, counter-intuitively, an even more conventional and traditional solution. It has research and statistical evidence in support, and has evidenced itself as superior to using value-added measures, along with other measures of teacher effectiveness in their current forms, for evaluating and holding teachers accountable for their effectiveness. It is based on the use of multiple measures, as aligned with the standards of the profession and also locally defined theories capturing what it means to be an effective teacher. Its effectiveness also relies on competent supervisors and elected colleagues serving as professional members of educators’ representative juries.

This solution does not rely solely on mathematics and the allure of numbers or grandeur of objectivity that too often comes along with numerical representation, especially in the social sciences. This solution does not trust the test scores too often (and wrongly) used to assess teacher quality, simply because the test output is already available (and paid for) and these data can be represented numerically, mathematically, and hence objectively. This solution does not marginalize human judgment, but rather embraces human judgment for what it is worth, as positioned and operationalized within a more professional, democratically-based, and sound system of judgment, decision-making, and support.

Jesse Rothstein on Teacher Evaluation and Teacher Tenure

Last week, Max Ehrenfreund wrote a piece, released via the Washington Post’s Wonkblog, titled “Teacher tenure has little to do with student achievement, economist says.” For those of you who do not know Jesse Rothstein, he’s an Associate Professor of Economics at University of California – Berkeley, and he is one of the leading researchers/economists conducting research on teacher evaluation and accountability policies writ large, as well as the value-added models (VAMs) being used for such purposes. He’s probably most famous for a study he conducted in 2009 about how the non-random, purposeful sorting of students into classrooms indeed biases (or distorts) value-added estimations, pretty much despite the sophistication of the statistical controls meant to block (or control for) such bias (or distorting effects). You can find this study referenced here.

Anyhow, in this piece author Ehrenfreund discusses teacher evaluation and teacher tenure with Rothstein. Some of the key take-aways from the interview and for this audience follow, but do read the full piece, linked again here, if so inclined:

Rothstein, on teacher evaluation:

  • In terms of evaluating teachers, “[t]here’s no perfect method. I think there are lots of methods that give you some information, and there are lots of problems with any method. I think there’s been a tendency in thinking about methods to prioritize cheap methods over methods that might be more expensive. In particular, there’s been a tendency to prioritize statistical computations based on student test scores, because all you need is one statistician and the test score data. Classroom observation requires having lots of people to sit in the back of lots and lots of classrooms and make judgments.”
  • Why the interest in value-added? “I think that’s a complicated question. It seems scientific, in a way that other methods don’t. Partly it has to do with the fact that it’s cheap, and it seems like an easy answer.”
  • What about the fantabulous study Raj Chetty and his Harvard colleagues (Friedman and Rockoff) conducted about teachers’ value-added (which has been the source of many prior posts herein)? “I don’t think anybody disputes that good teachers are important, that teachers matter. I have some methodological concerns about that study, but in any case, even if you take it at face value, what it tells you is that higher value-added teachers’ students earn more on average.”
  • What are the alternatives? “We could double teachers’ salaries. I’m not joking about that. The standard way that you make a profession a prestigious, desirable profession, is you pay people enough to make it attractive. The fact that that doesn’t even enter the conversation tells you something about what’s wrong with the conversation around these topics. I could see an argument that says it’s just not worth it, that it would cost too much. The fact that nobody even asks the question tells me that people are only willing to consider cheap solutions.”

Rothstein, on teacher tenure:

  • “Getting good teachers in front of classrooms is tricky,” and it will likely “still be a challenge without tenure, possibly even harder. There are only so many people willing to consider teaching as a career, and getting rid of tenure could eliminate one of the job’s main attractions.”
  • Likewise, “there are certainly some teachers in urban, high-poverty settings that are not that good, and we ought to be figuring out ways to either help them get better or get them out of the classroom. But it’s important to keep in mind that that’s only one of several sources of the problem.”
  • “Even if you give the principal the freedom to fire lots of teachers, they won’t do it very often, because they know the alternative is worse.” The alternative is replacing an ineffective teacher with an even less effective teacher. Contrary to what is oft-assumed, highly qualified teachers are not knocking down the doors to teach in such schools.
  • Teacher tenure is “really a red herring” in the sense that debating tenure ultimately misleads and distracts others from the more relevant and important issues at hand (e.g., recruiting strong teachers into such schools). Tenure “just doesn’t matter that much. If you got rid of tenure, you would find that the principals don’t really fire very many people anyway” (see also point above).