Noam Chomsky on Testing and Using Tests to Rank Teachers for Accountability

Noam Chomsky – Professor Emeritus at the Massachusetts Institute of Technology (MIT), “leader in the field” of linguistics, and “a figure of enlightenment and inspiration” for political dissenters – was recently featured in a fabulous post “on the Dangers of Standardized Testing.” The post comes from a partial transcript of an interview with Chomsky, with highlights featured on the “Creative by Nature” blog. Click here to read the full article, most of which is also pasted below.

Chomsky writes:

“You take what is happening in education. Right now, in recent years, there’s a strong tendency to require assessment of children and teachers so that you have to teach to tests. And the test determines what happens to the child and what happens to the teacher.

That’s guaranteed to destroy any meaningful educational process. It means the teacher cannot be creative, imaginative, pay attention to individual students’ needs. The student can’t pursue things, maybe some kid is interested in something, can’t do it because you got to memorize something for this test tomorrow. And the teacher’s future depends on it, as well as the student.

The people sitting in the offices, the bureaucrats designing this, they’re not evil people, but they’re working within a system of ideology and doctrines that turns what they’re doing into something extremely harmful.

First of all, you don’t have to assess people all the time… People don’t have to be ranked in terms of some artificial [standards]. The assessment itself is completely artificial. It’s not ranking teachers in accordance with their ability to help develop children who will reach their potential, explore their creative interests. Those things you’re not testing.

So you are giving some kind of a rank, but it’s a rank that’s mostly meaningless. And the very ranking itself is harmful. It’s turning us into individuals who devote our lives to achieving a rank. Not into doing things that are valuable and important.

It’s highly destructive at the lower grades. This is elementary education, so you are training kids this way. And it’s very harmful. I could see it with my own children.

When my own kids were in elementary school, at a good quality suburban school, by the time they were in third grade they were dividing up their kids into dumb and smart. You’re dumb if you’re lower tracked, smart if you’re upper tracked.

Think of what that does to the children. It doesn’t matter where they’re tracked, the children take it seriously… If you’re caught up in that it’s just extremely harmful. It has nothing to do with education.

Education is developing your own potential and creativity. Maybe you’re not going to do well in school and you’ll do great in art. That’s fine. What’s wrong with that? It’s another way of living a fulfilling wonderful life, and one that is significant for other people as well as yourself.

The whole idea [of ranking] is harmful in itself. It’s kind of a system of creating something called “economic man.” There’s a concept of economic man, which is in economics literature. Economic man is somebody who rationally calculates how to improve his own status (and status basically means wealth).

So you rationally calculate what kinds of choices you should make to increase your wealth, and you don’t pay attention to anything else. Maximize the number of goods you have, cause that is what you can measure. If you do that properly, you are a rational person making informed judgments. You can improve your “human capital,” what you can sell on the market.

What kind of human being is that? Is that the kind of human being you want to create? All of these mechanisms – testing, assessing, evaluating, measuring – they force people to develop those characteristics… These ideas and concepts have consequences…”

 

Student Learning Objectives (SLOs) as a Measure of Teacher Effectiveness: A Survey of the Policy Landscape

I have invited another one of my former PhD students, Noelle Paufler, to the VAMboozled! team, and for her first post she has written on student learning objectives (SLOs), in large part in response to prior requests from VAMboozled! followers. Here is what she wrote:

Student learning objectives (SLOs) are rapidly emerging as the next iteration in the policy debate surrounding teacher accountability at the state and national levels. Touted as one solution to the methodologically challenging task of measuring the effectiveness of teachers in subject areas for which large-scale standardized tests are unavailable, SLOs prompt the same questions of validity, reliability, and fairness that many have raised about value-added models (VAMs). Defining the SLO process as “a participatory method of setting measurable goals, or objectives, based on the specific assignment or class, such as the students taught, the subject matter taught, the baseline performance of the students, and the measurable gain in student performance during the course of instruction” (Race to the Top Technical Assistance Network, 2010, p. 1), Lacireno-Paquet, Morgan, and Mello (2014) provide an overview of states’ use of SLOs in teacher evaluation systems.

There are three primary types of SLOs (i.e., for individual teachers, teams or grade levels, and school-wide), which may target subgroups of students and measure student growth or another measurable target (Lacireno-Paquet et al., 2014). SLOs for individual teachers that rely on one or more assessments (e.g., state-wide standardized tests; district-, school-, or classroom-level measures) are most commonly used in teacher evaluation systems (Lacireno-Paquet et al., 2014). At the time of their writing, 25 states had included SLOs under various monikers (e.g., student learning targets, student learning goals) in their teacher evaluation systems (Lacireno-Paquet et al., 2014). Of these states, 24 provide a structured process for setting, approving, and evaluating SLOs, which most often requires an evaluator at the school or district level to review and approve SLOs for individual teachers (Lacireno-Paquet et al., 2014). For more detailed state-level information, read the full report here.

Arizona serves as a case in point for considering the use of SLOs as part of the Arizona Model for Measuring Educator Effectiveness, an evaluation system comprising measures of teacher professional practice (50%-67%) and student achievement (33%-50%). Currently, the Arizona Department of Education (ADE) classifies teachers into two groups (A and B) based on the availability of state standardized tests for their respective content areas. ADE (2015) defines teachers “who have limited or no classroom level student achievement data that are valid and reliable, aligned to Arizona’s academic standards and appropriate to teachers’ individual content area” as Group B for evaluation purposes (e.g., social studies, physical education, fine arts, career and technical education [CTE]) (p. 1). Recommending SLOs as a measure of student achievement for these teachers, ADE (2015) cites their use as a means to positively impact student achievement, especially when teachers collaboratively create quality common assessments to measure students across a grade level or within a content area. ADE (2015) describes SLOs as “classroom level measures of student growth and mastery” that are “standards based and relevant to the course content,” “specific and measureable,” and “use [student data from] two points in time,” specifically stating that individual lesson objectives and units of study do not qualify and discouraging teaching to the test (p. 1). Having piloted the SLO process in five Local Education Agencies (LEAs) (four districts and one charter) in the 2012-2013 school year, with full implementation in the 2013-2014 school year, ADE (2015) continues to discuss next steps in the implementation of SLOs.
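
To make the “two points in time” growth idea concrete, here is a minimal, hypothetical sketch of how an SLO attainment rate might be computed from pre- and post-assessment scores. The per-student growth target and the attainment formula are illustrative assumptions, not ADE’s prescribed scoring method.

```python
# Hypothetical illustration only; ADE's SLO handbook does not prescribe this exact formula.
from dataclasses import dataclass


@dataclass
class StudentRecord:
    pre_score: float      # baseline assessment (point in time 1)
    post_score: float     # end-of-course assessment (point in time 2)
    growth_target: float  # teacher-set, evaluator-approved target gain


def slo_attainment_rate(students: list[StudentRecord]) -> float:
    """Share of students whose actual gain met or exceeded their target gain."""
    if not students:
        return 0.0
    met = sum(1 for s in students if (s.post_score - s.pre_score) >= s.growth_target)
    return met / len(students)


# Example: three students with a common target gain of 10 points.
roster = [
    StudentRecord(pre_score=40, post_score=55, growth_target=10),  # met
    StudentRecord(pre_score=60, post_score=65, growth_target=10),  # not met
    StudentRecord(pre_score=30, post_score=42, growth_target=10),  # met
]
print(f"SLO attainment rate: {slo_attainment_rate(roster):.0%}")  # 67%
```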

Despite this growing national interest in and rapid implementation of SLOs, very little research has examined the perspectives of district- and school-level administrators and teachers (in both Groups A and B or their equivalents) with regard to the validity, reliability, and fairness of measuring student achievement in this manner. Additional research in early adopter states, as well as in states that are piloting the use of SLOs, is needed in order to better understand the implications of yet another wave of accountability policy changes.

References

Arizona Department of Education. (2015). The student learning objective handbook. Retrieved from http://www.azed.gov/teacherprincipal-evaluation/files/2015/01/slo-handbook-7-2.pdf?20150120

Lacireno-Paquet, N., Morgan, C., & Mello, D. (2014). How states use student learning objectives in teacher evaluation systems: A review of state websites (REL 2014-013). Washington, DC: U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance, Regional Educational Laboratory Northeast & Islands. Retrieved from http://ies.ed.gov/ncee/edlabs/projects/project.asp?projectID=380

Race to the Top Technical Assistance Network. (2010). Measuring student growth for teachers in non-tested grades and subjects: A primer. Washington, DC: ICF International. Retrieved from http://nassauboces.org/cms/lib5/NY18000988/Centricity/Domain/156/NTS__PRIMER_FINAL.pdf

Harvard Economist Deming on VAM-Based Bias

David Deming – an Associate Professor of Education and Economics at Harvard – just published, in the esteemed American Economic Review, an article about VAM-based bias, in this case when VAMs are used to measure school- versus teacher-level effects.

Deming appropriately situated his study within prior work on this topic, including the key works of Thomas Kane (Education and Economics at Harvard) and Raj Chetty (Economics at Harvard). These two, most notably, continue to advance assertions that using students’ prior test scores and other covariates (i.e., to statistically control for students’ demographic/background factors) minimizes VAM-based bias to negligible levels. Deming also situated his study relative to the notable works of Jesse Rothstein (Public Policy and Economics at the University of California, Berkeley), who continues to present evidence that VAM-based bias really does exist. The research of these three key players, along with their scholarly disagreements, has also been highlighted in prior posts about VAM-based bias on this blog (see, for example, here and here).

To test for bias in this study, Deming used data from Charlotte-Mecklenburg, North Carolina, a district in which students were quasi-randomly assigned to schools (given a school choice initiative). With these data, Deming tested whether VAM-based bias was evident across a variety of common VAM approaches, from the least sophisticated VAM (e.g., one year of prior test scores and no other covariates) to the most sophisticated (e.g., two or more years of prior test score data plus various covariates).
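
As a rough illustration of what “least sophisticated” versus “more sophisticated” specifications look like in practice, here is a small sketch using synthetic data (not Deming’s data or code); the covariate names are hypothetical placeholders, and the estimator is a plain OLS model with school dummies.

```python
# Illustrative sketch only (not Deming's code or data): contrast a sparse and a richer
# value-added specification for estimating school effects, using synthetic data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 600
df = pd.DataFrame({
    "school": rng.choice(["A", "B", "C"], size=n),
    "prior_score": rng.normal(50, 10, size=n),      # one year of prior test scores
    "prior_score_2yr": rng.normal(50, 10, size=n),  # a second prior year (hypothetical column)
    "frl": rng.choice([0, 1], size=n),              # hypothetical student covariate
})
true_school_effect = df["school"].map({"A": 0.0, "B": 2.0, "C": -1.0})
df["score"] = df["prior_score"] + true_school_effect + rng.normal(0, 5, size=n)


def school_effects(data: pd.DataFrame, formula: str) -> pd.Series:
    """Fit an OLS value-added model; return the coefficients on the school dummies."""
    fit = smf.ols(formula, data=data).fit()
    return fit.params.filter(like="C(school)")


# Least sophisticated VAM: one year of prior scores and no other covariates.
print(school_effects(df, "score ~ prior_score + C(school)"))

# More sophisticated VAM: two years of prior scores plus a student covariate.
print(school_effects(df, "score ~ prior_score + prior_score_2yr + C(frl) + C(school)"))
```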

Overall, Deming failed to reject the hypothesis that school-level effects as measured using VAMs are unbiased, almost regardless of the VAM being used. In more straightforward terms, Deming found that school effects as measured using VAMs were rarely if ever biased when compared against his quasi-random benchmark. Hence, this work falls in line with prior works countering the claim that bias really does exist (Note: this is a correction from the prior post).

There are still, however, at least three reasons that could lead to bias in either direction (i.e., positive, overestimating school effects, or negative, underestimating them):

  • VAMs may be biased due to the non-random sorting of students into schools (and classrooms) “on unobserved determinants of achievement” (see also the work of Rothstein, here and here).
  • If “true” school effects vary over time (independent of error), then test-based forecasts based on prior cohorts’ test scores (as is common when measuring the difference between predictions and “actual” growth, when calculating value-added) may be poor predictors of future effectiveness.
  • When students self-select into schools, the impact of attending a school may be different for students who self-select in than for students who do not. The same thing likely holds true for classroom assignment practices, although that is my extrapolation, not Deming’s.

In addition, and in Deming’s overall conclusions that also pertain here, “many other important outcomes of schooling are not measured here. Schools and teachers [who] are good at increasing student achievement may or may not be effective along other important dimensions” (see also here).

For all of these reasons, “we should be cautious before moving toward policies that hold schools accountable for improving their ‘value added’” given bias.

Lawsuit in New Mexico Challenging State’s Teacher Evaluation System

On Friday (February 13) the American Federation of Teachers (AFT) posted a news release, “Teachers and State Legislators Join AFT New Mexico in Lawsuit Challenging Constitutionality of Punitive, Error-Ridden Teacher Evaluation System,” detailing why New Mexico teachers, state legislators, AFT New Mexico, and the Albuquerque Teachers Federation are taking on New Mexico’s Public Education Department and the state’s Education Secretary-Designee, charging that “New Mexico’s current teacher evaluation system is harming teachers and depriving students of the high-quality educators they need to succeed.”

The actual lawsuit can be found here, in which the “numerous problems” with the state’s system are detailed more fully. In short, the key claim in the charge follows.

Throughout the state, value-added measures (VAMs) comprise 50% of a teacher’s rating, with the other 50% coming from supervisor observations, student or parent surveys, attendance records, and the like. The charge is that all of these indicators are “riddled with errors,” not only in terms of the quality of the data entered into the system but also, relatedly and perhaps more importantly, in terms of the quality of the data coming out (i.e., the teacher evaluation output). Errors include but are not limited to: “teachers rated on incomplete or incorrect test data (for example, teachers matched up to students they never taught, students given tests on subjects or levels they didn’t know); teachers docked for being absent more days than they were actually gone from school, and some penalized for being absent for family or medical leave, bereavement, or professional development; missing data from student surveys; and teachers rated poorly on the student achievement portion of the evaluation, even when their students had made clear progress on tests.”

AFT President Randi Weingarten, commenting on this lawsuit, said, “Last year, I called the VAM-based individual teacher evaluation system a sham based on how it was being used in places like Texas and Florida. New Mexico’s use of it is just as concerning.” Hence, New Mexico has certainly made it to the top of the list of states to watch, although I would also add to this list the states of Tennessee and New York.

As I have written and said before, I believe the VAM-related “wars” will ultimately be won in the courthouse. Hopefully this lawsuit, as well as some of the other key lawsuits currently underway in other states (see, for example, here, here, and here), will continue to take the lead, and even lead our nation back to a more reasonable and valid set of standards and expectations when it comes to the evaluation of America’s public school teachers. Do stay tuned…

Educator Evaluations (and the Use of VAM) Unlikely to be Mandated in Reauthorization of ESEA

I invited a colleague of mine – Kimberly Kappler Hewitt (Assistant Professor, University of North Carolina, Greensboro) – to write a guest post for you all, and she did, sharing her thoughts on what is currently occurring on Capitol Hill regarding the reauthorization of the Elementary and Secondary Education Act (ESEA). Here is what she wrote:

Amidst what is largely a bitterly partisan culture on Capitol Hill, Republicans and Democrats agree that teacher evaluation is unlikely to be mandated in the reauthorization of the Elementary and Secondary Education Act (ESEA), the most recent iteration of which is No Child Left Behind (NCLB), signed into law in January 2002. See here for an Education Week article by Lauren Camera on the topic.

In another piece on the topic (here), Camera explains: “Republicans, including Chairman Lamar Alexander, R-Tenn., said Washington shouldn’t mandate such policies, while Democrats, including ranking member Patty Murray, D-Wash., were wary of increasing the role student test scores play in evaluations and how those evaluations are used to compensate teachers.” However, under draft legislation introduced by Senator Lamar Alexander (R-Tenn.), Chairman of the Senate Health, Education, Labor, and Pensions Committee, Title II funding would turn into federal block grants, which states could use for educator evaluation. Regardless, excluding a teacher evaluation mandate from ESEA reauthorization may undermine efforts by the Obama administration to incorporate student test score gains as a significant component of educator evaluation.

Camera further explains: “Should Congress succeed in overhauling the federal K-12 law, the lack of teacher evaluation requirements will likely stop in its tracks the Obama administration’s efforts to push states to adopt evaluation systems based in part on student test scores and performance-based compensation systems.”

Under the Obama administration, in order for states to obtain a waiver from NCLB penalties and to receive a Race to the Top Grant, they had to incorporate—as a significant component—student growth data in educator evaluations. Influenced by these powerful policy levers, forty states and the District of Columbia require objective measures of student learning to be included in educator evaluations—a sea change from just five years ago (Doherty & Jacobs/National Council on Teacher Quality, 2013). Most states use either some type of value-added model (VAM) or student growth percentile (SGP) model to calculate a teacher’s contribution to student score changes.
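
For readers unfamiliar with these measures, here is a rough conceptual sketch of a student growth percentile (SGP)-style calculation using synthetic data. Real SGP implementations rely on quantile regression, so this simplified peer-band ranking is an illustrative assumption, not any state’s actual model.

```python
# Rough conceptual sketch of a student growth percentile (SGP) style measure using
# synthetic data. Real SGP models use quantile regression; this peer-band ranking
# is a simplification for illustration only.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 1000
df = pd.DataFrame({
    "teacher": rng.choice([f"T{i}" for i in range(20)], size=n),
    "prior_score": rng.normal(50, 10, size=n),
})
df["score"] = 0.8 * df["prior_score"] + rng.normal(0, 8, size=n)

# Group students into bands of "academic peers" by prior achievement.
df["peer_band"] = pd.qcut(df["prior_score"], 10, labels=False)

# A student's growth percentile: percentile rank of the current score within the peer band.
df["sgp"] = df.groupby("peer_band")["score"].rank(pct=True) * 100

# A teacher-level summary: the median growth percentile across his or her students.
median_sgp_by_teacher = df.groupby("teacher")["sgp"].median().sort_values()
print(median_sgp_by_teacher.head())
```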

The Good, the Bad, and the Ugly

As someone who is skeptical about the use of VAMs and SGPs for evaluating educators, I have mixed feelings about the idea that educator evaluation will be left out of ESEA reauthorization. I believe that student growth measures such as VAMs and SGPs should be used not as a calculable component of an educator’s evaluation but as a screener to flag educators who may need further scrutiny or support, a recommendation made by a number of student growth measure (SGM) experts (e.g., Baker et al., 2010; Hill, Kapitula, & Umland, 2011; IES, 2010; Linn, 2008).

Here are two thoughts about the consequences of not incorporating policy on educator evaluation in the reauthorization of ESEA:

  1. Lack of a clear federal vision for educator evaluation devolves the debate to the states. There are strong debates about what the nature of educator evaluation can and should be, and education luminaries such as Linda Darling-Hammond and James Popham have weighed in on the issue (see here and here, respectively). If Congress does not address educator evaluation in ESEA legislation, the void will be filled by disparate state policies. This in itself is neither good nor bad. It does, however, call into question the longevity of the efforts the Obama administration has made to leverage educator evaluation as a way to increase teacher quality. Essentially, the lack of action on the part of Congress regarding educator evaluation devolves the debates to the state level, which means that heated—and sometimes vitriolic—debates about educator evaluation will endure, shifting attention away from other efforts that could have a more powerful and more positive effect on student learning.
  2. Possibility of increases in inequity. ESEA was first passed in 1965 as part of President Johnson’s War on Poverty. ESEA was intended to promote equity for students living in poverty by providing federal funding to districts serving low-income students. The idea was that the federal government could help to level the playing field, so to speak, for students who lacked the advantages of higher-income students. My own research suggests that the use of VAMs for educator evaluation potentially exacerbates inequity in that some teachers avoid working with certain groups of students (e.g., students with disabilities, gifted students, and students who are multiple grade levels behind) and at certain schools, especially high-poverty schools, based on the perception that teaching such students and in such schools will result in lower value-added scores. Without federal legislation that provides clear direction to states that student test score data should not be used for high-stakes evaluation and personnel decisions, states may continue to use data in this manner, which could exacerbate the very inequities that ESEA was originally designed to address.

While it is a good thing, in my mind, that ESEA reauthorization will not mandate educator evaluation that incorporates student test score data, it is a bad (or at least ugly) thing that Congress is abdicating the role of promoting sound educator evaluation policy.

References

Baker, E. L., Barton, P. E., Darling-Hammond, L., Haertel, E., Ladd, H. F., Linn, R. L., . . . Shepard, L. A. (2010). Problems with the use of student test scores to evaluate teachers (EPI Briefing Paper). Washington, DC: Economic Policy Institute.

Doherty, K. M., & Jacobs, S. (2013). State of the states 2013: Connect the dots: Using evaluation of teacher effectiveness to inform policy and practice. Washington, DC: National Council on Teacher Quality.

Hill, H. C., Kapitula, L., & Umland, K. (2011). A validity argument approach to evaluating teacher value-added scores. American Educational Research Journal, 48(3), 794-831.

Institute of Education Sciences. (2010). Error rates in measuring teacher and school performance based on students’ test score gains. Washington, DC: U.S. Department of Education.

Linn, R. L. (2008). Methodological issues in achieving school accountability. Journal of Curriculum Studies, 40(6), 699-711.

Your Voices Were (At Least) Heard: Federal Plans for College of Education Value-Added

A couple of weeks ago I published a post titled “Your Voice Also Needs to Be Heard.” In that post I put out an all-call soliciting responses to an open request for feedback regarding the US Department of Education’s proposal to require teacher training programs (i.e., colleges of education) to track, and be held accountable for, how their graduates’ students perform on standardized tests once those graduates have taught in the field for x years. That is, teacher-level value-added that reflects all the way back on a college of education’s purported quality.

In an article written by Stephen Sawchuk and published this past week in Education Week, titled “U.S. Teacher-Prep Rules Face Heavy Criticism in Public Comments,” you can read more about some of your and many others’ responses (i.e., more than 2,300 separate public comments).

As written into the first paragraph, the feedback was “overwhelmingly critical,” and Sawchuk charged this was the case given the “coordinated opposition from higher education officials and assorted policy groups.” Not that many of these folks, particularly those in the former group, have valid arguments to make on the topic, of course, given that they are the ones at the center of the proposed reforms and understand the realities surrounding such reforms much better than the policymakers in charge…

Among the policy groups, Sawchuk accordingly positions groups like the National Education Policy Center (NEPC), which he defined as “a left-leaning think tank at the University of Colorado at Boulder that is partly funded by teachers’ unions and generally opposes market-based education policies,” against, for example, the Thomas B. Fordham Institute, which in reality is a neoconservative education policy think tank but which Sawchuk, in his “reporting of the facts,” defines as just “generally back[ing] stronger accountability mechanisms in education.” Why such (biased) reporting, Sawchuk?

Regardless, the proposed rules at issue here were “issued under the Higher Education Act [and]…released by the U.S. Department of Education in November, some two years after negotiations with representatives from various types of colleges broke down over the regulations’ shape and scope. Among other provisions, the rules would require states to use measures such as surveys of school districts, teacher-employment data, and student-achievement results to classify each preparation program in one of four categories…The lowest-rated would be barred from offering federal grants of up to $4,000 a year to help pay for teacher education under the TEACH program.”

Although Sawchuk does not disaggregate the data, I would venture to say that the main if not only issue with which folks are actually taking issue is the latter piece – the use of student-achievement results to classify each preparation program in one of four categories as per their “value-added.” Sawchuk did, however, report on five major themes common across responses about how the new rules would:

  • Prioritize student test scores, potentially leading to deleterious effects on teacher-preparation coursework;
  • Apply punitive sanctions to programs rather than support them;
  • Expand federal meddling in state affairs;
  • Prescribe flawed measures that would yield biased results; and
  • Cost far more to implement than the $42 million the Education Department estimated.

You can see individuals’ responses also highlighted within the article, again linked to here.

“Only a handful of commenters were outright supportive of the rules.” Yet, “[w]hether the [US] Education Department will be swayed by the volume of negative comments to rewrite or withdraw the rules remains an open question.” What do you think they will do?

As per Michael J. Petrilli, a former staffer under George W. Bush’s administration, the US Department of Education “must give the public a chance to provide input, and has to explain if it has changed its regulations as a result of the process. But it doesn’t have to change a word.” I will try to stay positive, but I guess we shall wait and see…

Petrilli also cautioned that critics’ attempts to undermine the rules could backfire: “If opponents want to be constructive, they need to suggest ways to improve the regulation, not just argue for its elimination.” For the record, I am MORE THAN HAPPY to help offer better and much more reasonable and valid solutions!!

Breaking News from Tennessee: Another Lawsuit Challenging Value-Added Use for Elective Teachers

Just announced late this week, in an article in The Tennessean and another in Chalkbeat Tennessee, is that the Tennessee Education Association (TEA), the state’s largest teachers’ union, has filed a lawsuit against the governor, the state education commissioner, the Tennessee Board of Education, the Metropolitan Nashville Board of Public Education, and the Anderson County Schools Board of Education. They are challenging the use of state test scores to evaluate teachers of non-state-tested grades and subject areas (i.e., teachers of electives), and they are doing this with the full support of the National Education Association (see here).

While the state tests students in grades 3-8 in mathematics, science, English, and social studies, two teachers in particular are part of the lawsuit, charging that because they taught physical education and visual arts, respectively, one did not receive a bonus and the other lost her tenure eligibility because test scores from students they did not teach in those subject areas were incorporated into their teacher evaluations.

Recall that this is the state that really led the nation in terms of its adoption and implementation of value-added systems, thanks in large part to the Tennessee Value-Added Assessment System (TVAAS), now more popularly known as the Education Value-Added Assessment System (EVAAS), which was also a main reason the state was one of the first to win federal Race to the Top funds in 2010, landing a total of $501 million. The TVAAS is also being named as part of the problem in this lawsuit in that it is the system being used to measure growth for these out-of-subject-area teachers. This growth counts for 25% of these teachers’ evaluation scores.

The main problem, as per TEA’s General Counsel Rick Colbert, is that “These [and many other] teachers are [being] evaluated in part based on state assessment scores from students they do not teach and may have never met.” Also according to Colbert, “Depriving someone of those interests on the basis of something they have no control over is arbitrary, and therefore not due process.” This problem is amplified considering that “more than half of Tennessee’s public school teachers — about 50,000 — teach non-tested subjects” throughout the state, according to TEA’s President Barbara Gray.

This is actually the third lawsuit contesting the use of the TVAAS. As written into the article in Chalkbeat Tennessee, “The first two suits, filed last March, contested the methodology of TVAAS – which TEA officials say is imprecise – for teachers whose students are tested. Those cases are [still] pending in federal court in Knox County.”

You can read this particular lawsuit in its entirety here.

Can VAMs Be Trusted?

In a recent paper published in the peer-reviewed journal Education Finance and Policy, coauthors Cassandra Guarino (Indiana University – Bloomington), Mark Reckase (Michigan State University), and Jeffrey Wooldridge (Michigan State University) ask and then answer the following question: “Can Value-Added Measures of Teacher Performance Be Trusted?” While what I write below is taken from the official publication, I link here to the working paper published online via the Education Policy Center at Michigan State University (i.e., available without a fee).

From the abstract, the authors “investigate whether commonly used value-added estimation strategies produce accurate estimates of teacher effects under a variety of scenarios. [They] estimate teacher effects [using] simulated student achievement data sets that mimic plausible types of student grouping and teacher assignment scenarios. [They] find that no one method accurately captures true teacher effects in all scenarios, and the potential for misclassifying teachers as high- or low-performing can be substantial.”

Elsewhere, in more specific terms, the authors use simulated data to “represent controlled conditions” that most closely match “the relatively simple conceptual model upon which value-added estimation strategies are based.” This is the strength of this research study: the authors’ findings represent best-case scenarios, whereas when working with real-world data, “conditions are [much] more complex.” Hence, working with various statistical estimators, controls, approaches, and the like using simulated data becomes “the best way to discover fundamental flaws and differences among them when they should be expected to perform at their best.”
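
To give a flavor of what such a simulation looks like, here is a stripped-down sketch in the same spirit (not the authors’ actual data-generating process or estimators): generate students with known “true” teacher effects, recover the effects with a simple gain-score estimator, and count how often teachers land in the wrong quintile.

```python
# Stripped-down sketch in the spirit of the authors' simulations (not their actual
# data-generating process or estimators): simulate classrooms with known teacher effects,
# estimate effects with a simple gain-score model, and count quintile misclassifications.
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n_teachers, class_size = 100, 25

true_effect = rng.normal(0, 0.1, n_teachers)  # "true" teacher effects, in SD units
rows = []
for t in range(n_teachers):
    prior = rng.normal(0, 1, class_size)
    # Nonrandom sorting could be injected here by assigning students to teachers based on `prior`.
    score = 0.7 * prior + true_effect[t] + rng.normal(0, 0.5, class_size)
    rows.append(pd.DataFrame({"teacher": t, "prior": prior, "score": score}))
df = pd.concat(rows, ignore_index=True)

# Simple gain-score estimator: a teacher's mean residual after regressing out prior achievement.
slope, intercept = np.polyfit(df["prior"], df["score"], 1)
df["residual"] = df["score"] - (slope * df["prior"] + intercept)
estimated_effect = df.groupby("teacher")["residual"].mean()

# Compare true and estimated quintile placements.
true_quintile = pd.qcut(pd.Series(true_effect), 5, labels=False)
estimated_quintile = pd.qcut(estimated_effect, 5, labels=False)
misclassified = (true_quintile.values != estimated_quintile.values).mean()
print(f"Teachers placed in the wrong quintile: {misclassified:.0%}")
```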

They found…

  • “No one [value-added] estimator performs well under all plausible circumstances, but some are more robust than others…[some] fare better than expected…[and] some of the most popular methods are neither the most robust nor ideal.” In other words, calculating value-added is messy regardless of the sophistication of the statistical specifications and controls used, and this messiness can seriously undermine the validity of the inferences drawn about teachers, even given the most advanced models and methodological approaches we currently have (i.e., those models and model specifications being advanced via policy).
  • “[S]ubstantial proportions of teachers can be misclassified as ‘below average’ or ‘above average’ as well as in the bottom and top quintiles of the teacher quality distribution, even in [these] best-case scenarios.” This means that the misclassification errors we are seeing with real-world data, we are also seeing with simulated data. This leads to even more concern about whether VAMs will ever be able to get it right, or in this case, counter the effects of the nonrandom assignment of students to classrooms and of teachers to the same.
  • The researchers found that “even in the best scenarios and under the simplistic and idealized conditions imposed by [their] data-generating process, the potential for misclassifying above-average teachers as below average or for misidentifying the ‘worst’ or ‘best’ teachers remains nontrivial, particularly if teacher effects are relatively small. Applying the [most] commonly used [value-added approaches] results in misclassification rates that range from at least 7 percent to more than 60 percent, depending upon the estimator and scenario.” So even with a nearly perfect dataset, or a dataset much cleaner than those that come from actual children and their test scores in real schools, misclassification errors can occur upwards of 60% of the time.

In sum, researchers conclude that while certain VAMs hold more promise than others, they may not be capable of overcoming the many obstacles presented by the non-random assignment of students to teachers (and teachers to classrooms).

In their own words, “it is clear that every estimator has an Achilles heel (or more than one area of potential weakness)” that can distort teacher-level output in highly consequential ways. Hence, “[t]he degree of error in [VAM] estimates…may make them less trustworthy for the specific purpose of evaluating individual teachers” than we might think.

Laura Chapman: SLOs Continued

In my last post, “Student Learning Objectives (SLOs) [and] What (Little) We Know about Them…,” I requested more information about SLOs, and Laura H. Chapman (whose work on SLOs was at the core of that post) responded with the paper also referenced there. This paper is about using SLOs as a proxy for value-added modeling (VAM) and is available for download here: The Marketing of Student Learning Objectives (SLOs)-1999-2014.

Chapman defines SLOs as “a version of the 1950s business practice known as management-by-objectives modified with pseudo-scientific specifications intended to create an aura of objectivity,” although “the apparent scientific precision of the SLO process [remains] an illusion.” In business, this occurs when “lower-level managers identify measurable goals and ‘targets’ to be met [and a] manager of higher rank approves the goals, targets, and measures,” after which performance pay is attained if and when the targets are met. In education, SLOs are to be used “for rating the majority of teachers not covered by VAM, including teachers in the arts and other ‘untested’ or ‘nontested’ subjects.” In education, SLOs are also otherwise called “student learning targets,” “student learning goals,” “student growth targets (SGOs),” or “SMART goals”—Specific, Measurable, Achievable, Results-oriented and Relevant, and Time-bound.

Why is this all happening in Chapman’s view? “This preoccupation with ratings and other forms of measurement is one manifestation of what I have called the econometric turn in federal and state policies. The econometric turn is most evident in the treatment of educational issues as managerial problems and the reification of metrics, especially test scores, as if these are objective, trustworthy, and essential for making educational decisions (Chapman, 2013).”

Chapman then reviews four reports funded by the US Department of Education that, despite a series of positive promotional attempts, altogether “point out the absence of evidence to support any use of SLOs other than securing teacher compliance with administrative mandates.” I also discussed this in my aforementioned post on this topic, but do read Chapman’s full report for more in-depth coverage.

Regardless, SLOs along with VAMs have become foundational to the “broader federal project to make pay-for-performance the national norm for teacher compensation.” Likewise, internal funders, including the US Department of Education and its Reform Support Network (RSN), and external funders, including but not limited to the Bill and Melinda Gates Foundation, Teach Plus, the Center for Teacher Quality, Hope Street Group, Educators for Excellence, and Teachers United, continue to fund and advance SLO + VAM efforts, despite the evidence, or lack thereof, especially in the case of SLOs.

As per Chapman, folks affiliated with these groups (and others) continue to push SLOs forward by focusing on four points in the hope of inducing increased compliance. These points include assertions that the SLO process (1) is collaborative, (2) is adaptable, (3) improves instruction (for which there is no supporting evidence), and (4) improves student learning (for which there is no supporting evidence). You can read more about each of these points, and the evidence that exists (or not) for each, in Chapman’s report, linked to again here.