Is this Thing On? Amplifying the Call to Stop the Use of Test Data for Educator Evaluations (At Least for Now)

I invited a colleague of mine and now member of the VAMboozled! team – Kimberly Kappler Hewitt (Assistant Professor, University of North Carolina, Greensboro) – to write another guest post for you all (see her first post here). She wrote another, this time capturing what three leading professional organizations have to say on the use of VAMs and tests in general for purposes of teacher accountability. Here’s what she wrote:

Within the last year, three influential organizations, representing the research, practitioner, and philanthropic sectors, have called for a moratorium on the current use of student test score data for educator evaluations, including the use of value-added models (VAMs).

In April of 2014, the American Statistical Association (ASA) released a position statement that was highly skeptical of the use of VAMs for educator evaluation. ASA declared that “Attaching too much importance to a single item of quantitative information is counterproductive—in fact, it can be detrimental to the goal of improving quality.” To be clear, the ASA stopped short of outright condemning the use of VAM for educator evaluation, and declared that its statement was designed to provide guidance, not prescription. Instead, ASA outlined the possibilities and limitations of VAM and called into question how it is currently being (mis)used for educator evaluation.

In June of 2014, the Gates Foundation, the largest American philanthropic education funder, released “A Letter to Our Partners: Let’s Give Students and Teachers Time.” In the letter, Vicki Phillips, Director of Education, College Ready, called (on behalf of the Foundation) for a two-year moratorium on the use of test scores for educator evaluation. She explained that “teachers need time to develop lessons, receive more training, get used to the new tests, and offer their feedback.”

Similarly, the Association for Supervision and Curriculum Development (ASCD), arguably the leading international educator organization, with 125,000 members in more than 130 nations, recently released a policy brief that also calls for a two-year moratorium on high-stakes use of state tests, including their use for educator evaluations. ASCD also explicitly acknowledged that “reliance on high-stakes standardized tests to evaluate students, educators, or schools is antithetical to a whole child education. It is also counter to what constitutes good educational practice.”

While the call to halt the current use of test scores for educator evaluation is echoed across all three of these organizations, there are important nuances to their messages. The Gates Foundation, for example, makes it clear that the foundation supports the use of student test data for educator evaluation even as it declares the need for a two-year moratorium, the purpose of which is to allow teachers the time to adjust to the new Common Core Standards and related tests:

The Gates Foundation is an ardent supporter of fair teacher feedback and evaluation systems that include measures of student gains. We don’t believe student assessments should ever be the sole measure of teaching performance, but evidence of a teacher’s impact on student learning should be part of a balanced evaluation that helps all teachers learn and improve.

The Gates Foundation cautions, though, against moving too quickly to tie test scores to teacher evaluation:

Applying assessment scores to evaluations before these pieces are developed would be like measuring the speed of a runner based on her time—without knowing how far she ran, what obstacles were in the way, or whether the stopwatch worked!

I wonder what the stopwatch symbolizes in the simile: Does the Gates Foundation have questions about the measurement mechanism itself (VAM or another student growth measure), or is Gates simply arguing for more time in order for educators to be “ready” for the race they are expected to run?

While the Gates call for a moratorium is oriented toward realizing the positive potential of policies on the use of student test data for educator evaluation, by giving educators more time to prepare for them, ASA is concerned about the potential negative effects of such policies. The ASA, in its attempt to provide guidance, identified problems with the current use of VAM for educator evaluation and raised important questions about the potential effects of high-stakes use of VAM for educator evaluation:

A decision to use VAMs for teacher evaluations might change the way the tests are viewed and lead to changes in the school environment. For example, more classroom time might be spent on test preparation and on specific content from the test at the exclusion of content that may lead to better long-term learning gains or motivation for students. Certain schools may be hard to staff if there is a perception that it is harder for teachers to achieve good VAM scores when working in them. Over-reliance on VAM scores may foster a competitive environment, discouraging collaboration and efforts to improve the educational system as a whole.

Like ASA, ASCD is concerned with the negative effects of current accountability practices, including “over testing, a narrowing of the curriculum, and a de-emphasis of untested subjects and concepts—the arts, civics, and social and emotional skills, among many others.” While ASCD is clear that it is not calling for a moratorium on testing, it is calling for a moratorium on accountability consequences linked to state tests: “States can and should still administer standardized assessments and communicate the results and what they mean to districts, schools, and families, but without the threat of punitive sanctions that have distorted their importance.” ASCD goes further than ASA and Gates in calling for a complete revamp of accountability practices, including policies regarding teacher accountability:

We need a pause to replace the current system with a new vision. Policymakers and the public must immediately engage in an open and transparent community decision-making process about the best ways to use test scores and to develop accountability systems that fully support a broader, more accurate definition of college, career, and citizenship readiness that ensures equity and access for all students.

So…are policymakers listening? Are these influential organizations able to amplify the voices of researchers and practitioners across the country who also want a moratorium on misguided teacher accountability practices? Let’s hope so.

Playing Fair: Factors that Influence VAM for Special Education Teachers

As you all know, value-added models (VAMs) are intended to measure a teacher’s effectiveness. By comparing students’ learning and the value that educators add, VAMs attempt to isolate the teacher’s impact on student achievement. VAMs focus on individual student progress from one testing period to the next, sometimes without considering past learning, peer influence, family environment or individual ability, depending on the model.
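To make the basic idea concrete, here is a deliberately naive sketch of a gain-score calculation. It is not any state's or vendor's actual VAM: the function name, the data layout, and the scores are all hypothetical, and real models add statistical controls that this toy version leaves out. It simply averages each teacher's student gains between two testing periods and compares that average with the overall average gain.

```python
from collections import defaultdict
from statistics import mean


def naive_value_added(records):
    """Toy gain-score estimate (hypothetical, for illustration only).

    records: iterable of (teacher, prior_score, current_score) tuples.
    Returns each teacher's mean student gain minus the overall mean gain.
    """
    gains_by_teacher = defaultdict(list)
    for teacher, prior, current in records:
        # Gain from one testing period to the next for a single student.
        gains_by_teacher[teacher].append(current - prior)

    # Average gain across every student, regardless of teacher.
    all_gains = [g for gains in gains_by_teacher.values() for g in gains]
    overall_mean_gain = mean(all_gains)

    # A positive value means the teacher's students gained more than average.
    return {
        teacher: mean(gains) - overall_mean_gain
        for teacher, gains in gains_by_teacher.items()
    }


# Made-up scores for four students and two teachers.
records = [
    ("Teacher A", 60, 70),
    ("Teacher A", 50, 58),
    ("Teacher B", 80, 84),
    ("Teacher B", 40, 42),
]
scores = naive_value_added(records)
print(scores)
```

Notice that nothing in this toy version accounts for past learning, peer influence, family environment, or individual ability, which is exactly the kind of omission described above.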

Teachers, administrators and experts have debated VAM reliability and validity, but not often mentioned is the controversy regarding the use of VAMs for teachers of special education students. Why is this so controversial? Because students with disabilities are often educated in general education classrooms, but generally score lower on standardized tests – tests that they often should not be taking in the first place. Accordingly, holding teachers of special education students accountable for their performance is uniquely problematic. For example, many special education students are in mainstream classrooms, with co-teaching provided by both special and general education teachers; hence, special education programs can present challenges to VAMs that are meant to measure straightforward progress.

Co-teaching Complexities

Research like “Co-Teaching: An Illustration of the Complexity of Collaboration in Special Education” outlines some of the specific challenges that teachers of special education can face when co-teaching is involved. But essentially, co-teaching is a partnership between a general and a special education teacher, who jointly instruct a group of students, including those with special needs and disabilities. The intent is to provide special education students with access to the general curriculum while receiving more specialized instruction to support their learning.

Accordingly, collaboration is key to successful co-teaching. Teams that demonstrate lower levels of collaboration tend to struggle more, while successful co-teaching teams share their expertise to motivate students. However, special education teachers often report differences in teaching styles that lead to conflict; they often feel relegated to the role of classroom assistant, rather than full teaching partner. This also has implications for VAMs.

For example, student outcomes from co-teaching vary. A 2002 study by Rea, McLaughlin and Walther-Thomas found that students with learning disabilities in co-taught classes had better attendance and report card grades, but no better performance on standardized tests. Another report showed that test scores for students with and without disabilities were not affected by co-teaching (Idol, 2006).

A 2014 study by the Carnegie Foundation for the Advancement of Teaching points out another issue that can make co-teaching more difficult in special education settings: it can be difficult to determine value-added because it can be hard to separate such teachers’ contributions. The authors also assert that calculating value-added would be more accurate if the models used more detailed data about disability status, services rendered, and past and present accommodations made, but many states do not collect these data (Buzick, 2014), and even if they did, there is no real certainty that this would work.

Likewise, inclusion brings special education students into the general classroom, eliminating boundaries between special education students and general education peers. However, special education teachers often voice opposition to general education inclusion as it relates to VAMs.

According to “Value-Added Modeling: Challenges for Measuring Special Education Teacher Quality” (Lawson, 2014), the specific challenges include:

  • When students with disabilities spend the majority of their day in general education classrooms, special education teacher effectiveness is distorted.
  • Quality special education instruction can be hindered by poor general education instruction.
  • Students may be pulled out of class for additional services, which makes it difficult to maintain progress and pace.
  • Multiple teachers often provide instruction to special education students, so each teacher’s impact is difficult to assess.
  • When special education teachers assist general education classrooms, their impact is not measured by VAMs.

Along with the complexities involved in teaching students with disabilities, special education teachers also deal with a number of constraints that cut into instructional time and affect VAMs. They handle more paperwork, including Individualized Education Plans (IEPs) that take time to write and review. In addition, they must handle extensive curriculum and lesson planning, manage parent communication, keep up with special education laws and coordinate with general education teachers. While their priority may be to fully support each student’s learning and achievement, it’s not always possible. Moreover, not everything special education teachers do can be appropriately captured using tests.

These are but a few reasons that special education teachers should question the fairness of VAMs.

***

This is a guest post from Umari Osgood who works at Bisk Education and writes on behalf of University of St. Thomas online programs.

Student Learning Objectives (SLOs) as a Measure of Teacher Effectiveness: A Survey of the Policy Landscape

I have invited another one of my former PhD students, Noelle Paufler, to the VAMboozled! team, and for her first post she has written on student learning objectives (SLOs), in large part as per the prior request(s) of VAMboozled! followers. Here is what she wrote:

Student learning objectives (SLOs) are rapidly emerging as the next iteration in the policy debate surrounding teacher accountability at the state and national levels. Purported to be one solution to the methodologically challenging task of measuring the effectiveness of teachers of subject areas for which large-scale standardized tests are unavailable, SLOs prompt the same questions of validity, reliability, and fairness raised by many about value-added models (VAMs). Defining the SLO process as “a participatory method of setting measurable goals, or objectives, based on the specific assignment or class, such as the students taught, the subject matter taught, the baseline performance of the students, and the measurable gain in student performance during the course of instruction” (Race to the Top Technical Assistance Network, 2010, p. 1), Lacireno-Paquet, Morgan, and Mello (2014) provide an overview of states’ use of SLOs in teacher evaluation systems.

There are three primary types of SLOs (i.e., for individual teachers, teams or grade levels, and school-wide) that may target subgroups of students and measure student growth or another measurable target (Lacireno-Paquet et al., 2014). SLOs relying on one or more assessments (e.g., state-wide standardized tests; district-, school-, or classroom-level measures) for individual teachers are most commonly used in teacher evaluation systems (Lacireno-Paquet et al., 2014). At the time of their writing, 25 states had included SLOs under various monikers (e.g., student learning targets, student learning goals) in their teacher evaluation systems (Lacireno-Paquet et al., 2014). Of these states, 24 provide a structured process for setting, approving, and evaluating SLOs, which most often requires an evaluator at the school or district level to review and approve SLOs for individual teachers (Lacireno-Paquet et al., 2014). For more detailed state-level information, read the full report here.

Arizona serves as a case in point for considering the use of SLOs as part of the Arizona Model for Measuring Educator Effectiveness, an evaluation system comprising measures of teacher professional practice (50%-67%) and student achievement (33%-50%). Currently, the Arizona Department of Education (ADE) classifies teachers into two groups (A and B) based on the availability of state standardized tests for their respective content areas. ADE (2015) defines teachers “who have limited or no classroom level student achievement data that are valid and reliable, aligned to Arizona’s academic standards and appropriate to teachers’ individual content area” as Group B for evaluation purposes (e.g., social studies, physical education, fine arts, career and technical education [CTE]) (p. 1). Recommending SLOs as a measure of student achievement for these teachers, ADE (2015) cites their use as a means to positively impact student achievement, especially when teachers collaboratively create quality common assessments to measure students across a grade level or within a content area. ADE (2015) describes SLOs as “classroom level measures of student growth and mastery” that are “standards based and relevant to the course content,” “specific and measureable,” and “use [student data from] two points in time,” specifically stating that individual lesson objectives and units of study do not qualify and discouraging teaching to the test (p. 1). Having piloted the SLO process in the 2012-2013 school year with full implementation in the 2013-2014 school year in five Local Education Agencies (LEAs) (four district and one charter), ADE (2015) continues to discuss next steps in the implementation of SLOs.
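To illustrate how the weighting ranges above could combine into a single rating, here is a minimal sketch. Only the percentage ranges (33%-50% for student achievement, with professional practice making up the remaining 50%-67%) come from the text; the function name, the 0-100 scale, and the example scores are assumptions for illustration and are not ADE's actual scoring procedure.

```python
def composite_score(practice_score, achievement_score, achievement_weight):
    """Hypothetical weighted composite of two evaluation components.

    practice_score, achievement_score: component scores on an assumed
    0-100 scale (the scale is an illustration, not ADE's).
    achievement_weight: share given to student achievement, which the
    text says falls between 33% and 50%.
    """
    if not 0.33 <= achievement_weight <= 0.50:
        raise ValueError("achievement weight must be between 0.33 and 0.50")

    # Professional practice receives whatever weight is left (50%-67%).
    practice_weight = 1.0 - achievement_weight
    return practice_weight * practice_score + achievement_weight * achievement_score


# The same made-up component scores under the two extreme weightings.
high_achievement_weight = composite_score(80, 60, 0.50)
low_achievement_weight = composite_score(80, 60, 0.33)
print(high_achievement_weight, low_achievement_weight)
```

With identical component scores, the composite shifts by a few points depending on where in the allowed range the weight is set, which suggests one reason the choice of student achievement measure (such as SLOs for Group B teachers) matters.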

Despite this growing national interest in and rapid implementation of SLOs, very little research has examined the perspectives of district- and school-level administrators and teachers (in both Groups A and B or their equivalent) with regards to the validity, reliability, and fairness of measuring student achievement in this manner. Additional research in early adopter states as well as in states that are piloting the use of SLOs is needed in order to better understand the implications of yet another wave of accountability policy changes.

References

Arizona Department of Education. (2015). The student learning objective handbook. Retrieved from http://www.azed.gov/teacherprincipal-evaluation/files/2015/01/slo-handbook-7-2.pdf?20150120

Lacireno-Paquet, N., Morgan, C., & Mello, D. (2014). How states use student learning objectives in teacher evaluation systems: A review of state websites (REL 2014-013). Washington, DC: U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance, Regional Educational Laboratory Northeast & Islands. Retrieved from http://ies.ed.gov/ncee/edlabs/projects/project.asp?projectID=380

Race to the Top Technical Assistance Network. (2010). Measuring student growth for teachers in non-tested grades and subjects: A primer. Washington, DC: ICF International. Retrieved from http://nassauboces.org/cms/lib5/NY18000988/Centricity/Domain/156/NTS__PRIMER_FINAL.pdf

Educator Evaluations (and the Use of VAM) Unlikely to be Mandated in Reauthorization of ESEA

I invited a colleague of mine – Kimberly Kappler Hewitt (Assistant Professor, University of North Carolina, Greensboro) – to write a guest post for you all, and she did, sharing her thoughts on what is currently occurring on Capitol Hill with the reauthorization of the Elementary and Secondary Education Act (ESEA). Here is what she wrote:

Amidst what is largely a bitterly partisan culture on Capitol Hill, Republicans and Democrats agree that teacher evaluation is unlikely to be mandated in the reauthorization of the Elementary and Secondary Education Act (ESEA), the most recent iteration of which is No Child Left Behind (NCLB), signed into law in January 2002. See here for an Education Week article by Lauren Camera on the topic.

In another piece on the topic (here), Camera explains: “Republicans, including Chairman Lamar Alexander, R-Tenn., said Washington shouldn’t mandate such policies, while Democrats, including ranking member Patty Murray, D-Wash., were wary of increasing the role student test scores play in evaluations and how those evaluations are used to compensate teachers.” However, under draft legislation introduced by Senator Lamar Alexander (R-Tenn.), Chairman of the Senate Health, Education, Labor, and Pensions Committee, Title II funding would turn into federal block grants, which could be used by states for educator evaluation. Regardless, excluding a teacher evaluation mandate from ESEA reauthorization may undermine efforts by the Obama administration to incorporate student test score gains as a significant component of educator evaluation.

Camera further explains: “Should Congress succeed in overhauling the federal K-12 law, the lack of teacher evaluation requirements will likely stop in its tracks the Obama administration’s efforts to push states to adopt evaluation systems based in part on student test scores and performance-based compensation systems.”

Under the Obama administration, in order for states to obtain a waiver from NCLB penalties and to receive a Race to the Top Grant, they had to incorporate—as a significant component—student growth data in educator evaluations. Influenced by these powerful policy levers, forty states and the District of Columbia require objective measures of student learning to be included in educator evaluations—a sea change from just five years ago (Doherty & Jacobs/National Council on Teacher Quality, 2013). Most states use either some type of value-added model (VAM) or student growth percentile (SGP) model to calculate a teacher’s contribution to student score changes.

The Good, the Bad, and the Ugly

As someone who is skeptical about the use of VAMs and SGPs for evaluating educators, I have mixed feelings about the idea that educator evaluation will be left out of ESEA reauthorization. I believe that student growth measures such as VAMs and SGPs should be used not as a calculable component of an educator’s evaluation but as a screener to flag educators who may need further scrutiny or support, a recommendation made by a number of student growth measure (SGM) experts (e.g., Baker et al., 2010; Hill, Kapitula, & Umland, 2011; IES, 2010; Linn, 2008).

Here are two thoughts about the consequences of not incorporating policy on educator evaluation in the reauthorization of ESEA:

  1. Lack of clear federal vision for educator evaluation devolves the debate to the states. There are strong debates about what the nature of educator evaluation can and should be, and education luminaries such as Linda Darling-Hammond and James Popham have weighed in on the issue (see here and here, respectively). If Congress does not address educator evaluation in ESEA legislation, the void will be filled by disparate state policies. This in itself is neither good nor bad. It does, however, call into question the longevity of the efforts the Obama administration has made to leverage educator evaluation as a way to increase teacher quality. Essentially, the lack of action on the part of Congress regarding educator evaluation devolves the debates to the state level, which means that heated (and sometimes vitriolic) debates about educator evaluation will endure, shifting attention away from other efforts that could have a more powerful and more positive effect on student learning.
  2. Possibility of increases in inequity. ESEA was first passed in 1965 as part of President Johnson’s War on Poverty. ESEA was intended to promote equity for students living in poverty by providing federal funding to districts serving low-income students. The idea was that the federal government could help to level the playing field, so to speak, for students who lacked the advantages of higher-income students. My own research suggests that the use of VAM for educator evaluation potentially exacerbates inequity in that some teachers avoid working with certain groups of students (e.g., students with disabilities, gifted students, and students who are multiple grade levels behind) and at certain schools, especially high-poverty schools, based on the perception that teaching such students and in such schools will result in lower value-added scores. Without federal legislation that provides clear direction to states that student test score data should not be used for high-stakes evaluation and personnel decisions, states may continue to use data in this manner, which could exacerbate the very inequities that ESEA was originally designed to address.

While it is a good thing, in my mind, that ESEA reauthorization will not mandate educator evaluation that incorporates student test score data, it is a bad (or at least ugly) thing that Congress is abdicating the role of promoting sound educator evaluation policy.

References

Baker, E. L., Barton, P. E., Darling-Hammond, L., Haertel, E., Ladd, H. F., Linn, R. L., . . . Shepard, L. A. (2010). Problems with the use of student test scores to evaluate teachers. EPI Briefing Paper. Washington, D.C.

Doherty, K. M., & Jacobs, S./National Council on Teacher Quality (2013). State of the states 2013: Connect the dots: Using evaluation of teacher effectiveness to inform policy and practice. Washington, D.C.: National Council on Teacher Quality.

Hill, H. C., Kapitula, L., & Umland, K. (2011). A validity argument approach to evaluating teacher value-added scores. American Educational Research Journal, 48(3), 794-831.

Institute of Education Sciences. (2010). Error rates in measuring teacher and school performance based on students’ test score gains. Washington, D.C.: U.S. Department of Education.

Linn, R. L. (2008). Methodological issues in achieving school accountability. Journal of Curriculum Studies, 40(6), 699-711.