Special Issue of “Educational Researcher” (Paper #5 of 9): Teachers’ Perceptions of Observations and Student Growth

Recall that the peer-reviewed journal Educational Researcher (ER) – recently published a “Special Issue” including nine articles examining value-added measures (VAMs). I have reviewed the next of nine articles (#5 of 9) here, titled “Teacher Perspectives on Evaluation Reform: Chicago’s REACH [Recognizing Educators Advancing Chicago Students] Students.” This one is authored by Jennie Jiang, Susan Sporte, and Stuart Luppescu, all of whom are associated with The University of Chicago’s Consortium on Chicago School Research, and all of whom conducted survey- and interview-based research on teachers’ perceptions of the Chicago Public Schools (CPS) teacher evaluation system, twice since it was implemented in 2012–2013. They did this across CPS’s almost 600 schools and its more than 12,000 teachers, with high-stakes being recently attached to teacher evaluations (e.g., professional development plans, remediation, tenure attainment, teacher dismissal/contract non-renewal; p. 108).

Directly related to the Review of Article #4 prior (i.e., #4 of 9 on observational systems’ potentials here), these researchers found that Chicago teachers are, in general, positive about the evaluation system, primarily given the system’s observational component (i.e., the Charlotte Danielson Framework for Teaching, used twice per year for tenured teachers and that counts for 75% of teachers’ evaluation scores), and not given the inclusion of student growth in this evaluation system (that counts for the other 25%). Although researchers also found that overall satisfaction levels with the REACH system at large is declining at a statistically significant rate over time, as teachers get to know the system, perhaps, better.

This system, like the strong majority of others across the nation, is based on only these two components, although the growth measure includes a combination of two different metrics (i.e., value-added scores and growth on “performance tasks” as per the grades and subject areas taught). See more information about how these measures are broken down by teacher type in Table 1 (p. 107), and see also (p. 107) for the different types of measures used (e.g., the Northwest Evaluation Association’s Measures of Academic Progress assessment (NWEA-MAP), a Web-based, computer-adaptive, multiple-choice assessment, that is used to measure value-added scores for teachers in grades 3-8).

As for the student growth component, more specifically, when researchers asked teachers “if their evaluation relies too heavily on student growth, 65% of teachers agreed or strongly agreed” (p. 112); “Fifty percent of teachers disagreed or strongly disagreed that NWEA-MAP [and other off-the-shelf tests used to measure growth in CPS offered] a fair assessment of their student’s learning” (p. 112); “teachers expressed concerns about the narrow representation of student learning that is measured by standardized tests and the increase in the already heavy testing burden on teachers and students” (p. 112); and “Several teachers also expressed concerns that measures of student growth were unfair to teachers in more challenging schools [i.e., bias], because student growth is related to the supports that students may or may not receive outside of the classroom” (p. 112). “One teacher explained this concern [writing]: “I think the part that I find unfair is that so much of what goes on in these kids’ lives is affecting their academics, and those are things that a
teacher cannot possibly control” (p. 112).

As for the performance tasks meant to compliment (or serve as) the student growth or VAM measure, teachers were discouraged with this being so subjective, and susceptible to distortion because teachers “score their own students’ performance tasks at both the beginning and end of the year. Teachers noted that if they wanted to maximize their student growth score, they could simply give all students a low score on the beginning-of-year task and a higher score at the end of the year” (p. 113).

As for the observational component, however, researchers found that “almost 90% of teachers agreed that the feedback they were provided in post-observation conferences” (p. 111) was of highest value; the observational processes but more importantly the post-observational processes made them and their supervisors more accountable for their effectiveness, and more importantly their improvement. While in the conclusions section of this article authors stretch this finding out a bit, writing that “Overall, this study finds that there is promise in teacher evaluation reform in Chicago,” (p. 114) as primarily based on their findings about “the new observation process” (p. 114) being used in CPS, recall from the Review of Article #4 prior (i.e., #4 of 9 on observational systems’ potentials here), these observational systems are not “new and improved.” Rather, these are the same observational systems that, given their levels of subjectivity featured and highlighted in reports like “The Widget Effect” (here), brought us to our now (over)reliance on VAMs.

Researchers also found that teachers were generally confused about the REACH system, and what actually “counted” and for how much in their evaluations. The most confusion surrounded the student growth or value-added component, as (based on prior research) would be expected. Beginning teachers reported more clarity, than did relatively more experienced teachers, high school teachers, and teachers of special education students, and all of this was related to the extent to which a measure of student growth directly impacted teachers’ evaluations. Teachers receiving school-wide value-added scores were also relatively more critical.

Lastly, researchers found that in 2014, “79% of teachers reported that the evaluation process had increased their levels of stress and anxiety, and almost 60% of teachers
agreed or strongly agreed the evaluation process takes more effort than the results are worth.” Again, beginning teachers were “consistently more positive on all…measures than veteran teachers; elementary teachers were consistently more positive than high
school teachers, special education teachers were significantly more negative about student growth than general teachers,” and the like (p. 113). And all of this was positively and significantly related to teachers’ perceptions of their school’s leadership, perceptions of the professional communities at their schools, and teachers’ perceptions of evaluation writ large.


If interested, see the Review of Article #1 – the introduction to the special issue here; see the Review of Article #2 – on VAMs’ measurement errors, issues with retroactive revisions, and (more) problems with using standardized tests in VAMs here; see the Review of Article #3 – on VAMs’ potentials here; and see the Review of Article #4 – on observational systems’ potentials here.

Article #5 Reference: Jiang, J. Y., Sporte, S. E., & Luppescu, S. (2015). Teacher perspectives on evaluation reform: Chicago’s REACH students. Educational Researcher, 44(2), 105-116. doi:10.3102/0013189X15575517

Vanderbilt Researchers on Performance Pay, VAMs, and SLOs

Do higher paychecks translate into higher student test scores? That is the question two researchers at Vanderbilt – Ryan Balch (recent Graduate Research Assistant at Vanderbilt’s National Center on Performance Incentives) and Matthew Springer (Assistant Professor of Public Policy and Education and Director of Vanderbilt’s National Center on Performance Incentives) – attempted to answer in a recent study of the REACH pay-for-performance program in Austin, Texas (a nationally recognized performance program model with $62.3 million in federal support). The study published in Education Economics Review can be found here, but for a $19.95 fee; hence, I’ll do my best to explain this study’s contents so you all can save your money, unless of course you too want to dig deeper.

As background (and as explained on the first page of the full paper), the theory behind performance pay is that tying teacher pay to teacher performance provides “strong incentives” to improve outcomes of interest. “It can help motivate teachers to higher levels of performance and align their behaviors and interests with institutional goals.” I should note, however, that there is very mixed evidence from over 100 years of research on performance pay regarding whether it has ever worked. Economists tend to believe it works while educational researchers tend to disagree.

Regardless, in this study as per a ResearchNews@Vanderbilt post put out by Vanderbilt highlighting it, researchers found that teacher-level growth in student achievement in mathematics and reading in schools in which teachers were given monetary performance incentives was significantly higher during the first year of the program’s implementation (2007-2008), than was the same growth in the nearest matched, neighborhood schools where teachers were not given performance incentives. Similar gains were maintained the following year, yet (as per the full report) no additional growth or loss was noted otherwise.

As per the full report as well, researchers more specifically found that students who were enrolled in the REACH program gained between 0.13 and 0.17 standard deviations greater gains in mathematics, and (although not as evident or highlighted in the text of the actual report, but within a related table) students who were enrolled in the REACH program gained between 0.10 and 0.05 standard deviations greater gains in reading, although these gains were also less significant in statistical terms. Curious…

While the method by which schools were matched was well-detailed, and inter-school descriptive statistics were presented to help readers determine whether in fact the schools sampled for this study were comparable (although statistics that would also help us determine whether the inter-school differences noted were statistically significant enough to pay attention to), the statistics comparing the teachers in REACH schools versus those not in REACH schools to whom they were compared were completely missing. Hence, it is impossible to even begin to determine whether the matching methodology used actually yielded comparable samples down to the teacher level – the heart of this research study. This is a fatal flaw that in my opinion should have prevented this study from being published, at least as is, as without this information we have no guarantees that teachers within these schools were indeed comparable.

Regardless, researchers also examined teachers’ Student Learning Objectives (SLOs) – the incentive program’s “primary measure of individual teacher performance” given so many teachers are still VAM-ineligible (see a prior post about SLOs, here). They examined whether SLO scores correlated with VAM scores, for those teachers who had both.

They found, as per a quote by Springer in the above-mentioned post, that “[w]hile SLOs may serve as an important pedagogical tool for teachers in encouraging goal-setting for students, the format and guidance for SLOs within the specific program did not lead to the proper identification of high value-added teachers.” That is, more precisely and as indicated in the actual study, SLOs were “not significantly correlated with a teacher’s value-added student test scores;” hence, “a teacher is no more likely to meet his or her SLO targets if [his/her] students have higher levels of achievement [over time].” This has huge implications, in particular regarding the still lacking evidence of validity surrounding SLOs.

Student Learning Objectives (SLOs) as a Measure of Teacher Effectiveness: A Survey of the Policy Landscape

I have invited another one of my former PhD students, Noelle Paufler, to the VAMboozled! team, and for her first post she has written on student learning objectives (SLOs), in large part as per the prior request(s) of VAMboozled! followers. Here is what she wrote:

Student learning objectives (SLOs) are rapidly emerging as the next iteration in the policy debate surrounding teacher accountability at the state and national levels. Purported as one solution to the methodologically challenging task of measuring the effectiveness of teachers of subject areas for which large-scaled standardized tests are unavailable, SLOs prompt the same questions of validity, reliability, and fairness raised by many about value-added models (VAMs). Defining the SLO process as “a participatory method of setting measurable goals, or objectives, based on the specific assignment or class, such as the students taught, the subject matter taught, the baseline performance of the students, and the measurable gain in student performance during the course of instruction” (Race to the Top Technical Assistance Network, 2010, p. 1), Lacireno-Paquet, Morgan, and Mello (2014) provide an overview of states’ use of SLOs in teacher evaluation systems.

There are three primary types of SLOs (i.e., for individual teachers, teams or grade levels, and school-wide) that may target subgroups of students and measure student growth or another measurable target (Lacireno-Paquet et al., 2014). SLOs relying on one or more assessments (e.g., state-wide standardized tests; district-, school-, or classroom measures) for individual teachers are most commonly used in teacher evaluation systems (Lacireno-Paquet et al., 2014). At the time of their writing, 25 states had included SLOs under various monikers (e.g., student learning targets, student learning goals) in their teacher evaluation systems (Lacireno-Paquet et al., 2014). Of these states, 24 provide a structured process for setting, approving, and evaluating SLOs which most often requires an evaluator at the school or district level to review and approve SLOs for individual teachers (Lacireno-Paquet et al., 2014). For more detailed state-level information, read the full report here.

Arizona serves as a case in point for considering the use of SLOs as part of the Arizona Model for Measuring Educator Effectiveness, an evaluation system comprising measures of teacher professional practice (50%-67%) and student achievement (33%-50%). Currently, the Arizona Department of Education (ADE) classifies teachers into two groups (A and B) based on the availability of state standardized tests for their respective content areas. ADE (2015) defines teachers “who have limited or no classroom level student achievement data that are valid and reliable, aligned to Arizona’s academic standards and appropriate to teachers’ individual content area” as Group B for evaluation purposes (e.g., social studies, physical education, fine arts, career and technical education [CTE]) (p. 1). Recommending SLOs as a measure of student achievement for these teachers, ADE (2015) cites their use as a means to positively impact student achievement, especially when teachers collaboratively create quality common assessments to measure students across a grade level or within a content area. ADE (2015) describes SLOs as “classroom level measures of student growth and mastery” that are “standards based and relevant to the course content,” “specific and measureable,” and “use [student data from] two points in time,” specifically stating that individual lesson objectives and units of study do not qualify and discouraging teaching to the test (p. 1). Having piloted the SLO process in the 2012-2013 school year with full implementation in the 2013-2014 school year in five Local Education Agencies (LEAs) (four district and one charter), ADE (2015) continues to discuss next steps in the implementation of SLOs.

Despite this growing national interest in and rapid implementation of SLOs, very little research has examined the perspectives of district- and school-level administrators and teachers (in both Groups A and B or their equivalent) with regards to the validity, reliability, and fairness of measuring student achievement in this manner. Additional research in early adopter states as well as in states that are piloting the use of SLOs is needed in order to better understand the implications of yet another wave of accountability policy changes.


Arizona Department of Education. (2015). The student learning objective handbook. Retrieved from http://www.azed.gov/teacherprincipal-evaluation/files/2015/01/slo-handbook-7-2.pdf?20150120

Lacireno-Paquet, N., Morgan, C., & Mello, D. (2014). How states use student learning objectives in teacher evaluation systems: A review of state websites (REL 2014-013). Washington, DC: U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance, Regional Educational Laboratory North-east & Islands. Retrieved from http://ies.ed.gov/ncee/edlabs/projects/project.asp?projectID=380

Race to the Top Technical Assistance Network. (2010). Measuring student growth for teachers in non-tested grades and subjects: A primer. Washington, DC: ICF International. Retrieved http://nassauboces.org/cms/lib5/NY18000988/Centricity/Domain/156/NTS__PRIMER_FINAL.pdf

Laura Chapman: SLOs Continued

Within my last post, about “Student Learning Objectives (SLOs) [and] What (Little) We Know about Them…,” I requested more information about SLOs and Laura H. Chapman (whose work on SLOs was at the core of this prior post) responded with the paper also referenced in the prior post. This paper is about using SLOs as a proxy for value-added modeling (VAM) and is available for download here: The Marketing of Student Learning Objectives (SLOs)-1999-2014.

Chapman defines SLOs as “a version of the 1950s business practice known as management-by-objectives modified with pseudo-scientific specifications intended to create an aura of objectivity,” although “the apparent scientific precision of the SLO process [remains] an illusion.” In business, this occurs when “lower-level managers identify measurable goals and ‘targets’ to be met [and a] manager of higher rank approves the goals, targets, and measures,” after which performance pay is attained if and when the targets are met. In education, SLOs are to be used “for rating the majority of teachers not covered by VAM, including teachers in the arts and other ‘untested’ or ‘nontested’ subjects.” In education, SLOs are also otherwise called “student learning targets,” “student learning goals,” “student growth targets (SGOs),” or “SMART goals”—Specific, Measurable, Achievable, Results-oriented and Relevant, and Time-bound.

Why is this all happening in Chapman’s view? “This preoccupation with ratings and other forms of measurement is one manifestation of what I have called the econometric turn in federal and state policies. The econometric turn is most evident in the treatment of educational issues as managerial problems and the reification of metrics, especially test scores, as if these are objective, trustworthy, and essential for making educational decisions (Chapman, 2013).”

Chapman then reviews four reports funded by the US Department of Education that, despite a series of positive promotional attempts, altogether “point out the absence of evidence to support any use of SLOs other than securing teacher compliance with administrative mandates.” I also discussed this in my aforementioned post on this topic, but do read Chapman’s full report for more in-depth coverage.

Regardless, SLOs along with VAMs have become foundational to the “broader federal project to make pay-for-performance the national norm for teacher compensation.” Likewise, internal funders including the US Department of Education and their Reform Support Network (RSN), and external funders including but not limited to the Bill and Melinda Gates Foundation, Teach Plus, Center for Teacher Quality, Hope Street Group, Educators for Excellence, and Teachers United continue to fund and advance SLO + VAM efforts, despite the evidence, or lack thereof, especially in the case of SLOs.

As per Chapman, folks affiliated with these groups (and others) continue to push SLOs forward by focusing on four points in the hope of inducing increased compliance. These points include assertions that the SLO process (1) is collaborative, (2) is adaptable, (3) improves instruction (which has no evidence in support), and (4) improves student learning (which has no evidence in support). You can read more about each of these studies in Chapman’s report, linked to again here, and the evidence that exists (or not) per report.

Student Learning Objectives (SLOs): What (Little) We Know about Them Besides We Are to Use Them

Following up on a recent post, a VAMboozled! follower – Laura Chapman – wrote the comment below about Student Learning Objectives (SLOs) that I found important to share with you all. SLOs are objectives that are teacher-developed and administrator-approved to help hold teachers accountable for their students’ growth, although growth in this case is individually and loosely defined, which makes SLOs about as subjective as it gets. Ironically, SLOs serve as alternatives to VAMs when teachers who are VAM-ineligible need to be held accountable for “growth.”

Laura commented about how I need to write more about SLOs as states are increasingly adopting these, but states are doing this without really any research evidence in support of the concept, much less the practice. That might seem more surprising than it really is, but there is not a lot of research being conducted on SLOs, yet. One research document of which I am aware I reviewed here, with the actual document written by Mathematica and published by the US Department of Education here: “Alternative student growth measures for teacher evaluation: Profiles of early-adopting districts.

Conducting a search on ERIC, I found only two additional pieces also contracted out and published by the US Department of Education, although the first piece is more about describing what states are doing in terms of SLOs versus researching the actual properties of the SLOs. The second piece better illustrates the fact that “very little of the literature on SLOs addresses their statistical properties.”

What little we do know about SLOs at this point, however, is two-fold: (1) “no studies have looked at SLO reliability” and (2) “[l]ittle is known about whether SLOs can yield ratings that correlate with other measures of teacher performance” (i.e., one indicator of validity). The very few studies in which researchers have examined this found “small but positive correlations” between SLOs and VAM-based ratings (i.e., not a strong indicator of validity).

With that being said, if any of you are aware of research I should review or if any of you have anything to say or write about SLOs in your states, districts, or schools, feel free to email me at audrey.beardsley@asu.edu.

In the meantime, do also read what Laura Wrote about SLOs here:

I appreciate your work on the VAM problem. Equal attention needs to be given to the use of SLOs for evaluating teacher education in so-called untested and non-tested subjects. It has been estimated that about 65-69% of teachers have job assignments for which there are not state-wide tests. SLOs (and variants) are the proxy of choice for VAM. This writing exercise is required in at least 27 states, with pretest-posttest and/or baseline to post-test reports on student growth. Four reports from USDE (2014) [I found three] show that there is no empirical research to support the use of the SLO process (and associated district-devised tests and cut-off scores) for teacher evaluation.

The template for SLOs originated in Denver in 1999. It has been widely copied and promoted via publications from USDE’s “Reform Support Network,” which operates free of any need for evidence and few constraints other than marketing a deeply flawed product. SLO templates in wide use have no peer reviewed evidence to support their use for teacher evaluation…not one reliability study, not one study addressing their validity for teacher evaluation.

SLO templates in Ohio and other states are designed to fit the teacher-student data link project (funded by Gates and USDE since 2005). This means that USDE’s proposed evaluations of specific teacher education programs ( e.g., art education at Ohio State University) will be aided by the use of extensive “teacher of record” data routinely gathered by schools and districts, including personnel files that typically require the teacher’s college transcripts, degree earned, certifications, scores on tests for any teacher license and so on.

There are technical questions galore, but a big chunk of the data of interest to the promoters of this latest extension of the Gates/USDE’s rating game are in place.
I have written about the use of SLOs as a proxy for VAM in an unpublished paper titled The Marketing of Student Learning Objectives (SLOs): 1999-2014. A pdf with references can be obtained by request at chapmanLH@aol.com

Forcing the Fit Using Alternative “Student Growth” Measures

As discussed on this blog prior, when we are talking about teacher effectiveness as defined by the output derived via VAMs, we are talking about the VAMs that still, to date, only impact 30%-40% of all America’s public school teachers. These are the teachers who typically teach mathematics and/or reading/language arts in grades 3-8.

The teachers who are not VAM-eligible are those who typically teach in the primary grades (i.e., grades K-2), teachers in high-schools who teach more specialized subject areas that are often not tested using large-scale tests (e.g., geometry, calculus), and the teachers who teach out of the subject areas typically tested (e.g., social studies, science [although there is a current push to increase testing in science], physical education, art, music, special education, etc.). Sometimes entire campuses of teachers are not VAM-eligible.

So, what are districts to do when they are to follow the letter of the law, and the accountability policies being financially incentivized by the feds, and then the states (e.g., via Race to the Top and the NCLB waivers)? A new report released by the Institute of Education Sciences (IES), the research arm of the US Department of Education, and produced by Mathematica Inc. (via a contract with the IES) explains what states are up to in order to comply. You can find the summary and full report titled “Alternative student growth measures for teacher evaluation: Profiles of early-adopting districtshere.

What investigators found is that these “early adopters” are using end-of course exams, commercially available tests (e.g., the Galileo assessment system), and Student Learning Objectives (SLOs), which are teacher-developed and administrator-approved to hold teachers accountable for their students’ growth. Although an SLO is about as subjective as it gets in the company of the seemingly objective, more rigorous, and vastly superior VAMs. In addition, the districts sampled are also adopting the same VAM methodologies to keep all analytical approaches (except for the SLOs) the same, almost regardless of the measures used. If the measures exist, or are to be adopted, might as well “take advantage of them” to evaluate value-added because the assessments can be used (and exploited) to measure the value-added of more and more teachers. What?

This is the classic case of what we call “junk science.” We cannot just take whatever tests, regardless of to what standards they are aligned, or not, and run the data through the same value-added calculator in the name of accountability consistency.

Research already tells us that when using different tests, even on the same students of the same teachers at the same time, but using the same VAMs, gives us very, very different results (see, for example, the Papay article here).

Do the feds not see that forcing states to force the fit is completely wrong-headed and simply wrong? They are the ones who funded this study, but apparently see nothing wrong with the absurdity of the study’s results. Rather, they suggest, results should be used to “provide key pieces of information about the [sampled] districts’ experiences” so that results “can be used by other states and districts to decide whether and how to implement alternative assessment-based value-added models or SLOs.”

Force the fit, they say, regardless of the research or really any inkling of commonsense. Perhaps this will help to further line the pockets of more corporate reformers eager to offer, now, not only their VAM services but also even more tests, end-of-course, and SLO systems.

Way to lead the nation!