Student Learning Objectives, aka Student Growth Objectives, aka Another Attempt to Quantify “High Quality” Teaching

After a previous post about VAMs v. Student Growth Percentiles (SGPs) (see also VAMs v. SGPs Part II), a reader posted a comment asking for more information about the utility of SGPs, and also about the difference between SGPs and Student Growth Objectives.

“Student Growth Objectives” is a new term for an older concept that is being increasingly integrated into educational accountability systems nationwide, and that is also under scrutiny (see one of Diane Ravitch’s recent posts about this here). The concept underlying Student Growth Objectives (SGOs) is essentially the same as that underlying Student Learning Objectives (SLOs). Why the term “growth” is now used in place of the term “learning” is unclear; it is perhaps yet another fad. It also likely has something to do with various legislative requirements (e.g., Race to the Top terminologies), although evidence in support of this transition is lacking.

Regardless, and put simply, an SGO/SLO is an annual goal for measuring the growth/learning of students instructed by teachers (or principals, for school-level evaluations) who are not eligible to participate in a school’s or district’s value-added or student growth model. This includes the vast majority of teachers in most schools or districts (e.g., 70+%), because only those teachers who instruct reading/language arts or mathematics in state-tested grade levels, typically grades 3-8, are eligible to participate in the VAM or SGP evaluation system. Hence, SGOs/SLOs were developed because administrators and others were either unwilling to allow these exclusions to continue or were forced to establish a mechanism to include the other teachers in order to meet some legislative mandate.

New Jersey, for example, defines an SGO as “a long-term academic goal that teachers set for groups of students and must be: Specific and measureable; Aligned to New Jersey’s curriculum standards; Based on available prior student learning data; A measure of what a student has learned between two points in time; Ambitious and achievable” (for more information click here).

Denver Public Schools has been using SGOs for many years; their 2008-2009 Teacher Handbook states that an SGO must be “focused on the expected growth of [a teacher’s] students in areas identified in collaboration with their principal,” as well as that the objectives must be “Job-based; Measurable; Focused on student growth in learning; Based on learning content and teaching strategies; Discussed collaboratively at least three times during the school year; May be adjusted during the school year; Are not directly related to the teacher evaluation process; [and] Recorded online” (for more information click here).

That being said, and in sum, SGOs/SLOs, like VAMs, are not supported by empirical work. As Jersey Jazzman summarized very well in his post about this, the correlational evidence is very weak, the conclusions drawn by outside researchers are a stretch, and the rush to implement these measures is just as unfounded as the rush to implement VAMs for educator evaluation. We don’t know that SGOs/SLOs make a difference in distinguishing “good” from “poor” teachers; in fact, some could argue (as Jersey Jazzman does) that they don’t do much of anything at all. They’re just another metric being used in the attempt to quantify “high quality” teaching.

Thanks to Dr. Sarah Polasky for this post.

Stanford Professor, Dr. Edward Haertel, on VAMs

In a recent speech and subsequent paper, Dr. Edward Haertel, National Academy of Education member and Professor at Stanford University, writes about VAMs and the extent to which VAMs, being based on student test scores, can be used to make reliable and valid inferences about teachers and teacher effectiveness. This is a must-read, particularly for those who are new to the research literature in this area. Dr. Haertel is certainly an expert here, actually one of the best we have, and in this piece he captures the major issues well.

Some of the issues highlighted include concerns about the tests used to model value-added and how their scales (falsely assumed to be as objective and equal as units on a measuring stick) complicate and distort VAM-based estimates. He also discusses the general issues with the tests almost always, if not always, used when modeling value-added (i.e., the state-level tests mandated as per No Child Left Behind in 2002).

He discusses why VAM estimates are least trustworthy, and most volatile and error prone, when used to compare teachers who work in very different schools with very different student populations: students who do not attend schools in randomized patterns and who are rarely if ever randomly assigned to classrooms. The issues with bias, as highlighted by Dr. Haertel and also in a recent VAMboozled! post with a link to a new research article here, are probably the most significant VAM-related problems going. As captured in his words, “VAMs will not simply reward or penalize teachers according to how well or poorly they teach. They will also reward or penalize teachers according to which students they teach and which schools they teach in” (Haertel, 2013, pp. 12-13).

He reiterates issues with reliability, or a lack thereof. As per one research study he cites, researchers found that “a minimum of 10% of the teachers in the bottom fifth of the distribution one year were in the top fifth the next year, and conversely. Typically, only about a third of 1 year’s top performers were in the top category again the following year, and likewise, only about a third of 1 year’s lowest performers were in the lowest category again the following year. These findings are typical [emphasis added]…[While a] few studies have found reliabilities around .5 or a little higher…this still says that only half the variation in these value-added estimates is signal, and the remainder is noise [and/or error, which makes VAM estimates entirely invalid about half of the time]” (Haertel, 2013, p. 18).
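To make the "half signal, half noise" reading of a .5 reliability concrete, here is a minimal simulation sketch. It rests on assumptions that are ours, not Dr. Haertel's: each teacher's yearly estimate is a stable, normally distributed "signal" plus independent, normally distributed yearly "noise," and the function names (`pearson`, `fifths`, `simulate`) are hypothetical. It is an illustration of the concept, not a reproduction of any study he cites.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def fifths(scores):
    """Return (bottom fifth, top fifth) as sets of teacher indices."""
    k = len(scores) // 5
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return set(order[:k]), set(order[-k:])


def simulate(n_teachers=20000, reliability=0.5, seed=0):
    """Simulate two years of estimates under an assumed signal-plus-noise model."""
    rng = random.Random(seed)
    # Split a total variance of 1 into a signal share and a noise share.
    signal_sd = reliability ** 0.5
    noise_sd = (1 - reliability) ** 0.5
    signal = [rng.gauss(0, signal_sd) for _ in range(n_teachers)]
    # Same signal both years; fresh, independent noise each year.
    year1 = [s + rng.gauss(0, noise_sd) for s in signal]
    year2 = [s + rng.gauss(0, noise_sd) for s in signal]
    bottom1, top1 = fifths(year1)
    _, top2 = fifths(year2)
    return {
        "year_to_year_r": pearson(year1, year2),
        "top_stays_top": len(top1 & top2) / len(top1),
        "bottom_moves_to_top": len(bottom1 & top2) / len(bottom1),
    }


result = simulate()
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

Lowering the `reliability` argument shows how quickly year-to-year persistence in the top fifth erodes as the noise share grows; chance alone would keep 20% of teachers in the same fifth.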

Dr. Haertel also discusses other correlations among VAM estimates and teacher observational scores, VAM estimates and student evaluation scores, and VAM estimates taken from the same teachers at the same time but using different tests, all of which also yield abysmally (and unfortunately) low correlations, similar to those mentioned above.

His bottom line? “VAMs are complicated, but not nearly so complicated as the reality they are intended to represent” (Haertel, 2013, p. 12). They just do not measure well what so many believe they measure so very well.

Again, to find out more reasons and more in-depth explanations as to why, click here for the full speech and subsequent paper.

Mr. T’s Scores on the DC Public Schools’ IMPACT Evaluation System

After our recent post regarding the DC Public Schools’ IMPACT Evaluation System, and Diane Ravitch’s follow-up, a DC teacher wrote to Diane expressing his concerns about his DC IMPACT evaluation scores, attaching the scores he recently received after his supervising administrator and a master educator observed the same 30-minute lesson he recently taught to the same class.

First, take a look at his scores summarized below. Please note that other supportive “evidence” (e.g., notes re: what was observed to support and warrant the scores below) was available, but for purposes of brevity and confidentiality this “evidence” is not included here.

As you can easily see, these two evaluators were very much NOT on the same evaluation page, even though they observed the same lesson, at the same time, with the same class.

| Evaluative Criteria | Definition | Administrator Scores (Mean Score = 1.44) | Master Educator Scores (Mean Score = 3.11) |
|---|---|---|---|
| TEACH 1 | Lead Well-Organized, Objective-Driven Lessons | 1 = Ineffective | 4 = Highly Effective |
| TEACH 2 | Explain Content Clearly | 1 = Ineffective | 3 = Effective |
| TEACH 3 | Engage Students at All Learning Levels in Rigorous Work | 1 = Ineffective | 3 = Effective |
| TEACH 4 | Provide Students Multiple Ways to Engage with Content | 1 = Ineffective | 3 = Effective |
| TEACH 5 | Check for Student Understanding | 2 = Minimally Effective | 4 = Highly Effective |
| TEACH 6 | Respond to Student Understandings | 1 = Ineffective | 3 = Effective |
| TEACH 7 | Develop Higher-Level Understanding through Effective Questioning | 1 = Ineffective | 2 = Minimally Effective |
| TEACH 8 | Maximize Instructional Time | 2 = Minimally Effective | 3 = Effective |
| TEACH 9 | Build a Supportive, Learning-Focused Classroom Community | 3 = Effective | 3 = Effective |

Overall, Mr. T (an obvious pseudonym) received a 1.44 from his supervising administrator and a 3.11 from the master educator, with scores ranging from 1 = Ineffective to 4 = Highly Effective.
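For readers who want to check the arithmetic, the short sketch below recomputes both mean scores from the nine TEACH ratings and tallies how far apart the two evaluators were on each criterion. The variable names are ours; the scores are transcribed directly from the table.

```python
from statistics import mean

# Scores transcribed from the table, in order from TEACH 1 through TEACH 9.
administrator = [1, 1, 1, 1, 2, 1, 1, 2, 3]
master_educator = [4, 3, 3, 3, 4, 3, 2, 3, 3]

admin_mean = round(mean(administrator), 2)
master_mean = round(mean(master_educator), 2)

# Per-criterion gap: master educator score minus administrator score.
gaps = [m - a for a, m in zip(administrator, master_educator)]
exact_matches = sum(gap == 0 for gap in gaps)

print(f"Administrator mean:   {admin_mean}")
print(f"Master educator mean: {master_mean}")
print(f"Gaps by criterion:    {gaps}")
print(f"Exact matches:        {exact_matches} of {len(gaps)}")
```

The two evaluators assigned the same score on only one of the nine criteria (TEACH 9), and on TEACH 1 they were a full three points apart on a four-point scale.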

This is particularly important as illustrated in the prior post (Footnote 8 of the full piece to be exact), because “Teacher effectiveness ratings were based on, in order of importance by the proportion of weight assigned to each indicator [including first and foremost]: (1) scores derived via [this] district-created and purportedly “rigorous” (Dee & Wyckoff, 2013, p. 5) yet invalid (i.e., not having been validated) observational instrument with which teachers are observed five times per year by different folks, but about which no psychometric data were made available (e.g., Kappa statistics to test for inter-rater consistencies among scores).” For all DC teachers, this is THE observational system used, and for 83% of them these data are weighted at 75% of their total “worth” (Dee & Wyckoff, 2013, p. 10). This is precisely the system that is receiving (and gaining) praise, especially as it has thus far led to teacher bonuses (professedly up to $25,000 per year) as well as terminations of more than 500 teachers (≈ 8%) throughout DC’s Public Schools. Yet as evident here, again, this system has some fatal flaws and serious issues, despite its praised “rigor” (Dee & Wyckoff, 2013, p. 5).
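As a rough illustration of what a Kappa statistic would capture, here is a minimal sketch of unweighted Cohen's kappa applied to Mr. T's nine pairs of scores. Two caveats, both ours: a real inter-rater study would use many teachers and many lessons rather than nine criteria from a single lesson, and a weighted kappa would arguably suit an ordered 1-4 scale better. The function name `cohens_kappa` is hypothetical, and this is in no way the district's own analysis.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    n = len(rater_a)
    # Observed agreement: share of items given identical scores.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: based on how often each rater used each score.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / n ** 2
    return (observed - expected) / (1 - expected)


# Scores transcribed from Mr. T's table (TEACH 1 through TEACH 9).
administrator = [1, 1, 1, 1, 2, 1, 1, 2, 3]
master_educator = [4, 3, 3, 3, 4, 3, 2, 3, 3]

kappa = cohens_kappa(administrator, master_educator)
print(f"Cohen's kappa: {kappa:.3f}")
```

For these nine pairs, kappa works out to 1/73, or roughly 0.01, where 0 means agreement no better than chance and 1 means perfect agreement.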

See also ten representative comments taken from both the administrator’s evaluation form and the master educator’s evaluation form. Revealed here, as well, are MAJOR issues and discrepancies that should not occur in any “objective” and “reliable” evaluation system, especially in one to which such major consequences are attached and that has been, accordingly, so praised for its “rigor” (Dee & Wyckoff, 2013, p. 5).

Administrator’s Comments:
1. The objective was not posted nor verbally articulated during the observation… Students were asked what the objective was and they looked to the board but when they saw no objective.
2. There was limited evidence that students mastered the content based on the work they produced.
3. Explanations of content weren’t clear and coherent based on student responses and the level of attention that Mr. T had to pay to most students.
4. Students were observed using limited academic language throughout the observation.
5. The lesson was not accessible to students and therefore posed too much challenge based on their level of ability.
6. [T]here wasn’t an appropriate balance between teacher‐directed and student‐centered learning.
7. There was limited higher-level understanding developed based on verbal conferencing or work products that were created.
8. Through [checks for understanding] Mr. T was able to get the pulse of the class… however there was limited evidence that Mr. T understood the depth of student understanding.
9. There were many students that had misunderstandings based on student responses from putting their heads down to moving to others to talk instead of work.
10. Inappropriate behaviors occurred regularly within the classroom.

Master Educator’s Comments:
1. Mr. T was highly effective at leading well-organized, objective-driven lessons.
2. Mr. T’s explanations of content were clear and coherent, and they built student understanding of content.
3. All parts of Mr. T’s lesson significantly moved students towards mastery of the objective as evidenced by students.
4. Mr. T included learning styles that were appropriate to students needs and all students responded positively and were actively involved.
5. Mr. T’s explanations of content were clear and coherent, and they built student understanding of content.
6. Mr. T was effective at engaging students at all levels in accessible and challenging work.
7. Students had adequate opportunities to meaningfully practice, apply, and demonstrate what they are learning.
8. Mr. T always used appropriate strategies to ensure that students moved toward higher-level understanding.
9. Mr. T was effective at maximizing instructional time…Inappropriate or off-task student behavior never interrupted or delayed the lesson.
10. Mr. T was effective at building a supportive, learning-focused classroom community. Students were invested in their work and valued academic success.

In sum, as Mr. T wrote in his email to Diane, while he is “fortunate enough to have a teaching position that is not affected by VAM nonsense…that doesn’t mean [he’s] completely immune from a flawed system of evaluations.” This “supposedly ‘objective’ measure seems to be anything but.” Is the administrator correct in positioning Mr. T as ineffective? Or might it be, perhaps, that the master educator was “just being too soft”? Either way, “it’s confusing and it’s giving [Mr. T.] some thought as to whether [he] should just spend the school day at [his] desk working on [his] resumé.”

Our thanks to Mr. T for sharing his DC data, and for sharing his story!

Why VAMs & Merit Pay Aren’t Fair

An “oldie” (i.e., published about one year ago), but a goodie! This one is already posted in the video gallery of this site, but it recently came up again as a good, short (three-minute) video that captures some of the main issues.
Check it out and share as needed!

Six Reasons Why VAMs and Merit Pay Aren’t Fair