The Arbitrariness Inherent in Teacher Observations


A recent article in The Journal News, a newspaper serving several suburban New York counties, highlights another common problem: districts that have adopted the same teacher observational system (in this case as mandated by the state) are scoring what are likely to be very similar teachers very differently. While one district, among the best not only in the state but in the nation, apparently has no “highly effective” teachers on staff, a neighboring district apparently has a staff that is 99% “highly effective.”

The “believed to be” model developer, Charlotte Danielson, is cited as stating that “Saying 99 percent of your teachers are highly effective is laughable.” I don’t know that I completely agree with her statement, and I have to admit I question her perspective here, and all of her comments throughout the article for that matter, as she is the one purportedly offering up her “valid” Framework for Teaching for such observational purposes. Perhaps she’s displacing blame, arguing that the subjectivity of the scorers, rather than the subjectivity inherent in her system, is what should be blamed for the stark discrepancies.

As per Danielson: “The local administrators know who they are evaluating and are often influenced by personal bias…What it also means is that they might have set the standards too low.” As per the Superintendent of the district with 99% highly effective teachers: the state’s “flawed” evaluation model forced districts to “bump up” scores so that “effective” teachers wouldn’t end up with a rating of “developing.” The Superintendent adds that it is possible under the state’s system to be rated “effective” across domains and still end up rated “developing,” which means teachers may be deemed in need of intervention/improvement, or may become eligible for an expedited hearing process that could lead to their termination. Rather, it may have been the case that scores were inflated to save effective teachers from what the district viewed as an ineffective set of consequences attached to the observational system (i.e., intervention or termination).

Danielson is also cited as saying that teachers should “live in ‘effective’ and only [occasionally] visit ‘highly effective.’” She also notes that if her system contradicts teachers’ value-added scores, this too should “raise red flags” about the quality of the teacher, although she does not (in this article) pay any regard to the issues inherent not only in value-added measures but also in her own observational system.

What is most important in this article, though, is that reading through it illustrates well the arbitrariness of how all of the measures being mandated and used to evaluate teachers are actually being used. Take, for example, the other note herein that the state department’s intent seems to be that 70%-80% of teachers should “fall in the middle” as “developing” or “effective.” Not only is this mathematically impossible (i.e., to have 70%-80% hang around average); it could not be more arbitrary.

In the end, teacher evaluation systems are highly flawed, highly subjective, and highly prone to error, and for people who just don’t “get it” to be passing policies to the contrary is nonsensical and absurd. These flaws matter less when evaluation data are used for formative, or informative, purposes, where data consumers have more freedom to take the data for what they are worth. It is when summary, or summative, decisions are made based on these data, regardless of whether low or high stakes are attached to the decision, that things really go awry.

3 thoughts on “The Arbitrariness Inherent in Teacher Observations”

    • Nope. I do not believe such a thing exists. I do have a sound alternative proposal, though, one that has sound statistics to support it. It serves as the core of chapter 8 of my recent book; if you are interested, I could send you just this chapter.

  1. Posted somewhere on Diane Ravitch’s website about a week ago, but of possible interest because it has quotes from an email exchange with Charlotte Danielson.
    On June 15, 2013, after looking for research on the reliability and validity of the Danielson Framework, I sent the following email. For this post, I have edited the email to give you the gist.

    Almost every research study posted on your website (last entry 2011) fails to report the reliability and validity of the Framework for Teaching (FFT) for the full spectrum of K-12 education and in subjects for which there are not standardized state-wide tests and/or some variant of value-added scores (offered as if proof of validity). The majority of reliability studies I have examined, including those reported in the MET project, focus on observations of instruction in reading or ELA and mathematics, grades 2 to 8. The job assignments of about 70% of teachers do not fit this profile.
    Question 1. As a worker in arts education am I correct there is little or no subject-specific research on the reliability and validity of the FFT beyond ELA, math, and a few studies in science for high school?
    Question 2. Can you point me to any studies that systematically compare ratings of art teachers by persons who do, and do not, have credentials and experience in teaching art?
    Question 3. What guidance do you provide for evaluators who do not, in fact, have subject-matter knowledge and teaching experience in a subject sufficient to judge the accuracy, precision, and appropriateness of a teacher’s work in a classroom and overall practice?
    Question 4. Only a few students experience the within-grade and grade-to-grade continuity in arts instruction that the FFT seems to take for granted. Have you prepared any policy documents offering guidance for the evaluation of teachers with job assignments that place serious limits on their ability to perform at the peak levels that the FFT is supposed to honor?
    Question 5. Of the approximately 30,000 observers in 46 states that have been trained in some version of the FFT, can you provide information (or a good guesstimate) of the number and state-by-state distribution of observers with verified backgrounds in teaching elementary or secondary visual and/or media arts?
    Question 6. I have not had an opportunity to view the 19 hours of classroom teaching in your training videos. If you can direct me to any that focus on teaching in the visual arts, or any of the arts, I would appreciate knowing about these. This would exclude videos with the incidental use of art materials for teaching another subject.

    Surprise, surprise. Within 3 days I received a courteous, if rambling, 900+ word reply from Charlotte Danielson. An edited version follows.

    She acknowledged my questions were “challenging.” “First, your general premise is correct: all the validation studies have consisted of correlations…with ‘value-added’ gains on standardized tests…. This is a serious limitation, as I’ll freely admit, and until we have reliable assessments for a broader range of subjects, and for more students, we simply can’t rely on those data to be definitive.”
    1. Yes, you’re correct. As far as I know, there are no validation studies beyond the ones you mentioned. There is now some interest (and I hope to participate in the effort) in validating the FFT for special needs populations, but that’s a different matter altogether, and would still involve reading, math, etc.
    2. The FFT is intended to apply to all disciplines, K-12. That is grounded in the simple fact that teaching, in whatever context, requires the same basic tasks, namely, knowing one’s subject, knowing one’s students, having clear outcomes, establishing a culture for learning, etc., etc…. But your larger point is a good one: I don’t know of any systematic studies looking at this issue.
    3. This – the expertise of observers – constitutes one of the enduring challenges in crafting reliable systems of teacher evaluation, and I don’t, frankly, have a satisfactory answer to it. And in truth, as you suggest, it’s primarily a state and local policy matter. But it’s one reason why in those districts who are able to afford them, subject supervisors can play an important role. While I recognize that it’s inadequate, it’s the best guidance I can offer.
    4. This question is also one that can only be addressed at the local level, and I sense your frustration! …But I must beg off here – I have really NO influence over these decisions, and I fear that it’s going to get worse before it gets better, until, that is, the basic skills testing mania has run its course.
    5. A good question, and I simply don’t know the answer to it. … I have no idea of the numbers (although I might be able to come up with it) but (sic) would involve going back through years of records. ..
    6. The online modules are, I’m sure, the ones I helped Teachscape develop, and again, I plead guilty – they are overwhelmingly in the “core” subjects, for the simple reason that it’s where most schools have to begin. ……The FFT, as you know, tries to be discipline and level agnostic. But I know well that teaching is not; it’s highly specific! I’ve been contemplating for some time assembling some groups of experts in the different disciplines to advise me going forward, to create, not discipline and level specific frameworks, but versions with specific examples from different disciplines. I’d love to engage you in that work, if you have the interest.
    Again, thanks so much for writing, and I’d like to continue the “conversation.” Cordially, Charlotte Danielson

    I declined her offer to assist on some of these issues, thanked her for the reply, and copied her reply, with my original questions, to officers of the National Art Education Association to make them aware of the issues. There was no obvious concern, perhaps because the full force of teacher evaluations is just being felt. The current president is a school principal.
