The technology section of The New York Times released an article yesterday called “Grading Teachers, With Data From Class.” It’s about using student-level survey data, or what students themselves have to say about the effectiveness of their teachers, to supplement (or perhaps trump) value-added and other test-based data when evaluating teacher effectiveness.
I recommend this article to you all in that it’s pretty much right on in terms of using “multiple measures” to measure pretty much anything educational these days, including teacher effectiveness. Likewise, such an approach aligns with the 2014 “Standards for Educational and Psychological Testing” measurement standards recently released by the leading professional organizations in the area of educational measurement, including the American Educational Research Association (AERA), American Psychological Association (APA), and National Council on Measurement in Education (NCME).
Some of the benefits of using student surveys to help measure teacher effectiveness:
- Student-level data based on such surveys typically yield data that are of more formative use to teachers than most other data, including data generated via value-added models (VAMs) and many observational systems.
- These data represent students’ perceptions and opinions. This is important as these data come directly from students in teachers’ classrooms, and students are the most direct “consumers” of (in)effective teaching.
- In this article in particular, the survey instrument described is open-source. This is definitely of “added value;” rare is it that products are offered to big (and small) money districts, more or less, for free.
- This helps with current issues of fairness, or the lack thereof (whereas only about 30% of current PreK-12 teachers can be evaluated using students’ test scores). Using survey data can apply to really all teachers, if all teachers agree that the more generalized items pertain to them and the subject areas they teach (e.g., physical education). One thing to note, however, is that there are typically issues that arise when using these survey data when the data are to come from young children. Our littlest ones are typically happy with most any teacher and do not really have the capacities to differentiate among teacher effectiveness items or sub-factors; hence, these data do not typically yield very useful data for either formative (informative) or summative (summary) purposes in the lowest grade levels. Whether student surveys are appropriate for students in such grades is highly questionable, accordingly.
Some things to consider and some major notes of caution when using student surveys to help measure teacher effectiveness:
- Response rates are always an issue when valid inferences are to be drawn from such survey data. Too often folks draw assertions and conclusions they believe to be valid from samples of respondents that are too small and not representative of the population, of in this case students, whom were initially solicited for their responses. Response rates cannot be overlooked; if response rates are inadequate this can and should void all data entirely.
- There is a rapidly growing market for student-level survey systems such as these, and some are rushing to satisfy the demand without conducting the research necessary to make the claims they are simultaneously marketing. Consumers need to make sure such survey instruments themselves (as well as the online/paper administration systems that often come along with them) are functioning appropriately, and accordingly yielding reliable, good, accurate, useful, etc. data. These instruments are very difficult to construct and validate, so serious attention should be paid to the actual research supporting marketers’ claims. Consumers should continue to ask for the research evidence, as such research is often incomplete or not done when tools are needed ASAP. District-level researchers should be more than capable of examining the evidence before any contracts are signed.
- Related, districts should not necessarily do this on their own. Not that district personnel are not capable, but as stated, validation research is a long, arduous, but also very necessary process. And typically, the instruments available (especially if for free) do a decent job capturing the general teacher effectiveness construct. This too can be debated, however (e.g., in terms of universal and/or too many items and halo effects).
- Many in higher education have experience with both developing and using student-level survey data, and much can be learned from the wealth of research and information on using such systems to evaluate college instructor/professor effectiveness. This research certainly applies here. Accordingly, there is much research about how such survey data can be gamed and manipulated by instructors (e.g., via the use of external incentives/disincentives), can be biased by respondent or student background variables (e.g., charisma, attractiveness, gender and race as compared to the gender and race of the teacher or instructor, grade expected or earned in the class, overall grade point average, perceived course difficulty or the lack thereof), and the like. These literature should be consulted, so that all users of such student-level survey data are aware of the potential pitfalls when using and consuming such output. Accordingly, this research can help future consumers be proactive in terms of ensuring, as best they can, that results might yield as valid inferences as possible.
- On that note, all educational measurements and measurement systems are imperfect. This is precisely why the standards of the profession call for “multiple measures” as with each multiple measure, the strengths of one hopefully help to offset the weaknesses of the others. This should yield a more holistic assessment of the construct of interest, which is in this case teacher effectiveness. However, the extent to which these data holistically capture teacher effectiveness, also needs to be continuously researched and assessed.
I hope this helps, and please do respond with comments if you all have anything else to add for the good of the group. I should also add that this is an incomplete list of both the strengths and drawbacks to such approaches; the aforementioned research literature, particularly as it represents 30+ years of using student-level surveys in higher education should be advised if more information is needed and desired.
Robert E. Haskell published several comprehensive articles in Education Policy Analysis Archives in 1997 that supported the reliability, validity, legality and ethics of student evaluation of faculty. His work is as valuable today as it was then. Here are the links:
I don’t think studies of faculty ratings given by college students provide much insight into the wisdom, ethics, and value of proctored administrations of student surveys, K-12. Surveys for K-12 are being marketed by about seven companies, and largely on publicity garnered for this measure from the Gates-funded Measures of Effective Teaching project. I looked at the seven constructs in the original survey for the MET project, constructed by economist Ron Ferguson. Four of the seven constructs seem to me biased toward sage-on-the-stage-teaching, giving and following up on homework, helicopter hovering, probing for rigor and the rest. I would love see some studies that take a serious look at the marketing of these surveys and the characteristics of teachers that students are being asked to rate. The effect of proctored administration on answers is another issue.
Dr Beardsley, Do you know of any states using Student Surveys as a quantitative measure?
I know of no state doing this statewide, yet, but there are states exploring this option. Unfortunately, the only “real” student survey instrument is coming from the Tripod (part of Gates MET studies, authored by a colleague of Tom Kane), but the psychometric values of the model have not yet been made transparent. This is another proprietary system, so that might have something to do with it. One of my goals in my professional life, actually, is to build and VALIDATE an open-source system that could be used, for free, by states/districts, also given doing such validation work is very important but not easily done or satisfied. In other words, it’s oft-difficult for states/districts to do this one their own and get “it” right; hence, I’d like to help…when I get the time! Hope that helps!