School-Level Bias in the PVAAS Model in Pennsylvania

Research for Action (click here for more about the organization and its mission) just released an analysis of Pennsylvania’s school building rating system: the 100-point School Performance Profile (SPP) used throughout the state to categorize and rank public schools (including charters), although a school can exceed 100 points by earning bonus points. A large share (40%) of the SPP is based on school-level value-added model (VAM) output derived from the Pennsylvania Value-Added Assessment System (PVAAS). The PVAAS is the state’s version of the Education Value-Added Assessment System (EVAAS), the VAM that is “expressly designed to control for out-of-school [biasing] factors.” This model should sound familiar to followers: it is the model with which I am most familiar, the one on which I conduct most of my research, and one widely considered among “the most popular” and “the best” VAMs in town (see recent posts about this model here, here, and here).

Research for Action’s report, titled “Pennsylvania’s School Performance Profile: Not the Sum of its Parts,” details some serious problems with bias in the state’s SPP. Bias, a highly controversial issue covered in the research literature and also on this blog (see recent posts about bias here, here, and here), appears to exist in this state as well, particularly at the school level. It shows up (1) for subject areas that are tested less often and, hence, not tested in consecutive grades (e.g., from one grade level to the next), and (2) because the state combines growth measures with proficiency (i.e., “snapshot”) measures to evaluate schools, the latter being significantly negatively correlated with the poverty levels of the students in the schools being evaluated.

First, “growth scores do not appear to be working as intended for writing and science, the subjects that are not tested year-to-year.” The figure below “displays the correlation between poverty, PVAAS [growth scores], and proficiency scores in these subjects. The lines for both percent proficient and growth follow similar patterns. The correlation between poverty and proficiency rates is stronger, but PVAAS scores are also significantly and negatively correlated with poverty.”

Figure 1

We have covered on this blog how VAMs, and this VAM in particular (i.e., the PVAAS, TVAAS, EVAAS, etc.), can also be biased by the subject areas teachers teach (see prior posts here and here). This, however, is the first evidence of which I am aware that this VAM (and likely other VAMs) may also be biased for subject areas that are not tested in consecutive grades (i.e., from one grade to the next) and that therefore have no same-subject pretest scores. In such cases, VAM statisticians use pretest scores from other subject areas instead (e.g., mathematics and English/language arts), as scores from these tests are likely correlated with the pretest scores that are missing. This, of course, has implications for schools (as illustrated in these analyses) but also likely for teachers (as logically generalized from these findings).

Second, researchers found that because five of the six SPP indicators, accounting for 90% of a school’s base score (including an extra-credit portion), rely entirely on test scores, “this reliance on test scores, despite the partial use of growth measures, results in a school rating system that favors more advantaged schools.” In other words, using growth or value-added measures in conjunction with proficiency (i.e., “snapshot”) indicators also biases the total rating against schools serving more disadvantaged students. The figures below show what this looks like; note that bias exists for both the proficiency and growth measures, although it is weaker, yet still evident, for the latter.

Figure 2

“The black line in each graph shows the correlation between poverty and the percent of students scoring proficient or above in math and reading; the red line displays the relationship between poverty and growth in PVAAS. As displayed…the relationship between poverty measures and performance is much stronger for the proficiency indicator.”

“This analysis shows a very strong negative correlation between SPP and poverty in both years. In other words, as the percent of a school’s economically disadvantaged population increases, SPP scores decrease.” That is, measured school performance (i.e., the SPP score) declines sharply as schools’ aggregate levels of student poverty increase, if and when states use combinations of growth and proficiency to evaluate schools. This is not surprising, but it is a good reminder of how strongly poverty is (co)related with achievement as captured by test scores when growth is not considered.
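One way to see why combining the two kinds of measures still yields a poverty-correlated rating is to treat the SPP, as a simplification, as a weighted sum of a proficiency indicator and a growth indicator. This sketch ignores the bonus points and the non-test indicators, and the symbols (S for the composite score, P for proficiency, G for growth, X for the school’s percent economically disadvantaged, w for non-negative weights) are introduced here for illustration only; they are not the state’s actual SPP formula. Because covariance is linear, the composite’s covariance with poverty is the weighted sum of its components’ covariances:

```latex
\begin{aligned}
S &= w_P\,P + w_G\,G, \qquad w_P,\ w_G \ge 0,\\
\operatorname{Cov}(S, X) &= w_P\,\operatorname{Cov}(P, X) + w_G\,\operatorname{Cov}(G, X),\\
\operatorname{Corr}(S, X) &= \frac{\operatorname{Cov}(S, X)}{\sigma_S\,\sigma_X}.
\end{aligned}
```

If both Cov(P, X) and Cov(G, X) are negative, as the report finds, then no choice of non-negative weights (with at least one weight positive) can make the composite’s correlation with poverty zero or positive. A weaker growth term can only dilute the strong negative proficiency term, not offset it.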

Please read the full 16-page report, as it includes much more important information about both of these key findings than is described herein. Otherwise, well done, Research for Action, for “adding value,” for lack of a better term, to the current literature surrounding VAM-based bias.

1 thought on “School-Level Bias in the PVAAS Model in Pennsylvania”

  1. This is an incredibly interesting blog post because it truly shows both the advantages and disadvantages of the VAM program. The fact that research actually shows an inherent bias within these ways of assessing teachers is in and of itself enlightening. That tests make up such a large portion of the way teachers are being assessed is not only unfair but also hinders the education of children. Instead of learning about things that matter and will help them in the real world, they are learning how best to take a test, because if they don’t perform well it will reflect negatively on the teacher. I especially liked the part of the blog that ties in the idea of disadvantage in schools and how these tests can show that poverty can disadvantage children, resulting in poorer test performance when growth isn’t considered. I think the only truly effective and fair way to evaluate teachers would be to have a camera in the classroom, used at random time points while they are teaching, to assess the effectiveness of their teaching environment. I do understand, however, that this would be difficult to implement due to the high cost, especially in lower-SES schools where the tests are already showing such disadvantages. The problem of effectively evaluating teachers is a difficult one, and there really isn’t one clear solution.
