Ed Fuller, Associate Professor of Educational Policy at The Pennsylvania State University (and also affiliated with the university’s Center for Evaluation and Education Policy Analysis [CEEPA]), recently released an externally reviewed technical report that I believe you all might find of interest. He conducted “An Examination of Pennsylvania School Performance Profile Scores.”

He found that the Commonwealth of Pennsylvania’s School Performance Profile (SPP) scores, as defined by the Pennsylvania Department of Education (PDE) for each school and based 40% on the Pennsylvania Value-Added Assessment System (PVAAS, a derivative of the Education Value-Added Assessment System [EVAAS]), are strongly associated with student and school characteristics. Thus, he concludes, these measures are “inaccurate measures of school effectiveness. [Pennsylvania’s] SPP scores, in [fact], are more accurate indicators of the percentage of economically disadvantaged students in a school than of the effectiveness of a school.”

Take a look at Fuller’s Figure 1. This scatterplot illustrates that there is indeed a “strong relationship” (that is, a strong correlation) between the percent of economically disadvantaged students and elementary schools’ SPP scores. Put more simply, as the percentage of economically disadvantaged students goes up (moving from left to right on the x-axis), the SPP goes down (moving from top to bottom on the y-axis). The actual correlation coefficient illustrated, while not listed explicitly, is noted as being stronger (in absolute value) than *r* = –0.60.

Note also the very few outliers, that is, schools that defy the trend, as is often assumed to occur when “we” more fairly measure growth, give all types of students a fair chance to demonstrate growth, level the playing field for all, and the like. Figures 2 and 3 in the same document illustrate essentially the same things, but for middle schools (*r* = 0.65) and high schools (*r* = 0.68). In these cases the outliers are even rarer.

When Fuller conducted somewhat more complicated statistical analyses (i.e., regression analyses), which help to control or account for more variables and factors than simple correlations can, he still found that “the vast majority of the differences in SPP scores across schools [were still] explained [better] by student- and school-characteristics that [were not and will likely never be] under the control of educators.”

Fuller’s conclusions? “If the scores accurately capture the true effectiveness of schools and the educators within schools, there should be only a weak or non-existent relationship between the SPP scores and the percentage of economically disadvantaged students.”

Hence: (1) “SPP scores should not be used as an indication of either school effectiveness or as a component of educator evaluations” because of the bias inherent within the system, of which PVAAS data are 40%. Bias in this case means that “teachers and principals in schools serving high percentages of economically disadvantaged students will be identified as less effective than they really are while those serving in schools with low percentages of economically disadvantaged students will be identified as more effective than in actuality.” (2) Because SPP scores are biased, and therefore inaccurate assessments of school effectiveness, “the use of the SPP scores in teacher and principal evaluations will lead to inaccurate judgments about teacher and principal effectiveness.” And (3) Using SPP scores in teacher and principal evaluations will create “additional incentive[s] for the most qualified and effective educators to seek employment in schools with high SPP scores. Thus, the use of SPP scores on educator evaluations will simply exacerbate the existing inequities in the distribution of educator quality across schools based on the characteristics of the students enrolled in the schools.”

Hi Vamboozlers,

Correlations and scatter plots like this should make more people in the US realize that treating the conditional modes (or school/teacher residuals or VAM estimates or school/teacher effectiveness estimates or whatever phrase one likes … the VAMs and effectiveness phrases are of course loaded) as measuring the causal impact of schools/teachers on students is problematic. As shown in a previous post on this blog, this does not always work (http://vamboozled.com/acts-dan-wright-on-a-simple-vam-experiment/). Here is an example relevant to this report.

Data are created for 100,000 students with 200 in each school. Each child varies in how “good” they are academically in school (variable called IndivChar) and each neighborhood varies in its influence on academic performance (called Neigh). I also created a “value-added” variable called VA, but for this example it does not influence any of the other variables and is not influenced by them. The previous test scores (or baseline scores) are created by adding IndivChar and Neigh together, plus some random error. Then the final test scores are the sum of IndivChar, Neigh, the previous test scores, and random error. Other more complex models can be used, but the main substantive conclusions remain. I used the freeware R (http://cran.r-project.org/), and this is the code to create the data set:

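set.seed(42)  # make the simulated data reproducible

n <- 100000  # total number of students

classes <- 200  # students per school (so n/classes = 500 schools)

schools <- rep(1:(n/classes),classes)  # school ID for each student

Neigh <- rep(rnorm(n/classes),classes)  # one neighborhood effect per school, shared by its students

IndivChar <- rnorm(n)  # each student's own academic characteristic

VA <- rnorm(n)  # "value-added" variable; not used in pre or post

pre <- scale(Neigh + IndivChar + rnorm(n))  # baseline score, standardized

post <- scale(pre + IndivChar + Neigh + rnorm(n))  # final score, standardized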

Then I let R calculate the “VAMs”.

library(lme4)

VAMs <- rep(ranef(lmer(post ~ pre + (1|schools)))$schools[[1]],classes)
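In equation form, the call lmer(post ~ pre + (1|schools)) fits a random-intercept model. The symbols below are notation introduced here for illustration and do not appear in the code itself:

```latex
\begin{aligned}
\text{post}_{ij} &= \beta_0 + \beta_1\,\text{pre}_{ij} + u_j + \varepsilon_{ij},\\
u_j &\sim \mathcal{N}(0, \sigma_u^2), \qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2),
\end{aligned}
```

where i indexes students and j indexes schools. ranef() returns the conditional modes of the school intercepts u_j, one per school, and rep(..., classes) copies each school's value to every student in that school.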

The correlation between the VAMs and Neigh (found in R with cor.test(VAMs,Neigh)) is 0.98, meaning the VAMs variable is essentially just measuring the neighborhood effect (which also influenced the previous test scores). You can play with how the data are created to move this correlation around (and I could create data to mimic the correlations in the report if I had time). The point here is that VA was created independently of Neigh, so if the VAMs were measuring true value added, their correlation with Neigh should be ZERO (or close to it). It is not. The conditional modes that are found in the code above will only estimate true value-added in certain situations, and there are various ways in which researchers could examine whether these situational characteristics are at least plausible.
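For readers without lme4 installed, the same pattern can be sketched in base R. This is not the analysis above: it swaps the conditional modes for school-mean residuals from an ordinary regression of post on pre, which is only a rough stand-in. The names n_schools, school_resid, neigh_by_school, and va_by_school are introduced here for illustration, and as.numeric() is wrapped around scale() so that lm() sees plain vectors.

```r
# Rough base-R stand-in for the lme4 analysis (an assumption, not the post's method):
# school-mean residuals from lm(post ~ pre) play the role of the "VAMs".
set.seed(42)
n <- 100000               # total number of students
classes <- 200            # students per school (name kept from the post)
n_schools <- n / classes  # 500 schools

schools   <- rep(1:n_schools, classes)       # school ID for each student
Neigh     <- rep(rnorm(n_schools), classes)  # one neighborhood effect per school
IndivChar <- rnorm(n)                        # student-level characteristic
VA        <- rnorm(n)                        # unrelated to everything else by construction

pre  <- as.numeric(scale(Neigh + IndivChar + rnorm(n)))
post <- as.numeric(scale(pre + IndivChar + Neigh + rnorm(n)))

# Residuals after adjusting for the baseline score, averaged within school
fit <- lm(post ~ pre)
school_resid <- as.numeric(tapply(resid(fit), schools, mean))

# School-level versions of the two variables to compare against
neigh_by_school <- Neigh[1:n_schools]  # first 500 entries are schools 1..500
va_by_school <- as.numeric(tapply(VA, schools, mean))

r_neigh <- cor(school_resid, neigh_by_school)
r_va    <- cor(school_resid, va_by_school)
print(round(c(r_neigh = r_neigh, r_va = r_va), 2))
```

Under these assumptions, r_neigh should come out large and r_va close to zero: the school-level estimates track the neighborhood variable and not the variable labeled value-added.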