“Good news for schools using the Marzano framework,”…according to Marzano researchers. “One of the largest validation studies ever conducted on [the] observation framework shows that the Marzano model’s research-based structure is correlated with state VAMs.” See this claim, as well as the full report in which it appears, here.
The more specific claim is as follows: Study researchers found a “strong [emphasis added, see discussion forthcoming] correlation between Dr. Marzano’s nine Design Questions [within the model’s Domain 1] and increased student achievement on state math and reading scores.” All correlations were positive, with the highest correlation just below r = 0.4 and the lowest correlation just above r = 0.0. See the actual correlations illustrated here:
See also, below, a standard model for categorizing such correlations, albeit outside of any particular context. Using it, one can see that the correlations observed were indeed small to moderate, but not “strong” as claimed. Elsewhere, as also cited in this report, other observed correlations from similar studies of the same model ranged from r = 0.13 to 0.15, r = 0.14 to 0.21, and r = 0.21 to 0.26. While these are also noted as statistically significant, using the table below one can determine that statistical significance does not necessarily mean that such “very weak” to “weak” correlations carry much practical significance, especially if and when high-stakes decisions about teachers and their effects are to be attached to such evidence.
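Rule-of-thumb tables like the one referenced above vary by author, and the exact cut points in the report's table are not reproduced here. As one concrete (assumed) example, the short Python sketch below uses Evans' (1996) commonly cited thresholds; treat the labels and boundaries as an illustration of how such a categorization works, not as the report's own scale.

```python
def correlation_strength(r):
    """Categorize |r| using Evans' (1996) rule-of-thumb scale.

    NOTE: these cut points are one common convention, assumed for
    illustration; the table in the post may use different thresholds.
    """
    r = abs(r)
    if r < 0.20:
        return "very weak"
    elif r < 0.40:
        return "weak"
    elif r < 0.60:
        return "moderate"
    elif r < 0.80:
        return "strong"
    return "very strong"

# Correlations cited in and alongside the report (highest just below 0.4):
for r in (0.13, 0.21, 0.26, 0.39):
    print(f"r = {r:.2f} -> {correlation_strength(r)}")
```

Under this particular scale, every correlation at issue lands in the “very weak” to “weak” range; none approaches “strong.”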
Likewise, if such results (i.e., 0.0 < r < 0.4) sound familiar, they should, as a good number of researchers have set out to explore similar correlations in the past, using different value-added and observational data, and these researchers have also found similar zero-to-moderate (i.e., 0.0 < r < 0.4), but not (and dare I say never) “strong” correlations. See prior posts about such studies, for example, here, here, and here. See also the authors’ Endnote #1 in their report, again, here.
As the authors write: “When evaluating the validity of observation protocols, studies [sic] typically assess the correlations between teacher observation scores and their value-added scores.” This is true, but only in that such correlations offer just one piece of validity evidence.
Validity, or rather evidencing that the inferences drawn from something are in fact valid, is MUCH more complicated than simply running these types of correlations. Rather, the type of evidence these authors are exploring is called convergent-related evidence of validity; for something to actually be deemed valid, MUCH more validity evidence is needed (e.g., content-, consequence-, and predictive-related evidence of validity, among others). See, for example, some of the Educational Testing Service (ETS)’s Michael T. Kane’s work on validity here. See also The Standards for Educational and Psychological Testing, developed by the American Educational Research Association (AERA), the American Psychological Association (APA), and the National Council on Measurement in Education (NCME), here.
Instead, in this report the authors write that “Small to moderate correlations permit researchers to claim that the framework is validated (Kane, Taylor, Tyler, & Wooten, 2010).” This is false. It also demonstrates a very naive conception and unsophisticated treatment of validity. This is further illustrated by the one external citation the authors use to allege that validity can be (and is being) claimed: an article authored by Thomas Kane, NOT the aforementioned validity expert Michael Kane. Here is the actual Thomas Kane et al. article the Marzano authors reference to support their validity claim; note that nowhere in this piece do Thomas Kane et al. make this actual claim. In fact, a search of that article for “small” or “moderate” correlations yields zero hits.
In the end, what can more fairly and appropriately be asserted from this research report is that the Marzano model is indeed correlated with value-added estimates, and that its correlation coefficients fall right in line with those evidenced in other current studies on this topic, in which researchers have correlated multiple observational models with multiple value-added estimates. These correlations are small to moderate, certainly not “strong,” and definitely not strong enough to warrant high-stakes decisions (e.g., teacher termination), given everything (i.e., the unexplained variance) that is still not captured among these multiple measures…and that still threatens the validity of the inferences to be drawn from these measures combined.
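To put a number on that unexplained variance: squaring a correlation coefficient gives the proportion of variance in one measure that is shared with the other, so 1 − r² is the proportion left unexplained. The sketch below is plain arithmetic applied to the correlations discussed above, not a calculation taken from the report itself.

```python
def unexplained_variance(r):
    """Proportion of variance NOT shared between two measures
    whose correlation is r (i.e., 1 - r^2)."""
    return 1.0 - r ** 2

# Even at the best correlation observed (just below r = 0.4),
# roughly 84% of the variance in value-added scores is left
# unexplained by observation scores (and vice versa).
for r in (0.13, 0.26, 0.40):
    print(f"r = {r:.2f}: {unexplained_variance(r):.0%} unexplained")
```

In other words, at the weak end of the reported range (r ≈ 0.13), the two measures share under 2% of their variance, which underscores why such correlations cannot carry high-stakes inferences on their own.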