In Schools, Teacher Quality Matters Most

Education Next — a non-peer-reviewed journal with a mission to “steer a steady course, presenting the facts as best they can be determined…[while]…partak[ing] of no program, campaign, or ideology,” although these claims are certainly subject to controversy (see, for example, here and here) — just published an article titled “In Schools, Teacher Quality Matters Most” as part of the journal’s series commemorating the 50th anniversary of James Coleman and colleagues’ groundbreaking 1966 report, “Equality of Educational Opportunity.”

For background, the purpose of The Coleman Report was to assess the equality of educational opportunities provided to children of different races, colors, religions, and national origins. The main finding was that students of color (although African American students were the primary focus of the study), who are still often denied equal educational opportunities due to a variety of factors, were (and remain) largely and unequally segregated across America’s public schools, especially from their white and wealthier peers. These disparities were most notable via achievement measures, or what we know today as “the achievement gap.” Accordingly, Coleman et al. argued that equal opportunities in said schools mattered (and continue to matter) much more for these traditionally marginalized and segregated students than for their whiter and more economically fortunate peers. In addition, Coleman argued that out-of-school influences mattered much more than in-school influences on said achievement. On this point, though, The Coleman Report was the subject of great controversy, and it was (and still is) misinterpreted as supporting arguments that students’ teachers and schools do not matter as much as students’ families and backgrounds do.

Hence, the Education Next article of focus in this post takes this up, 50 years later, after the advent of value-added models (VAMs), which are positioned as better measures than those to which Coleman and his colleagues had access. The article is authored by Dan Goldhaber — a Professor at the University of Washington Bothell, Director of the National Center for Analysis of Longitudinal Data in Education Research (CALDER), and a Vice President at the American Institutes for Research (AIR). AIR is one of our largest VAM consulting/contract firms, and Goldhaber is, accordingly, perhaps one of the field’s most vocal proponents of VAMs and their capacities to both measure and increase teachers’ noteworthy effects (see, for example, here); hence, it makes sense that he writes about said teacher effects in this article, and in this particular journal (see, for example, Education Next’s Editorial and Editorial Advisory Board members here).

Here is his key claim.

Goldhaber argues that The Coleman Report’s “conclusions about the importance of teacher quality, in particular, have stood the test of time, which is noteworthy, [especially] given that today’s studies of the impacts of teachers [now] use more-sophisticated statistical methods and employ far better data” (i.e., VAMs). Accordingly, Goldhaber’s primary conclusion is that “the main way that schools affect student outcomes is through the quality of their teachers.”

Note that Goldhaber does not offer much evidence in this article beyond that produced by some of his econometric colleagues (e.g., Raj Chetty). Likewise, Goldhaber cites none of the literature coming from educational statistics, even though recent estimates [1] suggest that approximately 83% of the articles on this topic written since 1893 (the year in which the first article about VAMs was published, in the Journal of Political Economy) have appeared in educational journals, 14% in economics journals, and 3% in education finance journals. Hence, what we clearly observe in the literature on this topic are severe slants in perspective, especially when articles such as this one are written by econometricians, who often marginalize the research of their education, discipline-based colleagues, rather than by educational researchers and statisticians.

Likewise, Goldhaber does not cite or situate any of his claims within the recent statement released by the American Statistical Association (ASA), in which it is written that “teachers account for about 1% to 14% of the variability in test scores.” While teacher effects do matter, they do not matter nearly as much as many (if not most) VAM proponents, including Goldhaber, would like us to naively accept and believe. The truth of the matter is that teachers do indeed matter, in many ways, including their impacts on students’ affect, motivations, desires, aspirations, senses of efficacy, and the like, none of which are estimated on the large-scale standardized tests that continue to serve as the key dependent variables across these and all other VAM-based studies today. As Coleman argued 50 years ago, and as the ASA recently verified, students’ out-of-school and out-of-classroom environments matter more, as per these dependent variables or measures.
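
To make the ASA’s figure concrete, here is a minimal, hypothetical simulation (all numbers are invented for illustration; this is not the ASA’s analysis) of what it means for teachers to account for roughly 10% of the variability in test scores, estimated via a simple one-way variance decomposition:

```python
import numpy as np

rng = np.random.default_rng(2)
n_teachers, n_students = 100, 25

# Hypothetical data: teacher effects contribute ~10% of total score
# variance, student-level factors the remaining ~90% (within the
# 1%-14% range the ASA statement describes).
teacher_effects = rng.normal(0, np.sqrt(0.10), n_teachers)
scores = teacher_effects[:, None] + rng.normal(0, np.sqrt(0.90),
                                               (n_teachers, n_students))

# One-way ANOVA-style decomposition: between-teacher vs. within-class.
within_var = scores.var(axis=1, ddof=1).mean()
between_var = scores.mean(axis=1).var(ddof=1) - within_var / n_students
share = between_var / (between_var + within_var)
print(f"Teacher share of score variance: {share:.1%}")  # ~10%
```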

I think I’ll take ASA’s “word” on this, also as per Coleman’s research 50 years prior.

*****

[1] Reference removed as the manuscript is currently under blind peer-review. Email me if you have any questions at audrey.beardsley@asu.edu

You Are Invited to Participate in the #HowMuchTesting Debate!

As the scholarly debate about the extent and purpose of educational testing rages on, the American Educational Research Association (AERA) wants to hear from you. During a key session at its Centennial Conference this spring in Washington, DC, titled How Much Testing and for What Purpose? Public Scholarship in the Debate about Educational Assessment and Accountability, prominent educational researchers will respond to questions and concerns raised by YOU: parents, students, teachers, community members, and the public at large.

Hence, any and all of you with an interest in testing, value-added modeling, educational assessment, educational accountability policies, and the like are invited to post your questions, concerns, and comments using the hashtag #HowMuchTesting on Twitter, Facebook, Instagram, Google+, or the social media platform of your choice, as these are the posts to which AERA’s panelists will respond.

Organizers are interested in all #HowMuchTesting posts, but they are particularly interested in video-recorded questions and comments of 30–45 seconds in duration, so that you can ask your own question rather than having it read by a moderator. In addition, in order to provide ample time for the panel of experts to prepare for the discussion, comments and questions posted by March 17 have the best chance of inclusion in the debate.

Thank you all in advance for your contributions!!

To read more about this session, from the session’s organizer, click here.

New York Teacher Sheri Lederman’s Lawsuit Update

Recall the New York lawsuit pertaining to Long Island teacher Sheri Lederman? She is, by all accounts other than her recent (2013–2014) growth score of 1 out of 20, a terrific fourth-grade teacher and an 18-year veteran. She, along with her attorney (and husband) Bruce Lederman, is suing the state of New York to challenge the state’s growth-based teacher evaluation system. See prior posts about Sheri’s case here, here, and here. Serving on Sheri’s team with me are Linda Darling-Hammond (Stanford), Aaron Pallas (Columbia University Teachers College), Carol Burris (Executive Director of the Network for Public Education Foundation), Brad Lindell (Long Island Research Consultant), Sean Corcoran (New York University), and Jesse Rothstein (University of California, Berkeley).

Bruce Lederman just emailed me with an update, and some links re: this update (below), and he gave me permission to share all of this with you.

The judge hearing this case recently asked the lawyers on both sides to brief the court by the end of this month (February 29, 2016) on a new issue raised by the New York State Education Department (NYSED). The issue pertains to the state’s new “moratorium,” or “emergency regulations,” related to the state’s high-stakes use of its growth scores. All of this is likely related to the political reaction to the opt-out movement throughout the state of New York, the publicity pertaining to the Lederman lawsuit itself, and the federal government’s adoption of the recent Every Student Succeeds Act (ESSA), which includes a specific provision that now permits states to decide whether (and if so, how) to use teachers’ students’ test scores to hold teachers accountable for their levels of growth (in New York) or value-added.

While the federal government did not abolish such practices via the ESSA, it did hand back to the states all power and authority over this matter. Accordingly, this does not mean growth models/VAMs are simply going to disappear, as states still have the power and authority to move forward with their prior and/or new teacher evaluation systems, based in small or large part on growth models/VAMs. As has been quite evident since President Obama signed the ESSA, some states are continuing to move forward in this regard, in some cases at even higher speeds than before, in support of what some state policymakers still apparently believe (despite the research) are the accountability measures that will help them (symbolically) support educational reform in their states. See, for example, prior posts about Alabama here, New Mexico here, and Texas here, the last of which is still moving forward with its plans introduced pre-ESSA. See also prior posts about New York here, here, and here; just one year ago in New York, Governor Cuomo was promoting increased use of the state’s growth model and publicly proclaiming that it was “baloney” that more teachers were not being found “ineffective,” after which he pushed through the New York budget process amendments to the law increasing the weight of teachers’ growth scores to approximately 50% in many cases.

Nonetheless, as per this case in New York, state Attorney General Eric Schneiderman, on behalf of the NYSED, offered to settle the lawsuit out of court by giving Sheri some accommodation on her aforementioned 2013–2014 score of 1 out of 20 if Sheri and Bruce dropped their challenge to the state’s VAM-based teacher evaluation system. Sheri and Bruce declined, for a number of reasons, including that under the state’s recent “moratorium,” the state’s growth model is still set to be used throughout New York for the next four years, with teachers’ annual performance reviews based in part on growth scores reported to parents, newspapers (on an aggregate basis), and the like. While, again, high stakes are not to be attached to the growth output for four years, the scores will still “count.”

Hence, Sheri and Bruce believe that because they have already “convincingly” shown that the state’s growth model does not “rationally” work for teacher evaluation purposes, and that teacher evaluations based on the state’s growth model actually violate state law since teachers like Sheri are not capable of getting perfect scores (which is “irrational”), they will continue with this case, also on behalf of New York teachers and principals who are “demoralized” by the system, as well as New York taxpayers who are paying millions, “if not tens of millions of dollars,” for the system’s (highly) unreliable and inaccurate results.

As per Bruce’s email: “Spending the next 4 years studying a broken system is a terrible idea and terrible waste of taxpayer $$s. Also, if [NYSED] recognizes that Sheri’s 2013-14 score of 1 out of 20 is wrong [which they apparently recognize given their offer to settle this suit out of court], it’s sad and frustrating that [NYSED] still wants to fight her score unless she drops her challenge to the evaluation system in general.”

“We believe our case is already responsible for the new administrative appeal process in NY, and also partly responsible for Governor Cuomo’s apparent reversal on his stand about teacher evaluations. However, at this point we will not settle and allow important issues to be brushed under the carpet. Sheri and I are committed to pressing ahead with our case.”

To read more about this case via a Politico New York article, click here (registration required). To hear more from Bruce Lederman about this case via WCNY-TV, Syracuse, click here. The pertinent section of the interview starts at 22:00 and ends at 36:21. It’s well worth a listen!

New Mexico to Change its Teacher Evaluation System, But Not Really

As you all likely recall, the American Federation of Teachers (AFT), joined by the Albuquerque Teachers Federation (ATF), last year filed a “Lawsuit in New Mexico Challenging [the] State’s Teacher Evaluation System.” In December 2015, state District Judge David K. Thomson granted a preliminary injunction preventing consequences from being attached to the state’s teacher evaluation data. More specifically, Judge Thomson ruled that the state can proceed with “developing” and “improving” its teacher evaluation system, but it is not to make any consequential decisions about New Mexico’s teachers using the data the state collects until the state (and/or others external to the state) can evidence to the court that the system is reliable, valid, fair, uniform, and the like (see a prior post on this ruling here).

Late Friday afternoon, New Mexico’s Public Education Department (PED) announced that it is accordingly changing its NMTEACH teacher evaluation system and will be issuing new regulations. The primary goal is to (1) “Address major liabilities resulting from litigation” as these liabilities specifically pertain to the former NMTEACH system’s (a) uniformity, (b) transparency, and (c) clarity. On the surface, this is gratifying to the extent that the state is attempting, at least theoretically, to please the court. But we, and especially those in New Mexico, might refrain from celebrating too soon: when one reads the PED announcement here, one sees that this is yet another example of the state’s futile attempts to keep in place a very top-down teacher evaluation system. Note, however, that a uniform teacher evaluation system is also required under state law, although the governor has the right to change state statute should those at the state (including the governor, state superintendent, and PED) decide to eventually work with local districts and schools on the construction of a better teacher evaluation system for the state.

As per the PED’s subsequent goals, things do not look much different than they did in the past, especially given why and what got the state involved in such litigation in the first place. The state also intends to (2) simplify processes for districts/charters and for the PED, which is more or less fair. The state is also to (3) establish a timeline for providing districts and schools more current data; currently such data are delayed by one school year, and these data are (still) needed for the state’s Pay for Performance plans (one of the high-stakes consequences at issue in Judge Thomson’s ruling). A tertiary goal is to deliver such data to teacher preparation programs in a more timely fashion, which is also of great controversy, as (uninformed) policymakers continue to believe that colleges of education should be held accountable for the test scores of their graduates’ students (see why this is problematic, for example, here). In the state’s final expressed goal, it makes explicit that (4) “Moving the timeline enhances the understanding that this system isn’t being used for termination decisions.” While this is certainly good, at least for now, the performance pay program remains a concern, as do the state’s continued attempts to use students’ test scores to evaluate teachers and the state’s perpetual belief that the data errors exposed by the lawsuit were the fault of the school districts, not the state, something Judge Thomson also noted.

Regardless, here is the state’s “Legal Rationale,” and here is also where things go a bit more awry. As re-positioned by the state/PED: “the NEA and AFT recently advanced lawsuits set on eliminating any meaningful teacher evaluation [emphasis added to highlight the language the state is using to distort the genuine purposes of these lawsuits]. These lawsuits have exposed that the flexibility provided to local authorities has created confusion and complexity. Judge Thomson used this complexity when granting an injunction in the AFT case—citing a confusing array of classifications, tags, assessments, graduated considerations, etc. Judge Thomson made clear that he views this local authority as a threat to the statutorily required uniformity of the system [emphasis added given Judge Thomson said nothing of this sort in terms of devaluing local authority or control; rather, he emphasized that the state’s menu of options was arbitrary and not uniform, especially given the consequences the state was requiring districts to enforce].” This, again, pertains to what is written in current state statute in terms of a uniform teacher evaluation system.

Accordingly, and unfortunately, the state’s proposed changes would: “Provide a single plan that all districts and charters would use, providing greater uniformity,” and “Simplify the model from 107 possible classifications to three.” See three other moves detailed in the PED announcement here (e.g., moving data delivery dates, eliminating all but three tests, and the fall 2016 date which all of this is to become official).

Relatedly, see below a visual of what the state’s “new and improved” teacher evaluation system, in response to said litigation, is to look like. Unfortunately, again, it really does not look much different than it did prior, except, perhaps, in the proposed reduction of testing options. See also the full document from which all of this came here.

[Figure: the PED’s proposed NMTEACH teacher evaluation system]

Nonetheless, we will have to wait and see whether this, again, will please the court, given Judge Thomson’s ruling that the state is not to make any consequential decisions about New Mexico’s teachers using the data the state collects until the state (and/or others external to the state) can evidence to the court that the system is reliable, valid, etc.

And as for what the President of the American Federation of Teachers (AFT) New Mexico, Stephanie Biondo-Ly, had to say in response, see portions of her press release below. See also an article in the Las Cruces Sun-News here, in which President Ly is cited as “denounc[ing] the changes and call[ing] them attempts to obscure deficiencies in the [state’s] evaluation system.” From her original press release, she also wrote: “We are troubled…that once again, these changes are being implemented from the top down and if the secretary [Hanna Skandera] and her [PED] staff were serious about improving student outcomes and producing a fair evaluation system, they would have involved teachers, principals, and superintendents in the process.”

Report on the Stability of Student Growth Percentile (SGP) “Value-Added” Estimates

The Student Growth Percentiles (SGP) model, which value-added model (VAM) purists only loosely define as a VAM, uses students’ levels of past performance to determine their normative growth over time, as compared to their academic peers. “SGPs describe the relative location of a student’s current score compared to the current scores of students with similar score histories” (Castellano & Ho, 2013, p. 89). Students are compared to themselves (i.e., students serve as their own controls) over time; therefore, the need to control for other variables (e.g., student demographics) is lessened, although this point is debated. Nonetheless, the SGP model was developed as a “better” alternative to existing models, with the goal of providing clearer, more accessible, and more understandable results to both internal and external education stakeholders and consumers. For more information about the SGP, please see prior posts here and here. See also an original source about the SGP here.
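
To illustrate the core idea only (operational SGP models estimate these percentiles via quantile regression, not the crude prior-score binning used here), below is a minimal, hypothetical sketch; all names and data are invented:

```python
import numpy as np
from scipy import stats

def growth_percentile(prior, current, prior_scores, current_scores, band=5):
    """Percentile rank of `current` among students whose prior score fell
    within +/- `band` points of `prior` (a crude stand-in for the
    conditioning on score histories used in operational SGP models)."""
    peers = np.abs(prior_scores - prior) <= band  # "similar score histories"
    if peers.sum() < 2:
        raise ValueError("not enough academic peers in the band")
    return stats.percentileofscore(current_scores[peers], current)

# Hypothetical data: 1,000 students with prior- and current-year scores.
rng = np.random.default_rng(0)
prior_scores = rng.normal(500, 50, 1000)
current_scores = 0.8 * prior_scores + rng.normal(100, 30, 1000)

print(f"SGP: {growth_percentile(510, 520, prior_scores, current_scores):.0f}")
```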

Relatedly, in a study released last week, WestEd researchers conducted an “Analysis of the stability of teacher-level growth scores [derived] from the student growth percentile [SGP] model” in one large school district in Nevada (n = 370 teachers). The key finding they present is that “half or more of the variance in teacher scores from the [SGP] model is due to random or otherwise unstable sources rather than to reliable information that could predict future performance. Even when derived by averaging several years of teacher scores, effectiveness estimates are unlikely to provide a level of reliability desired in scores used for high-stakes decisions, such as tenure or dismissal. Thus, states may want to be cautious in using student growth percentile [SGP] scores for teacher evaluation.”

Most importantly, the evidence in this study should make us (continue to) question the extent to which “the learning of a teacher’s students in one year will [consistently] predict the learning of the teacher’s future students.” This runs counter to the claims continuously made by VAM proponents, including folks like Thomas Kane, the economics professor from Harvard University who directed the $45-million Measures of Effective Teaching (MET) studies for the Bill & Melinda Gates Foundation. While faint signals of what we call predictive validity might be observed across VAMs, what folks like Kane overlook or avoid is that very often these faint signals do not remain constant over time. Accordingly, the extent to which we can make stable predictions is limited.

Worse is when folks falsely assume that said predictions will remain constant over time and make high-stakes decisions about teachers unaware of the instability present in, typically, 25–59% of teachers’ value-added (or, in this case, SGP) scores (estimates vary by study and by analyses using one to three years of data; see, for example, the studies detailed in Appendix A of this report, as well as other research on this topic here, here, and here). Nonetheless, researchers in this study found that in mathematics, 50% of the variance in teachers’ value-added scores was attributable to differences among teachers, while the other 50% was random or unstable. In reading, 41% of the variance in teachers’ value-added scores was attributable to differences among teachers, while the other 59% was random or unstable.
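
As a back-of-the-envelope illustration of what such a 50/50 split between stable and unstable variance implies, consider the toy simulation below (my own sketch, not the WestEd team’s actual estimation method): when the signal and noise variances are equal, the year-to-year correlation of teachers’ scores hovers around .50.

```python
import numpy as np

rng = np.random.default_rng(1)
n_teachers = 370  # matches the Nevada sample size in the WestEd study

# Hypothetical model: each teacher has a stable "true" effect plus
# independent year-specific noise of equal variance (a 50/50 split,
# echoing the study's mathematics finding).
true_effect = rng.normal(0, 1, n_teachers)
year1 = true_effect + rng.normal(0, 1, n_teachers)
year2 = true_effect + rng.normal(0, 1, n_teachers)

# The year-to-year correlation estimates the share of score variance
# that is stable (signal) rather than random (noise).
r = np.corrcoef(year1, year2)[0, 1]
print(f"Estimated stable share of variance: {r:.2f}")  # ~0.50
```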

In addition, using a 95% confidence interval (which is very common in educational statistics), researchers found that in mathematics a teacher’s true score would span 48 points, “a margin of error that covers nearly half the 100 point score scale,” whereby “one would be 95 percent confident that the true math score of a teacher who received a score of 50 [would actually fall] between 26 and 74.” For reading, a teacher’s true score would span 44 points, whereby one would be 95 percent confident that the true reading score of a teacher who received a score of 50 would actually fall between 28 and 72. The stability of these scores would increase with three years of data, which other researchers on this topic have also found. However, those researchers have also found that such error rates persist to an extent that still prohibits high-stakes decision making.
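
For readers who want the arithmetic, here is a small sketch that backs out the standard error implied by the report’s 48-point mathematics interval and then applies the Spearman-Brown formula to show why even three years of data may fall short of the .85 reliability threshold mentioned in the next paragraph. The normal error model and the .50 single-year reliability are my assumptions for illustration, not figures pulled from the report:

```python
def ci95(score, sem):
    """Two-sided 95% confidence interval for an observed score."""
    return score - 1.96 * sem, score + 1.96 * sem

def spearman_brown(r1, k):
    """Reliability of an average of k parallel yearly scores, each
    having single-year reliability r1."""
    return k * r1 / (1 + (k - 1) * r1)

# Back out the SEM implied by the report's 48-point math interval.
sem_math = 48 / (2 * 1.96)   # ~12.2 points on the 100-point scale
print(ci95(50, sem_math))    # ~(26.0, 74.0)

# Averaging three years of scores with single-year reliability .50
# helps, but still falls short of the .85 some researchers call for.
print(f"3-year reliability: {spearman_brown(0.50, 3):.2f}")  # 0.75
```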

In more practical terms, what this also means is that a teacher who is deemed highly ineffective might be terminated, even though the following year (s)he might have been observed to be highly effective. Inversely, teachers who are awarded tenure might be observed as ineffective one, two, and/or three years afterward, not because their true levels of effectiveness changed, but because of the error in the estimates that causes such instabilities to occur. Hence, examinations of the stability of such estimates over time provide essential evidence of the validity, and in this case predictive validity, of the interpretations and uses of such scores over time. This is particularly pertinent when high-stakes decisions are to be based (in whole or in large part) on such scores, especially given that some researchers are calling for reliability coefficients of .85 or higher to support such decisions (Haertel, 2013; Wasserman & Bracken, 2003).

In the end, the researchers’ overall conclusion is that SGP-derived “growth scores alone may not be sufficiently stable to support high-stakes decisions.” Likewise, relying on the extant research on this topic, the conclusion can be broadened: neither SGP- nor VAM-based growth scores may be sufficiently stable to support high-stakes decisions. In other words, it is not just the SGP model that yields such issues with stability (or the lack thereof). Again, see the other literature in which the researchers situated their findings in Appendix A. See also other similar studies here, here, and here.

Accordingly, those who read this report and consequently seek a better model that yields more stable estimates will, unfortunately, likely fail in their search.

References:

Castellano, K. E., & Ho, A. D. (2013). A practitioner’s guide to growth models. Washington, DC: Council of Chief State School Officers.

Haertel, E. H. (2013). Reliability and validity of inferences about teachers based on student test scores (14th William H. Angoff Memorial Lecture). Princeton, NJ: Educational Testing Service (ETS).

Lash, A., Makkonen, R., Tran, L., & Huang, M. (2016). Analysis of the stability of teacher-level growth scores [derived] from the student growth percentile [SGP] model (REL 2016–104). Washington, DC: U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance, Regional Educational Laboratory West.

Wasserman, J. D., & Bracken, B. A. (2003). Psychometric characteristics of assessment procedures. In I. B. Weiner, J. R. Graham, & J. A. Naglieri (Eds.), Handbook of psychology: Assessment psychology (pp. 43–66). Hoboken, NJ: John Wiley & Sons.

Special Issue of “Educational Researcher” (Paper #7 of 9): VAMs Situated in Appropriate Ecologies

Recall that the peer-reviewed journal Educational Researcher (ER) recently published a “Special Issue” including nine articles examining value-added measures (VAMs). I have reviewed the next of the nine articles (#7 of 9), which is actually a commentary titled “The Value in Value-Added Depends on the Ecology.” This commentary is authored by Henry Braun – Professor of Education and Public Policy, Educational Research, Measurement, and Evaluation at Boston College (and also the author of a previous post on this site here).

In this article Braun, importantly, makes explicit the assumptions on which this special issue of ER is based: (1) that too many students in America’s public schools are being inadequately educated, (2) that evaluation systems as they currently exist “require radical overhaul,” and (3) that it is therefore essential to use student test performance, with low and high stakes attached, to improve that which educators do (or don’t do) to adequately address the first assumption. Braun also offers readers counterarguments to each of these assumptions (see p. 127), but more importantly he makes evident that the focus of this special issue is situated otherwise, in line with current education policies. This special issue, overall, then “raise[s] important questions regarding the potential for high-stakes, test-driven educator accountability systems to contribute to raising student achievement” (p. 127).

Given this context, the “value-added” provided within this special issue, again according to Braun, is that the authors of each of the five main research articles report on how VAM output actually plays out in practice, giving “careful consideration to how the design and implementation of teacher evaluation systems could be modified to enhance the [purportedly; see comments above] positive impact of accountability and mitigate the negative consequences” at the same time (p. 127). In other words, if we more or less agree to the aforementioned assumptions, also given the educational policy context influencing, perpetuating, or actually forcing these assumptions, these articles should help others better understand VAMs’ and observational systems’ potentials and perils in practice.

At the same time, Braun encourages us to note that “[t]he general consensus is that a set of VAM scores does contain some useful information that meaningfully differentiates among teachers, especially in the tails of the distribution [although I would argue bias has a role here]. However, individual VAM scores do suffer from high variance and low year-to-year stability as well as an undetermined amount of bias [which may be greater in the tails of the distribution]. Consequently, if VAM scores are to be used for evaluation, they should not be given inordinate weight and certainly not treated as the ‘gold standard’ to which all other indicators must be compared” (p. 128).

Likewise, it’s important to note that IF consequences are to be attached to said indicators of teacher evaluation (i.e., VAM and observational data), there should be validity evidence made available and transparent to warrant the inferences and decisions to be made, and the validity evidence “should strongly support a causal [emphasis added] argument” (p. 128). However, both indicators still face major “difficulties in establishing defensible causal linkage[s]” as theorized and desired (p. 128), which undermines the validity of such inferences. What does not help, either, is when VAM scores are given precedence over other indicators, or when principals align teachers’ observational scores with those same teachers’ VAM scores, given the precedence often afforded to (what are often viewed as the superior, more objective) VAM-based measures. This sometimes occurs given external pressures (e.g., applied by superintendents) to artificially conflate, in this case, levels of agreement between indicators (i.e., convergent validity).

Relatedly, in the section Braun titles his “Trio of Tensions” (p. 129), he notes that (1) “[B]oth accountability and improvement are undermined, as attested to by a number of the articles in this issue. In the current political and economic climate, [if possible] it will take thoughtful and inspiring leadership at the state and district levels to create contexts in which an educator evaluation system constructively fulfills its roles with respect to both public accountability and school improvement” (pp. 129–130); (2) “[T]he chasm between the technical sophistication of the various VAM[s] and the ability of educators to appreciate what these models are attempting to accomplish…sow[s] further confusion…[hence]…there must be ongoing efforts to convey to various audiences the essential issues—even in the face of principled disagreements among experts on the appropriate role(s) for VAM[s] in educator evaluations” (p. 130); and finally (3) “[H]ow to balance the rights of students to an adequate education and the rights of teachers to fair evaluations and due process [especially for]…teachers who have value-added scores and those who teach in subject-grade combinations for which value-added scores are not feasible…[must be addressed; this] comparability issue…has not been addressed but [it] will likely [continue to] rear its [ugly] head” (p. 130).

In the end, Braun argues for another “Trio,” but this one including three final lessons: (1) “although the concerns regarding the technical properties of VAM scores are not misplaced, they are not necessarily central to their reputation among teachers and principals. [What is central is]…their links to tests of dubious quality, their opaqueness in an atmosphere marked by (mutual) distrust, and the apparent lack of actionable information that are largely responsible for their poor reception” (p. 130); (2) there is a “very substantial, multiyear effort required for proper implementation of a new evaluation system…[related, observational] ratings are not a panacea. They, too, suffer from technical deficiencies and are the object of concern among some teachers because of worries about bias” (p. 130); and (3) “legislators and policymakers should move toward a more ecological approach [emphasis added; see also the Review of Article (Essay) #6 – on VAMs as tools for “egg-crate” schools here] to the design of accountability systems; that is, “one that takes into account the educational and political context for evaluation, the behavioral responses and other dynamics that are set in motion when a new regime of high-stakes accountability is instituted, and the long-term consequences of operating the system” (p. 130).

*****

If interested, see the Review of Article #1 – the introduction to the special issue here; see the Review of Article #2 – on VAMs’ measurement errors, issues with retroactive revisions, and (more) problems with using standardized tests in VAMs here; see the Review of Article #3 – on VAMs’ potentials here; see the Review of Article #4 – on observational systems’ potentials here; see the Review of Article #5 – on teachers’ perceptions of observations and student growth here; and see the Review of Article (Essay) #6 – on VAMs as tools for “egg-crate” schools here.

Article #7 Reference: Braun, H. (2015). The value in value-added depends on the ecology. Educational Researcher, 44(2), 127-131. doi:10.3102/0013189X15576341

How Measurement Fails Doctors and Teachers: NY Times Op-ed

In case you missed it, click here for the full op-ed “How Measurement Fails Doctors and Teachers” piece published in The New York Times on Saturday.

It’s well worth the read, especially given the comparisons that the author, Robert M. Wachter (MD, Professor and Interim Chair of the Department of Medicine at the University of California, San Francisco), makes between medicine and education, in terms of how measurement systems have in many ways worked to hurt, not help improve, both professions.

Is Alabama the New New Mexico?

In Alabama, the Grand Old Party (GOP) has put forth a draft bill, ultimately to be called the Rewarding Advancement in Instruction and Student Excellence (RAISE) Act of 2016. The purpose of the act will be to…wait for it…use test scores to grade teachers and to pay them annual bonuses (i.e., “supplements”) as per their performance. More specifically, the bill is to “provide a procedure for observing and evaluating teachers” to help make “significant differentiation[s] in pay, retention, promotion, dismissals, and other staffing decisions, including transfers, placements, and preferences in the event of reductions in force, [as] primarily [based] on evaluation results.” Relatedly, Alabama districts may no longer use teachers’ “seniority, degrees, or credentials as a basis for determining pay or making the retention, promotion, dismissal, and staffing decisions.” Genius!

Accordingly, Larry Lee, whose blog is based on the premise that “education is everyone’s business,” sent me this bill to review, critique, and help make everyone’s business. I attach it here for others who are interested, and I also summarize and critique its most relevant (but also contemptible) provisions below.

Eligible Alabama teachers are (after a staggered period of time) to be evaluated primarily (i.e., for up to 45% of a teacher’s total evaluation score) on the extent to which they purportedly cause student growth in achievement, with student growth defined as a teacher’s purported impact on “[t]he change in achievement for an individual student between two or more points in time.” Teachers are also to be observed at least twice per year by appropriately trained evaluators/supervisors (i.e., for up to 45% of a teacher’s total evaluation score), and an unnamed and undefined set of parent and student surveys is to be used to evaluate teachers (i.e., for up to 15% of a teacher’s total evaluation score).
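
For illustration only, here is a minimal sketch of how such a weighted composite might be assembled. The bill only caps the weights (“up to” 45/45/15), so the particular allocation, the 0–100 scales, and the function name below are hypothetical, not taken from the bill:

```python
def composite_score(growth, observation, surveys,
                    weights=(0.45, 0.40, 0.15)):
    """Weighted composite of the three RAISE Act components, each assumed
    to sit on a 0-100 scale. The bill only caps the weights ("up to"
    45/45/15), so this allocation, which sums to 1.0, is one possibility."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    g, o, s = weights
    return g * growth + o * observation + s * surveys

# A teacher with middling growth but strong observations and surveys.
print(composite_score(growth=55, observation=85, surveys=90))  # 72.25
```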

Again, there are no real surprises here, as the adoption of such measures is common among states like Alabama (and New Mexico); it is when these components are explained in more detail that things really go awry.

“For grade levels and subjects for which student standardized assessment data is not available and for teachers for whom student standardized assessment data is not available, the [state’s] department [of education] shall establish a list of preapproved options for governing boards to utilize to measure student growth.” This is precisely what got the whole state of New Mexico wrapped up in, and currently losing, its ongoing lawsuit (see my most recent post on this here). While providing districts with menus of preapproved assessment options might make sense to policymakers, any self-respecting researcher, or even an assessment commoner, should know why this is entirely inappropriate. The best research study explaining why doing just this will set any state up for lawsuits comes from Brown University’s John Papay, in his highly esteemed and highly cited article “Different tests, different answers: The stability of teacher value-added estimates across outcome measures.” The title of this research article alone should explain well enough why simply positioning and offering up such tests in such casual (and quite careless) ways makes way for legal recourse.
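
To see why “different tests, different answers” matters in practice, consider the toy simulation below (my numbers, not Papay’s): even when two tests capture substantially overlapping teacher effects, the resulting value-added estimates can correlate only modestly, so the choice of test can change which teachers look “effective.”

```python
import numpy as np

rng = np.random.default_rng(3)
n_teachers = 200

# Hypothetical teacher effects as measured by two different tests whose
# underlying "true" effects correlate at 0.6, each estimated with noise.
shared = rng.normal(0, 1, n_teachers)
effect_a = np.sqrt(0.6) * shared + np.sqrt(0.4) * rng.normal(0, 1, n_teachers)
effect_b = np.sqrt(0.6) * shared + np.sqrt(0.4) * rng.normal(0, 1, n_teachers)
va_a = effect_a + np.sqrt(0.5) * rng.normal(0, 1, n_teachers)  # estimation error
va_b = effect_b + np.sqrt(0.5) * rng.normal(0, 1, n_teachers)

# The same teachers get noticeably different rankings on different tests.
r = np.corrcoef(va_a, va_b)[0, 1]
print(f"Cross-test correlation of VA estimates: {r:.2f}")  # ~0.4, far from 1
```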

Otherwise, the only test mentioned that is to be used to measure teachers’ purported impacts on student growth is the ACT Aspire, the ACT test corporation’s “college and career readiness” test that is aligned to and connected with its more familiar college-entrance ACT. This, too, was one of the sources of the aforementioned lawsuit in New Mexico, in terms of what we call content validity: states cannot simply pull in tests that are not adequately aligned with the state’s curriculum (e.g., I could find no information about the alignment of the ACT Aspire to Alabama’s curriculum here, which is itself highly problematic, as this information should definitely be available) and that have not been validated for such purposes (i.e., to measure teachers’ impacts on student growth).

Regardless of the tests, however, all of the secondary measures to be used to evaluate Alabama teachers (e.g., student and parent survey scores, observational scores) are also to be “correlated with impacts on student achievement results.” We have increasingly seen this across the nation as well: rather than simply assessing whether these indicators are independently correlated, which they should be if they all, in fact, help to measure our construct of interest (i.e., teacher effectiveness), state/district leaders are manufacturing and forcing these correlations via what I have termed “artificial conflation” strategies (see also a recent post here about how this is one of the fundamental and critical points of litigation in Houston).
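
To make “artificial conflation” concrete, here is a minimal, hypothetical simulation (invented numbers, not data from Houston or anywhere else) of how a modest, genuine correlation between two indicators can be manufactured upward when evaluators nudge one indicator toward the other:

```python
import numpy as np

rng = np.random.default_rng(4)
n_teachers = 300

# Hypothetical, independently produced indicators of effectiveness.
vam = rng.normal(0, 1, n_teachers)
obs = 0.3 * vam + rng.normal(0, 1, n_teachers)  # modest genuine agreement

# "Artificial conflation": evaluators nudge observation ratings toward
# the VAM score after seeing it, manufacturing convergent validity.
obs_nudged = 0.5 * obs + 0.5 * vam

print(f"genuine correlation: {np.corrcoef(vam, obs)[0, 1]:.2f}")        # ~0.3
print(f"after nudging:       {np.corrcoef(vam, obs_nudged)[0, 1]:.2f}")  # ~0.8
```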

The state is apparently also set on going “all in” on evaluating their principals in many of the same ways, although I did not critique those sections for this particular post.

Most importantly, though, for those of you who have access to such leaders in Alabama, do send them this post so that they might be a bit more proactive, and appropriately more careful and cautious, before going down this poor educational policy path. While I do embrace my professional responsibility as a public scholar to be called to court to testify about all of this when such high-stakes consequences are ultimately, yet inappropriately, based upon invalid inferences, I’d much rather be proactive in this regard and save states and their taxpayers time and money.

Accordingly, I see the state is also to put out a request for proposals to retain an external contractor to help them measure said student growth and teachers’ purported impacts on it. I would also be more than happy to help the state negotiate this contract, much more wisely than so many other states and districts have negotiated similar contracts thus far (e.g., without asking for reliability and validity evidence as a contractual deliverable)…should this poor educational policy actually come to fruition.

Deep Pockets, Corporate Reform, and Teacher Education

A colleague whom I have never formally met, but with whom I’ve had some interesting email exchanges over the past few months — James D. Kirylo, Professor of Teaching and Learning in Louisiana — recently sent me an email I read and appreciated; hence, I asked him to turn it into a blog post. He responded with a guest post he has titled “Deep Pockets, Corporate Reform, and Teacher Education,” pasted below. Do give this a read, and a social media share, as this one is deserving of some legs.

Here is what he wrote:

Money is power. Money is influence. Money shapes direction. Notwithstanding its influential nature in the electoral process, one only needs to see how bags of dough from the mega-rich one-percenters—largely led by Bill Gates—have bought their way into an attempt to corporatize K-12 education (see, for example, here).

This corporatization works to defund public education, grossly blames teachers for all that ails society, is obsessed with testing, and aims to privatize.  And next on the corporatized docket: teacher education programs.

In a recent piece, “Gates Foundation Puts Millions of Dollars into New Education Focus: Teacher Preparation,” Valerie Strauss sketches how Gates is awarding $35 million to a three-year project called Teacher Preparation Transformation Centers, funneled through five different projects, one of which is the Texas Tech-based University-School Partnerships for the Renewal of Educator Preparation (U.S. Prep) National Center.

A framework that will guide this “renewal” of educator preparation comes from the National Institute for Excellence in Teaching (NIET), along with the peddling of its programs, The System for Teacher and Student Advancement (TAP) and the Best Practices Center (BPC). Yet again coming from another guy with oodles of money, leading the charge at NIET is Lowell Milken, its chairman and TAP’s founder (see, for example, here).

The state of Louisiana serves as an example of how NIET is already working overtime to chip its way into K-12 education. One can spend hours at the Louisiana Department of Education (LDE) website viewing the various links on how TAP is applying a full-court press in hyping its brand (see, for example, here).

And now that TAP has entered the K-12 door in Louisiana, the brand is squiggling its way into teacher education preparation programs, namely through the Texas Tech-based U.S. Prep National Center. This Gates Foundation-backed project involves five teacher education programs in the country (Southern Methodist University, the University of Houston, Jackson State University, and the University of Memphis), including one in Louisiana (Southeastern Louisiana University) (see more information about this here).

Therefore, teacher educators must be “trained” to use TAP in order to “rightly” inculcate the prescription to teacher candidates.

TAP: Four Elements of Success

TAP principally plugs four Elements of Success: Multiple Career Paths (for educators as career, mentor, and master teachers); Ongoing Applied Professional Growth (through weekly cluster meetings, follow-up support in the classroom, and coaching); Instructionally Focused Accountability (through multiple classroom observations and evaluations utilizing a research-based instrument and rubric that identifies effective teaching practices); and Performance-Based Compensation (based on multiple measures of performance, including student achievement gains and teachers’ instructional practices).

And according to the TAP literature, the elements of success “…were developed based upon scientific research, as well as best practices from the fields of education, business, and management” (see, for example, here). Recall, perhaps, that No Child Left Behind (NCLB) was also based on “scientific-based” research. Enough said. It is also interesting to note their use of the words “business” and “management” when referring to educating our children. Regardless, “The ultimate goal of TAP is to raise student achievement” so students will presumably be better equipped to compete in the global society (see, for example, here). 

While each element is worthy of discussion, a brief comment is in order on the first element, Multiple Career Paths, and the fourth element, Performance-Based Compensation. Regarding the former, TAP has created a mini-hierarchy within already-hierarchical school systems (which most are), identifying three potential sets of teachers, to reiterate from above: “career” teachers, “mentor” teachers, and “master” teachers. A “career” teacher as opposed to what? As opposed to a “temporary” teacher, a Teach For America (TFA) teacher, a substitute teacher? But, of course, according to TAP, as opposed to a “mentor” teacher and a “master” teacher.

This certainly begs the question: Why in the world would any parent want their child to be taught by a “career” teacher as opposed to a “mentor” teacher or better yet a “master” teacher? Wouldn’t we want “master” teachers in all our classrooms? To analogize, I would rather have a “master” doctor performing heart surgery on me than a “lowly” career doctor. Indeed, words, language, and concepts matter.

With respect to the latter, the notion of having an ultimate goal of raising student achievement is perhaps more than a euphemism for raising test scores, cultivating a test-centric way of doing things.

Achievement and VAM

That is, instead of focusing on learning, opportunity, developmentally appropriate practices, and falling in love with learning, “achievement” is the goal of TAP. Make no mistake: this is far from an argument about semantics. And this “achievement,” linked to student growth and tied to merit pay, relies heavily on a VAM-aligned rubric.

Yet there are multiple problems with VAMs, instruments that have been used in K-12 education since 2011. Among many other outstanding sources, one may simply want to check out this cleverly named blog, “VAMboozled,” or see what Diane Ravitch has said about VAMs (among other places, see, for example, here), not to mention the well-visited site produced by Mercedes Schneider here. Finally, see the 2015 position statement issued by the American Educational Research Association (AERA) regarding VAMs here, as well as a similar statement issued by the American Statistical Association (ASA) here.

Back to the Gates Foundation and the Texas Tech-based U.S. Prep National Center, though. To restate, at the aforementioned university in Louisiana (though likely at the other four recruited institutions as well), TAP will be the chief vehicle driving this process, and teacher education programs will be used as hosts to prop up the brand.

With presumably some very smart, well-educated, talented, and experienced professionals at the respective teacher education sites, how is it possible that they capitulated to serve as samples in the petri dish that will only work to enculturate the continuation of corporate reform, which will predictably lead to what Hofstra University Professor Alan Singer calls the “McDonaldization of Teacher Education“?

Strauss puts the question this way, “How many times do educators need to attempt to reinvent the wheel just because someone with deep pockets wants to try when the money could almost certainly be more usefully spent somewhere else?” I ask this same question, in this case, here.

Special Issue of “Educational Researcher” (Paper #6 of 9): VAMs as Tools for “Egg-Crate” Schools

Recall that the peer-reviewed journal Educational Researcher (ER) recently published a “Special Issue” including nine articles examining value-added measures (VAMs). I have reviewed the next of the nine articles (#6 of 9), which is actually an essay, available here, titled “Will VAMS Reinforce the Walls of the Egg-Crate School?” This essay is authored by Susan Moore Johnson – Professor of Education at Harvard and somebody whom I had the privilege of interviewing in the past as an esteemed member of the National Academy of Education (see interviews here and here).

In this article, Moore Johnson argues that when policymakers use VAMs to evaluate, reward, or dismiss teachers, they may be perpetuating an egg-crate model, which is (referencing Tyack (1974) and Lortie (1975)) a metaphor for the compartmentalized school structure in which teachers (and students) work, most often in isolation. This model ultimately undermines the efforts of all involved in the work of schools to build capacity school wide, and to excel as a school given educators’ individual and collective efforts.

Contrary to the primary logic supporting VAM use, however, “teachers are not inherently effective or ineffective” on their own. Rather, their collective effectiveness is related to their professional development, which may be stunted when they work alone, “without the benefit of ongoing collegial influence” (p. 119). VAMs, then, and unfortunately, can cause teachers and administrators to (hyper)focus “on identifying, assigning, and rewarding or penalizing individual [emphasis added] teachers for their effectiveness in raising students’ test scores [which] depends primarily on the strengths of individual teachers” (p. 119). What comes along with this, then, is a series of interrelated egg-crate behaviors including, but not limited to, increased competition, lack of collaboration, and increased independence versus interdependence, all of which can, in effect, lead to decreased morale and decreased effectiveness.

Inversely, students are much “better served when human resources are deliberately organized to draw on the strengths of all teachers on behalf of all students, rather than having students subjected to the luck of the draw in their classroom assignment[s]” (p. 119). Likewise, “changing the context in which teachers work could have important benefits for students throughout the school, whereas changing individual teachers without changing the context [as per VAMs] might not [work nearly as well] (Lohr, 2012)” (p. 120). Teachers learning from their peers, working in teams, teaching in teams, co-planning, collaborating, learning via mentoring by more experienced teachers, learning by mentoring, and the like should be much more valued, as warranted via the research, yet they are not valued given the very nature of VAM use.

Hence, there are also unintended consequences that can also come along with the (hyper)use of individual-level VAMs. These include, but are not limited to: (1) Teachers who are more likely to “literally or figuratively ‘close their classroom door’ and revert to working alone…[This]…affect[s] current collaboration and shared responsibility for school improvement, thus reinforcing the walls of the egg-crate school” (p. 120); (2) Due to bias, or that teachers might be unfairly evaluated given the types of students non-randomly assigned into their classrooms, teachers might avoid teaching high-needs students if teachers perceive themselves to be “at greater risk” of teaching students they cannot grow; (3) This can perpetuate isolative behaviors, as well as behaviors that encourage teachers to protect themselves first, and above all else; (4) “Therefore, heavy reliance on VAMS may lead effective teachers in high-need subjects and schools to seek safer assignments, where they can avoid the risk of low VAMS scores[; (5) M]eanwhile, some of the most challenging teaching assignments would remain difficult to fill and likely be subject to repeated turnover, bringing steep costs for students” (p. 120); While (6) “using VAMS to determine a substantial part of the teacher’s evaluation or pay [also] threatens to sidetrack the teachers’ collaboration and redirect the effective teacher’s attention to the students on his or her roster” (p. 120-121) versus students, for example, on other teachers’ rosters who might also benefit from other teachers’ content area or other expertise. Likewise (7) “Using VAMS to make high-stakes decisions about teachers also may have the unintended effect of driving skillful and committed teachers away from the schools that need them most and, in the extreme, causing them to leave the profession” in the end (p. 121).

I should add, though, in all fairness, and given the Review of Paper #3 on VAMs’ potentials here, that many of these aforementioned assertions are somewhat hypothetical, in the sense that they are based on the grander literature surrounding teachers’ working conditions rather than on the direct, unintended effects of VAMs, given that no research yet exists examining the above, or other unintended effects, empirically. “There is as yet no evidence that the intensified use of VAMS interferes with collaborative, reciprocal work among teachers and principals or sets back efforts to move beyond the traditional egg-crate structure. However, the fact that we lack evidence about the organizational consequences of using VAMS does not mean that such consequences do not exist” (p. 123).

The bottom line is that we do not want to prevent the school organization from becoming “greater than the sum of its parts…[so that]…the social capital that transforms human capital through collegial activities in schools [might increase] the school’s overall instructional capacity and, arguably, its success” (p. 118). Hence, as Moore Johnson argues, we must adjust the focus “from the individual back to the organization, from the teacher to the school” (p. 118), and from the egg-crate back to a much more holistic and realistic model capturing what it means to be an effective school, and what it means to be an effective teacher as an educational professional within one. “[A] school would do better to invest in promoting collaboration, learning, and professional accountability among teachers and administrators than to rely on VAMS scores in an effort to reward or penalize a relatively small number of teachers” (p. 122).

*****

If interested, see the Review of Article #1 – the introduction to the special issue here; see the Review of Article #2 – on VAMs’ measurement errors, issues with retroactive revisions, and (more) problems with using standardized tests in VAMs here; see the Review of Article #3 – on VAMs’ potentials here; see the Review of Article #4 – on observational systems’ potentials here; and see the Review of Article #5 – on teachers’ perceptions of observations and student growth here.

Article #6 Reference: Moore Johnson, S. (2015). Will VAMS reinforce the walls of the egg-crate school? Educational Researcher, 44(2), 117-126. doi:10.3102/0013189X15573351
