New York Teacher Sheri Lederman’s Lawsuit Update

Recall the New York lawsuit pertaining to Long Island teacher Sheri Lederman? She is the 4th-grade, 18-year veteran teacher who, by all accounts other than her recent (2013-2014) growth score of 1 out of 20, is a terrific teacher. She, along with her attorney and husband Bruce Lederman, is suing the state of New York to challenge the state’s growth-based teacher evaluation system. See prior posts about Sheri’s case here, here, and here. I, along with Linda Darling-Hammond (Stanford), Aaron Pallas (Columbia University Teachers College), Carol Burris (Executive Director of the Network for Public Education Foundation), Brad Lindell (Long Island Research Consultant), Sean Corcoran (New York University), and Jesse Rothstein (University of California – Berkeley), am serving as part of Sheri’s team.

Bruce Lederman just emailed me with an update, along with some related links (below), and he gave me permission to share all of this with you.

The judge hearing this case recently asked the lawyers on both sides to brief the court by the end of this month (February 29, 2016) on a new issue raised and pushed into the case by the New York State Education Department (NYSED). The issue to be heard pertains to the state’s new “moratorium,” or “emergency regulations,” related to the state’s high-stakes use of its growth scores. All of this is likely related to the political reaction to the opt-out movement throughout the state of New York, the publicity surrounding the Lederman lawsuit itself, and the federal government’s adoption of the recent Every Student Succeeds Act (ESSA), given its specific provision that now permits states to decide whether (and if so how) to use teachers’ students’ test scores to hold teachers accountable for their levels of growth (in New York) or value-added.

While the federal government did not abolish such practices via the ESSA, it did hand back to the states all power and authority over this matter. Accordingly, this does not mean growth models/VAMs are going to simply disappear, as states still have the power and authority to move forward with their prior and/or new teacher evaluation systems based, in small or large part, on growth models/VAMs. As has been quite evident since President Obama signed the ESSA, some states are continuing to move forward in this regard, in some cases at even higher speeds than before, in support of what some state policymakers still apparently believe (despite the research) are the accountability measures that will (symbolically) support educational reform in their states. See, for example, prior posts about the state of Alabama here, New Mexico here, and Texas here, the last of which is still moving forward with its plans introduced pre-ESSA. See prior posts about New York here, here, and here, the state in which, just one year ago, Governor Cuomo was promoting increased use of New York’s growth model and publicly proclaiming that it was “baloney” that more teachers were not being found “ineffective,” after which Cuomo pushed amendments to the law through the New York budget process, increasing the weight of teachers’ growth scores to approximately 50% in many cases.

Nonetheless, as per this case in New York, state Attorney General Eric Schneiderman, on behalf of the NYSED, offered to settle this lawsuit out of court by giving Sheri some accommodation on her aforementioned 2013-2014 score of 1 out of 20, if Sheri and Bruce dropped their challenge to the state’s VAM-based teacher evaluation system. Sheri and Bruce declined, for a number of reasons, including that under the state’s recent “moratorium,” the state’s growth model is still set to be used throughout New York for the next four years, with teachers’ annual performance reviews based in part on growth scores reported to parents, newspapers (on an aggregate basis), and the like. While, again, high stakes are not to be attached to the growth output for four years, the scores will still “count.”

Hence, Sheri and Bruce believe that because they have already “convincingly” shown that the state’s growth model does not “rationally” work for teacher evaluation purposes, and that teacher evaluations based on the state’s growth model actually violate state law since teachers like Sheri are not capable of getting perfect scores (which is “irrational”), they will continue with this case, also on behalf of New York teachers and principals who are “demoralized” by the system, as well as New York taxpayers who are paying millions “if not tens of millions of dollars” for the system’s (highly) unreliable and inaccurate results.

As per Bruce’s email: “Spending the next 4 years studying a broken system is a terrible idea and terrible waste of taxpayer $$s. Also, if [NYSED] recognizes that Sheri’s 2013-14 score of 1 out of 20 is wrong [which they apparently recognize given their offer to settle this suit out of court], it’s sad and frustrating that [NYSED] still wants to fight her score unless she drops her challenge to the evaluation system in general.”

“We believe our case is already responsible for the new administrative appeal process in NY, and also partly responsible for Governor Cuomo’s apparent reversal on his stand about teacher evaluations. However, at this point we will not settle and allow important issues to be brushed under the carpet. Sheri and I are committed to pressing ahead with our case.”

To read more about this case via a Politico New York article, click here (registration required). To hear more from Bruce Lederman about this case via WCNY-TV, Syracuse, click here. The pertinent section of this interview starts at 22:00 minutes and ends at 36:21. It’s well worth a listen!

Report on the Stability of Student Growth Percentile (SGP) “Value-Added” Estimates

The Student Growth Percentiles (SGP) model, which is loosely defined by value-added model (VAM) purists as a VAM, uses students’ level(s) of past performance to determine students’ normative growth over time, as compared to their peers. “SGPs describe the relative location of a student’s current score compared to the current scores of students with similar score histories” (Castellano & Ho, 2013, p. 89). Students are compared to themselves (i.e., students serve as their own controls) over time; therefore, the need to control for other variables (e.g., student demographics) is less necessary, although this is a matter of debate. Nonetheless, the SGP model was developed as a “better” alternative to existing models, with the goal of providing clearer, more accessible, and more understandable results to both internal and external education stakeholders and consumers. For more information about the SGP please see prior posts here and here. See also an original source about the SGP here.
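To make that definition concrete, below is a minimal, hypothetical sketch in Python of the basic SGP idea: rank each student’s current score against the current scores of peers with similar prior scores. The simulated data, the 20-bin shortcut, and every name in the snippet are illustrative assumptions only; operational SGP systems estimate these percentiles with quantile regression over students’ full score histories.

```python
# Minimal sketch of the SGP idea: a student's current score is percentile-ranked
# against students with similar prior scores. Operational SGP systems use quantile
# regression over full score histories; here "similar score histories" is
# approximated with coarse prior-score bins. All data are simulated for illustration.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000
prior = rng.normal(500, 50, n)                    # last year's scale scores
current = 0.7 * prior + rng.normal(150, 30, n)    # this year's scale scores

df = pd.DataFrame({"prior": prior, "current": current})
df["bin"] = pd.qcut(df["prior"], q=20, labels=False)   # 20 groups of similar prior scores

# SGP ~ percentile rank of the current score within the student's prior-score group
df["sgp"] = df.groupby("bin")["current"].rank(pct=True).mul(100).round()

print(df.head())
```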

Related, in a study released last week, WestEd researchers conducted an “Analysis of the stability of teacher-level growth scores [derived] from the student growth percentile [SGP] model” in one large school district in Nevada (n = 370 teachers). The key finding they present is that “half or more of the variance in teacher scores from the [SGP] model is due to random or otherwise unstable sources rather than to reliable information that could predict future performance. Even when derived by averaging several years of teacher scores, effectiveness estimates are unlikely to provide a level of reliability desired in scores used for high-stakes decisions, such as tenure or dismissal. Thus, states may want to be cautious in using student growth percentile [SGP] scores for teacher evaluation.”

Most importantly, the evidence in this study should make us (continue to) question the extent to which “the learning of a teacher’s students in one year will [consistently] predict the learning of the teacher’s future students.” This is counter to the claims continuously made by VAM proponents, including folks like Thomas Kane — economics professor from Harvard University who directed the $45 million worth of Measures of Effective Teaching (MET) studies for the Bill & Melinda Gates Foundation. While faint signals of what we call predictive validity might be observed across VAMs, what folks like Kane overlook or avoid is that very often these faint signals do not remain constant over time. Accordingly, the extent to which we can make stable predictions is limited.

Worse is when folks falsely assume that said predictions will remain constant over time, and they make high-stakes decisions about teachers unaware of the lack of stability present in, typically, 25-59% of teachers’ value-added (or in this case SGP) scores (estimates vary by study and by analyses using one to three years of data — see, for example, the studies detailed in Appendix A of this report; see also other research on this topic here, here, and here). Nonetheless, researchers in this study found that in mathematics, 50% of the variance in teachers’ value-added scores was attributable to differences among teachers, and the other 50% was random or unstable. In reading, 41% of the variance in teachers’ value-added scores was attributable to differences among teachers, and the other 59% was random or unstable.

In addition, using a 95% confidence interval (which is very common in educational statistics), researchers found that in mathematics, the confidence interval around a teacher’s score would span 48 points, “a margin of error that covers nearly half the 100 point score scale,” whereby “one would be 95 percent confident that the true math score of a teacher who received a score of 50 [would actually fall] between 26 and 74.” For reading, the interval would span 44 points, whereby one would be 95 percent confident that the true reading score of a teacher who received a score of 50 would actually fall between 28 and 72. The stability of these scores would increase with three years of data, which has also been found by other researchers on this topic. However, they too have found that such error rates persist to an extent that still prohibits high-stakes decision making.
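For readers who want to see where such margins of error come from, here is a small, hypothetical sketch of the standard arithmetic linking reliability to a confidence interval. The reliability of .50 echoes the “half the variance is stable” finding above, but the scale standard deviation is an assumed value chosen only so the numbers roughly reproduce the 48-point span; it is not a figure taken from the report.

```python
# Sketch of the usual reliability -> standard error -> confidence interval arithmetic.
# reliability = .50 echoes the "half the variance is stable" finding above;
# score_sd is an assumed value for illustration, not a figure from the report.
import math

reliability = 0.50   # share of score variance attributable to stable teacher differences
score_sd = 17.5      # assumed SD of teacher scores on the 0-100 scale

sem = score_sd * math.sqrt(1 - reliability)   # standard error of measurement
half_width = 1.96 * sem                       # 95% confidence half-width

observed = 50
print(f"SEM = {sem:.1f}; 95% CI for an observed score of {observed}: "
      f"{observed - half_width:.0f} to {observed + half_width:.0f} "
      f"(a span of about {2 * half_width:.0f} points)")
```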

In more practical terms, what this also means is that a teacher who might be considered highly ineffective might be terminated, even though the following year (s)he could have been observed to be highly effective. Inversely, teachers who are awarded tenure might be observed as ineffective one, two, and/or three years following, not because their true level(s) of effectiveness change, but because of the error in the estimates that causes such instabilities to occur. Hence, examinations of the stability of such estimates over time provide essential evidence of the validity, and in this case predictive validity, of the interpretations and uses of such scores. This is particularly pertinent when high-stakes decisions are to be based on (or in large part on) such scores, especially given some researchers are calling for reliability coefficients of .85 or higher to make such decisions (Haertel, 2013; Wasserman & Bracken, 2003).

In the end, the researchers’ overall conclusion is that SGP-derived “growth scores alone may not be sufficiently stable to support high-stakes decisions.” Likewise, relying on the extant research on this topic, the overall conclusion can be broadened: neither SGP- nor VAM-based growth scores may be sufficiently stable to support high-stakes decisions. In other words, it is not just the SGP model that is yielding such issues with stability (or a lack thereof). Again, see the other literature in which the researchers situated their findings in Appendix A. See also other similar studies here, here, and here.

Accordingly, those who read this report and consequently seek a better model that yields more stable estimates will, unfortunately, likely fail in their search.

References:

Castellano, K. E., & Ho, A. D. (2013). A practitioner’s guide to growth models. Washington, DC: Council of Chief State School Officers.

Haertel, E. H. (2013). Reliability and validity of inferences about teachers based on student test scores (14th William H. Angoff Memorial Lecture). Princeton, NJ: Educational Testing Service (ETS).

Lash, A., Makkonen, R., Tran, L., & Huang, M. (2016). Analysis of the stability of teacher-level growth scores [derived] from the student growth percentile [SGP] model. (16–104). Washington, DC: U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance, Regional Educational Laboratory West.

Wasserman, J. D., & Bracken, B. A. (2003). Psychometric characteristics of assessment procedures. In I. B. Weiner, J. R. Graham, & J. A. Naglieri (Eds.), Handbook of psychology:
Assessment psychology (pp. 43–66). Hoboken, NJ: John Wiley & Sons.

Special Issue of “Educational Researcher” (Paper #7 of 9): VAMs Situated in Appropriate Ecologies

Recall that the peer-reviewed journal Educational Researcher (ER) recently published a “Special Issue” including nine articles examining value-added measures (VAMs). I have reviewed the next of these nine articles (#7 of 9), which is actually a commentary titled “The Value in Value-Added Depends on the Ecology.” This commentary is authored by Henry Braun – Professor of Education and Public Policy, Educational Research, Measurement, and Evaluation at Boston College (also the author of a previous post on this site here).

In this article Braun, importantly, makes explicit the assumptions on which this special issue of ER is based; that is, the assumptions that (1) too many students in America’s public schools are being inadequately educated, (2) evaluation systems as they currently exist “require radical overhaul,” and (3) it is therefore essential to use student test performance, with low and high stakes attached, to improve that which educators do (or don’t do) to adequately address the first assumption. Braun also offers readers counterarguments to each of these assumptions (see p. 127), but more importantly he makes evident that the focus of this special issue is situated otherwise, in line with current education policies. This special issue, overall, then “raise[s] important questions regarding the potential for high-stakes, test-driven educator accountability systems to contribute to raising student achievement” (p. 127).

Given this context, the “value-added” provided within this special issue, again according to Braun, is that the authors of each of the five main research articles included report on how VAM output actually plays out in practice, giving “careful consideration to how the design and implementation of teacher evaluation systems could be modified to enhance the [purportedly, see comments above] positive impact of accountability and mitigate the negative consequences” at the same time (p. 127). In other words, if we more or less agree to the aforementioned assumptions, also given the educational policy context influencing, perpetuating, or actually forcing these assumptions, these articles should help others better understand VAMs’ and observational systems’ potentials and perils in practice.

At the same time, Braun encourages us to note that “[t]he general consensus is that a set of VAM scores does contain some useful information that meaningfully differentiates among teachers, especially in the tails of the distribution [although I would argue bias has a role here]. However, individual VAM scores do suffer from high variance and low year-to-year stability as well as an undetermined amount of bias [which may be greater in the tails of the distribution]. Consequently, if VAM scores are to be used for evaluation, they should not be given inordinate weight and certainly not treated as the “gold standard” to which all other indicators must be compared” (p. 128).

Likewise, it’s important to note that IF consequences are to be attached to said indicators of teacher evaluation (i.e., VAM and observational data), there should be validity evidence made available and transparent to warrant the inferences and decisions to be made, and the validity evidence “should strongly support a causal [emphasis added] argument” (p. 128). However, both indicators still face major “difficulties in establishing defensible causal linkage[s]” as theorized and desired (p. 128); hence, this prevents valid inference. What does not help, either, is when VAM scores are given precedence over other indicators, OR when principals align teachers’ observational scores with those same teachers’ VAM scores, given the precedence often granted to (what are often viewed as the superior, more objective) VAM-based measures. This sometimes occurs given external pressures (e.g., applied by superintendents) to artificially inflate, in this case, levels of agreement between indicators (i.e., convergent validity).

Related, in the section Braun titles his “Trio of Tensions” (p. 129), he notes that (1) “[B]oth accountability and improvement are undermined, as attested to by a number of the articles in this issue. In the current political and economic climate, [if possible] it will take thoughtful and inspiring leadership at the state and district levels to create contexts in which an educator evaluation system constructively fulfills its roles with respect to both public accountability and school improvement” (pp. 129-130); (2) “[T]he chasm between the technical sophistication of the various VAM[s] and the ability of educators to appreciate what these models are attempting to accomplish…sow[s] further confusion…[hence]…there must be ongoing efforts to convey to various audiences the essential issues—even in the face of principled disagreements among experts on the appropriate roles(s) for VAM[s] in educator evaluations” (p. 130); and finally (3) “[H]ow to balance the rights of students to an adequate education and the rights of teachers to fair evaluations and due process [especially for]…teachers who have value-added scores and those who teach in subject-grade combinations for which value-added scores are not feasible…[must be addressed; this] comparability issue…has not been addressed but [it] will likely [continue to] rear its [ugly] head” (p. 130).

In the end, Braun argues for another “Trio,” but this one including three final lessons: (1) “although the concerns regarding the technical properties of VAM scores are not misplaced, they are not necessarily central to their reputation among teachers and principals. [What is central is]…their links to tests of dubious quality, their opaqueness in an atmosphere marked by (mutual) distrust, and the apparent lack of actionable information that are largely responsible for their poor reception” (p. 130); (2) there is a “very substantial, multiyear effort required for proper implementation of a new evaluation system…[related, observational] ratings are not a panacea. They, too, suffer from technical deficiencies and are the object of concern among some teachers because of worries about bias” (p. 130); and (3) “legislators and policymakers should move toward a more ecological approach [emphasis added; see also the Review of Article (Essay) #6 – on VAMs as tools for “egg-crate” schools here] to the design of accountability systems; that is, “one that takes into account the educational and political context for evaluation, the behavioral responses and other dynamics that are set in motion when a new regime of high-stakes accountability is instituted, and the long-term consequences of operating the system” (p. 130).

*****

If interested, see the Review of Article #1 – the introduction to the special issue here; see the Review of Article #2 – on VAMs’ measurement errors, issues with retroactive revisions, and (more) problems with using standardized tests in VAMs here; see the Review of Article #3 – on VAMs’ potentials here; see the Review of Article #4 – on observational systems’ potentials here; see the Review of Article #5 – on teachers’ perceptions of observations and student growth here; and see the Review of Article (Essay) #6 – on VAMs as tools for “egg-crate” schools here.

Article #7 Reference: Braun, H. (2015). The value in value-added depends on the ecology. Educational Researcher, 44(2), 127-131. doi:10.3102/0013189X15576341

Why Standardized Tests Should Not Be Used to Evaluate Teachers (and Teacher Education Programs)

David C. Berliner, Regents’ Professor Emeritus here at Arizona State University (ASU), who also just happens to be my former albeit forever mentor, recently took up research on the use of test scores to evaluate teachers, for example, using value-added models (VAMs). While David is world-renowned for his research in educational psychology, and more specific to this case, his expertise on effective teaching behaviors and how to capture and observe them, he has also now ventured into the VAM-related debates.

Accordingly, he recently presented his newest, soon-to-be-published research on using standardized tests to evaluate teachers, something he aptly termed in the title of his presentation “A Policy Fiasco.” He delivered his speech to an audience in Melbourne, Australia, and you can click here for the full video-taped presentation; however, given that the whole presentation takes about one hour to watch (although the full hour is well worth it), I highlight below his key points. These should certainly be of interest to you all as followers of this blog, and hopefully others.

Of main interest are his 14 reasons, “big and small,” for his judgment that “assessing teacher competence using standardized achievement tests is nearly worthless.”

Here are his fourteen reasons:

  1. “When using standardized achievement tests as the basis for inferences about the quality of teachers, and the institutions from which they came, it is easy to confuse the effects of sociological variables on standardized test scores” and the effects teachers have on those same scores. Sociological variables (e.g., chronic absenteeism) continue to distort others’ even best attempts to disentangle them from the very instructional variables of interest. These, what we also term biasing variables, are important not to inappropriately dismiss as purportedly statistically “controlled for.”
  2. In law, we do not hold people accountable for the actions of others, for example, when a child kills another child and the parents are not charged as guilty. Hence, “[t]he logic of holding [teachers and] schools of education responsible for student achievement does not fit into our system of law or into the moral code subscribed to by most western nations.” Related, should medical schools or doctors, for that matter, be held accountable for the health of their patients? One of the best parts of his talk, in fact, is about the medical field and the corollaries Berliner draws between doctors and medical schools, and teachers and colleges of education, respectively (around the 19-25 minute mark of his video presentation).
  3. Professionals are often held harmless for their lower success rates with clients who have observable difficulties in meeting the demands and the expectations of the professionals who attend to them. In medicine again, for example, when working with impoverished patients, “[t]here is precedent for holding [doctors] harmless for their lowest success rates with clients who have observable difficulties in meeting the demands and expectations of the [doctors] who attend to them, but the dispensation we offer to physicians is not offered to teachers.”
  4. There are other quite acceptable sources of data, besides tests, for judging the efficacy of teachers and teacher education programs. “People accept the fact that treatment and medicine may not result in the cure of a disease. Practicing good medicine is the goal, whether or not the patient gets better or lives. It is equally true that competent teaching can occur independent of student learning or of the achievement test scores that serve as proxies for said learning. A teacher can literally “save lives” and not move the metrics used to measure teacher effectiveness.
  5. Reliance on standardized achievement test scores as the source of data about teacher quality will inevitably promote confusion between “successful” instruction and “good” instruction. “Successful” instruction gets test scores up. “Good” instruction leaves lasting impressions, fosters further interest by the students, makes them feel competent in the area, etc. Good instruction is hard to measure, but remains the goal of our finest teachers.
  6. Related, teachers affect individual students greatly, but affect standardized achievement test scores very little. All can think of how their own teachers impacted their lives in ways that cannot be captured on a standardized achievement test. Standardized achievement test scores are much more related to home, neighborhood, and cohort than they are to teachers’ instructional capabilities. In more contemporary terms, this is also due to the fact that large-scale standardized tests have (still) never been validated to measure student growth over time, nor have they been validated to attribute that growth to teachers. “Teachers have huge effects, it’s just that the tests are not sensitive to them.”
  7. Teachers’ effects on standardized achievement test scores fade quickly, and are barely discernible after a few years. So we might not want to overly worry about most teachers’ effects on their students—good or bad—as they are hard to detect on tests after two or so years. To use these ephemeral effects to then hold teacher education programs accountable seems even more problematic.
  8. Observational measures of teacher competency and achievement tests of teacher competency do not correlate well. This suggests nothing more than that one or both of these measures, and likely the latter, are malfunctioning in their capacities to measure the teacher effectiveness construct. See other VAMboozled posts about this here, here, and here.
  9. Different standardized achievement tests, both purporting to measure reading, mathematics, or science at the same grade level, will give different estimates of teacher competency. That is because different test developers have different visions of what it means to be competent in each of these subject areas. Thus one achievement test in these subject areas could find a teacher exemplary, but another test of those same subject areas would find the teacher lacking. What then? Have we an unstable teacher or an ill-defined subject area?
  10. Tests can be administered early or late in the fall, early or late in the spring, and the dates they are given influence the judgments about whether a teacher is performing well or poorly. Teacher competency should not be determined by minor differences in the date of testing, but that happens frequently.
  11. No standardized achievement tests have provided proof that their items are instructionally sensitive. If test items do not, because they cannot, “react to good instruction,” how can one claim that the test items are “tapping good instruction”?
  12. Teacher effects show up more dramatically on teacher-made tests than on standardized achievement tests because the former are based on the enacted curriculum, while the latter are based on the desired curriculum. Tests become roughly seven times more instructionally sensitive the closer they are to the classroom (i.e., teacher-made tests).
  13. The opt-out testing movement invalidates inferences about teachers and schools that can be made from standardized achievement test results. It’s not bad to remove these kids from taking these tests, and perhaps it is even necessary in our over-tested schools, but the tests, and the VAM estimates derived via these tests, are far less valid when that happens. This is because the students who opt out are likely different in significant ways from those who do take the tests. This severely limits the validity claims that are made.
  14. Assessing new teachers with standardized achievement tests is likely to yield many false negatives. That is, the assessments would identify teachers early in their careers as ineffective in improving test scores, which is, in fact, often the case for new teachers. Two or three years later that could change. Perhaps the last thing we want to do in a time of teacher shortage is discourage new teachers while they acquire their skills.

Victory in Court: Consequences Attached to VAMs Suspended Throughout New Mexico

Great news for New Mexico and New Mexico’s approximately 23,000 teachers, and great news for states and teachers potentially elsewhere, in terms of setting precedent!

Late yesterday, state District Judge David K. Thomson, who presided over the ongoing teacher-evaluation lawsuit in New Mexico, granted a preliminary injunction preventing consequences from being attached to the state’s teacher evaluation data. More specifically, Judge Thomson ruled that the state can proceed with “developing” and “improving” its teacher evaluation system, but the state is not to make any consequential decisions about New Mexico’s teachers using the data the state collects until the state (and/or others external to the state) can evidence to the court during another trial (set for now, for April) that the system is reliable, valid, fair, uniform, and the like.

As you all likely recall, the American Federation of Teachers (AFT), joined by the Albuquerque Teachers Federation (ATF), last year, filed a “Lawsuit in New Mexico Challenging [the] State’s Teacher Evaluation System.” Plaintiffs charged that the state’s teacher evaluation system, imposed on the state in 2012 by the state’s current Public Education Department (PED) Secretary Hanna Skandera (with value-added counting for 50% of teachers’ evaluation scores), is unfair, error-ridden, spurious, harming teachers, and depriving students of high-quality educators, among other claims (see the actual lawsuit here).

Thereafter, one scheduled day of testimonies turned into five in Santa Fe, which ran from the end of September through the beginning of October (each of which I covered here, here, here, here, and here). I served as the expert witness for the plaintiffs’ side, along with other witnesses including lawmakers (e.g., a state senator) and educators (e.g., teachers, superintendents) who made various (and very articulate) claims about the state’s teacher evaluation system on the stand. Thomas Kane served as the expert witness for the defendant’s side, along with other witnesses including lawmakers and educators who made counter claims about the system, some of which backfired, unfortunately for the defense, primarily during cross-examination.

See articles released about this ruling this morning in the Santa Fe New Mexican (“Judge suspends penalties linked to state’s teacher eval system”) and the Albuquerque Journal (“Judge curbs PED teacher evaluations”). See also the AFT’s press release, written by AFT President Randi Weingarten, here. Click here for the full 77-page Order written by Judge Thomson (see also, below, five highlights I pulled from this Order).

The journalist of the Santa Fe New Mexican, though, provided the most detailed information about Judge Thomson’s Order, writing, for example, that the “ruling by state District Judge David Thomson focused primarily on the complicated combination of student test scores used to judge teachers. The ruling [therefore] prevents the Public Education Department [PED] from denying teachers licensure advancement or renewal, and it strikes down a requirement that poorly performing teachers be placed on growth plans.” In addition, the Judge noted that “the teacher evaluation system varies from district to district, which goes against a state law calling for a consistent evaluation plan for all educators.”

The PED continues to stand by its teacher evaluation system, calling the court challenge “frivolous” and “a legal PR stunt,” all the while noting that Judge Thomson’s decision “won’t affect how the state conducts its teacher evaluations.” Indeed it will, for now and until the state’s teacher evaluation system is vetted, and validated, and “the court” is “assured” that the system can actually be used to take the “consequential actions” against teachers, “required” by the state’s PED.

Here are some other highlights that I took directly from Judge Thomson’s ruling, capturing what I viewed as his major areas of concern about the state’s system (click here, again, to read Judge Thomson’s full Order):

  • Validation Needed: “The American Statistical Association says ‘estimates from VAM should always be accompanied by measures of precision and a discussion of the assumptions and possible limitations of the model. These limitations are particularly relevant if VAM are used for high stake[s] purposes’” (p. 1). These are the measures, assumptions, limitations, and the like that are to be made transparent in this state.
  • Uniformity Required: “New Mexico’s evaluation system is less like a [sound] model than a cafeteria-style evaluation system where the combination of factors, data, and elements are not easily determined and the variance from school district to school district creates conflicts with the [state] statutory mandate” (p. 2)…with the existing statutory framework for teacher evaluations for licensure purposes requiring “that the teacher be evaluated for ‘competency’ against a ‘highly objective uniform statewide standard of evaluation’ to be developed by PED” (p. 4). “It is the term ‘highly objective uniform’ that is the subject matter of this suit” (p. 4), whereby the state and no other “party provided [or could provide] the Court a total calculation of the number of available district-specific plans possible given all the variables” (p. 54). See also the Judge’s points #78-#80 (starting on page 70) for some of the factors that helped to “establish a clear lack of statewide uniformity among teachers” (p. 70).
  • Transparency Missing: “The problem is that it is not easy to pull back the curtain, and the inner workings of the model are not easily understood, translated or made accessible” (p. 2). “Teachers do not find the information transparent or accurate” and “there is no evidence or citation that enables a teacher to verify the data that is the content of their evaluation” (p. 42). In addition, “[g]iven the model’s infancy, there are no real studies to explain or define the [s]tate’s value-added system…[hence, the consequences and decisions]…that are to be made using such system data should be examined and validated prior to making such decisions” (p. 12).
  • Consequences Halted: “Most significant to this Order, [VAMs], in this [s]tate and others, are being used to make consequential decisions…This is where the rubber hits the road [as per]…teacher employment impacts. It is also where, for purposes of this proceeding, the PED departs from the statutory mandate of uniformity requiring an injunction” (p. 9). In addition, it should be noted that indeed “[t]here are adverse consequences to teachers short of termination” (p. 33) including, for example, “a finding of ‘minimally effective’ [that] has an impact on teacher licenses” (p. 41). These, too, are to be halted under this injunction Order.
  • Clarification Required: “[H]ere is what this [O]rder is not: This [O]rder does not stop the PED’s operation, development and improvement of the VAM in this [s]tate, it simply restrains the PED’s ability to take consequential actions…until a trial on the merits is held” (p. 2). In addition, “[a] preliminary injunction differs from a permanent injunction, as does the factors for its issuance…’ The objective of the preliminary injunction is to preserve the status quo [minus the consequences] pending the litigation of the merits. This is quite different from finally determining the cause itself” (p. 74). Hence, “[t]he court is simply enjoining the portion of the evaluation system that has adverse consequences on teachers” (p. 75).

The PED also argued that “an injunction would hurt students because it could leave in place bad teachers.” As per Judge Thomson, “That is also a faulty argument. There is no evidence that temporarily halting consequences due to the errors outlined in this lengthy Opinion more likely results in retention of bad teachers than in the firing of good teachers” (p. 75).

Finally, given my involvement in this lawsuit and given the team with whom I was/am still so fortunate to work (see picture below), including all of those who testified as part of the team and whose testimonies clearly proved critical in Judge Thomson’s final Order, I want to thank everyone for all of their time, energy, and efforts in this case, thus far, on behalf of the educators attempting to (still) do what they love to do — teach and serve students in New Mexico’s public schools.


Left to right: (1) Stephanie Ly, President of AFT New Mexico; (2) Dan McNeil, AFT Legal Department; (3) Ellen Bernstein, ATF President; (4) Shane Youtz, Attorney at Law; and (5) me 😉

Including Summers “Adds Considerable Measurement Error” to Value-Added Estimates

A new article titled “The Effect of Summer on Value-added Assessments of Teacher and School Performance” was recently released in the peer-reviewed journal Education Policy Analysis Archives. The article is authored by Gregory Palardy and Luyao Peng from the University of California, Riverside. 

Before we begin, though, here is some background so that you all understand the importance of the findings in this particular article.

In order to calculate teacher-level value-added, all states are currently using (at minimum) the large-scale standardized tests mandated by No Child Left Behind (NCLB) in 2002. These tests were mandated for use in the subject areas of mathematics and reading/language arts. However, because these tests are given only once per year, typically in the spring, to calculate value-added, statisticians measure actual versus predicted “growth” (aka “value-added”) from spring to spring, over a 12-month span that includes summers.

While many (including many policymakers) assume that value-added estimates are calculated from fall to spring, during intervals in which students are under the same teachers’ supervision and instruction, this is not true. The reality is that the pre- to post-test occasions actually span 12-month periods, including the summers that cause the nettlesome summer effects often observed in VAM-based estimates. Different students learn different things over the summer, and this is strongly associated (and correlated) with students’ backgrounds and out-of-school opportunities (e.g., travel, summer camps, summer schools). Likewise, because summers are periods over which teachers and schools have little control over what students do, they are also the periods during which research indicates that achievement gaps are maintained or widened. More specifically, research indicates that students from relatively lower socio-economic backgrounds tend to suffer more from learning decay than their wealthier peers, although they learn at similar rates during the school year.
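To illustrate the point, here is a small, hypothetical sketch of how a spring-to-spring gain bundles the summer period together with the school year; with an additional fall test, the 12-month gain splits into a summer component and a within-year component. The students and scores below are made up purely for illustration.

```python
# Sketch of why spring-to-spring "growth" bundles summer with the school year.
# With a fall test added, the 12-month gain splits into a summer component
# (prior spring -> fall, largely outside the teacher's control) and a within-year
# component (fall -> spring, under the teacher of record). Scores are made up.
import pandas as pd

scores = pd.DataFrame({
    "student":      ["A", "B", "C"],
    "spring_prior": [520, 500, 480],   # tested with last year's teacher
    "fall":         [515, 488, 460],   # start of year with the current teacher
    "spring":       [545, 520, 495],   # end of year with the current teacher
})

scores["annual_gain"]  = scores["spring"] - scores["spring_prior"]  # what spring-to-spring VAMs see
scores["summer_gain"]  = scores["fall"] - scores["spring_prior"]    # often negative ("summer slide")
scores["in_year_gain"] = scores["spring"] - scores["fall"]          # the part the teacher plausibly affects

print(scores)
```

In the made-up rows above, student C’s 12-month gain of 15 points hides a 20-point summer loss and a 35-point gain during the school year; a spring-to-spring model credits (or blames) the teacher for the net of the two.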

What these 12-month testing intervals also include are prior teachers’ residual effects, whereby students tested in the spring, for example, finish out every school year (e.g., two months or so) with their prior teachers before entering the classrooms of the teachers for whom value-added is to be calculated the following spring, although teachers’ residual effects were not of focus in this particular study.

Nonetheless, via the research, we have always known that these summer effects (and prior or adjacent teachers’ residual effects) are difficult if not impossible to statistically control. This in and of itself leads to much of the noise (fluctuations/lack of reliability, imprecision, and potential biases) we observe in the resulting value-added estimates. This is precisely what was of focus in this particular study.

In this study researchers examined “the effects of including the summer period on value-added assessments (VAA) of teacher and school performance at the [1st] grade [level],” as compared to using VAM-based estimates derived from a fall-to-spring test administration within the same grade and same year (i.e., using data from a nationally representative sample from the National Center for Education Statistics (NCES), with n = 5,034 children).

Researchers found that:

  • Approximately 40-62% of the variance in VAM-based estimates originates from the summer period, depending on the reading or math outcome;
  • When summer is omitted from VAM-based calculations using within-year pre/post-tests, approximately 51-61% of teachers change performance categories. What this means in simpler terms is that including summers in VAM-based estimates is indeed causing some of the errors and misclassification rates being observed across studies.
  • Statistical controls for student and classroom/school variables reduce summer effects considerably (e.g., controlling for students’ prior achievement), yet 36-47% of teachers still fall into different quintiles when summers are included in the VAM-based estimates.
  • Findings also evidence that including summers within VAM-based calculations tends to bias VAM-based estimates against schools with higher relative concentrations of poverty, or rather higher relative concentrations of students who are eligible for the federal free-and-reduced lunch program.
  • Overall, results suggest that removing summer effects from VAM-based estimates may require biannual achievement assessments (i.e., fall and spring). If we want VAM-based estimates to be more accurate, we might have to double the number of tests we administer per year in each subject area for which teachers are to be held accountable using VAMs. However, “if twice-annual assessments are not conducted, controls for prior achievement seem to be the best method for minimizing summer effects.”

This is certainly something to consider in terms of trade-offs, specifically in terms of whether we really want to “double-down” on the number of tests we already require our public school students to take (also given the time that testing and test preparation already take away from students’ learning activities), and whether we also want to “double-down” on the increased costs of doing so. I should also note here, though, that using pre/post-tests within the same year is not as simple as it may seem either. See another post forthcoming about the potential artificial deflation/inflation of pre/post scores to manufacture artificial levels of growth.

To read the full study, click here.

*I should note that I am an Associate Editor for this journal, and I served as editor for this particular publication, seeing it through the full peer-reviewed process.

Citation: Palardy, G. J., & Peng, L. (2015). The effects of including summer on value-added assessments of teachers and schools. Education Policy Analysis Archives, 23(92). doi:10.14507/epaa.v23.1997 Retrieved from http://epaa.asu.edu/ojs/article/view/1997

The Forgotten VAM: The A-F School Grading System

Here is another post from our “Concerned New Mexico Parent” (see prior posts from him/her here and here). This one is about New Mexico’s A-F School Grading System and how it is not only contradictory, within and beyond itself, but how it also provides little instrumental value to the public as an invalid indicator of the “quality” of any school.

(S)he writes:

  Q: What do you call a high school that has only 38% of its students proficient in reading and 35% of its students proficient in mathematics?
  A: A school that needs help improving its student scores.
  Q: What does the New Mexico Public Education Department (NMPED) call this same high school?
  A: A top-rated “A” school, of course.

Readers of this blog are familiar with the VAMs being used to grade teachers. Many states have implemented analogous formulas to grade entire schools. This “forgotten” VAM suffers from all of the familiar problems of the teacher formulas — incomprehensibility, lack of transparency, arbitrariness, and the like.

The first problem with the A-F Grading System is inherent in its very name. The “A-F” terminology implies that this one static assessment is an accurate representation of a school’s quality. As you will see, it is nothing of the sort.

The second problem with the A-F Grading System is that it is composed of benchmarks that are arbitrarily weighted and scored by the NMPED using VAM methodologies.

Thirdly, the “collapsing of the data” from a numeric score to a grade (corresponding to a range of values) causes valuable information to be lost.

Table 1 shows the range of values for reading and mathematics proficiencies for each of the five A-F grade categories for New Mexico schools.

Table 1: Ranges and Median of Reading and Mathematics Proficiencies by A-F School Grade

| School A-F Grade | Number of Schools | Reading Proficiency Range | Median | Mathematics Proficiency Range | Median |
|------------------|-------------------|---------------------------|--------|-------------------------------|--------|
| A                | 86                | 37.90 – 94.00             | 66.16  | 31.50 – 95.70                 | 58.95  |
| B                | 237               | 16.90 – 90.90             | 58.00  | 4.90 – 90.90                  | 51.30  |
| C                | 177               | 0.00 – 83.80              | 46.30  | 0.00 – 76.20                  | 38.00  |
| D                | 21                | 4.50 – 64.60              | 40.70  | 2.20 – 70.00                  | 31.80  |
| F                | 88                | 7.80 – 52.30              | 31.85  | 3.30 – 40.90                  | 23.30  |

For example, to earn an A rating, a school can have between 37.9% and 94.0% of its students proficient in reading. In other words, a school can have roughly two-thirds of its students fail reading proficiency yet be rated as an “A” school!

The median value listed shows the point which splits the group in half — one-half of the scores are below the median value. Thus, an “A” school median of 66.2% indicates that one-half of the “A” schools have a reading proficiency below 66.2%. In other words, in one-half of the “A” schools 1/3 or more of their students are NOT proficient in reading!

Amazingly, the figures for mathematics are even worse: the minimum proficiency for a B rating is only 4.9%! Scandalous!

Obviously, and contrary to popular and press perceptions, the A-F Grading System has nothing to do with the actual or current quality of the school!

A few case studies will highlight further absurdities of the New Mexico A-F School Grading System next.

Case Study 1 – Highest “A” vs. Lowest “A” High School

Logan High School, Logan, New Mexico received the lowest reading proficiency of any “A” school, and the Albuquerque Institute of Math and Science received the highest reading proficiency score.

These two schools have both received an “A” rating. The Albuquerque Institute had a reading proficiency of 94% and a mathematics proficiency rating of 93%. Logan HS had a reading proficiency of only 38% and a mathematics proficiency rating of only 35%!

How is that possible?

First, much of the A-F VAM, like the teacher VAM, is based on multi-year growth calculations and predictions. Logan has plenty of opportunity for growth whereas the Albuquerque Institute has “maxed out” most of its scores. Thus, the Albuquerque Institute is penalized in a manner analogous to Gifted and Talented teachers when teacher-level VAM is used. With already excellent scores, there is little, if any, room for improvement.

Second, Logan has an emphasis on shop/trade classes which yields a very high college and CAREER readiness score for the VAM calculation.

A final factor is that the NMPED-defined range for an “A” extends from 75 to 100 points, and Logan barely made it into the A grouping.

Thus, a proficiency score of only 37.9% is no deterrent to an A score for Logan High.

Case Study 2: Hanging on by a Thread

As noted above, any school that scores between 75 and 100 points is considered an “A” school.

This statistical oddity was very beneficial to Hagerman High (Hagerman, NM) in their 2014 School Grade Report Card. They fell 5.99 points overall from the previous year’s score, but they managed to still receive an “A” score since their resulting 2014 score was exactly 75.01.

With this one one-hundredth of a point, they are in the same “A” grade category as the Albuquerque Institute of Math and Science (rated best in New Mexico by NMPED) and the Cottonwood Classical Preparatory School of Albuquerque (rated best in New Mexico by US News).
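To see how much information the letter bands throw away, here is a small, hypothetical sketch of the score-to-grade collapsing described above. Only the 75-100 “A” band and Hagerman’s 75.01-point total come from this post; the other cutoffs and point totals are assumed purely for illustration.

```python
# Sketch of how collapsing a 0-100 point total into a letter grade discards information.
# Only the 75-100 "A" band and Hagerman High's 75.01 points come from the post above;
# the remaining cutoffs and the other point totals are assumed for illustration.
def letter_grade(points: float) -> str:
    if points >= 75:
        return "A"    # NMPED band cited in this post
    if points >= 60:
        return "B"    # assumed band
    if points >= 50:
        return "C"    # assumed band
    if points >= 37.5:
        return "D"    # assumed band
    return "F"

schools = {
    "Albuquerque Institute of Math & Science (assumed points)": 95.00,
    "Hagerman High (reported points)": 75.01,
    "Hypothetical school just below the cut": 74.99,
}

for name, points in schools.items():
    print(f"{name}: {points:.2f} -> grade {letter_grade(points)}")
```

A roughly 20-point gap and a 0.02-point gap both vanish once scores are collapsed into letters, which is exactly the information loss noted earlier.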

Case Study 3: A Tale of Two Ranking Systems

The inaccuracy and arbitrariness of any A-F School Grading System was also apparent in a recent Albuquerque Journal article (May 14, 2015), which reported on the most recent US News ratings of high schools nationwide.

The Journal reported on the top 12 high schools in New Mexico as rated by US News. It is not surprising that most were NMPED A-rated schools. What was unusual is that the 3rd and 5th US News highest rated schools in New Mexico (South Valley Academy and Albuquerque High, both in Albuquerque) were actually rated as B schools by the NMPED A-F School Grading System.

According to NMPED data, I tabulated at least forty-four (44) high schools that were rated as “A” schools with higher NMPED scores than South Valley Academy which had an NMPED score of 71.4.

None of these 44 higher NMPED scoring schools were rated above South Valley Academy by US News.

Case Study 4: Punitive Grading

Many school districts and school boards throughout New Mexico have adopted policies that prohibit punitive grading based on behavior. It is no longer possible to lower a student’s grade just because of their behavior. The grade should reflect classroom assessment only.

NMPED ignores this policy in the context of the A-F School Grading System. Schools were graded down one letter grade if they did not achieve 95% participation rates.

One such school was Mills Elementary in the Hobbs Municipal Schools District. Only 198 students were tested; they fell 11 short of the 95% mark and were penalized one “grade” level. Their grade was reduced from a “D” to an “F.” In fact, Mills Elementary’s proficiency scores were higher than those of the A-rated Logan High School discussed earlier.

The likely explanation is that Hobbs has a highly transient population with both seasonal farm laborers and oil-field workers predominating in the local economy.

For more urban schools, it will be interesting to see how the NMPED policy of punitive grading will play out with the increasingly popular Opt-Out movement.

Conclusion

It is apparent that the NMPED’s A-F School Grading System rates schools deceptively using VAM-augmented data and provides little if any value to the public as to the “quality” of a school. By presenting it in the form of an “NMPED School Grade Report Card,” the state seeks to hide its arbitrary nature.

Such a useless grade should certainly not be used to declare a school a “failure” and in need of radical reform.

Florida’s Superintendents Have Officially “Lost Confidence” in the State’s Teacher and School Evaluation System

In an article written by Valerie Strauss and featured yesterday in The Washington Post, school superintendents throughout the state of Florida have officially declared to the state that they have “lost confidence” in Florida’s teacher and school evaluation system (click here for the original article). More specifically, state school superintendents submitted a letter to the state asserting that “they ‘have lost confidence’ in the system’s accuracy and are calling for a suspension and a review of the system” (click here also for their original letter).

This is the evaluation model instituted by Florida’s former governor Jeb Bush, whom we all also know as the brother of former President George W. Bush and son of former President George H. W. Bush. Former President George W. Bush was also the architect of what most now agree is the failed No Child Left Behind (NCLB) Act of 2001, which he put forth as federal policy on the heels of his “educational reforms” in the state of Texas in the late 1990s. Beyond this brotherly association, however, Jeb is also currently a candidate for the 2016 Republican presidential nomination, and via his campaign he is also (like his brother, pre-NCLB) touting his “educational reforms” in the state of Florida, hoping to take these reforms to the White House as well.

It is Jeb Bush’s “educational reforms” that, without research evidence in support, also became models of “educational reform” for other states around the country, including states like New Mexico. In New Mexico, the current evaluation system, based on the Florida model, is at the core of a potentially significant lawsuit (see prior posts about this lawsuit here, here, here, and here). Not surprisingly, one of Jeb Bush’s protégés – Hanna Skandera, who is currently the Secretary of New Mexico’s Public Education Department (PED) – took Bush’s model to even more of an extreme, making this particular model one of the most arbitrary and capricious of all such models currently in place throughout the U.S. (as also defined and detailed more fully here).

Hence, it should certainly be of interest to those in New Mexico (and other states) that the teacher and school evaluation model upon which their state model was built is now apparently falling apart, at least for now.

“The superintendents [in Florida] are calling for the state to suspend the system for a year – meaning that the scores from this spring’s administration of the [state’s] exams will not be used in [the state-level] evaluations [of teachers or schools].” The state school superintendents are also calling for “a full review” of the system, which certainly seems warranted in Florida, New Mexico, and any other state for that matter, in which similar evaluation models are being paid for using taxpayer revenues, without much if any research evidence to support that they are working as intended, and sold, and paid for.

New Mexico’s Teacher Evaluation Lawsuit Underway: Day Two

Following up on my most recent post, about the “Lawsuit in New Mexico Challenging [the] State’s Teacher Evaluation System,” I testified yesterday for four hours regarding the state’s new teacher evaluation model. As noted in both articles linked to below, I positioned this state’s model as one of the most “arbitrary and capricious” systems I’ve seen across all states currently working with or implementing such systems, again, as incentivized via President Obama’s Race to the Top program and as required if states were to receive (and are to keep) the waivers excusing them from the previous requirement, written into President George W. Bush’s No Child Left Behind Act of 2001, that all children would be academically proficient by the year 2014.

I positioned this teacher evaluation model as one of the most “arbitrary and capricious” systems I’ve seen across all states because this state, unlike any other I have seen, has worked very hard to make 100% of its teachers value-added eligible, essentially pulling off-the-shelf criterion- and norm-referenced tests and also developing (yet not satisfactorily vetting or validating) a series of end-of-course exams (EoCs) to do this. This also includes early childhood teachers being evaluated using, for example, the norm-referenced DIBELS. Let us just say, for purposes of brevity, that these (and many of the state’s other educational measurement actions in this regard) defy the Standards for Educational & Psychological Testing.

Nonetheless, the day’s testimony also included testimony from one high school science teacher who expressed his concerns about his evaluation outcomes, as well as the evaluation outcomes of his high school science colleagues. The day ended with two hours of testimony given by the state’s model developer – Pete Goldschmidt – who is now an Associate Professor at California State University, Northridge. Time ran out, however, before cross-examination.

For more information, though, I will direct you all to two articles that capture the main events or highlights of the day.

The author of the first article titled, “Experts differ on test-based evaluations at NM hearing,” fairly captures the events of the day, as well as the professional and collegial agreements and disagreements expressed to the judge by both Pete Goldschmidt and me.

The author of the second article, titled “Professor’s testimony: Teacher eval system ‘not ready for prime time,’” however, was less fair. Nonetheless, for purposes of transparency, I include both articles for you all here, to also see how polarizing this topic can be (as many of us already know). I will say, though, that I liked the picture included in this latter article. Very Santa Fe-ish 😉

Day three of this trial will occur in Santa Fe next Thursday, during which Thomas Kane, an economics professor from Harvard University who also directed the $45 million Measures of Effective Teaching (MET) studies for the Bill & Melinda Gates Foundation, and who has also been the subject of prior posts (see, for example, here, here, and here), will testify. Also testifying will be Matthew Montaño of the Educator Quality Division, New Mexico Public Education Department (PED).

Will keep you posted, again…

“Efficiency” as a Constitutional Mandate for Texas’s Educational System

The Texas Constitution requires that the state “establish and make suitable provision for the support and maintenance of an efficient system of public free schools,” as the “general diffusion of knowledge [is]…essential to the preservation of the liberties and rights of the people.” Following this notion, The George W. Bush Institute’s Education Reform Initiative recently released its first set of reports as part of The Productivity for Results Series: “A Legal Lever for Enhancing Productivity.” The report was authored by an affiliate of The New Teacher Project (TNTP) – the non-profit organization founded by the controversial former Chancellor of Washington, DC’s public schools, Michelle Rhee; an unknown and apparently unaffiliated “education researcher” named Krishanu Sengupta; and Sandy Kress, the “key architect of No Child Left Behind [under the presidential leadership of George W. Bush] who later became a lobbyist for Pearson, the testing company” (see, for example, here).

The authors of this paper review the economics and education research (although, if you look through the references, the strong majority of the pieces cited come from economics, which makes sense given that this is an economically driven venture) to identify the characteristics that typify efficient enterprises. More specifically, the authors use the principles of x-efficiency set out in the work of the highly respected Henry Levin, which require efficient organizations, in this case as (perhaps inappropriately) applied to schools, to have: 1) a clear objective function with measurable outcomes; 2) incentives linked to success on the objective function; 3) efficient access to useful information for decisions; 4) adaptability to meet changing conditions; and 5) use of the most productive technology consistent with cost constraints.

The authors also advance another series of premises related to this view of x-efficiency and its application to education/schools in Texas: (1) that “if Texas is committed to diffusing knowledge efficiently, as mandated by the state constitution, it should ensure that the system for putting effective teachers in classrooms and effective materials in the hands of teachers and students is characterized by the principles that undergird an efficient enterprise, such as those of x-efficiency;” (2) that this system must include value-added measurement systems (i.e., VAMs), deemed throughout this paper not only constitutional but also rational and supportive of x-efficiency; (3) that “rational policies for teacher training, certification, evaluation, compensation, and dismissal are key to an efficient education system;” (4) that “the extent to which teacher education programs prepare their teachers to achieve this goal should [also] be [an] important factor;” (5) that “teacher evaluation systems [should also] be properly linked to incentives…[because]…in x-efficient enterprises, incentives are linked to success in the objective function of the organization;” (6) that this contradicts current, less x-efficient teacher compensation systems that link incentives to time on the job, or tenure, rather than to “the success of the organization’s function;” and (7) that, in the end, “x-efficient organizations have efficient access to useful information for decisions, and by not linking teacher evaluations to student achievement, [education] systems [such as the one in Texas will] fail to provide the necessary information to improve or dismiss teachers.”

The two districts highlighted in this report as the most x-efficient in Texas include, to no surprise: “Houston [which] adds a value-added system to reward teachers, with student performance data counting for half of a teacher’s overall rating. HISD compares students’ academic growth year to year, under a commonly used system called EVAAS.” We’ve discussed not only this system but also its use in Houston often on this blog (see, for example, here, here, and here). Teachers in Houston who consistently perform poorly can be fired for “insufficient student academic growth as reflected by value added scores…In 2009, before EVAAS became a factor in terminations, 36 of 12,000 teachers were fired for performance reasons, or .3%, a number so low the Superintendent [Terry Grier] himself called the dismissal system into question. From 2004-2009, the district fired or did not renew 365 teachers, 140 for ‘performance reasons,’ including poor discipline management, excessive absences, and a lack of student progress. In 2011, 221 teacher contracts were not renewed, multiple for ‘significant lack of student progress attributable to the educator,’ as well as ‘insufficient student academic growth reflected by [SAS EVAAS] value-added scores’….In the 2011-12 school year, 54% of the district’s low-performing teachers were dismissed.” That’s “progress,” right?!?
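For readers wondering what “compar[ing] students’ academic growth year to year” actually involves, here is a minimal sketch, in Python, of the general logic behind a simple growth/value-added estimate. To be clear, this is emphatically not SAS EVAAS, which is proprietary and far more elaborate; the data, the number of teachers, and the parameter values below are all invented purely for illustration.

```python
# A minimal sketch of the general "growth" idea (NOT SAS EVAAS):
# regress current-year scores on prior-year scores, then average each
# teacher's students' residuals as a crude "value-added" estimate.
# All data below are simulated for illustration only.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 300 students, each assigned to one of 15 teachers.
n_students, n_teachers = 300, 15
prior = rng.normal(500, 100, n_students)            # prior-year test scores
teacher = rng.integers(0, n_teachers, n_students)   # teacher assignments
current = 0.8 * prior + 100 + rng.normal(0, 40, n_students)  # current-year scores

# Step 1: predict current scores from prior scores (simple linear regression).
slope, intercept = np.polyfit(prior, current, 1)
predicted = slope * prior + intercept

# Step 2: a student's "growth" is the gap between actual and predicted score.
residual = current - predicted

# Step 3: a teacher's crude "value-added" is the mean residual of her students.
for t in range(n_teachers):
    va = residual[teacher == t].mean()
    print(f"teacher {t:2d}: mean residual (crude 'value-added') = {va:+.1f}")
```

Operational systems layer multiple years of data, additional covariates, and mixed-model “shrinkage” on top of this basic logic; whether those adjustments yield scores valid enough for high-stakes personnel decisions is, of course, exactly what is being contested.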

Anyhow, for those of you who have not heard, this same (controversial) Superintendent, who pushed this system throughout his district, is retiring (see, for example, here).

The other district of (dis)honorable mention was the Dallas Independent School District; it also uses a VAM, called the Classroom Effectiveness Index (CEI), although I know less about this system, as I have never examined or researched it myself, nor have I really read anything about it. But in 2012, the district’s Board “decided not to renew 259 contracts due to poor performance, five times more than the previous year.” The “progress” such x-efficiency brings…

What is still worrisome to the authors, though, is that “[w]hile some districts appear to be increasing their efforts to eliminate ineffective teachers, the percentage of teachers dismissed for any reason, let alone poor performance, remains well under one percent in the state’s largest districts.” Relatedly, and I preface this by noting that this next argument is one of the most over-cited and hyper-utilized by organizations backing “initiatives” or “reforms” such as these, the authors add that this “falls well below the five to eight percent that Hanushek calculates would elevate achievement to internationally competitive levels”: “Calculations by Eric Hanushek of Stanford University show that removing the bottom five percent of teachers in the United States and replacing them with teachers of average effectiveness would raise student achievement in the U.S. 0.4 standard deviations, to the level of student achievement in Canada. Replacing the bottom eight percent would raise student achievement to the level of Finland, a top performing country on international assessments.” As Linda Darling-Hammond, also of Stanford, would argue, we cannot simply “fire our way to Finland.” Sorry, Eric! But this is based on econometric predictions, and no evidence whatsoever exists that this is in fact a valid inference. Nonetheless, it is cited over and over again by the same/similar folks (such as the authors of this piece) to justify their currently trendy educational reforms.
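To see why I call this an econometric prediction rather than evidence, here is a rough sketch of the kind of back-of-the-envelope projection at issue (not Hanushek’s actual calculation). Every input below is an assumption I am supplying for illustration: the spread of “teacher effects,” the share of teachers removed, the years of schooling, and, crucially, the assumption that effects persist fully and simply add up over time.

```python
# A rough sketch of a "replace the bottom X% of teachers" projection.
# All parameters are illustrative assumptions, not figures from the report.
from math import exp, pi, sqrt

def normal_pdf(z: float) -> float:
    """Standard normal density."""
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)

def gain_per_year(sigma_teacher: float, z_cutoff: float) -> float:
    # Replacing the bottom share p = Phi(z_cutoff) of teachers with
    # average (zero-effect) teachers shifts mean achievement by
    # p * (0 - E[Z | Z < z_cutoff]) * sigma = sigma * pdf(z_cutoff).
    return sigma_teacher * normal_pdf(z_cutoff)

# Illustrative assumptions (mine, not the report's):
sigma = 0.20   # teacher-effect SD, in student-level SD units
years = 13     # K-12 years of schooling, effects assumed to persist fully

# (share of teachers removed, approximate standard normal cutoff for that share)
for share, z in [(0.05, -1.645), (0.08, -1.405)]:
    per_year = gain_per_year(sigma, z)
    print(f"remove bottom {share:.0%}: ~{per_year:.3f} SD per year; "
          f"~{per_year * years:.2f} SD if effects persisted and summed over {years} years")
```

Modest changes to any of these assumed inputs swing the headline number substantially, which is precisely the point: the 0.4 standard deviation claim is a model-based projection whose size depends entirely on untested assumptions, not on any observed experiment in firing and replacing teachers.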

The major point here, though, is that “if Texas wanted to remove (or improve) the bottom five to eight percent of its teachers, the current evaluation system would not be able to identify them;” hence, the state desperately needs a VAM-based system to do this. Again, no research to counter this or really any other claim is included in this piece; only the primarily economics-based literature was selected in support.

In the end, though, the authors conclude that “While the Texas Constitution has established a clear objective function for the state school system and assessments are in place to measure the outcome, it does not appear that the Texas education system shares the other four characteristics of x-efficient enterprises as identified by Levin. Given the constitutional mandate for efficiency and the difficult economic climate, it may be a good time for the state to remedy this situation…[Likewise] the adversity and incentives may now be in place for Texas to focus on improving the x-efficiency of its school system.”

As I know and very much respect Henry Levin (see, for example, an interview I conducted with him a few years ago, with the shorter version here and the longer version here), I’d be curious to know what his response might be to the authors’ use of his x-efficiency framework in support of such neo-conservative (and, again, trendy) initiatives and reforms. Perhaps I will email him…