The Obama Administration’s (Smoke and Mirrors) Calls for “Less Testing”

For those of you who have not yet heard, last weekend the Obama Administration released a new “Testing Action Plan” in which the administration calls for a “decreased,” “curbed,” and “reversed” emphasis on standardized testing for the nation, one less “obsessed” with tests. The plan, headlined as such, has hit the proverbial “front pages” of many news (and other) outlets since. See, for example, articles in The New York Times, The Huffington Post, The Atlantic, The Washington Post, CNN, US News & World Report, Education Week, and the like.

The gist of the “Testing Action Plan” is that student-level tests, when “[d]one poorly, in excess, or without clear purpose…take valuable time away from teaching and learning, draining creative approaches from our classrooms.” It is about time the federal government acknowledged this, officially, and kudos to them for “bear[ing] some of the responsibility for this” throughout the nation. However, the plan also asserts that testing is nevertheless still essential, so long as tests “take up the minimum necessary time, and reflect the expectation that students will be prepared for success.”

What is this “necessary time” of which they speak?

They set the testing limit for all states at no more than 2% of instructional time. More specifically, they “recommend that states place a cap on the percentage of instructional time students spend taking required statewide standardized assessments to ensure that… [pause marker added] no child spends more than 2 percent of her classroom time taking these tests [emphasis added].” Notice the circumlocution here, echoing No Child Left Behind (NCLB), the law that substantively helped turn us into such a test-crazed nation in the first place.

When I first heard this, though, the first thing I did was pull out my trusty calculator to determine what this would actually mean in practice. If students across the nation attend school 180 days (which is standard), and they spend approximately 5 of the roughly 6 hours of each of these 180 days in instruction (i.e., not including lunch and the like), this would mean that students spend approximately 900 educative hours in school every year (i.e., 180 days x 5 hours/day). If we take 2% of 900, that yields an approximate number of actual testing hours (as “recommended” and possibly soon to be mandated by the feds, pending a legislative act of congress) equal to 18 hours per academic year. “Assess” for yourself whether you think that amount of testing time (i.e., 18 hours of just test taking per student across all public schools) is likely to reduce the nation’s current over-emphasis on testing, especially given that this does not include the time consumed by what the feds also define as high-quality “test preparation strategies.”
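For those who, like me, prefer to check the arithmetic themselves, here is a minimal sketch of the calculation above (the figures are this post’s rough assumptions, i.e., a 180-day school year and 5 instructional hours per day, not official federal numbers):

```python
# Back-of-the-envelope check of the feds' 2% testing "cap," using the
# rough assumptions from the paragraph above (not official figures).
school_days_per_year = 180        # a standard U.S. school year
instructional_hours_per_day = 5   # ~5 of ~6 hours, excluding lunch, etc.

instructional_hours = school_days_per_year * instructional_hours_per_day
testing_cap_hours = 0.02 * instructional_hours  # the recommended 2% cap

print(instructional_hours)  # 900 instructional hours per year
print(testing_cap_hours)    # 18.0 hours of test taking per student per year
```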

Nonetheless, President Obama also directed the U.S. Department of Education to review its test-based policies, both to address places where the feds may have contributed to the problem and to contribute to the (arguably token) solutions (e.g., by offering financial support to help states develop better and less burdensome tests, and by offering “expertise” to help states reduce time spent on testing; see the comment about the 2% limit above). You can see their other strategies in their “Testing Action Plan.” Note, however, that the plan also clearly states that the feds will do this to help states still “meet [states’] policy objectives and requirements [as required] under [federal] law,” although the feds also state that they will become at least a bit more flexible on this end as well.

In this regard, the feds express that they will provide more flexibility and support in terms of non-tested grades and subjects, and for states that wish to amend their NCLB flexibility waivers (e.g., in terms of evaluating teachers of non-tested subject areas). However, states will still be required to maintain their “teacher and leader evaluation and support systems that include [and rely upon] growth in student learning [emphasis added],” albeit with greater flexibility when determining how much weight to ascribe to teacher-level growth measures.

How clever of the feds to carry out such a smoke and mirrors explanation.

Another indicator of this is the fact that the 10 “states” the feds highlight in their “Testing Action Plan” as those whose educational leaders are helping to lead these federal initiatives are as follows: Delaware, Florida, New Mexico, New York, North Carolina, Massachusetts, Minnesota, Rhode Island, Tennessee, and Washington DC (the last of which, of course, is not actually a state). Seven of these 10 (all except Delaware, Minnesota, and Rhode Island) are the seven states about which I write blog posts most often, as these seven have the most draconian educational policies in the nation mandating high-stakes uses of said tests. In addition, Massachusetts, New Mexico, and New York are the states leading the nation in terms of the opt-out movement. This is not because these states are leading the way in focusing less on said tests.

In addition, all of this was also based (at least in part; see also here) on new survey results recently released by the Council of the Great City Schools, in which researchers set out to determine how much time is spent on testing. They found that across their (large) member districts, the average time spent testing was “surprisingly low [?!?]” at 2.34%, which the study authors calculate to be approximately 4.22 total days spent on just testing (i.e., around 21 hours if one again assumes an average day’s instructional time of 5 hours). Again, this does not include time spent preparing for tests, nor does it include other non-standardized tests (e.g., those that teachers develop and use to assess their students’ learning).
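Under the same (assumed) 900 instructional hours as before, the survey’s 2.34% average converts as follows; this is again only a sketch, as the study authors’ own computation may have assumed a slightly different day length:

```python
# Converting the Council of the Great City Schools' reported 2.34% average
# into hours and days, under the same assumptions as the earlier sketch.
instructional_hours = 180 * 5   # 900 instructional hours per year (assumed)
avg_testing_share = 0.0234      # the survey's reported average

testing_hours = avg_testing_share * instructional_hours
testing_days = testing_hours / 5  # again assuming 5 instructional hours/day

print(round(testing_hours, 2))  # 21.06 hours of just testing per year
print(round(testing_days, 2))   # 4.21 days; the study's 4.22 likely reflects
                                # a slightly different assumed day length
```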

So, really, the feds did not decrease the amount of time spent testing at all; they just rounded down, shaving off 34 hundredths of a percentage point. For more information about this survey research study, click here.

If these two indicators are not both signs of something (else) gone awry within the feds’ “Testing Action Plan,” not to mention the hype surrounding it, I don’t know what would be. I do know one thing for certain, though: it is way too soon to call this announcement, or the Obama administration’s “Testing Action Plan,” a victory. Relief from testing is really not on the way (see, for example, here). Additional details are to be released in January.

New York’s Ahern v. King: Court-Ordered to Move Forward

In Ahern v. King, filed on April 16, 2014, the Syracuse Teachers Association (and its President Kevin Ahern, along with a set of teachers employed by the Syracuse School District) sued the state of New York (and John King, then Commissioner of the New York State Education Department and soon to be Arne Duncan’s successor as U.S. Secretary of Education, along with others representing the state) regarding its teacher evaluation system.

They alleged that the state’s method of evaluating teachers is unfair to those who teach economically disadvantaged students. More specifically, and as per an article in Education Week to which I referred in a post a couple of weeks ago, they alleged that the state’s value-added model (which is actually a growth model) fails to “fully account for poverty in the student-growth formula used for evaluation, penalizing some teachers.” They also alleged that “the state imposed regulations without public comment.”

Well, the decision on the state’s motion to dismiss is in. The Court rejected the state’s attempt to dismiss this case; hence, the case will move forward. For the full report, see the attached official Decision and Order.

As per the plaintiffs’ charge regarding controlling for student demographic variables, within the Decision and Order document it is written that the growth model used by the state was developed, under contract, by the American Institutes for Research (AIR). In AIR’s official report/court exhibit, AIR acknowledged the need to control for students’ demographic variables to moderate potential bias; hence, “if a student’s family was participating in any one of a number of economic assistance programs, that student would have been ‘flagged,’ and his or her growth would have been compared with students who were similarly ‘flagged’…If a student’s economic status was unknown, the student was treated as if he or she was not economically disadvantaged” (p. 7).

Nothing in the document was explicated, however, regarding whether the model output was indeed biased, or more specifically whether the individual plaintiffs’ value-added scores were biased as they charged.

What was explicated was that the defendants attempted to dismiss this case given a four-month statute of limitations. Plaintiffs, however, prevailed in that the state never made a “definitive decision” that was “readily ascertainable” to teachers on this matter (i.e., about controlling for students’ demographic factors within the model); hence, the statute-of-limitations clock never really started ticking.

As per the Court, the state’s definition of the “similar students” to whom other students are to be compared (as mentioned prior) was indeterminate until December of 2013. This was evidenced by the state’s Board of Regents’ June 2013 approval of an “enhanced list of characteristics used to define” such students, and by the state’s December 2013 publication of AIR’s technical report, in which its methods were (finally) made public. Hence, the Court found that the plaintiffs’ claims fell within the four-month statute of limitations upon which the state was riding in its motion to dismiss, because the state had not “properly promulgated” its methodology for measuring student growth before then (and arguably still has not).

So ultimately, the Court found that the defendants failed to establish their entitlement to dismissal of the specific teachers’ claims on the plaintiffs’ side in this case on statute-of-limitations grounds (as explained more fully in the Decision and Order). The court denied the state of New York’s motion to dismiss, and this one will move forward in New York.

This is great news for those in New York, especially considering that this state, more than most others, is one in which education “leaders” have attached high-stakes consequences to such teacher evaluation output. As per state law, teacher evaluation scores (based in large part on value-added, or growth in this particular case) can be used as “a significant factor for employment decisions including but not limited to, promotion, retention, tenure determination, termination, and supplemental compensation.”

Special Issue of “Educational Researcher” (Paper #3 of 9): Exploring VAMs’ Potentials

Recall that the peer-reviewed journal Educational Researcher (ER) recently published a “Special Issue” including nine articles examining value-added measures (VAMs). I have reviewed the next of these nine articles (#3 of 9) here, titled “Exploring the Potential of Value-Added Performance Measures to Affect the Quality of the Teacher Workforce,” as authored by Dan Goldhaber, Professor at the University of Washington Bothell, Director of the National Center for Analysis of Longitudinal Data in Education Research (CALDER), and a Vice President at the American Institutes for Research (AIR). AIR is one of our largest VAM consulting/contract firms, and Goldhaber is, accordingly, perhaps one of the field’s most vocal proponents of VAMs, also self-described as an “advocate of using value-added measurements carefully to inform some high-stakes decisions” (see original reference here). Hence, it makes sense that he writes about VAMs’ potentials herein.

Here’s what he has to add to the conversation, specifically about “the various mechanisms through which the use of value added might affect teacher quality and…what we know empirically about the potential of each mechanism” (p. 87).

Most importantly in this piece, in my opinion, Goldhaber discusses the “[s]everal [which turns out to be two] studies that simulate the effects of using value-added estimates for high-stakes purposes [and] suggest there may be significant student achievement benefits” (p. 88). Here are the two passages offered in support of these benefits, as defined and claimed:

  • “There is evidence that high value-added teachers are perceived to engage in better teaching practices, and they are valued by principals as reflected in formal evaluations (Harris, Ingle, & Rutledge, 2014)” (p. 88). Contrary to Goldhaber’s interpretation of this claim, however, these authors actually found “that some principals give high value-added teachers low ratings,” which implies that the opposite also occurs (i.e., that ratings are inconsistent), and “that teacher value-added measures and informal principal evaluations are positively, but weakly [emphasis added], correlated.” This puts a different spin on both of the actual results derived via this study, relative to Goldhaber’s interpretation (see the cited study here).
  • “Perhaps most importantly, value added is also associated with long-term schooling (e.g., high school graduation and college-going), labor market (e.g., earnings), and nonacademic outcomes (e.g., teen pregnancy) (Chetty, Friedman, & Rockoff, 2014)” (p. 88). As you all likely recall, this study is of much controversy (see prior posts on this study here, here, here, and here.)

Otherwise, Goldhaber explores “the various mechanisms through which the use of value added might [emphasis in the original] affect teacher quality and describe[s] what we know empirically about the potential of [emphasis added] each mechanism.” The word “might” (with or without emphasis added) is notably used throughout this manuscript, as is the word “assumption,” albeit less often, which leaves us with little more than a clear impression that most of what is offered in this piece is still conjecture.

I write this even though some of the other research cited in this piece is peripherally related, for example, given what we know from labor economics: “We know” that “teachers who believe they will be effective and rewarded for their effectiveness are more likely to see teaching as a desirable profession” (p. 89). But do we really know this? Little mention is made of our reality here, given the real and deleterious effects we witness, for example, as current teacher educators when we work with potential/future teachers who almost daily express serious concerns about joining a profession that now offers very little autonomy, not much respect, and a stark increase in the draconian accountability measures that will be used to hold them accountable for that which they do, or do not do, well. Nor is any mention made of the prospective teachers who, for similar reasons, have now chosen not to enter teacher education at all. “On the other hand, the use of value-added performance measures might lead to positive changes in the perception of teachers, making teaching a more prestigious profession and hence leading more people to pursue a teaching career” (p. 89). Hmm…

Nonetheless, these conjectures are categorized into sections about how VAMs might help us to (1) change the supply of people who opt into pursuing a teaching career and who are selected into the labor market, (2) change the effectiveness of those currently teaching, and (3) change which teachers elect to, or are permitted to, stay in teaching. Unfortunately again, however, there’s not much else in terms of research-based statements (other than the two articles briefly mentioned in this manuscript, bulleted above) that Goldhaber “adds” in terms of “value” regarding the “Potential of Value-Added Performance Measures.”

I write this with some regret, in that it would be fine with me if this thing actually worked and, more importantly, helped any of the three above desired outcomes come to fruition, or helped teachers improve their professional practice, professional selves, and the like. Indeed, in theory, this should work, but it doesn’t…yet. I write “yet” here with serious reservations about whether VAMs will ever accomplish that with which they have been tasked, largely via educational policies.

Relatedly, and on this point we agree: “teacher pay incentives is one area that we know a good deal about, based on analysis of actual policy variation, and the results are not terribly promising…experiments generally show performance bonuses, a particular form of pay for performance, have no significant student achievement effects, whether the bonus is rewarded at the individual teacher level” (p. 89). We disagree, though, again on Goldhaber’s conjectures for the future, i.e., that “there are several reasons why it is premature to write off pay for performance entirely…” (p. 89; see also a prior post here related to a study Goldhaber (overly) cites in support of some of his latter claims).

In the end (which is actually near the beginning of the manuscript), Goldhaber notes that VAMs are “distinct” as compared to classroom observations, because they offer “an objective measure that does not rely on human interpretation of teacher practices, and by design, [they offer] a system in which teachers are evaluated relative to one another rather than relative to an absolute standard (i.e., it creates a distribution in which teachers can be ranked). It is also a more novel measure” (p. 88). As just stated, in theory, this should work, but it just doesn’t…yet.

In the actual end (actually in terms of Goldhaber’s conclusions) he suggests we “Take a Leap of Faith?” (p. 90). I, for one, am not jumping.

*****

If interested, see the Review of Article #1 – the introduction to the special issue here and the Review of Article #2 – on VAMs’ measurement errors, issues with retroactive revisions, and (more) problems with using standardized tests in VAMs here.

Article #3 Reference: Goldhaber, D. (2015). Exploring the potential of value-added performance measures to affect the quality of the teacher workforce. Educational Researcher, 44(2), 87-95. doi:10.3102/0013189X15574905

“Value-Less” Value-Added Data

Peter Greene, a veteran English teacher in Pennsylvania, a state using its own version of the Education Value-Added Assessment System (EVAAS), wrote last week (October 5, 2015) in his Curmudgucation blog about his “Value-Less Data.” I thought it very important to share with you all, as he does a great job deconstructing one of the most widespread, yet least research-supported, claims being made about using the data derived via value-added models (VAMs) to inform and improve what teachers do in their classrooms.

Greene sententiously critiques this claim, writing:

It’s autumn in Pennsylvania, which means it’s time to look at the rich data to be gleaned from our Big Standardized Test (called PSSA for grades 3-8, and Keystone Exams at the high school level).

We love us some value added data crunching in PA (our version is called PVAAS, an early version of the value-added baloney model). This is a model that promises far more than it can deliver, but it also makes up a sizeable chunk of our school evaluation model, which in turn is part of our teacher evaluation model.

Of course the data crunching and collecting is supposed to have many valuable benefits, not the least of which is unleashing a pack of rich and robust data hounds who will chase the wild beast of low student achievement up the tree of instructional re-alignment. Like every other state, we have been promised that the tests will have classroom teachers swimming in a vast vault of data, like Scrooge McDuck on a gold bullion bender. So this morning I set out early to the state’s Big Data Portal to see what riches the system could reveal.

Here’s what I can learn from looking at the rich data.

* the raw scores of each student
* how many students fell into each of the achievement subgroups (test scores broken down by 20 point percentile slices)
* if each of the five percentile slices was generally above, below, or at its growth target

Annnnd that’s about it. I can sift through some of that data for a few other features.

For instance, PVAAS can, in a Minority Report sort of twist, predict what each student should get as a score based on– well, I’ve been trying for six years to find someone who can explain this to me, and still nothing. But every student has his or her own personal alternate universe score. If the student beats that score, they have shown growth. If they don’t, they have not.

The state’s site will actually tell me what each student’s alternate universe score was, side by side with their actual score. This is kind of an amazing twist– you might think this data set would be useful for determining how well the state’s predictive legerdemain actually works. Or maybe a discrepancy might be a signal that something is up with the student. But no — all discrepancies between predicted and actual scores are either blamed on or credited to the teacher.

I can use that same magical power to draw a big target on the backs of certain students. I can generate a list of students expected to fall within certain score ranges and throw them directly into the extra test prep focused remediation tank. Although since I’m giving them the instruction based on projected scores from a test they haven’t taken yet, maybe I should call it premediation.

Of course, either remediation or premediation would be easier to develop if I knew exactly what the problem was.

But the website gives only raw scores. I don’t know what “modules” or sections of the test the student did poorly on. We’ve got a principal working on getting us that breakdown, but as classroom teachers we don’t get to see it. Hell, as classroom teachers, we are not allowed to see the questions, and if we do see them, we are forbidden to talk about them, report on them, or use them in any way. (Confession: I have peeked, and many of the questions absolutely suck as measures of anything).

Bottom line– we have no idea what exactly our students messed up to get a low score on the test. In fact, we have no idea what they messed up generally.

So that’s my rich data. A test grade comes back, but I can’t see the test, or the questions, or the actual items that the student got wrong.

The website is loaded with bells and whistles and flash-dependent functions along with instructional videos that seem to assume that the site will be used by nine-year-olds, combining instructions that should be unnecessary (how to use a color-coding key to read a pie chart) with explanations of “analysis” that isn’t (by looking at how many students have scored below basic, we can determine how many students have scored below basic).

I wish some of the reformsters who believe that BS [i.e., not “basic skills” but the “other” BS] Testing gets us rich data that can drive and focus instruction would just get in there and take a look at this, because they would just weep. No value is being added, but lots of time and money is being wasted.

Valerie Strauss also covered Greene’s post in her Answer Sheet Blog in The Washington Post here, in case you’re interested in seeing her take on this as well: “Why the ‘rich’ student data we get from testing is actually worthless.”

The Student Success Act: Passed in the House of Representatives

As many of you may know, there is broad, bipartisan agreement that No Child Left Behind (NCLB) needs to be overhauled, if not entirely replaced. For those of you who have not yet heard, though, The Student Success Act (H.R. 5) is to help do this, by reducing “the federal footprint and restor[ing] local control, while empowering parents and education leaders to hold schools accountable for effectively teaching students” within their states.

This represents the federal government’s most serious attempt yet to overhaul NCLB since the law was last rewritten in 2001. Although this Act was initially pulled from the floor after losing GOP support last spring, near the end of this past summer the House passed it (see more information here).

As per another post here, this has all come about given “[t]he federal government’s involvement in local K-12 schools is at an all-time high.” For example, in 2013, the federal government spent nearly $35 billion on K-12 education, under the condition that states and districts, for example, adopt growth or value-added models (VAMs) to hold their teachers accountable for that which they do (or do not do well).

Hence, The Student Success Act is to primarily:

  • Replace the current national accountability scheme based on high stakes tests with state-led accountability systems, returning responsibility for measuring student and school performance to states and school districts.
  • Protect state and local autonomy over decisions in the classroom by preventing the [U.S.] Secretary of Education from coercing states into adopting Common Core or any other common standards or assessments, as well as reining in the secretary’s regulatory authority.
  • Strengthen existing efforts to improve student performance among targeted student populations, including English learners and homeless children.
  • Ensure parents continue to have the information they need to hold local schools accountable.

Kudos go out to U.S. Secretary Arne Duncan for pushing the federal government’s role, and more specifically its accountability-based “initiatives,” so far (e.g., via Race to the Top and his NCLB waiver program). It seems that this finally pushed the House (and Senate) to at least begin to put a stop to the feds’ stronger accountability adventures.

Kudos also go out to President Obama for not keeping U.S. Secretary Arne Duncan in check, and for purportedly being “unwilling to work with Congress to change the law” and, instead, implementing the aforementioned NCLB “waiver process that replaces some of the law’s more onerous requirements with new mandates dictated by [Duncan] – compounding the confusion and frustration shared by states and schools.”

This could be a win for educators, and states, across the country, especially as related to our shared interests on this blog: the use of VAMs to hold teachers accountable as per the efforts of most of our far-removed VAM modelers. This is not a win for those who oppose charter schools, however, as within this Act an imperative also exists to “Empower parents with more school choice options by continuing support for magnet schools and expand[ing] charter school opportunities, as well as allow[ing] Title I funds to follow low-income children to the traditional public or charter school of the parent’s choice.”

House Education and the Workforce Committee Chairman John Kline (R-MN) and Early Childhood, Elementary, and Secondary Education Subcommittee Chairman Todd Rokita (R-IN) are the two that introduced this act to the House, under the auspices that: “Effective education reform cannot come from the top down – it must be encouraged from the bottom up. Washington bureaucrats will never have the same personal understanding of the diverse needs of students as the teachers, administrators, and parents who spend time with them every day. The Student Success Act (H.R. 5) [therefore] reduces the federal footprint and places control back in the hands of parents and the state and local education leaders who know their children best” (see also more here).

New Mexico’s Teacher Evaluation Lawsuit: Final Day Five

The final day in New Mexico’s teacher evaluation lawsuit in Santa Fe was this past Thursday, October 8th. Unfortunately, I could not attend day five (or day four) due to a prior travel engagement, but here are two articles that highlight the events of the final day (for now) in court.

The first comes from The Santa Fe New Mexican. This article captures some of the closing arguments on the plaintiffs’ side, primarily in terms of (1) the state model “incorrectly rating nearly 10 percent of the state’s 20,000 teachers,” (2) the state using flawed data, leading to incorrect results, and (3) attempts to convince Judge Thomson to suspend the provision that these teachers — who purportedly post low value-added scores — be put on “growth plans” that lead to termination. As per one of the plaintiffs’ two lawyers: “The mere issue of a notice that you are minimally effective or ineffective puts each district in a position where they can terminate these employees…These are the [high-stakes consequences and] provisions that need to be removed” from New Mexico’s teacher evaluation model.

The main attorney on the defendants’ side avoided talking about #1 and #2 above, and instead focused on #3, “that no teacher had reported losing a job, license or promotion as a result of the teacher evaluations”…yet. In other words, so the defense argued, the system in New Mexico has not (yet) caused “irreparable harm,” although how one defines that is certainly also up for debate (e.g., good teachers leaving teaching because of the system in and of itself might be considered “irreparable harm”).

This attorney for the defendants also focused on what it might mean for the state of New Mexico to lose its No Child Left Behind (NCLB) waiver, which by federal mandate requires New Mexico (and all other states with such waivers) to have a teacher evaluation system in place based on student test scores. However, the defendants’ key witness on this topic – Matthew Montaño of the Educator Quality Division, New Mexico Public Education Department (PED) – evidently overstated claims about these purported threats while on the stand, after which an expert from Washington DC was phoned in to testify regarding the truth about such waivers: approximately a dozen states are thus far not in compliance, and only the state of Washington has actually lost its waiver. Washington, however, did not lose any federal funding as a result.

The plaintiffs’ attorneys’ other primary goal was to convince the Judge to file at least a partial injunction now, and then come back to court in April, after both sides have had an equal opportunity to examine the state’s data as per the state’s actual model and model output. The goal here would be to actually determine how the model is functioning, and whether it is functioning as those on the side of the defense testified (or not, as some of the information about the model’s functionality was oddly unknown to the state).

The second article, from The Albuquerque Journal, highlights more details regarding the NCLB waiver, as well as the defendants’ claims regarding “irreparable harm,” or rather that “the plaintiffs cannot show that they were harmed by the teacher evaluation system through loss of a job or reduction of salary…It has to not be theoretical (harm) to these plaintiffs.”

Judge Thomson is expected to present his findings/ruling by early November.

Teacher Evaluation Systems “At Issue” Across U.S. Courts

As you have likely noticed, lawsuits continue to emerge whereby states’ (typically) “new and improved” teacher evaluation systems, based in part on value-added model (VAM) output, are at legal issue.

If you have lost track, Education Week just published an article with all lawsuits currently filed, underway, or completed across the nation. In sum, there have been 14 cases filed thus far across seven states: Florida n=2, Louisiana n=1, Nevada n=1, New Mexico n=3, New York n=3, Tennessee n=3, and Texas n=1.

While I won’t try to recreate Education Week‘s user-friendly chart for this particular post, I do suggest that you click here to read more about each lawsuit, including the charges or allegations per lawsuit, who has filed each one, and each lawsuit’s current status.

New Mexico’s Teacher Evaluation Lawsuit: Day Four

Unfortunately, I could not attend “Day Four” of New Mexico’s teacher evaluation lawsuit yesterday (Tuesday, October 6th) in Santa Fe, but for those of you following, closing remarks are scheduled for tomorrow (Thursday, October 8th). Originally, only two days of testimony were scheduled in the effort to secure an injunction stopping the state’s teacher evaluation system, but the judge, stating that “the material under consideration is complex,” extended the proceedings into a five-day hearing that will end on Thursday.

But as for what happened in court yesterday, here is an article with a short video (compliments of local news station KOB4) capturing at least some of the day:

http://www.kob.com/article/stories/s3926421.shtml#.VhUDuaTWpUc

Here is another article capturing another reporter’s view of the hearing, as per The Albuquerque Journal, which notes that “Tuesday’s testimony highlighted the state witnesses’ stance that the new [evaluation system is] a big improvement over the previous approach, which rated 99 percent of teachers effective.”

Perhaps most important to the hearing was the testimony of Matthew Montaño of the Educator Quality Division, New Mexico Public Education Department (PED), who “testified at length about the creation of the new teacher evaluation system, which PED instituted administratively in 2012.”

Most interesting to come Thursday will be his cross-examination, for reasons too numerous to list here, given both his history and his role and position within the state.

U.S. Secretary Arne Duncan Resigning in December

In case you have not yet heard, U.S. “Education Secretary Arne Duncan [is] stepping down” from his President Obama-appointed post. As per an article in CNN, “President Barack Obama praised Arne Duncan’s service as secretary of education on Friday [October 2, 2015], hours after Duncan said he would step down in December.” Said Obama: “He’s done more to bring our education system — sometimes kicking and screaming — into the 21st century than anybody else…America is going to be better off for what he’s done.”

Is that a fact…

Obama continued that Duncan’s record is one “that I truly believe that no other education secretary can match…Arne bleeds this stuff. He cares so much about our kids. And he’s been so passionate about this work.”

See also another article on Duncan’s resignation in The New York Times, in which Duncan is positioned as “the subject of criticism from both parties, angering Democrats by challenging teachers’ unions and infuriating Republicans by promoting national academic standards.” See also another article in The Washington Post.

Obama has selected Deputy Secretary of Education John B. King, Jr. to replace him. As per The Washington Post, “King is a Brooklyn native who often credits teachers with guiding him toward a successful path after he was orphaned at age 12. A former charter school leader in Boston and New York, he joined the Education Department in January after a turbulent tenure as commissioner of education for the state of New York. In that role, he was a key architect of new teacher evaluations tied to test scores and played a key role in pushing New York to adopt new tests aligned to the Common Core State Standards years before other states did the same.”

The Network for Public Education (NPE) announced its reaction to both Arne Duncan’s resignation and John King’s selection as Duncan’s replacement: “The Network for Public Education assigned an ‘F’ to the selection of John King as a replacement to Arne Duncan,” who according to Diane Ravitch (among others) did more harm than good during his time in this position. Perhaps Obama’s remarks were not entirely on base.

Under Duncan’s (and Obama’s) watch, “The policies of the US Department of Education have inflicted immeasurable harm on American public education. The blind faith in standardized testing as the most meaningful measure of students, teachers, principals, and schools has distorted the true meaning of education and demoralized educators. Punitive policies have created teacher shortages across the nation, as well as a precipitous decline in the number of people preparing to become teachers. The Race to the Top preference for privately managed charter schools over public schools has encouraged privatization of a vitally important public responsibility.”

Regarding the appointment of King, The Executive Director of the NPE Fund and former New York State principal, Carol Burris, noted that, “John King’s tenure in New York State was a disaster [as] he alienated teachers, parents, and principals. He was the first Commissioner of Education in New York to receive a vote of ‘no confidence’ by the New York State Teachers Union, and referred to parents as ‘a special interest group.’ Not only was his implementation of the Common Core standards rushed and chaotic, but the horrific state exams that were given during his tenure resulted in over 200,000 students opting out of the exam last spring. It is difficult to imagine a worse choice to run the US Department of Education.”

To read more from the NPE click here.


New Mexico’s Teacher Evaluation Lawsuit: Day Three

As per my last post about the second day in court in Santa Fe, New Mexico, I was purposefully ambiguous about one of the two articles written in The Santa Fe New Mexican regarding my testimony. The first article, titled “Experts differ on test-based evaluations at NM hearing,” I felt fairly captured the events of the second day in court, but the second article, titled “Professor’s testimony: Teacher eval system ‘not ready for prime time,’” did not. But that is about all that I said, being purposefully ambiguous for two reasons that I can now (more or less) share.

The first reason was that the author of this article (in my opinion) unfairly captured my four hours of testimony by primarily positioning me as an “expert witness” who did not really know anything about the New Mexico teacher evaluation model. Just to be clear, during my testimony I explained that I had not analyzed the actual data from New Mexico. I also argued (though this was unfortunately not highlighted in this particular article) that, despite my efforts, I could not find anything about the New Mexico model’s output (e.g., indicators of reliability or consistency in terms of teachers’ rankings over time, or indicators of validity as per, for example, whether the state’s value-added output correlated, or not, with the other “multiple measures” used in New Mexico’s teacher evaluation system) pretty much anywhere.

I testified that this, in and of itself, was problematic, given that much of what should have been accessible and retrievable, for example, via the website of New Mexico’s Public Education Department (PED), via the internet in general (e.g., about developer Pete Goldschmidt’s value-added model), and via my attempts through contacts throughout New Mexico to gather the particular information for which I was looking, was not available to the public via any of these means. This was especially problematic given New Mexico Governor Martinez’s claims about “Ensuring Transparency and Ethics in our Government.”

Also not explained in this article was that I did analyze New Mexico’s model using the documents made available via the exhibits submitted for this case by both the plaintiffs and the defense. What I did not do, however, was conduct any direct analyses of the state’s actual data, yet, and for a variety of reasons (e.g., these data are a part of a lawsuit and I am on the “wrong side,” there are standard data-use and confidentiality agreements that take considerable time to secure, and there is a certain timeline in place). This, too, was not explained.

I wrote a letter to the editor of The Santa Fe New Mexican in response but, not to my surprise, I never received a reply, nor was the letter published. The goal of this letter, though, was to give those in Santa Fe, and others throughout the state of New Mexico, not only a more comprehensive and accurate account of my testimony, but also a note about how the taxpayers of New Mexico have a right to know much, much more about their state’s teacher evaluation model, as well as the model’s output (i.e., to see whether the model is actually functioning as claimed).

As I also testified, one of my goals (and I believe this to also be a goal of the plaintiffs, writ large) is to have the state of New Mexico either (a) release the data to an external evaluator to evaluate the model’s functionality (this person certainly does not have to be me) or (b) release the data to the “expert witnesses” on both the plaintiffs’ side (i.e., me) and the defendants’ side (i.e., Thomas Kane of Harvard), so that we can both examine these data independently and then come back to the court with our findings and overall assessments regarding the model’s strengths and weaknesses, as per the actual data.

This brings me to my second reason for being purposefully ambiguous throughout my prior post: the defendants’ “expert witness,” Thomas Kane of Harvard. I wanted to be 100% certain that Thomas Kane had not examined any state data before I commented that he would eventually be critiqued for not having done so. Having now witnessed his 5.5 hours of testimony yesterday, let me just say it was an interesting 5.5 hours that (in my opinion) did not work in the defendants’ favor.

Kane, like me, had not examined any of New Mexico’s actual data. This was surprising in the sense that he was actually retained by the state, whose lawyers could quite literally have handed their “expert witness” the state’s dataset, likely regardless of the aforementioned procedures and security measures (though perhaps not the timeline). Also surprising, though, was that Kane had clearly not examined any of the exhibits submitted for this case by either the plaintiffs or the defense. He was essentially in attendance, on behalf of the state as its “expert witness,” to “only speak to [teacher] evaluations in general.” As per an article in The Albuquerque Journal, he “stressed that numerous studies [emphasis added] show that teachers make a big impact on student success,” expressly contradicting the American Statistical Association (ASA) on the stand, while also (only) referencing (non-representative) studies primarily of his econ-friends (e.g., Raj Chetty, Eric Hanushek, Doug Staiger) and studies of his own (e.g., his Measures of Effective Teaching (MET) studies). For more information in general, though, see the full articles in both The Albuquerque Journal and The Santa Fe New Mexican.

As also highlighted in both of these articles, one of the defendants’ witnesses, the Superintendent of the Roswell Independent School District, testified in favor of the state’s model. As highlighted in The Santa Fe New Mexican, he testified that he viewed the new system as “an improvement over past practices [namely, the Adequate Yearly Progress (AYP) measures written into No Child Left Behind (NCLB)] because [he believed the new system gave] him more information about his teachers.” He did not, however, testify as to how the value-added output gave him this information; the information about which he spoke was primarily about the observational system now in place and the conversations surrounding such data, although districts used similar observational systems prior.

He also testified, as per the article in The Santa Fe New Mexican, that “he had renewed licenses for teachers who received low ratings, despite the state Public Education Department’s protocol…[for one reason being]…he has too few teachers and can’t afford to lose any” due to this system. Relatedly, as per the article in The Albuquerque Journal, he continued: “I am down teachers. I don’t need teachers, number 1, quitting over this and, number 2, I am not going to be firing teachers over this.” His district of about 600 teachers currently has approximately 30 open teaching positions, “an unusually high number”; hence, he “would rather work with his current staff than bring on substitutes” merely to comply. So while he testified on behalf of the state, he also testified that he was not necessarily in favor of the consequences being attached to the state’s teacher evaluation output, even if currently positioned by the defense as “low-stakes.”

It was more than an interesting day in court indeed. Stay tuned for Day Four (although due to prior conflicts, I will not be in attendance). I will still report on it, though, as best I can.