“Ineffective,” Veteran, Primary Grade Teacher in Tennessee Resigns

According to a recent article in The Tennessean, yet another teacher has resigned, this time from the 1st grade – a grade in which teacher-level value-added normally does not “count.” This teacher, a 15-year veteran of the 1st grade classroom, was recently categorized as “ineffective” in terms of “adding value” to her students’ learning and achievement, as her district added a new test to start holding primary grade teachers accountable for their value-added as well. To read her full letter of resignation and the conditions driving her decision, click here (http://www.tennessean.com/story/news/local/sumner/2014/06/03/defines-ineffective-teacher/9918539/).

Thirty-five percent of her evaluation score was based on student growth, or value-added, as determined by the Tennessee Value-Added Assessment System (TVAAS), often called, outside the state of Tennessee (where it was originally developed), the Education Value-Added Assessment System (EVAAS). Both of these systems should be increasingly familiar to readers/followers of this blog.
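For readers who have never seen how a “growth score” is even computed, here is a deliberately oversimplified sketch in Python. To be clear, this is not TVAAS, which uses far more complex multivariate, longitudinal mixed models; the data, the teacher names, and the naive “average gain” prediction rule below are all hypothetical, purely to illustrate the residual logic behind any claim that a teacher “added value”:

```python
# Simplified, hypothetical sketch of a "growth" (value-added) score.
# NOT the actual TVAAS/EVAAS model; for illustration of the residual logic only.
from statistics import mean

def simple_growth_scores(records):
    """records: list of (teacher, prior_score, current_score) tuples.

    Each student's "expected" gain is (naively) the overall average gain;
    a teacher's growth score is the average residual (actual gain minus
    expected gain) across his or her students.
    """
    avg_gain = mean(cur - prior for _, prior, cur in records)
    residuals = {}
    for teacher, prior, cur in records:
        residuals.setdefault(teacher, []).append((cur - prior) - avg_gain)
    return {t: mean(r) for t, r in residuals.items()}

# Hypothetical data: (teacher, prior-year score, current-year score)
data = [
    ("A", 50, 58), ("A", 60, 66),
    ("B", 50, 53), ("B", 60, 64),
]
print(simple_growth_scores(data))  # teacher A above average, B below
```

The point of the sketch is simply that a teacher’s score is a residual from a statistical prediction, so everything hinges on how defensible that prediction is, and on which test feeds it.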

But given a different test recently introduced to help evaluate more teachers like her (again, in the primary grades, for which no state-level tests exist as they do in grades 3-8), just this year she “received a growth score of 1, [after which she] was placed on a list of ineffective teachers needing additional coaching.” Ironically, the person assigned to serve as her mentor, to help her become better than an “ineffective teacher,” was her own student teacher from a few years prior. It seems her new teacher mentor was not able to increase her former mentor’s effectiveness in due time, however.

But here’s the real issue: In this case, as in exponentially growing numbers of cases like it across the country, the district decided to use a national rather than a state test (i.e., the SAT 10), a test which can (but should not) be used to test students in kindergarten and 1st grade, and then, more importantly, to attribute students’ growth on these tests over time to their teachers, again in order to include more teachers in these evaluation systems.

In just this case, this test’s data were run through the TVAAS system – a system that has been evidenced elsewhere in the research to label teachers ineffective or effective despite contradictory data, sometimes 30% to 50% of the time. In other cases, and in all fairness, other systems do not seem to be faring much better. Regardless, to add a test that is completely different from the tests being (erroneously) used elsewhere for other teachers, and then to assume that the new test should just work, is foolish, to put it lightly, albeit so unfortunately faddish.

Stanford’s Shavelson & Boulder’s Domingue on VAMs in Higher Ed

A few months ago, well over 10,000 educational researchers/academics from throughout the world attended the Annual Conference of the American Educational Research Association (AERA) in Philadelphia, Pennsylvania, during which many of these researchers/academics also presented their newest educational research and findings.

One presentation I did not get to attend, but from which I fortunately received the PowerPoint Slides, was a presentation titled “Measuring College Value-Added: A Delicate Instrument” presented by Stanford University’s Richard Shavelson and University of Colorado – Boulder’s Benjamin Domingue.

As summarized from their presentation, the motivation for measuring value-added in higher education, while similar to what is happening in America’s K-12 public schools (i.e., to theoretically measure college program quality and the “value” various higher education programs “add” to student learning), is also very different. In K-12 schools there are standardized tests, for what they are worth, that can be used to at least attempt to measure value-added; most testing in higher education, by contrast, is course-based. This is in many ways fortunate, in that such testing is closely linked to the content covered by the professor, and in other ways not, in that it is typically unreliable and narrow. Hence, while the tests used in higher education are (for the most part) idiosyncratic and non-standardized, they are often more instructionally sensitive and relevant than the large-scale standardized tests found across America’s public schools (i.e., the tests used to measure value-added). In short, while both course-based and external tests are relevant in higher education, depending on their uses, they do yield different types of information.

On this note, see also Slide 7 of the PowerPoint Slides to examine the key, problematic, and implausible assumptions with which people must agree should they try to do value-added research for said purposes in higher ed. This slide is interesting in and of itself.

In this study, however, Shavelson and Domingue gained access to a unique data set for modeling value-added in higher education. Colombia governmentally mandates the estimation of value-added. To this end it has created a unique assessment system in which high-school seniors take the high-school leaving/college matriculation examination and all college seniors take the same test upon leaving college. The sample included over 64,000 students across 168 higher education institutions and 19 different reference groups by program area, including engineering, law, and education. In addition, a sample of college seniors in Colombia took the Organisation for Economic Co-operation and Development’s (OECD) Assessment of Higher Education Learning Outcomes (AHELO) generic skills assessment.

Their findings? Even with Colombia’s unique assessment system, VAM is a delicate instrument. There are still major conceptual, methodological, and statistical issues involved in measuring value-added in higher education. The value-added estimates showed about 5-15 percent variation among colleges (depending on the model), which is not unlike what has been reported elsewhere. Consequently, the magnitude of variation among institutions left room for making descriptive, albeit non-causal, distinctions. Moreover, it provided an opportunity to compare like-situated institutions in an attempt to understand “what works” as the basis for hypotheses about changes.
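To make that “5-15 percent variation among colleges” figure concrete, here is a minimal sketch (again, not Shavelson and Domingue’s actual model, and with entirely made-up data) of what such a number typically means: the share of total score variance that lies between institutions rather than between students within them:

```python
# Illustrative only: share of total score variance lying BETWEEN institutions.
# Hypothetical data; real VAM estimates condition on entering ability, programs, etc.
from statistics import mean, pvariance

def between_institution_share(scores_by_institution):
    """scores_by_institution: dict mapping institution -> list of student scores."""
    all_scores = [s for scores in scores_by_institution.values() for s in scores]
    total_var = pvariance(all_scores)
    grand_mean = mean(all_scores)
    # Size-weighted variance of institution means around the grand mean.
    between = sum(
        len(scores) * (mean(scores) - grand_mean) ** 2
        for scores in scores_by_institution.values()
    ) / len(all_scores)
    return between / total_var

# Made-up scores for three hypothetical institutions.
fake = {"U1": [48, 52, 56], "U2": [50, 54, 58], "U3": [49, 53, 57]}
print(round(between_institution_share(fake), 2))  # a small share, here ~0.06
```

With numbers like these, most of the variation sits within institutions, which is exactly why a 5-15 percent between-college share supports descriptive comparisons but not confident causal rankings.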

Teachers’ “Actual” Impacts on Tests: A Reality that Defies VAMs

Drs. Gene Glass and David Berliner (both Regents’ Professors Emeriti from Arizona State University) recently published a book with 15 doctoral students, titled 50 Myths & Lies that Threaten America’s Public Schools (published by Teachers College Press). In one of its chapters, as it pertains to VAMs, they deconstruct the myth that “Teachers are the most important thing in the world (so we should fire them if their kids’ scores don’t go up).”

Glass recently highlighted this chapter on his Education in Two Worlds blog, and I’ve included his highlights from that entry for you all here. Glass writes:

Myth 9. Teachers are the most important influence in a child’s education.

The full statement of Myth 9 might take the following form: “Dear Teachers, you are so overwhelmingly important in the education of our children, you are the be-all and the end-all, the Alpha and the Omega, that when the children aren’t learning, it has to be your fault, that’s why we are going to fire you if the test scores don’t go up.”

As obvious as it is to note the importance of good teachers, research makes it clear that teachers are not the most important influence in a child’s education. Most research shows that less than 30% of a student’s academic success in school is attributable to schools, and teachers are only a part of that overall school effect, perhaps not even the most important part. Student achievement is most strongly associated with socioeconomic status of the child’s family. Outside-of-school factors having nothing to do with teacher ability appear to have at least twice the weight in predicting student achievement as inside-of-school factors. Schools can’t supply all of what society fails to give children.

Politicians and education reformers – the Value-Added Measurement group, let’s call them – argue that holding teachers accountable for student success is the best way to improve education. The mythological importance of teachers in determining student achievement is then promoted, as policymakers strive to show that what they are doing is best for children, namely, holding teachers accountable for student success. This illusion of “doing something really important,” even if it is not likely to cause the desired changes, lets many politicians and citizens close their eyes to the larger social and economic problems that limit what schools can achieve.

The federal government through “Race to the Top” has forced many states to adopt programs that tie large portions of teachers’ evaluations to student achievement on standardized tests – the system known as value added measurement, or VAM. It would be reasonable, if we were sure that teachers were the most important factor in determining student achievement, to promote policies holding them accountable for what students learn. But accountability policies built on this myth are a hoax because it is assumed that teachers have more control over student achievement than they actually do. Teachers cannot change the conditions of students’ lives outside of school, and it is those conditions that account for much of the difference in student achievement. In addition, teachers are often among the most powerless people in the school when it comes to making decisions that affect student achievement.

As a result, policies flowing from this myth of the all-important teacher put teachers in an untenable position. They are asked to overcome many problems outside of their control, and this can lead to devastating consequences for both students and teachers. As the pressure increases on teachers (and their administrators) to improve student performance, so does their temptation to game and to cheat the assessment system to show improvement. Cheating scandals in Atlanta, Washington, DC, Denver, and elsewhere point not just to the possibility of this regrettable situation, but to its reality (Nichols and Berliner, 2007; Ravitch, 2010).

The policies growing out of the myth of the all-powerful teacher can also result in lower teacher morale and push talented people away from considering a career in teaching. Working in an environment where you are evaluated on outcomes that are largely outside of your control is a recipe for stress, discouragement, and exit.

What appears on the surface to be a song of praise for teachers – “You are the most important thing in all the world” – turns out in the end to be an attempt to deny teachers due process and to bust unions.

Texas Hangin’ its Hat on its New VAM System

A fellow blogger, James Hamric, author of Hammy’s Education Reform Blog, emailed me a few weeks ago to share a recent post he wrote about teacher evaluations in Texas, titled “The good, the bad and the ridiculous.”

It seems that the Texas Education Agency (TEA), which serves a role in Texas similar to that of a state department of education elsewhere, recently posted details about the state’s new Teacher Evaluation and Support System (TESS), which the state submitted to the U.S. Department of Education to satisfy the conditions of its No Child Left Behind (NCLB) waiver, excusing Texas from NCLB’s prior goal that all students in the state (and all other states) would be 100% proficient in mathematics and reading/language arts by 2014.

While “80% of TESS will be rubric based evaluations consisting of formal observations, self assessment and professional development across six domains…The remaining 20% of TESS ‘will be reflected in a student growth measure at the individual teacher level that will include a value-add score based on student growth as measured by state assessments.’ These value added measures (VAMs) will only apply to approximately one quarter of the teachers [however, and as is the case more or less throughout the country] – those [who] teach testable subjects/grades. For all the other teachers, local districts will have flexibility for the remaining 20% of the evaluation score.” This “flexibility” will include options that include student learning objectives (SLOs), portfolios or district-level pre- and post-tests.

Hamric then goes on to review his concerns about the VAM-based component. While we have highlighted these issues and concerns many times prior on this blog, I do recommend reading them as summarized by others besides those of us who write here. This may just help to saturate our minds, and also prepare us to defend ourselves against the “good, bad, and the ridiculous” and perhaps work towards better systems of teacher evaluation, as is really the goal. Click here, again, to read this post in full.

Related, Hamric concludes with the following, “the vast majority of educators want constructive feedback, almost to a fault. As long as the administrator is well trained and qualified, a rubric based evaluation should be sufficient to assess the effectiveness of a teacher. While the mathematical validity of value added models are accepted in more economic and concrete realms, they should not be even a small part of educator evaluations and certainly not any part of high-stakes decisions as to continuing employment. It is my hope that, as Texas rolls out TESS in pilot districts in the 2014-2015 school year, serious consideration will be given to removing the VAM component completely.”

Arne Duncan’s Reaction to Recent VAM Research

Valerie Strauss wrote a recent piece for the Washington Post about an email she sent to Arne Duncan, the current U.S. Secretary of Education, who is (still) advancing VAMs for the nation. She wrote him directly to get his take on the “growing mountain of evidence [that] has shown that the method now used in most states, known as ‘value-added measures,’ is not reliable,” with specific attention paid to the two most recent reports released on VAMs: one being the position statement released by the American Statistical Association, and the other being a recently published peer-reviewed article in which researchers also found “surprisingly weak” correlations among VAMs and other teacher quality indicators, including teacher observations.

His response? As sent via email from Duncan’s Press Secretary:

“Including measures of how well students are learning as part of multiple indicators of educator effectiveness is part of a set of long-needed changes that will improve classroom learning for kids. Growth measures are a significant improvement over the system that existed before, which failed to produce useful distinctions in teacher performance. Growth measures — including value-added measures — focus attention on student learning and show progress. While these measures are better than what existed before, educators will continue to improve them, and sharp, critical attention from the research community can help.”

As to whether Duncan is aware of the latest research, Duncan’s press secretary wrote:

“We keep track of all major research on this topic.”

Whether Duncan is integrating the research into his thinking about his policy approach in this area is another question, or rather a question implicitly answered by these responses.

See the full piece here.

ASU Economics Professor Margarita Pivovarova on Northwestern Economics Professor Kirabo Jackson’s VAM Study

An economics-based study from 2012 has recently come to VAMmers’ attention, one about effective teachers’ effects on students’ non-cognitive skills, with evidence supporting that teachers’ effects on non-cognitive skills matter more than their effects on test scores. Given the econometric approach of the author, I invited my colleague, ASU Assistant Professor of Education Economics Margarita Pivovarova, to give us a review of the study. She writes:

“Going back to the debate about what constitutes a “good” or “effective” teacher and whether value-added based on test scores alone could potentially capture all about which we care in a “good” or “effective” teacher when we find one, a working paper released back in 2012 in the National Bureau of Economic Research (NBER) series on the economics of education (a paper, though, recently reviewed here in the Better Living through Mathematics blog and recently reviewed here in Diane Ravitch’s blog) takes a big leap forward in that discussion and provides empirical evidence that not all value-added are created equal. The paper, titled “Non-Cognitive Ability, Test Scores, and Teacher Quality: Evidence from 9th Grade Teachers in North Carolina,” is available here.

C. Kirabo Jackson, an Associate Professor at Northwestern University’s Institute for Policy Research, proposes and empirically estimates (i.e., using actual data from North Carolina) a model that assumes that teachers may affect not only test scores, but also non-cognitive abilities of their students, and that these two effects may not overlap within the same teacher. In other words, a teacher who is very good at boosting test scores may not be the one who has high value-added in the behavioral and social skills domain. The author focuses on establishing the multidimensionality of a teacher’s effect – something many parents and students believe in and appreciate.

Statistically speaking, the strategy in the paper decomposes the value-added for the same teacher into two separate effects: one that captures effects on cognitive skills (i.e., as traditionally measured by test scores), and another, widely overlooked by adherents of value-added models, that captures impacts on important non-cognitive traits (i.e., behavioral and social characteristics of students). In fact, non-cognitive skills have been shown to be better predictors of adult outcomes than test scores.

But complicated statistical manipulations aside (although I should mention that the author does establish a credible causal link between teachers’ effect and students’ outcomes), let’s look at the findings. Here is where things become interesting.

First, the paper documents two interesting patterns: test scores are only weakly correlated with other measurable outcomes such as absences, suspension, and on-time progression, all of which can be called behavioral outcomes. At the same time, non-test-score outcomes are correlated with long-run outcomes – dropout rates, graduation rates, and SAT taking behaviors, independent of test scores. Moreover, behavioral outcomes are almost twice as effective as are test scores in improving long-run outcomes!

The next step would be to establish whether the teachers who are good at raising test scores are also as good at influencing students’ behavioral characteristics. It turns out, according to the results in the paper, that teachers’ “value-added” for test-score and non-test-score outcomes are only weakly correlated, which implies that teachers have different skills and that those skills are not perfectly balanced within teachers. So, when we label a teacher as good, bad, or average based on the test scores included in value-added models alone, we at the same time overlook those who could have changed the lives of their students for the better simply by encouraging them to go to school more often and by helping them to progress through school (i.e., this flies in the face of retention policies based on test scores).
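To see why a weak correlation matters so much here, consider a toy simulation (hypothetical numbers; this is not Jackson’s model or his North Carolina data): if the two kinds of value-added correlate at, say, 0.2, then most teachers who rank in the top fifth on the non-cognitive dimension will not rank in the top fifth on test scores, so a test-score-only system will overlook them:

```python
# Toy simulation of two weakly correlated value-added dimensions per teacher.
# The correlation (0.2) and all numbers are assumptions for illustration only.
import random

random.seed(0)
r = 0.2  # assumed weak correlation between the two dimensions
n = 1000

teachers = []
for _ in range(n):
    test_va = random.gauss(0, 1)
    # Non-cognitive VA shares only a small component with test-score VA.
    noncog_va = r * test_va + (1 - r**2) ** 0.5 * random.gauss(0, 1)
    teachers.append((test_va, noncog_va))

# Identify the top 20% of teachers on each dimension separately.
k = n // 5
top_test = set(sorted(range(n), key=lambda i: -teachers[i][0])[:k])
top_noncog = set(sorted(range(n), key=lambda i: -teachers[i][1])[:k])

overlap = len(top_test & top_noncog) / k
print(f"share of top non-cognitive teachers also top on test scores: {overlap:.2f}")
```

With a correlation this weak, the overlap comes out well under half: the majority of the strongest non-cognitive teachers would be invisible to, or even penalized by, a test-score-only value-added system.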

That being said, the main takeaway from this paper is that when we evaluate teachers based on their value-added as measured by test scores, we fail to measure the impact of good teachers on non-cognitive outcomes. This is highly unfortunate, and unfortunately non-trivial, as teachers who “add value” to non-cognitive factors, improving these in addition to just test scores, impact the future outcomes of their students to a much greater extent than teachers with high value-added for test scores alone.”

Thanks to Dr. Pivovarova for her summary and post!! And thanks to C. Kirabo Jackson for such a unique article of so much “added value!”

*I should also mention, though, in all fairness and particularly given my prior posts criticizing the NBER and its premature releases of non-peer-reviewed studies (see, for example, here and here), that this study too has not yet been published in a peer-reviewed journal outlet, although it is currently “under review,” which is certainly farther along than many of the other NBER publications that never make it to print.

One Compliant County in Florida and its New 724 Tests

Beginning this fall in Collier County, Florida, as per the state of Florida’s new teacher accountability policy, district teachers/administrators are to create new tests for each and every class the district offers (including all electives) to hold all teachers accountable for the value they purportedly add to student learning and achievement over time.

While most often approximately 70% of teachers are excluded from being held accountable via these systems, this is the (band-aid) approach meant to fix that.

But in this county, this means district teachers/administrators will be developing and implementing (without psychometric, financial, or really any other support)… 724 new tests to comply with the state policy.

Yes — 724 new tests for not only the teachers in this county, but all of the students being educated in this county as well. Thereafter, “Student scores and learning gains at all grade levels will be evaluated and, in the 2015-16 school year, a teacher’s paycheck will be based in part on how they did.”

This says nothing about the seriously major methodological issues that will no doubt arise from using educator-developed tests. If we can’t get it right using the best tests we have going, as developed by testing companies with decades of experience, tests that are still highly questionable when used to measure teachers’ contributions to student learning, just think of the exponentially greater problems this will now cause.

The response from the state of Florida, yet another state that received Race to the Top funds to “support” this? “They’ve [as in personnel at the district] had three years to do this.”


Tennessee and Its Most Recent Performance on the NAEP

Following up on three posts over the past few months — the first about Tennessee Commissioner Huffman’s (un)inspiring TEDxNashville talk in which he vociferously celebrated Tennessee students’ recent (albeit highly questionable) gains on the National Assessment of Educational Progress (NAEP), the second about Huffman’s (and the Tennessee Department of Education’s) unexpected postponement of the release of the state’s (TCAP) standardized test scores, scores that were, by law, to account for 15 to 25 percent of Tennessee students’ final grades, and the third about some of the “behind the scenes” details surrounding what might actually be going on in Tennessee — it seems Tennessee’s most recent scores are in!

Recall that two years after Commissioner Huffman arrived “Tennessee’s kids had the most growth of any kids in America” on the NAEP. This was celebrated by both U.S. Secretary of Education Arne Duncan and President Obama, as evidence that tying accountability measures to teacher-level growth in fact works to increase student achievement.

Well, as per the most recent scores just released, Tennessee students’ mathematics and reading scores have gone, more or less, flat. More specifically, and as per Gary Rubenstein’s recent post, in Tennessee “3-8 Reading dropped from 50.3% to 49.5% while 3-8 Math increased from 50.7% to 51.3%. Now these are small changes and not tremendously ‘statistically significant’ either way. But they are pretty ‘flat.’” Recall as well that this is the state in which “the best” VAM is being used state-wide for such purposes (i.e., the Tennessee Value-Added Assessment System [TVAAS], or what is more commonly known as the Education Value-Added Assessment System [EVAAS]).

In addition, “the largest drop of any grade in any subject was 3rd grade reading, which dropped from 48.8% to 43.8%, a drop of 5%. This might be statistically significant, but more importantly, this is the group of students who had the most opportunity to benefit from the reforms put in place in Tennessee, so ‘reformers’ should expect that group of 3rd graders to outperform previous groups. Third grade math also dropped from 59% to 56.5%.”

Again, do read and see (in terms of graphs) more here. It is definitely worth a view, as it evidences well that history has, yet again, repeated itself, as it has for over 30 years now, ever since similar patterns were observed following the introduction of such accountability measures in Florida back in the late 1970s.

And how was it that Einstein defined insanity??

Mercedes Schneider and VAMs in Louisiana

I’ve been communicating recently with Mercedes Schneider, a lifelong teacher now in Louisiana, blogger at deutsch29, and author of a newly released book on corporate educational reform, A Chronicle of Echoes. We recently got onto the topic of George Noell, the infamous developer of the state of Louisiana’s VAM, pondering where he has gone, as since 2012 he seems to have disappeared from the politics surrounding the VAM he created for the state. (He had also been quite a vocal advocate of VAM-based policies in Washington D.C., in particular their use for evaluating the effectiveness of colleges of education in preparing teachers who, once graduated, “add value” to student learning and achievement.) I know…

Anyhow, it seems that Noell’s “service had come to an end” at the Louisiana Department of Education in 2012, as apparently Noell could not endorse the state’s VAM as suitable for teacher evaluations and dismissals, as the state apparently desired. Regardless of his endorsement, Louisiana became the first state well-known for terminating teachers based on not three (the recommended minimum), not two, but one, yes just one, year of low VAM scores. The goal was to allow for the termination of the bottom (arbitrarily set) 10% of teachers in the state per year until, I suppose, Louisiana’s poor achievement worked itself out of the quaggy swamps.

Things have since changed in the state: Louisiana has lightened up on this aggressive policy and, for now, suspended its VAM (see an article about this here), although the state has also become hyper-reliant on its “student learning targets” (SLTs), which can still be used to make a teacher unemployable or result in dismissal. Meanwhile, Mercedes also sent me this video created by one of her colleagues (Herb Bassett). While a little outdated in terms of the state’s policy landscape, the video presents, in a very easy-to-understand and accessible way, the still very current issues surrounding all VAMs across the country. Hence, I thought it important to share with you all, as this video is one of the best at illustrating some of the major issues.

See the embedded video here: http://deutsch29.wordpress.com/2013/02/24/excellent-five-minute-video-on-vam/