Special Issue of “Educational Researcher” (Paper #8 of 9, Part I): A More Research-Based Assessment of VAMs’ Potentials

Recall that the peer-reviewed journal Educational Researcher (ER) published a “Special Issue” including nine articles examining value-added measures (VAMs). Here I review the next of the nine articles (#8 of 9), which is actually a commentary titled “Can Value-Added Add Value to Teacher Evaluation?” This commentary is authored by Linda Darling-Hammond, Professor of Education, Emeritus, at Stanford University.

Like with the last commentary reviewed here, Darling-Hammond reviews some of the key points taken from the five feature articles in the aforementioned “Special Issue.” More specifically, though, Darling-Hammond “reflect[s] on [these five] articles’ findings in light of other work in this field, and [she] offer[s her own] thoughts about whether and how VAMs may add value to teacher evaluation” (p. 132).

She starts her commentary with VAMs “in theory,” in that VAMs COULD accurately identify teachers’ contributions to student learning and achievement IF (and this is a big IF) the following three conditions were met: (1) “student learning is well-measured by tests that reflect valuable learning and the actual achievement of individual students along a vertical scale representing the full range of possible achievement measures in equal interval units;” (2) “students are randomly assigned to teachers within and across schools—or, conceptualized another way, the learning conditions and traits of the group of students assigned to one teacher do not vary substantially from those assigned to another;” and (3) “individual teachers are the only contributors to students’ learning over the period of time used for measuring gains” (p. 132).

None of these things is actually true (or near true, nor will they likely ever be true) in educational practice, however. Hence the errors we continue to observe, errors that continue to prevent VAMs from being used for their intended purposes, even with the sophisticated statistics meant to mitigate error and account for the above-mentioned, let’s call them, “less than ideal” conditions.
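
For readers who want to see what these idealized conditions actually buy, here is a minimal sketch, in Python, of the generic kind of covariate-adjusted model on which VAMs rely. It is not Darling-Hammond’s model, nor any state’s actual model; all of the numbers (teacher counts, class sizes, noise levels) are hypothetical, and it simulates a world in which the three conditions above actually hold.

```python
# A minimal, illustrative value-added sketch (not any state's actual model).
# It simulates the idealized world the three conditions above describe:
# an equal-interval scale, random assignment of students to teachers, and
# teacher effects as the only systematic source of growth. All numbers are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
n_teachers, class_size = 20, 25

# "True" teacher effects, in test score standard deviation units (unobservable in practice).
true_effects = rng.normal(0, 0.15, n_teachers)

teacher_ids = np.repeat(np.arange(n_teachers), class_size)   # random assignment (condition 2)
prior = rng.normal(0, 1, n_teachers * class_size)            # prior-year scores on a vertical scale (condition 1)
noise = rng.normal(0, 0.5, n_teachers * class_size)          # everything else, assumed unsystematic (condition 3)
current = 0.7 * prior + true_effects[teacher_ids] + noise    # current-year scores

# Covariate-adjusted regression: one dummy per teacher plus prior score, no intercept,
# so each dummy's coefficient is that teacher's adjusted mean -- the "value-added" estimate.
X = np.zeros((len(current), n_teachers + 1))
X[np.arange(len(current)), teacher_ids] = 1.0
X[:, -1] = prior

coef, *_ = np.linalg.lstsq(X, current, rcond=None)
estimates = coef[:n_teachers] - coef[:n_teachers].mean()      # centered for comparability

print("correlation of estimates with true effects:",
      round(float(np.corrcoef(true_effects, estimates)[0, 1]), 2))
```

Even in this best-case world, with only 25 students per teacher the estimated effects track the true effects imperfectly; every violation of the three conditions above only degrades them further.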

Other pervasive and perpetual issues surrounding VAMs as highlighted by Darling-Hammond, per each of the three categories above, pertain to (1) the tests used to measure value-added, which are very narrow, focus on lower-level skills, and are manipulable. These tests in their current form cannot effectively measure the learning gains of a large share of students who are above or below grade level given a lack of sufficient coverage and stretch. As per Haertel (2013, as cited in Darling-Hammond’s commentary), this “translates into bias against those teachers working with the lowest-performing or the highest-performing classes”…and “those who teach in tracked school settings.” It is also important to note here that the new tests created by the Partnership for Assessment of Readiness for College and Careers (PARCC) and Smarter Balanced, the multistate consortia, “will not remedy this problem…Even though they will report students’ scores on a vertical scale, they will not be able to measure accurately the achievement or learning of students who started out below or above grade level” (p. 133).

With respect to (2) above, on the equivalence (or rather non-equivalence) of the groups of students assigned to the teachers whose VAM scores are relativistically compared, the main issue here is that “the U.S. education system is one of the most segregated and unequal in the industrialized world…[likewise]…[t]he country’s extraordinarily high rates of childhood poverty, homelessness, and food insecurity are not randomly distributed across communities…[Add] the extensive practice of tracking to the mix, and it is clear that the assumption of equivalence among classrooms is far from reality” (p. 133). Whether sophisticated statistics can control for all of this variation is one of the most debated issues surrounding VAMs and their levels of outcome bias, accordingly.

And as per (3) above, “we know from decades of educational research that many things matter for student achievement aside from the individual teacher a student has at a moment in time for a given subject area. A partial list includes the following [that are also supposed to be statistically controlled for in most VAMs, but are also clearly not controlled for effectively enough, if even possible]: (a) school factors such as class sizes, curriculum choices, instructional time, availability of specialists, tutors, books, computers, science labs, and other resources; (b) prior teachers and schooling, as well as other current teachers—and the opportunities for professional learning and collaborative planning among them; (c) peer culture and achievement; (d) differential summer learning gains and losses; (e) home factors, such as parents’ ability to help with homework, food and housing security, and physical and mental support or abuse; and (f) individual student needs, health, and attendance” (p. 133).

“Given all of these influences on [student] learning [and achievement], it is not surprising that variation among teachers accounts for only a tiny share of variation in achievement, typically estimated at under 10%” (see, for example, highlights from the American Statistical Association’s (ASA’s) Position Statement on VAMs here). “Suffice it to say [these issues]…pose considerable challenges to deriving accurate estimates of teacher effects…[A]s the ASA suggests, these challenges may have unintended negative effects on overall educational quality” (p. 133). “Most worrisome [for example] are [the] studies suggesting that teachers’ ratings are heavily influenced [i.e., biased] by the students they teach even after statistical models have tried to control for these influences” (p. 135).

Other “considerable challenges” include: VAM output is grossly unstable given the swings and variations observed in teacher classifications across time, and VAM output is “notoriously imprecise” (p. 133) given the other errors observed as caused, for example, by varying class sizes (e.g., Sean Corcoran (2010) documented with New York City data that the “true” effectiveness of a teacher ranked in the 43rd percentile could have had a range of possible scores from the 15th to the 71st percentile, qualifying as “below average,” “average,” or close to “above average”). In addition, practitioners including administrators and teachers are skeptical of these systems, and their (appropriate) skepticism is impacting the extent to which they use and value their value-added data; they note that they value their observational data (and the professional discussions surrounding them) much more. Also important is that another likely unintended effect exists (i.e., citing Susan Moore Johnson’s essay here) when statisticians’ efforts to parse out learning to calculate individual teachers’ value-added cause “teachers to hunker down and focus only on their own students, rather than working collegially to address student needs and solve collective problems” (p. 134). Relatedly, “the technology of VAM ranks teachers against each other relative to the gains they appear to produce for students, [hence] one teacher’s gain is another’s loss, thus creating disincentives for collaborative work” (p. 135). This is what Susan Moore Johnson termed the egg-crate model, or rather the egg-crate effects.
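
To see how the imprecision Corcoran described can arise, here is a small simulation sketch. It is not Corcoran’s data or method; the noise level is a hypothetical assumption (estimation error about half the size of the spread in true teacher effects), chosen only to show how a teacher whose “true” effect sits near the 43rd percentile can be estimated almost anywhere between “below average” and “above average.”

```python
# Illustrative only (not Corcoran's actual data or method). The noise level is a
# hypothetical assumption: estimation error about half the spread of true teacher effects.
import numpy as np

rng = np.random.default_rng(1)
n_teachers, noise_sd, n_reps = 1000, 0.5, 500

true_effects = rng.normal(0, 1, n_teachers)            # true effects, arbitrary units
focal_true = np.quantile(true_effects, 0.43)           # a teacher "truly" near the 43rd percentile

estimated_ranks = []
for _ in range(n_reps):
    estimates = true_effects + rng.normal(0, noise_sd, n_teachers)   # noisy estimates for everyone
    focal_estimate = focal_true + rng.normal(0, noise_sd)            # noisy estimate for the focal teacher
    estimated_ranks.append(100 * (estimates < focal_estimate).mean())

lo, hi = np.percentile(estimated_ranks, [2.5, 97.5])
print(f"95% of this teacher's estimated ranks fall between the {lo:.0f}th and {hi:.0f}th percentiles")
```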

Darling-Hammond’s conclusions are that VAMs have “been prematurely thrust into policy contexts that have made it more the subject of advocacy than of careful analysis that shapes its use. There is [good] reason to be skeptical that the current prescriptions for using VAMs can ever succeed in measuring teaching contributions well” (p. 135).

Darling-Hammond also “adds value” in one whole section (highlighted in another post forthcoming here), offering a very sound set of solutions, whether using VAMs for teacher evaluation or not. Given that it’s rare in this area of research that we can focus on actual solutions, this section is a must read. If you don’t want to wait for the next post, read Darling-Hammond’s “Modest Proposal” (pp. 135-136) within her larger article here.

In the end, Darling-Hammond writes that, “Trying to fix VAMs is rather like pushing on a balloon: The effort to correct one problem often creates another one that pops out somewhere else” (p. 135).

*****

If interested, see the Review of Article #1 – the introduction to the special issue here; see the Review of Article #2 – on VAMs’ measurement errors, issues with retroactive revisions, and (more) problems with using standardized tests in VAMs here; see the Review of Article #3 – on VAMs’ potentials here; see the Review of Article #4 – on observational systems’ potentials here; see the Review of Article #5 – on teachers’ perceptions of observations and student growth here; see the Review of Article (Essay) #6 – on VAMs as tools for “egg-crate” schools here; and see the Review of Article (Commentary) #7 – on VAMs situated in their appropriate ecologies here.

Article #8, Part I Reference: Darling-Hammond, L. (2015). Can value-added add value to teacher evaluation? Educational Researcher, 44(2), 132-137. doi:10.3102/0013189X15575346

“Arbitrary and Capricious:” Sheri Lederman Wins Lawsuit in NY’s State Supreme Court

Recall the New York lawsuit pertaining to Long Island teacher Sheri Lederman? She just won in New York’s State Supreme Court, and boy did she win big, also for the cause!

Sheri is a teacher who, by all accounts other than her 2013-2014 “ineffective” growth score of 1 out of 20, is a terrific 4th grade, 18-year veteran teacher. However, after receiving her “ineffective” growth rating and score, she, along with her attorney and husband Bruce Lederman, sued the state of New York to challenge the state’s growth-based teacher evaluation system and Sheri’s individual score. See prior posts about Sheri’s case here, here, here, and here.

The more specific goal of her case was to seek a judgment: (1) setting aside or vacating Sheri’s individual growth score and “ineffective” rating, and (2) declaring that the growth measures New York endorsed and implemented were/are “arbitrary and capricious.” The “overall gist” was that Sheri contended that the system unfairly penalized teachers whose students consistently scored well and could not demonstrate upward growth (e.g., teachers of gifted or other high-achieving students). This concern/complaint is common elsewhere.

As per a State Supreme Court ruling written by Acting Supreme Court Justice Roger McDonough, just released today (May 10, 2016), 15 pages in length, and available in full here, Sheri won her case. She won it against John King, the then New York State Education Department Commissioner and now US Secretary of Education (having recently replaced Arne Duncan in that role). The Court concluded that Sheri (her husband, her team of experts, and other witnesses) effectively established that her growth score and rating for 2013-2014 were “arbitrary and capricious,” with “arbitrary and capricious” being defined as actions “taken without sound basis in reason or regard to the facts.”

More specifically, the Court’s conclusion was founded upon: “(1) the convincing and detailed evidence of VAM bias against teachers at both ends of the spectrum (e.g. those with high-performing students or those with low-performing students); (2) the disproportionate effect of petitioner’s small class size and relatively large percentage of high-performing students; (3) the functional inability of high-performing students to demonstrate growth akin to lower-performing students; (4) the wholly unexplained swing in petitioner’s growth score from 14 [i.e., her growth score the year prior] to 1, despite the presence of statistically similar scoring students in her respective classes; and, most tellingly, (5) the strict imposition of rating constraints in the form of a “bell curve” that places teachers in four categories via pre-determined percentages regardless of whether the performance of students dramatically rose or dramatically fell from the previous year.”

As per an email I received earlier today from Bruce (i.e., Sheri’s husband/attorney who prosecuted her case), the Court otherwise “declined to make an overall ruling on the [New York growth] rating system in general because of new regulations in effect” [e.g., that the state’s growth model is currently under review]…[Nonetheless, t]he decision should qualify as persuasive authority for other teachers challenging growth scores throughout the County [and Country]. [In addition, the] Court carefully recite[d] all our expert affidavits [i.e., from Professors Darling-Hammond, Pallas, Amrein-Beardsley, Sean Corcoran, and Jesse Rothstein, as well as Drs. Burris and Lindell].” Noted as well were the “absence of any ‘meaningful’ challenge” to [Sheri’s] experts’ conclusions, especially about the dramatic swings noticed between her, and potentially others’, scores, and the other “litany of expert affidavits submitted on [Sheri’s] behalf.”

“It is clear that the evidence all of these amazing experts presented was a key factor in winning this case since the Judge repeatedly said both in Court and in the decision that we have a ‘high burden’ to meet in this case.” [In addition,] [t]he Court wrote that the court “does not lightly enter into a critical analysis of this matter … [and] is constrained on this record, to conclude that [the] petitioner [i.e., Sheri] has met her high burden.”

To Bruce’s/our knowledge, this is the first time a judge has set aside an individual teacher’s VAM rating based upon such a presentation in court.

Thanks to all who helped in this endeavor. Onward!

Virginia SGP’s Side of the Story

In one of my most recent posts I wrote about how Virginia SGP, aka parent Brian Davison, won in court against the state of Virginia, requiring the state to release teachers’ Student Growth Percentile (SGP) scores. Virginia SGP is a very vocal promoter of the use of SGPs to evaluate teachers’ value-added (although many do not consider the SGP model to be a value-added model (VAM); see general differences between VAMs and SGPs here, and a simplified sketch below). Regardless, he sued the state of Virginia to release teachers’ SGP scores so he could make them available to all via the Internet. He did this, more specifically, so parents and perhaps others throughout the state would be able to access and then potentially use the scores to make choices about who should and should not teach their kids. See other posts about this story here and here.
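
For readers unfamiliar with the distinction, here is a deliberately simplified sketch of the SGP idea: a student’s growth percentile is his/her current score’s percentile rank among “academic peers” with similar prior scores. Operational SGP models (e.g., Betebenner’s) condition on prior scores via quantile regression over multiple prior years rather than the crude peer band used below, and all of the numbers here are hypothetical.

```python
# A deliberately simplified student growth percentile (SGP) sketch. Operational SGP
# models use quantile regression conditioned on prior scores; this crude "peer band"
# version only illustrates the idea. All numbers are hypothetical.
import numpy as np

rng = np.random.default_rng(2)
n = 5000
prior = rng.normal(500, 50, n)                      # prior-year scale scores
current = 0.8 * prior + rng.normal(100, 30, n)      # current-year scale scores

def simple_sgp(student_prior, student_current, prior_all, current_all, band=10):
    """Percentile rank of the student's current score among 'academic peers' whose
    prior scores fall within +/- `band` scale-score points of the student's prior score."""
    peers = np.abs(prior_all - student_prior) <= band
    return 100.0 * (current_all[peers] < student_current).mean()

# A hypothetical student who scored 520 last year and 530 this year.
print(round(simple_sgp(520, 530, prior, current), 1))
```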

Those of us who are familiar with Virginia SGP and the research literature writ large know that, unfortunately, there’s much that Virginia SGP does not understand about the now loads of research surrounding VAMs as defined more broadly (see multiple research article links here). Likewise, Virginia SGP, as evidenced below, rides most of his research-based arguments on select sections of a small handful of research studies (e.g., those written by economists Raj Chetty and colleagues, and Thomas Kane as part of Kane’s Measures of Effective Teaching (MET) studies) that do not represent the general research on the topic. He simultaneously ignores/rejects the research studies that empirically challenge his research-based claims (e.g., that there is no bias in VAM-based estimates, and that because Chetty, Friedman, and Rockoff “proved this,” it must be true), despite the research studies that have presented evidence otherwise (see, for example, here, here, and here).

Nonetheless, given that his winning this case in Virginia is still noteworthy, and that followers of this blog should be aware of this particular case, I invited Virginia SGP to write a guest post so that he could tell his side of the story. As we have exchanged emails in the past, which I must add have become less abrasive/inflamed as time has passed, I recommend that readers read and also critically consume what is written below. Let’s hope that we might have some healthy and honest dialogue on this particular topic in the end.

From Virginia SGP:

I’d like to thank Dr. Amrein-Beardsley for giving me this forum.

My school district recently announced its teacher of the year. John Tuck teaches in a school with 70%+ FRL students compared to a district average of ~15% (don’t ask me why we can’t even those #’s out). He graduated from an ordinary school with a degree in liberal arts. He only has a bachelor’s degree and is not a National Board Certified Teacher (NBCT). He is in his ninth year of teaching, specializing in math and science for 5th graders. Despite the ordinary background, Tuck gets amazing student growth. He mentors, serves as principal in the summer, and leads the school’s leadership committees. In Dallas, TX, he could have risen to the top of the salary scale already, but in Loudoun County, VA, he only makes $55K compared to a top salary of $100K for Step 30 teachers. Tuck is not rewarded for his talent or efforts largely because Loudoun eschews all VAMs and merit-based promotion.

This is largely why I enlisted the assistance of Arizona State law school graduate Lin Edrington in seeking the Virginia Department of Education’s (VDOE) VAM (SGP) data via a Freedom of Information Act (FOIA) suit (see pertinent files here).

VAMs are not perfect. There are concerns about validity when switching from paper to computer tests. There are serious concerns about reliability when VAMs are computed with small sample sizes or are based on classes not taught by the rated teacher (as appeared to occur in New Mexico, Florida, and possibly New York). Improper uses of VAMs give reformers a bad name. This was not the case in Virginia. SGPs were only to be used when appropriate with 2+ years of data and 40+ scores recommended.

I am a big proponent of VAMs based on my reviews of the research. We have the Chetty/Friedman/Rockoff (CFR) studies, of course, including their recent paper showing virtually no bias (Table 6). The following briefing presented by Professor Friedman at our trial gives a good layman’s overview of their high level findings. When teachers are transferred to a completely new school but their VAMs remain consistent, that is very convincing to me. I understand some point to the cautionary statement of the ASA suggesting districts apply VAMs carefully and explicitly state their limitations. But the ASA definitely recommends VAMs for analyzing larger samples including schools or district policies, and CFR believe their statement failed to consider updated research.

To me, the MET studies provided some of the most convincing evidence. Not only are high VAMs on state standardized tests correlated to higher achievement on more open-ended short-answer and essay-based tests of critical thinking, but students of high-VAM teachers are more likely to enjoy class (Table 14). This points to VAMs measuring inspiration, classroom discipline, the ability to communicate concepts, subject matter knowledge and much more. If a teacher engages a disinterested student, their low scores will certainly rise along with their VAMs. CFR and others have shown this higher achievement carries over into future grades and success later in life. VAMs don’t just measure the ability to identify test distractors, but the ability of teachers to inspire.

So why exactly did the Richmond City Circuit Court force the release of Virginia’s SGPs? VDOE applied for and received a No Child Left Behind (NCLB) waiver like many other states. But in court testimony provided in December of 2014, VDOE acknowledged that districts were not complying with the waiver by not providing the SGP data to teachers or using SGPs in teacher evaluations despite “assurances” to the US Department of Education (USDOE). When we initially received a favorable verdict in January of 2015, instead of trying to comply with NCLB waiver requirements, my district of Loudoun County Public Schools (LCPS) laughed. LCPS refused to implement SGPs or even discuss them.

There was no dispute that the largest Virginia districts had committed fraud when I discussed these facts with the US Attorney’s office and lawyers from the USDOE in January of 2016, but the USDOE refused to support a False Claim Act suit. And while nearly every district stridently refused to use VAMs [i.e., SGPs], the Virginia Secretary of Education was falsely claiming in high profile op-eds that Virginia was using “progress and growth” in the evaluation of schools. Yet, VDOE never used the very measure (SGPs) that the ESEA [i.e., NCLB] waivers required to measure student growth. The irony is that if these districts had used SGPs for just 1% of their teachers’ evaluations after the December of 2014 hearing, their teachers’ SGPs would be confidential today. I could only find one county that utilized SGPs, and their teachers’ SGPs are exempt. Sometimes fraud doesn’t pay.

My overall goals are threefold:

  1. Hire more Science, Technology, Engineering, and Mathematics (STEM) majors to get kids excited about STEM careers and effectively teach STEM concepts
  2. Use growth data to evaluate policies, administrators, and teachers. Share the insights from the best teachers and provide professional development to ineffective ones
  3. Publish private sector equivalent pay so young people know how much teachers really earn (pensions often add 15-18% to their salaries). We can then recruit more STEM teachers and better overall teaching candidates

What has this lawsuit and activism cost me? A lot. I ate $5K of the cost of the VDOE SGP suit even after the award[ing] of fees. One local school board member has banned me from commenting on his “public figure” Facebook page (which I see as a free speech violation), both because I questioned his denial of SGPs and some other conflicts of interest I saw, although indirectly related to this particular case. The judge in the case even sanctioned me $7K just for daring to hold him accountable. And after criticizing LCPS for violating the Family Educational Rights and Privacy Act (FERPA) by coercing kids who fail Virginia’s Standards of Learning tests (SOLs) to retake them, I was banned from my kids’ school for being a “safety threat.”

Note that I am a former Naval submarine officer and have held Department of Defense (DOD) clearances for 20+ years. I attended a meeting this past Thursday with LCPS officials in which they [since] acknowledged I was no safety threat. I served in the military, and along with many I have fought for the right to free speech.

Accordingly, I am no shrinking violet. Despite LCPS attorneys sanctioning perjury, the Republican Commonwealth Attorney refused to prosecute and then illegally censored me in public forums. So the CA will soon have to sign a consent order acknowledging violating my constitutional rights (he effectively admitted as much already). And a federal civil rights complaint against the schools for their retaliatory ban is being drafted as we speak. All of this resulted from my efforts to have public data released and hold LCPS officials accountable to state and federal laws. I have promised that the majority of any potential financial award will be used to fund other whistle blower cases, [against] both teachers and reformers. I have a clean background and administrators still targeted me. Imagine what they would do to someone who isn’t willing to bear these costs!

In the end, I encourage everyone to speak out based on your beliefs. Support your case with facts not anecdotes or hastily conceived opinions. And there are certainly efforts we can all support like those of Dr. Darling-Hammond. We can hold an honest debate, but please remember that schools don’t exist to employ teachers/principals. Schools exist to effectively educate students.

Teachers’ “Similar” Value-Added Estimates Yield “Different” Meanings Across “Different” Contexts

Some, particularly educational practitioners, might respond with a sense of “duh”-like sarcasm to the title of this post above, but as per a new study recently released in the highly reputable, peer-reviewed American Educational Research Journal (AERJ), researchers evidenced this very headline via an extensive study conducted in the northeast United States. Hence, this title has now been substantiated with empirical evidence.

Researchers David Blazar (Doctoral Candidate at Harvard), Erica Litke (Assistant Professor at University of Delaware), and Johanna Barmore (Doctoral Candidate at Harvard) examined (1) the comparability of teachers’ value-added estimates within and across four urban districts and (2), given the extent of the variation observed, how and whether said value-added estimates consistently captured differences in teachers’ observed, videotaped, and scored classroom practices.

Regarding their first point of investigation, they found that teachers were categorized differently when compared within versus across districts (i.e., when compared to other similar teachers within districts versus across districts, which is a methodological choice that value-added modelers often make). The researchers did not find or assert that either approach yielded more valid interpretations, however. Rather, they evidenced that the differences they observed within and across districts were notable, and these differences had notable implications for validity, whereby a teacher classified as adding X value in one context could be categorized as adding Y value in another, given the context in which (s)he was teaching. In other words, the validity of the inferences to be drawn about potentially any teacher depended greatly on the context in which the teacher taught, in that his/her value-added estimate did not necessarily generalize across contexts. Put in their words, “it is not clear whether the signal of teachers’ effectiveness sent by their value-added rankings retains a substantive interpretation across contexts” (p. 326). Inversely put, “it is clear that labels such as highly effective or ineffective based on value-added scores do not have fixed meaning” (p. 351).
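
A toy example (with entirely hypothetical numbers, not Blazar et al.’s data) makes the within- versus across-district point concrete: the same teacher can look like a standout when ranked against his/her own district and look merely average when ranked against a pooled, cross-district distribution.

```python
# Hypothetical numbers (not Blazar et al.'s data): two districts whose value-added
# distributions are centered differently, and one focal teacher in district B.
import numpy as np

rng = np.random.default_rng(3)
district_a = rng.normal(0.10, 0.05, 200)   # district A teachers' value-added estimates
district_b = rng.normal(-0.10, 0.05, 200)  # district B teachers' value-added estimates
focal_va = 0.02                            # one hypothetical district B teacher

within_pct = 100 * (district_b < focal_va).mean()                                # rank within district B
across_pct = 100 * (np.concatenate([district_a, district_b]) < focal_va).mean()  # rank across both districts
print(f"within-district percentile: {within_pct:.0f}; across-district percentile: {across_pct:.0f}")
```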

Regarding their second point of investigation, they found “stark differences in instructional practices across districts among teachers who received similar within-district value-added rankings” (p. 324). In other words, “when comparing [similar] teachers within districts, value-added rankings signaled differences in instructional quality in some but not all instances” (p. 351), whereby similarly ranked teachers did not necessarily display similarly effective or ineffective instructional practices. This has also been more loosely evidenced by those who have investigated the correlations between teachers’ value-added and observational scores, and have found weak to moderate correlations (see prior posts on this here, here, here, and here). In the simplest of terms, “value-added categorizations did not signal common sets of instructional practices across districts” (p. 352).

The bottom line here, then, is that those in charge of making consequential decisions about teachers, as based even in part on teachers’ value-added estimates, need to be cautious when making particularly high-stakes decisions about teachers as based on said estimates. A teacher, as based on the evidence presented in this particular study, could logically but also legally argue that had (s)he been teaching in a different district, even within the same state and using the same assessment instruments, (s)he could have received a substantively different value-added score given the teacher(s) to whom (s)he was compared when estimating his/her value-added elsewhere. Hence, the validity of inferences and statements asserting that a teacher was effective or not as based on his/her value-added estimates is suspect, again, given the contexts in which teachers teach and the particular sets of comparable teachers against whom teacher-level value-added is estimated. “Here, the instructional quality of the lowest ranked teachers was not particularly weak and in fact was as strong as the instructional quality of the highest ranked teachers in other districts” (p. 353).

This has serious implications, not only for practice but also for the lawsuits ongoing across the nation, especially in terms of those pertaining to teachers’ wrongful terminations, as charged.

Citation: Blazar, D., Litke, E., & Barmore, J. (2016). What does it mean to be ranked a ‘‘high’’ or ‘‘low’’ value-added teacher? Observing differences in instructional quality across districts. American Educational Research Journal, 53(2), 324–359.  doi:10.3102/0002831216630407

The Marzano Observational Framework’s Correlations with Value-Added

“Good news for schools using the Marzano framework,”…according to Marzano researchers. “One of the largest validation studies ever conducted on [the] observation framework shows that the Marzano model’s research-based structure is correlated with state VAMs.” See this claim, via the full report here.

The more specific claim is as follows: Study researchers found a “strong [emphasis added, see discussion forthcoming] correlation between Dr. Marzano’s nine Design Questions [within the model’s Domain 1] and increased student achievement on state math and reading scores.” All correlations were positive, with the highest correlation just below r = 0.4 and the lowest correlation just above r = 0.0. See the actual correlations illustrated here:

[Figure: bar chart of the correlations between each of the nine Design Questions and student achievement on state math and reading tests, all positive, ranging from just above r = 0.0 to just below r = 0.4]

See also a standard model to categorize such correlations, albeit out of any particular context, below. Using this, one can see that the correlations observed were indeed small to moderate, but not “strong” as claimed. Elsewhere, as also cited in this report, other observed correlations from similar studies on the same model ranged from r = 0.13 to 0.15, r = 0.14 to 0.21, and r = 0.21 to 0.26. While these are also noted as statistically significant, using the table below one can determine that statistical significance does not necessarily mean that such “very weak” to “weak” correlations are of much practical significance, especially if and when high-stakes decisions about teachers and their effects are to be attached to such evidence.

[Table: a conventional scheme for categorizing correlation strength, from “very weak” (near 0.0) to “very strong” (near 1.0)]

Likewise, if such results (i.e., 0.0 < r < 0.4) sound familiar, they should, as a good number of researchers have set out to explore similar correlations in the past, using different value-added and observational data, and these researchers have also found similar zero-to-moderate (i.e., 0.0 < r < 0.4), but not (and dare I say never) “strong” correlations. See prior posts about such studies, for example, here, here, and here. See also the authors’ Endnote #1 in their report, again, here.
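
For those who want to see why such correlations carry so little practical weight, here is a small sketch with hypothetical data: the share of variance that two measures actually share is r squared, so even an r of 0.3, which is easily “statistically significant” in a large sample, leaves roughly 91% of the variance unexplained.

```python
# Hypothetical data: an observation score and a value-added score built to correlate at ~0.3.
import numpy as np

rng = np.random.default_rng(4)
n = 3000                                             # large samples make even small r "significant"
observation = rng.normal(0, 1, n)
value_added = 0.3 * observation + rng.normal(0, 1, n) * (1 - 0.3 ** 2) ** 0.5

r = np.corrcoef(observation, value_added)[0, 1]
print(f"r = {r:.2f}, shared variance (r squared) = {r ** 2:.1%}")   # roughly 0.30 and 9%
```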

As the authors write: “When evaluating the validity of observation protocols, studies [sic] typically assess the correlations between teacher observation scores and their value-added scores.” This is true, but this is true only in that such correlations offer only one piece of validity evidence.

Validity, or rather evidencing that something from which inferences are drawn is in fact valid, is MUCH more complicated than simply running these types of correlations. Rather, the type of evidence that these authors are exploring is called convergent-related evidence of validity; however, for something to actually be deemed valid, MUCH more validity evidence is needed (e.g., content-, consequence-, predictive-, etc.- related evidence of validity). See, for example, some of the Educational Testing Service (ETS)’s Michael T. Kane’s work on validity here. See also The Standards for Educational and Psychological Testing developed by the American Educational Research Association (AERA), American Psychological Association (APA), and National Council on Measurement in Education (NCME) here.

Instead, in this report the authors write that “Small to moderate correlations permit researchers to claim that the framework is validated (Kane, Taylor, Tyler, & Wooten, 2010).” This is false. As well, this unfortunately demonstrates a very naive conception and unsophisticated treatment of validity. This is also illustrated in that the authors use one external citation, authored by Thomas Kane, NOT the aforementioned validity expert Michael Kane, to allege that validity can be (and is being) claimed. Here is the actual Thomas Kane et al. article the Marzano authors reference to support their validity claim, also noting that nowhere in this piece do Thomas Kane et al. make this actual claim. In fact, a search for “small” or “moderate” correlations yields zero total hits.

In the end, what can be more fairly and appropriately asserted from this research report is that the Marzano model is indeed correlated with value-added estimates, and their  correlation coefficients fall right in line with all other correlation coefficients evidenced via  other current studies on this topic, again, whereby researchers have correlated multiple observational models with multiple value-added estimates. These correlations are small to moderate, and certainly not “strong,” and definitely not “strong” enough to warrant high-stakes decisions (e.g., teacher termination) given everything (i.e., the unexplained variance) that is still not captured among these multiple measures…and that still threatens the validity of the inferences to be drawn from these measures combined.

Kane Is At It, Again: “Statistically Significant” Claims Exaggerated to Influence Policy

In a recent post, I critiqued a fellow academic and value-added model (VAM) supporter, Thomas Kane, an economics professor from Harvard University who also directed the $45 million worth of Measures of Effective Teaching (MET) studies for the Bill & Melinda Gates Foundation. Kane has been the source of multiple posts on this blog (see also here, here, and here) as he is a very public figure who very often backs, albeit often in non-peer-reviewed technical reports and documents, series of exaggerated “research-based” claims. In this prior post, I more specifically critiqued the overstated claims he made in a recent National Public Radio (NPR) interview titled: “There Is No FDA For Education. Maybe There Should Be.”

Well, a colleague recently emailed me another such document authored by Kane (and co-written with four colleagues), titled: “Teaching Higher: Educators’ Perspectives on Common Core Implementation.” While this one is quite methodologically sound (i.e., as assessed via a thorough read of the main text of the document, including all footnotes and appendices), it is Kane’s set of claims, again, that are of concern, especially knowing that this report, even though it too has not yet been externally vetted or reviewed, will likely have a policy impact. The main goal of this report is clearly (although not made explicit) to endorse, promote, and in many ways save the Common Core State Standards (CCSS). I emphasize the word save in that clearly, and especially since the passage of the Every Student Succeeds Act (ESSA), many states have rejected the still highly controversial Common Core. I should also note that the researchers clearly conducted this study with such a priori conclusions in mind (i.e., that the Common Core should be saved/promoted); hence, future peer review of this piece may be out of the question, as the bias evident in the sets of findings would certainly be a “methodological issue,” again, likely preventing a peer-reviewed publication (see, for example, the a priori conclusion that “[this] study highlights an important advantage of having a common set of standards and assessments across multiple states,” in the abstract (p. 3)).

First I will comment on the findings regarding the Common Core, as related to value-added models (VAMs). Next, I will comment on Section III of the report, about “Which [Common Core] Implementation Strategies Helped Students Succeed?” (p. 17). This is where Kane and colleagues “link[ed] teachers’ survey responses [about the Common Core] to their students’ test scores on the 2014–2015 PARCC [Partnership for Assessment of Readiness for College and Careers] and SBAC [Smarter Balanced Assessment Consortium] assessments [both of which are aligned to the Common Core Standards]… This allowed [Kane et al.] to investigate which strategies and which of the [Common Core-related] supports [teachers] received were associated with their performance on PARCC and SBAC,” controlling for a variety of factors including teachers’ prior value-added (p. 17).

With regard to the Common Core sections, Kane et al. lay claims like: “Despite the additional work, teachers and principals in the five states [that have adopted the Common Core = Delaware, Maryland, Massachusetts, New Mexico, and Nevada] have largely embraced [emphasis added] the new standards” (p. 3). They mention nowhere, however, the mediating set of influences interfering with such a claim, and likely leading to this claim entirely or at least in part: that many teachers across the nation have been forced, by prior federal and current state mandates (e.g., in New Mexico), to “embrace the new standards.” Rather, Kane et al. imply throughout the document that this “embracement” is a sure sign that teachers and principals are literally taking the Common Core into and within their open arms. The same interference is at play with their similar claim that “Teachers in the five study states have made major changes [emphasis in the original] in their lesson plans and instructional materials to meet the CCSS” (p. 3). Compliance is certainly an intervening factor, again, likely contaminating and distorting the validity of both of these claims (which are two of the four total claims highlighted throughout the document (p. 3)).

Elsewhere, Kane et al. claim that “The new standards and assessments represent a significant challenge for teachers and students” (p. 6), along with an accompanying figure they use to illustrate how proficiency (i.e., the percent of students labeled as proficient) on these five states’ prior tests has decreased, indicating more rigor or a more “significant challenge for teachers and students” thanks to the Common Core. What they completely ignore again, however, is that the cut scores used to define “proficiency” are arbitrary per state, as was their approach to define “proficiency” across states in comparison (see footnote four). What we also know from years of research on such tests is that whenever a state introduces a “new and improved” test (e.g., the PARCC and SBAC tests), which is typically tied to “new and improved standards” (e.g., the Common Core), lower “proficiency” rates are observed. This has happened countless times across states, and certainly prior to the introduction of the PARCC and SBAC tests. Thereafter, the state typically responds with the same types of claims, that “The new standards and assessments represent a significant challenge for teachers and students.” These claims are meant to signal to the public that at last “we” are holding our teachers and students accountable for their teaching and learning, but thereafter, again, proficiency cut scores are arbitrarily redefined (among other things), and then five or ten years later “new and improved” tests and standards are needed again. In other words, this claim is nothing new and it should not be interpreted as such, but it should rather be interpreted as aligned with Einstein’s definition of insanity (i.e., repeating the same behaviors over and over again in the hopes that different results will ultimately materialize) as this is precisely what we as a nation have been doing since the minimum competency era in the early 1980s.

Otherwise, Kane et al.’s other two claims were related to “Which [Common Core] Implementation Strategies Helped Students Succeed” (p. 17), as mentioned. They assert first that “In mathematics, [they] identified three markers of successful implementation: more professional development days, more classroom observations with explicit feedback tied to the Common Core, and the inclusion of Common Core-aligned student outcomes in teacher evaluations. All were associated with statistically significantly [emphasis added] higher student performance on the PARCC and [SBAC] assessments in mathematics” (p. 3, see also p. 20). They assert second that “In English language arts, [they] did not find evidence for or against any particular implementation strategies” (p. 3, see also p. 20).

What is highly problematic about these claims is that the three correlated implementation strategies noted, again as significantly associated with teachers’ students’ test-based performance on the PARCC and SBAC mathematics assessments, were “statistically significant” (determined by standard p or “probability” values under which findings that may have happened due to chance are numerically specified). But, they were not really practically significant, at all. There IS a difference whereby “statistically significant” findings may not be “practically significant,” or in this case “policy relevant,” at all. While many misinterpret “statistical significance” as an indicator of strength or importance, it is not. Practical significance is.

As per the American Statistical Association’s (ASA) recently released “Statement on P-Values,” statistical significance “is not equivalent to scientific, human, or economic significance…Any effect, no matter how tiny, can produce a small p-value [i.e., “statistical significance”] if the sample size or measurement precision is high enough” (p. 10); hence, one must always check for practical significance when making claims about statistical significance, which Kane et al. do attempt here, but in a similarly inflated vein.
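
The ASA’s point is easy to demonstrate. The sketch below uses hypothetical data and an arbitrarily chosen 0.04 standard deviation “effect” (roughly the size of the coefficients discussed next) to show the same tiny effect moving from “not significant” to “highly significant” as the sample grows, without ever getting any bigger.

```python
# Hypothetical two-group comparison with a fixed, tiny 0.04 SD difference in means.
# As n grows, the p-value shrinks even though the effect itself never changes.
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(5)
effect = 0.04                                        # "effect" size, in SD units
for n in (100, 1_000, 100_000):
    control = rng.normal(0, 1, n)
    treated = rng.normal(effect, 1, n)
    diff = treated.mean() - control.mean()
    se = sqrt(treated.var(ddof=1) / n + control.var(ddof=1) / n)
    p = erfc(abs(diff / se) / sqrt(2))               # two-sided p-value, normal approximation
    print(f"n per group = {n:>7,}: estimated difference = {diff:+.3f} SD, p = {p:.3g}")
```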

As their Table 6 shows (p. 20), the regression coefficients related to these three areas of “statistically significant” influence on teachers’ students’ test-based performance on the new PARCC and SBAC mathematics tests (i.e., more professional development days, more classroom observations with explicit feedback tied to the Common Core, and the inclusion of Common Core-aligned student outcomes in teacher evaluations) yielded the following coefficients, respectively: 0.045 (p < 0.01), 0.044 (p < 0.05), and 0.054 (p < 0.01). They then use as an example the 0.044 (p < 0.05) coefficient (as related to more classroom observations with explicit feedback tied to the Common Core) and explain that “a difference of one standard deviation in the observation and feedback index was associated with an increase of 0.044 standard deviations in students’ mathematics test scores—roughly the equivalent of 1.4 scale score points on the PARCC assessment and 4.1 scale score points on the SBAC.”

In order to generate a sizable and policy-relevant improvement in test scores (e.g., by half of a standard deviation), the observation and feedback index would have to jump by about 11 standard deviations! In addition, given that scale score points do not equal raw or actual test items (e.g., scale-score-to-actual-test-item relationships are typically in the neighborhood of 4 or 5 scale score points to 1 actual test item), this likely also means that Kane’s interpretations (i.e., mathematics scores were roughly the equivalent of 1.4 scale score points on the PARCC and 4.1 scale score points on the SBAC) actually amount to about 1/4th or 1/5th of a test item in mathematics on the PARCC and about 4/5th of one test item, or roughly one test item, on the SBAC. This hardly “Provides New Evidence on Strategies Related to Improved Student Performance,” unless you define improved student performance as something as little as 1/5th of a test item.
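
The back-of-the-envelope arithmetic behind these statements is simple enough to check directly; the only added assumption below is the scale-points-per-item ratio, set to 5 from the 4-to-5 range already noted above.

```python
# All inputs below are the report's own figures except points_per_item, which is an
# assumed value inside the 4-to-5 range mentioned in the text.
coef_sd = 0.044                        # gain (in student SD) per 1 SD of the observation/feedback index
target_sd = 0.50                       # a commonly used bar for a sizable, policy-relevant gain
print(target_sd / coef_sd)             # ~11.4 SDs of the index needed for a 0.5 SD gain

parcc_points, sbac_points = 1.4, 4.1   # scale-score equivalents reported by Kane et al.
points_per_item = 5                    # assumed scale-points-per-item ratio
print(parcc_points / points_per_item)  # ~0.28 of one PARCC item
print(sbac_points / points_per_item)   # ~0.82 of one SBAC item
```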

Nor is this what Kane et al. claim it to be: “a moderately sizeable effect!” (p. 21). These numbers should not even be reported, much less emphasized as policy relevant/significant, unless perhaps equivalent to at least 0.25 standard deviations on the test (as a quasi-standard/accepted minimum). Likewise, the same argument can be made about the other coefficients derived via these mathematics tests. See also similar claims that they asserted (e.g., that “students perform[ed] better when teachers [were] being evaluated based on student achievement” (p. 21)).

Because the abstract (and possibly the conclusions) are the main sections of this paper likely to have the most educational/policy impact, especially when people do not read all of the text, footnotes, and appendices of this entire document, this is irresponsible, and in many ways contemptible. This is also precisely the reason why, again, Kane’s calls for a Food and Drug Administration (FDA) type of entity for education are so ironic (as explained in my prior post here).

Contradictory Data Complicate Definitions of, and Attempts to Capture, “Teacher Effects”

A researcher from Brown University, Matthew Kraft, and his student recently released a “working paper” in which I think you all will (and should) be interested. The study is about “…Teacher Effects on Complex Cognitive Skills and Social-Emotional Competencies,” that is, those effects that are beyond “just” test scores (see also a related article on this working piece released in The Seventy Four). This one is 64 pages long, but here are the (condensed) highlights as I see them.

The researchers use data from the Bill & Melinda Gates Foundations’ Measures of Effective Teaching (MET) Project “to estimate teacher effects on students’ performance on cognitively demanding open-ended tasks in math and reading, as well as their growth mindset, grit, and effort in class.” They find “substantial variation in teacher effects on complex task performance and social-emotional measures. [They] also find weak relationships between teacher effects on state standardized tests, complex tasks, and social-emotional competencies” (p. 1).

More specifically, researchers found that: (1) “teachers who are most effective at raising student performance on standardized tests are not consistently the same teachers who develop students’ complex cognitive abilities and social-emotional competencies” (p. 7); (2) “While teachers who add the most value to students’ performance on state tests in math do also appear to strengthen their analytic and problem-solving skills, teacher effects on state [English/language arts] tests are only moderately correlated with open-ended tests in reading” (p. 7); and (3) “[T]eacher effects on social-emotional measures are only weakly correlated with effects on state achievement tests and more cognitively demanding open-ended tasks” (p. 7).

The ultimate finding, then, is that “teacher effectiveness differs across specific abilities” and definitions of what it means to be an effective teacher (p. 7). Likewise, the authors concluded that really all current teacher evaluation systems, even though those included within the MET studies are/were some of the best given the multiple sources of data MET researchers included (e.g., traditional tests, indicators capturing complex cognitive skills and social-emotional competencies, observations, student surveys), are not mapping onto similar definitions of teacher quality or effectiveness as “we” have theorized and continue to theorize them.

Hence, attaching high-stakes consequences to data, especially when multiple data yield contradictory findings depending on how one defines effective teaching or its most important components (e.g., test scores v. affective, socio-emotional effects), is (still) not yet warranted, whereby an effective teacher here might not be an effective teacher there, even if defined similarly in like schools, districts, or states. As per Kraft, “while high-stakes decisions may be an important part of teacher evaluation systems, we need to decide on what we value most when making these decisions rather than just using what we measure by default because it is easier.” Ignoring such complexities will not make the high-stakes decisions that some states still want to attach to such data, as derived via multiple measures and sources, any more standard, uniform, or defensible, given that data defined differently even within standard definitions of “effective teaching” continue to contradict one another.

Accordingly, “[q]uestions remain about whether those teachers and schools that are judged as effective by state standardized tests [and the other measures] are also developing the skills necessary to succeed in the 21st century economy” (p. 36). Likewise, it is not necessarily the case that teachers defined as high value-added teachers using these common indicators are indeed high value-added teachers, given that the same teachers may be defined as low value-added teachers when different aspects of “effective teaching” are also examined.

Ultimately, this further complicates “our” current definitions of effective teaching, especially when those definitions are constructed in policy arenas oft-removed from the realities of America’s public schools.

Reference: Kraft, M. A., & Grace, S. (2016). Teaching for tomorrow’s economy? Teacher effects on complex cognitive skills and social-emotional competencies. Providence, RI: Brown University. Working paper.

*Note: This study was ultimately published with the following citation: Blazar, D., & Kraft, M. A. (2017). Teacher and teaching effects on students’ attitudes and behaviors. Educational Evaluation and Policy Analysis, 39(1), 146 –170. doi:10.3102/0162373716670260. Retrieved from http://journals.sagepub.com/doi/pdf/10.3102/0162373716670260

Alabama’s “New” Accountability System: Part II

In a prior post, about whether the state of “Alabama is the New, New Mexico,” I wrote about a draft bill in Alabama to be called the Rewarding Advancement in Instruction and Student Excellence (RAISE) Act of 2016. This has since been renamed the Preparing and Rewarding Educational Professionals (PREP) Bill (to be Act) of 2016. The bill was introduced by its sponsoring Republican Senator Marsh last Tuesday, and its public hearing is scheduled for tomorrow. I review the bill below, and attach it here for others who are interested in reading it in full.

First, the bill is to “provide a procedure for observing and evaluating teachers, principals, and assistant principals on performance and student achievement…[using student growth]…to isolate the effect and impact of a teacher on student learning, controlling for pre-existing characteristics of a student including, but not limited to, prior achievement.” Student growth is still one of the bill’s key components, with growth set at a 25% weight, and this is still written into this bill regardless of the fact that the new federal Every Student Succeeds Act (ESSA) no longer requires teacher-level growth as a component of states’ educational reform legislation. In other words, states are no longer required to do this, but apparently the state/Senator Marsh still wants to move forward in this regard, regardless (and regardless of the research evidence). The student growth model is to be selected by October 1, 2016. On this my offer (as per my prior post) still stands. I would be more than happy to help the state negotiate this contract, pro bono, and much more wisely than so many other states and districts have negotiated similar contracts thus far (e.g., without asking for empirical evidence as a continuous contractual deliverable).

Second, and related, nothing is written about the ongoing research and evaluation of the state system, which is absolutely necessary in order to ensure the system is working as intended, especially before any types of consequential decisions are to be made (e.g., school bonuses, teachers’ denial of tenure, teacher termination, teacher termination due to a reduction in force). All that is mentioned is that things like stakeholder perceptions, general outcomes, and local compliance with the state will be monitored. Without evidence in hand, in advance and preferably as externally vetted and validated, the state will very likely be setting itself up for some legal trouble.

Third, to measure growth the state is set to use student performance data on state tests, as well as data derived via the ACT Aspire examination, the American College Test (ACT), and “any number of measures from the department developed list of preapproved options for governing boards to utilize to measure student achievement growth.” As mentioned in my prior post about Alabama (linked to again here), this is precisely what has gotten the whole state of New Mexico wrapped up in, and quasi-losing, its ongoing lawsuit. While providing districts with menus of off-the-shelf and other assessment options might make sense to policymakers, any self-respecting researcher should know why this is entirely inappropriate. To read more about this, the best research study explaining why doing just this will set any state up for lawsuits comes from Brown University’s John Papay, in his highly esteemed and highly cited article “Different tests, different answers: The stability of teacher value-added estimates across outcome measures.” The title of this research article alone should explain well enough why simply positioning and offering up such tests in such casual ways makes way for legal recourse.

Fourth, the bill is to increase the number of years it will take Alabama teachers to earn tenure, requiring that teachers teach for at least five years and demonstrate, via the state’s evaluation system, at least three consecutive years of “satisfies expectations,” “exceeds expectations,” or “significantly exceeds expectations” ratings prior to earning tenure. Clearly the state does not understand the current issues with value-added/growth levels of reliability, or consistency, or lack thereof, that altogether prevent such consistent classifications of teachers over time. Inversely, what is consistently evident across all growth models is that estimates are very inconsistent from year to year, which will likely thwart what the bill has written into it here as such a theoretically simple proposition. For example, the common statistic still cited in this regard is that a teacher classified as “adding value” has a 25% to 50% chance of being classified as “subtracting value” the following year(s), and vice versa. This sometimes makes the probability of a teacher being consistently identified as (in)effective, from year to year, no different than the flip of a coin, and this is true even when there are at least three years of data (which is standard practice and is also written into this bill as a minimum requirement).
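
A quick simulation shows why such consecutive-year requirements collide with this instability. The year-to-year correlation of 0.4 used below is a hypothetical, middle-of-the-road assumption, not Alabama’s (yet-to-be-selected) model; under it, a bit more than a third of teachers rated above average one year fall below average the next, squarely in the 25% to 50% range cited above.

```python
# Hypothetical stability simulation: teacher "scores" in two years share a common
# signal whose variance share (and thus the year-to-year correlation) is assumed to be 0.4.
import numpy as np

rng = np.random.default_rng(6)
n_teachers, stability = 100_000, 0.4

signal = rng.normal(0, np.sqrt(stability), n_teachers)             # persistent teacher component
year1 = signal + rng.normal(0, np.sqrt(1 - stability), n_teachers)
year2 = signal + rng.normal(0, np.sqrt(1 - stability), n_teachers)

added_value_y1 = year1 > np.median(year1)                          # "adding value" in year 1
subtracting_y2 = year2 < np.median(year2)                          # "subtracting value" in year 2
flip_rate = (added_value_y1 & subtracting_y2).mean() / added_value_y1.mean()
print(f"'adding value' teachers reclassified as 'subtracting value' next year: {flip_rate:.0%}")
```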

Unless the state plans on “artificially conflating” scores, by manufacturing and forcing the oft-unreliable growth data to fit or correlate with teachers’ observational data (two observations per year are to be required), and/or survey data (student surveys are to be used for teachers of students in grades three and above), such consistency is thus far impossible unless deliberately manipulated (see a recent post here about how “artificial conflation” is one of the fundamental and critical points of litigation in another lawsuit in Houston). Related, this bill is also to allow a governing board to evaluate a tenured teacher as per his/her similar classifications every other year, and is to subject a tenured teacher who received a rating of below or significantly below expectations for two consecutive evaluations to personnel action.

Fifth and finally, the bill is also to use an allocated $5,000,000 to recruit teachers, $3,000,000 to mentor teachers, and $10,000,000 to reward the top 10% of schools in the state as per their demonstrated performance. The latter of these is the most consequential as per the state’s use of its planned growth data at the school level (see other teacher-level consequences to be attached to growth output above); hence, the comments about needing empirical evidence to justify such allocations prior to the distribution of such monies are important to underscore again. Likewise, given that levels of bias in value-added/growth output are worse at the school versus teacher level, I would also caution the state against rewarding schools, again in this regard, for what might not really be schools’ causal impacts on student growth over time, after all. See, for example, here, here, and here.

As I also mentioned in my prior post on Alabama, for those of you who have access to educational leaders there, do send them this post too, so they might be a bit more proactive, and appropriately more careful and cautious, before going down what continues to demonstrate itself as a poor educational policy path. While I do embrace my professional responsibility as a public scholar to be called to court to testify about all of this when such high-stakes consequences are ultimately, yet inappropriately based upon invalid inferences, I’d much rather be proactive in this regard and save states and states’ taxpayers their time and money, respectively.

Tennessee’s Trout/Taylor Value-Added Lawsuit Dismissed

As you may recall, one of 15 important lawsuits pertaining to teacher value-added estimates across the nation (Florida n=2, Louisiana n=1, Nevada n=1, New Mexico n=4, New York n=3, Tennessee n=3, and Texas n=1 – see more information here) was situated in Knox County, Tennessee.

Filed in February of 2015, with legal support provided by the Tennessee Education Association (TEA), Knox County teachers Lisa Trout and Mark Taylor charged that they were denied monetary bonuses after their Tennessee Value-Added Assessment System (TVAAS — the original Education Value-Added Assessment System (EVAAS)) teacher-level value-added scores were miscalculated. The lawsuit was also to contest the reasonableness, rationality, and arbitrariness of the TVAAS system, as per its intended and actual uses in this case, but also in Tennessee writ large. On this case, Jesse Rothstein (University of California, Berkeley) and I served as the Plaintiffs’ expert witnesses.

Unfortunately, however, last week (February 17, 2016) the Plaintiffs’ team received a Court order, written by U.S. District Judge Harry S. Mattice Jr., dismissing their claims. While the Court had substantial questions about the reliability and validity of the TVAAS, it determined that the State satisfied the very low threshold of the “rational basis test” at legal issue. I should note here, however, that all of the evidence the Plaintiffs’ lawyers collected via their “extensive discovery,” including the affidavits both Jesse and I submitted on the Plaintiffs’ behalf, was unfortunately not considered in Judge Mattice’s ruling on the motion to dismiss. This, perhaps, makes sense given some of the assertions made by the Court, discussed below.

Ultimately, the Court found that the TVAAS-based, teacher-level value-added policy at issue was “rationally related to a legitimate government interest.” As per the Court order itself, Judge Mattice wrote that “While the court expresses no opinion as to whether the Tennessee Legislature has enacted sound public policy, it finds that the use of TVAAS as a means to measure teacher efficacy survives minimal constitutional scrutiny. If this policy proves to be unworkable in practice, plaintiffs are not to be vindicated by judicial intervention but rather by democratic process.”

Otherwise, as per an article in the Knoxville News Sentinel, Judge Mattice was “not unsympathetic to the teachers’ claims,” for example, given that the TVAAS measures “student growth — not teacher performance — using an algorithm that is not fail proof.” He noted in the Court order, however, that the “TVAAS algorithms have been validated for their accuracy in measuring a teacher’s effect on student growth,” even if minimal. He also wrote that the test scores used in the TVAAS (and other models) “need not be validated for measuring teacher effectiveness merely because they are used as an input in a validated statistical model that measures teacher effectiveness.” This is, unfortunately, untrue. Nonetheless, he continued to write that even though the rational basis test “might be a blunt tool, a rational policymaker could conclude that TVAAS is ‘capable of measuring some marginal impact that teachers can have on their own students’…[and t]his is all the Constitution requires.”

In the end, Judge Mattice concluded in the Court order that, overall, “It bears repeating that Plaintiff’s concerns about the statistical imprecision of TVAAS are not unfounded. In addressing Plaintiffs’ constitutional claims, however, the Court’s role is extremely limited. The judiciary is not empowered to second-guess the wisdom of the Tennessee legislature’s approach to solving the problems facing public education, but rather must determine whether the policy at issue is rationally related to a legitimate government interest.”

It is too early to know whether the Plaintiffs’ team will appeal, although Judge Mattice dismissed the federal constitutional claims within the lawsuit “with prejudice.” As per an article in the Knoxville News Sentinel, this means that “it cannot be resurrected with new facts or legal claims or in another court. His decision can be appealed, though, to the 6th Circuit U.S. Court of Appeals.”

Everything is Bigger (and Badder) in Texas: Houston’s Teacher Value-Added System

Last November, I published a post about “Houston’s ‘Split’ Decision to Give Superintendent Grier $98,600 in Bonuses, Pre-Resignation.” Thereafter, I engaged some of my former doctoral students to further explore some data from the Houston Independent School District (HISD), and what we collectively found and wrote up was just published in the highly esteemed journal Teachers College Record (Amrein-Beardsley, Collins, Holloway-Libell, & Paufler, 2016). To view the full commentary, please click here.

In this commentary we discuss HISD’s highest-stakes use of its Education Value-Added Assessment System (EVAAS) data – the value-added system HISD pays for at an approximate rate of $500,000 per year. This district has used its EVAAS data for more consequential purposes (e.g., teacher merit pay and termination) than any other state or district in the nation; hence, HISD is well known for its “big use” of “big data” to reform and inform improved student learning and achievement throughout the district.

We note in this commentary, however, that as per the evidence, and more specifically the recent release of Texas’s large-scale standardized test scores, perhaps attaching such high-stakes consequences to teachers’ EVAAS output in Houston is not working as district leaders have, now for years, intended. See, for example, the recent test-based evidence comparing the state of Texas versus HISD, illustrated below.

Figure 1. Test-based comparison of the state of Texas versus HISD (see the graph in the full commentary).

“Perhaps the district’s EVAAS system is not as much of an ‘educational-improvement and performance-management model that engages all employees in creating a culture of excellence’ as the district suggests (HISD, n.d.a). Perhaps, as well, we should ponder the specific model used by HISD—the aforementioned EVAAS—and [EVAAS modelers’] perpetual claims that this model helps teachers become more ‘proactive [while] making sound instructional choices;’ helps teachers use ‘resources more strategically to ensure that every student has the chance to succeed;’ or ‘provides valuable diagnostic information about [teachers’ instructional] practices’ so as to ultimately improve student learning and achievement (SAS Institute Inc., n.d.).”

The bottom line, though, is that “Even the simplest evidence presented above should at the very least make us question this particular value-added system, as paid for, supported, and applied in Houston for some of the biggest and baddest teacher-level consequences in town.” See, again, the full text and another, similar graph in the commentary, linked here.

*****

References:

Amrein-Beardsley, A., Collins, C., Holloway-Libell, J., & Paufler, N. A. (2016). Everything is bigger (and badder) in Texas: Houston’s teacher value-added system. [Commentary]. Teachers College Record. Retrieved from http://www.tcrecord.org/Content.asp?ContentId=18983

Houston Independent School District (HISD). (n.d.a). ASPIRE: Accelerating Student Progress Increasing Results & Expectations: Welcome to the ASPIRE Portal. Retrieved from http://portal.battelleforkids.org/Aspire/home.html

SAS Institute Inc. (n.d.). SAS® EVAAS® for K–12: Assess and predict student performance with precision and reliability. Retrieved from www.sas.com/govedu/edu/k12/evaas/index.html