March 10, 2019

Earlier this week the NYTimes published a story by reporters Dana Goldstein and Manny Fernandez titled “Texas Says Most of Its Students Aren’t Reading at Grade Level. But Are Its Tests Fair?” The answer to the question posed in the headline was, unsurprisingly, “NO!” The article described flaws in Texas’ latest standardized tests, the State of Texas Assessments of Academic Readiness, or Staar, which are used to determine whether individual students are reading on grade level and whether schools are meeting standards based on their students’ performance. The article presents the battle in Texas as a microcosm of the national battle school reformers are now fighting:

The battle over reading in Texas is the latest in a national war over the future of education reform. From teacher picket lines to the halls of state capitols, public school educators and their political allies are pushing back against decades of laws they say have been punitive to traditional schools.

A persistent narrative of failure, backed by low student test scores, has undermined the public’s trust in local education systems, critics say, and has paved the way for policies that shift students and taxpayer dollars toward charter schools and private school vouchers.

On the other side of the debate are school reformers who contend that tough accountability systems like Staar are a civil rights imperative, and that they protect low-income students and students of color from what President George W. Bush famously called “the soft bigotry of low expectations.”

Ms. Goldstein and Mr. Fernandez then explain how the tests yield two different scores for each student. One places the student in one of four bands:

“did not meet grade level,” which means a student failed the test and, in some grades, could be held back; “approaches grade level,” which means a student… did not meet all expectations and will be targeted for extra help; “meets grade level”; and “masters grade level.”

The other is a “Lexile score,” which determines the kinds of reading material a student would be assigned. The problem is that in some cases the two scores conflict, with the Lexile score effectively rating the student higher than the score used to determine the mastery of the individual student and the performance of the school.

The article does a good job of framing this battle, and a decent job of explaining the test scores, but it does a poor job of pointing out that ANY conversion of a raw score to a “grade level” score or a “mastery” score is a statistical artifact. The dilemma is that “grade level” itself is an artifact. It rests on wholly false premises: that all children learn at the same rate, that batching children by age cohorts is the most efficient and effective means of delivering instruction, and that measuring student performance on this basis is a valid means of determining the effectiveness of a school.

Maybe an article in a widely read newspaper is not the place to present this reality… but by accepting this wholly artificial construct for organizing schools, and the statistical artifact called “grade level” that results from adopting it, the NYTimes implicitly supports the rating systems that buttress the test-and-punish “reform movement” and the high-stakes testing that is used to sort and select students.

