I’ve read several articles and blog posts about the NYS tests… and all of them underscore the flaws of using standardized tests to measure the effectiveness of schools.
The “Lace To The Top” blog provides the clearest example of the flaws with standardized test scores: the cut score can be modified from year to year to alter the pass rates. As anyone familiar with standardized tests realizes, the scores are not “standard” in a pure mathematical sense. As the blog points out:
Results of the Math tests are up 4.6%, but the cut score was lowered by 3% (3rd grade). In 2013, students needed to receive 44 out of a possible 60 points in order to achieve a passing score of 3. In 2014, students needed to only receive 42 out of a possible 60 points in order to receive a passing grade of 3.
Results of the ELA tests are up 0.1%, but the cut score was lowered by 2% (3rd grade). In 2013, students needed to receive 35 out of 55 possible points to achieve a passing score of 3. In 2014, students needed to only receive 30 out of a possible 49 points to receive a passing grade of 3.
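It is worth converting those raw cut scores into percentage thresholds to see the shift the blog describes. A quick back-of-the-envelope sketch, using only the figures quoted above:

```python
# Raw cut scores quoted above: points needed for a passing score of 3.
# Keys are (subject, year); values are (points_needed, points_possible).
cuts = {
    ("Math", 2013): (44, 60),
    ("Math", 2014): (42, 60),
    ("ELA", 2013): (35, 55),
    ("ELA", 2014): (30, 49),
}

for (subject, year), (needed, possible) in sorted(cuts.items()):
    pct = 100 * needed / possible
    print(f"{subject} {year}: {needed}/{possible} = {pct:.1f}% needed to pass")
```

The Math threshold drops from 73.3% to 70.0% and the ELA threshold from 63.6% to 61.2%, roughly the 3% and 2% the blog reports… which is how a higher “pass rate” can coexist with an easier bar.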
So while the public perceives these “standards” to be objective and mathematically invariant, the reality is that they are subjective and fungible… as they must be, since the questions vary from year to year. Thus, as any student who passed a course in education statistics knows, standardized tests are NOT precise. The public, however, has a different perception, a perception that is reinforced when schools are rated against each other based on numeric scales generated by the test developers and promulgated without any caveats by State Departments of Education.
The overall lack of transparency in the NYS tests was the topic of a City Limits blog post by Fred Smith last Monday. The post describes the change in the way test results were shared with teachers and the public. Before Pearson took over the testing in 2011,
SED made complete copies of the annual statewide exams available on its web site shortly after they were given, along with the answer keys. People could ponder the test content and thought processes required of students to answer the questions. Every year, no later than December, a technical report was issued giving specific information about each item.
Now, only 50% of the questions are shared, and the technical report is either non-existent or released too late for teachers to use the results to inform instruction. This is especially problematic given that the tests will be used to rate and rank teacher performance. The practice underscores the reality that these tests are NOT being used to inform instruction in the classroom or help students but rather being used as an evaluation tool… or, more accurately, being MIS-used as an evaluation tool. And though Smith does not mention it in his analysis of the test results, I believe Pearson is the culprit here: since they are writing the test, they are probably using some kind of “proprietary rights” argument to avoid releasing the kinds of broad data SED formerly issued… another adverse by-product of privately contracting for test development.
Anthony Cody’s Living in Dialogue blog post recounts the sordid history of standardized testing, describing how the tests were originally designed to sort employees into manual labor assignments or office jobs. It emphasizes the explicit link between eugenics and standardized testing but overlooks the more subtle and, in my judgment, more damaging link between Taylorism and testing. The use of tests reinforces the notion that the “output” of education can be objectively measured in the same way that a manufactured product can be measured using, say, a micrometer. As noted in previous posts, the standardization paradigm is necessary for those who wish to privatize education, and it leads to a narrowing of the curriculum because teaching to the test is more efficient than giving children the time to learn a concept deeply.
Alas, the testing shenanigans will continue until the public wakes up to the realization that standardized testing ISN’T exacting and IS undermining their children’s joy of learning.
A Diane Ravitch blog post yesterday and a New Yorker article she linked to earlier this week describe examples of the kind of dishonesty that results when “competition” is introduced into schools.
One of her blog posts yesterday describes the ongoing saga of Terrence Carter, who was announced as the incoming superintendent of schools in New London CT and was subsequently found to have lied about his credentials and plagiarized his letter of application. It seems that one of the reasons the New London School Board might have overlooked these details is that MR. Carter (not DR. Carter as he presented himself) had letters of reference from Arne Duncan, CT Governor Malloy, and CT Commissioner Stefan Pryor. Unfortunately for MR. Carter, several activist parents, Jonathan Lender of the Hartford Courant, and a professor from Connecticut College did some legwork. Here are two excerpts from the Courant’s article:
This is how the pro-privatization, big-philanthropy-funded networks and organizations tend to work. They pass their own people along and up, greasing rails and plumping resumes as they go. And the main criteria for ‘success’ often seems not to be real leadership characteristics, so much as willingness to be a good soldier when it comes to pushing forward a particular reform agenda, said Lauren Anderson, an assistant professor of education at Connecticut College in New London.
…(Carter) filed for bankruptcy twice; his application essay included long passages identical with other educators’ writings on the Internet; a national research organization released a copy of a bio that it says Carter submitted in 2011 with the claim that he had a Ph.D. from Stanford University, which Stanford says he does not; and he got a Ph.D. in 1996 from “Lexington University” — which doesn’t have a campus and had a website offering degrees for several hundred dollars with the motto “Order Now, Graduate Today!”
Diane Ravitch’s concluding paragraph is a chillingly accurate synopsis of the privatization trend in public schools:
The Carter story is not about one man, but about the bipartisan movement to disregard credentials, to close schools, to hire ill-prepared TFA, and to favor privately managed schools over community public schools. To favor management by hedge fund millionaires over democratically elected school boards.
The New Yorker article describes how competition played out in Atlanta, where the cheating scandals of the mid-2000s brought down the Superintendent and 178 staff members and resulted in students believing they had achieved academic success when they had, in fact, not improved at all. The most telling paragraph from the entire article is this:
John Ewing, who served as the executive director of the American Mathematical Society for fifteen years, told me that he is perplexed by educators’ “infatuation with data,” their faith that it is more authoritative than using their own judgment. He explains the problem in terms of Campbell’s law, a principle that describes the risks of using a single indicator to measure complex social phenomena: the greater the value placed on a quantitative measure, like test scores, the more likely it is that the people using it and the process it measures will be corrupted. “The end goal of education isn’t to get students to answer the right number of questions,” he said. “The goal is to have curious and creative students who can function in life.” In a 2011 paper in Notices of the American Mathematical Society, he warned that policymakers were using mathematics “to intimidate—to preëmpt debate about the goals of education and measures of success.”
Mr. Ewing needs to know that educators are not infatuated with the use of data to measure school or teacher performance: politicians, economists, and those who want to “run schools like a business” are the ones who believe a “…single indicator can be used to measure complex social phenomena”… and they believe this because it is a cheap, easy, and fast way to make changes to schools.
Both of these articles illustrate what can happen when competition occurs in an unregulated or lightly regulated atmosphere… people cut corners to get ahead. We’ve seen it in professional sports, in banking, and in politics. Competition isn’t a bad thing if the game isn’t rigged and the rules are clearly understood and impartially enforced. Unregulated competition will always result in cheating.
Both Diane Ravitch and Cathy O’Neil cited the news that Chris Christie has announced a major change in the use of VAM to evaluate teachers, with Diane Ravitch providing a link to a detailed article on the NorthJersey.Com web site. According to the article, Christie intends to roll back the weight of VAM in teacher evaluations from 30% to 10%. To help him determine the future use of VAM, he issued an executive order creating a commission that would “…review the effectiveness of all K-12 tests used to assess student knowledge… look at volume, frequency and impact of student testing throughout New Jersey school districts.” But here’s the kicker (my emphasis added):
Christie will appoint all nine commission members, who should have expertise or experience in education policy or administration, according to his order. The commission will issue an initial report with recommendations by Dec. 31, and a final report seven months later.
The commission “should have” this expertise… but if Christie’s appointments to State-operated districts are any indication of the kind of qualifications he will use to identify people with “expertise,” no one in New Jersey should be assured the commission will know anything about the construction of standardized tests. Instead of requiring anyone on the commission to have expertise in psychometrics, Christie is appointing people with expertise in “education policy or administration.” As one with expertise in both of those spheres, I can assure readers there are few “education policy or administration” folks who know anything about psychometrics… and that’s one reason why so many of them have bought into VAM. If Christie is sincere about improving the use of tests in NJ, he should appoint someone like Cathy O’Neil (aka the Mathbabe) or someone from FairTest to serve on this commission.
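To see why the weight attached to VAM matters, consider a hypothetical composite rating. The scores, the 1-4 scale, and the assumption that the non-VAM remainder of the weight goes entirely to classroom observation are all illustrative, not drawn from the actual NJ regulations:

```python
def composite_score(vam, observation, vam_weight):
    """Blend a VAM score with an observation score (both on a 1-4 scale).

    Here the non-VAM remainder of the weight goes entirely to observation;
    real evaluation systems split it across several measures.
    """
    return vam_weight * vam + (1 - vam_weight) * observation

# A teacher rated highly by observers (3.8) but poorly by VAM (1.5):
for weight in (0.30, 0.10):
    score = composite_score(vam=1.5, observation=3.8, vam_weight=weight)
    print(f"VAM weight {weight:.0%}: composite = {score:.2f}")
```

Under a 30% weight the noisy VAM number drags an otherwise strong rating down to about 3.11; at 10% it stays near 3.57… which is why the weighting is worth fighting over.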
Finally, a political sidebar: the report of this commission will be issued in July 2015, right about the time the Christie for President campaign will be ramping up. Should be interesting times in NJ!
Malcolm Gladwell’s book, Outliers, posited that in order to become successful one had to practice a skill or craft for at least 10,000 hours. This “10,000 hour rule” got twisted to read that IF one practiced a craft for at least 10,000 hours one WOULD become successful… a small but significant distortion of the conclusion Gladwell drew. Gladwell suggested that 10,000 hours was a necessary but not sufficient cause of success. The twisted rule suggests that 10,000 hours is necessary and sufficient… and the twisted rule is the one that found its way into the mainstream media and became a statement of faith rather than an evidence-based conclusion.
Sunday’s NYTimes had an article titled “The Limits of Practice” that refuted Gladwell’s “findings” regarding the 10,000 hour rule. The article acknowledged that Gladwell fruitlessly attempted to clarify his findings in a subsequent New Yorker piece, but it nevertheless picked up on the flawed “10,000 hour” meme and (somewhat superfluously) demonstrated that it was wrong. In the comment section I made the following observation:
This line jumped out at me after reading Nyhan’s article on Sunday: “But despite this, and despite recent research, the idea that you can be awesome at anything with 10,000 hours of work continues to hold sway.” The 10,000 hour meme, like many faith-based memes, is easy to remember, is intuitively appealing, and is ESPECIALLY appealing to those who want to shift the burden away from government to the individual. After all, if doing well in school is just a matter of hard work and grit and has nothing to do with being born into the right zip code, then no government intervention is needed to improve the lives of children or to improve education.
The “10,000 hour rule” is another in a long line of bad ideas that hold sway in the minds of the public because, IF they are true, difficult decisions can be avoided. If trickle-down economics works, we don’t need to tax people who are earning a lot of money; if VAM works, we don’t need to spend money on administrators to coach and counsel bad teachers out of the profession; if the “10,000 hour rule” works, we don’t need to provide support to children… THEY need to work harder and have more grit. Bad ideas are uncomplicated, intuitively appealing, and require little or no effort or cost to implement. That’s why they stick even after they are proven fully or partially false.
As is almost always the case, the Mathbabe, Cathy O’Neil, has written a blog post full of thought-provoking questions on the use of data. The big question O’Neil poses in today’s post is this: “What Constitutes Evidence?” She poses it in the context of the collection of health data that will presumably be used to rank and evaluate doctors and patients in ways that have not been clearly defined but seem to be universally accepted. She asks readers whether she is paranoid in her fear that this data, once collected, will be misused. Virtually all her readers assure her that she is NOT irrational in her fears or her concerns. I share her concern… but I also fear that this will be yet another instance of using quantification in lieu of judgment… and when that occurs, bad things often follow. I left the following comment on her post, providing an example from education where statistical algorithms have replaced human judgment… namely… VAM:
What evidence is there that ANY statistical models used to draw conclusions from aggregated data are valid and reliable? We want to believe that there is a way to quantify things to replace human judgment. Human judgment based on data can be flawed and colored by bias… but, in a well-conceived bureaucracy, it can be subjected to further review by others. We seem to think that “dispassionate quantification” that substitutes precision for accuracy requires no review.
An example of this bogus quantification is best found in education where statistically flawed algorithms are used as the basis for judging performance with no recourse available. You’ve written frequently and eloquently about the VAM hoax… what makes you think the medical field is going to develop a more sophisticated means of “scoring” doctors or patients?
Here’s what I observe happening: we have no faith in fairness or due process, less faith in “the government” or “the system,” and WAY too much faith in quantification… Readers of this blog know that I have questions and concerns about the officials we’ve elected to lead us and some of the decisions these officials have made… but I prefer an imperfect elected human being to an unelected computer program developed by a mid-level contract employee whose company submitted the lowest bid for a project.
I read with interest today’s editorial in the NYTimes regarding the Supreme Court’s 5-4 decision barring the execution of a murderer because of the inexact legal definition of a defendant’s “diminished capacity” to reason. It seems that Florida law defined “diminished capacity” based solely on one test score: IQ. Writing the majority opinion, Justice Kennedy opined:
…the state’s “rigid rule” violated the Constitution because it “disregards established medical practice” by taking a test score as the final word on a defendant’s intellectual capacity, and by refusing to consider the imprecision inherent in such tests.
“Intellectual disability is a condition, not a number,” Justice Kennedy wrote. His opinion relied heavily on the consensus of mental-health professionals that a diagnosis of intellectual disability depends on both “significantly subaverage” intellectual functioning and major deficits in adaptive behaviors like self-care and interpersonal skills. I.Q. is, they say, an approximate measure of intellectual function, and people can be disabled even if they score above 70. Florida, Justice Kennedy noted, did not cite a “single medical professional” who supported the strict cutoff.
So… when the day comes that a teacher sues a State over VAM, will the state be able to “…cite a single professional statistician” who can attest to the “precision inherent” in the tests? If the highest court in the land recognizes that IQ tests are inherently imprecise, how can the advocates of VAM hope to make their use of standardized tests stand up in court?