“Giving Doctors Grades,” an op-ed article in today’s NYTimes by Sandeep Jauhar, describes the consequences of using simplistic metrics to determine the effectiveness of a complex operation: heart surgery. In the early 1990s, NYS decided to issue “Report Cards” for surgeons in an effort to provide easy-to-understand information on the ability of various medical practitioners. The result?
(T)he report cards backfired. They often penalized surgeons, like the senior surgeon at my hospital, who were aggressive about treating very sick patients and thus incurred higher mortality rates. When the statistics were publicized, some talented surgeons with higher-than-expected mortality statistics lost their operating privileges, while others, whose risk aversion had earned them lower-than-predicted rates, used the report cards to promote their services in advertisements.
This was an insult that the senior surgeon at my hospital could no longer countenance. “The so-called best surgeons are only doing the most straightforward cases,” he said disdainfully.
This sounded VERY familiar to me… and I left the following comment:
This wrongheaded method of measuring the performance of surgeons is analogous to the “Value Added” evaluation methods promoted by “school reformers” and adopted by Arne Duncan, Andrew Cuomo, the Regents, and a host of other governors and State Boards. The standardized test scores used to “measure” teacher performance mirror the economic standing of the parents. Consequently, teachers who choose to work with the most challenging students, like the surgeons who tackle the riskiest cases, could lose their jobs. Grading schools using test scores only serves to humiliate the entire faculty of any school that chooses to work with children raised in poverty. Both of these failed metrics have one thing in common: they are attempts to bring mathematical precision to fields of endeavor that are crafts more than sciences.
The notion that service organizations should be run like businesses leads to the need for “precise” metrics like mortality rates and VAM to be used in lieu of “the bottom line” so revered by businessmen. But service enterprises do not provide neat and tidy outcomes: they defy the kinds of measures that can be used to develop “stack ratings” or “grades” because they serve individuals who have different backgrounds, temperaments, and physical compositions. The desire to reduce everything to a single number in order to rank employees using some kind of “objective criteria” is ultimately a means of replacing the judgement of human managers with algorithms. It has not worked in the past and is unlikely to work in the future, unless the future is led by robots.
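A toy calculation shows how a single raw number can rank the better practitioner lower. The figures below are invented for illustration and are not the New York data. The sketch also ignores the risk adjustment implied by the phrases “higher-than-expected” and “lower-than-predicted” in the quoted passage. It only shows why an unadjusted rate misleads when one surgeon takes on far more high-risk patients than another.

```python
# Hypothetical (patients, deaths) per risk tier. These numbers are made up
# for illustration only; they do not come from the article or any report card.
CASES = {
    "aggressive_surgeon": {"low_risk": (100, 1), "high_risk": (300, 30)},
    "cautious_surgeon": {"low_risk": (380, 8), "high_risk": (20, 3)},
}


def tier_rate(surgeon, tier):
    """Mortality rate for one surgeon within one risk tier."""
    patients, deaths = CASES[surgeon][tier]
    return deaths / patients


def overall_rate(surgeon):
    """Raw mortality rate across all of a surgeon's patients."""
    tiers = list(CASES[surgeon].values())
    patients = sum(p for p, _ in tiers)
    deaths = sum(d for _, d in tiers)
    return deaths / patients


for name in CASES:
    print(
        name,
        f"low-risk {tier_rate(name, 'low_risk'):.1%}",
        f"high-risk {tier_rate(name, 'high_risk'):.1%}",
        f"overall {overall_rate(name):.1%}",
    )
```

In this made-up example the aggressive surgeon has the lower mortality rate in both tiers, yet the higher overall rate, simply because most of that surgeon’s patients are high-risk. The same arithmetic applies to a teacher whose classes are weighted toward the most challenging students.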
Today’s NYTimes column by Paul Krugman describes the flawed thinking by political leaders in Europe that led to their embrace of an idea that was deeply flawed according to a host of economists. Because of my personal background and biases, I immediately saw a connection between the flawed thinking regarding the Euro and the flawed thinking that led to VAM sweeping the country. VAM, like the Euro, seems like a logical and efficient solution to a complicated problem. But, as H.L. Mencken wrote, every complicated problem has a solution that is simple, clean, and wrong… and VAM is as simplistic and intuitively appealing as the Euro. After reading Krugman’s essay I left this comment:
This same kind of magical thinking by “…self-indulgent politicians” who “…ignore arithmetic and the lessons of history” is happening right now in public education. There is no mathematical or statistical basis for rating schools or teachers based on the standardized test scores of children, yet governors, state legislators, and “reformers” all champion the idea. Why? To paraphrase Mr. Krugman: To people who didn’t know much about statistics, or chose to ignore awkward questions regarding the repeatedly demonstrated evidence that test scores correlate with income, using tests to measure performance sounded like a great idea… and it is an idea that appears to be persisting in the bills before Congress today.
To paraphrase Mr. Krugman once more: The only big mistake of the “school reformers” was underestimating just how much damage the emphasis on standardized test scores would do to public education… or MAYBE that’s not a bug of the legislation— it’s a feature.
One of the main reasons I hope the ECAA bill in Congress never passes is that it insists that States use standardized tests as the primary metric for performance and allows States to continue using VAM if they choose to do so. We’ve had six years of VAM: another six years will make it even harder to rid the schools of this flawed idea, and all of the schools serving children raised in poverty will be treated the way Greece is being treated today.
Last week NYTimes columnist Paul Krugman wrote an op-ed piece titled “Fighting the Derp”. What is Derp?
“Derp” is a term borrowed from the cartoon “South Park” that has achieved wide currency among people I talk to, because it’s useful shorthand for an all-too-obvious feature of the modern intellectual landscape: people who keep saying the same thing no matter how much evidence accumulates that it’s completely wrong.
Over the past few days I’ve accumulated examples of Derp in public education. The most obvious example came to me in an instant: the belief that standardized testing will improve schools is the derp of public education “reform”. When I started my career as a school administrator in the mid-1970s, Pennsylvania administered state standardized tests to determine which schools were doing best. Since that time I have witnessed the advent of state tests in NH, MD and NY and, to no one’s surprise, the results never changed: schools in affluent communities and neighborhoods ALWAYS outperformed schools serving children raised in poverty. Now we have derp on steroids: a Secretary of Education who, despite evidence to the contrary presented by a national professional association of statisticians, believes standardized tests can be used to measure teacher performance. After decades of testing that has not improved the results, one would think another idea might be tried… but that would upset the “school reformers” who want to be reassured in their beliefs.
Other examples of Derp in public education include:
- The belief that developing grit and resilience in children, not additional funds for schools, is the secret sauce that children raised in poverty need in order to succeed in school.
- The belief that there is some way to scale up successful charter programs that are supplemented with grant funds without increasing the public funding needed for the replicated schools.
- The notion that if parents could select schools the way they select appliances there would be more equity in education… despite the fact that affluent districts typically do not accept out-of-district students unless they pay full-price tuition and charter schools have admissions standards that limit their enrollments.
- The idea that since “Government is the problem” and public schools are government schools they are blocking innovations and advances that would be possible if they were run like businesses.
- The notion that unions are the primary problem with school performance, a notion that persists despite the fact that the lowest performing schools on NAEP are in the South, where unions are weakest.
- The pervasive and seemingly unshakeable mental model that children must be grouped in age cohorts, which, as noted frequently in this blog, drives many of the misguided “reforms”.
I am confident that this list is incomplete and welcome additions and corrections….
Elizabeth Harris’ article on a judge’s decision regarding racial bias in a test required of New York teachers could have an impact on all graduation or grade-level promotion examinations across the country if someone took this case to its logical conclusion. According to Harris’ summary of the case, Judge Kimba Wood’s decision on the second version of the Liberal Arts and Sciences Test (LAST-2) turned on the fact that the test’s “content objectives” were irrelevant and unimportant to teaching.
The judge found that National Evaluation Systems, now called Evaluation Systems, part of Pearson Education, went about the process backward.
“Instead of beginning with ascertaining the job tasks of New York teachers, the two LAST examinations began with the premise that all New York teachers should be required to demonstrate an understanding of the liberal arts,” Judge Wood wrote.
As Judge Wood’s quote indicates, this isn’t the first employment test that failed to pass muster because it was discriminatory. Its predecessor, the LAST-1, was found wanting because it, too, unfairly discriminated based on race… and the test that replaced the LAST-2, which is no longer in use, is currently under review by the courts.
“Reformers” AND policy makers should take heed of these findings when they advocate the use of “high stakes tests”, for two reasons. First, not all skills can be assessed using pencil and paper tests… and the job tasks associated with teaching have more to do with relating to children and peers than with demonstrating an understanding of liberal arts and science. Second, before using any “high stakes” tests that determine the long-term fate of students or teachers, policy makers should be certain that the tests measure requisite skills and not cultural or ethnic background.
So the courts are reviewing the LAST and finding it wanting, and as a result school districts are paying damages to roughly 3,900 people who “failed” the test and took substitute jobs. Meanwhile, schools and teachers are being evaluated on tests designed by Pearson that supposedly measure students’ mastery of the Common Core, which in turn supposedly indicates a student’s readiness for work or college… And the results of those tests, like the results of LAST, seem to have a racial bias. In the case of LAST,
…the pass rate for African-American and Latino candidates was between 54 percent and 75 percent of the pass rate for white candidates. Once it was established that minority applicants were failing at a disproportionately high rate, the burden shifted to education officials to prove that the skills being tested were necessary to do the job; otherwise, the test would be ruled discriminatory.
Given the results of the standardized tests administered over the past several decades it is evident that they discriminate against children raised in poverty. Here’s an interesting question for the Regents: could they prove that the new Common Core Tests assess the skills necessary to enter college or the workforce? If not, the tests they are using to evaluate students, schools and teachers would be ruled discriminatory.
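The comparison in the quoted passage, one group’s pass rate expressed as a share of another’s, is simple arithmetic and can be sketched in a few lines. The counts below are hypothetical, chosen only to reproduce the 54 percent figure from the quote. The 0.8 cutoff is the “four-fifths” rule of thumb from U.S. federal employee-selection guidelines. The article does not say which threshold the court applied, so treat the cutoff as an assumption.

```python
def pass_rate(passed, tested):
    """Share of test takers in a group who passed."""
    return passed / tested


def impact_ratio(group_rate, reference_rate):
    """A group's pass rate as a fraction of the reference group's pass rate."""
    return group_rate / reference_rate


# Assumption: the "four-fifths" rule of thumb. The article does not state
# which threshold the court actually used.
FOUR_FIFTHS = 0.8

# Hypothetical counts, chosen only to reproduce the 54 percent figure.
reference_rate = pass_rate(passed=900, tested=1000)
comparison_rate = pass_rate(passed=486, tested=1000)

ratio = impact_ratio(comparison_rate, reference_rate)
flagged = ratio < FOUR_FIFTHS
print(f"impact ratio: {ratio:.2f}, below four-fifths threshold: {flagged}")
```

A ratio below the cutoff does not by itself establish that a test is discriminatory. As the quoted passage describes, it shifts the burden to officials to show that the skills being tested are necessary for the job.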
Channel 10, Philadelphia’s CBS affiliate, recently broadcast an interview with West Chester PA Superintendent Jim Scanlon, whose letter to the newly elected governor got their attention. The broadcast was good news for three reasons:
- West Chester is one of the best districts in the state (Disclosure: I graduated from West Chester High in 1965) and, therefore, criticism brought forth by its Superintendent has a high level of credibility.
- Dr. Scanlon did a great job of answering the reporter’s questions… despite the reporter’s efforts to make his responses combative, Scanlon remained composed and offered some specific examples of the ridiculous nature of the tests without using that terminology.
- Governor Wolf’s response was heartening. It appears that he will move away from the single test metric of his predecessor and seek feedback from a wide array of groups as opposed to relying on a small band of conservative legislators and “reformers”.
Maybe better times will be coming in PA… stay tuned!
I looked at this cartoon and thought of another caption:
The GOP’s and Neo-Liberals’ Favorite Branch of Statistics: VAM
To make the caption work we’d need TWO elephants: the one in the picture and a clone of the one in the picture with the word “DINO” splashed on the side in graffiti-style writing.
I wish MaryEllen Elia well… but based on the article in the NYTimes announcing her appointment, I sense that in the coming weeks I may be finding some areas of serious disagreement with her OR she may be finding herself in disagreement with the Regents and/or the Governor.
In Ms. Elia’s most recent assignment as Superintendent of Hillsborough County in FL, she was the recipient of a $100,000,000 grant from the Gates Foundation to improve the teacher evaluation system. Under the new system she implemented, 40 percent of teachers’ evaluations were based on “…their students’ improvement on tests” (VAM), and 60 percent on “…observations by principals and peers”. During Ms. Elia’s ten-year tenure, Hillsborough also instituted a merit pay system that “…allowed some new teachers to earn more than veteran teachers”. These “reforms” would clearly warm the hearts of the Regents and Governor Cuomo. However, Ms. Elia was also part of “…a group of Florida superintendents who asked the State Board of Education to suspend consequences for schools and students in the first year of the tests”, presumably because the transition was too fast. THAT position would contradict the previous Commissioner’s stance and would fly in the face of the Governor’s impatience. Her willingness to advocate for a phase-in DID win her the qualified support of the union leadership in NYS:
The president of the American Federation of Teachers, Randi Weingarten, offered tempered praise, saying in a statement that while the union was “opposed to high-stakes testing” and grading teachers on students’ test performance, “even when MaryEllen applied it as required under Florida law, she made collaboration her mantra.”
If past actions are the best predictor of her future, Ms. Elia is more likely to lose Randi Weingarten’s support than the Regents’ support… and VAM is likely to proceed on schedule in NYS. This paragraph in the Times article explains why:
She will make these decisions against a backdrop of low test scores, well-funded special-interest groups, angry parents and a State Legislature that has become more active on education policy and has been in conflict with Gov. Andrew M. Cuomo.
The Times may believe that the “well-funded special-interest groups” are supportive of teachers… but anyone keeping a balance sheet knows that the for-profit charter lobby has much more money than the unions and much more support in the legislature than those pesky parents who want their public schools to provide something more than test preparation exercises. I hope I’m wrong about VAM… but I’m afraid money will trump common sense and the needs of children.