A week ago, the NYTimes ran an article by Ron Haskins, a self-described “…policy analyst who helped House Republicans design the 1996 welfare overhaul and who later advised President George W. Bush on social policy”. Titled “Social Programs That Work”, the op-ed piece extolled the virtues of the Obama administration’s efforts to use evidence to determine the effectiveness of various social programs. In the article, Haskins offered examples of popular programs whose effectiveness is NOT supported by evidence (e.g. Head Start and DARE) and several programs whose effectiveness IS supported by evidence.
I appreciate the de facto bipartisanship inherent in Haskins’ support for a Democratic administration’s initiative, but I need to point out two flaws with the evidence being collected. First, the metrics used to define “success” are standardized test scores and easily gathered objective data like drop-out rates, teen pregnancy rates, and suspension rates. As noted repeatedly in earlier blog posts, there is more to schooling than these measures. Second, the expansion of the successful programs will require a re-thinking of schooling and social services… a change of the dominant paradigm for schools away from the factory model. And that change is nowhere in sight. The best example of the need for paradigmatic change is the Success for All program:
Success for All, a comprehensive schoolwide reform program, primarily for high-poverty elementary schools, emphasizes early detection and prevention of reading problems before they become serious. Students of various ages who read at the same performance level are grouped together and receive daily, 90-minute reading classes, as well as one-on-one tutoring and cooperative learning activities.
The success of Success for All would be even more astronomical if those “students of various ages” were tested based on their marginal improvement over time instead of being compared to students of the same age… but doing that would require abandoning the deeply rooted practice of age-based grade levels and, more crucially, the administration of standardized tests that rank students based on comparisons with their age peers. Furthermore, implementing a schoolwide program would preclude the evaluation of individual teachers based on standardized test scores. Finally, organizing instruction to provide one-on-one tutoring and cooperative learning activities requires abandoning teacher-centered programs, further eroding the ability to evaluate individual teachers based on test scores.
All of this underscores the one major error in the Obama administration’s use of evidence-based decision making: there is NO evidence that VAM (the value-added method of evaluating teachers) works and NO evidence that the decades of standardized testing have improved opportunities for economically disadvantaged students… and yet… these practices persist… and both VAM and standardized testing reinforce the dominant paradigm where the teacher of students in an age-based grade level cohort is solely responsible for the academic performance of the child. And there is NO evidence that this is the case.
Earlier this week, Richard Kahlenberg wrote a post for The Nation with the provocative title “Standardized Tests are Weakening Our Democracy”. Using a review of The Tyranny of the Meritocracy, Lani Guinier’s recent book, as the jumping off point, Kahlenberg describes how schools— and especially colleges— use standardized tests as a proxy for “merit”. He then offers Guinier’s definition of “merit” and shows how that definition has no relationship to test scores.
Guinier does not want to destroy the concept of merit, but to “redefine” it to go beyond “student performance on standardized tests.” She suggests we shift “from honoring testocratic merit to honoring democratic merit.”
“Democratic merit,” Guinier explains, goes far beyond examining test scores… Today, she says, our society should value people who combine two sets of attributes: (1) knowing how to solve problems, which requires not just cognitive skills but also the ability to collaborate with others, and to think creatively; and (2) a “commitment to building a better society for more people” rather than just pursuing one’s own selfish ends.
Guinier’s book deals with the difficulties in achieving truly equal access to colleges when test scores are given a high priority in the admissions process, asserting that the current link between test scores and affirmative action is flawed:
Affirmative action tends to “simply mirror the values of the current view of meritocracy,” Guinier notes. Colleges tend to admit “the children of upper-middle[-]class parents of color who have been sent to fine prep schools just like the upper-middle[-]class white students.” Universities often seek what Guinier calls “cosmetic diversity” of wealthy black students, many of whom are recent immigrants. One study, she notes, finds that “more than 90 percent of parents of Harvard’s African students had advanced degrees.”
Moving away from “testocratic” merit will be a difficult task. Why? Because there is a high correlation between affluence, education, and high test scores; because those who achieved high test scores in the past believe their subsequent success in school and life was a result of “merit” based on those test scores; and because anything that is substituted for “objective” test scores will be “subjective” and therefore difficult to measure. If standardized tests are not used to define “merit”, what could take their place? Guinier has a solution that has a track record:
Guinier suggests that universities, rather than considering race as a way of papering over deeper inequalities, should turn to a more transformative model of admissions, which is illustrated by the Posse Foundation. Founded in 1989 by Debbie Bial, Posse seeks to identify students of all colors from disadvantaged neighborhoods who embody democratic merit. Through intensive interviews and group projects, Posse picks students who show grit, who demonstrate that they can collaborate with others, think creatively and show leadership. Many of these students end up involved in public service. The former admissions dean of Middlebury College told Guinier he strongly supports Posse. “What’s more important,“ he asked her: “someone with all As or someone with some Bs who goes out and makes a difference in the world?”
As long as applicants (and/or their parents and/or admissions officials) believe standardized tests have more credence than “grit”, the ability to collaborate, creative thinking, and leadership, we will see A students with high SATs displacing B students who have shown they can go out and make a difference in the world.
Several political forecasters believe that the incoming Congress will at long last revisit and revise NCLB as part of the overdue reauthorization of ESEA, and that as part of that process the annual tests that are part of NCLB will be modified in some way. The reasons for moving away from those annual tests are varied. Teachers’ unions oppose the way the tests are being used for student promotion and teacher retention decisions. Parents oppose them because they create undue stress in their children and deprive them of broad curricular opportunities. Rank-and-file teachers oppose them because of the lost time for classroom instruction. So… who wants them and why?
The answer is that politicians and taxpayers want some kind of “accountability” and standardized tests provide a cheap, easy, and seemingly precise means of evaluating the effectiveness of instruction. As a by-product, tests also provide newspapers with a cheap, easy, and seemingly precise way of rating and ranking schools. Finally, the tests also provide those favoring “market-based” schools with evidence that public schools are “failing” and provide data that “parent-consumers” might use to “choose” the best school for their child if only those parents received vouchers that could be redeemed at the schoolhouse door.
A recent Politico article by Caitlin Emma described the background offered in the first paragraph but avoided delving into the rationales for testing outlined in the second paragraph. Moreover, I felt Emma was too generous in giving credit to state legislatures and Governors who are pledging to limit the hours of testing each year. She misses the point that HOW the tests are used is far more important than HOW MANY HOURS the tests take.
For example, if Ohio passed a law insisting that only four hours of standardized testing be done each year but retained its policy of using test results to measure teacher performance, school effectiveness, and student retention, nothing would change in the classrooms. In order to prepare students for the four hours of annual high-stakes tests, administrators and teachers would spend hours preparing for those tests, and many of those hours would involve administering tests that mirror the format used in the annual standardized assessments.
Emma’s article offers an example of one politician who wants to find an even cheaper, easier and less effective means of measuring school and teacher performance while limiting the hours of testing. CT Governor Malloy wants to use the SAT for 11th graders in lieu of the current tests because “…the SAT can double as a test for school accountability and college entrance.” Given that Malloy is a proponent of VAM it follows that he would believe the SAT is a viable means of testing “school accountability”… and if he DOES pull this off I would expect the number of multiple choice tests to proliferate in grades 9-11 in CT schools.
The bottom line: changing the number of standardized tests and the length of classroom time spent on those tests will not make any difference in the classroom unless the stakes are changed dramatically… and there is no evidence that any politician wants to make changes in the stakes.
The Rethinking Schools Facebook feed just posted Valerie Strauss’ April 2014 essay titled “Statisticians Slam Popular Teacher Evaluation Method”. In this post, which addresses a topic I’ve blogged about often, Strauss reports on the findings of the American Statistical Association, a bona fide, objective, longstanding professional organization, regarding VAM, or the Value-Added Method of evaluation:
The ASA just slammed the high-stakes “value-added method” (VAM) of evaluating teachers that has been increasingly embraced in states as part of school-reform efforts. VAM purports to be able to take student standardized test scores and measure the “value” a teacher adds to student learning through complicated formulas that can supposedly factor out all of the other influences and emerge with a valid assessment of how effective a particular teacher has been.
These formulas can’t actually do this with sufficient reliability and validity, but school reformers have pushed this approach and now most states use VAM as part of teacher evaluations. Because math and English test scores are available, reformers have devised bizarre implementation methods in which teachers are assessed on the test scores of students they don’t have or subjects they don’t teach. When Michelle Rhee was chancellor of D.C. public schools (2007-10), she was so enamored with using student test scores to evaluate adults that she implemented a system in which all adults in a school building, including the custodians, were in part evaluated by test scores.
This is an eight-month-old article… and Rhee’s methods are so discredited that she is persona non grata (and at this writing jobless)… yet Governors like Cuomo and Kasich STILL believe VAM is valid, as does the “evidence based” advocate Arne Duncan… I have come to the conclusion that believing in VAM is the neoliberal equivalent of believing in intelligent design.
At first I thought Rethinking Schools was being lazy in publishing this essay on its feed eight months later… and I was reluctant to post yet again on VAM’s flaws… but I have concluded that repetition reinforces learning and increases awareness, and so I am following Rethinking Schools’ lead and hammering away at this issue. Until VAM is discredited it will continue to appeal to politicians and members of the public who want to find a cheap and easy solution to “fixing failing schools”, and VAM gives them apparent “scientific evidence” that teachers are the problem. There’s one problem with this conclusion: it’s wrong!
Two recent articles, one on Cuomo’s dismay at the low number of “failing teachers” from the NYTimes and another provided by my daughter from blogger Carol Burris via Valerie Strauss’ Washington Post Answer Sheet blog, lament NY Governor Cuomo’s insistence that the State double down on the use of value-added metrics (VAM) to measure teacher performance. Why? Because he and the Regents expected the VAM tests to prove public schools were “failing” because of bad teachers. When the VAM tests were administered for two years and fewer than 1% of the teachers failed, Cuomo decided to veto a bill that would have protected those teachers from losing their jobs. Why? Because he expected 10% to lose their jobs! This quote from the Times article with my emphasis added provides insights into Cuomo’s arrogance:
“Given what we now know, it would make no sense to sign this bill and inflate these already inflated ratings,” Mr. Cuomo wrote in his veto message.
What did we know? That the tests designed to identify at least 10% of the teachers as failures were not emphasized enough in the ratings. Consequently, the teachers’ ratings were inflated because of an over-emphasis on, and this is a quote from the NYTimes reporter Kate Taylor,
…subjective measures, like principals’ evaluations, which in many districts were overwhelmingly favorable to teachers.
Before Governor Cuomo, Regents Chair Tisch, and reporters like Kate Taylor characterize “principals’ evaluations” as “subjective”, they might want to take a look at a sampling of the methods used. And before they claim an arbitrary percentage of teachers are “deficient”, they might want to compare notes with corporate leaders who long ago abandoned the idea that ridding themselves of the lowest performers was the road to success. And last but not least, they ought to check with principals like Carol Burris who know that bad teachers cannot succeed in schools with engaged parents and, as a result, schools with engaged parents don’t need VAM tests. Here’s Burris’ description of parents’ reactions when Sheri Lederman, one of her stellar teachers, received low VAM scores:
(P)arents have been disinterested in APPR scores. Although they can request their child’s teacher’s APPR score, not one parent in my district has asked for it during the two years that APPR has been in effect. Most principals report that parents simply do not care. Teachers like Sheri have a great reputation because of the years of loving care and great instruction they have given their students. Moms don’t need a score to know that.
Unfortunately, “…loving care and great instruction” can’t be reduced to a number, displayed on a spreadsheet, and ranked. And any experienced educator knows that the VAM tests can’t capture “…loving care and great instruction”. That can only be evaluated by “subjective” principals… and Moms.