Over 15 years, Todd Farley worked throughout the standardized testing industry. He worked as a lowly scorer, a table leader (supervising the lowly scorers), a project manager, an item writer, some kind of administrator/analyst at a testing company's headquarters, and a consultant. He worked for Educational Testing Service (ETS), Pearson, National Computer Systems, and others. He worked on the California High School Exit Exam, the SAT, the Nation's Report Card (NAEP), and myriad others. From that wealth of experience, Farley draws one hilarious and cringe-worthy anecdote after another: scorers for reading tests who don't speak English, blatant meddling with reliability statistics, et cetera, et cetera.
I recommend the book: It was consistently entertaining, and some of the critiques are clearly important, such as the ease with which testing companies can doctor their statistics and the number of poorly qualified scorers who are grading your child’s SAT.
However, several of Farley's critiques apply to any testing, including classroom testing. His account of his first experience as a scorer describes the challenge of grading a question in which fourth graders had to read an article about bicycle safety and then draw a poster highlighting bicycle safety rules. Unsurprisingly, many of the posters were difficult to interpret. As any teacher will agree, this is a problem with any testing, not just standardized testing.
At the end of the book, Farley recommends that we trust the evaluations of classroom teachers (Mrs. White and Mr. Reyes are his examples) rather than standardized evaluations. That, however, is of little use to a university admissions officer who must choose between a student from Mrs. White's class and a student from Mr. Reyes's class. Moreover, Farley himself describes teachers as horrible scorers, in part because they "make huge leaps when reading the student responses, convinced they knew what a student was saying even if that didn't match the words on the page" (236).
Many of the critiques, such as that scorers are very poorly qualified or that testing companies easily manipulate statistics to hide low-quality scoring, are issues of oversight. Better oversight costs money, which suggests there may be no way America can get the testing it wants for the price it currently pays. Perhaps this means less testing, done better, along with monitoring systems that better guard against cheating. It probably means higher standards for scorers, which in turn means higher wages for scorers. (An insufficient supply of scorers is a recurring problem: more than once, Farley was fired for failing a scoring test and then rehired within a day after the standards were lowered.) And when a testing company refuses to produce new, better test items because its contract says it doesn't have to, that signals the need for better contracts.
None of this suggests that standardized testing should be tossed out entirely. Any student evaluation, whether a classroom evaluation or a "standardized" one, will contain some useful information and some random noise (see note). The focus should be on increasing the information and on recognizing (and acting on) cases where the noise is so great that the evaluation is worthless.
Overall, this is an important book that will hopefully be read by education policymakers. But I hope they will use it to improve the system, not to toss it out the window.
Note: As long as different items are scored by different scorers (which they are), the fact that some scorers are too harsh and others too lenient should wash out in comparisons across large samples. For example, harsh scorers would lower scores in both great schools and good schools, so the test results would still show that great schools are doing better than good schools. We may not have that confidence when comparing two individual students, given the much smaller sample size.
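To make the note's reasoning concrete, here is a toy Python simulation. Every number in it (the score scale, a pool of 50 scorers, the spread of scorer harshness, ten items per student, 500 students per school) is an arbitrary assumption for illustration, not data from Farley's book. It also assumes each item is assigned to a scorer at random, independently of which school the student attends. The function counts how often the truly weaker group ends up averaging at least as high as the truly stronger one.

```python
import random
import statistics

# All parameters below are hypothetical, chosen only to illustrate the note.
N_SCORERS = 50      # size of the scorer pool (assumed)
BIAS_SD = 0.5       # spread of scorer harshness/leniency (assumed)
NOISE_SD = 0.5      # per-item random scoring noise (assumed)


def student_mean(true_score, n_items, scorer_biases, rng):
    """Average observed score for one student.

    Each item goes to a randomly chosen scorer, whose fixed bias
    (harsh < 0, lenient > 0) is added to the student's true score,
    plus independent per-item noise.
    """
    total = 0.0
    for _ in range(n_items):
        bias = rng.choice(scorer_biases)
        total += true_score + bias + rng.gauss(0.0, NOISE_SD)
    return total / n_items


def reversal_rate(true_high, true_low, group_size, n_items, n_trials, seed=0):
    """Fraction of trials in which the truly weaker group averages at least
    as high as the truly stronger group. group_size=1 compares two students."""
    rng = random.Random(seed)
    reversals = 0
    for _ in range(n_trials):
        # Both groups draw from the same scorer pool, so a harsh scorer
        # can land on either group's responses.
        scorer_biases = [rng.gauss(0.0, BIAS_SD) for _ in range(N_SCORERS)]
        high = statistics.fmean(
            student_mean(true_high, n_items, scorer_biases, rng)
            for _ in range(group_size)
        )
        low = statistics.fmean(
            student_mean(true_low, n_items, scorer_biases, rng)
            for _ in range(group_size)
        )
        if low >= high:
            reversals += 1
    return reversals / n_trials


if __name__ == "__main__":
    school_rate = reversal_rate(3.0, 2.7, group_size=500, n_items=10, n_trials=50)
    student_rate = reversal_rate(3.0, 2.7, group_size=1, n_items=10, n_trials=2000)
    print(f"great vs. good school ranked wrong: {school_rate:.1%}")
    print(f"two individual students ranked wrong: {student_rate:.1%}")
```

Under these assumptions, the school comparison essentially never flips, because each school's average pools thousands of item scores, while the two-student comparison flips in a noticeable share of trials. The random assignment is what matters: if one harsh scorer graded every response from one school, that bias would not wash out.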
All that being said, let me add a couple of other less central critiques:
+ Several times Farley suggests that a fundamental problem is that for-profit companies are doing this work, rather than educators with children's best interests at heart. And yet the educators who appear in the story seem no better at evaluating than the for-profit companies.
+ I was disappointed that Farley treats his own ignorance as something of a point of pride. For example, he several times describes the psychometricians (in this context, statisticians who specialize in analyzing test data) as imposing counterproductive rules, without ever taking the trouble to examine what psychometricians do or why their work is part of the process. Certainly statistics shouldn't overrule good sense (and unfortunately they sometimes do), but they can also help expose test scorers who manipulate results (like Farley and his colleagues).
Objectionable content? The book has one mention of the title of a pornographic movie plus a light smattering of strong language.