
This began as a comment in the diary Teacher Ken posted re: a number of well-regarded educators coming to the support of teachers in WA state who are boycotting their district's mandatory standardized test (it's worth a read - go look).

That particular test is especially egregious, because while it is supposed to be designed exclusively for the district, and based on the district's curriculum, it is not. The test seems almost specifically designed to force the district to fail. Of course that's conjecture on my part, but it would certainly fit well with the goals of the people who most strongly push the testing mantra: those who want to see more for-profit schools funded by taxpayer dollars at the expense of public schools. Since the test was foisted upon the district by a former administrator who had been fired for incompetence, well, it's not really far-fetched to expect that revenge against the district could have played a small role in choosing the test.

But the MAP test isn't the only high-stakes test out there. There are plenty of them clogging up school systems throughout the country. These tests are highly profitable, which is part of the reason the Bush Administration put so much effort into making them a national requirement. After all, the president's own brother owns Ignite, a company that provides standardized test preparation services to students.

One of the arguments encouraging us to let the Bush brother connection slide at the time was along the lines of, "Eh, what's a little conflict of interest, when our children's futures are at stake?"

Well, our children's futures were at stake, not because we weren't testing them with these magical, highly-profitable, future-predicting tests, but because we were about to launch the greatest offensive against public education the nation had ever seen - by implementing these tests.

Follow me below the orange croissant for some meaty goodness ....

It is well-documented that the test results are falsified and/or entirely useless, both at the local level (watch for the annual stories about the teacher who is fired when he/she has been caught replacing student answers in order to improve scores) and via the testing companies themselves.

While a brief, honest test of basics could give teachers a handle on areas where individual students could use some additional tutoring, these day-long or multi-day tests containing biased questions and graded dishonestly do nothing of any use to society. They are simply siphons to suck public funding into private pockets.

Here's an excerpt from an essay awarded the highest score by the Educational Testing Service's grader:


    In today's society, college is ambiguous. We need it to live, but we also need it to love. Moreover, without college most of the world's learning would be egregious. College, however, has myriad costs. One of the most important issues facing the world is how to reduce college costs. Some have argued that college costs are due to the luxuries students now expect. Others have argued that the costs are a result of athletics. In reality, high college costs are the result of excessive pay for teaching assistants.

    I live in a luxury dorm. In reality, it costs no more than rat infested rooms at a Motel Six. The best minds of my generation were destroyed by madness, starving hysterical naked, and publishing obscene odes on the windows of the skull. Luxury dorms pay for themselves because they generate thousand and thousands of dollars of revenue. In the Middle Ages, the University of Paris grew because it provided comfortable accommodations for each of its students, large rooms with servants and legs of mutton. Although they are expensive, these rooms are necessary to learning. The second reason for the five-paragraph theme is that it makes you focus on a single topic. Some people start writing on the usual topic, like TV commercials, and they wind up all over the place, talking about where TV came from or capitalism or health foods or whatever. But with only five paragraphs and one topic you're not tempted to get beyond your original idea, like commercials are a good source of information about products. You give your three examples, and zap! you're done. This is another way the five-paragraph theme keeps you from thinking too much.

This was written by an MIT writing professor, specifically to test the grading system that is providing make-or-break scores on our children's futures.

That horrifying essay's score was provided by a scoring robot - an algorithm that uses some basic rules to determine how "good" the essay is. But not all tests are scored by robots - surely human scorers do better, right? Nope:

   The study, funded by the William and Flora Hewlett Foundation, compared the software-generated ratings given to more than 22,000 short essays, written by students in junior high schools and high school sophomores, to the ratings given to the same essays by trained human readers.

    The differences, across a number of different brands of automated essay scoring software (AES) and essay types, were minute. “The results demonstrated that over all, automated essay scoring was capable of producing scores similar to human scores for extended-response writing items,” the Akron researchers write, “with equal performance for both source-based and traditional writing genre.”

The essay is make-or-break in admissions decisions at a number of high-level schools. And it's easy to game, at least if you know the rules. If you know any college-bound high school students, send them this. They can thank you later. Note: the math section can also be gamed, just by learning what tricks they use to try to cause students to give the wrong answer, whether or not the student actually knows how to do the math. But that section is scored entirely by bots. The essay is where human graders have come clean about the cheating.
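To make concrete what a surface-rubric scorer might look like, here is a minimal, entirely hypothetical sketch. The features below (word count, transition words, quotation marks) are illustrative guesses at the kind of rules described above, not any vendor's actual algorithm:

```python
# Hypothetical rule-based essay scorer. The heuristics below (length,
# transition words, quotation marks) are illustrative only; they are not
# taken from any real scoring engine.

TRANSITIONS = {"moreover", "however", "therefore", "furthermore", "consequently"}

def score_essay(text: str) -> int:
    """Return a 1-6 score from surface features only; meaning is never checked."""
    words = text.lower().split()
    score = 1
    # Length dominates: roughly one point per hundred words, capped.
    score += min(3, len(words) // 100)
    # Reward "academic" transition words.
    if any(w.strip('.,;:"') in TRANSITIONS for w in words):
        score += 1
    # Reward anything that looks like a quotation.
    if '"' in text:
        score += 1
    return min(score, 6)
```

Note what a rubric like this implies: a factually absurd 400-word essay peppered with "moreover" outscores a correct, concise 100-word one, because nothing in the rules ever examines what the essay actually says.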

But, but, but these tests tell us how people will perform in real life, right? They help us sort the winners from the losers, allowing the winners greater opportunity to make something of themselves in the world, while preventing the losers from wasting their time and money on expensive college educations that won't do them any good. (Or so the mantra goes). But, um, no:

   Roach took the FCAT himself last year and failed dismally, getting wrong 84 percent of the math questions and only scoring 62 percent on the writing portion, which would get him a “mandatory assignment to a double block of reading instruction,” according to Roach.

    “It seems to me something is seriously wrong. I have a bachelor of science degree, two masters’ degrees, and 15 credit hours toward a doctorate. I help oversee an organization with 22,000 employees and a $3 billion operations and capital budget, and am able to make sense of complex data related to those responsibilities.

    “It might be argued that I’ve been out of school too long, that if I’d actually been in the 10th grade prior to taking the test, the material would have been fresh. But doesn’t that miss the point? A test that can determine a student’s future life chances should surely relate in some practical way to the requirements of life. I can’t see how that could possibly be true of the test I took,” wrote Roach.

But wait, there's more! The numbers assigned are, by and large, faked to create an appearance of consistency from year to year within a school district. It doesn't matter if students actually do better, because the scores are not allowed to deviate from the statistical norm for the school system:

   What is the work itself like? In test-scoring centers, dozens of scorers sit in rows, staring at computer screens where students’ papers appear (after the papers have undergone some mysterious scanning process). I imagine that most students think their papers are being graded as if they are the most important thing in the world. Yet every day, each scorer is expected to read hundreds of papers. So for all the months of preparation and the dozens of hours of class time spent writing practice essays, a student’s writing probably will be processed and scored in about a minute.


    There is a common fantasy that test scorers have some control over the grades they are giving. ...[snip]...  Usually, within a day or two, when the scores we are giving are inevitably too low (as we attempt to follow the standards laid out in training), we are told to start giving higher scores, or, in the enigmatic language of scoring directors, to “learn to see more papers as a 4.” For some mysterious reason, unbeknownst to test scorers, the scores we are giving are supposed to closely match those given in previous years. So if 40 percent of papers received 3s the previous year (on a scale of 1 to 6), then a similar percentage should receive 3s this year. Lest you think this is an isolated experience, Farley cites similar stories from his fourteen-year test-scoring career in his book, reporting instances where project managers announced that scoring would have to be changed because “our numbers don’t match up with what the psychometricians [the stats people] predicted.” Farley reports the disbelief of one employee that the stats people “know what the scores will be without reading the essays.”

    I also question how these scores can possibly measure whether students or schools are improving. Are we just trying to match the scores from last year, or are we part of an elaborate game of “juking the stats,” as it’s called on HBO’s The Wire, when agents alter statistics to please superiors? For these companies, the ultimate goal is to present acceptable numbers to the state education departments as quickly as possible, beating their deadlines (there are, we are told, $1 million fines if they miss a deadline). Proving their reliability so they will continue to get more contracts.

Why do they "juke the stats"? Because if the stats aren't consistent from year to year, states become less willing to pay for the same test the next time: they conclude the test itself is bad, rather than accept that different groups of students can simply earn different scores. Thus, to ensure continued profitability, the testing companies must cheat.

The goals of these tests are entirely unrelated to education - at least on the creation/grading end. It doesn't matter what the state or local school system thinks they're trying to accomplish with the testing. Parents and school systems have bought a testing pig in a poke, and if they ever actually open the bag, they'll discover the pig is actually "Blinky" the three-eyed fish from the Simpsons.

Originally posted to Radical Simplicity on Tue Jan 22, 2013 at 10:57 AM PST.

Also republished by Education Alternatives.


  •  absurd illogic (0+ / 0-)

    You take one example of a paper specifically designed to fool the scoring engine. Then you say that because, in a large well-designed study, scoring engines matched human scores, that means human scores are flawed.

    You are wrong.

    Just because a professor can deliberately set out to fool the scoring engine does not mean that all students do it or could do it.  Well designed studies show that scoring engines agree with human scores as much or more than humans agree with humans.

    Human grading is well designed and well implemented.  Large scale essay grading is more objective and consistent than grading by a single person.

    •  Actually, no (1+ / 0-)
      Recommended by:
      Linda Wood

      I posted links to everything in the diary, including the story that details that the professor's students were able to write an app to create nonsense that fools the bots every time.

      The study regarding scores by humans was undertaken to prove that the bots do as well as the humans, who are taught to grade by the same rules that were used to create the bots - it was a validation of the bots' accuracy.

      The claim that human grading is well designed and well implemented is belied by the fact that the bots give the same scores that humans give. If you then read the articles by actual human graders, you will find that they are given specific rules to use for grading, which do not take quality into account, and are given approximately 60 seconds per essay to issue a score on that essay. To claim that a reader is going to read 2 pages of hand-written text, vet it for clarity and accuracy, and score it based on anything other than the surface criteria provided (use of certain words, including quotes from famous sources, etc.) is sadly misinformed about the way the testing system works in real life.

      The diary is an overview of the failure modes.  I provided links to back up everything in this diary. You may find the content at the other ends of the links to be enlightening. An exhaustive critique would require a book, not a diary. Here's one: Making the Grades, written by one of the graders, who advanced through the testing industry to see the way it works, from top to bottom.

      The system they use today is not the system that was in place 30 or 40 years ago, which was used to determine the best classes to which to assign students in the upcoming grade. Those were useful. The new tests are not.

      Do you know what the number 1 predictor of your essay score is? Length. The longer the essay, the better the grade. Period.

      In the next weeks, Dr. Perelman studied every graded sample SAT essay that the College Board made public. He looked at the 15 samples in the ScoreWrite book that the College Board distributed to high schools nationwide to prepare students for the new writing section. He reviewed the 23 graded essays on the College Board Web site meant as a guide for students and the 16 writing "anchor" samples the College Board used to train graders to properly mark essays.

      He was stunned by how complete the correlation was between length and score. "I have never found a quantifiable predictor in 25 years of grading that was anywhere near as strong as this one," he said. "If you just graded them based on length without ever reading them, you'd be right over 90 percent of the time." The shortest essays, typically 100 words, got the lowest grade of one. The longest, about 400 words, got the top grade of six. In between, there was virtually a direct match between length and grade.

      He was also struck by all the factual errors in even the top essays.


      Dr. Perelman contacted the College Board and was surprised to learn that on the new SAT essay, students are not penalized for incorrect facts. The official guide for scorers explains: "Writers may make errors in facts or information that do not affect the quality of their essays. For example, a writer may state 'The American Revolution began in 1842' or ' "Anna Karenina," a play by the French author Joseph Conrad, was a very upbeat literary work.' " (Actually, that's 1775; a novel by the Russian Leo Tolstoy; and poor Anna hurls herself under a train.) No matter. "You are scoring the writing, and not the correctness of facts."
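      The near-perfect length-to-score relationship described above can be illustrated with a quick correlation check. The word counts and scores below are made-up toy data mimicking the pattern Dr. Perelman reports, not actual College Board samples:

```python
# Toy (word count, score) pairs; illustrative only, not real SAT data.
lengths = [100, 150, 220, 280, 340, 400]
scores = [1, 2, 3, 4, 5, 6]

def pearson(xs, ys):
    """Plain Pearson correlation coefficient, no external libraries."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

print(pearson(lengths, scores))  # very close to 1.0 on this toy data
```

      On data like this, predicting the score from length alone is right essentially every time - which is exactly the pattern Perelman found in the real graded samples.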

      •  correlation is not causation (0+ / 0-)

        Length of essay responses is correlated to quality factors.  Short essays are typically not very good because they are not well developed.

        As to scoring criteria, those are clearly stated upfront.  That is why large scale scoring is more reliable than individual scoring.

        Do you know about how human scorers are constantly assessed each day and throughout the day to ensure that they are following the scoring guide? Do you know they must pass calibration assessments at least once a day? Do you know they typically get blind "check" papers throughout the day to confirm that they are scoring correctly?

        •  Clearly, you're not reading the linked content (0+ / 0-)

          Yes, actually, I do know about the assessments, the entire process is well-described by the graders who wrote the articles and books to which I linked.

          The assessment does nothing to ensure that essays are graded in a way that has anything to do with quality, but rather ensures that they are scored according to the rules on which the graders are trained, and on which the bots are based, and which have exactly nothing to do with the student's ability to write well and accurately. Those rules let essays such as the one Dr. Perelman wrote (and other nonsense essays) get excellent scores, while high quality, but succinct essays fail.  

          For some reason, I get the feeling you're employed by one of these companies.

          •  you assume that individual (0+ / 0-)

            graders understand how the entire system works.

            You assume that the "rules" are meaningless.

            Your assumption about me is wrong as well.

            •  Showing once again that you have not read (0+ / 0-)

              ... the source material.

              One of the graders was promoted through to management (the one who wrote the book). He does know how the system works from the top down. In addition, it's very clear from the kinds of essays that get good marks, that the grading system rules are meaningless in terms of determining whether or not a student knows how to write a quality, meaningful, concise essay. A brilliant, accurate, but concise essay will lose points due to brevity.

              I have provided data to back up the information in this diary. You have chosen not to review it. In addition, you have chosen not to provide any data to back up your assertions. If you have data to back up your claims, you are free to cite the sources. Otherwise, it is pointless to continue this thread, since you are choosing not to participate in a dialog based on the data, but rather to provide an apologia.

  •  Thank you for the Bush references. (0+ / 0-)

    Thank you for connecting the most corrupt influences in our country to this reform debate. I write this as a person who believes reform has been called for in public education for decades, especially at the primary level, but who also believes the opportunity for corruption of the efforts to reform is always present, especially whenever anyone associated with the Bush people is involved.

    What I struggle with in the debate here at Daily Kos is the over-simplification of the subjects we're debating, so that "standardized testing" is condemned, when most of us don't really know what is meant, at any given moment, by that term.

    Therefore I also really appreciate your giving concrete examples of some of the horrors possible in the testing business. I find both the robotized process and the human process of evaluation, as you've shown them here, to be shocking, nuts, and worth talking about more at Daily Kos.
