I find myself checking DailyKos at least a dozen times (ok, maybe two dozen?!) a day for updated poll news. I scan Political Wire and Nate's blog and RCP many, many times a day for the slightest hint of movement. At this point, as soon as a new poll hits I've an awfully good idea (if not a perfect one) of how it fits into the context of the pollster's previous results, other recent polls in the state, its partisan lean, etc. My mood rises and falls on the barest hint of a trend one way or the other.
And really? It's a colossal waste of time.
My reptilian mid-brain can't stop being terrified. That fear keeps me obsessively, obsessively checking: "Are we going to be ok? Will we just punt and hand even the hope of future change over to the Mourdock/Cheney/Romney bastards?"
My analytical brain knows that subtle poll shifts can't tell us much in a quasi-close election. Extrapolating from any poll's responses to what's going on in the general population requires really strong assumptions. The poll's "margin of error" is, most definitely, the least of it.
Partly as therapy, I want to describe some of the obvious complications with trying to "deduce" the mood of the population from a poll (any poll); the contents won't be news to anyone who's studied this stuff, but may well be to those who haven't. More below the jump.
Courtesy of Nate Silver, many of us have been schooled in a great many of the vagaries of polling.
We attend to a poll's sample size. Why? Because the smaller the sample, the greater the chance that it isn't representative of the population.
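To put a rough number on that: under textbook simple random sampling, the 95% margin of error shrinks only with the square root of the sample size. A quick sketch (the sample sizes here are just illustrative, and real polls use weighting, so treat these as optimistic lower bounds):

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion p with sample
    size n, assuming simple random sampling."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (200, 600, 1000, 2000):
    print(f"n = {n:5d}: +/- {margin_of_error(n):.1%}")
# n =   200: +/- 6.9%
# n =   600: +/- 4.0%
# n =  1000: +/- 3.1%
# n =  2000: +/- 2.2%
```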
We attend to which pollsters sample from landlines only, or also from cell phones. Why? If those who only use a cell phone are systematically, even if only subtly, different from those who also (or only) have a landline, then the sample won't be representative of the population as a whole; indeed, the parameters being estimated will be biased.
We attend to demographics of those polled. Why? Too many women, or men, or blacks, or latinos, or young, or old, or whatever and (again) the sample won't be representative of the population; the parameters being estimated will be biased.
We attend to whether the population being sampled is Registered Voters or so-called Likely Voters. Why? Because a "likely voter" screen bakes in the pollster's own guess about who will actually turn out.
But it's only relatively recently that Nate clued me in to the fact that poll response rates are incredibly low (of the few polls I've looked at half-way closely, the response rate can be as low as 2%, and is rarely above 10%). Why does this matter if the sample that responds has all of the "appropriate" measurable features that you think make it representative of the population as a whole (demographics, cell vs. landlines, etc.)?
It is not at all unreasonable to suspect that those who respond to a poll are "different" from those who do not (drawn, in effect, from a different population than the one the poll hopes to represent). After all, the very fact that so few people respond to polls is itself evidence that poll responders are exceptional people. The basic assumption generally made is that poll responders, while exceptional in their willingness to answer the survey instrument, are nevertheless exactly typical (representative) of those who do not respond insofar as voting preferences are concerned. That is, the population that does not respond to polling has the same preferences as the population that does.
This is a strong assumption. It would be super unsurprising if it's importantly wrong. And it would be unsurprising if what demarcates someone who is willing to respond to a survey from someone who is not changes over time (so that predicting the "non-response bias" from past performance might itself be dodgy). After all, the intrusions of marketers and media and push polls and the like continue to grow. Those still willing to engage with pollsters, therefore, would seem to have thicker skins for that crap, and such skins might (just might) correlate with voting preferences. A toy simulation below shows how quickly this bites.
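Here's a minimal simulation of that worry, with made-up numbers purely for illustration: a dead-even electorate in which one side's supporters are just slightly more willing to pick up the phone (6% vs. 4% response rates).

```python
import random

random.seed(42)

# Hypothetical numbers purely for illustration: a 50/50 electorate in
# which Obama supporters respond to pollsters at 6% and Romney
# supporters at 4%.
POPULATION = 1_000_000
TRUE_OBAMA_SHARE = 0.50
RESPONSE_RATE = {"Obama": 0.06, "Romney": 0.04}

responses = []
for _ in range(POPULATION):
    vote = "Obama" if random.random() < TRUE_OBAMA_SHARE else "Romney"
    if random.random() < RESPONSE_RATE[vote]:
        responses.append(vote)

obama_in_poll = responses.count("Obama") / len(responses)
print(f"True Obama share:   {TRUE_OBAMA_SHARE:.1%}")  # 50.0%
print(f"Poll's Obama share: {obama_in_poll:.1%}")     # ~60%
```

A two-point gap in willingness to respond, invisible to every demographic weight, turns a tied race into what looks like a 60/40 race in the poll.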
A simple math example is useful for understanding how serious this problem can be. The probability that a random voter will vote for President Obama can be expressed as follows (where P( ) means "the probability of whatever is inside the parentheses"):
P(will vote for Obama) =
P(respond to the survey)*P(will vote for Obama, conditional on having responded) +
P(not respond to the survey)*P(will vote for Obama, conditional on not having responded)
Let's assume that on average P(respond to the survey) = 5%. Then:
P(will vote for Obama) =
5%*P(will vote for Obama, conditional on having responded) +
95%*P(will vote for Obama, conditional on not having responded).
The survey instrument measures P(will vote for Obama, conditional on having responded). Accordingly, the thing we really care about -- P(will vote for Obama) -- carries a 95% weight on something we never observe, and cannot observe. That unobservable probability needn't be very different at all from the observable one for the poll to be importantly at odds with the actual population's voting intentions.
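To see how much leverage that 95% weight has, here's the arithmetic from above with a hypothetical poll reading of 52% among responders, sliding the unobserved non-responder number by just a couple of points in each direction:

```python
RESPONSE_RATE = 0.05
POLL_READING = 0.52  # hypothetical P(will vote for Obama | responded)

# P(Obama) = P(respond)*P(Obama | responded)
#          + P(not respond)*P(Obama | did not respond)
for nonresponder_share in (0.48, 0.50, 0.52, 0.54):
    overall = (RESPONSE_RATE * POLL_READING
               + (1 - RESPONSE_RATE) * nonresponder_share)
    print(f"P(Obama | no response) = {nonresponder_share:.0%} "
          f"-> P(Obama) = {overall:.1%}")
# P(Obama | no response) = 48% -> P(Obama) = 48.2%
# P(Obama | no response) = 50% -> P(Obama) = 50.1%
# P(Obama | no response) = 52% -> P(Obama) = 52.0%
# P(Obama | no response) = 54% -> P(Obama) = 53.9%
```

The poll reading never moves, yet the quantity we actually care about swings from a losing ~48% to a comfortable ~54%. Nearly all the action is in the term no poll can see.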
I'm not saying that polls tell us nothing. What they can tell us, perhaps, is something about trends in sentiment. But what keeps me up at night, what makes me unable to breathe, what fills my heart with fear and anxiety are not trends, but levels, of voter sentiment. With all my heart, for me, and us, and especially my kids, I want P(will vote for Obama) > P(will vote for Romney). And the real bitch? I can't know this from polls. I can make more or less reasonable inferences. But given that the really crucial thing that matters isn't observable, I cannot deduce it.