Statistical analysis of Rasmussen's accuracy

by Michiganliberal

Community

(This content is not subject to review by Daily Kos staff prior to publication.)

Tuesday, Mar. 01, 2011 at 4:13:02pm PST

I'm writing this diary mainly because I've spent a while arguing with Rasmussen's partisan defenders, and I want to have solid numbers ready for my own reference in case of future disputes. I haven't diaried for quite a while, and I don't expect this to get much attention. Be warned, everyone - this will be a pretty math-heavy diary.

By now, all of us have heard that Rasmussen is fairly biased and inaccurate (as Mr. Silver has done an excellent job of arguing). This diary is intended to put a mathematically semi-robust confidence level on Rasmussen's inaccuracy. As the subject of analysis I use Rasmussen's infamous 2010 HI-Sen poll, which showed Inouye up by only 13 points. In the end, he won by 53.

Rasmussen's defenders explain the poll as "just an outlier" and insist that it be removed from the formula Nate uses to derive his weightings. But making that argument simply reveals an ignorance of basic statistics. Statistics does not predict any result with 100% certainty, but there are limits on how far outliers can plausibly stray. "95% confidence of a 3% margin of error" does not mean that the other 5% of the time the poll can be off by 50%. If we start from the assumption that Rasmussen is a fair, unbiased, competent pollster, we can calculate the probability that a poll was conducted legitimately and just happened to be off by such a large margin.

Polling is an example of a binomial distribution - the pollster selects N people from the general population, each of whom has a probability p of voting Democrat (as there were very few undecideds/3rd-party candidates, I'll ignore them in the interests of simplicity for this analysis). Such a distribution has a standard deviation - the rough error - of sqrt(N*p*(1-p)) in the number of respondents, or sqrt(p*(1-p)/N) as a share of the sample.
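As a rough illustration of that formula (a sketch only: the sample size of 500 is the figure used later in this diary, and the function name is made up for this example), here is the standard deviation and the familiar 95% margin of error it implies at p = 50%:

```python
import math

def binomial_std(n: int, p: float) -> float:
    """Standard deviation of the number of 'yes' answers among n respondents."""
    return math.sqrt(n * p * (1 - p))

n = 500                           # sample size assumed for this illustration
count_sd = binomial_std(n, 0.5)   # spread measured in respondents
share_sd = count_sd / n           # same spread as a fraction of the sample
moe_95 = 1.96 * share_sd          # usual 95% margin of error on one candidate's share

print(f"sd = {count_sd:.2f} respondents, or {share_sd:.2%} of the sample")
print(f"95% margin of error is about {moe_95:.1%}")
```

The 1.96 multiplier is the standard normal-approximation factor for a 95% interval, which is fine near the middle of the distribution even though it is not the right tool far out in the tails.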

Note that for calculating the margin of error in polls, we assume p=50% to maximize the error, as we don't know the actual probability before the election. Here, though, we have the election results to guide our analysis. The question we are asking is therefore: if we select 500 people, each of whom has a 75% chance of voting for Inouye, what is the probability that no more than 265 of them (53% - Ras's #) answer that they will do so?
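To get a feel for how far out that is, here is a small sketch (using only the rounded numbers above; the variable names are just for illustration) that expresses the shortfall in standard deviations:

```python
import math

n = 500          # respondents in the poll
p = 0.75         # Inouye's approximate share, taken from the election result
observed = 265   # respondents backing Inouye in the poll (53% of 500)

expected = n * p                       # how many we would expect on average
sd = math.sqrt(n * p * (1 - p))        # binomial standard deviation, in respondents
z = (expected - observed) / sd         # shortfall measured in standard deviations

print(f"expected {expected:.0f}, observed {observed}, sd {sd:.2f}")
print(f"shortfall of {expected - observed:.0f} respondents = {z:.1f} standard deviations")
```

For comparison, the usual 95% margin of error corresponds to about two standard deviations.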

The typical ('quick') approximation is that for non-small N, the binomial distribution is pretty much a normal distribution, whose probabilities we can calculate directly. However, this is not necessarily accurate in the tails - a quick way to see it is that the normal distribution never goes to 0, so we could have a ridiculously small but non-0 probability of, say, a result of 150%, which is unphysical. This usually does not come up except in horrendously unlikely occurrences - such as Rasmussen's HI poll which I'm examining. Anyway, we'll do this more robustly by using a rigorous bound on the pure binomial distribution.
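For anyone who wants to see the tail issue directly, here is a hedged sketch (not part of the original analysis; the function names are invented for this example) that computes the normal-approximation tail and the exact binomial tail side by side. The exact version is hard-coded to p = 3/4 so it can use whole-number arithmetic, and it needs Python 3.8+ for math.comb:

```python
import math

def normal_tail(n: int, p: float, k: int) -> float:
    """Normal approximation to P(X <= k), with the usual continuity correction."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    z = (k + 0.5 - mean) / sd
    # Standard normal CDF written via erfc, which stays accurate far out in the tail.
    return 0.5 * math.erfc(-z / math.sqrt(2))

def exact_tail_three_quarters(n: int, k: int) -> float:
    """Exact P(X <= k) for X ~ Binomial(n, 3/4).

    Each term C(n, i) * (3/4)**i * (1/4)**(n - i) equals C(n, i) * 3**i / 4**n,
    so the sum can be done in exact integers and divided once at the end.
    """
    numerator = sum(math.comb(n, i) * 3**i for i in range(k + 1))
    return numerator / 4**n

approx = normal_tail(500, 0.75, 265)
exact = exact_tail_three_quarters(500, 265)
print(f"normal approximation: {approx:.2e}")
print(f"exact binomial tail:  {exact:.2e}")
```

Both numbers are absurdly small, but they need not agree with each other this far from the mean, which is the reason for not leaning on the normal curve here.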

Per Hoeffding's Inequality, P(X <= k) <= exp(-2*(N*p-k)^2/N), so with N=500, p=0.75 and k=265 the maximum limit on this probability is exp(-48.4), or roughly 10^-21. That is to say, even using a generous bound, if we assume Rasmussen is a competent/unbiased/accurate pollster, it would need to conduct on the order of 10^21 (a sextillion) polls in order to have one with such an unusual result.
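For reference, the one-sided form of Hoeffding's inequality for N independent yes/no draws is P(X <= k) <= exp(-2*(N*p-k)^2/N) whenever k <= N*p. A minimal sketch plugging in the rounded inputs from this diary (N = 500, p = 0.75, k = 265; the function name is made up for illustration) gives a bound on the order of 10^-21:

```python
import math

def hoeffding_lower_tail(n: int, p: float, k: int) -> float:
    """Hoeffding upper bound on P(X <= k) for X ~ Binomial(n, p).

    Only meaningful for the lower tail, i.e. when k is at or below the mean n*p.
    """
    if k > n * p:
        raise ValueError("k must not exceed the expected count n*p")
    return math.exp(-2 * (n * p - k) ** 2 / n)

bound = hoeffding_lower_tail(500, 0.75, 265)
print(f"Hoeffding bound: {bound:.2e}")
print(f"polls needed to expect one such result: at least {1 / bound:.1e}")
```

Since this is an upper bound rather than the exact probability, the true chance of such a miss under the fair-pollster assumption is smaller still.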

I rest my case.