Like many other people this time of year, I'm obsessed with the polls. But lately I've realized I'm actually more obsessed with something else: the analysis behind the polls. And by 'analysis' I don't mean the pathetic coverage we often see in the media from pundits who don't have the first clue about basic statistics. I'm talking about statistical models like those used by Nate Silver and others. People are glued to 538 as if their lives depended on it, but did you know that the Princeton Election Consortium gives Obama 85% odds of winning? Or that Election Analytics at the University of Illinois gives him 97% odds? And professors De Sart and Holbrook at UVU give Obama an 87% chance of re-election as of today -- not too shabby!
Even with a master's in physics, I'd like to pretend that I can follow these models and lay out the exact assumptions they employ, but I'm honest enough to admit that I don't understand most of their features. I would need to spend a significant amount of time studying the nitty-gritty details before I could claim competence in how they work. So to spare myself, and the rest of you, a whole lot of heartache and confusion, I want to introduce a new way of thinking about the polls. Forget 538, forget Princeton, and forget the University of Illinois. I give you: the Rule of Ten.
Before I tell you what it is, let me tell you what it's not. The Rule of Ten is not a statistical model, and it's not a computer program of any kind. It's just an easy, convenient, and reliable way of thinking about polls, intended to provide context where often none exists. The rule is this: analyze the last ten polls in any contest, as long as those polls are reasonably fresh. It rests on a simple idea, that poll analysis should always be aggregated, and that idea in turn rests on a central principle of statistics: the law of large numbers. That law states that as the number of independent trials increases, the average of their results approaches the 'true' average, or expected value. In other words, the more polls we average, the more confident we can be that the result gives us a good picture of reality. It doesn't matter that some polls were conducted by the Evil Republican Pollsters because, if a roughly equal number were conducted by Democratic pollsters, the differences will wash out in the average and the 'neutral' pollsters will survive unscathed. So under this scheme, weighting of any kind is unnecessary -- as long as the polls represent the full range of biases, including no bias at all. To give you a better idea of how it works, let's first examine the present election; at the end I'll return to the flaws in the way RealClearPolitics (RCP) does things, particularly at the state level.
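To make the "biases wash out" argument concrete, here's a small simulation sketch. Everything in it is hypothetical: a made-up "true" Obama share of 48%, an assumed 800 respondents per poll, and a made-up set of house effects that are balanced (as many Republican-leaning points as Democratic-leaning ones), which is exactly the condition the rule depends on. Each simulated poll adds random sampling noise on top of its house effect, and the ten polls are combined with a plain, unweighted average.

```python
import random
from statistics import mean

TRUE_OBAMA_SHARE = 0.48  # hypothetical "true" support, for illustration only
SAMPLE_SIZE = 800        # assumed respondents per poll (not from any real poll)

# Hypothetical house effects: negative = Republican lean, positive = Democratic lean,
# zero = neutral. They are balanced, so they sum to zero.
HOUSE_EFFECTS = [-0.02, -0.01, -0.01, 0.0, 0.0, 0.0, 0.0, 0.01, 0.01, 0.02]


def simulate_poll(rng: random.Random, house_effect: float) -> float:
    """One simulated poll: SAMPLE_SIZE respondents drawn around a house-biased share."""
    p = TRUE_OBAMA_SHARE + house_effect
    supporters = sum(rng.random() < p for _ in range(SAMPLE_SIZE))
    return supporters / SAMPLE_SIZE


def rule_of_ten(rng: random.Random) -> float:
    """Unweighted average of ten simulated polls, one per house effect."""
    return mean(simulate_poll(rng, h) for h in HOUSE_EFFECTS)


rng = random.Random(2012)  # fixed seed so the run is repeatable
lone_poll = simulate_poll(rng, HOUSE_EFFECTS[0])  # a single Republican-leaning poll
ten_poll_average = rule_of_ten(rng)
print(f"single R-leaning poll: {lone_poll:.3f}")
print(f"ten-poll average:      {ten_poll_average:.3f} (truth {TRUE_OBAMA_SHARE})")
```

If you delete one of the nonzero house effects without deleting its mirror image, the average picks up a lean in that direction, which is the same problem that comes from leaving whole categories of pollsters out of an average.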
First, let's look at the ten most recent national polls:
Ipsos on 10/27: O+2 (O 47 R 45)
RAND on 10/27: O+6 (O 51 R 45)
Gallup on 10/27: R+5 (R 51 O 46)
Rasmussen on 10/27: R+4 (R 50 O 46)
IBD/TIPP on 10/27: O+2 (O 47 R 45)
PPP on 10/27: Tie (O 48 R 48)
ABC/WashPost on 10/26: R+1 (R 49 O 48)
AP/Gfk on 10/25: R+2 (R 47 O 45)
Zogby on 10/21: O+3 (O 50 R 47)
Battleground on 10/22: R+2 (R 49 O 47)
Without even doing any math, we see that five of these polls put Romney ahead, four put Obama ahead, and one has them tied. Conclusion? The race is more or less tied nationally. If you work through the averages, you'll get Romney at 47.6 and Obama at 47.5. Notice how this analysis puts the race slightly closer than the RCP national numbers, which are often devoid of Democratic-leaning polls. Notice also that PPP is the best in the business -- if by 'best' we mean the pollster that most closely matches the averages.
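For anyone who wants to check the arithmetic, here's a short sketch that applies the rule to the ten national polls listed above. The only inputs are those toplines; "closest to the average" is measured here as the smallest gap between a pollster's Obama-minus-Romney margin and the margin of the averages, which is one reasonable reading of "most closely matches."

```python
from statistics import mean

# (pollster, Obama %, Romney %) for the ten national polls listed above
NATIONAL_POLLS = [
    ("Ipsos", 47, 45),
    ("RAND", 51, 45),
    ("Gallup", 46, 51),
    ("Rasmussen", 46, 50),
    ("IBD/TIPP", 47, 45),
    ("PPP", 48, 48),
    ("ABC/WashPost", 48, 49),
    ("AP/Gfk", 45, 47),
    ("Zogby", 50, 47),
    ("Battleground", 47, 49),
]

# Unweighted averages: every poll counts the same
obama_avg = mean(o for _, o, _ in NATIONAL_POLLS)
romney_avg = mean(r for _, _, r in NATIONAL_POLLS)

# Head-to-head tally
obama_leads = sum(o > r for _, o, r in NATIONAL_POLLS)
romney_leads = sum(r > o for _, o, r in NATIONAL_POLLS)
ties = sum(o == r for _, o, r in NATIONAL_POLLS)

# Pollster whose Obama-minus-Romney margin is closest to the averaged margin
avg_margin = obama_avg - romney_avg
closest = min(NATIONAL_POLLS, key=lambda p: abs((p[1] - p[2]) - avg_margin))

print(f"Obama {obama_avg:.1f}, Romney {romney_avg:.1f}")  # Obama 47.5, Romney 47.6
print(obama_leads, romney_leads, ties)                     # 4 5 1
print(closest[0])                                          # PPP
```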
Now let's look at some states. We begin with Ohio.
The ten most recent polls for Ohio can be found at Pollster. We see the following: seven polls put Obama ahead and three show a tie. Conclusion? Obama is certainly leading in Ohio. The averages merely confirm this: Obama sits at 47.9 and Romney's at 46.0. Remember that a two-point spread over ten polls isn't merely a "small lead for Obama." The correct interpretation is: Romney's in some deep you-know-what.
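Here's a rough way to see why a two-point average lead is more than it looks. This sketch assumes, hypothetically, that each Ohio poll interviewed about 800 voters and that the ten polls behave like independent simple random samples; under those assumptions, averaging ten equal-sized polls cuts the sampling error on the margin by a factor of the square root of 10. It only measures sampling error and ignores house effects and other non-sampling error, so real-world uncertainty is larger than this.

```python
import math


def margin_standard_error(p_a: float, p_b: float, n: int) -> float:
    """Standard error of (share_a - share_b) from one simple random sample of size n.

    Uses the multinomial variance: Var(p_a - p_b) = [p_a + p_b - (p_a - p_b)**2] / n.
    """
    return math.sqrt((p_a + p_b - (p_a - p_b) ** 2) / n)


ASSUMED_N_PER_POLL = 800  # hypothetical sample size per poll
NUM_POLLS = 10

# Ohio ten-poll averages from the text: Obama 47.9, Romney 46.0
obama, romney = 0.479, 0.460
se_one = margin_standard_error(obama, romney, ASSUMED_N_PER_POLL)

# Ten independent, equal-sized polls averaged equally act like one pooled sample
se_ten = margin_standard_error(obama, romney, ASSUMED_N_PER_POLL * NUM_POLLS)

# How many standard errors the averaged lead sits above zero
z = (obama - romney) / se_ten
print(f"SE of margin: one poll {100 * se_one:.1f} pts, ten polls {100 * se_ten:.1f} pts, z = {z:.2f}")
```

Under those assumptions, the standard error of the margin is about 3.4 points for a single poll but only about 1.1 points for the ten-poll average, so Obama's 1.9-point edge sits roughly 1.75 standard errors above a tie.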
Next up is Nevada. All ten of the most recent polls there show Obama leading. Conclusion? I have no idea why every other pundit on TV considers this a tossup. To me, a tossup state is one where the eventual result is uncertain given our current information. Here the result is very certain: Obama will win Nevada easily. The averages work out to Obama at 49.9 and Romney at 46.1. Again, a four-point spread over ten polls is devastating for whoever happens to be losing.
What about Florida? Here Romney leads seven of the last ten polls, Obama leads two, and one is tied. Conclusion? Romney is probably ahead there. Romney's ten-poll average is 48.7 and Obama's is 47.2. I think Florida will be tough for us, but certainly not impossible.
How's Virginia looking? Four polls show Romney ahead, four show Obama ahead, and two are tied. Virginia is the definition of a tossup state. Obama's average in the state is 47.7 and Romney's is 47.1. Don't rush to call this one either way.
By now you can go to Pollster and play with this on your own, but let's finish with Wisconsin, since some people in the Romney camp think it might be competitive. It turns out they'd better look elsewhere. Obama leads nine of the last ten polls, and the other one shows a tie (courtesy of Rasmussen!). Obama averages 49.7 and Romney averages 46.5, so Wisconsin is safe for Obama.
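The state summaries above can be lined up side by side with a few lines of code. This sketch uses only the figures quoted in the text (lead counts and ten-poll averages, not the individual polls) and computes two numbers per state: the Obama-minus-Romney margin of the averages, and the net count of polls Obama leads.

```python
# Summaries quoted in the text: (Obama leads, Romney leads, ties, Obama avg, Romney avg)
STATE_SUMMARIES = {
    "Ohio": (7, 0, 3, 47.9, 46.0),
    "Nevada": (10, 0, 0, 49.9, 46.1),
    "Florida": (2, 7, 1, 47.2, 48.7),
    "Virginia": (4, 4, 2, 47.7, 47.1),
    "Wisconsin": (9, 0, 1, 49.7, 46.5),
}

# Sanity check: each state should summarize exactly ten polls
assert all(o + r + t == 10 for o, r, t, _, _ in STATE_SUMMARIES.values())

# Obama-minus-Romney margin of the ten-poll averages
margins = {state: s[3] - s[4] for state, s in STATE_SUMMARIES.items()}

# Net lead count: polls Obama leads minus polls Romney leads
net_leads = {state: s[0] - s[1] for state, s in STATE_SUMMARIES.items()}

for state in sorted(margins, key=margins.get, reverse=True):
    print(f"{state:<10} margin {margins[state]:+.1f}  net polls led {net_leads[state]:+d}")
```

With these figures, both columns rank the five states in the same order (Nevada, Wisconsin, Ohio, Virginia, Florida), so the quick head count and the averages tell the same story.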
On the whole, I'm feeling good about Obama's chances. Of course, a week is often a lifetime in politics, and things might change between now and Election Day.
Before I conclude the diary, I want to mention a few more things about RCP. It's often treated as the gold standard for the state of the horse race among the media elite, but that position needs to be reconsidered. My essential problem with RCP is not its underlying methodology but its omissions. Consider the following RCP state margins and actual results from 2008. In New Mexico, the RCP average going into the election had Obama winning by 7.3; Obama won by 15.1. In Nevada, the RCP average had Obama winning by 6.5; Obama won by 12.5. In Virginia, the RCP average showed Obama winning by 4.4; Obama won by 6.3. In Colorado, the RCP average had Obama up 5.5; Obama won by 9 points. In Pennsylvania, RCP had Obama up 7.3; Obama won by 10.3. In Indiana, RCP showed McCain leading by 1.4, but Obama prevailed by 1.1. You get the point by now: RCP significantly understated Obama's support in the states, particularly the battlegrounds. Why? I'll answer that with another question: what do you expect to happen when you willingly exclude Democratic-leaning firms from your averages but openly embrace those that skew Republican? Obviously the averages are going to skew Republican, and as a result you'll miss the actual results -- sometimes by considerable margins.
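The 2008 comparison is easy to tabulate. This sketch uses only the six RCP averages and results quoted above (margins are Obama minus McCain, in points) and computes how far each final RCP average fell short of Obama's actual margin. It covers just these six states, so it shows the direction and size of the misses rather than proving why they happened.

```python
from statistics import mean

# 2008 margins quoted in the text, Obama minus McCain, in points:
# (final RCP average, actual result)
RCP_2008 = {
    "New Mexico": (7.3, 15.1),
    "Nevada": (6.5, 12.5),
    "Virginia": (4.4, 6.3),
    "Colorado": (5.5, 9.0),
    "Pennsylvania": (7.3, 10.3),
    "Indiana": (-1.4, 1.1),  # RCP had McCain up 1.4
}

# Positive miss = RCP understated Obama's actual margin
misses = {state: actual - rcp for state, (rcp, actual) in RCP_2008.items()}

for state, miss in misses.items():
    print(f"{state:<13} understated Obama by {miss:.1f}")
print(f"mean understatement: {mean(misses.values()):.1f} points")
```

Every miss in this list points the same way, and they average out to about 4.1 points.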
RCP is up to the same shenanigans this year. For example, look at the four polls that currently (as of 10/27) make up its average for Virginia. Missing something? How about PPP, which recently showed Obama up 5? Excluding this poll is inexcusable and negligent. It's the same story throughout RCP's state lists: a systematic and deliberate exclusion of polls that lean Democratic without a corresponding exclusion of those that lean Republican. As a result, I predict that RCP will understate Obama's support in some of the battlegrounds by one to two points. That's not a huge difference, but in a close election it's the difference between victory and defeat.
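Assuming RCP's Virginia figure is a simple unweighted average, the effect of leaving PPP out can be worked out directly. Write $\bar{m}_4$ for the Obama-minus-Romney margin averaged over the four polls RCP uses (their values aren't listed here, so it stays a symbol). Adding PPP's O+5 as a fifth poll gives:

```latex
\bar{m}_5 = \frac{4\,\bar{m}_4 + 5}{5}
\qquad\Longrightarrow\qquad
\bar{m}_5 - \bar{m}_4 = \frac{5 - \bar{m}_4}{5}
```

For a purely hypothetical four-poll average showing a tie ($\bar{m}_4 = 0$), including PPP would move the Virginia average one point toward Obama; the more the other four polls lean toward Romney, the larger the shift.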