*****Crossposted at StochasticDemocracy.com*****
In the last couple of days, a dizzying number of different statistical arguments have been put forward regarding the legitimacy of the 2009 Iranian election.
In order to keep track of them all, I've made a summary of all the major quantitative arguments I've heard so far and my thoughts on them. Let me know by email or in the comments if I have missed any.
1) Professor Mebane's work
i) Second digit Benford Law test:
Previous research by Professor Mebane has found that testing the frequency of the second digit of vote returns is a more reliable indicator of electoral fraud than the more traditional first-digit test, which can often produce false positives.
He did not find any statistically significant discrepancies using a second-digit Benford's law test on city-level election returns.
However, his second-digit Benford analysis was developed for precinct-level data, which has not been available. At such a high level of aggregation, sheer scale can overwhelm foul play. Because of this, conformity to Benford's law is not suggestive of authenticity in this case.
Update[6:26 pm]: Professor Mebane has obtained ballot-level data and has indicated, in private correspondence, discrepancies in the Karroubi and Rezaee vote returns. I will update when more information is available.
Update[9:23 pm]: A report incorporating ballot-box returns can be seen here.
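For readers who want to see what a second-digit test involves, here is a minimal Python sketch. It is not Professor Mebane's code: the function names are illustrative assumptions, and it only computes the textbook Benford second-digit frequencies and a Pearson chi-square statistic for a list of vote counts.

```python
import math
from collections import Counter


def second_digit_benford():
    """Expected frequency of each second digit 0-9 under Benford's law."""
    return [
        sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))
        for d in range(10)
    ]


def second_digit_chi2(counts):
    """Pearson chi-square statistic of observed second digits vs. Benford.

    `counts` is a list of positive integer vote totals (hypothetical input);
    totals below 10 have no second digit and are skipped.
    """
    digits = [int(str(c)[1]) for c in counts if c >= 10]
    if not digits:
        raise ValueError("need at least one count with two or more digits")
    observed = Counter(digits)
    n = len(digits)
    expected = second_digit_benford()
    return sum(
        (observed[d] - n * expected[d]) ** 2 / (n * expected[d])
        for d in range(10)
    )
```

The statistic has 9 degrees of freedom, so a value above roughly 16.9 would be significant at the 5% level. As noted above, a small value on heavily aggregated data says little either way.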
ii) Election models:
When conditioning 2009 data on first- and second-round 2005 data using simple models, Professor Mebane found behavior that violated "natural election processes". He concluded that this constituted "moderately strong evidence for fraud".
I don't feel qualified to comment on his assertion, but he is considered an expert in this field, so I attach weight to the fact that he is convinced. Still, I would like to see his methods applied to other elections as a "control".
2) The work of Dr. Roukema
His arguments are the following in order of "strength":
i) That reformist candidate Karroubi's vote returns have a large excess of 7's relative to Benford's law
ii) That Ahmedinejad had an excess of 2's and a deficit of 1's relative to Benford's law
iii) That all candidates have log-normal distributed returns except for Mousavi.
To me, i is interesting and merits further study, though the author wouldn't characterize it as conclusive.
This is particularly so because the largest discrepancy between Iranian opinion polling and the actual results was the massive collapse in support of Karroubi and Rezaee. Every opinion poll showed their combined support in the double digits, when they ended up obtaining less than 3% of the vote between them.
The major criticism of ii is that research has shown that candidate vote returns often do not conform to Benford's law.
Histogram showing the distribution of vote totals in voting areas is approximately log-normal, pulled from original paper.
As for iii: Statistically, I don't see why total returns at the district level being log-normally distributed would imply that every candidate has log-normally distributed votes. On the contrary, log-normal distributions are not stable under addition, so at least one of the candidates' returns would need to deviate from a log-normal distribution in order to maintain the observed log-normality of the total vote. (At least if you ignore the fact that different candidates' vote returns are not independent.)
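The point about addition can be checked with a small simulation. The sketch below is illustrative only: it assumes two independent, identically distributed log-normal "candidates" with an arbitrary spread (sigma = 3), which is not a model of the Iranian data. If a variable is log-normal, its logarithm is normal and has zero skewness; the log of the sum of two such variables comes out visibly skewed.

```python
import math
import random


def skewness(xs):
    """Sample skewness (third standardized moment)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def log_skewness_demo(n=200_000, sigma=3.0, seed=0):
    """Return (skew of log X, skew of log(X + Y)) for iid log-normal X, Y.

    n, sigma and seed are arbitrary illustrative choices.
    """
    rng = random.Random(seed)
    a = [rng.lognormvariate(0.0, sigma) for _ in range(n)]
    b = [rng.lognormvariate(0.0, sigma) for _ in range(n)]
    single = skewness([math.log(x) for x in a])
    total = skewness([math.log(x + y) for x, y in zip(a, b)])
    return single, total


if __name__ == "__main__":
    single, total = log_skewness_demo()
    print(f"skew of log(X):     {single:+.3f}")
    print(f"skew of log(X + Y): {total:+.3f}")
```

The first number should sit near zero and the second clearly above it. The size of the departure depends on sigma and, as the caveat above says, on how correlated the candidates' returns are.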
Nate Silver's and Professor Andrew Gelman's commentary on all three points is invaluable. See here, here, and here.
3) Samuel Wang's look at Tehran Opinion polls
Professor Samuel Wang asserts that while the public opinion polls conducted for Iran at large were all over the place and of suspect quality (Polls ranged from Ahmedinejad +16 to Mousavi +32), polls for Tehran specifically might be more reliable.
He then looks at Tehran-specific polls and finds that the discrepancy between Ahmedinejad's poll-forecasted winning margin in Tehran and his official margin there was large enough to be statistically significant. He concludes: "For now, my interpretation is that the official returns in Tehran are unbelievable."
Some thoughts:
i) I'm not sure what area the polls referred to when they polled "Tehran"; the translation wasn't clear. If they were polling "Tehran, Tehran", then the results would have been within the margin of error.
ii) Focusing on margins obscures the fact that in terms of actual candidate vote-share, the polls seem to have been massively off. Mousavi+Ahmedinejad was estimated to be around 60%, but ended up at 97%. This could have been due to an abnormal number of undecided voters or some other factor, but it should be explored.
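To make "statistically significant" concrete for a margin, here is a hedged Python sketch. It is not Professor Wang's calculation: it assumes a single simple random sample, uses the standard multinomial formula for the standard error of a lead, and the function names and any numbers passed in are hypothetical.

```python
import math


def margin_standard_error(p_a, p_b, n):
    """Standard error of the lead (p_a - p_b) in one simple random sample.

    p_a and p_b are the two candidates' polled shares (0-1); n is the
    sample size. Uses Var(p_a - p_b) = (p_a + p_b - (p_a - p_b)^2) / n.
    """
    return math.sqrt((p_a + p_b - (p_a - p_b) ** 2) / n)


def margin_z_score(poll_a, poll_b, n, official_margin):
    """Number of standard errors between the official and polled margins."""
    se = margin_standard_error(poll_a, poll_b, n)
    return (official_margin - (poll_a - poll_b)) / se


if __name__ == "__main__":
    # Hypothetical inputs, not the actual Tehran polls.
    print(margin_z_score(poll_a=0.30, poll_b=0.30, n=1000, official_margin=0.10))
```

Note that the standard error depends on the combined share p_a + p_b, so a poll with many undecided voters gives a tighter margin estimate than its raw sample size suggests, which is one reason the vote-share discrepancy in point ii deserves a separate look.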
4) Nate Silver's Analysis
Nate has had some interesting qualitative analysis and statistical commentary on other research, but his best piece so far has been this one, where he shows that Ahmedinejad did not do very well in rural areas in the first-round results of 2005, while doing much better there in 2009. He posits that such a radical change in the rural-urban divide in so little time is suspicious.
Before: First Round Iranian Elections in 2005
After: First Round Iranian Elections in 2009
I have to check whether this result holds up when I replace the Ahmedinejad variable with a "conservative" variable showing the combined support of the 4 conservative candidates in 2005. Still, this has been one of the most convincing points in favor of fraud I've heard so far.
5) Miscellaneous criticisms
i) Mousavi lost his home region:
I don't find this suspicious. Some polls conducted by Western organizations showed that Ahmedinejad had much higher support than Mousavi among Azeris, Mousavi's ethnic group. This could be related to unconfirmed reports that Ahmedinejad was a popular administrator in Azerbaijan earlier in his career.
ii) Ahmedinejad won Tehran, which should have gone for Mousavi:
Some polls showed Ahmedinejad winning Tehran by around the margin that he did. Not only that, but he was previously elected Mayor of Tehran. I don't find this necessarily suspicious by itself either.
iii) Counting was done implausibly quickly:
Paper ballots can be counted very quickly. It doesn't take very long to call a 63% lead.
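A back-of-the-envelope calculation shows why. The sketch below reads "a 63% lead" as a 63% vote share and assumes, unrealistically, that the ballots counted so far are a simple random sample of all ballots. The function name and the 5-standard-error threshold are illustrative choices.

```python
import math


def ballots_needed(share, z=5.0):
    """Random-sample size at which `share` sits z standard errors above 50%."""
    if not 0.5 < share < 1.0:
        raise ValueError("share must be strictly between 0.5 and 1")
    return math.ceil(z ** 2 * share * (1 - share) / (share - 0.5) ** 2)


if __name__ == "__main__":
    print(ballots_needed(0.63))
```

Under those assumptions a few hundred ballots already suffice. In practice early returns are not a random sample, so the real requirement is larger, but the order of magnitude makes the point.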
iv) There are numerical discrepancies in the voting data:
To summarize the arguments I've previously made here:
There are about 100,000 missing votes, because Valid votes+Invalid votes is 100,000 less than "Total votes".
Also, the percentage of spoiled ballots in a district is highly correlated with the district's reformist candidate vote share, while being negatively correlated with Ahmedinejad vote-share.
Percent of ballots declared invalid vs Candidate vote-share
One simple explanation would be that new voters are more likely to make mistakes and produce spoiled ballots. But this would imply that the surge in turnout mainly went to reformist candidates. If this were the case, I don't see how Ahmedinejad could have won.
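Both checks in this item are easy to reproduce on the district-level returns. The following Python sketch shows the arithmetic; the function names are illustrative and the inputs in the usage lines are placeholders, not the actual Iranian figures.

```python
import math


def unaccounted_ballots(total, valid, invalid):
    """Ballots reported in the total but in neither the valid nor invalid count."""
    return total - (valid + invalid)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


if __name__ == "__main__":
    # Placeholder per-district values, for illustration only.
    invalid_pct = [0.8, 1.1, 1.5, 2.0]
    reformist_share = [0.20, 0.28, 0.35, 0.47]
    print(unaccounted_ballots(total=1000, valid=980, invalid=15))
    print(pearson(invalid_pct, reformist_share))
```

A correlation computed this way says nothing about cause on its own, which is why the competing explanations above matter.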
v) The idea that Ahmedinejad's share of the vote stayed too constant to be real while results were being announced, as shown in the following popular graph:
This has been thoroughly debunked by multiple sources, see here and here.