As Jen Hayden diaried on the 25th, Wichita State statistician Beth Clarkson has asserted that statistical anomalies exist in the 2014 Kansas election returns (among others), and has sued to gain access to the voting tapes in her home county, Sedgwick. Kansas Secretary of State Kris Kobach and Sedgwick Co. election commissioner Tabitha Lehman have maintained that Kansas open records law does not apply to the tapes, and that the law does not permit them to be unsealed.
As a matter of policy, I believe that paper ballots and voting records should be routinely audited after elections. I don't have a legal opinion on Kansas open records law, but in principle I think Clarkson should be allowed to take a look. As for the alleged statistical anomalies (which, I believe, have no bearing on her legal case), I am one of several Kossacks who doubt that they are anomalous at all. (leviabowles wrote a series of diaries way back in April. I'll say more about them later.) That isn't necessarily to say that the votes were counted correctly, only that Clarkson's evidence to the contrary doesn't hold water. It will take a while to explain why. If you're curious, follow me under the orange forensic curlicue....
As many of us vividly recall, Democrats had high hopes that Paul Davis would defeat embattled Republican Kansas governor Sam Brownback and that independent Greg Orman would knock off Republican senator Pat Roberts. Davis and Orman both held narrow leads in the polling averages. Yet Brownback won by almost 4 points, and Roberts won by almost 11.
These surprising results attracted scrutiny, most of it futile. In my home state of New York, all the votes would have been recorded on optical scan ballots, and some of these ballots would have been audited by hand as a matter of law. In Kansas, not so much. Many parts of the state — including the largest county, Johnson County (suburban Kansas City, MO, including Olathe) — voted on touchscreen Direct Recording Electronic (DRE) machines that do not produce voter-verifiable records. Much of the state uses DREs that do produce voter-verifiable records — as in Sedgwick, the topic of Clarkson's lawsuit — or paper ballots counted by optical scanners or even by hand. However, these ballots aren't routinely audited or recounted by hand. Without inspecting ballots, any arguments about the accuracy of the vote count are indirect at best.
Clarkson's argument builds on an analysis by Francois Choquette and James Johnson, which is discussed (and sharply critiqued in comments) here and in other DKos diaries. (I will lay out my critique of that analysis another time: it isn't integral to the discussion here.) The gist of Clarkson's argument is found in this quotation from her article in Significance:
There is an expectation that %R[epublican] vote will go down with the size of the precincts due to the association of rural districts with the %R vote. These trends are clear in the data....
The upward trend on the right [in the largest precincts] is the pattern that concerns me. It shouldn’t be there at all. But such a pattern would be expected under the hypothesis of vote fraud proposed in the Choquette and Johnson paper.
In other words, Clarkson asserts that we should expect (1) in smaller precincts (say, those with a few hundred or fewer votes), a negative relationship between the number of votes cast and the Republican vote share, and (2) in larger precincts (say, with 500 or more votes), no relationship.
Arguably, the diary could end right here: Clarkson's central assertion makes no real sense. If voters were randomly assigned to precincts of various sizes, we would indeed expect no relationship between those sizes and the Republican vote shares. But obviously voters aren't randomly assigned to precincts, and indeed, Clarkson reasonably says she expects the small precincts to be more Republican. Why, then, does she expect middle-size precincts to be indistinguishable from large precincts? No particular reason. The failure of an unsupported assumption doesn't qualify as an "anomaly." Still, Clarkson could be onto something genuinely weird here, so let's keep going.
Following Choquette and Johnson's example, Clarkson uses an unusual, but defensible, method to explore these relationships: she sorts the precincts from fewest to most votes, and plots the cumulative Republican vote share as a function of the cumulative number of votes cast. You can see her original plots at the link above; to err on the side of respecting copyright, here is my replication for the Senate contest in Kansas, limited to precincts with 500 or more votes cast in the counties that used DREs. (Thus, this line basically combines the two red lines in Clarkson's last graph. At left, the cumulative totals include DRE precincts with just a bit over 500 votes; at right, they include the largest DRE precincts, containing up to 1800 votes.) Overall, the line trends up, implying that the larger precincts are figuratively dragging Roberts' vote share up. The relationship does not look very strong — and if we used a scatterplot, it would look even weaker — but it's there.
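For anyone who wants to try the cumulative plot on their own data, here is a minimal Python sketch of the calculation as I have described it. The function name and the three example precincts are invented for illustration; they are not actual Kansas returns, and this is not Clarkson's code.

```python
from itertools import accumulate


def cumulative_share(precincts):
    """Return (cumulative votes, cumulative Republican share) pairs.

    precincts: list of (votes_cast, republican_votes) tuples.
    Precincts are sorted from fewest to most votes cast before accumulating.
    """
    ordered = sorted(precincts, key=lambda p: p[0])
    cum_votes = list(accumulate(v for v, _ in ordered))
    cum_rep = list(accumulate(r for _, r in ordered))
    return [(v, r / v) for v, r in zip(cum_votes, cum_rep)]


# Hypothetical precincts: (votes cast, Republican votes).
example = [(1200, 660), (500, 240), (800, 400)]
curve = cumulative_share(example)
for votes, share in curve:
    print(votes, round(share, 4))
```

Each point on the curve averages over every precinct up to that size, so the line moves slowly and a rise at the right edge reflects only the largest precincts pulling the running total up.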
The conjecture, apparently, is that someone stole votes for Roberts in the larger precincts. If Roberts really only received, say, 46% of the votes in these precincts as a whole, but was credited with over 52%, then something like 37,000 votes could have been switched from Orman to Roberts — not enough on its own to account for Roberts' 90,000+ vote margin, but more than enough to raise a stink about, if true. On the other hand, the upward trend could be explained by differences between the largest precincts and the rest.
To address this question, leviabowles used a dataset from the 2008 presidential election. (Click through to his Data Science Notes blog post for the details.) Bowles found that a similar relationship between votes cast and Republican vote share occurred in 2008 — but that when he controlled for other variables, including turnout proportion and some demographic variables, the relationship disappeared. So it appears that those other variables account for the relationship. That is a nifty approach, but some people might be put off by the analytical shift to 2008 — even though the same "anomaly" occurred then, and even though it may seem far-fetched that anyone would have bothered to steal votes from Obama in Kansas in 2008.
I chose a different approach that relies only on basic 2014 data, crucially including party registration figures. Colloquially, Clarkson's analysis looks at "Republican votes" without taking account of Republican and Democratic voters. That's a problem.
I use precinct-level vote counts from the secretary of state's website and precinct-level registration totals by party as of the 2014 general election in the two largest counties, Johnson and Sedgwick. (Current registration totals are available on the county websites, and I verified that these are similar to the 2014 totals.) Together these counties contain 319 of the 391 DRE precincts with at least 500 votes cast — and in the other 72, there is no discernible relationship between votes cast and Republican (Roberts) vote share. So it is reasonable to focus on these counties.
First, Johnson County. Here are OLS regression coefficients using votes cast to predict Roberts' vote share (limited to precincts with 500+ votes cast):
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.076e-01  3.001e-02  13.584  < 2e-16 ***
votescast   1.497e-04  4.451e-05   3.364 0.000938 ***
The 1.497e-04 means that for each additional vote cast, Roberts' vote share increases "on average" (on the best fit line) by about 0.00015, or 0.015 percentage points. For differences of hundreds of votes, that adds up, and the tiny p value indicates that the result is statistically significant. (The significance test treats the precincts as a random sample, but let's not think too hard about that.)
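To make "that adds up" concrete, here is a small Python sketch that plugs the Johnson County coefficients into the best-fit line. The 500-vote and 1500-vote precinct sizes are my own illustrative picks, not values singled out in the data. Over a 1000-vote difference, the fitted gap comes to about 0.15, or roughly 15 percentage points.

```python
# Coefficients copied from the Johnson County table (votes cast only).
INTERCEPT = 4.076e-01
SLOPE = 1.497e-04
STD_ERROR = 4.451e-05


def fitted_share(votes_cast):
    """Roberts' vote share on the best-fit line for a precinct of this size."""
    return INTERCEPT + SLOPE * votes_cast


# Illustrative precinct sizes (assumed, not taken from the returns).
gap = fitted_share(1500) - fitted_share(500)

# The t value is just the estimate divided by its standard error.
t_value = SLOPE / STD_ERROR

print(f"fitted gap, 1500-vote vs 500-vote precinct: {gap:.4f}")
print(f"t value recomputed from the table: {t_value:.3f}")
```

The recomputed t value agrees with the table up to rounding of the printed coefficients.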
And here is the result controlling for the Democratic share of registered voters in each precinct (p_dem, which ranges from about 0.11 to 0.31):
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  8.061e-01  2.589e-02  31.139   <2e-16 ***
votescast    7.295e-07  2.583e-05   0.028    0.977
p_dem       -1.424e+00  7.073e-02 -20.133   <2e-16 ***
Boom. Suddenly we have a strong finding that Roberts did worse where there were more Democrats (doh); taking that into account, the relationship between votes cast and Roberts share flat-lines. (The result is substantively the same if we include all precincts in Johnson County.) Why? Contrary to Clarkson's assumption that the precincts with the most votes should be indistinguishable from the middle-sized precincts, they contain proportionately fewer registered Democrats — and that difference alone can statistically account for Clarkson's result.
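If the logic of "controlling for" a variable feels slippery, a toy example may help. The Python sketch below builds entirely synthetic precincts (not Kansas data) in which Republican share depends only on Democratic registration share, while larger precincts happen to have fewer Democrats. The coefficients 0.80 and 1.4, the precinct sizes, and the little ols helper are all assumptions made up for this illustration. Regressing share on votes cast alone produces a positive slope; adding p_dem sends that slope to essentially zero.

```python
def ols(predictors, y):
    """Least-squares fit of y on an intercept plus the given predictor columns.

    Returns [intercept, coef_1, coef_2, ...] by solving the normal equations.
    """
    n = len(y)
    rows = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [
        [sum(row[i] * row[j] for row in rows) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[pivot] = a[pivot], a[c]
        for r in range(k):
            if r != c:
                factor = a[r][c] / a[c][c]
                a[r] = [x - factor * p for x, p in zip(a[r], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]


# SYNTHETIC precincts: bigger precincts are built to have fewer registered
# Democrats, and Republican share depends ONLY on p_dem (no fraud, no noise).
votes = [500 + 10 * i for i in range(131)]            # 500 .. 1800 votes
p_dem = [0.30 - 0.0001 * v + 0.02 * ((i % 3) - 1)     # downward trend + wiggle
         for i, v in enumerate(votes)]
rep_share = [0.80 - 1.4 * d for d in p_dem]           # made-up coefficients

simple = ols([votes], rep_share)              # votes cast alone
controlled = ols([votes, p_dem], rep_share)   # votes cast plus p_dem

print("votes-cast slope, alone:     ", simple[1])
print("votes-cast slope, with p_dem:", controlled[1])
print("p_dem slope:                 ", controlled[2])
```

Nothing in this toy world involves vote switching, yet the uncontrolled regression shows the same kind of upward "anomaly". That is all the real regressions claim: the pattern is what you would expect from who is registered where.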
Here is a visualization of how both Orman's vote share and the Democratic registration share first increase, then decrease, in Johnson County precincts as the size (votes cast) increases. (The blue lines are loess smoothers.)
Here are the parallel results for Sedgwick County (precincts with 500+ votes cast; here the votes-cast variable is named total). First, using votes cast alone:
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.104e-01  3.481e-02  11.789  < 2e-16 ***
total       1.059e-04  3.785e-05   2.798  0.00591 **
Then, controlling for p_dem:
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  7.854e-01  1.713e-02  45.838   <2e-16 ***
total       -1.125e-05  1.375e-05  -0.818    0.415
p_dem       -1.174e+00  3.779e-02 -31.067   <2e-16 ***
Basically the same result: controlling for Democratic registration share — which is smaller in the largest precincts — the relationship between votes cast and Roberts vote share flat-lines. (In fact, it turns trivially negative.)
You may have noticed that the p_dem slopes are substantially less than -1: for each one-point increase in Democratic registration share, Roberts' share dropped by more than one point. That suggests that precincts with relatively more Democrats also tended to contain more moderate non-Democrats who supported Orman over Roberts. That makes sense, since Orman's support obviously extended far beyond registered Democrats.
Now, these results come nowhere near proving that the votes in this Senate race were counted correctly in these counties or anywhere else. They only show that Clarkson's purported anomalies have a straightforward interpretation: precincts with more votes cast tend to be less Democratic, through some combination of differences in registration and differences in turnout. (The same seems to be true in Ohio and Wisconsin, the other examples in Clarkson's Significance article.)
Again, all this should (at least in my view) have no bearing on whether Clarkson is allowed to audit voting records. Even if courts conclude that Kansas law forbids that, I think that Kansas and other states should implement both routine random audits and easy ways to obtain targeted audits, or partial recounts, of particular precincts or jurisdictions. So, in one sense, it doesn't matter if Clarkson is right or wrong.
In another sense, maybe it matters a lot. If election returns from Kansas and elsewhere are riddled with statistical evidence of fraud — "anomalies... that always benefited Republican candidates," in Jen Hayden's fair paraphrase — I venture that most people would really like to know that. If it's true, it powerfully strengthens the arguments for better election verification. But if it isn't true, then it could weaken those arguments, in a wolf-crying way.
4:52 PM PT: (I added a graphic to show how both vote shares and registration shares vary in Johnson County precincts as the number of votes cast increases.)