(cross-posted at VoteForAmerica.net)
I changed my methodology a bit too, hopefully to provide better results, but first...
The Day 5 Results were just released, and nothing changed from the previous day.
SOS Recount Results [11/22]

            Recount    Original    Net
Coleman:    809,677    809,454     -669
Franken:    784,310    784,022     -621

Franken Net Today: -52 (14 yesterday, 43 the day before, and 43 the day before that)
Total Franken Net: 48
Franken Deficit: 167
I'll again provide my error-rate regressions, take 'em or leave 'em, and I've now done some work with the challenge data as well. But first, the error regressions:
----------------------------------------------------------------------------------------------------
Layman's Terms
----------------------------------------------------------------------------------------------------
Basically, Franken had his worst day of the recount so far, and his projected gain slipped below the requisite 215 votes.
I am attempting to find the correlation between error rates, precinct size, and the gain for each candidate. This is done by assuming that the error rate will remain constant throughout the rest of the recount; I have no idea if this is true, although the error trends over the past three days seem to be relatively similar, or, for the math folks, they appear to be converging. The only input to my analysis is the total number of votes from each of Minnesota's 4,130 precincts.
For each precinct that has reported recount results, I look at the number of votes Coleman lost and the number of votes Franken lost. This is shown in the two graphs. I then create a trend: a function that relates the size of the precinct to the gains (or losses) of each candidate.
Then I assume that the error rates will remain constant to create a net gain (or loss) projection. The winner of the precinct has absolutely no bearing on the math.
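For anyone who wants to replicate the bookkeeping outside of MathCad, here is a minimal Python sketch of that step. This is not my worksheet; the file name and column names are assumptions for illustration only.

import csv

import numpy as np

# Hypothetical input: one row per recounted precinct with the originally
# certified totals and the post-recount totals for each candidate. The
# file name and column names are assumptions, not the real SOS layout.
with open("recount_precincts.csv") as f:
    rows = list(csv.DictReader(f))

total_votes = np.array([float(r["total_votes"]) for r in rows])
coleman_gain = np.array([float(r["coleman_recount"]) - float(r["coleman_original"])
                         for r in rows])
franken_gain = np.array([float(r["franken_recount"]) - float(r["franken_original"])
                         for r in rows])

# One trend per candidate: votes gained (or lost) as a function of
# precinct size. A linear fit is shown here; the quartic comes later.
coleman_linear = np.polyfit(total_votes, coleman_gain, 1)
franken_linear = np.polyfit(total_votes, franken_gain, 1)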
----------------------------------------------------------------------------------------------------
Recount Regression: Day 4
----------------------------------------------------------------------------------------------------
The regression, a mathematical attempt to denote a trend, is done using MathCad's regress() function.
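For readers without MathCad, numpy's polyfit does a comparable polynomial least-squares fit; a rough stand-in for regress(vx, vy, 4), reusing the arrays from the earlier sketch:

# Quartic (degree-4) least-squares fit, roughly equivalent to MathCad's
# regress(vx, vy, 4); polyfit returns coefficients from highest power down.
coleman_quartic = np.polyfit(total_votes, coleman_gain, 4)
franken_quartic = np.polyfit(total_votes, franken_gain, 4)

# Evaluate the fitted trend at a given precinct size, e.g. 1,500 total votes:
print(np.polyval(coleman_quartic, 1500))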
The previous two graphs depict the number of votes gained or lost with respect to the total votes recorded for each precinct that has completed its recount. Each dot represents the change between the originally certified result and the post-recount tally in a given precinct with x number of total votes:
And the current functions used in conjunction with the regression:
Now, again using those functions, the following two graphs further illustrate the regression interpolation. The graph below illustrates the regression on precincts whose vote totals are less than 3,494, the largest precinct that has completed a recount. The dotted lines represent the first day's regression, while the dashed lines represent yesterday's. The thin lines depict the linear regression, while the thicker curvy lines present the quartic regression. The black bars emanating from the x-axis represent the number of precincts statewide with x number of total votes. There are a few straggler precincts with totals between 2,600 and 6,621 that are not depicted due to the resolution of the graph.
The graph below goes further and fits the regression onto all 4,130 precincts statewide; this forces the 32,186 additional votes that lie in precincts surpassing the 3,494-vote threshold into the predetermined trend.
Take notice of the thickest lines; those are the functions depicted above. I used a piecewise model for three reasons (a sketch follows this list):
(1) The regression function is HORRIBLE beyond the current precinct max vote of 3,494.
(2) The end points have far too great an effect relative to the expected linear model.
(3) The R-squared values, as you'll see below, are not that good for the quartic fit. By appending the linear model at the point where the quartic starts to deviate, the closeness of fit should be improved.
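Here is the promised sketch of the piecewise idea in Python; the 3,000-vote knot is purely illustrative, not the point used in my actual functions:

def piecewise_gain(v, quartic, linear, knot=3000):
    """Quartic fit below the knot, linear fit at and above it.

    The knot (where the quartic starts to deviate from the expected
    linear behavior) is an illustrative guess here, not the value
    used in the functions pictured above.
    """
    if v < knot:
        return np.polyval(quartic, v)
    return np.polyval(linear, v)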
And the R-squared of the regression; this only covers the quartic:
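For reference, an R-squared like that can be computed straight from the residuals of the quartic; a sketch, reusing the arrays from above:

# R-squared of the quartic fit alone (the linear tail of the piecewise
# model is excluded, matching the note above).
pred = np.polyval(coleman_quartic, total_votes)
ss_res = np.sum((coleman_gain - pred) ** 2)
ss_tot = np.sum((coleman_gain - coleman_gain.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot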
Using the previous two graphs, and the functions they represent, a projection can be made for the cases covered. The first case simply includes precincts with 3,494 votes or fewer, while precincts above and beyond that figure are entirely ignored. For each precinct, the vote total is taken and applied to the listed function for each candidate. The result is then added to that candidate's sum, and the next precinct is calculated. This process is done using precinct results from the final certification.
Coleman Gain: -968.186777451647
Franken Gain: -911.227610123547
Franken Net: 56.9591673280993
Franken falls short of the pre-recount deficit by about 160 votes, with 32,186 votes entirely ignored and the challenges left uncounted. If those additional 32,186 votes are applied to the process, Franken adds less than a single vote:
Coleman Gain: -981.037349428085
Franken Gain: -923.639100212363
Franken Net: 57.3982492157215
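For completeness, a Python sketch of the projection loop described above; statewide_totals stands in for the certified totals of all 4,130 precincts, which I am not reproducing here:

# Run every certified precinct total through each candidate's fitted
# function and sum the per-precinct changes. statewide_totals is a
# placeholder; swap in the full certified array to cover all precincts.
statewide_totals = total_votes

coleman_proj = sum(piecewise_gain(v, coleman_quartic, coleman_linear)
                   for v in statewide_totals)
franken_proj = sum(piecewise_gain(v, franken_quartic, franken_linear)
                   for v in statewide_totals)
print(f"Projected Franken net: {franken_proj - coleman_proj:.1f}")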
----------------------------------------------------------------------------------------------------
Challenge Regression: Day 4
----------------------------------------------------------------------------------------------------
The graph below shows the number of challenges by each candidate relative to how they are performing in a given precinct. The dots represent challenges; a dot north of the x-axis represents challenges in a precinct that the candidate is currently winning, while a dot south of it shows the number of challenges in a precinct that candidate is currently losing. There are no negative challenges; the sign only encodes who leads the precinct.
The next graph shows the regression of each candidate's challenges:
It appears that a larger percentage of Franken's challenges occur in precincts he is currently winning, relative to Coleman's percentage. Both appear to be challenging more ballots in precincts they are winning, but Franken at a higher rate. This discrepancy may allow Franken to make up additional votes, but an exact number is impossible to predict. I'm working on a county challenge map that seems to confirm this as well, but it's not ready yet.
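The comparison behind that claim can be sketched the same way; the *_challenges column names are assumptions, not the real data layout:

# Fraction of each candidate's challenges lodged in precincts he is
# currently winning; column names here are illustrative assumptions.
def share_in_winning(rows, who, rival):
    win = lose = 0
    for r in rows:
        n = int(r[f"{who}_challenges"])
        if float(r[f"{who}_recount"]) > float(r[f"{rival}_recount"]):
            win += n
        else:
            lose += n
    return win / (win + lose)

print(f"Franken: {share_in_winning(rows, 'franken', 'coleman'):.0%}")
print(f"Coleman: {share_in_winning(rows, 'coleman', 'franken'):.0%}")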
I'd like to do a more thorough analysis, but I'll have to save it for tomorrow.
I'll now attempt to preempt some probable criticisms:
(1) - The error rates you are trying to model are random.
Maybe they are, I honestly cannot say, but I'll leave you with two thoughts. If you look at the graphs and the previous days' trends, the regressions seem to be closely situated, which could be a sign of non-random errors. Just because the errors cannot be explained does not mean there is not some sort of measurable anomaly. I have done absolutely no work trying to prove that the errors are random, however. If the errors are random, this study may be useless, but what if they aren't?
(2) - You expect the errors to constantly trend.
Yes, I do. I believe this is backed up by the relative convergence of the regressions over the past three days.
(3) - Your fitting function is bad
There very well could be a better way; I'm open to suggestions. If you just look at the first two graphs, and ignore the regression lines, how would your hand-drawn line differ? I know the outliers get a little rough, but that's why the piecewise model was introduced.
I've created a questions comment and a suggestions comment. Feel free to leave feedback that could help make my analysis better; if you think it's blatantly wrong or don't buy one of my preempted responses, let me know too.