It’s not exactly news that America’s long experiment in representative democracy has always faced enemies, working in darkness, who seek to undermine it by sabotaging elections...most frequently via voter suppression. The faces change (once it was Southern Democrats, now it’s Midwestern Republicans and Russians), but the song remains the same. Here’s what is news: cheaper, better, faster technology is now empowering grassroots activists to man the ramparts, with citizens filling some critical breaches in election integrity that our governments have ignored for too long.
I’ve written elsewhere about one such tool empowering democracy’s happy warriors: NC-GoVote’s Reg Watch service, which works like a credit report monitoring service, but for voter registration records. Now let me tell you what else we’re doing to defend the right to vote.
MIGHTY OAKS FROM LITTLE ACORNS GROW
As Reg Watch has grown to serve ever more North Carolina voters, we grassroots technologists who created it have come to realize that the infrastructure we built can expand, rather easily, to do much, much more: to catch and reveal organized voter suppression in real time, before it can throw an election, on behalf of the entire North Carolina electorate, not just the voters who sign up for Reg Watch.
The principle is simple. Voter suppression means barring ‘the wrong sort’ of voters from the polls. That can be accomplished by erecting legal barriers before would-be voters ever reach a ballot (as poll taxes and literacy tests did in the Jim Crow era, and as today’s needless voter ID laws do), or by more illicit means, such as the 2016 voter caging scheme run by North Carolina’s notorious Voter Integrity Project. Either way, voter suppression is all about racking up big numbers. Preventing one individual from voting doesn’t move the needle for democracy’s foes; to make the considerable effort that voter suppression entails worthwhile, they need to suppress tens or hundreds of thousands of voters at a blow. The same is true for democracy’s foreign enemies, such as the state-sponsored Russian hackers who intensively probed dozens of state voter registration systems in 2016 (and who should be expected back in force this year).
But the thing about moving the needle via a massive voter suppression effort targeting a state’s poll book is that it necessarily leaves a trail of odd statistics in its wake. And rooting through mountains of data to turn up curious statistical anomalies is a solved problem, thanks to modern data science and cheap computing power.
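To make “rooting through data for anomalies” a little more concrete, here is a minimal sketch of one standard technique for spotting an odd number in a series: the modified z-score, which measures how far each value sits from the median in units of the median absolute deviation (MAD). This is an illustration under stated assumptions, not NC-GoVote’s actual code: it assumes a county’s weekly removal counts have already been tallied, and the 3.5 cutoff is a common rule of thumb rather than anything tuned to voter data.

```python
"""Sketch: flag unusually high weekly counts with a modified (median/MAD) z-score.

Hypothetical illustration, not NC-GoVote's code. Assumes a county's weekly
removal counts have already been tallied from successive voter-file snapshots.
"""
import math
from statistics import median


def modified_z_scores(counts: list[float]) -> list[float]:
    """Iglewicz-Hoaglin modified z-score: 0.6745 * (x - median) / MAD."""
    med = median(counts)
    mad = median(abs(x - med) for x in counts)
    if mad == 0:
        # At least half the values equal the median, so any deviation is extreme.
        return [0.0 if x == med else math.copysign(math.inf, x - med) for x in counts]
    return [0.6745 * (x - med) / mad for x in counts]


def flag_anomalies(counts: list[float], threshold: float = 3.5) -> list[int]:
    """Return indices of weeks whose count sits far *above* the typical week."""
    return [i for i, z in enumerate(modified_z_scores(counts)) if z > threshold]


weekly_removals = [120, 135, 128, 140, 118, 131, 125, 980]  # made-up numbers
print(flag_anomalies(weekly_removals))  # [7]
```

The median and MAD are used instead of the mean and standard deviation because the very spike being hunted would otherwise drag both statistics toward itself and partly hide.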
Catching the bad guys in the act
What might such a “curious statistical anomaly” in voter registration data look like? Maybe something like this:
This chart illustrates the consequences of the now-infamous voter caging scheme Republicans organized to quietly force thousands of voters off the rolls in North Carolina’s Cumberland and Moore counties in 2016. In Cumberland County in particular, that abuse raised a huge welt (the tall brown bar on the right in this chart) that would be pretty hard to miss...provided anyone was systematically watching for it. But in 2016 no one was — it was brought to light mere weeks before the November election, and then only thanks to a few individuals who happened to notice that their own names had suddenly dropped off the rolls.
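One simple way to make a welt like Cumberland County’s surface automatically is to compare consecutive weekly snapshots of the registration file. The sketch below is hypothetical, not Ada’s actual code: it assumes each snapshot has been reduced to a dictionary mapping a unique voter ID to that voter’s county, treats IDs that vanish between snapshots as removals, and flags any county whose removal rate is at least `ratio` times the rate in the rest of the state. The `ratio` and `min_removed` defaults are arbitrary placeholders.

```python
"""Hypothetical sketch of one automated check in the spirit of Ada (not Ada itself).

Assumption: each weekly snapshot is a dict mapping a unique voter ID to that
voter's county. IDs present last week but absent this week count as removals.
"""
from collections import Counter


def removals_by_county(last_week: dict[str, str], this_week: dict[str, str]) -> Counter:
    """Count voter IDs that disappeared between snapshots, keyed by their old county."""
    return Counter(county for voter_id, county in last_week.items() if voter_id not in this_week)


def flag_counties(last_week: dict[str, str], this_week: dict[str, str],
                  ratio: float = 5.0, min_removed: int = 50) -> list[tuple[str, int, float]]:
    """Flag counties whose removal rate is >= `ratio` times the rest of the state's rate."""
    removed = removals_by_county(last_week, this_week)
    registered = Counter(last_week.values())
    total_removed = sum(removed.values())
    flagged = []
    for county, n_removed in removed.items():
        rest_registered = len(last_week) - registered[county]
        if n_removed < min_removed or rest_registered == 0:
            continue  # too few removals to matter, or no other counties to compare with
        rate = n_removed / registered[county]
        rest_rate = (total_removed - n_removed) / rest_registered
        if rate >= ratio * rest_rate:
            flagged.append((county, n_removed, rate))
    return sorted(flagged, key=lambda item: item[2], reverse=True)


# Made-up example: three counties of 1,000 voters each; one loses 300 in a week.
last_snap = {f"{c}{i}": c for c in ("ALPHA", "BETA", "GAMMA") for i in range(1000)}
dropped = {f"ALPHA{i}" for i in range(300)} | {f"BETA{i}" for i in range(10)}
this_snap = {vid: c for vid, c in last_snap.items() if vid not in dropped}
print(flag_counties(last_snap, this_snap))  # [('ALPHA', 300, 0.3)]
```

Comparing each county against the rest of the state, rather than against a statewide rate that includes it, keeps a single county’s purge from inflating its own baseline. Under the stated assumption, a voter who moves between counties keeps the same ID, so the sketch treats that as a change of county rather than a removal.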
Enter, stage left: Progressive data science
Spotting statistical anomalies like this one has long been possible here in North Carolina, the state with the nation’s best ‘open data’ laws for voter registration records. But it has never been easy. The data files that the NC State Board of Elections posts weekly on its public FTP server are notorious among veteran voting rights defenders as a hot mess (idiosyncratically formatted and full of weird typos). Worse, they are arcane: there’s no publicly available data dictionary explaining what each of the 70+ obscurely named fields in a voter’s registration record actually means, which makes deep analysis of these data a job not for the incautious or the faint of heart.
But along the road to creating NC-GoVote’s Reg Watch service, we solved those problems. Our volunteer professional data scientists built robust, fully automated data-cleaning algorithms; our volunteer software engineers built a truly elegant cloud-based backend; and the whole team developed substantial domain expertise in this arcane data. That opened the door to querying the data set in ways we hadn’t originally contemplated.
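For readers wondering what “data cleaning” looks like in practice, the sketch below shows its defensive flavor: normalize the header names, trim and collapse stray whitespace, and set malformed rows aside for review rather than crashing on them. It is a hypothetical illustration that assumes a tab-delimited text file with a header row; it is neither the State Board’s documented format nor NC-GoVote’s pipeline, and the column names in the sample data are made up.

```python
"""Hypothetical sketch of defensive parsing for a messy, tab-delimited voter file.

Assumptions: tab-delimited text with a header row; cells may carry padding or
doubled spaces; some rows may have the wrong number of fields.
"""
import csv
import io
from typing import Iterable


def clean_value(raw: str) -> str:
    """Trim surrounding whitespace and collapse internal runs of whitespace."""
    return " ".join(raw.split())


def read_voter_file(lines: Iterable[str]) -> tuple[list[dict[str, str]], list[int]]:
    """Parse rows into dicts keyed by normalized header names.

    Returns (records, rejected_line_numbers) so malformed rows can be reviewed
    later instead of silently corrupting the analysis.
    """
    reader = csv.reader(lines, delimiter="\t")
    header = [clean_value(name).lower() for name in next(reader)]
    records: list[dict[str, str]] = []
    rejected: list[int] = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        if len(row) != len(header):
            rejected.append(reader.line_num)  # physical line where this row ended
            continue
        records.append({name: clean_value(cell) for name, cell in zip(header, row)})
    return records, rejected


# Usage with in-memory sample data (column names are made up for illustration).
sample = io.StringIO("County_Desc\tLast_Name\n  WAKE \t SMITH   JR \nDURHAM\n")
records, rejected = read_voter_file(sample)
print(records)   # [{'county_desc': 'WAKE', 'last_name': 'SMITH JR'}]
print(rejected)  # [3]
```

On a real download you might open the file with `open(path, encoding="utf-8", errors="replace", newline="")` so that undecodable bytes don’t halt the run; the actual encoding of the State Board’s files is an assumption worth checking first.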
And, because data scientists are an inherently curious bunch, boy-howdy did we query that data: compulsively, with the arrival of each new update, searching for any little hint of foul play underway here in the Tar Heel State. Fingers flew over keyboards and feet tapped impatiently while complex queries ran for hours (the ever-growing NC-GoVote database currently tips the scales at just over 2.5 billion data points, and grows by a quarter of a billion data points weekly).
Then it occurred to us that this ad hoc process of laboriously sniffing through the data for curiosities could itself be fully automated, yielding an anomaly detection agent (which we’ve christened Ada) that can spot anomalies we might never have thought to look for on our own. Take, for example, these hidden gems Ada recently turned up...none of them sinister, as it turns out, but all of them things that voting rights defenders should want to know about:
It has been quite encouraging to get to know some of the good folks at the NC State Board of Elections as we built Ada, and to find them as interested in Ada’s discoveries as we are. For example, when we brought the Cleveland County anomaly to their attention, they immediately began researching the matter and got back to us within a few days with an explanation that matched what we had, by then, already guessed from the statistics:
The Cleveland County Board of Elections was not always using the comprehensive death data provided by the State Board to remove deceased voters. This resulted in some deceased voters not being removed in a timely fashion. According to Cleveland County elections officials, when the issue was discovered, the county board removed 400 deceased voters on April 19 and 142 voters on April 20, 2018.
The Cleveland County Board of Elections reviewed voter history for all deceased voters who were removed and determined that none had voter history after their death dates. In other words, no one cast ballots in their names after their deaths.
It is telling that the state board needed days, rather than minutes, to confirm and research this incident. That suggests the state board itself isn’t doing anything like systematically watching for such data anomalies. So Ada isn’t reinventing the wheel; it’s raising the bar. And judging from the number of probes our systems regularly receive from former Soviet republics, we’re not the only ones who seem to think so.
Building bridges to the state’s election officials is a critical component of the Ada strategy. Because when an anomaly turns up, who ya gonna call? The media (social or otherwise) should never be the first call (or even the second or third), because by their very nature, 99% of anomalies are of the ‘stuff happens’ variety. Earning a reputation for crying wolf before the Big One turns up would be self-defeating...there are already plenty of tin-foil-hat types out there.
We’re currently unveiling to the public a very few of Ada’s ongoing analyses, via NC-GoVote’s new Data Viewer web page. Feel free to have a look, but please don’t use your cell phone...it is sooo not mobile-friendly yet. Currently we’re exposing just three of the more than 50 individual stats that Ada finds informative. We’ve chosen to make those particular stats public because they’re ones that every voting rights defender, voter registration drive organizer, county party chairperson, or candidate should find useful. Most of the other stats Ada looks at would only confuse civilians...and revealing them would give the bad guys more insight into our processes than seems wise.
WHAT ARE TECHIES IN YOUR STATE DOING TO DEFEND THE POLL BOOK?
Nothing about North Carolina’s situation is so unique that the NC-GoVote model couldn’t flourish in other states, too. Yes, our open data laws are extremely accommodating, but some other states aren’t far behind. A few that come immediately to mind are Maryland, Massachusetts, New York, Ohio, Pennsylvania, Vermont, and Washington (alas, Californians, you’re out of luck: your state’s poll book data access policies are shockingly antediluvian). Progressive data scientists and software engineers who are ready to do more than the usual slow burn while watching Rachel are everywhere nowadays; you probably count some among your friends. The coding and engineering required isn’t exactly rocket surgery, and the cloud computing power required is very reasonably priced.
If you have a team that might be ready to have a go at this in your state, and you’d like some free advice (and worth every penny!), feel free to drop me a line at doc at ncgovote dot org.
If you’re a registered North Carolina voter, join the growing numbers of your fellow Tar Heels who are signing up for Reg Watch at NC-GoVote. Because even as we expand our efforts to provide statewide situational awareness, we’re still deeply committed to defending individual NC voters one by one, too. But we can’t inform you of a problem with your voter registration record if we don’t know how to reach you with an alert. You can learn more about how Reg Watch works here.