Barack Obama won the 2012 presidential election, defeating Mitt Romney in nearly all of the battleground states, securing 332 electoral votes and 51% of the popular vote. Following the election, several prominent media outlets reported that the Obama campaign's effective mining of large databases of voter information was a major factor in the president's victory. During the election, Time Magazine interviewed several of the Obama campaign's "data crunchers" and estimated their efforts "helped Obama raise $1 billion, remade the process of targeting TV ads and created detailed models of swing-state voters that could be used to increase the effectiveness of everything from phone calls and door knocks to direct mailings and social media." Obama strategist David Axelrod told reporters, "nothing happened on election night that surprised me — nothing. Every single domino that turned over was in keeping with the model that our folks had projected." While large-scale data crunching is widely expected to become the norm in major elections, many questions remain to be answered, especially concerning transparency of the process, voter privacy and the handling of such data between elections.

What is Data Mining?

Data mining is defined as "the computational process of discovering patterns in large data sets" with the goal of "extract[ing] information from a data set and transform[ing] it into an understandable structure for further use." A second definition is "the process of automatically discovering useful information in large data repositories." It is important to note that the term data mining should not be confused with just any form of large-scale data or information processing.
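As a concrete (and entirely invented) illustration of what "discovering patterns" can mean in practice, the toy sketch below counts which voter attributes frequently occur together across a handful of made-up records, a bare-bones version of frequent-pattern mining:

```python
from collections import Counter
from itertools import combinations

# Hypothetical voter records: each is the set of attributes observed
# for one person. The attribute names are invented for illustration.
records = [
    {"suburban", "donated", "age_30s"},
    {"suburban", "donated", "volunteers"},
    {"rural", "age_60s"},
    {"suburban", "donated", "age_30s"},
]

def frequent_pairs(records, min_support=2):
    """Return attribute pairs that co-occur in at least min_support records."""
    counts = Counter()
    for rec in records:
        for pair in combinations(sorted(rec), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

print(frequent_pairs(records))
```

Real data-mining systems work at vastly larger scale and with far more sophisticated algorithms, but the underlying idea is the same: turn raw records into patterns a human (or a campaign) can act on.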

What did the Obama Data Scientists Do?

The Obama campaign accumulated a massive amount of data in the 2008 election, but did not have the means to effectively translate all of that information into precise action. Therefore, for the first 18 months of the 2012 campaign, massive amounts of voter data from pollsters, fundraisers, field workers, consumer databases, social media and private companies were consolidated into a single repository. Data was continually collected through cookies and tracker programs on Obama's website and social media apps. Using data mining, the Obama campaign's data scientists were able to comb through all the information to discover important patterns and draw conclusions about potential voters. They assigned potential swing-state voters scores on a scale of 1 to 100 in four metrics: the likelihood that they would support Mr. Obama, the likelihood they would show up at the polls, the likelihood an Obama supporter who did not consistently vote could be motivated to go to the polls, and finally, how persuadable someone was by a conversation on a particular issue.
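A minimal sketch of what such per-voter scores might look like in code (the field names, thresholds and triage rule here are all hypothetical, not the campaign's actual schema):

```python
from dataclasses import dataclass

@dataclass
class VoterScores:
    """Four hypothetical 1-100 scores mirroring the metrics described above."""
    support: int      # likelihood of supporting the candidate
    turnout: int      # likelihood of showing up at the polls
    gotv: int         # likelihood a low-turnout supporter can be mobilized
    persuasion: int   # how persuadable by a conversation on an issue

def contact_priority(v: VoterScores) -> str:
    """Toy triage rule: mobilize likely supporters who may stay home,
    persuade genuine fence-sitters, otherwise deprioritize."""
    if v.support >= 70 and v.turnout < 40:
        return "get-out-the-vote"
    if 40 <= v.support <= 60 and v.persuasion >= 70:
        return "persuasion call"
    return "low priority"

# A strong supporter unlikely to vote gets a turnout push, not a sales pitch.
print(contact_priority(VoterScores(support=85, turnout=30, gotv=75, persuasion=20)))
```

The point of scoring on several axes at once is visible even in this toy: a voter's value to the campaign depends on the combination of scores, not on any single number.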

The Obama campaign then used advanced statistical algorithms to run test models predicting which actions or messages would persuade the most voters to swing Democratic. Sasha Issenberg, journalist and author of The Victory Lab, explained in an NPR interview: "They're running algorithms that are basically looking for patterns between that big mass of data they have about each of you with the information that their polling tells them about specific attitudes about the election that's underway." Based on the data and statistical models, the Democrats effectively tailored their message to micro-target different types of voters. The Washington Post likened Obama's use of data mining and predictive models to Moneyball tactics in baseball. It should be noted that Romney's campaign also engaged in data mining, but was not as effective at utilizing the technology. In fact, Romney's data-processing servers were never fully tested; they became overloaded and crashed on Election Day. This left the many Republican volunteers relying on the predictive data utterly confused on election night, and the campaign essentially blind.
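One common way to build such predictive models is logistic regression, which maps a voter's features to a probability. The features and weights below are invented for illustration; the campaign has not published its actual models:

```python
import math

# Hypothetical learned weights mapping voter features to the log-odds
# that a particular message persuades that voter. All values invented.
WEIGHTS = {"watches_cable_news": 0.8, "union_household": 0.5, "age_under_30": -0.3}
BIAS = -0.6

def persuasion_probability(features):
    """Logistic model: P(persuaded) = 1 / (1 + e^-(w.x + b))."""
    z = BIAS + sum(WEIGHTS[f] * x for f, x in features.items())
    return 1 / (1 + math.exp(-z))

# Score two hypothetical voters and target the more persuadable one first.
a = persuasion_probability({"watches_cable_news": 1, "union_household": 1, "age_under_30": 0})
b = persuasion_probability({"watches_cable_news": 0, "union_household": 0, "age_under_30": 1})
print(round(a, 3), round(b, 3))
```

In practice a campaign would fit such weights from experiments and polling rather than guess them, and would likely maintain a separate model per message, but the scoring step looks much like this.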

The Data & Data Crunchers

The exact number of information specialists and technical experts used by the Obama campaign is unknown. What is known is that such staff were highly valuable and the results of their expertise coveted. Obama campaign senior staff likened the data-mining staff to nuclear codes because they were the campaign's secret weapon to victory. David Axelrod stated that because of the 2012 results he would "invest in people who understand where the technology is going and what the potential will be by 2016 for communications, for targeting, for mining data, to make precision possible in terms of both persuasion and mobilization." Because of the crucial role data mining played in Obama's victory, the era of "guys sitting in a back room smoking cigars, saying 'We always buy 60 Minutes'" is over. Fundamentals and gut feelings are being replaced by the information-driven insights of data scientists and technology. (It should be noted that Nate Silver also won a victory for data science on the pundit side of the election. Silver is a statistician and blogger who used advanced statistical models and algorithms to correctly predict the winners of all 50 states and the District of Columbia in the 2012 election.) What type of data was collected, and whether it was completely anonymous, is also unclear. Chief Innovation and Integration Officer Michael Slaby and other campaign officials claim the campaign primarily relied on data voluntarily given on websites, on social media and to campaign canvassers. But they have not commented on the specifics of how data was recorded on individual people, thus contributing to the larger debate over privacy ethics in large-scale data mining.
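The aggregation idea behind forecasts like Silver's can be gestured at with a toy weighted poll average; the weighting rule and all numbers here are invented for illustration, not his actual model:

```python
# Each poll: (candidate's share, sample size, days since the poll was taken).
# All figures are made up for this sketch.
polls = [
    (0.51, 1200, 2),
    (0.49, 600, 10),
    (0.52, 900, 5),
]

def weighted_average(polls):
    """Weight larger and more recent polls more heavily."""
    num = den = 0.0
    for share, n, days_old in polls:
        w = n / (1 + days_old)   # crude weight: bigger and fresher counts more
        num += w * share
        den += w
    return num / den

print(round(weighted_average(polls), 4))
```

Real forecast models also adjust for pollster house effects, model correlations between states and simulate many outcomes, but even this crude average illustrates why aggregation beats reading any single poll.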

The Future of the Obama Data

The future of all this valuable data is unclear as well. It will most likely be used by the Democratic National Committee to assist other Democrats in their campaigns, although exactly how, and whether it will play as great a role, remains to be seen. The Wall Street Journal reports: "But it isn't clear that a network built around the appeal of Mr. Obama could transfer effectively to other candidates, or that voters who signed up for email messages from the Obama campaign or as volunteers would welcome such a move." There is also the financial cost of maintaining the infrastructure for the data, including a staff of data and technology experts, hardware and backups, processes to ensure the data is continuously updated and relevant, and system upgrades.

The future of the technology and code generated by Obama's technical experts is also unclear. Jim Pugh warns that if the Democratic Party does not continue to invest in data and technology, and retain those key data scientists and technology staff, the party risks losing its crucial advantage over Republicans: "A 21st-century political movement must have a serious on-going commitment to staying at the forefront of technological advancement. With the infrastructure coming out of the Obama campaign, we've got a huge lead in this area. Sadly, that is not what has happened so far. Since Election Day, the Democratic National Committee has laid off an unprecedented number of technology staff members, some of whom had been at the party for over ten years."

Open Source vs. Private Code

Programmers involved in the campaign advocated for the code to become open source, allowing other programmers the opportunity to continuously review, build on and improve the technology. Daniel Ryan, one of the campaign's technology directors, stated that much of what they had done would not have been possible without utilizing other open source programs. He warns that the code will become obsolete if it is mothballed; however, "if our work was open and people were forking it and improving it all the time, then it keeps up with changes as we go." This could greatly benefit, and spur innovation among, those who do not have the resources to build such complex code in-house. While the economic benefits and values of open source are obvious and well established in the programming and data community, giving away something for free is a largely foreign concept to the political crowd. Three months after the election, the data and software remain tightly controlled by the president and his campaign staff. Former campaign user-experience engineer Manik Rathee warns this could come back to haunt the Democratic Party, especially in its ability to recruit top engineers and developers: "It's going to send a very bad signal to engineers who might consider working on the next election cycle in 2016. It shows a fundamental misunderstanding of how we work."

While the success achieved through data mining in the 2012 presidential election is a win for information and data sciences, both its future and its long-term role in politics remain to be seen. The Obama campaign is understandably cautious about divulging too much information, especially as the specifics of its data mining, algorithms and techniques are considered key to winning future elections against the Republican Party. But in order to best support democracy, it is imperative that the data collected on voters, the code and technology, and how they are being utilized by public officials eventually become transparent. Why not have the public comment on the role such technology can play in elections? Open discussion may lead to a better understanding of the technology, of information sciences and of their relation to privacy. The advancements made by the Obama campaign's in-house data and tech staff could benefit and spur innovation in a number of future data-mining projects and open source technologies (beyond predicting how best to market a campaign) if shared with the rest of the world. So why not lead the way?