Inventing the Data Machine: Political Analytics in Practice

The 2020 Iowa caucuses debacle has left the public wondering, once again, about the role of technology in political life. Is the control amassed in software platforms antithetical to democracy? Can we trust technology firms with sensitive voter records and demographic data?

The 2018 Cambridge Analytica scandal first brought this issue to public attention. Platforms like Facebook contain vast troves of personal information. Combined with modern data science and machine learning technology, they can be used to develop sophisticated models of voter behavior. 

There is a latent fear that these models could turn elections into little more than automated propaganda campaigns, run by algorithms tuned to influence audiences. The opposite worry is that efforts to turn the political campaign into a data-driven project may prove to be mostly hype that distracts candidates from the bread-and-butter work of running a successful campaign.



The Early Origins of Data Analytics in Politics

In 2002, a loose online community of disaffected activists began to rally around Howard Dean, the Vermont governor who sought the 2004 Democratic Party presidential nomination. Dean opposed the invasion of Iraq, in contrast to the mainstream Democratic leadership. His supporters coordinated with one another through blogs, then a new communication technology, and encouraged the use of Meetup.com to meet other supporters face to face.

In the early 2000s, campaigns did not treat the internet as a priority. Only around half of American adults had internet access at all. At the time, the Howard Dean campaign’s grassroots approach felt like something from another era. As one Dean campaigner put it in Daniel Kreiss’s book ‘Taking Our Country Back’:

“When I went to work for Howard Dean, I don’t think the Internet was taken very seriously as a tool…it was at best an afterthought, and it certainly was never a product of any campaign manager’s explicit strategy. It was something like “I guess we have to do that.” If you were a hot shot political operative you did not go into the Internet side of the business. It was a backwater in politics.”

Back then, the internet was seen as yet another broadcasting medium. Traditional approaches utilizing TV ads and mailed flyers were mostly untargeted efforts – the campaigners knew scant details about the voters they were trying to reach. Campaigning on the internet was initially approached in much the same way: by collecting email lists of supporters, and sending notifications about fund-raising efforts and campaign stops.



The Obama For America Campaign (2008 – 2012)

The early days of internet campaigning could be seen as a “push model”. Blasting out updates to everyone on an email list was a low-fidelity way of targeting supporters. But it didn’t take long before technologists began to ask: What if we could tailor a message more accurately to a particular voter, given some knowledge about them?

The first significant use of data in this way came with Barack Obama’s campaign. Obama For America, headquartered in Chicago, was celebrated for pioneering a tech-savvy approach that focused on “new media” platforms to reach voters. It was the first of its kind: a campaign created by digital natives, for whom the internet was not a strange new frontier but a familiar hub of creative expression. The internet-focused campaign won Obama the nomination and subsequent victory in 2008, and a new wave of tech-savvy younger voters was credited as part of that victory.

In the lead-up to the 2012 election, Obama For America took modern campaigning efforts even further with ambitious concepts like data modeling and simulations to predict voter behavior. This was the advent of big data – technology designed for ingesting, processing, and transforming huge swaths of data to mine for insights and make predictions. The Obama For America team used big data text analytics to understand voters and predict outcomes. The campaign even developed a social media connected platform, my.barackobama.com, that allowed voters to connect their Facebook profile and share information directly with the campaign. Over two million people did.

The connected nature of the Obama For America platform allowed messages to be catered to an individual voter’s demographics. LGBTQ voters were more likely to receive messages from the Obama campaign highlighting his support of progressive policies around gay rights, while working-class white voters were more likely to receive messages about the economy and foreign trade.

The Obama For America platform essentially functioned like a customer relationship management (CRM) system. A voter could be considered a potential consumer of a product, the product being the candidate. Messaging could then be tweaked and tuned to make the sale. The Obama For America staffers brought insights like these from other careers, in particular from the worlds of technology startups and academic research. Obama For America’s chief scientist, Rayid Ghani, had previously worked in consumer research. By targeting voters demographically, using personal information collected from surveys and derived from textual analysis, the campaign could tailor its message to be more effective.
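To make the CRM analogy concrete, here is a minimal sketch in Python of segment-based message selection. It is not the campaign's actual system, whose internals are not public. The Voter class, the segment tags, and the message themes are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Voter:
    """A toy voter record, as a CRM-style system might store it (hypothetical)."""
    name: str
    tags: set = field(default_factory=set)  # segments inferred from surveys, etc.


# Hypothetical mapping of segment tag to message theme, checked in priority order.
SEGMENT_MESSAGES = [
    ("civil_rights", "support for equal rights"),
    ("economy", "jobs, the economy, and trade"),
]
DEFAULT_MESSAGE = "general campaign update"


def pick_message(voter: Voter) -> str:
    """Return the first message theme whose segment tag matches the voter."""
    for tag, theme in SEGMENT_MESSAGES:
        if tag in voter.tags:
            return theme
    return DEFAULT_MESSAGE


if __name__ == "__main__":
    for v in (Voter("Alex", {"economy"}), Voter("Sam")):
        print(v.name, "->", pick_message(v))
```

A real system would presumably score voters with statistical models rather than fixed tags, but the shape is the same: known attributes go in, a tailored message comes out.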

The Obama For America campaign introduced data science to politics, and it has been with us ever since, for good or ill. The general election campaigns for Democratic Party candidates have continued to be notable for their innovative use of technology. The strategy has also been imitated by data mining firms, including the controversial British firm Cambridge Analytica, which sold similar services to third parties.

The modern political campaign today assumes the use of data to try to build accurate models of voters. These models then serve as the basis by which to test approaches to wooing potential voters.

As we’ll see, these data-driven efforts show no sign of ending. As long as the data economy exists, and as long as data collection continues to grow, analytics will remain a fact of political life in the 21st century.



The Cambridge Analytica Scandal (2014 – 2016)

In early 2018, a former employee revealed that the U.K.-based firm Cambridge Analytica had gathered voter analytics data on millions of Americans in support of Republican Donald Trump’s 2016 campaign. It seemed that the company had potentially misused its access to acquire large volumes of Facebook user data. The revelation provoked a storm of criticism of Facebook for enabling the firm’s actions, eventually resulting in a congressional hearing with CEO Mark Zuckerberg.

By most accounts, Zuckerberg successfully defended his platform. During the hearing, Facebook’s stock took a slight drop—but then it rose as the CEO demonstrated his willingness to defend the company’s data collection practices.

Even though Cambridge Analytica has seen its access to Facebook severed, there is little to stop another Facebook partner from taking a similar strategy toward data collection. And with Facebook taking a hard stance in favor of targeted political advertising, we can expect to see Facebook data remain in play for political campaigns for the foreseeable future.



Ada and Hillary For America (2016)

Cambridge Analytica relied on third-party data collected from sources like Facebook. The Democratic Party campaigns, by contrast, have remained focused on first-party data collected by volunteers and canvassing networks.

The campaign to elect Hillary Clinton in 2016 led the charge once again in utilizing big data at the heart of its national strategy, following the approach used to great success by Obama For America. Many Obama For America staffers returned to support Clinton. 

At the forefront of the campaign was Elan Kriegel, a former analyst at the Democratic National Committee who had also worked on the analytics team at Obama For America. He was appointed the chief analytics officer of Hillary For America.

“From our schedule to our voter contact to where our organizers spend their time, almost everyone here interacts with [Kriegel’s] work and their work is influenced by his insights,” one campaign manager told Politico in September 2016.

The Hillary For America campaign developed an artificial intelligence system to guide the campaign to victory through advanced analytics and prediction modeling. They called it Ada, after Ada Lovelace, the 19th-century mathematician who is often considered the mother of computer science. The campaign hoped to unveil Ada in all her glory following Clinton’s decisive victory in the 2016 election. But that victory didn’t pan out, and Ada was quickly forgotten.

We don’t know a lot of details about Ada except that it was fed “a raft of polling numbers, public and private” and “ground-level voter data meticulously collected by the campaign” in an effort to predict victory conditions for Clinton and chart a course to a win. Ada ran 400,000 simulations every day to model the race against Trump. It advised the campaign on where to focus advertising dollars by calculating a “cost per flippable delegate” score.
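Since the details of Ada are not public, the following is only a toy sketch of the general technique, Monte Carlo simulation of a race, and not a reconstruction of Ada. The states, win probabilities, ad costs, the 15-vote threshold, and the assumed two-point "lift" from an ad buy are all invented for illustration, and the cost score is just one plausible reading of a "cost per flippable" metric.

```python
import random

# Hypothetical map: state -> (electoral votes, estimated win probability, ad cost in dollars)
STATES = {
    "State A": (10, 0.50, 200_000),
    "State B": (12, 0.45, 500_000),
    "State C": (6, 0.90, 100_000),
}
VOTES_NEEDED = 15  # made-up majority of the 28 votes on this toy map


def simulate_win_rate(states, n_runs=10_000, seed=0):
    """Estimate the chance of reaching VOTES_NEEDED by simulating many elections."""
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    wins = 0
    for _ in range(n_runs):
        # Each state is won independently with its estimated probability.
        total = sum(votes for votes, p, _ in states.values() if rng.random() < p)
        if total >= VOTES_NEEDED:
            wins += 1
    return wins / n_runs


def cost_per_flippable_vote(votes, p_win, cost, lift=0.02):
    """Dollars per expected electoral vote gained, assuming an ad buy adds
    `lift` to the win probability (an invented assumption)."""
    new_p = min(1.0, p_win + lift)
    expected_gain = votes * (new_p - p_win)
    return float("inf") if expected_gain == 0 else cost / expected_gain


if __name__ == "__main__":
    print("Estimated win rate:", simulate_win_rate(STATES))
    # Rank states from cheapest to most expensive per expected vote gained.
    for name, (votes, p, cost) in sorted(
        STATES.items(), key=lambda kv: cost_per_flippable_vote(*kv[1])
    ):
        print(name, round(cost_per_flippable_vote(votes, p, cost)))
```

Even in this toy version, the output is only as good as the win probabilities fed in: if the estimates for a state are off, both the simulated win rate and the spending ranking will be off too.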

Although staffers spoke glowingly of the data-driven efforts at Hillary For America, Ada didn’t secure a victory. We know that the system did not correctly predict the outcome of the election. Ada was not alone in its limited predictive value: even political analyst Nate Silver’s forecast favored a Clinton victory. The data-driven campaigners were seemingly defeated.



Looking To The Future of Data-Driven Campaigning

Have we seen the last of data-driven campaigning? I wouldn’t bet on it.

I strongly suspect that we will see data-driven campaigns return in the next election cycle. After Ada, the efficacy of the predictive analysis approach to campaigns is in question, but we should expect to see the front-runners of both parties using a similar approach in the lead-up to the general election in November.

Data models are only as useful as the data that supplies them. Even if the Ada algorithm were perfectly tuned to advise a political candidate on strategy, flaws in the data, or limited data, could yield poor results.

With ever more data flowing from Internet of Things devices like smart TVs, from streaming video services, and over 5G networks, there has never been more data available about Americans, or more ways to target campaigns with it. That wealth of data has created a wave of technology firms in the data collection space, as well as incentives for application developers and technology firms to sell data. Political campaigns can obtain this data from data brokers, and we should not be surprised to find out that it is being used to figure out how we are going to vote in November.

The proverbial genie has been let out of the bottle.
