Former Obama data scientist outlines how big data changes the data business

Abstract

Editor’s note: Howard Fienberg is director of government affairs for the Marketing Research Association (MRA). He is MRA’s lobbyist in the U.S. on behalf of the survey, opinion and marketing research profession.

Is big data a term used primarily by people in suits, popularized by vendors to try to sell old products and services in a fancy new package? That was the picture drawn by President Obama’s former chief data scientist as he opened a recent speech.

Rayid Ghani, formerly the chief scientist for Obama for America 2012, addressed the promise and peril of big data at a September 10 event in Washington, D.C. Since the election, Ghani has begun teaching machine learning at the University of Chicago and co-founded Edgeflip, a company that helps non-profits take full advantage of big data.

In his view, there are five things that people do differently as a result of more data:

1. Make predictions (where they were previously too scared to do so).
2. Make more predictions much earlier (months ahead instead of seconds).
3. Make more accurate predictions (for example, the data broker Acxiom is not sharing data about consumers but algorithmically-spun inferences about consumers).
4. Become more likely to take actions they formerly thought were risky (more data helped President Obama increase his margin of victory in some places and reduce his risk of losing in others).
5. Become slightly more rational.

He was particularly enthused about people becoming more rational and getting more involved in experimentation to drive their decision-making. For instance, after experimenting with it on a smaller scale, the Obama campaign gave recommendations to its supporters on which of their Facebook friends should be asked to get out the vote. The campaign also experimented with personalization in its e-mail requests and even with how many e-mails recipients could get before they stopped giving money and opted out.
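
As a rough illustration of what such an e-mail experiment involves, the sketch below compares the donation rates of two message variants with a pooled two-proportion z-test. The counts, the variant names and the choice of test are all assumptions made for illustration; the speech did not describe the campaign's actual method.

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Return (rate_a, rate_b, z) for a pooled two-proportion z-test."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants perform the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / se
    return rate_a, rate_b, z

# Hypothetical counts: donations out of e-mails sent, for a generic
# message (a) and a personalized message (b).
rate_a, rate_b, z = two_proportion_z(conv_a=120, n_a=10_000, conv_b=156, n_b=10_000)
print(f"generic={rate_a:.4f} personalized={rate_b:.4f} z={z:.2f}")
```

A z value above roughly 1.96 would conventionally be read as a difference unlikely to be due to chance at the 5 percent level, which is the kind of evidence that lets a team roll out a variant after testing it on a smaller scale.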

The experiments with personalized e-mails echoed lessons learned and acted upon by successful companies like Amazon, Ghani said. “Every store can be like your old corner store, where the proprietor knows you personally and knows what you might need or like,” he said. “The channel is different, but the [personalization] results are the same.”

Despite the many lengthy newspaper series on the “magic” of the Obama campaign’s data science, Ghani said people will be disappointed to learn that he and his team “didn’t really use that much private data in the campaign.” Most of the data that they had and used was public, especially voter registration data. Ghani contended that all the intensive consumer preference and interest data that could conceivably be compiled and correlated could not displace simple voter data. Whether or not you’ve voted in the last few cycles is a far more important predictor than all that other big data, at least for the three most important things that the Obama campaign wanted to predict in swing states: likelihood to support, persuadability and likelihood to vote.

Ghani also warned that identifying likely voters who are persuadable “is not the same thing” as identifying undecided voters. Part of the Obama campaign’s advantage was taking a more sophisticated look at the data, since voters classified as undecided are usually either apathetic or too guarded to respond to a survey.
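
One common way to make the distinction between "persuadable" and "undecided" concrete is to measure uplift: the difference in support between voters who were contacted and comparable voters who were not. The sketch below assumes a randomized contact experiment and uses made-up records and segment names; it is not a description of the campaign's own model, which the speech did not detail.

```python
from collections import defaultdict

def uplift_by_segment(records):
    """Compute support-rate uplift per segment.

    records: iterable of (segment, contacted, supports) tuples, where
    contacted and supports are booleans.
    """
    # For each segment, keep [supporters, total] for contacted and control groups.
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for segment, contacted, supports in records:
        tally = counts[segment][contacted]
        tally[0] += int(supports)
        tally[1] += 1

    result = {}
    for segment, groups in counts.items():
        (s_t, n_t), (s_c, n_c) = groups[True], groups[False]
        if n_t and n_c:  # need both groups to estimate a difference
            result[segment] = s_t / n_t - s_c / n_c
    return result

# Hypothetical data: (segment, contacted, supports)
records = (
    [("undecided_in_survey", True, True)] * 2 + [("undecided_in_survey", True, False)] * 2
    + [("undecided_in_survey", False, True)] * 2 + [("undecided_in_survey", False, False)] * 2
    + [("sporadic_voter", True, True)] * 3 + [("sporadic_voter", True, False)] * 1
    + [("sporadic_voter", False, True)] * 1 + [("sporadic_voter", False, False)] * 3
)
uplift = uplift_by_segment(records)
print(uplift)
```

In this toy data the voters labeled undecided show no change when contacted, while the other segment moves substantially, which is the pattern the distinction is meant to capture: a group can look undecided in a survey and still not be persuadable.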

All that sifted voter data was then turned over to volunteers to act upon in the field in swing states, and was used to sharpen the targeting of television and online advertising and old-fashioned campaign mailers.

Divergent directions

The Q&A session at the end of Ghani’s presentation went in divergent directions. Julie Brill, commissioner at the Federal Trade Commission (FTC), wanted to know how the Obama campaign was able to use so much ethnicity and racial demographic data. Ghani pointed out that many state voter registration databases collect such data, but in some cases the campaign had to make inferences. Voter files are often messy and inconsistent, and the Obama campaign had to consolidate and “clean up” the data. For example, in many cases, the data team had to infer ZIP codes and ages for their voter files.

Asked if consumers should know about inferences being made about them with big data, Ghani quickly and vociferously responded in the affirmative. He also warned that the inferences made by data science are not quite deterministic: Big data “can’t predict everything.” He highlighted that, as a transparency issue for clients and users, this could be just as important as the issues of transparency to the ordinary people whose data is being used to make such inferences. Ultimately, he said, big data privacy concerns were misplaced; the real concern should be “inference privacy.”

Another audience member expressed his frustration with the introduction of commercial data techniques, such as big data and online behavioral targeting, to election campaigns. He asked, “Who, if anyone, in the Obama campaign thought about the appropriateness of these to the campaign?” Since voting is “constitutionally-protected . . . an autonomous act,” did Ghani think that such commercial techniques trivialized or threatened the sanctity of voting? Ghani struggled to answer what seemed to be a mostly rhetorical question, but a simple response came from the moderator, who said there was “nothing new here.” The moderator pointed out that the book The Making of the President, about the 1960 election, compared the selling of the president to the selling of soap and noted that marketing and politics have gone hand-in-hand for generations.