
How Nate Silver did it



Article ID: 20130126-1
Published: January 2013
Author: Michael Lieberman

Article Abstract

Statistician Nate Silver proved to be more accurate in his 2012 election predictions than any of the talking heads and pundits. The author created an Electoral College model in Excel and used public data to discover the secret to Silver's accuracy.

Editor's note: Michael Lieberman is founder and president of Multivariate Solutions, a New York research firm. He can be reached at 646-257-3794 or at michael@mvsolution.com. This article appeared in the January 28, 2013, edition of Quirk's e-newsletter.

During the 2012 election, Nate Silver drew fire for his projections.

Joe Scarborough, the conservative host of Morning Joe on MSNBC, attacked Silver during the election, and Politico.com called him a "one-term" celebrity, saying, "For all the confidence Silver puts in his predictions, he often gives the impression of hedging." (Later, Silver replied that Politico covers politics like sports but "not in an intelligent way at all.")

But in the end, Silver beat them all. (And Scarborough eventually apologized, sort of, acknowledging that Silver did get it right.)

For those who don't know, Nate Silver writes the FiveThirtyEight blog for The New York Times and is the bestselling author of The Signal and the Noise. In the book, Silver examines the world of prediction, investigating how we can distinguish a true signal from a universe of noisy data. It is about prediction, probability and why most predictions fail - most, but not all.

What had folks attacking Silver was this: he predicted elections far more accurately than most pollsters and pundits on Politico, The Drudge Report, MSNBC and others. In his book, Silver described his model as bringing Moneyball to politics, that is, producing statistically driven results. Silver popularized the use of probabilistic models in predicting elections by reporting the probability of a range of outcomes rather than just who wins. When a candidate reaches, say, a 90 percent chance of winning, Silver will call the race. What made Silver famous was his extremely accurate prediction of vote percentages, an area where pundits are almost always far off the mark. And loath as pollsters may be to admit it, individual polls are almost always wrong too. However, an average of polls is generally more accurate than any single poll, and a systematic probability model of the average of polls is almost always right. Think of it as political crowdsourcing.
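The claim that an average of polls beats a single poll can be checked with a toy simulation. The sketch below assumes, purely for illustration, that polls are independent and unbiased draws scattered around a made-up true vote share (`TRUE_SHARE`, `POLL_SD` and `POLLS_PER_RACE` are hypothetical values). Real polls share systematic errors, so in practice averaging helps less than this idealized case suggests.

```python
import random
import statistics

# Toy illustration only: the "true" share and the poll noise level are made up.
TRUE_SHARE = 51.0      # hypothetical true vote share, in percent
POLL_SD = 3.0          # hypothetical spread of individual polls around the truth
POLLS_PER_RACE = 10    # hypothetical number of polls averaged per race
RACES = 2000           # number of simulated races

def simulate(seed: int = 42) -> tuple[float, float]:
    """Return (mean abs error of a single poll, mean abs error of the poll average)."""
    rng = random.Random(seed)
    single_errors, average_errors = [], []
    for _ in range(RACES):
        # Assumption: each poll is an independent, unbiased normal draw
        polls = [rng.gauss(TRUE_SHARE, POLL_SD) for _ in range(POLLS_PER_RACE)]
        single_errors.append(abs(polls[0] - TRUE_SHARE))
        average_errors.append(abs(statistics.fmean(polls) - TRUE_SHARE))
    return statistics.fmean(single_errors), statistics.fmean(average_errors)

single_err, avg_err = simulate()
print(f"typical single-poll error: {single_err:.2f} points")
print(f"typical poll-average error: {avg_err:.2f} points")
```

With ten polls per race, the typical miss of the average comes out at roughly a third of a single poll's miss, because averaging shrinks purely random error by about the square root of the number of polls.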

One of the best models

Silver has built one of the best models out there. It's accurate, consistent and totally statistical. One advantage of being totally statistical is that his model can be replicated. This article will review the process and explain how Silver built his model, what his results mean and how to use them going forward.

The basics

To run Silver's model, you will need Microsoft Excel; a source of campaign finance and election data; and historical data to set "polling weights."

The first step is to calculate the poll weight, which is a measure of how much an individual poll counts when averaged with other polls. The poll weight consists of three values:

  • Recency: The older a poll, the lower the accuracy. A recency factor is calculated using a relatively simple decay function. Think of a poll as having a shelf-life - the longer on the shelf, the less potent the poll is.
  • Sample size: When shown on television, a poll might have a margin of error of +/- 4 percentage points. This margin is calculated from the sample size. As a general rule, the larger the sample size, the more accurate the poll.
  • Pollster rating: Silver describes how he rates pollsters in a 2010 blog post, though he does not completely reveal his secret sauce. Without going into too much statistical detail, Silver uses historical data and regression analysis to create an accuracy measure for each pollster. Better pollsters have positive ratings; worse ones have negative ratings.
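The three ingredients of a poll weight can be sketched as small Python functions. Silver has not published his exact formulas, so everything here beyond the standard margin-of-error formula is an assumption: the 30-day half-life in `recency_weight`, the square-root scaling in `sample_size_weight` and the doubling-per-point mapping in `pollster_weight` are hypothetical choices, not his. The margin-of-error function shows where the familiar +/- 4 points comes from: a sample of about 600 respondents.

```python
import math

def recency_weight(days_old: float, half_life_days: float = 30.0) -> float:
    """Exponential decay: a poll loses half its weight every half_life_days (hypothetical value)."""
    return 0.5 ** (days_old / half_life_days)

def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Standard 95% margin of error for a proportion, in percentage points."""
    return 100 * z * math.sqrt(p * (1 - p) / n)

def sample_size_weight(n: int, reference_n: int = 600) -> float:
    """Hypothetical: weight grows with the square root of sample size, relative to reference_n."""
    return math.sqrt(n / reference_n)

def pollster_weight(rating: float) -> float:
    """Hypothetical mapping: rating 0 -> 1.0, each +1 doubles the weight, each -1 halves it."""
    return 2.0 ** rating

print(round(margin_of_error(600), 1))  # a 600-person poll: about +/- 4 points
print(recency_weight(30))              # a month-old poll counts half as much
```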

Once the three weights are calculated, the next step is to create a weighted polling average: the mean of the polls within each state, with each poll weighted using the three weights described above. For smaller races, like congressional or state races, polling data might be scarce, particularly in uncontested races. However, presidential contests, as we know, offer a deluge of data to be plugged in. Silver does not say exactly how he combines the weights. I multiply them into a single weight for each poll and then use that to weight the polls.
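Following the author's approach of multiplying the three weights, the weighted polling average for a state might look like the sketch below. The `Poll` class and the sample numbers are hypothetical; each weight component would come from recency, sample-size and pollster-rating calculations like those described above.

```python
from dataclasses import dataclass

@dataclass
class Poll:
    share: float      # candidate's share in this poll, in percent
    recency_w: float  # recency weight (e.g. from a decay function)
    size_w: float     # sample-size weight
    rating_w: float   # pollster-rating weight

    @property
    def weight(self) -> float:
        # Combine the three weights by multiplying them, as the author does
        return self.recency_w * self.size_w * self.rating_w

def weighted_average(polls: list[Poll]) -> float:
    """Weighted mean of poll shares using each poll's combined weight."""
    total = sum(p.weight for p in polls)
    return sum(p.weight * p.share for p in polls) / total

# Hypothetical polls for one state
state_polls = [
    Poll(share=50.0, recency_w=1.0, size_w=1.0, rating_w=1.0),
    Poll(share=47.0, recency_w=0.5, size_w=1.0, rating_w=1.0),  # older poll
    Poll(share=52.0, recency_w=1.0, size_w=1.0, rating_w=0.5),  # weaker pollster
]
print(round(weighted_average(state_polls), 2))
```

Here the older poll and the poll from the weaker pollster each count half as much as the first poll, which puts the average at 49.75 rather than the simple mean of 49.67.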

Error

A weighted polling average, like all averages, has two parts: a weighted mean and an error. The weighted mean is the point estimate, the one number that pops out of the calculation. The error is the average distance of each data point from the weighted mean. In creating a polling prediction, we use the error around the weighted mean: the smaller the average distance around the weighted mean, the more reliable the estimate.
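Using the article's definition of error as the weighted average distance of each poll from the weighted mean, both numbers can be computed together. This is a sketch with made-up shares and weights; many models use a weighted standard deviation instead, which penalizes outlying polls more heavily.

```python
def weighted_mean_and_error(shares: list[float], weights: list[float]) -> tuple[float, float]:
    """Return (weighted mean, weighted average absolute distance from that mean)."""
    total = sum(weights)
    mean = sum(w * s for w, s in zip(weights, shares)) / total
    # "Error" as defined above: weighted average distance of each poll from the mean
    error = sum(w * abs(s - mean) for w, s in zip(weights, shares)) / total
    return mean, error

# Hypothetical state: three polls with combined weights 1.0, 0.5 and 0.5
mean, error = weighted_mean_and_error([50.0, 47.0, 52.0], [1.0, 0.5, 0.5])
print(f"weighted mean {mean:.2f}, error {error:.3f}")
```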

The factors Silver considers important in interpreting error give a good snapshot of what makes a poll more or less accurate:

  • Error is higher in races with fewer polls.
  • Error is higher in races where the polls disagree with each other.
  • Error is higher in races with a large number of undecided voters.
  • Error is higher when the margin between the two candidates is lopsided.
  • Error is higher the more days prior to Election Day the poll is conducted.
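Silver does not publish how these factors combine, but a hypothetical error function shows how each one could push the error up. Every coefficient below is invented for illustration, and adding the pieces in quadrature (as if they were independent) is an assumption, not Silver's method.

```python
import math

def estimated_error(poll_spread: float, n_polls: int, undecided: float,
                    margin: float, days_out: int) -> float:
    """Hypothetical error estimate (in points) that rises with each factor in the list.
    All coefficients are invented for illustration; they are not Silver's."""
    disagreement = poll_spread / math.sqrt(n_polls)  # fewer or more divergent polls -> larger
    undecided_term = 0.25 * undecided                # more undecided voters -> larger
    lopsided_term = 0.05 * abs(margin)               # more lopsided races -> larger
    time_term = 0.05 * days_out                      # further from Election Day -> larger
    # Assumption: treat the pieces as independent and add them in quadrature
    return math.sqrt(disagreement**2 + undecided_term**2
                     + lopsided_term**2 + time_term**2)

base = estimated_error(poll_spread=3, n_polls=10, undecided=5, margin=2, days_out=30)
print(f"baseline error: {base:.2f} points")
```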

The presidential simulation

Silver predicts a lot of races: U.S. House, U.S. Senate and state governorships. The mother of all elections is, of course, the presidential race.

If I were going to construct Silver's model for the presidential election, I would set up 51 worksheets in Excel, one for each state plus the District of Columbia. Each worksheet would contain the polling data and weights for that state, configured so each poll has its result, its weight and its error. In one run of a simulation, each poll would take a single value, producing one weighted average for the state. The winner of the state would then be declared, and Excel would assign that state's electoral votes. The front worksheet of my Nate Silver model would show all 51 contests, tally whether either candidate reaches the 270 electoral votes needed to win and predict the winner.
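A single run of the model described above can be sketched in Python. To keep it short, the sketch lumps the uncontested states into hypothetical blocks of safe electoral votes (`SAFE_EV`) and simulates only three made-up contested states; the poll numbers, weights and errors are all invented. Each poll is jittered within its error using a normal distribution (an assumption), the jittered polls are averaged with their weights, and the state's electoral votes go to whichever candidate is above 50 percent of the two-candidate vote.

```python
import random

# Hypothetical inputs: electoral votes already locked up by each side, plus a few
# contested states whose polls are (share for candidate A, weight, error).
SAFE_EV = {"A": 253, "B": 225}
CONTESTED = {
    "State X": (29, [(49.5, 1.0, 2.0), (50.5, 0.6, 2.5)]),
    "State Y": (18, [(51.0, 1.0, 2.0), (50.0, 0.8, 3.0)]),
    "State Z": (13, [(50.2, 1.0, 2.0)]),
}
TO_WIN = 270  # 270 of 538 electoral votes

def simulate_once(rng: random.Random) -> dict[str, int]:
    """One run: jitter each poll within its error, average, award the state's votes."""
    ev = dict(SAFE_EV)
    for votes, polls in CONTESTED.values():
        draws = [(rng.gauss(share, err), w) for share, w, err in polls]
        avg = sum(d * w for d, w in draws) / sum(w for _, w in draws)
        ev["A" if avg > 50 else "B"] += votes  # exact 50.0 goes to B in this sketch
    return ev

rng = random.Random(2012)
result = simulate_once(rng)
winner = "A" if result["A"] >= TO_WIN else "B" if result["B"] >= TO_WIN else "none"
print(result, "winner:", winner)
```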

Now, if you run the simulation, say, 10 million times, each poll's result bounces around within its error, producing 10 million possible outcomes. Arrayed in a cumulative chart, these show the full range of possible results.
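Repeating that run many times and counting the electoral-vote totals produces the cumulative chart. The sketch below simplifies further by drawing each contested state's weighted average directly from a normal distribution around a hypothetical mean and error, and it uses 100,000 runs rather than 10 million so that pure Python finishes quickly. All numbers are made up.

```python
import random
from collections import Counter

SAFE_EV_A = 253  # hypothetical electoral votes already safe for candidate A
# Hypothetical contested states: (electoral votes, weighted poll average for A, error)
CONTESTED = [(29, 49.8, 2.0), (18, 50.6, 2.0), (13, 50.2, 2.0)]

def run_simulations(n_runs: int, seed: int = 0) -> Counter:
    """Count how often candidate A finishes with each electoral-vote total."""
    rng = random.Random(seed)
    tallies = Counter()
    for _ in range(n_runs):
        ev_a = SAFE_EV_A
        for votes, mean, err in CONTESTED:
            if rng.gauss(mean, err) > 50:  # A carries the state in this run
                ev_a += votes
        tallies[ev_a] += 1
    return tallies

def cumulative(tallies: Counter, n_runs: int) -> list[tuple[int, float]]:
    """Share of runs in which A gets at least each electoral-vote total."""
    running, points = 0, []
    for ev in sorted(tallies, reverse=True):
        running += tallies[ev]
        points.append((ev, running / n_runs))
    return sorted(points)

N = 100_000
tallies = run_simulations(N)
curve = cumulative(tallies, N)
for ev, share in curve:
    print(f"A wins at least {ev:3d} electoral votes in {share:6.1%} of runs")
```

Each printed line is one point on the cumulative chart. The first row at or above 270 gives candidate A's chance of winning.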

Exactly what Silver meant

One week before the 2012 Presidential election, Silver reported that President Obama had a 73 percent chance of being reelected. Of course, the prediction caused howls from Fox News. But while they bayed in protest, none explained exactly what Silver meant.

Silver ran his model eight days before the election. As noted earlier, polls become more accurate closer to Election Day. Let's say that Silver ran his model 10 million times (with a new laptop this would take, oh, about four minutes). In states such as New York, California, Texas or Georgia, the outcome was never in doubt. But in swing states such as Virginia, Florida and particularly Ohio, the polls were too close to call, so the winner of those states may change from one iteration to the next. If one runs enough iterations to cover the plausible combinations (and I would say that 10 million would probably cover it), then one can count how many times each side triumphs.

When Silver ran his model with the most current polls, President Obama came out with 270 or more electoral votes 7.3 million times; Mitt Romney won 2.7 million times. Thus, pronounced Silver, President Obama had a 73 percent chance of winning because he won 73 percent of the 10 million simulations.
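The 73 percent figure is simply the share of simulations won. A quick calculation, sketched here with the article's illustrative counts, also shows why 10 million runs is plenty: the Monte Carlo standard error of a simulated proportion is sqrt(p(1 - p)/N), which at N = 10 million is about 0.014 percentage points.

```python
import math

def simulated_win_share(wins: int, runs: int) -> tuple[float, float]:
    """Share of simulations won, plus the Monte Carlo standard error of that share."""
    p = wins / runs
    se = math.sqrt(p * (1 - p) / runs)
    return p, se

p, se = simulated_win_share(7_300_000, 10_000_000)
print(f"{p:.0%} chance of winning, +/- {1.96 * se:.3%} from simulation noise")
```

So the uncertainty in the 73 percent comes from the polls, not from the number of simulations; even 10,000 runs would pin the figure down to within about one percentage point.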

Predicting the actual vote percentages is a little more difficult. However, when one has as much data as Silver and the ability to run the simulations millions of times, the estimated vote share converges toward the real number, much like crowdsourced guesses often converge on the correct answer.

Practical uses

Practical uses of Silver's model are abundant and not solely on a presidential level. For example, if someone is working for a campaign in which the candidate is leading in the polls by 48 percent to 46 percent - a margin that in reality is a statistical tie - a month or two before Election Day, how likely is that candidate to actually win? And if the candidate is behind by five points with one month to go, how much ground does the campaign really need to make up?

A prediction model can answer these questions. For example, it might show that in that or similar districts, a candidate leading by five points one month before Election Day wins 80 percent of the time. This can be arrived at by looking at historical data or by plugging in all the current polls and financial data and running the simulation 10 million times.
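One simple way to turn a polling lead and its error into a win probability, offered here as an illustrative sketch rather than Silver's method, is to assume the true margin is normally distributed around the polling lead. The error values below are hypothetical.

```python
from statistics import NormalDist

def lead_win_probability(lead: float, error: float) -> float:
    """Chance the leader's true margin is above zero, assuming the true margin is
    normally distributed around the polling lead with standard deviation `error` (points)."""
    return 1 - NormalDist(mu=lead, sigma=error).cdf(0)

print(round(lead_win_probability(2, 5), 2))  # 48-46 race, hypothetical 5-point error
print(round(lead_win_probability(5, 6), 2))  # 5-point lead, hypothetical 6-point error
```

Under this assumption, the 48-46 race gives the leader only about a two-in-three chance with a five-point error, and a five-point lead with an error of about six points works out to roughly the 80 percent mentioned above.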

Opinions and predictions

Political pundits like Dick Morris, Rush Limbaugh and Matt Drudge are paid to fill air time and give their opinions, and their opinions and predictions are almost always wrong. By contrast, Silver scientifically boils down real data and makes accurate predictions. Nate Silver brought probabilistic models into mainstream political modeling, and they are here to stay. It's called math.
