Editor's note: Paul Rubenstein is president and CEO of Accelerant Research, Matthews, N.C.

There are about 250 million adults in America, of every imaginable background and circumstance. So how can a survey of only 800 or 1,000 adults reflect what the entire country is thinking? How can a thousand voices speak for them all?

Marketing researchers liken it to making a big pot of soup – to taste-test the soup, you don’t have to eat the whole pot, or even a whole bowl’s worth. You only have to try a spoonful or two. The same is true of marketing research. You don’t have to ask every single person in America to find out what Americans think; you only need to ask a few to get the flavor of the population’s opinion.

This fact is reflected in a survey’s standard error of the mean, which indexes the amount of error that results when a single sample mean is used to estimate the population mean; that is, it indexes sampling error. The standard error of the mean equals the standard deviation of the population of raw scores divided by the square root of the sample size.
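For readers who want to see the arithmetic, here is a minimal sketch in Python. The satisfaction scores are invented for illustration, and in practice the population standard deviation is estimated from the sample itself:

```python
import math
import statistics

# Hypothetical satisfaction scores (1-10 scale); the values are invented.
scores = [7, 8, 6, 9, 7, 5, 8, 7, 6, 8]

n = len(scores)
sd = statistics.stdev(scores)   # sample estimate of the population SD
sem = sd / math.sqrt(n)         # standard error of the mean

print(f"mean = {statistics.mean(scores):.2f}, SEM = {sem:.2f}")
```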

Closely related to sampling error and standard error is the margin of error. The larger the sample, the lower the margin of error and the more accurately the views of those surveyed match those of the entire population.

When marketing researchers report the margin of error for their surveys (usually expressed as something like “plus or minus 3 percent”) they are stating their confidence in the data they have collected.

Confidence interval

You must also remember that every margin of error is tied to a confidence level, usually 95 percent. That means that if you repeated this survey 100 times, 95 of those times the results would fall within three percentage points of the original answer. Of course, the other five times the results could fall outside that range entirely.

For example, if 50 percent of a sample of 1,000 randomly selected Americans said they are satisfied with their bank, then in 95 cases out of 100 the entire U.S. population, had it been asked, would have given the same response, give or take three percentage points (i.e., the true proportion is somewhere between 47 percent and 53 percent).
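That plus-or-minus-three figure can be reproduced directly. The sketch below uses the example’s own numbers (50 percent, n = 1,000) and assumes the standard normal approximation for a proportion:

```python
import math

p = 0.50      # observed proportion (50 percent satisfied)
n = 1000      # sample size
z = 1.96      # z-score for a 95 percent confidence level

margin = z * math.sqrt(p * (1 - p) / n)   # ~0.031, i.e., about 3 points
print(f"95% CI: {p - margin:.3f} to {p + margin:.3f}")
# -> 95% CI: 0.469 to 0.531, matching the 47-53 percent range above
```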

The bigger the sample, the smaller the margin of error, but once you get past a certain point – say, a sample size of 800 or 1,000 – the improvement is very small. The results of a survey of 300 people will likely be correct within six percentage points, while a survey of 1,000 will be correct within three percentage points, a lower margin of error. But that is where the dramatic differences end – when a sample is increased to 2,000 respondents, the margin of error drops only slightly, to two percentage points.
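A quick computation shows the diminishing returns this paragraph describes. The figures below use the worst-case assumption p = 0.5 at the 95 percent confidence level:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Worst-case (p = 0.5) margin of error at 95 percent confidence."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (300, 1000, 2000):
    print(f"n = {n:>5}: +/- {100 * margin_of_error(n):.1f} points")
# n =   300: +/- 5.7 points
# n =  1000: +/- 3.1 points
# n =  2000: +/- 2.2 points
```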

Despite this, some surveys have sample sizes much larger than 1,000 people. But why ask 2,000 or 3,000 respondents when 800 will do? Well, it sounds more impressive, but that alone is hardly worth the cost of interviewing all those additional people. Usually when a study has a large sample, it is so that certain subgroups can be isolated and compared to other subgroups or to the total sample. If you want to compare retired people to the general public, for instance, a sample of 1,000 might yield only 100 or 200 people who are no longer working, which may not be enough to get a solid grasp on the views of that group. A sample of 2,000, however, will probably yield a larger group of retired Americans and provide a more accurate picture of their views, which can be compared to non-retirees’ opinions.
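The subgroup point can be made concrete with a rough sketch. The retiree counts below are illustrative; what matters is that a subgroup’s margin of error depends on the subgroup’s own size, not the full sample’s:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    return z * math.sqrt(p * (1 - p) / n)

# Illustrative subgroup sizes from samples of 1,000 and 2,000.
for label, n in (("150 retirees from a sample of 1,000", 150),
                 ("300 retirees from a sample of 2,000", 300)):
    print(f"{label}: +/- {100 * margin_of_error(n):.1f} points")
# 150 retirees from a sample of 1,000: +/- 8.0 points
# 300 retirees from a sample of 2,000: +/- 5.7 points
```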

Sometimes increasing the overall sample size is not enough, if the subgroup you are examining is rare or particularly hard to find. Affluent households, for example, make up only a small percentage of the U.S. population. In a standard random sample, you would have to interview an enormous number of people before you had a large enough subgroup of affluent households. In this instance, you would take an oversample, purposely seeking out members of the high-net-worth group you are interested in, and comparing the results to the main sample.
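A back-of-the-envelope calculation shows why. The incidence rate and target count below are hypothetical:

```python
import math

# Hypothetical figures: suppose affluent households are 4 percent of the
# population and we want at least 200 of them for subgroup analysis.
incidence = 0.04
target_subgroup = 200

required_total = math.ceil(target_subgroup / incidence)
print(f"Random sample needed: about {required_total:,} interviews")
# -> about 5,000 interviews, which is why an oversample is cheaper:
#    screen for the affluent group directly instead of sampling at random.
```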

Of course, in both general samples and oversamples, who is asked is as important as how many are asked. Reputable survey organizations go to great lengths to make sure their interview sample is random and representative of whomever they are surveying, be they retirees, affluent households or all Americans.

Statistical significance

Sometimes, even the best researchers misuse and abuse the concept of significance. Many in research pore over reams of crosstabulations, perform a multitude of analyses to find significant differences and base their decisions on statistical significance alone. They tend to equate statistical significance with the magnitude of the result. The reasoning goes something like this: “The more statistically significant a result, the bigger the difference between two numbers.” In other words, the fact that one proportion is significantly different from another is taken to mean that the gap between the two is large; significance becomes shorthand for the “bigness” of a result.

People often think that if the difference between two numbers is significant it must be large and therefore must be considered in the analysis. When comparing numbers, however, two types of significance should be considered: statistical significance and practical significance. By understanding the difference between the two, we can avoid a pitfall that many in the research industry fall into.

What does statistical significance mean? Testing at, say, the 95 percent level of confidence merely implies that there is a 5 percent chance of accepting something as true based on the sample when in the population it is false. The statistical significance of an observed difference depends on two main factors: the sample size and the magnitude of the difference observed in the samples.

For example, let’s say we do a significance test between two groups of people who are exposed to a product concept and find a 20-point difference between Group A (65 percent acceptance) and Group B (45 percent). Is the difference statistically significant? Despite the large magnitude of the difference (20 points), its statistical significance will depend on the sample size. According to statistical theory, we need a sample size of about 50 or more people in each of the groups for the difference to be statistically significant at the 95 percent level of confidence. If we meet the sample size requirement, then the difference of 20 points will be statistically significant at the 95 percent level of confidence.
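One way to check this is with a pooled two-proportion z-test, a standard formulation (not necessarily the exact test the author had in mind). Using the example’s figures:

```python
import math
from statistics import NormalDist

# Concept acceptance from the example above: 65% in Group A, 45% in Group B.
n_a, n_b = 50, 50
p_a, p_b = 0.65, 0.45

# Pooled two-proportion z-test.
p_pool = (p_a * n_a + p_b * n_b) / (n_a + n_b)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (p_a - p_b) / se

p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-tailed
print(f"z = {z:.2f}, p = {p_value:.3f}")
# -> z = 2.01, p = 0.044: just under 0.05, so the 20-point gap is
#    significant at the 95 percent confidence level with roughly
#    50 respondents per group, consistent with the threshold cited above.
```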

What does this really mean? Many marketers will look at this result and conclude that since there is a 20-point difference and the difference is statistically significant, there must be a big difference between Groups A and B. In reality, if we had done a census (i.e., surveyed the entire population) instead of surveying a sample, the difference between Group A and Group B may have turned out to be smaller. In other words, what this result tells us is merely this: Given our particular sample size, there is a 5 percent chance that in the population represented by this sample, the proportions for Group A and Group B are not different. That’s all!

Statistical significance does not tell us anything about how big the difference is. It only tells us the probability that a difference found in the sample would not be found in the population. Thus, for this case, statistical significance allows us to conclude that there is only a 5 percent chance that in the population the proportion of Group A favoring the product is not higher than Group B’s; we are taking a 5 percent risk of concluding a difference exists when there may be no such difference. Had this difference been significant at the 99 percent level of confidence, the difference itself would be no larger. It would only mean that there is a 1 percent chance that the difference observed in the sample would not be observed in the population; we would be taking only a 1 percent risk.

Practical significance

From a marketing perspective, the statistically significant difference of 20 points may be meaningful or meaningless. It all depends on our research objectives and resources. If it costs millions of dollars to reach each additional percentage of the market, we may decide to funnel resources toward Group A since it has a higher acceptance rate. In this case, the difference may be termed a “big” difference because (a) we are reasonably sure (95 percent or 99 percent sure) that the difference observed in our sample also exists in the population and (b) each percentage of difference is worth millions of dollars to the client. Thus, statistical significance should not be used to decide how big a difference is but merely to ascertain our confidence in generalizing the results from our sample to the population.
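To make the “millions of dollars” reasoning concrete, here is a deliberately crude sketch; every dollar figure is invented for illustration:

```python
# Hypothetical back-of-the-envelope check of practical significance.
value_per_point = 2_000_000     # revenue per percentage point of acceptance
extra_points = 20               # Group A's lead over Group B
cost_to_target_a_only = 5_000_000

upside = extra_points * value_per_point
print(f"Expected upside: ${upside:,} vs. cost ${cost_to_target_a_only:,}")
# Here the 20-point gap is practically significant: the upside dwarfs the
# targeting cost. If the cost exceeded the upside, the same statistically
# significant difference could reasonably be ignored.
```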

In another situation, the same difference may be ignored despite being statistically significant. For instance, if the marketing costs are so low that it makes sense to market to both groups, we can ignore the difference and treat both groups as if they had similar acceptance rates.

The logic is the following: although we can be 95 percent sure that the difference observed exists in the population, given the marketing scenario, the difference is not meaningful. Thus, the relevance of a statistically significant difference should be determined on practical criteria, including the absolute value of the difference, marketing objectives, strategy and so forth. The mere presence of statistical significance does not imply that a difference is large or of noteworthy importance.