
By the Numbers: Let's test everything



Article ID: 20040510
Published: May 2004, page 28
Author: Stephen J. Hellebusch

Article Abstract

In statistical testing, the key is to make sure the right numbers are being tested.

Editor’s note: Stephen J. Hellebusch is president of Hellebusch Research & Consulting, Inc., Cincinnati.

The logic of statistical (stat) testing is not complex, but it can be difficult to understand, because it reverses everyday logic and what most people expect. Basically, to determine whether two numbers differ significantly, we start by assuming that they are the same. The test then determines whether this assumption can be rejected. If it can, we can say that the numbers are “statistically significantly different” at some predetermined confidence level.

While it is not complex, the logic can be subtle. One subtlety leads to a common error, overtesting, which is aided and abetted by automatic computer stat testing. Suppose there is a group of 200 men and one of 205 women, and they respond to a new product concept on a purchase intent scale. The data might look like that shown in Table A.

Statistical logic assumes that the two percentages to be tested are from the same population - they do not differ. Therefore, it is assumed that men have the same purchase interest as women. The rules also assume that the numbers are unrelated, in the sense that the percentages being tested are free to be whatever they might be, from 0 percent to 100 percent. Restricting them in any way changes the probabilities, and the dynamics of the statistical test.

The right way to test for a difference in purchase intent is to pick a key measure that summarizes the responses, and test that measure. In Table A, the Top Two Box score was tested - the combined percentages from the top two points on the scale (“definitely would buy” plus “probably would buy”). Within each group, this number was free to be anything from 0 percent to 100 percent, so picking this percentage to test follows the statistical rule. It happened to be 13 percent among men and 40 percent among women. The stat test indicates that the idea that these percentages come from the same population (that is, that they do not differ) can be rejected, so we can say they are “statistically significantly different at the 95 percent confidence level.”
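As a rough illustration of the Top Two Box comparison, the sketch below runs a pooled two-proportion z-test, which is one common way to test such a difference; the article does not say which test was actually used. The counts are assumptions derived from the stated figures (13 percent of 200 men is 26, 40 percent of 205 women is 82), and the function name is hypothetical.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Under the null hypothesis both groups share one proportion,
    # so the standard error is built from the pooled estimate.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided tail probability of a standard normal variable.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Assumed counts: 13% of 200 men = 26, 40% of 205 women = 82.
men_top_two, men_n = 26, 200
women_top_two, women_n = 82, 205

z, p_value = two_proportion_z_test(men_top_two, men_n, women_top_two, women_n)
significant = p_value < 0.05  # 95 percent confidence level
print(f"z = {z:.2f}, p = {p_value:.2g}, significant at 95%: {significant}")
```

With these assumed counts the z value falls well beyond the usual ±1.96 threshold, which agrees with the conclusion that the two Top Two Box scores differ at the 95 percent confidence level.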

Something different often happens in practice, though. Since the computer programs that generate survey data tables do not “know” which summary measure will be important, they test everything. In computer-generated data tables, the statistical results will look something like those shown in Table B.

If the Top Two Box score is selected ahead of time, and that is all that is examined (as in Table A), then this automatic testing is very helpful. It does the work, and shows that 13 percent differs from 40 percent. The other stat test results are ignored. However, if the data are reported as shown in Table B, there is a problem.

The percentages for the men add to 100 percent. If one percentage is picked for testing, it is “taken out” of the scale, in a sense. The other percentages are no longer free to be whatever they might be; together they must add to 100 percent minus the percentage that was selected for testing. The remaining percentages for the men can vary from 0 percent to 87 percent, but they can’t be higher, because 13 percent is “used up.” Similarly, the remaining percentages for the women can vary only from 0 percent to 60 percent, because 40 percent is already used. When you look at testing in the other rows, or row by row, you are no longer using the confidence level you think you are using - it becomes something else.

Statistically, it would be wrong to say of Table B that the percentages that “definitely would buy” and the percentages that “definitely/probably would buy” both differ at the 95 percent confidence level. One of them does, but the other difference is at some unknown level of significance, probably much less than 95 percent, given that a related difference has already been found significant.
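One way to see how testing every row changes the effective confidence level is a small simulation. The sketch below is only an illustration under stated assumptions, not the author's own analysis: it assumes men and women share one hypothetical five-point response distribution (so any "significant" difference is a false alarm), uses a pooled two-proportion z-test at the 95 percent level, and compares how often the prespecified Top Two Box test fires against how often at least one of the five row-by-row tests fires. The function names and the response distribution are invented for the example.

```python
import math
import random


def z_significant(count_a, n_a, count_b, n_b, critical=1.96):
    """True if a pooled two-proportion z-test rejects at about the 95% level."""
    pooled = (count_a + count_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:  # nobody (or everybody) chose this answer in both groups
        return False
    z = (count_a / n_a - count_b / n_b) / se
    return abs(z) > critical


def simulate(n_sims=2000, n_men=200, n_women=205, seed=0):
    """Return (top-two-box false alarm rate, any-row false alarm rate)."""
    rng = random.Random(seed)
    # Hypothetical distribution over the five scale points, shared by both
    # groups, listed from "definitely would buy" downward.
    weights = [0.10, 0.20, 0.30, 0.25, 0.15]
    points = list(range(len(weights)))
    top_two_hits = 0
    any_row_hits = 0
    for _ in range(n_sims):
        men = rng.choices(points, weights=weights, k=n_men)
        women = rng.choices(points, weights=weights, k=n_women)
        men_counts = [men.count(p) for p in points]
        women_counts = [women.count(p) for p in points]
        # The single measure chosen ahead of time.
        if z_significant(men_counts[0] + men_counts[1], n_men,
                         women_counts[0] + women_counts[1], n_women):
            top_two_hits += 1
        # Testing every row and reporting whatever comes up significant.
        if any(z_significant(m, n_men, w, n_women)
               for m, w in zip(men_counts, women_counts)):
            any_row_hits += 1
    return top_two_hits / n_sims, any_row_hits / n_sims


top_two_rate, any_row_rate = simulate()
print(f"Top Two Box only: {top_two_rate:.3f}")
print(f"Any of five rows: {any_row_rate:.3f}")
```

In runs of this sketch, the single prespecified test raises a false alarm at close to the nominal 5 percent rate, while scanning all five rows does so considerably more often. That is one concrete sense in which row-by-row testing no longer operates at the confidence level printed on the table.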

Stat tests are very useful. Each one answers a specific question about a numerical relationship. The one most commonly asked about scale responses is whether two numbers differ significantly. If they are the right two numbers, and the proper test is used, the question is easily answered. If they are the wrong two numbers, or the wrong test has been used, the decision maker can be misled. 

 
