
By the Numbers: Let's test everything



Article ID: 20040510
Published: May 2004, page 28
Author: Stephen J. Hellebusch

Article Abstract

In statistical testing, the key is to make sure the right numbers are being tested.

Editor’s note: Stephen J. Hellebusch is president of Hellebusch Research & Consulting, Inc., Cincinnati.

The logic of statistical (stat) testing is not complex, but it can be difficult to understand, because it runs in reverse of everyday logic and of what most people expect. To determine whether two numbers differ significantly, we begin by assuming that they are the same. The test then determines whether this assumption can be rejected; if it can, we say that the numbers are “statistically significantly different at the (some predetermined) confidence level.”

While it is not complex, the logic can be subtle. One subtlety leads to a common error, aided and abetted by automatic computer stat testing: overtesting. Suppose there is a group of 200 men and one of 205 women, and they respond to a new product concept on a purchase intent scale. The data might look like that shown in Table A.

Statistical logic assumes that the two percentages to be tested are from the same population - they do not differ. Therefore, it is assumed that men have the same purchase interest as women. The rules also assume that the numbers are unrelated, in the sense that the percentages being tested are free to be whatever they might be, from 0 percent to 100 percent. Restricting them in any way changes the probabilities, and the dynamics of the statistical test.

The right way to test for a difference in purchase intent is to pick a key measure to summarize the responses, and test that measure. In Table A, the Top Two Box score was tested - the combined percentages from the top two points on the scale (“definitely would buy” plus “probably would buy”). Within the group of men, this number could have turned out to be anything. It just happened to be 13 percent. Within the group of women, it could have been anything, and, as it turns out, was 40 percent. Within each group, the number was free to be anything from 0 percent to 100 percent, so picking this percentage to test follows the statistical rule. The stat test indicates that the idea that these percentages are from the same place (or are the same) can be rejected, so we can say they are “statistically significantly different at the 95 percent confidence level.”
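As a rough illustration of the Top Two Box comparison, the sketch below runs a pooled two-proportion z-test in Python. The article does not say which test or software was used, so the choice of test is an assumption, and the function name `two_proportion_z_test` is hypothetical. The counts are also assumed: they are back-calculated from the reported percentages (13 percent of 200 men is 26; 40 percent of 205 women is 82).

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # The test starts by assuming both groups share one proportion,
    # so the two samples are pooled to estimate it.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided tail probability of a standard normal variable.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Assumed counts: 13% of 200 men = 26, 40% of 205 women = 82.
z, p_value = two_proportion_z_test(26, 200, 82, 205)
print(f"z = {z:.2f}, two-sided p = {p_value:.2g}")
print("Reject 'same population' at the 95 percent level:", p_value < 0.05)
```

Under these assumptions the p-value falls far below 0.05, which agrees with the article's conclusion that the two Top Two Box scores differ at the 95 percent confidence level.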

Something different often happens in practice, though. Since the computer programs that generate survey data tables do not “know” what summary measure will be important, these programs test everything. In computer-generated data tables, the statistical results look something like those shown in Table B.

If the Top Two Box score is selected ahead of time, and that is all that is examined (as in Table A), then this automatic testing is very helpful. It does the work, and shows that 13 percent differs from 40 percent. The other stat test results are ignored. However, if the data are reported as shown in Table B, there is a problem.

The percentages for the men add to 100 percent. If one percentage is picked for testing, it is “taken out” of the scale, in a sense. The other percentages are no longer free to be whatever they might be. They must add to 100 percent minus the fixed percentage that was selected for testing. The remaining percentages for the men can vary from 0 percent to 87 percent, but they can’t be higher, because 13 percent is “used up.” Similarly, the remaining percentages for the women can vary only from 0 percent to 60 percent, because 40 percent is already used. When you look at testing in the other rows, or row by row, you are no longer using the confidence level you think you are using - it becomes something else.

Statistically, it would be wrong to say of Table B that both the “definitely would buy” percentages and the “definitely/probably would buy” percentages differ at the 95 percent confidence level. One of them does, but the other difference is at some unknown level of significance, probably much less than 95 percent, given one related significant difference.
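One way to get a feel for how the confidence level drifts when every row is tested is a small simulation. The Python sketch below is not from the article and rests on stated assumptions: both groups are drawn from the same hypothetical five-point distribution (`SCALE_PROBS`), so any “significant” difference is a false alarm, and each comparison uses a pooled two-proportion z-test at the nominal 95 percent level. It shows only one facet of the problem, namely how often at least one row is flagged when all rows are tested, compared with testing a single preselected Top Two Box score.

```python
import math
import random

# Hypothetical scale distribution shared by BOTH groups (an assumption),
# ordered from "definitely would buy" down to "definitely would not buy".
SCALE_PROBS = [0.10, 0.20, 0.30, 0.25, 0.15]
N_MEN, N_WOMEN = 200, 205  # group sizes from the article
Z_CRIT = 1.96              # two-sided cutoff for the 95 percent level


def is_significant(count_a, n_a, count_b, n_b):
    """Pooled two-proportion z-test at the nominal 95 percent level."""
    pooled = (count_a + count_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return False  # no variation at all, so nothing to test
    z = (count_a / n_a - count_b / n_b) / se
    return abs(z) > Z_CRIT


def draw_counts(rng, n):
    """Simulate n respondents and count the answers at each scale point."""
    counts = [0] * len(SCALE_PROBS)
    for answer in rng.choices(range(len(SCALE_PROBS)), weights=SCALE_PROBS, k=n):
        counts[answer] += 1
    return counts


def simulate(n_sims=2000, seed=1):
    """Return (false-alarm rate for Top Two Box only, rate for any single row)."""
    rng = random.Random(seed)
    top_two_hits = 0
    any_row_hits = 0
    for _ in range(n_sims):
        men = draw_counts(rng, N_MEN)
        women = draw_counts(rng, N_WOMEN)
        # One preselected summary measure: the Top Two Box score.
        if is_significant(men[0] + men[1], N_MEN, women[0] + women[1], N_WOMEN):
            top_two_hits += 1
        # "Test everything": flag the table if any scale row looks different.
        if any(is_significant(m, N_MEN, w, N_WOMEN) for m, w in zip(men, women)):
            any_row_hits += 1
    return top_two_hits / n_sims, any_row_hits / n_sims


top_two_rate, any_row_rate = simulate()
print(f"False alarms, Top Two Box only: {top_two_rate:.1%}")
print(f"False alarms, any row flagged:  {any_row_rate:.1%}")
```

With a single preselected measure, the false-alarm rate should stay near the nominal 5 percent; when every row is tested, the chance that some row is flagged is noticeably higher. The exact figures depend on the assumed distribution and the random seed.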

Stat tests are very useful. Each one answers a specific question about a numerical relationship. The one most commonly asked about scale responses is whether two numbers differ significantly. If they are the right two numbers, and the proper test is used, the question is easily answered. If they are the wrong two numbers, or the wrong test has been used, the decision maker can be misled. 

 
