Editor’s note: Jeffrey M. Kirk is director, research insights at the San Francisco office of research firm TNS NFO.

Statistical testing has long been automated by tabulation packages such as Quantum, a necessity given the massive amounts of data we collect and process in this industry. The process has become so efficient that too many of the data tabulations that leave the supplier’s hands are not accompanied by the consulting or the interpretation needed to draw out the insights they contain.

By the same token, many of the researchers at client organizations also lack the fundamental statistical training to properly interpret significant results. Although it is understandable that most practitioners’ jobs do not require an intimate knowledge of statistics (which is why I am gainfully employed!), this lack of interpretive ability with regard to hypothesis testing, or statistical testing, has dampened our effectiveness at providing value and delivering consumer insights.

The inertia resulting from many years of efficient process has produced a culture that hangs its hat on the results of statistical testing in banners but that does not foster an understanding of the real value of a statistical test. The practical use and proper interpretation of statistically significant results have largely been forgotten among client- and supplier-side researchers alike. Too often, because paging through hundreds of pages of data tabulations can be an overwhelming (and unpleasant) task, we have come to rely on scanning banners for those upper- and lowercase stat-testing letters to suggest to us which findings may be of interest. However, “statistically significant” does not necessarily imply “important,” and it never has, although that is precisely how many of us erroneously interpret statistical differences.

Statistical testing can be a valuable tool in guiding business decisions that are derived from study results; however, we are in need of a refresher on the proper interpretation of statistical testing. I enjoy sharing with others my personal mantra, which summarizes my perspective: Statistical testing is no substitute for good judgment.

Discussed here are the two principal ways in which statistical testing is regularly misused: 1) interpreting “non-significant” findings as not meaningful; and 2) placing too much emphasis on findings that are statistically significant.

Focusing on the big picture: when “too few” findings are statistically significant

I have long been an advocate of a holistic approach to data analysis, part of which entails not being overly dependent on the outcome of statistical testing but rather focusing on the story that the data tell in totality. In too many cases, we are overly stringent in requiring that a key measure be statistically significant before we will glean insight from it. I cannot make this point clearer than with a quote from what some academic institutions refer to as the statistics bible:

“A test of significance is sometimes thought to be an automatic rule for making a decision...This attitude should be avoided. An investigator rarely rests [his or her] decisions wholly on a test of significance. To the evidence of that test, [the investigator] adds the knowledge accumulated from his own past work and from the work of others.” (Snedecor and Cochran)

This nugget of traditional wisdom, published in 1967, is as relevant as ever to research today. In the case of marketing research, the results from any one study do not stand alone. Rather, they must be interpreted in the context of 1) the researcher’s knowledge of the business, 2) related primary research, including qualitative and quantitative, and 3) any relevant secondary research.

Even within a study, a single, non-significant (or “directional”) difference interpreted outside the context of other findings from the same study is not very meaningful. It is when this finding is compared with the trends in consumers’ ratings and reports on other measures that its value (or lack of value) is fully understood.

Of course, I acknowledge that there are certain situations in which a go/no-go decision carries greater risk for the enterprise (e.g., launching a potentially inferior product formulation), and the criteria for making decisions need to be more stringent. However, in the majority of cases, we will develop much richer consumer insights if we evaluate and consider even directional effects that tell a consistent and convincing story. In other words, to extract maximal value from our data, statistical rigor must be balanced with human analytic reasoning.

The flip side of the coin: when virtually every finding is statistically significant

Advancements in Internet technology have made consumers more accessible to researchers and more willing to participate in survey research. Obtaining large, representative samples of almost any population can generally be accomplished quickly and cost-effectively. The positive impact of these advances has been an increase in the robustness and reliability of our samples. The potentially complicating consequence lies in the interpretation of results. We all know that, for a difference of a given size, sample size is the biggest factor in determining whether that difference is statistically significant. So, depending on just how large the sample becomes, we may find ourselves wading in an ocean of statistically significant differences. When this occurs, our reliance on statistical testing to direct us toward meaningful differences falls apart.

In a recent large-scale brand evaluation study, most differences in mean attribute ratings were significant at the 99 percent confidence level, yet the actual means were identical when rounded to one decimal place. A similar study showed that even 1 percentage-point differences in Top 2 Box ratings between groups were significant at the 99 percent level. Do we want to make a big fuss over a 0.1 difference in mean ratings, or over 62 percent versus 63 percent on a Top 2 Box basis?
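To make the arithmetic concrete, here is a minimal sketch (plain Python, standard library only) of a pooled two-proportion z-test applied to a fixed 62 percent versus 63 percent Top 2 Box split at a few per-group sample sizes. The sample sizes are illustrative assumptions, not figures from the studies above.

from math import sqrt, erf

def two_sided_p(p1, p2, n1, n2):
    # Pooled two-proportion z-test; returns the two-sided p-value.
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = abs(p1 - p2) / se
    # Two-sided tail area from the standard normal distribution.
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))

for n in (500, 5000, 50000):  # hypothetical respondents per group
    p = two_sided_p(0.62, 0.63, n, n)
    verdict = "significant at the 99 percent level" if p < 0.01 else "not significant"
    print(f"n = {n:>6} per group: p = {p:.4f} ({verdict})")

With 500 or 5,000 respondents per group, the 1-point difference is nowhere near significant; at roughly 50,000 per group it clears the 99 percent level. That is precisely the point: with a large enough sample, a trivially small difference will test as significant.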

The statistics bible offers counsel on this issue as well. The authors put forth that a statistically significant difference should be ignored if the magnitude of the difference is not of “practical importance.” When was the last time most of us chose to ignore a statistically significant finding?

Of course, the next logical question raised by this directive is, “How do I determine ‘practical importance’?” Unfortunately, the answer is much less straightforward. Practical importance, or, put another way, importance to your business, cannot be determined by statistics: that decision requires the analyst’s judgment and category/brand expertise. Many companies have wisely made setting criteria for action standards an a priori part of the research process. For example, a consumer products company may require that a new product formulation be preferred by consumers two-to-one over a current product in order for it to be launched. These kinds of decisions are much more difficult to make a posteriori (after fielding and data tabulation are complete). Nonetheless, in these large-sample situations, which will only become more common in the industry, the onus for setting decision criteria will fall increasingly on the marketing researcher’s judgment.
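As an illustration only, using the hypothetical two-to-one rule above, an a priori action standard can be reduced to a simple, pre-agreed check: the new formulation “passes” only if a one-sided 95 percent lower confidence bound on its preference share clears two-thirds (the share implied by a 2:1 preference). The threshold, confidence level and counts below are assumptions made for the sketch, not a recommended standard.

from math import sqrt

def passes_action_standard(prefer_new, n, threshold=2/3, z=1.645):
    # One-sided 95 percent lower confidence bound on the preference share.
    p_hat = prefer_new / n
    lower = p_hat - z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, lower, lower > threshold

share, lower, launch = passes_action_standard(prefer_new=290, n=400)
print(f"observed share = {share:.1%}, lower bound = {lower:.1%}, launch = {launch}")

The particular form of the check matters less than the fact that it is agreed upon before fielding; once the criterion is written down, the post-field discussion is about whether the standard was met, not about what the standard should have been.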

Rely on judgment

In summary, statistical testing has its place, but it cannot be used as the exclusive decision maker. An analyst must rely on his or her own judgment to interpret each study, assimilating multiple findings and applying category and brand expertise to derive meaningful consumer insights for the business. In short, a call to look beyond the upper- and lowercase letters is warranted. A loosening of our dependence on statistical testing to make business decisions is in some cases a necessity. After all, the decisions made as a consequence of interpreting CPG survey research do not carry with them the same gravity or social impact as those from clinical trials, medical research or other scientific disciplines. It is often said among practitioners that marketing research is not brain surgery. Although it is usually said tongue-in-cheek, therein may lie a scintilla of wisdom.

Reference

Snedecor, G.W., and Cochran, W.G. (1967). Statistical Methods. Ames, Iowa: The Iowa State University Press.