Editor's note: William M. Briggs is an adjunct professor of statistics at Cornell University, Ithaca, N.Y., and a consultant in marketing statistics.
Statistics isn’t as easy as it looks. Mastering the subject isn’t equivalent to “submitting the data to software.” From my perspective as a statistician, these are the top five mistakes I have seen marketers and researchers make. Do any of them seem familiar to you?
Data drives statistics: If there isn’t any, few questions can be answered. Yet too much data causes problems just as too little does. I don’t mean big data, that is, data that is rich and plentiful but so large that it’s difficult to handle in the usual manner. What hurts is too much bad data.
Who hasn’t been in a survey-design meeting where a client wants to know what makes his product popular and everybody contributes a handful of questions they want asked? And those questions lead to more questions, which bring up still others.
The discussion ranges broadly: Everybody has an idea of what might be important. A v.p. will say, “I feel we should ask, ‘Do you like the color blue?’,” while a rival v.p. will insist on, “About blue, do you not like it?” Gentle hints that one of these questions could and should be dropped might be taken as impolitic. The marketing analysis company, wanting to keep its contract, acquiesces.
Statisticians are rarely invited to these soirées, but if one were present, he would insist that duplicate or near-duplicate data cannot provide additional insight but can cause the analysis to break or give absurd answers.
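To see why the two blue questions are trouble, here is a minimal sketch in plain Python, under stated assumptions. The survey is hypothetical: “likes blue” is assumed to drive purchase intent with an effect of 2, and the rival v.p.’s reverse-worded question is assumed to be the same answer flipped plus a little noise. All names, the sample size and the noise levels are invented for illustration. The sketch fits an ordinary least-squares regression with one question and then with both, and reports the variance inflation factor (VIF), a standard diagnostic equal to 1 / (1 − r²) for two predictors with correlation r.

```python
import random

def simple_slope(x, y):
    """Least-squares slope of y on a single predictor x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def fit_two_predictors(x1, x2, y):
    """Least squares for y = b0 + b1*x1 + b2*x2, solved with the 2x2 normal
    equations on mean-centred data. Returns (b1, b2, vif), where
    vif = 1 / (1 - r^2) and r is the correlation between x1 and x2."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    c1 = [a - m1 for a in x1]
    c2 = [a - m2 for a in x2]
    cy = [a - my for a in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(b * b for b in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * c for a, c in zip(c1, cy))
    s2y = sum(b * c for b, c in zip(c2, cy))
    det = s11 * s22 - s12 * s12
    if det <= 1e-12 * s11 * s22:
        # Exact (or numerically indistinguishable) duplicates: infinitely
        # many (b1, b2) pairs fit the data equally well.
        raise ValueError("predictors are collinear; coefficients are not identifiable")
    # Cramer's rule on [[s11, s12], [s12, s22]] @ [b1, b2] = [s1y, s2y]
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2, s11 * s22 / det

def simulate(seed, n=200):
    """Hypothetical survey: 'likes blue' drives purchase intent; the
    reverse-worded question is the same answer negated plus small noise."""
    rng = random.Random(seed)
    like_blue = [rng.gauss(0, 1) for _ in range(n)]
    not_like_blue = [-v + rng.gauss(0, 0.05) for v in like_blue]
    purchase = [2 * v + rng.gauss(0, 1) for v in like_blue]
    return like_blue, not_like_blue, purchase

for seed in (1, 2, 3):
    blue, not_blue, buy = simulate(seed)
    b1, b2, vif = fit_two_predictors(blue, not_blue, buy)
    print(f"seed {seed}: one question b={simple_slope(blue, buy):.2f}; "
          f"both questions b1={b1:.2f}, b2={b2:.2f}, VIF={vif:.0f}")

# Asking the identical question twice breaks the fit outright.
try:
    fit_two_predictors(blue, blue, buy)
except ValueError as err:
    print("duplicate question:", err)
```

With one question, the estimated effect should land near 2 on every simulated survey. With both questions, the VIF runs into the hundreds and the individual coefficients can swing by a point or more from one simulated survey to the next; only their difference, b1 − b2, stays stable, which is just the single question’s answer again. The second question adds noise to the interpretation, not insight.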
If there is genuine uncertainty about a battery of questions, then a test survey should be run first. This trial analysis works out bugs and sets expectations. The process can be iterated until the suite of questions is manageable and each remaining piece of data is likely to be useful. This also prevents situations where a...