Editor’s note: Gary M. Mullet, Ph.D., is president, Gary Mullet Associates, Inc., a consulting and statistical data processing firm in suburban Atlanta, Georgia. The author wishes to acknowledge Paul M. Gurwitz, whose article in the February 1991 issue of this publication treated related issues.

Recently I tried to convince a statistical software package that when I typed "varible" I meant "variable". The software, however, used what I said and ignored what I meant to say. Shortly after that I ran across a headline for some new statistical software which blared, "For people who aren't statistics experts." I'm not sure that one necessarily has to be a statistics expert to properly use statistical software, but as long as computers and their programs do exactly what they're told to do, instead of what they should have been told to do, oversimplification of software use can lead to trouble. Many times the difficulty is as easy to spot as the "varible-variable" one. Many times it's not, as will be seen below.

None of the instances which follow are meant to deride or belittle anyone. Instead, they are shown to illustrate just how easy it is to push the wrong button and ask for the wrong analysis. I still type "varible" at least half the time, inadvertently and incorrectly. That error brings the analysis to a screeching halt and is easy to find. The examples below are both more subtle and potentially more serious.

Examples

At least one data tabulation package does a t-test for proportions, or says it does. Generally, for large enough samples (whatever that may be--and for proportions it's not necessarily anything greater than 30) the results will agree quite closely with the more correct Z-test or χ²-test. However, there are some fairly strong assumptions underlying the t-test. Even though these assumptions may sometimes be violated with impunity, strictly speaking there is no such animal as a t-test for proportions. The program in question, however, is simple to use, and incorrect analyses can be performed without question. While a "statistics expert" is probably not necessary to tell you whether or not your particular analyses are all right to report, someone with at least a modicum of knowledge could certainly help. The easy-to-use software can get an unwary analyst into serious difficulty.
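For the curious, here is a minimal sketch in Python of how close the improper t-test and the two-proportion Z-test typically come; the incidence rates, sample sizes, and random data are all invented for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.binomial(1, 0.55, size=200)  # 0/1 "prefer" codes, group A
b = rng.binomial(1, 0.45, size=200)  # group B

# What the tab package does: treat the 0/1 codes as interval data.
t_stat, t_p = stats.ttest_ind(a, b)

# The more correct two-proportion Z-test on the same data.
p_pool = (a.sum() + b.sum()) / (len(a) + len(b))
se = np.sqrt(p_pool * (1 - p_pool) * (1 / len(a) + 1 / len(b)))
z = (a.mean() - b.mean()) / se
z_p = 2 * stats.norm.sf(abs(z))

print(f"t-test p = {t_p:.4f}   Z-test p = {z_p:.4f}")
# With 200 per cell the two p-values nearly coincide; with small or
# lopsided cells they can diverge enough to change the story.
```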

While we're at it, you should be aware that the assumptions behind the above-mentioned Z-test and/or χ²-test are also quite stringent. For some of your analyses they, too, may be violated--and the computer package used might not flag the violation. This happens a lot in practice, because the programs do exactly as they are told, whether or not you should have told them to do such an analysis. (If you do find cases where these tests shouldn't be done on your proportions, you're probably stuck either doing an exact test or an arcsin transformation.)
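The usual tripwire is small expected cell counts. A minimal sketch, with hypothetical counts, of a table that flunks the common expected-count rule of thumb (expected counts of at least 5) and calls for the exact test:

```python
from scipy import stats

# Hypothetical 2x2 table of preference counts from two small cells.
table = [[3, 1],    # prefer A: cell 1 vs. cell 2
         [7, 12]]   # prefer B

chi2, chi2_p, dof, expected = stats.chi2_contingency(table)
odds_ratio, exact_p = stats.fisher_exact(table)

print("expected counts:\n", expected)  # several fall below 5
print(f"chi-squared p = {chi2_p:.3f}   Fisher exact p = {exact_p:.3f}")
# When expected counts are this small, report the exact test (or fall
# back on an arcsin-root transformation of the proportions).
```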

Another variation on this theme is the analysis which was done on a simple paired-product preference. How was it decided whether or not the proportion who preferred product A was different from the proportion preferring product B? A dependent, or paired, t-test. Why? The computer certified the methodology by performing the requested analysis. Quick, simple, easy to use and wrong. But at least the analysis was done without the use of a "statistics expert".
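What should have been run is a simple exact binomial (sign) test on the preference counts. A minimal sketch, assuming an invented split of 64 respondents preferring A and 36 preferring B:

```python
import numpy as np
from scipy import stats

prefs = np.array([1] * 64 + [0] * 36)  # 1 = prefers A, 0 = prefers B

# Exact binomial test of H0: P(prefer A) = 0.5
result = stats.binomtest(int(prefs.sum()), n=len(prefs), p=0.5)
print(f"64 of 100 prefer A: exact p = {result.pvalue:.4f}")
```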

Computer programs that don't require a "statistics expert" may be useful in designing conjoint studies. Just push the right button (usually ENTER or RETURN), and here come your conjoint scenarios, ready to print and send to the field. Again, at least in a few cases, the easy-to-use computer programs have been the source of trouble. In one, a 32-card sort was produced for a study in which one of the attributes had 5 levels. With the other attributes at 2, 3, and 4 levels, the design was not the desired orthogonal array--but was unknowingly used anyway.
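A check that takes seconds would have caught it: in an orthogonal array every pair of attributes cross-tabulates to equal cell counts, and 32 cards can never balance a 5-level attribute against a 3-level one, since 32 isn't divisible by 15. A minimal sketch, using an invented 12-card truncation of a small factorial as a stand-in for whatever the program produced:

```python
import itertools
import numpy as np
import pandas as pd

# Stand-in for the machine-generated design: 12 cards drawn haphazardly
# from a full 2 x 3 x 4 factorial (purely illustrative attribute names).
rng = np.random.default_rng(3)
full = np.array(list(itertools.product(range(2), range(3), range(4))))
design = pd.DataFrame(rng.permutation(full)[:12],
                      columns=["price", "color", "size"])

# In an orthogonal array, every pair of attributes cross-tabulates to
# equal cell counts. Check every pair before cards go to the field.
for a, b in itertools.combinations(design.columns, 2):
    tab = pd.crosstab(design[a], design[b])
    ok = tab.values.min() == tab.values.max()
    print(f"{a} x {b}: {'balanced' if ok else 'NOT balanced'}")
```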

Another conjoint study was designed for respondents to sort 16 cards. The problem here was that two of the attributes never varied independently; their levels were locked together in constant pairs. To illustrate, if one of the attributes was color with two levels, say, red and blue, and the other was size, say, large and small, what the respondents saw was red-large on eight cards and blue-small on the other eight. Clearly, there is no way to generate the separate utility estimates that were desired, but no one thought to question or check the computer-generated design before the study was actually completed. The computer program which did the design (and it was written especially for this study) performed exactly as instructed, not as it should have been instructed.
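In matrix terms the problem is plain: the confounded attributes contribute identical columns to the design matrix, which is therefore rank-deficient. A minimal sketch with the hypothetical color/size coding above:

```python
import numpy as np

# Sixteen cards in which "red" always travels with "large" (coded 1)
# and "blue" with "small" (coded 0)--the confounded design described.
color = np.array([1, 1, 0, 0] * 4)
size = color.copy()                 # identical column: perfect confounding

X = np.column_stack([np.ones(16), color, size])
print("columns:", X.shape[1], "  rank:", np.linalg.matrix_rank(X))
# rank 2 < 3: the least-squares utilities for color and size are not
# separately estimable, no matter how the respondents sort the cards.
```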

A cluster analysis was run on one of the easiest to use among the easy-to-use cluster programs. The program ran exactly as told, but after a couple of iterations, clusters of size 1 or 2 popped up. What happened? It seems that for a handful of respondents some, but not all, of their answers were punched one card column to the right of where they should have been. These respondents, then, were showing up as the small clusters, since they were, in fact, very different from everyone else. The user of the cluster program had no idea whether or not the clusters made sense; after all, there were no error messages displayed.
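The moral generalizes: look at the cluster sizes before naming segments. A minimal sketch, with invented data standing in for the mispunched respondents (the k-means routine, cluster count, and seed are illustrative choices, not the program actually used):

```python
import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(4)
good = rng.normal(5, 1, size=(200, 10))        # 200 well-coded respondents
shifted = rng.normal(5, 1, size=(3, 10)) * 10  # 3 mispunched ones, wildly off
data = np.vstack([good, shifted])

centroids, labels = kmeans2(data, 4, minit="points", seed=7)
print("cluster sizes:", np.bincount(labels, minlength=4))
# Tiny clusters are almost always data problems (miscoded or mispunched
# respondents), not market segments worth naming.
```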

Another computer program was designed to generate mailing labels from a database. Just tell it how many you need, and names are selected at random and mailing labels produced. In this particular case the computer-generated labels weren't even given a cursory glance--after all, the computer printed them--but were just stuck on the envelopes and dropped into the mail. The only problem was that, while the name, city, state, and ZIP code were on each label, the street and number were not. Lots of undelivered surveys were returned to the sponsoring organization, with the obvious disastrous consequences for the study.

In yet another case, a computer program did an analysis which was really unnecessary. Responses to a series of statements were collected on a scale where 1 = Yes, the statement applies and 0 = No, the statement doesn't apply. No problem so far. What the computer was asked to do, and did, was produce correlations between these statements and the same set of statements recoded as 1 = No, the statement doesn't apply and 0 = Yes, the statement applies. The computer was all too happy to compute these unnecessary correlations, at no small cost. They could be done, therefore they were done.
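Arithmetic shows why the run was a waste of computer time: reverse-coding a 0/1 item only flips the sign of every correlation it enters, so each statement correlates exactly -1.0 with its own recode. A minimal sketch with invented data:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.binomial(1, 0.4, size=300)   # 1 = applies, 0 = doesn't apply
y = rng.binomial(1, 0.6, size=300)   # a second statement
x_rev = 1 - x                        # the same statement, reverse-coded

print(np.corrcoef(x, x_rev)[0, 1])   # exactly -1.0
print(np.corrcoef(x, y)[0, 1], np.corrcoef(x_rev, y)[0, 1])
# The second pair are equal except for sign: every "new" correlation
# was knowable, to the digit, before the job was ever submitted.
```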

Yet another frequent happening (mentioned by Gurwitz) is to ask the computer to run a discriminant or regression or factor analysis. Quick and easy, if it weren't for item nonresponse. Most computer packages drop a respondent entirely from such analyses for having even a single missing answer, sometimes out of 100 or so items. More than once the ultimate user of such analyses has gone looking for marketing insights, only to find that the analyses weren't performed at all because every respondent had at least one missing answer. These, at least, wave a red flag. Even worse are the analyses which are performed, retained and acted on even though the base sizes were only 10 or 15--those who answered everything requested in the survey. Again, the computers are merely following orders.
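A back-of-envelope check makes the danger concrete: with 100 items and even a 3% item-nonresponse rate, only about five percent of respondents (0.97 raised to the 100th power) answer everything. A minimal sketch with invented data:

```python
import numpy as np
import pandas as pd

# Hypothetical ratings: 300 respondents x 100 items, each answer
# missing independently with probability 0.03.
rng = np.random.default_rng(5)
ratings = pd.DataFrame(rng.integers(1, 6, size=(300, 100)).astype(float))
ratings = ratings.mask(rng.random(ratings.shape) < 0.03)

complete = ratings.dropna().shape[0]
print(f"{complete} of {len(ratings)} respondents survive listwise deletion")
# Roughly 15 of 300 remain--exactly the 10-or-15 base described above,
# and far too few to act on.
```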

The missing data problem can be severe, but generally goes unnoticed, when discriminant-based perceptual maps are drawn. Reliability can be a real problem when the bases for such maps are only 10 or 15 respondents, but the mapping algorithms perform anyway--quickly and easily. Also, you can get maps done when the different brands shown are rated on different attribute lists, or when the attributes are scaled differently on the questionnaire. So-called multiple correspondence analysis maps have been produced from several two-variable crosstabulations, rather than going back to the respondent data. They show all of the points required, even though the coordinates were not generated as they should have been. Then there was the discriminant-based map which used such a high significance level that the attribute directions were essentially random. The map made no sense because someone told the computer to use a high significance level instead of a high confidence level. The computer didn't balk; thus, the analyses which could be done were done, but the analyses which should have been done were not.
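That last slip is worth spelling out, since it is a one-keystroke mistake. A minimal sketch, with an invented p-value, of what an entry criterion of 0.95 admits compared with the 0.05 that was meant:

```python
# The one-keystroke mistake: 0.95 entered where a significance level
# (alpha) of 0.05 was meant.
alpha_intended = 0.05   # i.e., 95% confidence
alpha_entered = 0.95    # the "95" typed straight into the alpha field

p_value = 0.80          # an attribute that discriminates essentially at random
print("enters the map?", p_value <= alpha_entered)    # True
print("should it?     ", p_value <= alpha_intended)   # False
```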

The mystique associated with statistical computer programs is not limited to those commercially available. A computer program was specifically written to perform a non-standard, but still valuable, statistical procedure. As in most such cases, textbook data sets were used to test the program, which performed well. Unfortunately, the degrees of freedom were set to a constant value, 4, irrespective of the number of respondents and/or stimuli. No one noticed this one for weeks, mainly because everyone believed that the printed value should be correct; after all, the computer said it and the program was easy to use.
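The bug pattern is easy to reconstruct (the t-distribution here is illustrative; the actual procedure was non-standard). Textbook data sets with five observations happen to have 4 degrees of freedom, which is exactly why the tests passed:

```python
from scipy import stats

def p_value_buggy(t_stat):
    """What the custom program did: degrees of freedom frozen at 4."""
    return 2 * stats.t.sf(abs(t_stat), df=4)

def p_value_correct(t_stat, n):
    return 2 * stats.t.sf(abs(t_stat), df=n - 1)

# A textbook test set with n = 5 makes the bug invisible...
print(p_value_buggy(2.3), p_value_correct(2.3, n=5))
# ...while any realistic sample size exposes it.
print(p_value_buggy(2.3), p_value_correct(2.3, n=200))
```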

It's also easy to get in trouble on some harmless-looking analyses, since the computer is like Ado Annie (it "can't say no"). In one such study, a series of attribute ratings were of the variety "Too Big," "Just Right" and "Too Small." These were coded and entered into the data file as 1, 2, and 3, respectively. Two products compared on one such scale showed Product A with 3 votes for "Too Big," 112 for "Just Right" and none for "Too Small." Product B had 23 respondents say "Too Big," 59 say "Just Right" and 33 respond with "Too Small." Obviously, the products are different with respect to this scale. However, the computer was instructed to do a dependent t-test on the means, which turned out not significantly different. If only the computer could have said no!
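Using the counts above (and, for illustration only, ignoring the pairing and treating the two samples as independent), a χ²-test on the full distributions sees immediately what the t-test on the means missed:

```python
import numpy as np
from scipy import stats

#                  Too Big  Just Right  Too Small
table = np.array([[     3,        112,         0],   # Product A
                  [    23,         59,        33]])  # Product B

chi2, p, dof, _ = stats.chi2_contingency(table)
print(f"chi-squared = {chi2:.1f} on {dof} df, p = {p:.2g}")

# ...while the means on the 1/2/3 coding are nearly identical.
means = (table * np.array([1, 2, 3])).sum(axis=1) / table.sum(axis=1)
print("means:", means.round(2))   # roughly 1.97 vs. 2.09
```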

Computers also don't question you (or me) when you try to analyze dependent samples as if they were independent (or vice versa) as long as the data fit the required format for the test. They also don't ask if you have an overlapping sample for analysis--they just do as they are told.

In one survey, the project director designed the study to test for order bias by using all 6 possible rotations of the 3 brands in the survey. Here the sin was one of omission--the CRT interview did not capture which rotation was used on which respondent. Here, too, the computer did exactly as it was told--it just wasn't told to do enough. An easy-to-use CRT interviewing package was involved in this one.

Conclusions

It would be nice to say at this point that the above cases were all apocryphal. Alas, none of them are. This is not to say that you need to be totally paranoid every time you skim a computer-generated statistical analysis, although a little paranoia may not hurt. The point is that the easier to use the statistical programs become, the more self-styled statistical experts seem to turn up--statisticians-on-a-chip, as it were. Doing the wrong thing, or doing the right thing incorrectly, just because the computer programs allow it, is probably more harmful to a marketing research project than not doing anything at all; getting no answer is usually less damaging than getting the wrong answer (it sure is when I type "varible").

It's also not as simple as comparing the means from the statistical analysis with those from your crosstabulations. If they agree, then the statistical analysis must have been done correctly; if not, the advanced analysis must be wrong--right? Not quite.

In one recent study, the statistical analysis was correctly performed on carefully "derotated" data, and the means didn't even begin to agree with the data tabs. You guessed it--the data were not derotated before the tabs were done. The statistical analyst was nonetheless questioned at length about the disagreement between the means. In this case, at least, it was the easier analysis which the computer didn't question--and should have.

Nor is a solution coming through the haze of my crystal ball. Both the American Marketing Association and the American Statistical Association have wrestled with and continue to wrestle with the issue of certification, but that's probably overkill for this type of problem. Even assuming that certification would help, it's still a long way off. At the very least, we need to ask questions, lots and lots of questions--not just of our data but of those who ask questions of our data. Taking computer printouts at face value can be very risky until computers are programmed to know "what" as well as "how."