Every survey based on a sample from a large universe is subject to two different types of error, which relate to very different ways in which survey results can yield a misleading picture.

"Error," says Alan Roberts, former manager of market research, Wayne Seed Division of Continental Grain Co., Chicago, "are factors which may cause the picture portrayed by the sample to differ from the picture that would have emerged if a completely accurate count (U.S. Census) had been made of the universe from which the sample was drawn.

"These two types of error are called sampling error and for want of a better word, non-sampling error," says Roberts. "Sampling error relates to the reliability of data; non-sampling error relates to the validity of data."

Reliability

Reliability is a concept like repeatability, says Roberts. That is, if you keep repeating your first survey, identical in every detail of execution, a technical statement can be made that the results will probably fall within a certain range: the numbers generated will have a degree of stability, a certain percentage above or below what the first survey reported.

"Note that this has nothing whatsoever to do with how accurately your survey reflects the real world out there, the world of everybody that your little survey did not communicate with," says Roberts. But that limitation never prevents researchers from making what they call "confidence statements" about the "statistical significance" of their numbers.

The confidence they speak of, such as 90% or 95%, or 19 chances out of 20, comes only from probability theory. It enables researchers to make very impressive statements that differences in numbers generated by a survey are either significant (i.e., outside the range of numbers one would expect on a chance basis, given the sample size) or not significant (i.e., within the expected range).
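To see where such a confidence statement comes from, here is a minimal sketch, assuming a simple random sample and the usual normal approximation; the function name, brand, percentages and sample size are hypothetical, not from Roberts:

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of an approximate confidence interval for a sample
    proportion p from a simple random sample of size n.
    z = 1.96 corresponds to 95% confidence (19 chances in 20)."""
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical example: 40% of 400 respondents say they prefer brand A.
p, n = 0.40, 400
moe = margin_of_error(p, n)
print(f"95% CI: {p - moe:.3f} to {p + moe:.3f}")  # about 0.352 to 0.448

# A repeat survey reporting 43% would fall inside this range ("not
# significant"); one reporting 47% would fall outside it ("significant").
# Neither statement says anything about whether the 40% itself is valid.
```

Note that the calculation uses nothing but the reported proportion and the sample size, which is exactly Roberts's point: it measures repeatability, not accuracy.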

Says Roberts, "This is all well and good, but survey research is used to guide decision-making by management. What management needs is a true picture, a true road map or blueprint, of a given market, and/or of the purchase processes that drive that market. There is only very limited value in management knowing that the findings of a first survey would probably be very similar to those of a second survey, if it were identically conducted. Such knowledge begs the issue of whether the survey methodology was any good in the first place. In other words, statements of statistical significance beg the issue of data validity and hence of usefulness."

Types of error

One can scarcely list all possible types of non-sampling error, all the ways that a sample survey can yield misleading data, all sources of invalid information about a target market that can be associated with sample surveys. Just a dozen such types are listed here:

(1) Non-probability sampling, by far the most common type of sampling used, puts "up for grabs" the degree to which the sample of convenience actually used reflects or fails to reflect the universe (or market) that management seeks to gain information about.

(2) Non-response, even at an "allowably" low rate such as 15 or 20%, creates doubt (seldom addressed in research) as to how survey results would have changed if non-respondents had all, in fact, participated in the survey. In many procedures, such as the widely used intercept surveys in shopping malls, no information at all is available about refusals, and there is no basis for learning more about non-response.

(3) Response by a non-targeted individual can arise in by-mail surveys when the questionnaire is completed, or influenced, by a person other than the addressee (e.g., a family member).

(4) Inter-respondent bias can occur in by-mail surveys, as when neighbors participating in the same survey get together, but more commonly occurs with research done in any theater-type setting where respondents sit side by side as they complete self-administered questionnaires. Or, during one-on-one interviewing in a public area, a subsequent respondent may overhear questions and answers from the interview with a prior respondent.

(5) Respondent "yea-saying" is a widely encountered phenomenon. It is based on a psychological need, more strongly felt by some individuals than by others, to please the interviewer by answering according to how the respondent senses the interviewer would like the question answered.

(6) Respondent fatigue may arise early or late in an interview, but most likely toward the end. Fatigue is a euphemism for unrest, since the respondent need not become physically tired, and it does not require a two-hour interview for such unrest to arise. Commonly, interviews are solicited with an explicit or implied promise that they will be brief and/or easy. If, at any point, the respondent concludes that the interview has gone beyond his/her expectations, termination may occur. More likely, though, the respondent will be too polite to cut off the interviewer and will simply begin to answer with whatever comes to mind that will more swiftly conclude the interview. Quality of data deteriorates in that process.

(7) Questionnaire bias can involve either construction (sequence of questions) or phrasing. Order bias is a special issue that can occur within a question. Professional researchers are usually competent enough to avoid the more obvious types of questionnaire bias, but when operating management starts hanging "whistles and bells" on the professional's questionnaire draft, much bias can creep in. Even in otherwise unbiased questionnaires, some order bias may be unavoidable, as when sample size or other cost factors do not permit rotation of listing order to the fullest extent needed to avoid any possible bias.

(8) "Iffy" questions that yield "soft" data, i.e., data of low predictive or descriptive value, abound in questionnaires. Most notorious is the almost universal five-point "intent to buy" questions (definitely would buy/ probably would buy/might or might not buy/probably would not buy/definitely would not buy). Any question that asks for more than a respondent's actual (past) behavior and/or current opinions tends to be "iffy."

(9) Questions outside the respondent's qualified range of personal knowledge or interest are ones the researcher hopes will be answered "don't know." Unfortunately, many respondents feel that admitting ignorance about a subject may undermine their self-image. So they prefer to guess, and their answers are tabulated right along with those of knowledgeable respondents.

(10) Interviewer bias can be insidious, especially in surveys where interviewing is not centrally controlled. Personal, one-on-one interviewing permits overt or subtle influence by the interviewer on the response pattern. This may occur through minor rephrasing of a question by the interviewer, tone of voice, facial expression, anything that cues the respondent to an expected answer. Often, after completing several interviews, the interviewer begins to expect a certain response pattern and may, without fully appreciating it, communicate that expectation in the course of subsequent interviews.

(11) Interviewer cheating need not be of the most egregious (and easily detected) sort that involves reporting many totally fictitious interviews. It can also be more limited or subtle, as when an interviewer who has skipped a question or two, or experienced a termination just before asking the last couple of questions, yields to the temptation of raising her completion count by inventing a few brief answers here and there, after the interview. Or, the interviewer may find an apparently cooperative would-be respondent who fails to meet the respondent qualifications specified in the survey design, and yield to the temptation of completing that interview after falsifying one or more questions on the qualifier.

(12) Simple incompetence in data gathering is probably a bigger source of invalid data than actual cheating, although both stem from the same root cause: interviewers tend to be poorly trained, part-time people, often grossly under-compensated given the importance of what they do. Sloppy interviewing techniques can take many forms, including misrecording answers, failure to probe, skipping or rephrasing questions, asking questions or reading lists out of required sequence, and failing to qualify respondents.

Probable validity

These dozen types do not, of course, exhaust all possible examples of non-sampling error, and they do not address the problems of maintaining data quality across the editing, coding and tabulation stages, says Roberts. They are set out only to underscore how much false security may be involved when management accepts sample survey results in an uncritical way, on the basis of a researcher's confidence statement about the statistical significance of various reported totals.

Far more salient to the success of management decision-making is the need for management to assess the probable validity of survey data referred to in decision-making: to dig into the design, methodology and controls used in the survey, and to satisfy themselves that the data reported will likely supply a reasonably accurate picture of the market and market segment they purport to measure.

Someone once drew an analogy between total survey error and the hypotenuse of a right triangle, where the other two sides represent sampling error and non-sampling error, says Roberts. That is, the hypotenuse must be longer than either of the other two sides, because it is the square root of the sum of the squares of the other two sides (e.g., the survey error hypotenuse is five when the sides are three and four).
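In symbols (one rendering of the analogy; the notation is not from Roberts):

```latex
% Total survey error as the hypotenuse of a right triangle whose
% legs are sampling error and non-sampling error:
\[
  E_{\text{total}} = \sqrt{E_{\text{sampling}}^{2} + E_{\text{non-sampling}}^{2}},
  \qquad
  \sqrt{3^{2} + 4^{2}} = \sqrt{25} = 5 .
\]
```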

That metaphor is useful because, first, it focuses attention on possible error other than that implicit in every sampling process and, second, it positions total survey error as necessarily greater than sampling error alone.

Unfortunately, the metaphor is also a bit simplistic and misleading. Sampling error can be (and should be) stated with quantitative precision; all other sources of error (any factor tending to undermine data validity) are too diverse to permit quantification and require qualitative assessment by professionals whose skills extend into many areas besides probability statistics.

Assessing impact

In the real world, the confidence statement of statistical significance seems so scientific that the difficult, often messy, process of assessing the impact of non-sampling error is all too easily overlooked. We are not used to thinking of a right triangle with two sides in a ratio of one to 20 or 30. Yet, in terms of the usefulness of survey findings, when the sampling error side is one and the non-sampling error side, if it could be quantified, would turn out to be many times larger, management can be really "blind-sided" by the size of the hypotenuse.
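To make that closing point concrete, here is the triangle worked through for the one-to-20 case mentioned above:

```latex
% Worked instance of the error triangle with sides one and 20:
\[
  \sqrt{1^{2} + 20^{2}} = \sqrt{401} \approx 20.02 .
\]
```

Virtually the whole hypotenuse is non-sampling error; the precisely quantified sampling error adds only about two hundredths to it.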