A tale of two questionnaires

Editor’s note: Pete Cape is global knowledge director in the London office of Survey Sampling International (SSI). Based in SSI’s Fairfield, Conn., office, Jackie Lorch is vice president, global knowledge management, and Linda Piekarski is the firm’s vice president, database and research.

In a recent study, Survey Sampling International (SSI) examined survey design and analyzed how faulty design can result in faulty data. The results showed that poor survey design, rather than the panelists themselves, is often responsible for poor-quality results.

The findings make it clear that research companies and panel companies must work together to present panelists with good surveys: surveys containing understandable, concise and complete questions that elicit the most valid data possible.

To help clarify what constitutes a good survey (one that produces reliable data) and a bad survey (one that does not), SSI fielded two different surveys in August 2007. Approximately 500 SSI United States panelists completed each survey. The first survey was designed as a worst-of-breed example, built from the kinds of poorly-designed questions SSI commonly encounters when checking surveys each year. The second survey was a redesign by SSI’s respondent experience group, which is dedicated to ensuring a pleasant survey experience.

The following questionnaire design issues were tested:

  • incomplete scales;
  • means of gathering top-of-mind awareness;
  • means of collecting total awareness;
  • not specifying the category;
  • forcing a choice;
  • lack of criteria on which to choose;
  • not allowing “don’t know”/“none of these”;
  • lack of definition on mathematical issues;
  • two answers in one;
  • self-selection on questions;
  • poor wording in statements;
  • poor use of English;
  • use of the negative; and
  • bias in questions.

Incomplete scales

When all possible options are not included in a question, the respondent may abandon the questionnaire or provide a random answer, and a random answer distorts the data. As part of the survey-testing, a “year of purchase” question was included: the bad survey did not include all potential options, while the good survey covered all eventualities. In the good questionnaire, SSI observed 9 percent purchasing in 2004 and 43 percent purchasing before 2004. In the bad questionnaire, 72 percent chose the code “before 2004.” Unless the researcher recognizes that respondents read this code as “2004 or before,” a very real error in estimation will have occurred.
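As an illustration of what “covering all eventualities” might look like in practice, here is a minimal sketch in Python; the answer codes and the helper function are hypothetical, not the wording used in SSI’s actual questionnaires.

    def year_of_purchase_options(earliest_listed_year, current_year):
        """Build a "year of purchase" answer list that covers every eventuality:
        one code per recent year, a catch-all for earlier purchases and an
        explicit "Don't know" escape. Illustrative only."""
        years = [str(y) for y in range(current_year, earliest_listed_year - 1, -1)]
        return years + ["Before {}".format(earliest_listed_year), "Don't know"]

    # Complete scale, as in the good questionnaire (hypothetical codes):
    print(year_of_purchase_options(2004, 2007))
    # ['2007', '2006', '2005', '2004', 'Before 2004', "Don't know"]

    # The bad questionnaire omitted an explicit 2004 code, leaving 2004 buyers
    # nowhere to go except "Before 2004" and inflating that category (72 percent,
    # versus 43 percent plus a separate 9 percent for 2004 in the good survey).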

Means of gathering top-of-mind awareness

To gather top-of-mind awareness, researchers can choose to ask a single question followed by a second question for the rest of the spontaneous awareness or provide multiple boxes. In the second scenario, the first box is taken to represent top-of-mind. The two-question option takes slightly longer for respondents to execute. SSI found that data collected using each method is similar and there is little to be gained by asking separate questions and lengthening the survey.
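A minimal sketch of how the multiple-box approach yields both measures in a single step; the brand names and data structure are assumptions for illustration only.

    # Brands typed into the spontaneous-awareness boxes, in the order entered
    # (hypothetical respondent record).
    spontaneous_boxes = ["Brand A", "Brand C", "Brand F"]

    # Multiple-box approach: the first box is taken to represent top-of-mind
    # awareness; the full list is the spontaneous awareness measure.
    top_of_mind = spontaneous_boxes[0] if spontaneous_boxes else None
    spontaneous_awareness = list(spontaneous_boxes)

    # The two-question alternative collects the same two measures over an extra
    # screen: one open-end for the first brand that comes to mind, then a second
    # open-end for any other brands.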

Means of collecting total awareness

In SSI’s analysis, the good survey asked respondents to re-code their spontaneous answers into the prompted list, while the bad survey gave respondents no instructions as to what to do. Respondents to the good survey re-coded into the prompted question the brands they had written in at the spontaneous question; respondents to the bad survey did this less well. Based on these results, a great deal of data could have been lost had respondents needed to be routed onward from the prompted awareness question in the bad survey.
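A sketch of the re-coding step and the routing it protects, assuming the spontaneous write-ins have already been matched to the brand list; the brand names are hypothetical.

    # Brands written in at the spontaneous question, already matched to the
    # prompted brand list (hypothetical data).
    spontaneous_mentions = {"Brand A", "Brand C"}

    # Good survey: the spontaneous mentions are re-coded (carried forward) into
    # the prompted question, so respondents only add brands they recognize
    # from the list.
    prompted_mentions = set(spontaneous_mentions) | {"Brand B", "Brand D"}

    # Total awareness is the union of spontaneous and prompted mentions. Any
    # later routing ("ask only about brands the respondent is aware of") draws
    # on this combined set, which is why losing the re-code loses data.
    total_awareness = spontaneous_mentions | prompted_mentions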

Specifying the category

It can be tempting for researchers to assume that respondents know what the researcher is asking about if it has been mentioned once. Our research shows that it is vitally important to restate the category when the question type changes. SSI found a large overestimation of awareness for brands that are well-known outside the category under consideration in the study.

Forcing a choice

Researchers often want to know what the top three choices are within a given set. If they do not allow respondents to choose fewer than three, they are, in effect, forcing them to lie. Forcing choices can result in large overestimates of non-consideration as respondents attempt to answer the question as posed.
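One way to avoid forcing the choice, sketched below under assumed wording and rules, is to validate for “up to three” selections rather than exactly three and to offer an explicit escape.

    def validate_top_three(selected, none_apply=False):
        """Accept zero to three selections instead of forcing exactly three.
        Illustrative rule only; the validation used in SSI's surveys is not
        described in this article."""
        if none_apply:
            return len(selected) == 0  # "None of these" must stand alone
        return 1 <= len(selected) <= 3

    # A respondent with only one genuine choice is not forced to invent two more:
    print(validate_top_three(["Brand A"]))                 # True
    print(validate_top_three([], none_apply=True))         # True
    print(validate_top_three(["Brand A", "B", "C", "D"]))  # False, more than three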

Lack of criteria to choose

In a question that asks respondents to associate brands with statements, it is important to specify precisely on what basis the brand(s) should be selected. SSI found that leaving out a “none of these” option could seriously compromise brand association data.

Allowing “don’t know”/“none of these”

In a brand ownership question, SSI did not allow respondents in the bad survey to answer “don’t know.” In the good survey, 10 percent of respondents did not know the answer. In another question, an “other” category was not provided. Respondents in the bad survey were forced to choose between two options. Judging from the good survey data, almost half of respondents would have chosen the “neither” option, had it been available. We conclude that the proportionate split between the two options in the bad survey is most likely incorrect.

Lack of definition on mathematical issues

“How big,” “how many” and “on average” are terms that need to be precisely defined. In the bad survey, SSI deliberately left vague a question about capacity and did not include a “don’t know” option. In the good survey, 82 percent did not know the capacity of the object in question. In the bad survey, the average capacity was 15 times larger than on the good survey. In addition, asking people questions to which they do not know the answer is guaranteed to raise panelists’ ire.

Two answers in one

SSI’s surveys found that separating two items in one answer with a slash rather than “and,” “or” or “and/or” can result in over- and underestimating the importance of the items in question.

Self-selection on questions

A “consideration” battery should precede a “satisfaction” battery. SSI’s good survey included both, and only those who scored a four or five in consideration were asked to rate the factor in terms of satisfaction. The bad survey allowed respondents to choose whether or not to answer the satisfaction battery. All chose to answer the full satisfaction battery. Without the consideration battery, the only conclusion that can be drawn is that panelists had considered all the factors. In this instance, self-selection underreports the overall satisfaction measure that would have been achieved by correctly routing only those who had strongly considered the factor.
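A minimal sketch of the routing the good survey used, as described above: only factors scored four or five on consideration are carried into the satisfaction battery. The factor names and scores are hypothetical.

    # Consideration scores for each factor (hypothetical respondent data).
    consideration = {"price": 5, "service": 2, "availability": 4, "packaging": 1}

    # Good-survey routing: only factors scored 4 or 5 on consideration are
    # shown in the satisfaction battery.
    satisfaction_battery = [f for f, score in consideration.items() if score >= 4]
    print(satisfaction_battery)  # ['price', 'availability']

    # The bad survey dropped this filter and let respondents self-select, so
    # every factor was rated for satisfaction regardless of consideration.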

Poor wording in statements

For an agree/disagree-type scale to work well, it must be clear what it means to agree or disagree with the statement. Using “I tend to” instead of “I do” resulted in a different distribution of answers in SSI’s test.

Poor use of English

English is the established language for most international research; however, the questionnaire can be fundamentally flawed if it is written by a researcher who is not a native English speaker, and this may lead to divergence in the translated versions. While the marketing conclusion from SSI’s two data sets was the same, the actual numbers were quite different.

Use of the negative (particularly the double negative)

Statements worded in the negative, and particularly the double negative, can be difficult to understand, especially when they sit in a battery alongside positive statements. SSI’s surveys produced two completely different data sets.

Bias in questions

The bad survey included a biased question, in which the statement was qualified by an assertion that was a matter of opinion. The results, although predictable, serve as a useful reminder that it is easy to manipulate opinions for one’s own purposes.

Panelists persevere

SSI found that panelists recognize a well-designed survey. As part of the study, respondents were asked to rate the survey itself. More than a third of the panelists in the bad survey who said they did not enjoy it cited “quality of questions” as the reason; among panelists in the good survey who said they did not enjoy it, the main reason cited was the survey topic.

Even when presented with poorly-worded questions and confusing instructions, panelists generally persevere and do their best. Contrary to SSI’s expectations, the top score on “enjoyment” for the bad survey was 60 percent, compared to 67 percent for the good survey. While the bad questionnaire took, on average, 18 percent longer to complete than the good questionnaire, the dropout rate from the bad survey was only 4 percent higher than that of the good survey. Poor questionnaire design, however, may well have an impact on future survey-taking, and once a panelist stops being a regular contributor, they may stop participating in surveys altogether.

Overall, the study supports the importance of questionnaire design. Shortcuts should not be taken in online questionnaire design and, in fact, even more care and attention needs to be lavished on such surveys to ensure a positive survey experience for panelists as well as optimum quality data.