Not sure about "don't know"

Editor’s note: Randall K. Thomas is senior research scientist at Harris Interactive, Rochester, N.Y. Portions of this article are based on a conference presentation at the American Association for Public Opinion Research, 2002.

Market research using Web-based surveys has grown phenomenally in the United States over the past five years, with gross revenues rising 20-fold. As a modality for information gathering, the Internet may be unsurpassed: it lets us present questions and multimedia experiences to people scattered across a wide geography and then learn, quickly and inexpensively, how they respond.

A significant portion of Internet-based research has involved transitioning tracking surveys that were previously administered through non-Internet modalities (most often random-digit-dial [RDD] telephone surveys). Many clients have conducted tracking interviews by telephone and are staunch believers in probability-based sampling. However, they have begun to realize that Web-based interviews could bring them cost savings, enhanced capabilities, and rapid turnaround for research on widely dispersed or hard-to-reach people. These factors, combined with the potential for improved measurement accuracy, have led them to migrate their trackers to the Internet.

A common purpose of a tracker is to gauge a population’s opinions based on a sample of that population and to track changes in that population’s interests and behaviors over time. Many factors affect the researcher’s ability to generalize findings from a specific sample to the larger population of interest to the client. One factor that threatens generalizability is survey non-response, which has received considerable attention. Another is item non-response.

One aspect of item non-response is how to handle “not sure” or “don’t know” (NS/DK) responses. Telephone surveys do not typically offer “not sure” or “don’t know” explicitly, but they will generally accept such a response if a respondent volunteers it (with interviewers often expending some effort to minimize item non-response). Since Web-based interviews are self-administered, the NS/DK option is either offered explicitly or it is not. One concern often voiced about explicitly presenting NS/DK responses is that it will decrease comparability with telephone data. Clients transitioning from a telephone-based to a Web-based approach often wish to retain their historical data, so we are frequently concerned with how to maintain historical trends for opinion-related questions (e.g., product satisfaction, purchase intention).

Some authors have argued that including NS/DK responses would improve data quality by reducing the pressure on respondents to provide opinions where no true opinions exist (Converse, 1964; 1970). However, Dillman (2000) has noted that this area in particular offers very little empirical guidance to help survey researchers migrate a survey from one mode to another. As part of our investigations of practices that minimize differences between phone and Web-based surveys, we conducted a series of experiments to determine the effects of including an NS/DK response option with opinion questions, so that we could better understand the effects of item non-response on data comparability across survey modes. This article summarizes two experiments from that series, examining best practices in this area of survey migration.

Experiment 1
For our first study, we conducted a parallel phone and Internet survey in December 2001. For the phone study we used RDD and had 1,011 respondents complete the survey. The parallel Web-based survey had 2,098 participants, whose e-mail addresses were drawn from the Harris Poll Online panel using a stratified random sampling procedure designed to match the basic characteristics of the general U.S. population in terms of gender, age, and region of residence.
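For readers who want a concrete picture of what a proportionate stratified draw involves, here is a minimal sketch in Python. The strata, benchmark shares, and field names are hypothetical illustrations, not Harris Interactive’s actual procedure.

```python
import random

# Hypothetical benchmark shares: the proportion of the U.S. adult
# population in each gender x age x region stratum (values illustrative;
# a full table would cover every stratum and sum to 1.0).
BENCHMARKS = {
    ("female", "18-34", "south"): 0.06,
    ("female", "35-54", "south"): 0.05,
    ("male",   "18-34", "west"):  0.04,
}

def stratified_draw(panel, benchmarks, n):
    """Draw roughly n panelists so each stratum's share of the sample
    matches its benchmark share of the population."""
    sample = []
    for (gender, age_band, region), share in benchmarks.items():
        pool = [p for p in panel
                if (p["gender"], p["age_band"], p["region"])
                == (gender, age_band, region)]
        k = min(round(n * share), len(pool))  # fall short if the pool is thin
        sample.extend(random.sample(pool, k))
    return sample

# Tiny demonstration with a toy panel of 50 identical panelists.
panel = [{"gender": "male", "age_band": "18-34",
          "region": "west", "email": "panelist@example.com"}] * 50
print(len(stratified_draw(panel, BENCHMARKS, 100)))  # -> 4
```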

We had two experimental conditions in the online survey. The NS Present condition offered “not sure” as a response option for a series of questions, while the NS Absent condition received the same series of questions without it. We randomly assigned 1,044 respondents to the NS Present condition and 1,054 respondents to the NS Absent condition.

Each respondent answered eight rating questions using a four-category rating scale (poor, only fair, pretty good, excellent). The first question was presented in isolation and asked how the respondent would rate the job performance of President Bush. The other seven questions were presented in a grid format, with the targets to be rated in the rows and the response categories in the columns. Figure 1 presents the questions used in both experiments reported in this article.

Figure 1 - Questions Used in Both Telephone and Online Survey Versions

  1. How would you rate the overall job President George W. Bush is doing as president?
  2. How would you rate the job each of the following is/are doing?
    1. Democrats in Congress
    2. Republicans in Congress
    3. Senate Majority Leader Tom Daschle
    4. House Speaker Dennis Hastert
    5. Vice President Dick Cheney
    6. Secretary of State Colin Powell
    7. Secretary of Defense Donald Rumsfeld

Experiment 2
Experiment 2 was a Web-based survey run in parallel with our telephone survey and Experiment 1, and it used the same questions (attitude targets) as the online version of Experiment 1. Respondents were obtained through a similar stratified random sample drawn from the Harris Poll Online panel, as described for Experiment 1.

We had three primary conditions: NS/DK Absent, Not Sure Present (NS Present), and Don’t Know Present (DK Present). Of our 5,972 total respondents, we randomly assigned 3,922 to the NS/DK Absent condition, 1,024 to the NS Present condition, and 1,026 to the DK Present condition.

Results

First, we tested for differences between the Not Sure and Don’t Know conditions in Experiment 2 and failed to find any significant differences in endorsement patterns or in means. We therefore collapsed the two conditions into a single NS/DK Present condition and report the combined groups throughout. Table 1 summarizes the percentage of respondents choosing the NS/DK response option when it was presented in the online surveys. For each of the eight items, the frequency with which respondents chose the NS/DK option did not differ significantly between Experiment 1 and Experiment 2.
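To illustrate the kind of comparison involved, here is a minimal sketch of a chi-square test of independence on endorsement patterns for a single question; the counts are invented for the example and are not our actual data.

```python
from scipy.stats import chi2_contingency

# Hypothetical endorsement counts for one question:
# rows = condition (NS Present vs. DK Present);
# columns = poor, only fair, pretty good, excellent, NS/DK chosen.
table = [
    [150, 260, 340, 180, 94],   # NS Present
    [148, 255, 350, 175, 98],   # DK Present
]

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, df = {dof}, p = {p:.3f}")
```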

Table 1 - Percentage of Respondents Choosing the NS/DK Option When Presented

We calculated means for each question and tested for differences across experimental conditions. Results for weighted data are reported in Table 2. In Experiment 1, six of the eight means in the NS/DK Present condition were not significantly different from the phone data. In Experiment 2, five of the eight means in the NS/DK Present condition were not significantly different from the phone data. This contrasts sharply with our findings for the NS/DK Absent conditions: in Experiment 1, all of the means for the NS/DK Absent condition were significantly different from the phone data, and this was replicated in Experiment 2.
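As a minimal sketch of the computation behind Table 2, the function below calculates a weighted mean for one item. It assumes the rating categories are coded 1 (poor) through 4 (excellent) and that NS/DK responses are excluded before averaging; the coding scheme and the tiny example are assumptions for illustration.

```python
import numpy as np

def weighted_mean(ratings, weights):
    """Weighted mean of ratings coded 1 (poor) to 4 (excellent).
    NS/DK responses, coded as None, are dropped before averaging."""
    pairs = [(r, w) for r, w in zip(ratings, weights) if r is not None]
    r = np.array([p[0] for p in pairs], dtype=float)
    w = np.array([p[1] for p in pairs], dtype=float)
    return (r * w).sum() / w.sum()

# Invented mini-example: three respondents, one of whom chose NS/DK.
print(weighted_mean([4, 2, None], [0.9, 1.2, 1.1]))  # -> 2.857...
```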

Table 2 - Mean Ratings by Condition and Mode (Weighted Data)

Figure 2 - Average Proportion of Endorsement Difference from Phone Data (Weighted)

Figure 2 presents results for the two experiments in terms of the average proportion of endorsement difference (using weighted data). To calculate this, we took the percentage of respondents who endorsed each response category for each question and subtracted it from the corresponding endorsement percentage in the telephone survey; we then took the absolute value of each difference and averaged within conditions. Using a chi-square test, we found that the difference between the NS/DK Present and NS/DK Absent conditions was significant in both experiments. When the Not Sure or Don’t Know category was absent, the average percentage deviation from the phone data was 4.77 percent and 4.28 percent for Experiments 1 and 2, respectively. When it was present, the average deviation was 2.92 percent and 1.87 percent, respectively.
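Since this measure is easy to misread, here is a minimal sketch of the calculation in Python; the two-question percentages are invented solely to show the arithmetic.

```python
import numpy as np

def avg_endorsement_difference(web_pcts, phone_pcts):
    """Mean absolute difference, in percentage points, between Web and
    phone endorsement percentages across all question/category cells.

    Each argument is shaped (questions, categories) and holds the
    percentage of respondents endorsing each response category.
    """
    web = np.asarray(web_pcts, dtype=float)
    phone = np.asarray(phone_pcts, dtype=float)
    return np.abs(web - phone).mean()

# Invented two-question example with the four rating categories.
web = [[20, 30, 35, 15], [10, 25, 40, 25]]
phone = [[18, 33, 34, 15], [12, 24, 41, 23]]
print(avg_endorsement_difference(web, phone))  # -> 1.5
```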

Discussion

Of the many difficulties encountered in trying to create an online survey comparable to a phone survey, perhaps none is so vexing as finding the set of response options that will parallel the endorsement frequencies obtained by phone. These two experiments begin to illuminate how possible variants affect comparability. We found that presenting a “not sure” or “don’t know” category for opinion questions online is more likely to yield data comparable to that obtained by telephone. These results add to our growing knowledge about how to mount surveys in other modalities and obtain comparable data. We are currently conducting further experiments with parallel telephone and Web survey components to extend these findings.

Some limitations of the current study should be noted. First, these findings may hold only for attitudes and not for other types of questions (e.g., behavioral). In addition, another commonly used procedure for avoiding item non-response is a familiarity screen: the survey first asks which topics a person is familiar with and then presents only those topics in subsequent questions (a simple sketch of the logic follows below). Since this procedure affects the basic structuring of the questions, it may also affect data comparability and will need further study. Finally, while we investigated how the inclusion or exclusion of the NS/DK category affected the comparability of data across two modes, we did not address whether the data were more or less valid with the inclusion of the NS/DK category (see Krosnick et al., 2002).
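For concreteness, here is a minimal sketch of how a familiarity screen might gate subsequent questions; the function and the usage example are hypothetical, not a description of our instrument.

```python
def familiarity_filter(topics, familiar_with):
    """Keep only the topics the respondent said they were familiar with,
    preserving the original presentation order."""
    known = set(familiar_with)
    return [t for t in topics if t in known]

# Hypothetical usage: only the familiar targets receive rating questions.
targets = ["Tom Daschle", "Dennis Hastert", "Colin Powell"]
print(familiarity_filter(targets, ["Colin Powell"]))  # -> ['Colin Powell']
```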

References

Converse, P. E. (1964). The nature of belief systems in mass publics. In D. E. Apter (Ed.), Ideology and Discontent. New York: Free Press.

Converse, P. E. (1970). Attitudes and non-attitudes: Continuation of a dialogue. In E. R. Tufte (Ed.), The Quantitative Analysis of Social Problems. Reading, Mass.: Addison-Wesley.

Dillman, D. A. (2000). Mail and Internet Surveys: The Tailored Design Method. New York: Wiley.

Krosnick, J. A., Holbrook, A. L., Berent, M. K., Carson, R. T., Hanemann, W. M., Kopp, R. J., et al. (2002). “The impact of ‘No Opinion’ response options on data quality: Non-attitude reduction or an invitation to satisfice?” Public Opinion Quarterly, 66, 371-403.

McClendon, M. J. and Alwin, D. F. (1993). “No-opinion filters and attitude measurement reliability.” Sociological Methods & Research, 21, 438-464.