The following is an excerpt from a chapter in the U.S. Dept. of Commerce report “Approaches to Developing Questionnaires.” Principal contributor: Deborah H. Bercini

Observation of face-to-face interviews or monitoring of telephone interviews is most frequently thought of as a quality control technique, that is, a means of measuring interviewer performance and interviewer variability. This article examines the usefulness of observation and monitoring for a different purpose, that of evaluating the questionnaire and related data collection procedures. The term "observation" is commonly used in conjunction with face-to-face interviewing and "monitoring" with telephone interviewing, although both activities involve making similar sorts of judgments. In this report, "observation" is generally used in connection with both modes of data collection unless specifically stated otherwise.

Of the methods available to survey researchers for testing the adequacy of a questionnaire, observation of interviews is one of the most easily employed. Observation or monitoring to detect problems in the survey instrument and field procedures is conducted most frequently during the testing phase of the survey, including both informal and formal tests.

Clearly, this is the time when observational feedback can be of the greatest value in making revisions. However, a program of observation can provide researchers or survey designers with useful insights at any stage of data collection. For example, observations made throughout the interviewing stage of a one-time survey with an experimental or methodological component can be enormously valuable when discussing the results. Also, observations made during repetitive or continuous surveys can result in improvement in subsequent interviews.

Perhaps because the technique appears to be so simple, nonparticipant observation is rarely mentioned in the standard survey planning texts. Authors may assume that all survey designers routinely observe their questionnaires in action, although this is not the case. Commonly, observation or monitoring of interviews is considered the responsibility of the field supervisors rather than of the survey planners. Undoubtedly, this stems from the fact that interviews are usually observed to evaluate interviewer performance instead of questionnaire performance.

Another reason why a discussion of observation and monitoring programs is usually absent in survey texts may be the seemingly subjective nature of the technique. The subjective element of a nonparticipant's observations allows for an unconstrained overview of the questionnaire and interviewing situation that is conducive to creative diagnosis of problems and formulation of solutions. However, the degree of subjectivity and the reliability of observation are highly dependent on the system used to record the observations.

Observation of face-to-face interviews or monitoring of telephone interviews by a third party who has been involved in the design of the survey, questionnaire, or data analysis plan can identify flaws in the data collection instrument and other procedures that cannot be detected by statistical analysis of the data or feedback from interviewers alone. Interviewers, no matter how skillful, are too involved in eliciting a response to "step back" from the interaction and fully analyze difficulties in communication with the respondent. Experienced interviewers may inadvertently conceal a defect in the questionnaire design by their ability to handle awkward situations. On the other hand, less experienced interviewers may attribute problems to the instrument that are more related to poor interviewing technique. Interviewer debriefings and written evaluations are extremely useful tools for judging the adequacy of a questionnaire. However, they cannot substitute for the observations of someone who is thoroughly familiar with the concepts and objectives of each questionnaire item.

The following is a compilation of some of the interview characteristics and questionnaire design issues that lend themselves to evaluation through observation or monitoring. The list is presented in a field test context, although many of the same characteristics can also be studied during subsequent stages of the survey.

Respondent cooperation

Among respondents who agree to be interviewed, degrees of cooperation can vary greatly. The standardized explanation of the purpose of the survey and the confidentiality statement (if appropriate) that precedes the first question or a new series of questions must both motivate and inform respondents. An observer can note whether respondents understand the task they are being asked to perform by the questions they ask the interviewer or by irrelevant responses. The willingness of respondents to search their memory for requested information can be ascertained by the quickness or off-handedness of responses. If the consensus among observers is that respondents are reluctant to put forth the effort necessary to provide complete, accurate, or "valid" responses, then the survey instrument becomes suspect.

Interview flow

A nonparticipant observer is in a particularly good position to judge whether the interview flows smoothly, and if not, to analyze the causes. Respondent confusion, distraction, or dwindling interest can be related to abrupt transitions between questionnaire topics or awkward and lengthy gaps, for example. Interviewers may have difficulty managing poorly formatted questionnaires, or multiple questionnaire booklets, whether the interview is conducted face-to-face or over the telephone. The physical appearance of the questionnaire can encourage or frighten respondents, and observers can easily make note of this. A third party can also check whether flashcards or other materials handed back and forth between respondent and interviewer are aids or impediments to the progress of the interview.

Length of interview

Interviewers are routinely instructed to record the beginning and ending times of an interview, so the overall length is almost always available. But nonparticipants can unobtrusively time individual sections of the interview and note the occurrence of substantial interruptions.

Observers can make notes relating the time to characteristics of the household members, health of the respondent, or other factors relevant to the survey topic. Because an observer does not have to be concerned with recording the responses, (s)he can be alert to cues that the respondent is losing patience, becoming fatigued, etc. The respondent's perception of the amount of time the interview is taking, as manifested by comments such as "How many more questions are you going to ask?", is as valuable a piece of information as the measured interview time.

Personnel and skill requirements

For the most part, the personnel involved in the observation of interviews for questionnaire design purposes are members of the survey staff who have been involved in planning the survey design, questionnaire, data analysis, or interviewer training. It is important to ensure that people familiar with all aspects of the subject matter, objectives and procedures of the survey provide advice during the development process.

Depending on the type of system used to record the results of observations, one or more coders may also be required to tabulate and summarize the results.

Selecting the interviews to be observed

The primary purpose of a program of observation is to detect questionnaire and interviewing problems based on use in situations similar to those expected in the actual survey. Since this is also the general objective of a field test, formal or informal, the composition of the test sample is usually appropriate for a program of observation also. However, it is frequently not possible (and perhaps not desirable) to observe every interview in a field test. The survey researcher then must decide whether the kinds of observational feedback needed will be obtained from observations of a self-weighting, 'representative' subsample or from observations of a biased subsample that includes a disproportionate number of units likely to provide a test of selected sections of the questionnaire.
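The trade-off between a self-weighting subsample and a deliberately biased one can be sketched in code. The following Python sketch is purely illustrative: the `flag` field and the `stress_keys` parameter are hypothetical stand-ins for whatever unit characteristics the planner believes will exercise the troublesome sections of the questionnaire.

```python
import random

def select_subsample(units, n, stress_keys=None, stress_share=0.5, seed=0):
    """Pick n field-test interviews to observe.

    With stress_keys=None this is a simple self-weighting random
    subsample; otherwise roughly stress_share of the picks come from
    units flagged as likely to test selected questionnaire sections
    (a deliberately biased subsample).
    """
    rng = random.Random(seed)
    if not stress_keys:
        return rng.sample(units, n)
    stress = [u for u in units if u["flag"] in stress_keys]
    rest = [u for u in units if u["flag"] not in stress_keys]
    k = min(len(stress), int(round(n * stress_share)))
    picked = rng.sample(stress, k) + rng.sample(rest, n - k)
    rng.shuffle(picked)
    return picked
```

Either way, the observer load stays fixed at `n` interviews; only the composition of the observed set changes.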

For telephone surveys, the method used to identify a sample of interviews to be monitored depends on the sampling frame of the survey itself. The selection of interviews to be monitored in a random digit dialed telephone survey field test cannot be as controlled as for a field test of personal interviews, because nothing is known about the sample unit before it is contacted. (In random digit dialed telephone surveys, the sample telephone numbers are generated randomly by computer.) Monitors should be aware that a large proportion of numbers dialed will be nonhousehold numbers, no-answers, or other forms of noncontacts. If the test sample for a telephone survey is in the form of a list of numbers known to contain eligible sampling units, then the selection of interviews to be monitored can be much more efficient.
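As a hypothetical illustration of why so little is known about an RDD sample unit before contact, the sketch below generates random local numbers within assumed-known area codes; the formatting and the exchange cutoff are assumptions for the example, not a description of any production frame.

```python
import random

def rdd_sample(n, area_codes, seed=0):
    """Generate n random telephone numbers for an RDD field test.

    The area codes are taken as known; the 7-digit local number is
    drawn at random, so many of the resulting numbers will be
    nonworking, nonhousehold, or otherwise noncontacts.
    """
    rng = random.Random(seed)
    numbers = []
    for _ in range(n):
        area = rng.choice(area_codes)
        # Draw from 2,000,000-9,999,999 so the exchange is 200-999.
        local = rng.randrange(2_000_000, 10_000_000)
        numbers.append(f"({area}) {local // 10000:03d}-{local % 10000:04d}")
    return numbers
```

A monitoring schedule built over such a list must allow for the high noncontact rate the text notes: only a fraction of dialed numbers will yield an interview to monitor.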

Besides observing "live" interviews, another option available to survey planners involves tape recording the interview for detailed analysis afterwards. Respondent permission is necessary when this is done.

For all programs of observation or monitoring, it is particularly important that a variety of interviewers be selected so that observations are not biased by an interviewer effect. When monitoring telephone interviews, the monitoring schedule should cover as many interviewers as possible at different times of the day. For the same reason, it can be helpful to get feedback from as many observers as possible.

Characteristics of individual questionnaire items

To evaluate questionnaire items, an observer must have some notion of what constitutes acceptable question performance. Most researchers or survey planners probably feel that they will be able to detect question flaws without establishing a strict set of mental or written criteria. However, it is useful to learn what researchers in the field of questionnaire evaluation through observation have determined to be characteristic of successful questions.

Cannell and Robison (1971) set forth two basic dimensions for judging the adequacy of a question: how well the question communicates with the respondent, and the extent to which the question builds and maintains the relationship with the respondent.

Morton-Williams (1979), in an elaboration of Cannell and Robison's work, developed nine criteria for judging question performance.

  1. The interviewer should have no difficulty asking the question correctly.
  2. The interviewer should have no difficulty determining whether the question should be asked.
  3. The question should be unambiguous.
  4. The question should be about a subject that has meaning and relevance for the respondent.
  5. The question should ask for information that the respondent is able to remember or has easy access to.
  6. The question should ask for information that the respondent is willing to give.
  7. The type of answer that is required from the respondent should be clearly conveyed by the wording or format of the question.
  8. The objectives of the question should be clear so that the interviewer can decide if the responses should be probed.
  9. The format of the question should make it easy for the interviewer to record the answer correctly.

On the assumption that a well-designed question will cause few problems for the interviewer or the respondent, survey researchers often evaluate questions by some of the same criteria that are used to evaluate interviewer performance. For example, individual questions are judged by whether the interviewer asked the question exactly as worded, asked the question in the correct sequence, omitted the question in error; whether the respondent asked for clarification, gave an adequate response, and so on.

In addition to general criteria which can be applied to almost any questionnaire item, observers usually evaluate the interviews against a set of very specific standards applicable to the individual questionnaire. For example, observers may note whether respondents consulted their bills and receipts for certain questions in a household expenditure survey or the ease with which the interviewer administers a complicated procedure that depends on the respondent's answer to a previous question.

System of quantifying observations and training of observers

For the observation/monitoring program to be of value to the questionnaire designer, the feedback from the observations must be relayed in a manageable, analyzable form. Similarly, the researcher or questionnaire designer must provide observers with some focus or objectives for their activities. Observers who are instructed to "note any problems" will probably return with a hodgepodge of unrelated comments that would be difficult to interpret. The survey planner must decide on the types of information (s)he wants to get out of the series of observations before the observations begin. The most useful feedback will come from observers who understand what specific problems and behaviors to look for and who have the ability to recognize the unanticipated rough spots as well.

The degree of structure imposed upon the observations will depend upon where the questionnaire is in its evolutionary development. The observational objectives for a questionnaire in an early draft form may be less defined because the survey planners are not yet fully aware of what the potential problems might be. As the questionnaire becomes more refined, so can the focus of the observations.

Using forms to quantify observations

Observations may be recorded on forms developed specifically for that purpose or observers can write comments directly on the questionnaire. If the survey planner wants to collect comparable information from each observer, it is advisable to use a standardized observer's form or observer's questionnaire. An observer's questionnaire can be constructed so that the observations are recorded in a standard fashion next to each questionnaire item. This is accomplished by inserting the observer's check item after each regular questionnaire item. Observation forms are often designed so that the same information is collected for each question, e.g., "question asked as worded," "question omitted in error," "respondent asked for clarification," and so on. Or the researcher may be interested in different but specific characteristics of some or all of the questionnaire items. In addition to the closed-ended, "check box" observations, more analytical, creative comments can also be gathered. In all cases, observers need to be trained on the use of the forms and the kinds of observations to record.
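Collecting comparable check-box observations from each observer makes a simple tabulation possible. The sketch below is a minimal illustration, assuming a hypothetical set of check items drawn from the examples in the text; the form structure (a mapping from question id to marked items) is an assumption for the example.

```python
from collections import Counter, defaultdict

# Hypothetical check items from a standardized observer's form.
CHECK_ITEMS = ("question asked as worded",
               "question omitted in error",
               "respondent asked for clarification")

def tabulate_forms(forms):
    """Summarize standardized observer forms.

    Each form maps a question id to the set of check items the
    observer marked for that question; the result is a per-question
    count of each check item across all observed interviews.
    """
    tallies = defaultdict(Counter)
    for form in forms:
        for qid, checks in form.items():
            for item in checks:
                if item in CHECK_ITEMS:
                    tallies[qid][item] += 1
    return tallies
```

Questions whose tallies are dominated by "omitted in error" or "asked for clarification" marks are natural candidates for revision, which is exactly the kind of analyzable feedback the text calls for.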

Verbal interaction coding

The kinds of observations that can be recorded during an interview are somewhat less detailed than those that can be obtained from analysis of a tape recorded interview. Cannell et al. (1971, 1975) and Morton-Williams (1979) used tape recorded interviews to develop and apply a coding scheme based on specific pieces of interviewer and respondent behavior, called verbal interaction. Each question was subjected to the same codes so that problem questions could be identified by the number and type of codes they received. Cannell's research (Marquis, 1971) involved the application of 52 specific behavior codes to 164 tape-recorded face-to-face interviews. Eight specially trained coding clerks coded the interviews. Agreement on which code to select was generally high (an inter-coder reliability of 86 percent was achieved) when coders could agree on whether a codable behavior had occurred. The following code categories, reduced from the original 52, were used in the analysis of the verbal interaction data.

Question Codes:
  correctly asked question
  incorrectly asked question
  partial question
  alternatives incomplete question
  question omitted by mistake

Probe Codes:
  repeat question
  nondirective probe
  "anything else" probe
  directive probe
  interviewer repeats answer

Clarification Codes:
  interviewer gives clarification
  respondent asks clarification

Response Codes:
  inadequate response
  "don't know" response
  refusal

For each question, the average number of problem codes was calculated, based on the number of times the question should have been asked. Thus, questions with code categories that had high average frequencies were considered inadequate in some respect. By grouping codes in various ways, the types of problems could be identified and attempts made to diagnose their nature. Three basic kinds of problems were identified: interviewer problems, respondent problems, and problems with the questions. The possible diagnoses included problems with question wording or context, problems due to lack of understanding of the underlying concept, problems indicated by erroneous omission or inclusion, and problems of refusal.
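The per-question rate described above is a simple division, sketched below with hypothetical inputs: total problem codes received by each question across the coded interviews, over the number of interviews in which the question should have been asked.

```python
def problem_code_rates(code_counts, times_askable):
    """Average number of problem codes per administration of a question.

    code_counts: question id -> total problem codes across all coded
    interviews; times_askable: question id -> number of interviews in
    which the question should have been asked.  Questions with high
    rates are flagged as inadequate in some respect.
    """
    return {q: code_counts.get(q, 0) / times_askable[q]
            for q in times_askable if times_askable[q] > 0}
```

Normalizing by the number of times the question should have been asked, rather than the number of interviews, keeps questions behind skip patterns comparable with questions asked of everyone.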

In evaluating his procedure, Cannell acknowledged that its usefulness would be enhanced by simplification. A major deficiency resulted from the fact that a single behavior can have many causes so that the technique could not always differentiate the nature of the questionnaire problems. But Cannell concluded that the procedure had "considerable potential for use in tests to locate problem questions and to provide adequate information which will permit the study director to correct the problem. The use of the procedures may make a substantial contribution toward objective evaluation of questionnaires at test stages."

Morton-Williams (1979) used a similar but somewhat more detailed verbal interaction coding frame to evaluate a questionnaire in its testing phase. She considered it a valuable, although expensive and time-consuming, technique. To achieve an acceptable level of reliability, coders had to be highly trained, not only in the application of the specific codes but also in proper interviewing technique. However, Morton-Williams recommended that questionnaire designers code a few taped test interviews because it would help them to think precisely about the objectives of each question, the task being asked of the interviewer and the respondent, and whether the question is appropriate and the instructions adequate.

Interviewer training

The program of observation should begin with interviewer training, even for informal tests. An observer/researcher who is confident that the interviewers received adequate preparation is in a better position to attribute difficulties in the interview to characteristics of the questionnaire or to the particular interview situation. If survey designers are made aware of shortcomings in the training, they may be able to reserve judgment on certain troublesome sections of the questionnaire.

The observation setting

It is possible that the presence of an observer in the face-to-face interviewing situation will have an effect on the interviewer's and respondent's behavior, and thereby influence the data collected. These effects can be minimized, however, by a polite but brief introduction of the observer to the respondent, and an unobtrusive manner of the observer. Usually the interviewer, after identifying herself/himself and gaining entry to the household or establishment, introduces the observer with a simple, factual statement such as, "This is from _______________ (agency). He/she helps design the questionnaires we use."

An advantage of using this introduction is that it gives the observer a legitimate reason to probe the respondent's answers at the end of the interview based on observations made during the interview. During the interview, observers should do as little as possible to remind either the interviewer or the respondent of their presence. If possible, observers should sit so they are not in the direct line of vision of either of the interview participants. Page-turning and note-taking should be done inconspicuously, and the observer should not interrupt during the interview.

Interviewers need to be reassured that the purpose of the observation is not to judge their performance, but to see how the questionnaire affects their performance. In household interviews it is generally considered unwise to pair a male interviewer with a male observer since respondents are often reluctant to let two strange men into their homes. The topic of the interview also might make it advisable to send out observers (and interviewers) of a particular sex. Of course, when the interview is conducted by telephone or tape recorded, these restrictions do not apply. When properly conducted, an observation program for face-to-face interviews need not interfere with interviewers' schedules or delay the normal progress of the field test. Monitoring of telephone interviews can be accomplished with virtually no disruption whatsoever.

Cost considerations

The largest cost factor in an observation program is professional staff salaries. Depending on the geographic location and dispersion of the sample being observed, travel costs and related expenses for the observers may also be considerable. Otherwise, nonparticipant observation is a relatively low-cost way to improve the quality of questionnaire drafts.

References

Cannell, Charles F.; Lawson, S.A.; and Hausser, D.L. 1975. A Technique for Evaluating Interviewer Performance. Ann Arbor: Survey Research Center, U. of Michigan.

Cannell, Charles F., and Robison, Emily. 1971. "Analysis of Individual Questions." Working Papers on Survey Research in Poverty Areas. Ann Arbor: Survey Research Center, U. of Michigan.

Marquis, Kent H. 1971. "Purpose and Procedures of the Tape Recording Analysis." Working Papers on Survey Research in Poverty Areas. Ann Arbor: Survey Research Center, U. of Michigan.

Morton-Williams, Jean. 1979. "The Use of 'Verbal Interaction Coding' for Evaluating a Questionnaire." Quality and Quantity, vol. 13.