Editor’s note: Scott Tallal is president of Advanced Research Services, Dallas, Texas.

If you’re looking for an unbiased, dispassionate analysis of the relative merits of adding digital audio to telephone surveys, this article isn’t going to be it. Voice-capture has had a profound impact on almost every aspect of our research operation, changing everything from the way we write questionnaires and conduct interviews to the way we analyze and present the results.

What was once conceived as an optional add-on to our existing service is now a standard part of almost every telephone project we do. Even if we were never to share these audio recordings with our clients, we would still want to have voice-capture available for our own use. From an analyst’s perspective, it represents a quantum leap in the ability to understand what drives respondent behavior. More important, our experience with voice-capture has revealed that traditional verbatim transcription of open-end responses from a telephone survey can lead to erroneous interpretation.

Whether interviewers use paper-and-pencil questionnaires or computer-assisted surveys, they are simply incapable of taking down a respondent’s every word. Since very few people can write or type as fast as a respondent can talk (upwards of 150 words per minute), interviewers have to mentally edit these comments as they transcribe them. At the same time, the transcription process itself interferes with the interviewers’ ability to listen closely to what the respondents have to say. As a result, the quality of probing begins to suffer, and most interviewers fall back on generic follow-ups (e.g., "What else?" "Anything else?"). With voice-capture, interviewers are free to focus their attention on respondent comments and follow-up probes.

Despite the apparent advantages of voice-capture, there are a number of logistical and operational considerations. The sheer volume of digitized data dramatically increases the demand on storage systems, beginning with data collection and continuing through every phase of analysis, presentation, and delivery. There are also several other hardware issues to address, and the additional analytical requirements are daunting to say the least.

Managing massive amounts of audio

Even at the lowest-quality setting (which is fine for voice recording), digital audio consumes massive quantities of hard disk space, roughly 40MB per hour. For 600 respondents, a voice-capture survey that records just one minute of audio from each requires close to half a gigabyte of data storage. Two minutes per survey starts to max out most computer networks; three minutes would bring most older networks to their knees. This, of course, assumes the interviewers are working on only one study at a time - running two or more projects simultaneously requires either very limited use of voice-capture or a RAID storage system which interconnects several large hard disks to one network server.
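For readers who want to run these numbers on their own projects, the arithmetic reduces to a one-line formula. The sketch below (in Python; the function name is ours for illustration, and the 40MB-per-hour figure is the low-quality setting cited above) shows how quickly the storage adds up:

    # Back-of-the-envelope storage estimate for a voice-capture survey.
    # Assumes roughly 40MB per hour of audio at the lowest-quality setting.
    MB_PER_HOUR = 40

    def storage_mb(respondents, minutes_per_respondent):
        """Approximate disk space (in MB) the recorded audio will consume."""
        hours_of_audio = respondents * minutes_per_respondent / 60
        return hours_of_audio * MB_PER_HOUR

    print(storage_mb(600, 1))   # 400.0 - one minute each, about 10 hours of audio
    print(storage_mb(600, 3))   # 1200.0 - three minutes each strains older networks
    print(15 * MB_PER_HOUR)     # 600 - the 15 full hours a typical project yields

Multiplying out a project’s specifications this way before fieldwork begins tells you whether the network can handle the load.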

Because very large hard drives and RAID systems have only recently become available (and affordable), over the years we have had to devise a number of strategies to limit system demands. First, we’ve learned which voice-capture questions are most effective and which ones aren’t; voice-capture should be used only on questions where listening to the recorded answers will be most productive for both the analyst and the client. Second, it’s not always necessary to ask every voice-capture question of each individual respondent; in many cases, it’s much more effective to limit follow-up probes to only those respondents providing specific answers to certain questions.

That’s why we frequently program our questionnaires to skip past voice-capture questions unless very specific conditions are met. For instance, many of our surveys focus on viewer reaction to local television newscasts (and the people who present them). If we (and our clients) want to determine what viewers like best about the local newscasts on a given station, we would ask this question only among those who’ve actually watched that station within the past week. If it’s been a month or more since they last watched that station for news, their answers may be driven by very old perceptions. Granted, there may be some value in exploring these older perceptions, but our experience has shown that value to be quite limited.

Most of our surveys also ask viewers to rate upwards of 20 of the people who present their local news, weather, and sports. For most personalities, the bulk of the reaction is either neutral or only mildly positive; very few generate a significant number of strong positives or negatives. We’ve historically found little value in using voice-capture to probe these middle responses, so we ask a voice-capture question only when a respondent feels strongly about a given personality, one way or the other.

By listening only to those with a strong positive response, our clients are able to develop a marketing campaign that emphasizes the qualities viewers really like. At the same time, they use the negatives to help coach that person’s on-air performance. We also usually limit voice-capture probes to only those personalities employed by our client station. It might be interesting to know why viewers like (or dislike) someone on a competing station, but there’s ultimately very little our client can do with that information.
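In programming terms, all three of these filters - recency of viewing, strength of reaction, and client employment - are simple conditional gates placed ahead of the voice-capture question. Here is a rough sketch, written in Python rather than the actual scripting of The Survey System, with hypothetical field names:

    # Hypothetical gating logic for voice-capture probes; the real skip
    # patterns are programmed in the interviewing software itself.

    def probe_station_likes(watched_past_week):
        # Ask what viewers like best only of those who watched recently;
        # month-old perceptions are rarely worth the disk space.
        return watched_past_week

    def probe_personality(reaction, works_for_client):
        # Probe only strong reactions, and only for our client's own people.
        return works_for_client and reaction in ("strong_positive", "strong_negative")

    print(probe_station_likes(False))                  # False - skip the probe
    print(probe_personality("neutral", True))          # False - a middle response
    print(probe_personality("strong_negative", True))  # True - coaching material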

Finally, in developing the initial specifications for the voice-capture software, we asked the development team at Creative Research (the Petaluma, Calif.-based makers of The Survey System, the interviewing software we use) to give interviewers the option of automatically erasing a specific response without saving it to the hard disk. With or without voice-capture, the answers to many open-ended questions are frequently nonresponsive; with voice-capture, those answers would waste valuable hard disk space. With The Survey System, once a question has been asked and answered (and probed for clarification as needed), on-screen instructions force the interviewer to decide whether or not the answer is worth keeping.
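Conceptually, the erase option works like the sketch below. This is a hypothetical illustration of the decision point, not The Survey System’s actual code:

    import os

    def finalize_recording(path):
        """Force a keep-or-discard decision once the answer has been probed.

        Nonresponsive answers are erased on the spot rather than being
        written to permanent storage."""
        keep = input("Keep this recorded answer? (y/n) ").strip().lower() == "y"
        if not keep and os.path.exists(path):
            os.remove(path)  # reclaim the disk space immediately
        return keep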

Together with the other techniques already addressed, this allows us to ask upwards of 20 voice-capture questions per survey, yielding approximately 15 full hours of recorded audio (which requires just under 600MB of disk storage).

Perfecting the art of the interview

Beyond the logistics of data storage, voice-capture can have a significant impact on the quality of interviews. Clients will have a chance to hear every interview conducted for their survey, good and bad.

For a research firm, it’s one thing to spot-check the quality of a telephone interviewing staff; it’s quite another to have the opportunity to listen to each and every one of their interviews. Voice-capture gives research firms a golden opportunity to train their interviewing staffs. Since every interview is recorded, supervisors can monitor all of an interviewer’s work and sit down with them after each project to review and ultimately improve their performance.

Listening to respondent comments has convinced us that conversational interviewing makes the respondents much more likely to share their opinions, offering answers which are much more articulate and detailed.

Voice-capture has also prompted us to change not only the language we use within a questionnaire (to make it more conversational as well), but also the overall structure of the questionnaire itself. Being able to actually listen to respondents talk has caused us to rely more and more on open-ended questions, to the point where they now account for up to half of the questions in our voice-capture surveys. Of course, we still make very heavy use of multiple-choice questions, but increasingly these are designed and used as setups for the verbatims. In many cases, voice-capture has even prompted us to change the language we use in those multiple-choice setups.

Case in point: Prior to the use of voice-capture, we used to ask respondents to rate television personalities using a 10-point scale. At the time, the choice seemed logical; thanks to the Olympics, the use of a 10-point scale to rate an individual had become part of the vernacular. Using then-standard data collection techniques, we would first ask respondents to rate each personality on the 10-point scale, then ask them to explain why they rated that person that way.

Transcribed and then printed out on paper, these "personality" verbatims appeared to be just fine: they looked more or less like every other verbatim report we had ever seen. Once we started using voice-capture, however, we realized just how badly we (and our clients) had been misinterpreting these respondent comments. Much of the meaning is conveyed in how people say things, and that’s something no transcript can capture (in fact, after hearing voice-capture for the first time, most clients realize that they’ve been misinterpreting verbatim printouts, putting their own spin on respondent comments).

We also found that the 10-point scale wasn’t quite as effective as we’d once thought in setting up the open-end probes. The verbatims may have looked fine on paper, but in listening to them we felt there was something missing. Our television station clients want to know two things from the personality comments: how to best market this person to the audience, and how to best coach this person to improve his or her on-air performance.

By restructuring the setup question, we now get verbatims that are much richer and more actionable for the client. Instead of the 10-point scale, we now ask viewers to sort personalities into one of three groups: those they like to watch, those they don’t like, and those who leave them indifferent. In retrospect, this may seem the more obvious choice for such an evaluation, but we might never have come to this realization without voice-capture. This revelation caused us to rethink every other measure used in our basic boilerplate, even those which do not set up an open-end.

Overcoming the obstacles of data delivery and analysis

Even after we made the switch to voice-capture, at first we continued to code and analyze open-ends the way we always had, hiring a coder to categorize the general subject(s) touched on within each comment. Of course, in the absence of a printed transcription it was initially deemed too time-consuming for a top-level analyst to listen to all of the verbatims, so we eventually had the coders prepare brief write-ups summarizing and excerpting the responses to each verbatim question. It wasn’t long before we realized how much we were losing by not conducting a more thorough analysis.

At the same time, we were just beginning to make our first voice-capture presentations to our clients. Using the same software we used for data processing, we took time during the presentation to enter all of the individual parameters for each specific question to be played back for everyone to hear. Since we used this software every day, it took us only a matter of moments to call up the answers to a specific question; clients, however, had difficulty accessing the comments in the weeks following the presentation. We also found that during our presentations, after hearing the first few comments made in response to each question, the clients soon tired of listening to the rest. It was obvious that they would never take the time to review all 15 hours of recorded audio.

The solution to both problems turned out to be a very expensive proposition: the software would have to be revised specifically for client use, and we would have to dedicate a top-level analyst to the job of verbatim analysis. In addition to coding, this person would write up a much more in-depth analysis, at the same time choosing what would amount to an executive summary of the most articulate and representative comments. This written analysis could then be incorporated into the final report, and the clients would still be able to hear a selected sampling of what their customers had to say. Unfortunately, it takes even an experienced analyst about four hours to analyze one hour’s worth of comments; going this route meant we’d have to add 60 hours of high-level analysis to each voice-capture project (on top of the software revision costs).

Along the way, we also had the developers add new features to the software package, the most significant of which lists certain respondent characteristics on-screen while each comment is being played. This gives us (and the client) a chance to "see" who’s talking. With the new software interface (which allows the analyst to pre-program these respondent characteristics for identification during playback), it’s no longer necessary for either the presenter or the client to manually enter these specifications each time he or she wishes to hear a verbatim. All of the comments (and the software required to access them) can now be transferred onto an inexpensive multimedia CD-ROM. Now it’s simply a matter of loading the software, then using Windows’ point-and-click interface to trigger playback of the verbatim you wish to hear.
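As a rough illustration of what such a playback index looks like (the field names and file names here are invented for the example; the working interface is part of The Survey System):

    # Hypothetical index pairing each recorded comment with the respondent
    # characteristics displayed on-screen during playback.
    comments = [
        {"clip": "q12_r0417.wav", "age": 34, "gender": "F", "viewing": "regular"},
        {"clip": "q12_r0583.wav", "age": 51, "gender": "M", "viewing": "occasional"},
    ]

    def play_with_profile(comment):
        # Show who's talking, then hand the clip to the playback engine.
        profile = ", ".join(f"{k}: {v}" for k, v in comment.items() if k != "clip")
        print(f"Now playing {comment['clip']} ({profile})")

    for c in comments:
        play_with_profile(c)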

Getting the presenter out of the way

Perhaps it’s a phenomenon unique to our work in television, but it’s not uncommon for us to be delivering a project to a client who really hasn’t bought into the research process. Management/production executives may have ordered the study because they want to improve their ratings, but the producer/creative types don’t always want to hear what we researchers have to say. Maybe it’s a matter of the "right-brainers" feeling uncomfortable with what is essentially a "left-brain" exercise; perhaps they don’t fully comprehend the statistical validity of the process. They may even feel that, as researchers, we can’t truly empathize with the creative and artistic pressures involved in putting a program on TV. Whatever the reason, as far as they’re concerned, we might as well be wearing lab coats when we show up.

Once again, voice-capture provides the solution: It helps get the researcher out of the way of the data, giving the producer/creative types a chance to hear directly from their audience. Some might argue that they’ve always had that opportunity with focus groups, but such clients almost invariably leave a focus group a) hearing only what they wanted to hear, and/or b) wondering just how representative the group’s comments really are. Voice-capture eliminates this problem by providing a proportionate, random sampling of the audience as a whole.

Clients get to hear directly from their "customers" without worrying that the researcher might be putting his or her own interpretation on the data. It’s remarkable to witness first-hand just how quickly the walls begin to break down with voice-capture, when previously difficult clients open up to what the audience has to say.

SIDEBAR

The legal and ethical issues surrounding the use of voice-capture

Intrastate calling falls under the direct jurisdiction of individual state regulatory authorities, whose rules vary from state to state. However, calls made across state lines are regulated by the Federal Communications Commission (FCC). An FCC attorney has advised our company that since only one party to an interstate call must grant permission to record the conversation, the interviewing firm can be the granting party. It is not necessary for us to ask the respondents’ permission to record (readers are advised to obtain their own legal counsel on this matter). However, as an ethical consideration we routinely ask all of our respondents if they are willing to permit the use of a computer "to digitally record your answers to certain questions, just to make sure we get down exactly what you have to say." Since we first launched the use of voice-capture five years ago, the cooperation rate has been consistently in excess of 90 percent.