The face-to-face interface

Editor's note: Stephen Turner is chairman of Fieldwork Inc., a Chicago research company. He is based in Honolulu.

The Internet has, in a few short years, redefined our abilities to reach diverse segments of people where they live and work either as a batch or real-time process. Thanks to consumer-level proliferation of broadband connectivity, Webcams, smartphones and the like, we can conduct complex interviews with dispersed and even rare samples of respondents using audio and visual communications of considerable fidelity. There is no question that this is a great step forward for our discipline.

But, in the midst of this rush to capitalize on the efficiencies of digital research, I want to offer some words of serious caution. My intent is not to denigrate the Internet as a research tool but to remind the reader that it does not erase the need to gather face-to-face data in pursuit of understanding human beings. My thesis is that our work isn’t fully done until we sit across the table from those we wish to understand – physically in their presence as we engage in discourse about their needs and interests.

Not the first time

This is not the first time, incidentally, that our industry has encountered such issues. In the first half of the 20th century, opinion polling (the forerunner of marketing research) was conducted largely by canvassing sampled neighborhoods on foot, with rigid rules about which households you should stop at and with whom you were to speak when you got there. But the efficiencies of mail and telephone surveys were too seductive for the industry to continue relying solely on a face-to-face approach.

It was clear almost from the outset, however, that mail and phone studies each had their own set of limitations and biases. Mail surveys allowed you to provide visual stimuli but you had little control over who answered, when and with what sorts of preparation. Additionally, it was all but impossible to stop people from backtracking or otherwise distorting the order in which they answered questions. Phone surveys solved some of these problems but had their own issues to contend with – no visuals, for example, unless they were distributed beforehand. More troublesome was the temporal imperative to answer the questions in relatively short order, whether or not you understood the question or had the wherewithal to answer. Each approach had advantages but left something out in the process.

An extraordinary tool

In comparison, the Internet is an extraordinary tool. It can be used in so many different ways – from analyzing content that flows on its own (e.g., blogs, reviews, social media) to various synchronous and asynchronous querying techniques from chat boards to online focus groups, which simulate face-to-face encounters with considerable precision. And with today’s smart mobile devices, respondents can take the interview with them – into their homes where they’re comfortable or to the store where they can describe what goes through their heads as they weigh their options.

However, as with all of the new techniques before it, there is still an important body of information left on the table, even as our technological skills bring us closer to the experience of face-to-face communications. My sense is that Marshall McLuhan’s phrase, “The medium is the message,” is as relevant to Internet-based research as it was to the advent of TV when McLuhan’s Understanding Media: The Extensions of Man was published in 1964. Indeed, I am absolutely sure it is.

A social animal

Man is a social animal. Parts of our brains have evolved over many millennia to attend to communications that take place on levels other than verbal. There are thousands of scientific articles attesting to the fact that a lot of what we “say” to each other is transmitted via all of our senses in ways so nuanced as to defy verbal recognition.

Moms and infants communicate with each other long before language forms for the little one. Adults recognize others’ dispositions and moods instantly without knowing exactly how or why. Experts in nonverbal communication tell us how to recognize when people are lying, just nervous or, perhaps, romantically inclined. In Blink, Malcolm Gladwell describes people’s abilities to make good decisions instantly even when high-level cognitions tell them to do otherwise – decisions made on the basis of nonverbal communication.

The effects of nonverbal cognition are an important part of social and personal understanding. Today’s brain scientists tell us that we echo the appropriate emotions of others whom we are watching. We experience genuine fear and excitement and sadness and anxiety as we observe others in situations manifesting those emotions. So it happens that group behavior ebbs and flows with a rhythm that is interconnected among group members in ways that cannot be explained as an aggregate of individuals’ isolated thoughts and feelings. Say what you will about groupthink, the truth is that we think and behave as groups in real life. We are an inherently social species.

Taken in context

When you have a group discussion in a face-to-face environment, interactions take place on an entirely different level than they do in an Internet-based focus group. Interactions in digital groups take place on the basis of literal interpretations of what is being said. Interactions in a face-to-face group are based on literal communications taken in the context of a continuous flow and interpretation of descriptive metadata (having to do with how the information was delivered) – information our brains have come to understand over the past million-or-so years. The modifications this metadata makes to the literal meaning often yield very different sets of messages.

I’m not trying to say that moderators, even the best of them, are extraordinary in their ability to read all the subtle cues that emanate from face-to-face encounters. What I’m saying is that all of us are hardwired to do this. A good moderator is perhaps better tuned in to such vibes than most. He or she may not be a studied expert in turning covert communications into overt messages but has learned how to use that underlying current of information to arrive at a deeper sense of what someone is really trying to say.

A good moderator thinks carefully about the literal meaning of what someone is saying, modifies that literal meaning in the light of nonverbal cues and then – this is the key – asks the respondent to clarify the extent to which the moderator’s interpretation of what was meant fits or doesn’t fit the respondent’s intended message. This is an iterative process and dramatically more effective in person than online.

Furthermore, and just as importantly, this same process is going on with everyone who is party to the conversation. Because we are human and because our brains are designed to attend to the flow of emotive cues that surround individual pronouncements, our reactions to what someone says in our presence are continuously being modified and do not always track with a literal interpretation of what was actually mouthed. Despite the efforts of even the most rigid of moderators, those reactions enter into the dynamics of all group discussions.

The metadata

People posture. They do it all the time. They do it to convince others and themselves that they truly are the person they project. The interesting thing is that we’re often more capable of acknowledging, or at least more willing to acknowledge, the pretenses of others than our own. Focus groups (as well as 12-step programs) make use of the fact that, as social animals, we sense when other people are misrepresenting themselves, perhaps because we are familiar with the same pattern in ourselves but also because we are all expert at attending to the nonverbal cues that accompany the communication. Furthermore, it doesn’t take long in a physical group setting for members to press each other to explain discrepancies between what they literally say and what they seem to be saying when one takes into account the metadata.

Unfortunately, much of that metadata is missing when we interface digitally. In a face-to-face setting, micro-expressions that would never be seen on screen are readily apparent, such as eyes rolling back, a one-sided sneer, a particularly intense rather than offhand delivery – all things that fine-tune the intent of one’s words. Body language is vivid. Hand gesticulations, submissive posture, a slight turn away from the listener or a cock of the head all add shades of meaning. Sighs, snorts, giggles, huffs and murmurs emphasize the emotional components of a viewpoint. Indeed, a host of marginal cues that may not be seen or felt in an Internet transmission are obvious in a face-to-face setting – perhaps to be considered at a low level of consciousness but nonetheless pertinent to interpreting an intended meaning.

And I’m only mentioning here things that are overtly apparent. Some communicators that are precluded from Internet transmission entirely (e.g., odors, flop sweat) can modify how we interpret what literally spills from the mouth. Smell-O-Vision has been talked about for years but it isn’t here yet.

Lost in transmission

It’s not just odors, of course, that are lost in transmission. The sensing devices that feed remote interviews are, by and large, fixed in their focus. Cameras are generally trained on the face and upper torso and rarely offer acute details of either. Microphones also tend to be focused to filter out extraneous noise, blocking metadata in the process. Because almost all transmissions involve duplex communication, there is very little by way of useful sidebar information.

In a traditional, face-to-face focus group, I can direct my visual or audio focus anywhere I want at any time. If there’s a sidebar event taking place, I can divert my attention from the primary conversation to the sidebar (and I can assure you that sidebars are frequently more interesting and relevant than the primaries). Doing this is all but impossible in a digital encounter where sidebar information, if present, is generally too indistinct or garbled to track.

A complete story

So, it happens that 40+ years of conducting marketing research studies of all types have convinced me that face-to-face inquiry is an essential part of truly understanding people’s thoughts, feelings and dispositions toward the products, services and communications we study in our work. To be sure, we can get a huge amount of reliable and valid information by carefully collecting data via the Internet, but we won’t have a complete story until we lace in some of the richness that comes only from sitting down across from someone in the physical world and talking things through.

The problem, of course, is that face-to-face work is expensive and time-consuming. It is especially difficult when you need to talk to people who are geographically dispersed. Still, I believe leaving out face-to-face work entirely is like behaving as the drunk does who looks for his lost watch under a streetlamp because that’s where the light’s best.

What I wish to advocate here is that, as an industry, we develop hybrid approaches to research that include components suited to digital research in addition to substantial face-to-face work.

A goodly number of our clients are already marrying digital and face-to-face approaches in ways that transcend the sum of their parts to create new avenues of understanding. The surface is just being scratched, with new ways of using smartphones and tablets to gather personalized observations and bring those observations into face-to-face settings. I believe these approaches have enormous promise for illuminating people’s attitudes and motivations.

Not just words

But no matter how elegantly it is done, no matter how closely the medium mimics reality, I remain convinced that if you don’t spend a good deal of time and energy on thoughtful discourse in the physical presence of your customers, you are never going to understand exactly what they are trying to tell you. Articulateness is not just a matter of words. It also comes from the way words are packaged and no emoticons – no matter how clever – can achieve the warmth of face-to-face interaction.