Cast the right net on the Net

Editor's note: Paul Oram is partner at the London office of research firm Antedote.

Facebook has over 1.49 billion monthly active users worldwide and every 60 seconds, 510 comments are posted and 293,000 statuses are updated. On Twitter, over 500 million tweets are posted a day. We have all seen statistics like this. Most of us will also have heard the logic that flows from this: These millions of unfiltered social conversations are a treasure trove for researchers seeking insight into consumers’ lives, opinions and behaviors – all you need to do is tap into them, run some smart analytics and away you go.

In the real world, things are painfully different: in practice, getting genuine insight from social media listening is much more challenging.

Harvesting social data is a bit like trawler fishing: you cast a big net out into a wide expanse of ocean in the hope of landing a good catch. If you use a net with a fine mesh you catch pretty much everything that swims, the vast majority of which is useless to you. Most social data tools cannot distinguish between content that has originated from a consumer and content produced by a brand or, worse still, a bot. Analysis platforms have a long way to go before they can accurately classify the authors of social content. It’s only as you manually pick through your catch that you find out how much of it is total junk.
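To make the bot-and-brand problem concrete, here is a minimal sketch of the kind of crude heuristics a researcher might apply by hand when picking through the catch. The thresholds and the post structure are illustrative assumptions, not any platform's actual method:

```python
from collections import Counter

def flag_suspect_authors(posts, max_posts_per_day=50, max_duplicate_ratio=0.5):
    """Crude, illustrative heuristics for spotting likely bot/brand accounts:
    very high posting rates and heavy repetition of identical text.
    Each post is assumed to be a dict with "author", "text" and "day" keys."""
    by_author = {}
    for post in posts:
        by_author.setdefault(post["author"], []).append(post)

    suspects = set()
    for author, items in by_author.items():
        per_day = Counter(p["day"] for p in items)
        distinct_texts = len(set(p["text"] for p in items))
        duplicate_ratio = 1 - distinct_texts / len(items)
        if max(per_day.values()) > max_posts_per_day or duplicate_ratio > max_duplicate_ratio:
            suspects.add(author)
    return suspects
```

Real accounts of course defeat rules this simple, which is exactly why classifying authors remains an open problem for the platforms.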

Go with too wide a mesh and some really fine specimens that just happened to be a bit too small will slip through, and you’ll never know they were even there. This often happens when researching broad subject areas – for example, “snacking.” People talk about subjects like this in many different ways, and pulling a good sample out of the noise becomes a never-ending quest to refine keywords, products and brands while fighting to exclude all the usages that don’t fit your target.
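That quest to refine keywords boils down to iterating on include and exclude term lists. A minimal sketch of the filtering step (the terms here are invented examples, not a recommended query):

```python
def matches_query(text, include_terms, exclude_terms):
    """Keep a post only if it mentions at least one target term
    and none of the known false-positive phrases."""
    lowered = text.lower()
    if not any(term in lowered for term in include_terms):
        return False
    return not any(term in lowered for term in exclude_terms)
```

In practice you run a query like this, read a sample of what it returns, spot a new source of noise, add an exclusion and repeat – the "never-ending quest" in miniature.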

Lacking context

A lot of social posts lack context – or, rather, the majority of social listening tools are not designed to retrieve it. For example, in a lot of tweets the author expresses his or her opinion and links out to a piece of content. Unless you follow that link and assess what the content is about, you cannot really tell if it is relevant to your research.
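The first step in recovering that context is simply extracting the links a post shares so the linked content can be fetched and assessed separately. A small sketch (the regular expression is a deliberately simple assumption and would need hardening for real data):

```python
import re
from urllib.parse import urlparse

URL_RE = re.compile(r"https?://\S+")

def shared_domains(post_text):
    """Pull out the domains a post links to, as a first cut at
    judging whether the shared content is relevant."""
    return [urlparse(url).netloc for url in URL_RE.findall(post_text)]
```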

A frequently used item of context is geographic location. Let’s say you want to focus a study on U.K. consumers. You might set your social listening platform to include only content originating from within the U.K. The problem is, only a fraction of social posts are geo-tagged, so you’re missing out on lots of relevant content. Listening platforms use various techniques to try to work around this – for example, by taking a single geo-tagged tweet and assuming the author’s other tweets come from the same place. Other platforms dig into the author’s bio and, if the author has specified a location, assume everything they post is from there. These are big assumptions and must be factored in when working on studies that have an important geographic component.
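One way to keep those assumptions honest is to record, alongside each inferred location, how it was inferred. A minimal sketch of that idea, with an invented post structure:

```python
def infer_location(post, author_bio_location=None):
    """Return (location, basis). The basis records how the location
    was inferred, so downstream analysis can weigh the assumption."""
    if post.get("geo"):                  # explicit geo-tag on this post
        return post["geo"], "geo-tagged"
    if author_bio_location:              # location declared in the author's bio
        return author_bio_location, "bio (assumed)"
    return None, "unknown"
```

A study with an important geographic component can then report what share of its sample rests on each basis, rather than treating all locations as equally reliable.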

What about the data sources themselves – the social channels? Can we consider them representative? For obvious privacy reasons, social analytics platforms can only tap into content that has been shared publicly or channels under brand control (for example, a brand's Facebook page). The problem is that people share a lot more privately (“dark social,” as it is sometimes termed) than publicly – reportedly around 70 percent1 is shared privately. The subjects that people share publicly are also heavily skewed: people are fine publicly sharing posts about their pets but are much more reticent in almost every other content category. And all too often, public content comes not from people but from brands and bots.

Listening platforms are almost entirely geared toward text-based searching and analysis. A lot of sharing, particularly among younger generations, involves a high proportion of images and video. If the poster provides enough context in an associated description or tags, it’s sometimes possible to pull images from the stream; they can be useful for providing qualitative color to a study. However, until image search and analysis algorithms improve significantly, this type of content is largely a dead zone for social media research at scale.

When you do manage to land a good catch of healthy, tasty-looking fish/social content, how do you sort through it all? Most platforms are fine at counting how many times your brand has been mentioned – assuming you’ve got an easily distinguishable brand name like Heineken. Try doing that if you’re Next! But does counting how much you’ve been mentioned actually tell you very much? It can occasionally be useful for benchmarking purposes but in most cases it’s a vanity metric.

You probably want to know whether people are saying good or bad things about your brand or product. Most tools sport some form of sentiment analysis, which categorizes posts as positive, negative or neutral. The reality is that the error rates on such tools are tremendous, and not just because of the classic difficulty of detecting irony and sarcasm: frequently there simply isn’t enough context in what the person has said to distinguish sentiment, and even humans struggle to place content into such simple polarity categorizations.
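To see why the error rates are so high, consider a toy lexicon-based scorer of the kind many tools build on. The word lists here are invented for illustration; the point is how easily thin context defeats the approach:

```python
POSITIVE = {"great", "love", "excellent", "amazing"}
NEGATIVE = {"awful", "hate", "terrible", "broken"}

def polarity(text):
    """Count lexicon hits and return 'positive', 'negative' or 'neutral'.
    Note that a sarcastic post like 'great, another delay' scores
    positive here -- the lexicon has no access to the real intent."""
    words = set(text.lower().replace(",", " ").split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Production tools are far more sophisticated than this, but the underlying problem – sentiment that lives in context the post itself does not contain – remains.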

Some social-listening platforms claim to be able to go deeper than simple polarity categorizations through the application of machine-learning techniques. They purport to be able to classify posts into categories such as purchase intent. The trouble is that, in our experience, they only really work if you have neat, mutually exclusive categories, each with distinct language patterns. You will rarely find yourself in this enviable position.

Useful for research?

With all these challenges and limitations, is social media listening useful for research? Yes, but you need to be selective about the business questions you seek answers to and about where you go fishing.

Generally, the more specific the question the better, as it is usually much easier to craft a query that pulls a good signal out of the noise. Self-evidently, you need to be going after something that a decent number of people are talking about on social channels. For example, understanding initial reactions to a major new phone launch is likely to yield good results. Even in a domain like this, the challenge of filtering out brand and promotional messages should not be discounted – bots are designed to exploit anything that garners a lot of engagement to spread their own unrelated messages.

The art of successful social listening is identifying channels and Web properties that provide high-quality social content. Forums, for example, typically provide categorizations and tags that are useful for getting context and metadata on the user-generated content and the author. It’s critical to have an approach that is capable of gathering this context and making it available for analysis.

As you would expect, where people write longer-form content they express more complex and nuanced opinions, which can yield useful insight. The challenge is how to break such posts down into entities and concepts, to uncover what things people are talking about and what they are saying about each one. Natural language processing technologies are now capable of addressing this challenge at scale, enabling you to discover patterns and insights across large volumes of conversations.
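At its simplest, the entity-and-concept breakdown starts with spotting which tracked concepts a post mentions and which ones travel together. A toy sketch of that co-occurrence step (real NLP pipelines extract entities rather than matching a fixed list, which is assumed here purely for illustration):

```python
from collections import Counter
from itertools import combinations

def concept_cooccurrence(posts, concepts):
    """Count how often pairs of tracked concepts appear in the same
    post -- a very rough stand-in for full entity extraction."""
    pairs = Counter()
    for text in posts:
        lowered = text.lower()
        present = sorted(c for c in concepts if c in lowered)
        for a, b in combinations(present, 2):
            pairs[(a, b)] += 1
    return pairs
```

Even this crude count starts to reveal structure – for instance, that battery life and price are discussed together far more often than either is with the camera.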

At the risk of stating the obvious, any business process that has a social component – be it your own or a competitor’s – can usefully have social listening applied to it. Customer service, promotions and content marketing efforts are clear targets here. What are your customers complaining about? Which elements of competitor customer service are they raving about? Which of your competitors’ social media campaigns are driving engagement from your target segments? What types of content are customers like yours sharing? These are all good business questions to be asking of social-listening research.

Focusing on the authors of social content can sometimes be a much more interesting avenue of research. You don’t need to pull things out of the social fire hose based on a target topic. You can instead select your target audience – for example, a large group of your existing consumers – and analyze everything that your audience has publicly shared over the last year. This is a great technique for getting a deeper understanding of the digital lives of your consumers, how they break down into discrete communities and what influences them.
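The audience-first approach inverts the usual pipeline: instead of filtering posts by topic, you filter by author and then summarize whatever surfaces. A minimal sketch, using hashtags as a stand-in for the richer signals (shared links, followed accounts, communities) a real study would gather:

```python
from collections import Counter, defaultdict

def audience_profile(posts, audience):
    """Group a known audience's public posts by author and tally the
    hashtags each one uses -- a first cut at mapping digital lives."""
    tags_by_author = defaultdict(Counter)
    for post in posts:
        if post["author"] in audience:
            tags = [w for w in post["text"].split() if w.startswith("#")]
            tags_by_author[post["author"]].update(tags)
    return tags_by_author
```

Clustering those per-author profiles is then one route to discovering the discrete communities the audience breaks down into.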

In summary, if you’re going to use social media successfully for research, consider the following:

  • Frame the business questions as specifically as possible. If they are too broad you’re better off going back to the drawing board.
  • Identify and focus on the cleanest, most context-rich set of data sources where consumers are talking about your target topic of interest.
  • Be mindful of the filtering and skews that using social content inevitably introduces.
  • Figure out how you’re going to analyze the content for qualitative insight. Do you have a platform that enables you to explore the meaning of what people are saying?
  • Find ways to get more context around social content – follow links to understand what content is being shared and dive into author profiles and classify them.

REFERENCE
1 “Dark social: We have the whole history of the Web wrong.” The Atlantic. October 12, 2012.
www.theatlantic.com/technology/archive/2012/10/dark-social-we-have-the-whole-history-of-the-web-wrong/263523/