Trying to make some sense of it all

Editor’s note: Eric Weight is director of text analytics products at Allegiance Inc., a South Jordan, Utah, provider of voice-of-the-customer and enterprise feedback management solutions.

The advent of data warehouses gave businesses the power to collect, store and analyze information from multiple corporate systems in a single, high-performance environment. However, business managers were limited to analyzing only structured data. Structured data consists of fixed-choice answers and numeric values arranged in rows and columns. These data are easily stored, categorized, queried, reported and rolled up by a database. Text analytics opens the floodgates to new insights by allowing companies to analyze unstructured, free-form data in the same way structured data has been analyzed in enterprise data warehouses.

Systems that interact with customers are inherently filled with a large amount of unstructured, free-form text. For example, notes entered from a call center, open-ended responses on a customer survey and comments posted on the Internet all are defined as unstructured text. Text analytics, also known as text mining, is a technology that turns that unstructured information into structured information so that it can be properly analyzed by business intelligence systems.

One term you may have heard is ETL, or extract, transform and load. This is a technology used in data warehousing that extracts information from various operational systems and transforms it into a standardized format, then loads it into large, centralized databases. Text analytics is often referred to as “ETL for text.” Text analytics systems are built to collect free-form text from operational systems and structure it, transform it and then load it into the data warehouse in a format that is easily useable by analysts operating traditional business intelligence systems.
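The "ETL for text" idea can be sketched in a few lines. The transform rules below (keyword-to-category mappings) are purely hypothetical illustrations, not any product's actual logic; the point is simply that each free-form comment becomes a structured record a warehouse could hold.

```python
# Minimal "ETL for text" sketch: extract free-form comments, transform each
# into a structured record, then "load" the rows into a warehouse table.
# The category rules are hypothetical illustrations only.

def transform(comment: str) -> dict:
    """Turn one free-form comment into a structured record."""
    text = comment.strip().lower()
    category = "other"
    if "refund" in text or "charge" in text:
        category = "billing"
    elif "broke" in text or "defect" in text:
        category = "product"
    return {"raw_text": comment, "category": category, "length": len(comment)}

comments = [
    "The handle broke after two days.",
    "Still waiting on my refund!",
]
rows = [transform(c) for c in comments]  # the load step would write these to a table
print(rows[0]["category"], rows[1]["category"])  # product billing
```

Once the text is in this row-and-column form, any standard business intelligence tool can query and roll it up alongside existing structured data.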

There are many approaches and techniques used to turn text into structured information. Each approach has varying levels of accuracy and utility. In this article, we will explore those techniques and how they can be used in combination to uncover hidden insights stored within the text.

Missing or ignoring

Currently, there is an explosion of free-form text information being generated by consumers. Studies show that as much as 80 percent of the information created in a corporation is free-form or text in nature. At the same time, computer technology cannot accurately process and understand language in its traditional form because computers are made to simply match patterns, compare and sort. Therefore, companies are missing or ignoring a large percentage of the valuable information that could be helpful to their business.

Since the rise of the Internet, paying attention to this type of information has become even more important. Consumers now generate an incredible amount of online content by posting comments that are publicly available to everyone. Most compelling, much of this information is not being said to the companies themselves but to the world at large.

Companies have numerous internal systems such as call centers, e-mail and automated feedback systems to gather and manage customer information. However, public Internet comments are posted for all to see, providing low-cost access to relevant customer thoughts and feelings about a company and its competitors. Businesses and their competitors can use this information to do competitive research, understand general market trends and pinpoint emerging problems early on in the product development life cycle. However, due to the free-form nature and sheer volume of this information, it is an expensive and cumbersome process to gather and understand unstructured data.

Transaction or feedback surveys typically contain one or more verbatim questions such as, “How can we improve?” or “Please describe the problem you had.” Responses to these are typically very helpful individually. But what if you had a few thousand? How would you summarize them?

For these reasons, businesses are turning to text analytics systems and technologies to automatically process and analyze text in all its forms and transform it to be utilized in identifying trends, early warning signs, product issues, suggestions for improvement and cries for help from customers.

Answering the “why”

Traditional business intelligence systems that analyze structured data are very good for statistically reporting the current state of customers and markets - sales are up or sales are down; customers are satisfied or customers are unsatisfied; this region seems to be performing better than that region, etc. Although these are important facts to understand, the key insights that are missing are why those things are happening now. Answering the “why” behind the data is typically not possible, even with investments in interpolation, modeling and statistical analysis on traditional structured data.

However, when you combine structured data with unstructured data, such as free-form replies to open-ended survey questions or comments on the Internet, you add another layer of depth that can give you a complete picture. For example, you can see what customers are saying about a poorly-performing product; you can see why customers in a specific region, for a specific type of product and a specific time period, are unhappy; and you can see which key issues drove low satisfaction. Text analytics can help answer these questions.

Well-designed surveys will typically ask customers to rate products or services, then ask "Why did you give us that rating?" or "Why were you dissatisfied with our service?" The answers to those questions provide powerful insights. However, until recently these answers have been difficult to analyze. Businesses have traditionally relied on verbatim coding systems where vendors or analysts manually review a random sample of a few hundred responses and then create codes to categorize them into common issues.

Although manually reviewing a sample of responses provides some level of accuracy, there are some inherent flaws in that process. First and foremost is that you are not looking at all of the data. If you have thousands or hundreds of thousands of responses, you are only able to cost-effectively analyze a small fraction of the available information. The second flaw is human bias. Whenever humans are making decisions about the data, there is always a tendency for people to respond and categorize based on the way they are feeling that day. Eye strain and fatigue also play a role in delivering inconsistent results. One day an analyst may categorize a particular issue as a customer service problem, the next day or week they may think it is more of a product problem.

In addition, customers may have complex issues that are not easily categorized with traditional coding schemes. In this case, you may need multiple interdependent codes, but that can make it even more difficult for human analysts to be consistent. All of these challenges to analyzing free-form, open-ended comments in surveys are prevalent today. Text analytics delivers the capability to automatically process and analyze large volumes of free-form text with consistency and accuracy.

Variety of methods

A variety of methods have been developed for performing text analytics. These include:

Keyword or statistical analysis: The most traditional method is keyword analysis, which uses a type of pattern recognition. A Google search is a good example of this. When performing a search, you provide some search terms to a query program. The query program searches for those specific terms in the data warehouse and returns the hits or documents that contain at least one mention of the target terms. More advanced forms of keyword analysis provide the ability to search for terms that occur together as a specific phrase or for words that occur within a specified number of words of the target terms. Although this type of text analysis is very efficient and fast, it is not capable of discerning the roles, meanings and structure of words. Therefore, if you are searching for the word "suit," for instance, you will get results that include somebody being sued for medical malpractice and a clothing sale at the local department store.
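The "suit" ambiguity is easy to demonstrate with a naive keyword matcher. The documents below are invented examples; the point is that a pure pattern match returns both the lawsuit and the clothing sale, because it has no notion of word meaning.

```python
import re

# Toy keyword search: plain pattern matching with no understanding of meaning.
documents = [
    "The hospital was hit with a malpractice suit last week.",
    "Every suit in the store is 40 percent off this weekend.",
    "Our flight was delayed by three hours.",
]

def keyword_search(docs, term):
    # Match the term as a whole word, case-insensitively.
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [d for d in docs if pattern.search(d)]

hits = keyword_search(documents, "suit")
print(len(hits))  # 2 -- both the lawsuit and the clothing sale match
```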

Natural language processing: To overcome the inherent flaws of keyword searching and analysis, providers of text analytics have developed more sophisticated natural language processing (NLP)-based technologies. These systems have been around for some time but have had varying levels of success. Natural language processing means training a computer to understand language the way humans understand it. This requires understanding basic grammar rules and word forms such as verbs, nouns, adjectives and prepositional phrases. Once the system understands the basic structure of language, it can use that information to derive the true meaning of words and phrases. Some of the commercially-available text analytics techniques that employ NLP are named entity recognition, targeted event extraction and exhaustive fact extraction. Each of these is explained below.

Named entity recognition (NER): NER is the process of identifying and extracting classes of entities - companies, products, people, organizations, locations, dates, etc. - stored in free-form text. This technique requires that the analyst know in advance the specific entities to be extracted and then assign them to predetermined groups or classes. The resulting extractions can then be stored in a database and used to understand the frequency of mentions or broad topics discussed in the targeted source. This type of analysis is superior to keyword approaches since it is capable of using NLP to distinguish between nouns and verbs and only extract the appropriate mentions of the targeted terms.
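The core idea of predetermined entity classes can be sketched with a toy gazetteer lookup. Commercial NER systems use NLP and statistical models rather than a fixed list; the entity dictionary below is purely illustrative.

```python
# Toy gazetteer-based entity tagger: each known entity is assigned to a
# predetermined class in advance. Real NER uses NLP models; the list here
# is an illustrative assumption, not any product's actual dictionary.
ENTITY_CLASSES = {
    "American Airlines": "COMPANY",
    "South Jordan": "LOCATION",
    "Allegiance": "COMPANY",
}

def extract_entities(text):
    found = []
    for name, cls in ENTITY_CLASSES.items():
        if name in text:
            found.append((name, cls))
    return found

print(extract_entities("I flew American Airlines out of South Jordan."))
# [('American Airlines', 'COMPANY'), ('South Jordan', 'LOCATION')]
```

The extracted pairs can then be loaded into a database table and counted to report the frequency of mentions per entity.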

Targeted event extraction: Event extraction is a technical term that defines a process of creating complex rules to locate and classify data based on targeted terms that are often referred to as triggers. After locating a trigger word, the rules define common attributes that occur in relation to that term. Using the suit example above, an analyst would create rules such as looking for the trigger term “sue” and then identify the plaintiff, defendant, jurisdiction and date for all lawsuits mentioned in the targeted documents.
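A trigger-based rule of this kind can be sketched with a pattern around the trigger term. The regular expression below is a deliberately crude illustration (a real system would use full linguistic parsing); it treats "sued" as the trigger and captures the capitalized names on either side as plaintiff and defendant.

```python
import re

# Sketch of a trigger-based event rule: the trigger is the verb "sued";
# the rule then captures the capitalized names around it as the plaintiff
# and defendant. A real system would use full parsing; this is illustrative.
SUE_PATTERN = re.compile(
    r"(?P<plaintiff>[A-Z]\w*(?: [A-Z]\w*)*) sued (?P<defendant>[A-Z]\w*(?: [A-Z]\w*)*)"
)

def extract_lawsuits(text):
    return [m.groupdict() for m in SUE_PATTERN.finditer(text)]

events = extract_lawsuits("Smith sued Acme Corp over a faulty heater.")
print(events[0]["plaintiff"], "->", events[0]["defendant"])  # Smith -> Acme Corp
```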

Exhaustive fact extraction: This new method of text analytics patented by Attensity uses linguistic heuristics and patterns to discern the key facts and concepts contained in the source text. These patterns can then be universally applied to the entire corpus of text data, allowing the system to generate an exhaustive database of all available facts in one structured database. The analyst is then able to utilize traditional database queries to report on the most frequently occurring topics expressed in the text. The advantage of this approach is that the analyst is not forced to determine the problems, issues or topics to be analyzed prior to executing the fact extraction process. This means that emerging issues and new insights can be discovered in a timely and efficient manner.
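The general shape of fact extraction - every sentence becomes a queryable subject-verb-object row, with no topics chosen up front - can be illustrated with a very naive split on known verbs. This sketch is not Attensity's patented method; real systems use full linguistic parsing, and the verb list here is an invented assumption.

```python
from collections import Counter

# Naive subject-verb-object fact extraction. This is NOT the patented
# approach described in the article -- only an illustration of turning
# every sentence into a structured, queryable fact row.
VERBS = {"broke", "crashed", "overheats", "leaked"}

def extract_facts(sentences):
    facts = []
    for s in sentences:
        words = s.rstrip(".").split()
        for i, w in enumerate(words):
            if w in VERBS:
                facts.append({"subject": " ".join(words[:i]),
                              "verb": w,
                              "object": " ".join(words[i + 1:])})
    return facts

facts = extract_facts([
    "The handle broke after two days.",
    "The screen broke during shipping.",
    "The battery overheats on long calls.",
])
# With all facts in one table, a simple query surfaces the top topics.
print(Counter(f["verb"] for f in facts).most_common(1))  # [('broke', 2)]
```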

Regardless of the technologies used to understand text, the analyst will need to consider many additional factors based on how the comments are generated and stored. Consider the different ways text is generated on a social media site. One is Twitter, where users are constrained to 140 characters and hence rely on acronyms, codes, hashtags and cryptic language. Another is a customer product review in which customers write a narrative description of their experience with a specific product. Still another is a customer survey with a directed question asking for a directed response.

These conditions present great challenges for text analytics. Although NLP technology will be required to provide accurate results, the best text analytics systems will utilize a variety of approaches adapted to the type and purpose of the source information.

Actionable insights

One of the most common things that can be learned using text analytics is when a customer expresses some sort of positive or negative emotion in conjunction with a company or brand interaction. Considering all of the things that could be expressed by customers in a free-form comment, general sentiment is relatively easy to discern since the way that people describe being upset or unhappy is universal.
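Because the vocabulary of satisfaction and frustration is fairly universal, even a simple lexicon scorer captures general sentiment surprisingly often. The word lists below are illustrative assumptions, not drawn from any commercial product.

```python
# Minimal lexicon-based sentiment scorer. The word lists are illustrative
# assumptions only; commercial systems use far richer lexicons and NLP.
POSITIVE = {"great", "love", "excellent", "happy", "fast"}
NEGATIVE = {"terrible", "hate", "slow", "broken", "upset"}

def sentiment(comment: str) -> str:
    words = set(comment.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("I love how fast shipping was"))  # positive
print(sentiment("The app is slow and broken"))    # negative
```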

However, the most valuable insights gained from customer comments are those that are called actionable insights. Actionable insights point to a condition or state within a customer's experience that the company could have an immediate impact on, such as a product defect. Another example is an issue with an operational procedure or policy that frustrates customers, or perhaps a poor interaction with a customer service agent regarding a refund. Unlike expressions of general sentiment, these insights point to concrete actions a company can take to keep customers from leaving or to directly increase loyalty and satisfaction.

The optimal VOC solution will be able to process free-form text to understand sentiment and identify actionable insights using natural language processing and analysis. It will help companies understand the meaning of the comments and suggestions coming from customers so that they can effectively act on them. Many systems are available today that provide a dashboard for analysis and reporting of structured data from call centers and customer surveys. For example, when reporting the results of customer surveys, the dashboard can show each one of the questions and how they were answered.

A helpful VOC solution will show the same type of analysis and reports for the narrative replies to open-ended questions. In typical customer feedback dashboards, business managers click a button to see a long list of free-form comments, with no added analysis. Or they may have a method of categorizing these using manually applied codes. The best approach is to be able to treat that text data just like structured data. This allows you to automatically process and analyze it and instantly see the top 10 issues, suggestions or reasons customers left the company, for example. This is gleaned from reviewing every single comment that was provided, not just a sample.
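Once every comment has been automatically coded, the "top 10 issues" report is just a group-by over the coded table. The comments and issue categories below are invented examples of what such auto-coded records might look like.

```python
from collections import Counter

# After auto-coding, text behaves like structured data: a simple
# group-by-and-count yields the top issues across ALL comments, not a sample.
# The comments and issue labels here are hypothetical examples.
coded_comments = [
    {"text": "Waited 40 minutes on hold", "issue": "long hold times"},
    {"text": "Agent was rude",            "issue": "agent behavior"},
    {"text": "On hold forever",           "issue": "long hold times"},
    {"text": "Never got my refund",       "issue": "refund delays"},
]

top_issues = Counter(c["issue"] for c in coded_comments).most_common(3)
for issue, count in top_issues:
    print(f"{issue}: {count}")
# long hold times comes out on top with 2 mentions
```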

So much text out there

In applying text analytics to gathering customer feedback from social media, many new challenges must be considered in addition to the accuracy of the analysis. The number one challenge is that there is so much text out there, yet only a fraction of it is actually relevant to your business. Even if you use traditional keyword filtering, it is still going to yield inconsistent and inaccurate results. For example, if you were evaluating comments about American Airlines, you would find some people who say, “I flew on American Airlines,” while others say, “I flew on American.” Think of the number of matches you will find if you just use the term “American”!
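One simple form of smart filtering is to keep an ambiguous brand mention only when disambiguating context words appear in the same post. The context word list below is an illustrative assumption; production filters would be far more sophisticated and often NLP-based.

```python
# Hedged sketch of "smart filtering": keep a mention of the ambiguous term
# "American" only when airline-related context words appear in the post.
# The context list is an illustrative assumption, not a real product's filter.
CONTEXT = {"flight", "flew", "airline", "airlines", "gate", "boarding"}

def is_relevant(post: str) -> bool:
    words = set(post.lower().split())
    return "american" in words and bool(words & CONTEXT)

posts = [
    "I flew on American yesterday and the crew was great",
    "American cheese makes the best grilled sandwich",
]
print([is_relevant(p) for p in posts])  # [True, False]
```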

To manage this challenge, VOC programs using social media need to be able to apply smart filtering techniques and select only the relevant information from the mountain of available data. Text analytics technology based on NLP can also be utilized in the development of these smart filters, but due to the relatively new emergence of social media very few are commercially available. With the popularity of social media, many text analytics and customer feedback technology providers are rapidly developing systems to overcome this challenge.

New realm

Text analytics is opening up a new realm of analysis for VOC practitioners. The best VOC solutions incorporate structured and unstructured customer feedback from multiple channels into a centralized system for analysis and action. In effect, this allows companies to do primary customer research through surveys and secondary research using NLP-based text analytics technology, then integrate both into one feedback platform. The power of text analytics will allow companies to quickly and accurately identify actionable issues and then adapt in real time by taking immediate steps that will boost customer retention, differentiate their business and quickly grow revenue.