Editor’s note: Seth Grimes is founder of Washington, D.C.-based IT strategy consultancy Alta Plana Corporation. This is an edited version of a post that originally appeared here under the title, “12 criteria for choosing a text/social analytics provider.”

I included a set of tool and solution selection criteria in a recent conference presentation titled Text and Sentiment Analysis for Research and Insights. The criteria apply across a wide set of text and social media analytics applications. You have dozens of options – open source and commercial – ranging from Web service APIs and code libraries to business solutions that integrate analysis into a business workflow, so it is important to make a careful, informed choice. To that end, I recommend 12 criteria for choosing a provider.

But first, some preliminary advice: Work back from your business goals. Determine what sorts of indicators, insights and guidance you’ll need. No business is going to need 98.7 percent sentiment analysis accuracy in 48 languages across a dozen different business domains. Be reasonable; stay away from over-detailed requirements checklists that rate options based on capabilities you’ll never use. Create search criteria that separate the essentials from the nice-to-haves and leave off the don’t-needs. Then design an evaluation that suits your situation – include proof-of-concept prototyping, if possible – to confirm whether each short-list option can transform data relevant to your business into the outputs you need, with the performance characteristics and at a cost you expect.
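One way to keep essentials, nice-to-haves and don’t-needs apart is to screen candidates on the essentials first and only then rank the survivors by the extras. The following is a minimal sketch of that idea, not a prescribed method: the tool names, capability labels and the `screen` function are all hypothetical, chosen only for illustration.

```python
def screen(candidates, essentials, nice_to_haves):
    """Drop candidates missing any essential; rank the rest by nice-to-haves covered."""
    shortlist = []
    for name, capabilities in candidates.items():
        if essentials <= capabilities:  # every essential capability is present
            # Capabilities outside both sets (the don't-needs) earn no credit.
            score = len(nice_to_haves & capabilities)
            shortlist.append((name, score))
    # Most nice-to-haves first; ties broken alphabetically by name.
    return sorted(shortlist, key=lambda item: (-item[1], item[0]))


# Hypothetical requirements and candidates (assumptions for illustration only).
essentials = {"english", "entity_sentiment", "rest_api"}
nice_to_haves = {"spanish", "python_sdk", "topic_clustering"}
candidates = {
    "Tool A": {"english", "entity_sentiment", "rest_api", "python_sdk"},
    "Tool B": {"english", "spanish", "rest_api", "topic_clustering"},
    "Tool C": {"english", "entity_sentiment", "rest_api", "spanish",
               "python_sdk", "forty_more_languages"},
}

shortlist = screen(candidates, essentials, nice_to_haves)
print(shortlist)
```

In this made-up data, Tool B is dropped because it lacks entity-level sentiment, and Tool C’s extra languages do not raise its score. A short list produced this way is only a starting point for the proof-of-concept work described in the paragraph above.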

That advice out of the way, here are 12 criteria for choosing a text/social analytics provider:

  1. Industry and business function adaptation. We seek solutions built around the frame semantics notion that words may have different senses in different domains and from different points of view. Thin is a good example: Thin is good for a mobile phone, while in a hotel, thin walls mean a noisy room and thin, describing sheets, is associated with worn rather than warm. Responsive also means very different things in e-discovery vs. customer service.
  2. Customization, whether by you or only by the provider, to ensure that your analyses are true to life. Analytics tools need training to understand brands, products, packaging and attributes. Domain adaptation may not be enough if your company, customers and prospects use distinctive language around products and features. For example, Coke Life and Pepsi True are two products. If a tool you’re evaluating hasn’t been adapted for soft drinks, it may miss or mis-classify social mentions of life and true, which after all are common words. Can you build or modify the rules, taxonomies, training sets and other artifacts that drive marketing analyses in that category, to capture the way people talk about these brands? If not, you’ll be missing insights.
  3. Data source suitability. The best algorithms for extracting information from 140-character tweets; long-form Yelp or TripAdvisor reviews; FlyerTalk discussion threads; and e-mail or chat exchanges will differ. Does the tool you’re considering handle a data source that’s important to you?
  4. Languages supported. Some tools handle only a single language, typically English. Others claim to handle dozens but you may find that some languages are handled much more carefully than others. Your provider may translate material from less-common languages into English for processing, in which case idiom and culture-related nuance may be lost. And even when non-English material is handled natively, it may be handled with less refinement. Ensure that the languages you need are supported adequately for your needs.
  5. Analysis functions provided. I’m thinking here both about information extraction – for instance, lots of software will resolve entities or topics but not necessarily both, and they’ll score sentiment but not necessarily at a topic, entity or attribute level – and about analytical functions such as clustering, regression (for trending) or link/path analysis.
  6. Interfaces, outputs and usability. A graphic interface is … nice but does the candidate tool’s GUI match your work practices? If you need to automate a frequently repeated process, are there scripting possibilities? Or if your developers will be plugging an external Web service into your own software via an application programming interface (API), is there a software development kit (SDK) that fits your coding tools (e.g., Python, Java, C++, Ruby) or are you forced into a generic RESTful interface?
  7. Accuracy: precision, recall, relevance and results granularity. Even if there are no measurement standards, you still need good-enough accuracy, not some unobtainable absolute. We’re dealing with human language data. If you can read the inputs (or a colleague can if they’re in a language you don’t read), you can assess a candidate tool’s accuracy for yourself.
  8. High performance. You need speed, throughput and reliability.
  9. Provider track record, market position and financial condition. It is best if you can find a provider with experience solving your types of problems, working with the sorts of data that matter to you. That’s obvious. I’ll bring up two more-specific points: First, while there is no dominant text or sentiment analysis provider, there are well-established players, but some of them are struggling. One reason is that their platforms are not aging well – rule-based NLP is particularly expensive to maintain – and another is that expansion plans, fueled by venture funding, have proven to be over-ambitious. One leading customer experience text analytics provider with both issues has burned its partner network and lacks capacity to implement newly sold projects. That company’s former chief rival went through the same experience just a few years back. Second, there’s a constant stream of emerging tech providers, particularly given machine learning’s power and promise, and at the other end of the range, several tech giants – IBM, SAS, HP – are in the space. Expect consolidation.
  10. Provider’s alliances and tool and data integration. This criterion is a bit tricky, and in most cases, I’d prioritize it lowest of the twelve listed. The reasoning here is that many of your projects will apply data from more than one, if not several, sources. We talk about omnichannel marketing and about the customer journey, which involves multiple touch-points, each of which may generate data. But a given organization may use different research and software providers for surveys, social listening, device-recorded data, customer relationship management (CRM), sales and loyalty programs, for instance. Many solution providers recognize that their clients have multiple vendors and may form alliances with their rivals, in a spirit of cooperation, in order to better serve shared clients. And even if there’s no alliance, they can take steps to facilitate integrations, whether via specialized connectors or data import/export using common interchange formats.
  11. Cost: price, licensing terms and TCO. Text and sentiment analysis provider pricing models vary widely. Shop around. But also, going back to criterion number two, consider total cost of ownership. You may find that to get the result accuracy you require, you will need to customize or extend the baseline lexicons, taxonomies, rule sets and search expressions provided by a candidate supplier. That means professional services costs, or training and staffing expenses if you do the work in-house.
  12. Proof of concept. Try before you buy, using your data, in a proof-of-concept prototype that produces output samples sufficient to demonstrate whether a candidate tool or solution can deliver the insights you need. Again, we’re dealing with human-language data. It should be clear whether accuracy and performance meet your business-goal-driven needs.
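The accuracy check in criterion 7 can be run during a proof of concept with very little machinery. Below is a minimal sketch, assuming you have hand-labeled a small sample of your own documents and collected the candidate tool’s sentiment labels for the same documents. The label names, the sample data and the `precision_recall` helper are hypothetical; they are not any provider’s API.

```python
def precision_recall(gold, predicted, label):
    """Precision and recall for one label, from paired human and tool labels."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must be the same length")
    pairs = list(zip(gold, predicted))
    # True positives: tool and human both assigned the label.
    tp = sum(1 for g, p in pairs if g == label and p == label)
    # False positives: tool assigned the label, human did not.
    fp = sum(1 for g, p in pairs if g != label and p == label)
    # False negatives: human assigned the label, tool missed it.
    fn = sum(1 for g, p in pairs if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical six-document sample: human judgments vs. the tool's output.
gold = ["pos", "neg", "neg", "pos", "neu", "neg"]
predicted = ["pos", "neg", "pos", "pos", "neu", "neu"]

for label in ("pos", "neg", "neu"):
    p, r = precision_recall(gold, predicted, label)
    print(f"{label}: precision={p:.2f} recall={r:.2f}")
```

In this made-up sample the tool is always right when it says “neg” but finds only one of the three negative documents, which is the kind of gap a single overall accuracy figure can hide. A real evaluation would need a much larger sample drawn from the data sources that matter to you.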

 

These criteria are offered as a guide. Each situation is unique and each organization has its own priorities and must-haves. So there are no cookie-cutter evaluations in the selection of a text or social analysis provider, or in other vendor and tool selection processes. Just remember the first principle: Work back from your business goals. Keep outcomes in mind and design an evaluation that suits your situation and you will choose well.