Filter out the problems

Editor's note: Martin Pacino is senior director of research insights at The MSR Group, an Omaha, Neb., research firm.  

As it has the very fabric of the human experience, the Internet age has profoundly changed the practice of market research. In just the last two decades, the advent of online data collection has ushered in a rate of evolution in the science not seen since the telephone became a household object. Beyond reshaping the market research industry, online research has spawned dozens of new ones – DIY tools, analytic packages, programming software, e-mail deliverability, etc. – and none more ubiquitous than that of panel sample providers. 

For anyone unfamiliar with the panel industry, hundreds of companies have sprung up almost overnight, inviting the global populace to sign up to take online surveys – and tens of millions of people have answered the call. What’s more, since there is no reliable source for procuring or randomly generating e-mail addresses in bulk, it is among these people alone that online research is conducted. 

With panel sample comes great opportunity but also great risk to data quality. Here is an examination of how panel companies function, how that function lends itself to potential trouble and exactly what you, as a purveyor of market research data, can do to protect its quality. 

Gather constant information 

In the process of enrolling and surveying their members, panel companies gather constant information about them. This information, called profiling data, is used to determine which respondents best fit a given survey. Procedurally, a full-service market research company enlists a panel provider to do a survey among a given population. The panel company uses its member-profiling data to assess how many surveys it can complete among the target population (feasibility) and at what cost per interview (CPI). Both sides shake hands. Someone programs the survey online, provides the panel company with the link to which it can direct its members and voila! We have a survey in-field, gathering data. 

There are two primary ways in which a panel company can drive respondents (traffic) to a given survey. 

Direct sampling: This is how most people envision panel companies working and for many, it is the primary way in which they do. Here, the panel company directly invites its panelists via e-mail or gives them access to a portal listing several surveys in need of respondents. These panelists are subject to an opt-in process and are generally considered to be the best-quality sample that a panel company can provide. They have been vetted and are subject to periodic validity checks.

River sampling: These respondents are recruited in real time from an ad or post on the Internet to take a survey. They are generally subject to some measure of identity verification, though less stringent than that applied to an actual panelist. Thus, they are generally less reliable respondents. In most cases, upon completing a survey, they will be invited to join the panel for which they have just completed it. This is one of the primary ways in which panel companies grow their membership.

A third avenue for driving traffic is the ongoing or temporary partnership that a panel company will form with another to meet a survey objective it otherwise could not – acquiring sample in a country where it has no resources or bolstering completes to reach a target sample size, for example.

Inherent foundational problem 

This process of a market research company engaging a panel provider to sample its survey sounds perfectly reasonable; however, it carries an inherent foundational problem. To attract people to join their panels, and to keep them taking surveys, panel companies have to provide an incentive, usually a monetary one. That might not seem problematic but here is where things get precarious.

The profiling data that panel companies are collecting on their members isn’t just used to match them with appropriate surveys, it is also used to determine how common they are within the general population (their “incidence”). The less common they are, the more they need to be compensated to take a survey. If a survey requires 500 females 18+ years of age, the panel company will have many people who meet those criteria. It will only need a small percentage of its members who fit that profile to participate and still complete 500 surveys. They are a “high incidence” population, therefore the incentive required to complete the study is relatively low. 

Let’s assume that the CPI for that population is $X. If that same study required 500 females 18+ years old who fly commercial aircraft for a living, their incidence is much, much lower and therefore the incentive needed is higher (in this case significantly) to maximize the percentage of people who will take it. Where a sample of 500 females 18+ is $X per complete, the cost of females 18+ who fly commercial airplanes for a living could be $30X, and that cost is passed along to the market research company commissioning the study. 
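The arithmetic behind this pricing can be sketched in a few lines. This is an illustrative back-of-the-envelope calculation, not a real panel company's pricing model; the incidence and response-rate figures are invented for the example.

```python
# Hypothetical feasibility arithmetic, illustrating why low incidence
# drives up cost per interview (CPI). All figures are invented.

def invites_needed(target_completes, incidence, response_rate):
    """Estimate how many panelists must be invited to hit a quota."""
    return round(target_completes / (incidence * response_rate))

# Broad population: 500 completes among females 18+ (say 40% of a panel),
# assuming 1 in 5 invitees responds.
broad = invites_needed(500, incidence=0.40, response_rate=0.20)

# Niche population: female commercial pilots (say 1 in 10,000 panelists).
niche = invites_needed(500, incidence=0.0001, response_rate=0.20)

print(broad)   # 6250 invitations
print(niche)   # 25000000 invitations -- hence the far higher CPI
```

The niche quota requires thousands of times more invitations (or far higher incentives to lift the response rate), which is exactly the cost multiple passed along to the commissioning researcher.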

Were the world populated by robots programmed to only tell the truth, this entire approach would work fine. Everyone would get accurate data at a price commensurate with the population they are trying to survey. However, panel members are actual human beings, often aware that low incidence means high incentive, and the result is a sampling structure with a baked-in motivation for the respondent population to lie about who they are. What’s more, this process also creates a motive for panel companies to not expose them. After all, the more people they can muster from low-incidence populations, the more they will be paid to sample such surveys.

Envision an average Joe who works as a bartender or realtor or any number of jobs held by millions of others, globally. Joe doesn’t have a particularly unique hobby, he’s not wealthy, he doesn’t have any commercial decision-making authority and nothing really distinguishes him from the masses as a survey target. However, Joe knows he can get paid to take surveys online and he understands that the more unique he can appear, the more money he can make. When joining a panel, perhaps Joe eschews bartending and says he’s a surgeon. Rather than report all of his traits accurately, he makes himself a low-incidence target on every possible dimension – an uncommon ethnicity, a luxury-car owner, the sole decision maker for medical supply procurement and so on. Where real Joe would have commanded maybe $1-$2 per survey, fake Joe is now at an incidence within the population that could earn him upwards of $200 for his opinions.

Let’s assume Joe creates 20 profiles, all with different but similarly low-incidence traits. Joe then discovers Internet bots and goes from being 20 people to 2,000. Suddenly the sample population for this panel is so full of Joes that the odds of its data reflecting actual market reality are very long. Add to that the number of different people who do the same thing Joe does and that data gets precipitously more suspect. 

Joe is what is commonly known as a professional survey-taker. Indeed, that is now a job title in our lexicon, and these people are gumming up the works for any organization that relies on survey data to make decisions – decisions that can extend beyond business needs. What if the organization sponsoring the survey is, for example, the Leukemia and Lymphoma Society and it needs to know how doctors are prescribing medications to best treat gravely ill children? Can we trust Joe’s already broken moral compass to resist the money he can make by offering his completely unqualified opinion for such an endeavor?

In fairness to panel companies, many are dedicating untold resources to not only eradicate professional survey-takers and cheaters from their panels but also to ensure they are never able to join in the first place. Approaches like double opt-in enrollment and constant re-profiling/auditing of their panelists can be highly effective at cleaning up data. In fact, many panel companies have (wisely) made their approach to preserving data quality a central tenet of their brand promise and competitive position. There remains, however, the persistent problem that while Joe may be bad for data, he can be good for business and he is aided by evolving technology that makes him more evasive and sophisticated in his methods. 

While this veritable arms race escalates between panel providers and those who would defraud them, many market research companies and their clients are left with little recourse but to hope that the partner they’ve selected to collect their data does everything it can to ensure the data is clean and accurate. 

Well, powerless no longer.

Simple and effective steps 

Here are some simple and effective steps that any market research professional can take to ensure the panel data they are using is as accurate as it possibly can be.

Trap questions. While you might not have much control over who takes your online survey, you have complete control over the questions they are asked. A trap question is just what it sounds like – one that is designed to catch a cheater.

Remember when Joe employed a bot to augment his ability to cheat? Bots can be a serious problem in the sample world. While the term may conjure an image of a scowling, 36-armed robot pounding at dozens of keyboards, they are actually just software programmed to be able to take surveys, many surveys, and quickly. 

Bots rely on a principle that has become a treasured axiom among researchers: “We are only seeking your opinions. There are no wrong answers to these questions.” Although that encourages honesty, it also allows a non-sentient respondent to answer survey questions in any fashion and not break any rules. If there is no objective truth then any response to a given question is acceptable.

So, the first principle for a trap question is that it actually have a wrong answer. Here is an example of a common trap question. 

Which of the following is a color?

  1. House
  2. Ball
  3. Red
  4. Movie
  5. Bottle
  6. None of these

That may seem simple enough to answer correctly but try doing it without a functioning brain. The typical bot will not know the correct response and will have a one-in-six chance of getting it right. Program your survey to terminate anyone who gets this wrong and you will catch most/all bots. 

This question will also speak to the quality of the panel company with whom you are partnering. No real respondents should be terminated at this question. If you find that upwards of 20-30 percent of people who start the survey are being terminated here, you need to have a word with your sample partner. 
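The termination logic and the quality check on your sample partner can both be sketched in a few lines. This is a simplified illustration; the field names, correct answer and 20-30 percent threshold are assumptions drawn from the example above, not part of any survey platform's API.

```python
# Sketch of screening on a trap question ("Which of the following is a
# color?") and monitoring the termination rate across all starts.

TRAP_CORRECT = "Red"

def screen(responses):
    """Split respondents on the trap item; return keepers and the termination rate."""
    keep = [r for r in responses if r["trap"] == TRAP_CORRECT]
    term_rate = (len(responses) - len(keep)) / len(responses)
    return keep, term_rate

responses = [
    {"id": 1, "trap": "Red"},
    {"id": 2, "trap": "Ball"},   # a bot guessing at random
    {"id": 3, "trap": "Red"},
    {"id": 4, "trap": "Red"},
]
keep, rate = screen(responses)
print(len(keep), rate)  # 3 0.25 -- a 25% failure rate warrants a word with your sample partner
```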

While some cheaters play the numbers game and unleash bots on every survey they can, others actually take surveys themselves when the incentive is high enough. For them, trap questions are going to need a bit more sophistication. Anyone can identify a color, however there is always content that your target sample population will find elementary but will elude a cheater.

For example, IT decision makers (ITDMs) are a high-demand survey population. As technology evolves in the workplace, those who sell it are constantly surveying their market. Many professional survey-takers know that ITDMs are highly incentivized respondents and also get a steady flow of survey invitations. 

In a survey of ITDMs, here is a good trap question:

What does SaaS stand for?

  1. Service as a Sale
  2. Software as a Service
  3. Solutions as a Service
  4. Service and all Software

There is not an ITDM on Earth who doesn’t know that the correct answer to that question is #2 but the same can’t be said for all cheaters. And while that question alone might not catch all cheaters, adding more of these types of trap questions that are obvious only to those qualified to take your survey will weed out just about all of those who are not. 

IP lock security. One of the tools cheaters employ, either via bots or by their own hand, is to make themselves appear to be more than one person on a panel. This allows them to take the same survey multiple times and repeatedly collect its incentive. However, many do not bother using more than one computer to take surveys. Your IP address is tied directly to your ISP connection and is a unique identifier for the connection itself and not the user. Virtually every survey programming software tool comes with a security function that will lock out an IP address after it has taken the survey once. Make sure you understand that and are using it. If you are not programming your own survey, make sure whoever is programming is using it.

Test your survey repeatedly from the same IP address to see if you are allowed into it more than once. Have multiple parties ask to be able to test the survey from different locations but then use the provided test path from one machine repeatedly. If you are allowed into the survey from the same IP address, make sure that its host has IP-level security. If your programmer does not have IP-level security, strongly consider working with one who does. Even honest respondents can be members of multiple panels and if two of them form a partnership on your project, the same respondent could be unwittingly invited to take it by each. IP security will prevent them from being able to complete it.
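The lockout behavior you are testing for can be sketched as a simple first-come-first-served check. Real survey platforms implement this server-side as part of their security settings; this minimal sketch (with made-up addresses) only illustrates the logic you should verify is in place. Note one caveat it also exposes: IP lockout keys on the connection, so honest respondents sharing an office or household connection could be locked out as well.

```python
# A minimal sketch of IP-level lockout: the first complete from an
# address is admitted, any repeat from the same address is rejected.

seen_ips = set()

def admit(ip_address):
    """Return True for a first-time IP, False for a repeat."""
    if ip_address in seen_ips:
        return False
    seen_ips.add(ip_address)
    return True

print(admit("203.0.113.7"))   # True  -- first attempt gets in
print(admit("203.0.113.7"))   # False -- second attempt is locked out
print(admit("198.51.100.2"))  # True  -- a different connection is fine
```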

Join your partners’ panels. There is perhaps no better way to gain visibility into the practices of a given panel company than to join it yourself. You will be exposed to countless insights about which you would otherwise have had to take them at their word. This will include, but by no means be limited to, the process by which you were recruited to join, the procedures for verifying the authenticity of your information and who you claim to be, the frequency with which you are asked to take surveys, the frequency and manner in which your identity is confirmed, the incentive being paid relative to what you are paying, etc. The insights to be gleaned from being a member of your providers’ panels are many. 

The most discovery-rich approach one can take to joining a partner’s panel is to do so in a way that does not alert them to your presence as a member. That can be achieved by Googling different ways to join panel companies and frequenting parts of the Internet where recruitment is common. If all else fails, you can and should just ask them to invite you. You likely won’t get to see as fully behind the curtain but at least they will know you’re looking.

Control speeding and straightlining. Remember, professional survey-takers have no interest in even trying to provide you accurate data. They maximize their profits by taking a survey as quickly as possible and moving on to the next one. The same is true for bots they might employ. A cheater will almost never take the time to actually read the content of the survey questions or their response choices. 

So, one way to easily spot cheaters is by noticing the outliers who complete the survey impossibly quickly. If you test your survey as a real respondent and it takes about 11 minutes to complete, anyone who completes it in under five minutes has probably not read a word of it. Take the time to establish where the line is between a fast reader and someone who is exceeding human capability. If anyone crosses that line, their data is worthless.
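A speeding check like this is trivial to run against your completes. The five-minute cutoff below follows the 11-minute example above; the respondent IDs and durations are invented, and in practice the cutoff should come from your own test timings.

```python
# Flag "speeders": completes faster than a cutoff derived from testing
# the survey yourself (here, 5 minutes against an ~11-minute survey).

SPEED_CUTOFF_SECONDS = 5 * 60

def flag_speeders(durations):
    """Return the ids of completes faster than the cutoff, in input order."""
    return [rid for rid, secs in durations.items() if secs < SPEED_CUTOFF_SECONDS]

# Completion times in seconds, keyed by respondent id.
durations = {"r1": 660, "r2": 140, "r3": 605, "r4": 95}
print(flag_speeders(durations))  # ['r2', 'r4']
```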

Almost all online surveys contain matrix or grid questions. This is a design that asks for your opinion on many different items on the same page, all in response to one overall question. 

Across the top of that page will be your scale of agreement, then the items you want rated will run down it. For example:

Please rate how much you agree or disagree with each of the following statements on a 1 to 5 scale.

  • Item 1
  • Item 2
  • Item 3
  • Item 4

The professional survey-taker will want to get through these grids as quickly as possible and therefore will often just give the same response to each one, all the way down the page. This is called straightlining. Alternately, they may mix up their responses at random with no regard to reading the item text.

There are two primary ways to combat this. 

If you find each matrix question in your survey has the same response to all items, that is a straightliner and their data is no good. 

In order to catch the more clever cheater who knows better than to straightline, simply make one of your vertical items a direct command, such as: “Please select the number 2 for this item.” 

Anyone just clicking through and not reading the survey will have a one-in-five shot at getting that right. Put another way, no one ever made it in baseball with a .200 batting average.
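Both grid checks can be applied in one pass over each respondent's answers. This is an illustrative sketch: the item keys are made up, and the planted command item and its required answer follow the "select the number 2" example above.

```python
# Two grid checks on a 1-5 agreement matrix: (1) identical answers on
# every item (straightlining) and (2) a planted command item that must
# be answered with 2.

ATTENTION_ITEM, ATTENTION_ANSWER = "item_3", 2

def grid_flags(grid):
    """Return (is_straightliner, failed_attention_check) for one respondent."""
    straightlined = len(set(grid.values())) == 1
    failed_check = grid[ATTENTION_ITEM] != ATTENTION_ANSWER
    return straightlined, failed_check

honest = {"item_1": 4, "item_2": 1, "item_3": 2, "item_4": 5}
lazy   = {"item_1": 3, "item_2": 3, "item_3": 3, "item_4": 3}

print(grid_flags(honest))  # (False, False)
print(grid_flags(lazy))    # (True, True) -- straightlined and missed the command
```

A random clicker who varies their answers would dodge the straightlining flag but still fail the command item four times out of five, which is why the two checks work best together.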

Watch the river. As we previously covered, one of the ways in which panel companies are sampling your project is to reach completely outside their dedicated panelists and use river traffic directly from the Internet. While a small percentage of river sample is generally fine, an excessive amount can cause problems. 

Most panel companies will be transparent as to how much river sample they are using or plan to use. That’s step one in controlling for too much river sample: Ask your partner. Generally speaking, 10 percent of your total sample is a good target limit for consumer traffic from river sources. However, for a higher-level B2B study, 0 percent is an appropriate target due to the lack of vetting and Internet sourcing.

There are many panel companies out there who only provide river traffic and have no actual panelists. They provide cheap, fast sample and this can make them an attractive partner to any panel company looking to maximize profits, which can create significant problems. 

For example, a drawback of river sample (apart from data quality) is that it cannot be recontacted. If you finish your study and realize you forgot to ask everyone a critical question, any river sample you have used is off into the ether. This creates an opportunity if you are suspicious that there is more river traffic on your study than was agreed upon. Tell your panel provider you may need to do a recontact and ask their expected completion rate. If they say 60 percent+ they are likely being honest. If they say 20-30 percent, or worse yet 0 percent, your sample is likely to be mostly from the river. 

It can be hard to identify river sample from actual panelists but one way to do so is to insist on different survey links for river and for panelists and, on the panel link, include a question that says, “Who invited you to take this survey?” Your panel provider should be able to tell you exactly what the answer to that question should be. They, after all, should have invited everyone on that link. If you get answers to that question that are all over the place, or say things like “ESPN.com,” you will know something isn’t right.
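Tallying the answers to that source question makes the problem easy to quantify. This is a sketch only: the panel name and the responses are invented, and the expected answer is whatever your provider tells you it should be.

```python
# Tally answers to "Who invited you to take this survey?" on the
# panel-only link and measure the share that didn't name the panel.

from collections import Counter

EXPECTED_SOURCE = "AcmePanel"  # hypothetical name your provider confirms

def suspicious_share(sources):
    """Fraction of respondents naming anything other than the panel."""
    counts = Counter(sources)
    off_list = sum(n for src, n in counts.items() if src != EXPECTED_SOURCE)
    return off_list / len(sources)

sources = ["AcmePanel", "AcmePanel", "ESPN.com", "AcmePanel", "a pop-up ad"]
print(suspicious_share(sources))  # 0.4 -- 40% of "panelists" came from somewhere else
```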

Not going back in the bottle 

The market research world was a simpler place before people were paid to be quantitative respondents but that genie is not going back in the bottle, nor should we want it to. Just like any industry, there is a wide range of quality among panel companies and your first step to ensuring client success is to identify the ones you know to do a good job managing data quality and engage them as much as possible. 

However, even the best ones can be subject to a particularly clever or well-equipped cheater. Only through having a full understanding of how the panel industry works, the risks inherent to its structure of sampling and the steps that you can take to actively mitigate them can you ensure that your clients are using the highest-quality data possible to make decisions. Your panel partners and clients can and will have differing objectives when it comes to gathering data and it is incumbent upon us as researchers to hold those of our clients paramount.