By Molly Strawn, Senior Marketing Strategist, InnovateMR

Throughout the market research and survey fielding process, it is essential to keep respondent data quality in mind from start to finish. Cybersecurity Ventures anticipates $6 trillion in global cybercrime damages this year as fraud methodologies grow more sophisticated. Acknowledging that these dangers exist is vital to positioning research for success.

To tackle this challenge, Lisa Wilding-Brown from InnovateMR, Hilary DeCamp from the LRW Group and John Voda from AT&T joined forces to educate the industry about top data quality methodologies. Their holistic insights cover design, sampling and data analysis.

“No one is impervious. Cyber fraud poses a very real and material threat to companies large and small,” says Lisa Wilding-Brown, chief research officer at InnovateMR. 

How fraudsters do it

Just as data quality measures continue to evolve, so do the means by which fraudsters exploit weaknesses to pilfer survey incentives. Today, these individuals are capitalizing on technologies such as:

  • Advanced bots – Bots today can bypass basic red herring questions using advanced AI that makes them convincingly human. Because these bots behave like real survey participants, basic length-of-interview (LOI) parameters and other standard quality checks can’t catch them.
  • Click farms – Fraudsters now use thousands of devices simultaneously to cash in on survey rewards at scale. Because each device has a unique SIM card and a spoofed IP address, standard device fingerprinting and IP authentication checks are rendered obsolete.

From the researcher’s perspective

John Voda, senior market research and analysis manager at AT&T, stresses the importance of vetting every completed interview to confirm it comes from a quality respondent.

Years ago, Voda was involved in a medical market research study in which a doctor’s son learned about the study and shared the link online; dozens of high school students then took the survey posing as specialty physicians. With life-or-death consequences potentially riding on the findings, that experience cemented Voda’s dedication to data quality.

“Not everyone realizes the importance of identity verification,” Voda says. “Whether you survey a teenager, an underwater welder or the CEO of an international firm, it is critical that you ensure each respondent is a legitimate, qualified respondent.”

Presenting low-quality data to end clients can lead to:

  • Poor business decisions based on data that does not accurately answer market questions
  • Wasted cost per interview and budget overruns
  • Refielding requirements
  • Improperly weighted data that skews insights
  • Lost projects
  • Lost clients
  • Irreparable damage to a firm’s reputation

Allowing bad sample into your study can cost a lot down the road. According to Hilary DeCamp, chief research officer at LRW, businesses often choose a cheaper supplier to adhere to a stringent budget. However, cleaning up the mess before presenting the data can be far costlier. DeCamp once saw a researcher remove more than 80% of a study’s data because quality problems were not caught earlier.

“It’s critical to focus on a quality-first approach; we often feel pressure to use low-cost options due to budget constraints but it’s imperative to ask questions around sourcing and understand the various security and methodological layers in place,” DeCamp says.

What researchers can do

When it comes to B2B sample, the stakes are exceptionally high as this segment is frequently targeted due to the high incentives typically offered in this line of research. According to Voda, there are six questions you should ask your sample provider when looking for high-quality B2B sample:

  1. How is your B2B panel recruited and vetted?
  2. What information is collected from your B2B audience?
  3. How is your B2B audience incentivized?
  4. What mechanisms are in place to mitigate poor quality?
  5. What has been your experience with B2B audiences in the market I am looking to study?
  6. Are you Microsoft-certified?

“Don’t leave it up to others to handle quality on your behalf. It is dangerous to assume that incoming participants have been appropriately vetted,” Voda says. “Ensuring quality requires a partnership between you and your vendors and you should work closely with these companies to have an open and transparent dialog about the tactics employed before, during and after the survey.”

Beyond choosing a panel partner with a layered data quality strategy, the design of the survey itself can affect data quality. Poor study design, such as overwhelming grids, a length of more than 30 minutes or leading questions that encourage over-endorsement, can skew data.

“While average and maximum break-offs are highest in surveys longer than 30 minutes, poor study design can drive high break-offs in shorter surveys also, so carefully proof each survey to find and eliminate problematic questions,” DeCamp says.

One common in-survey method for terminating fraudulent respondents is the red herring question, which typically asks respondents to answer a simple knowledge question, perform a simple action or complete a basic mathematical calculation. The problem with these traditional measures is that they are antiquated; new AI-driven bots can mimic human behavior and make quick work of them. Much like Siri on an Apple smartphone, a bot can easily answer that the sky is blue, identify the word apple from a list or solve 2+2.

Building more sophisticated red herring questions into the front of a respondent’s survey experience is essential for keeping up with new bots. This can include:

  • Asking for industry-specific terminology – For example, if you are conducting a study targeting cryptocurrency experts, asking which answer most closely matches the definition of blockchain can be an easy way to root out both bots and unqualified respondents early on. It is critical to test and verify domain expertise.
  • Adding fake brands to a recognition list – Asking respondents to rate familiarity with fictitious brands can easily terminate bots and unengaged respondents. Survey takers should not claim to use a brand that doesn’t exist.
  • Unaided and aided awareness questions – If looking for cloud computing experts, first ask them to list the tools they use via an open-ended question and then compare these responses to an aided awareness question that is captured later in the survey.

“A very basic trap question does not provide a material impact when it comes to invalidation,” Wilding-Brown says. In a test fielded in 2020, “A simple question only invalidated 1.4% of respondents and again, we have observed bots answering these types of questions with ease. When asked an optimal red herring question, nearly 22% of the same participants were invalidated across a wide variety of sample sources evaluated.”

From the supplier perspective

Panels should always take a layered approach to quality, following a respondent throughout the pre-registration, panel registration, pre-survey and post-survey stages of their lifecycle. Relying on self-reported data from a single point in time does not work; respondents must be exposed to validation and revalidation checks throughout their entire panel lifespan.

“The key here is to never assume that what people self-report is accurate or true. You must expose respondents to various checks throughout their lifetime in the panel,” Wilding-Brown says.

What suppliers can do

Throughout a respondent’s ongoing interaction with surveys, layered checks should be put into place at each lifecycle stage:

  • Pre-Registration – Diverse recruitment campaigns; benchmark quality, attitudinal, behavioral and demographic testing
  • Panel Registration – Digital fingerprinting, GEO-IP checks, hidden re-CAPTCHA, bot traps, e-mail validation, address/mobile verification, pattern detection, double-opt-in validation, red herrings, open-end analysis
  • Pre-Survey – Digital fingerprinting, GEO-IP checks, encrypted password and survey URLs, hidden re-CAPTCHA, bot traps, multi-factor authentication, red herrings, open-end analysis
  • Post-Survey – IP analysis, LOI tracking, address/mobile verification, pattern detection, reward redemption validation, analysis of client feedback, open-end analysis

“Quality does not exist at a singular point of time. Look for suspicious patterns throughout the respondent life cycle and take action quickly by quarantining participants from survey activity,” Wilding-Brown says.

At InnovateMR, specifically for verifying open-end responses, our team of research experts has developed an AI-driven natural language processing tool called the Text Analyzer.

Text Analyzer helps mitigate and remove gibberish answers, copy-and-pasted answers, profanity, duplication, personally identifiable information and other personal data that should not be passed into survey data, as well as noncontextual answers that do not directly address the question asked.

Available in English, Spanish, German and French, with more languages to come, the tool pulls from an extensive library of questions served randomly to further validate the content of respondent answers. In testing conducted last year, results yielded a 24% invalid rate for open-ended questions across the several suppliers tested; InnovateMR’s invalid rate was just 8%, thanks to the firm’s rigorous quality controls.

“InnovateMR includes this tool in our registration path and panel ecosystem, but clients can also integrate this technology directly into their survey environment or platform via API,” Wilding-Brown says. “Clients can also use the tool on a more ad hoc basis via our batch uploader where customers can upload verbatim files to flag poor quality responses.”
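To make the open-end checks described above concrete, here is a toy Python sketch of two of them, gibberish and duplicate detection. This is not InnovateMR’s Text Analyzer; the vowel-ratio threshold is an arbitrary illustrative heuristic, and a production tool would also cover profanity, PII and noncontextual answers.

```python
import re

def flag_open_end(answer, previous_answers=()):
    """Toy heuristics in the spirit of open-end validation: flag keyboard-mash
    gibberish (very low vowel ratio) and verbatim duplicates of earlier answers."""
    text = answer.strip().lower()
    flags = []
    letters = re.sub(r"[^a-z]", "", text)
    if not letters:
        flags.append("empty_or_symbols")
    elif sum(c in "aeiou" for c in letters) / len(letters) < 0.2:
        flags.append("gibberish")  # e.g. "asdfgh jkl" contains almost no vowels
    if text in {a.strip().lower() for a in previous_answers}:
        flags.append("duplicate")
    return flags
```

Real systems layer many more signals (language models, context matching, PII detection), but even simple heuristics like these catch a surprising share of low-effort responses.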