Fraudulent responses and poor data quality 

Editor’s note: Tom Burdick is vice president of Eleven Market Research, Austin, Texas. 

There’s an episode of “The Office,” season 5, called “Customer Survey” where Jim and Dwight are shocked to learn of their poor customer satisfaction ratings. They later learn that Kelly tampered with the results to get revenge on them for missing her party. Like many episodes of “The Office,” the farcical plot is hilarious. Unfortunately, for those of us in the market research industry, there is nothing farcical or hilarious about intentional data fraud. Today the industry is facing an existential threat from within our respondent community – from bad actors, bots, AI imitation and other sources of data quality deterioration.

Gaps in combatting data fraud

The data collection space is fragmented, often with different parties taking pieces of a project. It’s like an assembly line for collecting vital data points from respondents. This “specialization” is often a benefit in terms of expertise, efficiencies and cost savings. However, it can also lead to gaps in combatting fraud. The players in the process can include the end client, the MR agency/consultancy, the questionnaire developer, the survey programmer, the data processing team, the sample acquisition firm and the sample/panel provider. There is, of course, overlap between these parties, but even firms that consolidate some of these steps in-house still need to address handoffs and the “chain of custody” to ensure a successful project.

It is also important to note that AI/bots currently make up a small but growing proportion of fraudulent responses and bad data. Fraudulent responses and poor data quality can come from individuals trying to deceitfully qualify for surveys, professional survey-takers or lazy, fatigued respondents rushing to finish the survey. Poor survey design, obvious screener qualification criteria, lengthy and/or repetitive surveys and too many open ends all contribute to poor data quality, even from legitimate respondents.

As an industry, we need to unite to fight poor data quality on all fronts, because each of us can play an essential role in defeating the fraudsters. Below are some ideas on how each player can do their part.

End client, MR agency/consultancy, questionnaire developer

These actors can be grouped together because they share the critical task of developing the screener and questionnaire. This is the foundation of the data collection. The screener should serve as the guardian of the entry point to the survey. It should conceal the goal/topic of the research and not “tip” or lead respondents into knowing how they can qualify. Don’t provide an introduction that tells them what the research is about; there’s no reason to let them know. Even after a respondent qualifies, it’s best to keep things intentionally vague. Make sure you are terminating as many bad respondents as possible within the screener. Finally, include demographic and firmographic questions, as panels often don’t have the most up-to-date profiling.
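For illustration, here is a minimal Python sketch of what concealed screener logic can look like. The question IDs, category list and fictitious vendor are hypothetical assumptions, not drawn from any real study.

```python
# A sketch of a concealed screener; all names here are hypothetical.
# The qualifying category is buried in a longer, neutral list and the
# survey topic is never named, so respondents can't guess how to qualify.

HIDDEN_TARGET = "enterprise_software"   # never revealed as the survey topic
SHOWN_OPTIONS = {"office_furniture", "catering", "travel_services",
                 "enterprise_software", "print_media", "none_of_these"}

def screen(answers: dict) -> str:
    """Return 'qualify' or 'terminate' without ever revealing the reason."""
    picked = set(answers.get("s2_categories", [])) & SHOWN_OPTIONS
    # Terminate anyone claiming a fictitious "dummy" vendor outright.
    if "zorblex_systems" in answers.get("s4_vendors", []):
        return "terminate"
    # Qualify only on the hidden target; everyone else terminates in the
    # screener rather than in the (more expensive) main survey.
    return "qualify" if HIDDEN_TARGET in picked else "terminate"

print(screen({"s2_categories": ["catering", "enterprise_software"],
              "s4_vendors": []}))  # -> qualify
```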

Once respondents are in the main survey, add quality traps, red herrings, “dummy brands” and attention checks to terminate fraud in real time and enhance the data quality coming back. Re-ask a question or two later in the survey to verify that answers are consistent. The more bad respondents terminated here, the less tedious manual review and the fewer judgment calls about keeping or tossing a record later. Of course, open ends can be a good source for measuring quality, but more respondents know this and use AI and the internet to craft their responses – responses that are often “too perfect” in nature. Keep in mind that too many open ends can also turn off legitimate respondents (two or three should be enough).
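As an illustration, several of these traps reduce to simple checks that can fire mid-survey. The question IDs, the fake brand and the flag names below are hypothetical.

```python
# A sketch of real-time quality traps, assuming hypothetical question IDs;
# the specific flags are illustrative, not from any particular platform.

def quality_flags(answers: dict) -> list:
    """Collect reasons to terminate a respondent mid-survey."""
    flags = []
    # Attention check: an instructed-response item ("select 'Somewhat agree'").
    if answers.get("q12_attention") != "somewhat_agree":
        flags.append("failed_attention_check")
    # Dummy brand: claimed awareness of a brand that does not exist.
    if "brandex_pro" in answers.get("q8_brand_awareness", []):
        flags.append("claimed_fake_brand")
    # Consistency: a re-asked screener question should match the first answer.
    if answers.get("s3_company_size") != answers.get("q20_company_size"):
        flags.append("inconsistent_company_size")
    return flags

# Any flag can trigger an immediate term, avoiding manual review later.
print(quality_flags({"q12_attention": "strongly_agree",
                     "q8_brand_awareness": ["brandex_pro"],
                     "s3_company_size": "50-249",
                     "q20_company_size": "50-249"}))
# -> ['failed_attention_check', 'claimed_fake_brand']
```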

Survey programmer, data quality software

Those in charge of survey programming can be a strong partner to the questionnaire designer in implementing proper term points, logic and routing. Optimal formatting, a user-friendly interface and mobile optimization are all essential to the respondent experience but, most importantly, they help detect fraud. We all know about speeder and/or straight-liner terms, but adding other measures – from disabling the ability to copy/paste into an open-end text box to adding honeypot questions (embedded hidden questions that only bots would answer) – allows you to identify suspicious behavior.
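To make that concrete, here is a minimal sketch of how speeder, straight-liner and honeypot flags might be computed on a completed record. The field names and the one-third-of-median speeder threshold are illustrative assumptions.

```python
# A sketch of programmatic fraud flags, assuming hypothetical record fields.

import statistics

def programmatic_flags(record: dict, median_seconds: float) -> list:
    flags = []
    # Speeder: finished in under a third of the median completion time.
    if record["duration_seconds"] < median_seconds / 3:
        flags.append("speeder")
    # Straight-liner: zero variance across a grid of rating questions.
    grid = record["grid_ratings"]
    if len(grid) > 1 and statistics.pstdev(grid) == 0:
        flags.append("straight_liner")
    # Honeypot: a CSS-hidden field humans never see; only bots fill it in.
    if record.get("honeypot"):
        flags.append("bot_suspect")
    return flags

print(programmatic_flags({"duration_seconds": 95,
                          "grid_ratings": [4, 4, 4, 4, 4],
                          "honeypot": ""},
                         median_seconds=420))
# -> ['speeder', 'straight_liner']
```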

Data quality/fraud detection software can help to block potential bots, AI and click farms from ever entering the survey and can incorporate fraud detection measures into the survey itself. There are at least half a dozen companies touting their advanced digital technology and techniques in this space. While they likely all have some limitations, they can play a vital part within the process. It’s important to evaluate how each could help in relation to one’s specific needs.

Sample acquisition firm, sample/panel provider 

As the “frontline” and genesis of the respondents, sample firms and panels are often blamed for all of the bad data. They are an integral part of the process, but they are only as good as what the players above them construct. Handing even the best panel a poorly built screener and survey is like putting lipstick on a pig and hoping it yields beautiful results.

However, panels can absolutely improve the product – the respondents – that they deliver. Panels can and should remove panelists repeatedly flagged for poor data. They can identify respondents who frequently mask their IP address or use VPNs. They can regularly update profiling points and look for obvious inconsistencies with prior profiling – a sign of dishonest behavior. They hold an abundance of data points on their survey-taking panelists and can mine them for anomalies. They can analyze panelists who take an exorbitant number of surveys in a given period. All of these practices, plus others, can help cull bad respondents from the panel and thus enhance the quality of those who remain.
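As a rough illustration, these hygiene rules can be expressed as simple checks over panelist metadata. The field names and thresholds below are assumptions for the sketch, not industry standards.

```python
# A sketch of panel-hygiene checks; all fields and cutoffs are hypothetical.

def panelist_risk(p: dict) -> list:
    reasons = []
    if p["quality_flags_90d"] >= 3:        # repeatedly flagged for poor data
        reasons.append("repeat_quality_flags")
    if p["vpn_sessions_90d"] / max(p["sessions_90d"], 1) > 0.5:
        reasons.append("frequent_vpn_or_masked_ip")
    if p["surveys_30d"] > 100:             # exorbitant survey volume
        reasons.append("possible_professional_survey_taker")
    if p["profile_contradictions"] > 0:    # e.g., reported age moving backward
        reasons.append("profiling_inconsistency")
    return reasons

suspect = {"quality_flags_90d": 4, "vpn_sessions_90d": 30, "sessions_90d": 40,
           "surveys_30d": 140, "profile_contradictions": 2}
print(panelist_risk(suspect))  # all four reasons fire; candidate for removal
```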

It is also recommended that the field management team use multiple sample sources on each project to guard against both panel bias and the risk of one or two sample sources providing tainted data that corrupts an entire data set. Using multiple sample providers lowers this risk and allows for quality-based adjustments while in field.

Data processing, data review

This is the last line of defense and is often at the mercy of all the actors that come before it. However, data processing and data review can play an important role in checking the soft launch and identifying strange and/or suspicious responses. If awareness seems off or if allocations for revenue or spending are not as expected, these should be investigated before collecting additional data. Similarly, identifying the specific panelist IDs that are providing poor data can help the field management team look for patterns and then quickly remove respondents or shift quotas during fielding based on the client’s data quality review and feedback.
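For example, a soft-launch review might start with a few automated sanity checks like the sketch below, which assumes hypothetical column names and uses pandas to surface the offending panelist IDs.

```python
# A sketch of a soft-launch review: flag respondents whose revenue
# allocations don't sum to ~100 or who claim awareness of a fake brand.
# Column names ("alloc_*", "aware_brandex_pro") are hypothetical.

import pandas as pd

def soft_launch_issues(df: pd.DataFrame) -> pd.DataFrame:
    checks = pd.DataFrame(index=df.index)
    # Allocation questions should sum to ~100 percent per respondent.
    alloc_cols = [c for c in df.columns if c.startswith("alloc_")]
    checks["bad_allocation"] = (df[alloc_cols].sum(axis=1) - 100).abs() > 2
    # Awareness of a nonexistent brand is a hard quality failure.
    checks["aware_fake_brand"] = df["aware_brandex_pro"] == 1
    mask = checks["bad_allocation"] | checks["aware_fake_brand"]
    # Hand the offending panelist IDs to the field team for pattern review.
    return df.loc[mask, ["panelist_id"]].join(checks[mask])

# Example: the second respondent over-allocates and claims the fake brand.
df = pd.DataFrame({"panelist_id": ["p01", "p02"],
                   "alloc_product": [60, 70], "alloc_services": [40, 50],
                   "aware_brandex_pro": [0, 1]})
print(soft_launch_issues(df))
```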

Data quality standards 

The market research community needs to work together to help defeat this existential threat to our industry. The good news is that the majority of us are dedicated professionals who want to do the right thing and deliver flawless data quality and trustworthy client reports. If the market research industry can confront this issue and set the bar high for data quality standards, we will continue to flourish. As Michael Scott once said, “The only time I set the bar low is for limbo.”