Elevating Data Quality

Editor’s note: This article is an automated speech-to-text transcription, edited lightly for clarity.  

IResTech’s VP of Product, Roddy Knowles, gave a presentation on the issue of survey fraud within the industry on February 1, 2024, during the Quirk’s Virtual – Innovation event.

Knowles touched on the importance of fighting fraud and emphasized that this is a problem that the entire industry needs to come together to solve. He gave some tips on how to catch survey fraud before it happens and other ways to combat this issue.  

Watch the full recording or read the transcript to get all the insights Knowles offered.  

Session transcript

Joe Rydholm: 

Hi everybody and welcome to our session “Elevating Data Quality: Catching Survey Fraud Before It Starts.” 

I'm Quirk’s Editor, Joe Rydholm. Before we get started, let's quickly go over the ways you can participate in today's discussion. You can use the Chat Tab to interact with other attendees during the session. And you can use the Q&A tab to submit questions to the presenters during the session and we'll answer as many as we have time for during the Q&A portion.  

Our session today is presented by dtect. We hope you enjoy it.  

Roddy Knowles: 

I appreciate everyone joining today. I'm excited to be chatting with y'all about a few things related to data quality and survey fraud.  

Before we get into it, a little bit about me. I'm Roddy Knowles. I'm VP of product at IResTech where I focus on building the dtect platform. We think a lot all day and every day about survey security and a lot of different types of fraud. So, I'm really, really excited to be talking with you all about this today.  

Let's jump right in and sort of set the stage. There are three things I'm going to do in our time together. I'm going to start by taking a look at the landscape of fraud, really just to set the stage because we talk about fraud a lot, but what do we actually mean when we're talking about that? It means a lot of different things.  

I also want to give everyone a peek behind the curtain so you can see different types of fraud that we're facing in the market research industry.  

And then I want to zoom out at the end to think about the bigger picture and the impact of fraud on our industry, but also on all of you and your individual jobs and what you do in your day to day. 

So, let's jump right in. A little bit about dtect. As I mentioned, we deal with survey fraud all day. That's one of the primary reasons why dtect was founded and why we built the platform: to solve two fundamental challenges that are really, I think, existential to the research industry. It's to grapple with fraud, which is persistent and ever evolving, as we're going to see. And also, to deal with inefficient and oftentimes laborious field management.  

We really try to solve two things with our platform, stopping fraud before it starts and giving you a place where you can manage all of your projects. You can manage supply, you can look at data quality, and you can compare across suppliers in real time, so you know exactly what you're getting.  

So just for a little bit of context, we've been talking about fraud in the research industry for, I don’t know, forever maybe. I mean definitely in the couple of decades plus that I've been in the industry. It's been a constant conversation point, but I would say it's picked up in the last couple of years for sure. Try to go to an industry event without hearing a conversation about fraud in an hour. Good luck with that. And it's not because people are just hypersensitive to it for no reason because fraud really has become an even bigger issue.  

Also, the types of fraud that we see are changing. The ones we're combating today are not what we were combating yesterday. The ones we're going to be combating a month from now are going to look different than the ones a year from now.  

So, it's really important to keep on top of this conversation and also to understand how the landscape of fraud is changing. 

Like most industries, it's a bit of a cat and mouse game. It's a combination of humans and technology and humans leveraging technology on the fraud side and then also on the fraud prevention side, it's really the same thing.

So, while you might not know the exact mechanics of everything that's going on, that's fine, but I do think it's important to know and to keep an eye on the basics so you can have informed conversations and also know what you're fighting against. 

So, let me jump right in. As I mentioned quickly, I'm going to talk about the landscape of fraud a little bit. This is a quick tour. This is not meant to be comprehensive. We could talk about this for half a day easily. This is meant to do a little bit of justice to the landscape of fraud, in a couple minutes, talk through some key terminology and some important things that I think you need to know.  

So, first of all, survey farms. You hear a lot about survey farms. Maybe you picture a farm, you picture something idyllic in your brain. Survey farms are not like that.  

If you're not familiar with what a survey farm is, it's an organized group of people that are spending time trying to gain incentives from completing research surveys. So why is it called a farm? 

It's because there are a bunch of people essentially trying to farm or take the incentives or garner things from that farm by ultimately committing fraud and completing surveys. 

So, the image that pops in people's brains sometimes is like this super sophisticated high tech call center type place where there's people sitting around and leveraging technology to complete surveys all day. Well, it's not exactly true, but it's mostly not true if you really think about it. 

Yes, it's a group of people for sure, leveraging technology to ultimately commit fraud and complete surveys, but it really, honestly, looks a bit more like this. It is, I'd say, a semi-organized group of people, typically led by a leader who leverages technology, and increasingly AI, to figure out how to cheat the system. We're going to talk about some of that today.  

So that means having different scripts for people who are at different levels of training, trying to get past a screener and ultimately convince us that it's a legitimate complete, that this is a person who actually is qualified to take the survey and ultimately get the incentive.  

A lot of this traffic happens in countries where it's profitable to amass relatively small incentives, like at scale, that really adds up. We don't see as much of this in North America, for example, as we see in other parts of the world, and that's going to be relevant to my conversation in just a minute. 

Another thing to set the stage with and define quickly is bots. This term gets used all the time and means many different things to different people. But really when I think about bots, I think about automated agents or scripts that are employed, in our case, to take surveys as humans would. 

Again, this word is used liberally. And in talking to clients, they use this word a lot to mean many different things and sometimes even to talk about human traffic. They see bad data and often assume that it's a bot.  

You should sort of care what bots are doing, but ultimately what really matters is perception, right? If someone is perceiving traffic to be bot traffic, and really, it's human traffic, it sort of doesn't matter as long as the traffic is bad and ultimately going to problematize your study.  

Botnets is another term that's coming up increasingly. Essentially, a botnet is a distribution of malware infected machines used to take surveys. In this case primarily used to spoof locations and other digital elements to mimic legitimate machines or devices that people are using or legitimate people that are taking surveys.  

And this is one of the things that we've discovered at dtect that people have started to use more frequently and we put measures in place and ways in our platform you can ultimately track and prevent these botnets from becoming more problematic.  

And then also on the fraud side there are humans too. Again, I've mentioned technology and technology is obviously critical, but fraud takes many forms.  

And so, I think about survey farms as more automated and broader, at scale. I think about humans as being a little bit different here. I think about these as being individual people who are trying to game the system. These could range from sophisticated people who are lurking or participating in message boards, trying to figure out how to manipulate end links or trying to pass screeners or things like that, to people who are just moderately savvy and trying to take advantage of what they think qualifies them for a survey by maybe answering that they're high income, by maybe selecting all the responses saying, I've done all these things so I can qualify. These are the people oftentimes who get into your study and pretend they're an IT decision maker, but they're not.  

There's also, I'd say a third group of people who are maybe good people, but at times they fudge the truth because they want an incentive or maybe they are frustrated because we bounce them around a router and waste a lot of their time and they keep screening out and never qualify and they get fed up with it and they ultimately say something that they're not or claim something that is not true in order to get an incentive. 

And then lastly, I consider this a little bit different. It's not necessarily fraud, but it gets lumped in this category sometimes: people who aren't necessarily paying attention. They get bored during a survey, you torture them with a bunch of grids, you throw them a 20-minute survey and they start to exhibit poor survey-taking behaviors. They maybe straight line, they maybe go through open ends and don't give you great responses. I wouldn't necessarily say it's fraud. I mention it because this tends to get looped in under the rubric of fraud in some people's eyes.  

So, these differences matter to me because I think about this all day. They matter to you, most of you because it's important for you to understand these differences because of the way that you would attack human fraud and also inattentive or problematic survey taking behavior and also more automated fraud and more fraud at scale through usage of bots and oftentimes found from survey farms. You're going to attack it in different ways, but oftentimes your clients don't care.  

And I think that's important because what your clients care about is getting good data. Is the data something they can rely on to make the decisions they're trying to make, and will they feel confident in doing that?  

So, whether it's coming from a bad human, a survey farm, someone using a bot, it sort of doesn't really matter.  

Now I want to give you a glimpse behind the curtain a little bit. We talked about the landscape of fraud, and this is also a place where we can sort of go really, really deep. But I want to go behind the curtain just a little bit more to focus on one specific thing, which is what's at the center of fraud these days? What's at the center of almost every conversation these days? It's AI.  

No huge surprise here. This is a hot button topic within the industry and rightfully so. This is what many fraudsters are leveraging. What are we leveraging to fight fraud? It's AI too. It's oftentimes a similar toolkit.  

AI really is the battle ground right now and I think it will continue to be for the foreseeable future. So just the basics.  

I think most of you, hopefully all of you have used ChatGPT or Bard or some generative AI platform by now.  If you haven't, I encourage you to take 30 minutes out of your day and do that for a number of reasons. 

Not necessarily just so you understand what fraudsters are using, but because you can think about how to leverage the technology in a number of meaningful ways. But it is really helpful from the fraud perspective to understand how people are using it. The primary way that people have been using ChatGPT or any of these generative AI platforms is for open-ends.  

As we know as researchers, anyone who's been in the industry for a while, open-ends have been used as a primary indicator of fraud and poor data quality for some time. You see a response to a question that feels like someone didn't pay attention, they just put in gibberish, they're using a language that's not the language that the survey is in. There are a number of really easy tells within an open-ended question to see if this is a good response.  

But what do ChatGPT and other generative AI platforms allow you to do? Well, they allow you to do things like simply put in a question and then copy and paste.  

So, what we were seeing a few months ago was stuff like this. You ask a relatively simple question and put this into, in this case ChatGPT, and you get a pretty in-depth response.  

So, how do you recycle a disposable coffee cup? This is what ChatGPT gives you. When you see this copied and pasted into an open end, that's a pretty easy tell that this is not coming from a human. It would take a lot of effort and be a thorough survey taker to come up with something like this. This is clearly created by ChatGPT.  

Something we also started to see is selective copy and paste. There is obviously some acknowledgement that putting in a large response like this may be problematic and may tip people off that this is fraudulent. So oftentimes they select what comes at the end.  

In this case, it was just the end here, this latter part about compostable cups. They were taking the last few sentences and pasting those in. This is also a tell, and we continue to see this a fair amount, where people just paste in the end of a statement, which is still not a great actual response to this question. So that's a pretty easy tell too.

So, keep in the back of your head this copy and paste behavior because this is super, super important.  

Another way that people use ChatGPT, or something like it, to answer questions is to seem smarter.  

I wouldn't say this is intentional, but the responses seem smarter. I'm not saying survey respondents aren't smart, but generally the amount of thought and effort put into an open-ended question looks more like what's highlighted in yellow than it does down below.  

I won't read both of these to you now, but they're just answers to a question. I'll look at the one on the left. This was asking for a reaction to a concept, a car in this case, and it says, yeah, it looked cool and fast. Okay, that's something a human might say in a survey. 

This larger response about the design and overall appearance playing a significant role in yada, yada, yada. This is not really a very human-sounding response, at least it's not typical of most open-ended data. So, this is another tell that you can use when you're looking to say, is this probably generated by AI?  

But what's changing? I mentioned what we've seen in the past, how we see people starting to get a little bit smarter. We see that fraudsters themselves are getting smarter and they're using ChatGPT in better and more efficient ways. Prompt engineering is getting better. So for those of you who aren't familiar with prompts, prompts are what you give the platform in order to create an output.  

So rather than just asking a question, you could say, pretend you're this persona, do it in this voice. Write it at an eighth-grade reading level, make some grammatical mistakes.  
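
As a rough illustration of the length tell described here, a first-pass screen might flag open-ends that run far longer than a typical survey answer. This is a minimal sketch under stated assumptions: the function names and the 30-word cutoff are hypothetical, the long sample text is invented for the example, and a flag is only a prompt for human review, not proof of AI use.

```python
def word_count(text: str) -> int:
    """Count whitespace-separated words in an open-ended response."""
    return len(text.split())


def flag_long_open_end(text: str, max_words: int = 30) -> bool:
    """Return True when a response exceeds the assumed word cutoff.

    max_words is a hypothetical threshold; tune it on your own open-end data.
    """
    return word_count(text) > max_words


# A short, human-sounding reaction to a concept.
short_answer = "It looked cool and fast."

# Invented sample text in the polished style the talk describes.
verbose_answer = (
    "The design and overall appearance of the car play a significant role in "
    "shaping my perception, as the sleek lines and aerodynamic profile convey "
    "a sense of speed, sophistication, and modern engineering that aligns "
    "with contemporary expectations for performance vehicles in this segment."
)

short_flagged = flag_long_open_end(short_answer)
verbose_flagged = flag_long_open_end(verbose_answer)
```

Length alone is a weak signal, since a well-prompted model can write short, casual answers, so a check like this is best treated as one input among several.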

You can be really clear about what you want it to do in order to give a response that is going to look more legitimate and human-like. So, this is something that is going to continually change over time. So bad responses that we used to be able to look at with human eyes and say, this looks like AI. 

There's actually still a lot of that stuff out there. So you can definitely do that, but it's getting harder. So that's a really important thing to think about. 

I'm going to ask you to look at this for a second. I'm not going to read all of these things to you, but these are responses to a concept that was presented to someone. And the question is, what do you find most appealing about this concept?  

So, have a look at these questions here or these answers to these questions here. I think there's six of them. Think about it for a minute. Think about which of these looks like they're AI.   

So, a few things to think about here.  

One is usage of punctuation. This oftentimes tends to be a tell, when there's punctuation, comma usage. Not that humans don't know how to use commas, but they're not typically thorough about it when responding to open-ended questions. So, you might look at comma usage. You can look at the length of the open-ends too. Maybe that's a tell here. Just look at the last one about uninterrupted pet food supply. It's not a very human way of saying something. So maybe that one is AI.  

What do you think?  

Well, this is actually a trap question. These are all created by AI. So even the ones that look like they're human, there's maybe no punctuation here.  

“Love it. No more panic when I realize I'm out of pet chow.” That doesn't really sound like something a computer would say, does it?  

So, something a human would say, but if you go with the right prompt, it sounds like a human.  

So yeah, I tricked you a little bit here. But the point is, if you're going to catch fraud, not just what looks like fraud, then you need other tools at your disposal. Copy paste detection is one way to do that. And there are other tools that you can use as well to really determine what is AI. You'll need more tools rather than just human eyes to catch it, and that's critical.  
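
One way copy-paste detection could work is to compare how much of an answer was typed with how much arrived through paste events. The sketch below assumes the survey page records that telemetry per question; the `OpenEndTelemetry` fields, the function names and the 0.8 threshold are all hypothetical and do not describe any specific platform.

```python
from dataclasses import dataclass


@dataclass
class OpenEndTelemetry:
    """Hypothetical per-question telemetry a survey page might record."""
    text: str          # final submitted answer
    typed_chars: int   # characters entered through keystrokes
    pasted_chars: int  # characters inserted through paste events


def pasted_share(t: OpenEndTelemetry) -> float:
    """Fraction of the final answer's length that came from paste events."""
    if not t.text:
        return 0.0
    # Cap at 1.0 in case pasted text was later trimmed by the respondent.
    return min(t.pasted_chars / len(t.text), 1.0)


def looks_pasted(t: OpenEndTelemetry, threshold: float = 0.8) -> bool:
    """Flag answers that are mostly pasted (threshold is an assumption)."""
    return pasted_share(t) >= threshold


typed = OpenEndTelemetry(text="Looked cool and fast", typed_chars=20, pasted_chars=0)
pasted = OpenEndTelemetry(text="x" * 200, typed_chars=0, pasted_chars=200)

typed_flag = looks_pasted(typed)
pasted_flag = looks_pasted(pasted)
```

Because the check looks at how the text was entered rather than how it reads, it can still fire on a pasted answer that was prompted to sound casual and human.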

So, staying vigilant I think is really important. You need to take a multi-pronged and sophisticated approach to calling out AI. This is one of the things that we really focused on when we were building our platform: a way for people to detect AI usage.  

Among other things, you can use screening questions, you can use some of the typical tools that researchers have used for years. You can also leverage AI detection and some other things I'm going to talk about as well in order to make sure that you can be confident in the data that you're getting and that it's not coming from a robot.  

Now, something else that I want to point out before we move on and sort of go in a different direction is some of these things that are on the horizon.  

We started to see something interesting really toward the end of last year. And these are customized browsers that people were using. This was evading a lot of the typical detection methods that we had used and others in the industry had used to detect fraud.  

What customized browsers are, in a nutshell, is an instance of something like a Chrome browser that is customized to get past survey screeners. They do things like manipulate end links, manipulate URLs and do other things to mimic location, and the browser is customized for the specific purpose of doing this. These are not very difficult to build. And we started to notice this when we saw fraud in a specific study. 

We actually noticed in a number of studies, but I'll give you just one example. And we were looking at all kinds of different elements of the study to really take it apart and decide what was happening here. And then ultimately how do we track and prevent this?  

This is a U.S.-based study. When we got into it, we looked at the browser types. They were 99% mobile. Yes, we get a lot of mobile survey takers, in the 60-70% range, but 99% is a little high. 

Also, 84% Android users. It's about a 50-50 market share between Android and iOS. Depending on who you're targeting, it may fluctuate a little bit, but you're not going to see 84% Android on pretty much anything.  

And then when we dug in and we looked at the countries where these sessions are coming from, I won't read them out to you, but like 90 some percent of them were coming from different Asian countries. 

And this is supposed to be a U.S. study, so not calling any panel out here, but there are definitely some quality issues with the participants that were coming into this study, and these were coming from these gamed browsers.  
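
The kind of check described in this example can be sketched as a simple audit of session metadata against expected ranges. This is an illustrative sketch only: the session fields, the `audit_sessions` function and every threshold are assumptions chosen to sit a bit above the typical ranges mentioned in the talk, and the sample data is synthetic.

```python
from collections import Counter


def share(values, target) -> float:
    """Fraction of values equal to target (0.0 for empty input)."""
    values = list(values)
    if not values:
        return 0.0
    return Counter(values)[target] / len(values)


def audit_sessions(sessions, expected_country="US",
                   max_mobile=0.80, max_android=0.70, min_in_country=0.90):
    """Return warnings for a study whose traffic mix looks implausible.

    All thresholds are hypothetical defaults; set them from your own norms.
    """
    warnings = []
    mobile = share((s["device"] for s in sessions), "mobile")
    android = share((s["os"] for s in sessions), "android")
    in_country = share((s["country"] for s in sessions), expected_country)
    if mobile > max_mobile:
        warnings.append(f"mobile share {mobile:.0%} exceeds {max_mobile:.0%}")
    if android > max_android:
        warnings.append(f"android share {android:.0%} exceeds {max_android:.0%}")
    if in_country < min_in_country:
        warnings.append(
            f"only {in_country:.0%} of sessions located in {expected_country}"
        )
    return warnings


# Synthetic sessions shaped like the study described above:
# 99% mobile, 84% Android, and most sessions located outside the U.S.
suspicious = [
    {
        "device": "mobile" if i < 99 else "desktop",
        "os": "android" if i < 84 else "ios",
        "country": "US" if i < 8 else "other",
    }
    for i in range(100)
]

suspicious_warnings = audit_sessions(suspicious)
```

A study that trips several of these checks at once is a good candidate for a closer look before the data goes to a client.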

We built in detection in our system to not just look at location but block certain locations to target these gamed browsers and look at other elements that we use to ensure these aren't impacting your study. This is just one example of something that we've seen recently, which I think is going to be leveraged more and it's one of the things that you really, really need to keep on top of.  

But now that I've given you a look at the basic landscape, a glimpse behind the curtain of just some of the things that we all face every day, certainly I face every day, but a lot of you may be facing but you don't really realize it.  

I wanted to zoom out and look at the bigger picture here and think about what this means for us as an industry, as a market research industry, and what does this mean for you specifically?  

I said earlier, who really cares about the nuances of where bad traffic is coming from or what bad quality data is? I said I do. Most of you do. Who feels the pain? We certainly do, but also your clients do too. 

When we deliver data to them that has fraud in it or is just poor-quality data in general, they are feeling that pain. Go out and talk to other clients. Any of you who work directly with clients, or if you work with agencies and in turn work with clients, talk to the people who are buying research. This pain is ever present right now, and it's important to understand because it's changing the way people think about our industry.  

And when I think about our industry, there's also two things that I want to just remind you of. These are basics, but they're important and fundamental.  

The industry is built around making money, like every industry. How do we make money? We get revenue not from every transaction but from successful transactions.  

So, if someone's coming through a system and they're fraudulent and they're not a legitimate complete, we're not making any money, the people along that path are not making any money. And that's really critical, obviously.  

Also, what is the market research industry and what is market research fundamentally about? It's about giving people data to make decisions and data they can actually rely on and data that they're confident in.  

And with fraud being as rampant as it is and continuing to really get worse and even provide more challenges for us, it fundamentally challenges these two basic things, the revenue that most companies need in our industry and the very underpinning of the industry itself. So, I'm not overstating this, fraud continues to be the central thing we need to focus on as an industry. 

And where does it hurt most? I can sit here and talk about this all day. Yeah, fraud is bad for this reason and that reason and whatever. And maybe you're having some of these internal conversations like ‘woe is me,’ ‘all this bot traffic’ and ‘all this bad data,’ and ‘my client complained about this,’ but it really hurts most when it hits your pocketbook, when it hits the bottom line.

There are three primary ways I think about this, and I think you should be thinking about it too. How is fraud actually impacting your business?  

First is sample waste. We have traffic that's coming in that's ultimately not converting to a paid complete. This hits sellers, this hits buyers, this hits marketplaces, aggregators, anyone in the middle too. And this sample waste is a total drain on our entire market research ecosystem.  

It also hurts because of the time. It takes time and resources. The amount of time and ultimately money spent on QA projects, on data processing, on project management, on increased time for analysis, going back into field and refielding a study, for example. All of these things take time and ultimately cost money. And as I say, time is money. And so it takes more time in field when you've got fraud coming through, when you don't realize it until after the fact that you pull a study out of field and you have to go back and replace 50% of your completes because they're problematic, which gives you slow results and ultimately makes you look bad in front of your clients.  

So, it's important to think about these different ways that it hits the bottom line, but most importantly, the loss of confidence in the data we collect is something that none of us can afford. And I say that as a blanket statement. 

It's absolutely critical to not be delivering data to your clients that is fraudulent or just generally low quality. This is a place where we're all in this together and I can't see you all, but I know a lot of you are nodding your heads at me.  

I guarantee you've all been in a situation where you deliver data to your client and then they come back to you with complaints about the data quality and you're like, “oh, [insert expletive here].”  

Those are not fun conversations. As those conversations start to scale and add up, what happens? Clients end up heading for the door, which is ultimately something that none of us can afford if we want to stay in business.  

And this is really, I think, the most existential threat to the industry. And this is something we really all have to come together and rally around. 

And so, parting words about what this means for you and what you can do because again, we're all in this together. 

The first and most important thing, and this is really critical, is to get ahead of the conversation about fraud. Get ahead of the conversation about data quality. If you're in a position where your client comes to you because of a specific study, because they've begun to notice things, or maybe they bring it up to you because they're reading some things about the industry, whatever it is, and they're asking you these questions about data quality and about fraud and what you are doing about it, you're already on your heels.  

I encourage you all to have proactive and confident conversations with your clients and that confidence is only going to come from educating yourself. It's really important to stay on top of this. 

You don’t need to go into every detail, but it's important to understand the landscape of fraud. It's important to understand what you're doing about it and to have a strategy. Now that strategy is multi-pronged. You can leverage a platform like dtect or leverage whatever other best-in-class technology you are using to prevent fraud before it starts. But you're going to need a multi-pronged approach.  

It's going to be leveraging technologies, it's going to be leveraging the humans that you have at your company, and it's also going to rely on you finding like-minded partners and allies to work with. So, find people who are trying to solve these same problems. Find partners who are trying to solve these issues from the technological front as well. Find your clients and have conversations with them because they're trying to address some of these issues too. 

So, it's ultimately about being in this together and finding the right people to go on the journey with you.  

With that, I will conclude here, and I'll stick around for a few minutes to answer any questions because ultimately, I want to figure out what I can do to help. 

Again, I spend all day sitting around thinking about this, talking about this and building a platform to address fraud and data quality, and I'm always happy to have these conversations.