Can AI stop AI?

Editor's note: Sebastian Berger is head of science at ReDem GmbH. He can be reached at sebastian@redem.io. Bernhard Witt is CEO of 2x4 Solutions GmbH. He can be reached at bwitt@2x4.de.

Penetration tests, or pentests, are the gold standard for uncovering vulnerabilities in IT systems. While preparing our research paper for this year’s ESOMAR Congress, we deliberately tried to break survey quality checks. This raised an obvious question: Why doesn’t the survey industry use penetration tests?

The parallel is clear. Just as IT pentests expose vulnerabilities before attackers exploit them, survey pentests could reveal weaknesses in data quality controls before fraudsters take advantage of them. Introducing such a practice could help restore trust in online surveys at a time when the industry faces a serious data quality crisis.

It may sound unusual that we tried to break survey quality checks, but our goal was to find out whether AI can stop AI from faking surveys. This effort was inspired by recent research showing that no single method reliably prevents bots from completing surveys undetected. That research makes one thing very clear: The main challenge in survey fraud prevention today is no longer catching inattentive or fraudulent humans; it is detecting and stopping advanced AI-driven bots.

Our work set out to examine whether our recently published recommendation, which combines AI-powered coherence checks with a content- and behavior-based analysis of open-ended responses, can stop even the most advanced AI bots from faking surveys.

From cybersecurity to survey security

In cybersecurity, a penetration test is a controlled attack that simulates real adversaries in order to uncover weaknesses under realistic conditions. The same principle applies to surveys. What matters is not whether a single quality check works on its own but whether the entire system holds up against an adversary who is actively trying to defeat it. Our experiment applied this idea by gradually escalating attacker capabilities and exposing the points where quality controls break when faced with determined, AI-assisted fraud.

We began with a simple automation bot that raced through a custom questionnaire, filled open-ended questions with placeholder text and failed every layer of defense. Next, we integrated a large language model to generate plausible open-ended responses. The content became more convincing but these entries were still flagged as bot-written. We then added a humanize function that reduced AI-text detection and made the wording appear more natural. Yet, because the text was still entered artificially, behavioral analysis continued to expose the bot. By simulating keystrokes with pauses, corrections and reading delays, and after several iterative adjustments, we managed to bypass the behavioral layer. In the final step, we turned the bot into a fully autonomous AI agent that answered all question types coherently and used a synthetic persona to maintain internal consistency throughout the interview. At this stage, the agent became very difficult to detect.
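To give a rough sense of that behavioral step, the sketch below generates human-like keystroke events with variable inter-key delays, word-boundary pauses and typo-then-backspace corrections. Everything here, from the function names to the timing parameters, is hypothetical; it is not the code we used, only a minimal illustration of the idea.

```python
import random

def humanlike_typing_events(text, wpm=40, typo_rate=0.03):
    """Yield (char, delay_seconds) events that mimic human typing:
    variable inter-key delays, occasional typos followed by a
    backspace correction, and longer pauses at word boundaries."""
    base_delay = 60.0 / (wpm * 5)  # roughly 5 characters per word
    for ch in text:
        # occasionally mistype a key, then correct it with a backspace
        if ch.isalpha() and random.random() < typo_rate:
            wrong = random.choice("asdfghjkl")
            yield wrong, max(0.02, random.gauss(base_delay, base_delay / 3))
            yield "\b", random.uniform(0.2, 0.5)  # pause before correcting
        pause = random.gauss(base_delay, base_delay / 3)
        if ch == " ":
            pause += random.uniform(0.05, 0.4)  # longer pause between words
        yield ch, max(0.02, pause)

def replay(events):
    """Reconstruct the final text, applying backspace corrections."""
    buf = []
    for ch, _ in events:
        if ch == "\b":
            buf.pop()
        else:
            buf.append(ch)
    return "".join(buf)
```

Replaying the event stream reproduces the intended text, while the recorded delays and corrections are what a behavioral analytics layer would observe.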

Why surveys need pentests now

This is not a reason to panic. Our experiment was a stress test, using insider knowledge and significant effort to deliberately bypass even the most advanced defenses. In real-world conditions, such a coordinated attack is rare.

The key lesson is that no system is completely unbreakable. Given enough time, skill and resources, any fraud detection can eventually be circumvented. This is precisely the point. If defeating the system requires disproportionate effort, fraud becomes economically unattractive. It is like spending months digging a tunnel into a bank only to steal a few thousand dollars. The cost of the attack simply outweighs the potential gain.

What should you do? Think of the burglar analogy. Your door does not have to be indestructible; it only needs to be more secure than your neighbor’s. In our industry, this means that those who adopt robust quality controls early are the ones who remain better protected. This shifts the question away from whether your survey is theoretically vulnerable and toward understanding where and how it is vulnerable in practice. A survey pentest provides exactly that insight. By simulating increasingly sophisticated attacks on your online survey, from simple automation scripts to fully autonomous AI agents with coherent personas, a pentest uncovers the precise combinations of questionnaire design choices and quality checks that allow non-human interviews to slip through.

The result is a quantified and reproducible account of failure points that can be addressed and then retested, just as security teams do after an IT penetration test. This approach turns survey quality assurance from an abstract promise into a measurable, verifiable standard.

A clearly defined target 

A well-designed survey pentest follows the same escalating approach as our experiment while respecting the realities of research operations. It starts with a clearly defined target, usually a specific questionnaire on a given survey platform. The pentest team first uses a simple automation bot to probe speed rules, minimum engagement thresholds and basic trap items. 

Next, AI-generated open-ended responses are added to test content filters and language-based detectors under different prompt styles. The process then simulates human interaction by reproducing keystrokes, natural typing delays, reading pauses and corrections to evaluate behavioral analytics on both desktop and mobile devices.

The fourth stage introduces a coherent AI agent that answers grid, single-select and multi-select questions as well as open-ends in a way that is logically consistent throughout the interview. The final stage assigns the agent a synthetic persona to ensure cross-item plausibility, which in our experiment was the point where detection became very challenging.
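The five stages described above can be summarized as an escalation ladder that pairs each attacker capability with the defensive layer it probes. This framing, including the stage names and the small report helper, is our own illustrative sketch rather than a formal standard:

```python
# Each pentest stage pairs an attacker capability with the
# defensive layer it is designed to probe.
PENTEST_STAGES = [
    ("simple automation bot", "speed rules, engagement thresholds, trap items"),
    ("LLM-generated open ends", "content filters and AI-text detectors"),
    ("humanized text entry", "behavioral analytics: typing, pauses, corrections"),
    ("coherent AI agent", "cross-question logical consistency checks"),
    ("synthetic persona", "cross-item plausibility and coherence evaluation"),
]

def report(results):
    """Given {stage_name: detected?} booleans, print a pass/fail
    matrix and return the first stage that slipped through."""
    first_breach = None
    for stage, probes in PENTEST_STAGES:
        detected = results.get(stage, False)
        status = "DETECTED" if detected else "BYPASSED"
        print(f"{status:8}  {stage}  ->  {probes}")
        if not detected and first_breach is None:
            first_breach = stage
    return first_breach
```

Running the ladder stage by stage makes the failure point explicit: the first bypassed stage identifies which defensive layer needs strengthening before the retest.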

The result is a comprehensive narrative report that includes evidence traces, pass-fail matrices for each layer, likely root causes and prioritized recommendations. These recommendations typically focus on strengthening the questionnaire’s ability to generate interdependent quality signals, for example by adding opportunities for coherence evaluation and crafting open-ended questions that elicit behaviors revealing advanced AI-driven fraud. They also address analytics, combining content, coherence and behavioral data into calibrated quality scores with conservative thresholds.
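As a toy example of that last point, the sketch below combines three per-layer scores into one quality score, with a conservative rule that prevents a single suspicious layer from being averaged away by strong scores elsewhere. The weights and threshold are illustrative placeholders, not calibrated values:

```python
def quality_score(content, coherence, behavior, weights=(0.3, 0.3, 0.4)):
    """Combine per-layer scores in [0, 1] into one quality score.
    Conservative rule: the overall score is capped near the weakest
    layer, so one strongly suspicious signal cannot be hidden by
    averaging with clean signals from the other layers."""
    w_c, w_h, w_b = weights
    combined = w_c * content + w_h * coherence + w_b * behavior
    return min(combined, min(content, coherence, behavior) + 0.3)

def flag(score, threshold=0.6):
    """True means the interview should be reviewed or rejected."""
    return score < threshold
```

With these placeholder numbers, a respondent who looks clean on content and coherence but suspicious on behavior (0.9, 0.9, 0.1) is still flagged, while one who is clean on all three layers passes.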

A follow-up pentest then verifies that the identified vulnerabilities are effectively closed under realistic attack conditions.

The bottom line for research leaders

Our research shows that the fight against survey fraud is increasingly similar to the fight against computer viruses. It requires constant adaptation. New fraud techniques must be tracked and countered in real time and detection systems need continuous refinement, expansion and reevaluation.

There is no single check that reliably detects all forms of fraud. What works today may fail tomorrow. Every individual method can be bypassed. Digital fingerprints can be faked with manipulated metadata and behavioral checks can be fooled by simulating human-like input patterns.

Only a layered defense that combines metadata, content-based analysis and behavioral checks can reliably catch advanced fraud. The solution is not a single silver bullet but an ongoing process that treats quality as a living system, something to be challenged, measured and improved.

Pentesting your surveys removes the guesswork from this process. It shows where defenses fail under real attack conditions, guides concrete improvements and gives your teams and clients confidence that your data is strong enough to support critical decisions. This is what credibility requires in the age of AI and it is possible today if we test ourselves before fraudsters test us. 

References

Berger, S. (2025) “How researchers’ mental shortcuts open the door to online survey fraud.” Quirk’s Marketing Research Review, Vol. 39, No. 1, pp. 32-33.

Berger, S. (2025) “Why online surveys need smarter quality assurance now.” Greenbook.org, May 20.

Berger, S., Mittermayr, J., and Witt, B. (2025) “Can AI stop AI from faking surveys?” ESOMAR Publication Series Volume S419, Congress 2025.

Sawhney, G., Bijlani, A. and DeSimone, J. A. (2025) “Are existing bot-detection techniques sufficient: An exploration with real bots.” Society for Industrial and Organizational Psychology Annual Conference, Denver.