Defining synthetic respondents
Editor’s note: Carlos Ochoa is innovation manager at Netquest-Bilendi, Barcelona, Spain.
It's the latest revolution that promises to change everything. The innovation that, this time, seems destined to end surveys: synthetic respondents.
While we're still trying to absorb the impact of artificial intelligence (AI) on our lives, new and surprising uses of the technology keep appearing across activities and industries. AI is already being used to replace or complement the work of translators and editors, legal advisors, computer programmers, designers and more.
But what are synthetic respondents? The idea is related to the more general concept of synthetic data: artificial data created to mimic the statistical characteristics of real data without containing information from real people. Such data is usually generated through algorithms, simulations or statistical models, with the aim of reproducing the patterns and correlations of real data in contexts where obtaining it would be impossible or too costly. For example, if we want to know what volume of patients a hospital emergency department can handle, we can simulate the waiting times that different patient volumes would produce, using real data about when visits tend to accumulate, how long care usually takes and so on.
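As an illustration of how such a simulation could work, here is a minimal Python sketch of the emergency-care example; the single-queue setup, the arrival rates and the 10-minute average care time are purely illustrative assumptions, not real hospital figures.

```python
import random
import statistics

def simulate_waits(patients_per_hour, mean_care_minutes, hours=24, seed=1):
    """Generate synthetic arrivals and care times for one day; return waiting times in minutes."""
    rng = random.Random(seed)
    arrivals, t = [], 0.0
    while t < hours * 60:
        t += rng.expovariate(patients_per_hour / 60)  # synthetic time between arrivals
        arrivals.append(t)
    waits, free_at = [], 0.0  # single treatment bay, first come first served
    for arrival in arrivals:
        start = max(arrival, free_at)
        waits.append(start - arrival)
        free_at = start + rng.expovariate(1 / mean_care_minutes)  # synthetic care duration
    return waits

for volume in (3, 5, 7):  # illustrative patient volumes per hour
    waits = simulate_waits(volume, mean_care_minutes=10)
    print(f"{volume} patients/hour -> average wait {statistics.mean(waits):.1f} minutes")
```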
Synthetic data is not new. Its use dates back to the mid-20th century, when statistical techniques were developed for generating simulated data from known probability distributions, with the less-than-laudable purpose of developing the atomic bomb.
But could synthetic data eventually replace the data we obtain from consumers through surveys? In other words, can we create synthetic respondents? This would allow researchers to perform statistical analyses and make decisions just as we currently do with survey data, but at lower cost, with greater speed and without the privacy issues associated with personal information.
The emergence of large language models (LLMs) – such as ChatGPT (OpenAI), Gemini (Google) or Copilot (Microsoft) – opens the door to an ambitious idea: generating data without depending on real people. Observing the capacity of these models to answer questions with coherent, well-structured reasoning – in many cases indistinguishable from what a human would give – it becomes almost inevitable to consider using them to simulate human responses. And, certainly, the proposal is as appealing as it is promising.
Two ways to use AI to generate synthetic respondents
How can we use AI to replace responses from real people in quantitative studies? Fundamentally, there are two strategies.
The first is to use AI to simulate individual survey respondents.
The idea consists of defining the different population profiles we want to investigate and asking the AI, through detailed instructions provided as prompts (that is, textual instructions that guide the model on what and how it should respond), to generate plausible responses for each of those profiles. For example, in an urban mobility study, I could ask the AI to answer a questionnaire about transportation use in daily city trips as if it were a 25-year-old man, or as if it were a 45-year-old woman. If I request multiple responses for each profile, I end up with a data set comparable to what I would obtain through a traditional survey.
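A minimal sketch of what this first strategy could look like in practice, assuming the openai Python client is available; the model name, the profiles and the question are illustrative choices, not a recommended setup.

```python
# Sketch: asking an LLM to answer a survey question "as" different population profiles.
# Assumes the openai package and an API key; model, profiles and question are illustrative.
from openai import OpenAI

client = OpenAI()

PROFILES = [
    "a 25-year-old man living in a large city",
    "a 45-year-old woman living in the same city",
]

QUESTION = (
    "Which means of transportation do you mainly use for your daily trips in the city? "
    "Answer with one of: public transportation, private transportation, bicycle, walking."
)

def ask_as_profile(profile, n_responses=10):
    """Collect several simulated answers for one population profile."""
    answers = []
    for _ in range(n_responses):
        reply = client.chat.completions.create(
            model="gpt-4o-mini",   # illustrative model name
            temperature=1.0,       # allow variation between repetitions
            messages=[
                {"role": "system", "content": f"Answer as if you were {profile}."},
                {"role": "user", "content": QUESTION},
            ],
        )
        answers.append(reply.choices[0].message.content.strip())
    return answers

for profile in PROFILES:
    print(profile, "->", ask_as_profile(profile))
```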
The second strategy consists of skipping the simulation of individual responses and asking the model to respond directly and in aggregate form to the research questions.
Following the example above, this would mean asking the AI to estimate what percentage of each population profile uses each type of transportation and what the main reasons are. This strategy is equivalent to receiving a final analysis of data that we have never actually observed.
Each strategy has its advantages and disadvantages. Generating individual responses through AI offers greater granularity of information. LLMs don't always deliver the same response to the same request; they generate possible responses, each with a different probability of being correct or appropriate. By adjusting a parameter known as temperature, we can control the degree of variability in responses. With low temperature, the model tends to always give the most probable response. With high temperature, the model allows greater variation. If we set a high temperature and repeat the same request several times, we'll obtain different responses whose frequencies will approximately reflect their relative probabilities. We can take advantage of this by requesting multiple responses for the same simulated profile, thus capturing some of the variability of opinions or behaviors within the population we want to study.
In other words: if we ask a simulated profile – for example, a 25-year-old man – what means of transportation he usually uses, and the model estimates an 80% probability that he'll respond "public transportation" and 20% that he'll say "private transportation," then with low temperature and 100 repetitions we would always get the same response ("public transportation"), whereas with high temperature we would get roughly 80 responses of "public transportation" and 20 of "private transportation."
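To make the frequency argument concrete, here is a small simulation that involves no LLM at all: it simply contrasts a degenerate "low temperature" sampler, which always returns the most probable option, with a "high temperature" sampler that draws answers in proportion to the assumed 80/20 probabilities.

```python
import random
from collections import Counter

rng = random.Random(0)
options = ["public transportation", "private transportation"]
probs = [0.8, 0.2]  # assumed probabilities for the simulated profile

# "Low temperature": always pick the most probable option.
low_temp = [options[probs.index(max(probs))] for _ in range(100)]

# "High temperature": sample answers in proportion to their probabilities.
high_temp = rng.choices(options, weights=probs, k=100)

print("low temperature: ", Counter(low_temp))   # 100 x "public transportation"
print("high temperature:", Counter(high_temp))  # roughly 80 / 20
```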
By contrast, generating aggregate responses takes us directly to the end of the process. We're asking the model to perform certain reasoning – if the term is allowed – to directly estimate the expected response distributions for certain questions. The LLM operates here in a substantially different way: it tries to estimate those distributions from the data present in its training corpus. Following the example, it will look for information it has seen during training about public transportation use in the studied city and, if it doesn't find specific data, it may resort to available information from cities with similar characteristics. It would do something similar to what we humans would do: combine information from secondary sources with common sense.
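A minimal sketch of this second, aggregate strategy, again assuming the openai client; the prompt wording, the city and the JSON response format are illustrative assumptions, and a real implementation would need to validate the model's output.

```python
# Sketch: asking an LLM directly for an aggregate distribution instead of individual answers.
# Assumes the openai package; model name, city and response format are illustrative.
import json
from openai import OpenAI

client = OpenAI()

prompt = (
    "Estimate the share of daily trips made by public transportation, private transportation, "
    "bicycle and walking among adults in Barcelona. "
    'Reply only with JSON, e.g. {"public transportation": 0.45, ...}, with shares summing to 1.'
)

reply = client.chat.completions.create(
    model="gpt-4o-mini",   # illustrative model name
    temperature=0,         # we want the model's single best estimate, not variation
    messages=[{"role": "user", "content": prompt}],
)

# Note: in practice the reply may need cleaning before it parses as JSON.
estimated_shares = json.loads(reply.choices[0].message.content)
print(estimated_shares)
```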
Synthetic respondent use cases: Description or prediction
The principle behind synthetic respondents has a certain logic, but does it really work? Here we can distinguish two main use cases.
Descriptive use
The first could be termed descriptive use: asking an LLM to estimate already existing behaviors. For example, we could ask it what percentage of the population consumes energy drinks or, more narrowly, a specific brand of such drinks. In these cases, models usually offer good results, although what they do doesn't differ much from what we could achieve ourselves by searching for already available consumption reports and combining them with data such as advertising spending or demographic statistics. LLMs can search for and combine data to generate coherent responses. However, this type of study is nowadays usually resolved with secondary data sources, without needing to resort to surveys.
Predictive use
The real problem arises with predictive use: anticipating behavior that is future, or present but not observable, including opinions on topics that have never been raised before. Most problems in commercial research with primary sources belong to this category, for example:
- What percentage of the population would buy a new product?
- Why do consumers prefer one brand over another?
- What proportion of consumers remember having seen a particular advertising campaign?
Can an LLM really respond accurately to this type of question, for which it doesn't have solid evidence in its training data or in sources accessible on the internet?
Empirical tests: AI to complement or replace survey data
Last July, the European Survey Research Association (ESRA) held its biennial conference – a leading forum for methodological survey research, but also for the development of alternative methods of data collection and analysis. The application of AI to complement or replace survey data aroused great interest there.
One of the conference tracks was specifically dedicated to "Synthetic data generation and imputation with LLM." In particular, the first presentation, by Leah von der Heyde (LMU Munich, Munich Center for Machine Learning), shared the results of an experiment evaluating the capacity of LLMs to substitute for survey respondents in predicting the results of the 2024 European Parliament elections. The key question of the study: "Can LLMs predict the aggregate results of future elections?"
To answer this question, the researchers used three LLMs to predict the electoral behavior of 26,000 European voters, providing the models with individual information about each voter's profile according to the real demographic composition of the population, and compared the generated responses with the actual results. They also attempted to obtain aggregate estimates with the same models.
The results were, in general, disastrous. Substantial differences were observed across countries and languages, and accuracy depended largely on whether the prompts included attitudinal information in addition to sociodemographic data. The study's authors emphasized the limited applicability of LLM-generated synthetic samples for predicting public opinion, which casts doubt on other possible uses in market research. One example: the average electoral turnout predicted by the models was 83%, when the real figure was 49%.
But why does synthetic data fail in this type of task? Several researchers – including the authors of this study – mention factors such as biases in training data, overrepresentation of certain groups, the inherent complexity of social and political dynamics, the digital divide affecting certain population segments and the hallucinations that sometimes occur in LLM responses.
I would go even further: the real question is not why synthetic data fails but why we would expect it to work. LLMs identify relationships between words (or parts of words, tokens) from vast training texts, using architectures with billions of parameters. These relationships concentrate both human knowledge and, to a certain extent, the logic that articulates it, which allows the models to emulate human reasoning in their responses. But how could an LLM faithfully and representatively reproduce behaviors that have never been observed?
The results described are devastating. Even so, some providers already offer solutions based on synthetic respondents, especially oriented toward conducting qualitative interviews with certain profiles of interest. I don't think this is coincidental, for two reasons:
- LLMs are terribly convincing in their responses, which usually sound sensible and logical whether or not they are correct.
- In qualitative studies, we don't have an objective truth against which to check results, so no one can easily refute the apparent value of the information obtained.
The use of synthetic personas may have value for the researcher, but that value probably lies more in providing a well-informed interlocutor with whom to explore hypotheses or debate ideas than in faithfully representing a typical member of the group of interest. This can be useful in the early phases of research to detect promising proposals, but it could never completely replace data generated by humans, as pointed out in a study published in Harvard Business Review by Brand, Israeli and Ngwe.
In short, as Nik Samoylov of Conjointly pointed out, synthetic data could be something like the homeopathy of market research: there's no evidence that it works, but many people still believe in it.
AI helping researchers with surveys
Despite the above, AI seems destined to play a fundamental role in market research. Several presentations at ESRA addressed precisely these possible uses, summarized in Reveilhac's (2025) presentation, which include:
- Questionnaire design
- Translation and adaptation
- Development of questionnaires capable of adapting to participants' responses
- Forecasting and prevention of nonresponse
- Interpretation and coding of open responses (see the sketch after this list)
- Data quality control
- Imputation of missing values
- Interactive analysis through natural language instructions ("talk to data")
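As an illustration of one of these uses, here is a minimal sketch of LLM-assisted coding of open-ended responses, assuming the openai client; the categories, the model name and the example answer are invented for illustration.

```python
# Sketch: assigning each open-ended survey answer to one predefined category with an LLM.
# Assumes the openai package; model name, categories and example answer are illustrative.
from openai import OpenAI

client = OpenAI()

CATEGORIES = ["price", "convenience", "brand image", "habit", "other"]

def code_open_answer(answer):
    """Return one category label for a verbatim open-ended response."""
    reply = client.chat.completions.create(
        model="gpt-4o-mini",   # illustrative model name
        temperature=0,         # deterministic coding
        messages=[
            {
                "role": "system",
                "content": "Classify the survey answer into exactly one of: "
                           f"{', '.join(CATEGORIES)}. Reply with the category only.",
            },
            {"role": "user", "content": answer},
        ],
    )
    return reply.choices[0].message.content.strip()

print(code_open_answer("I buy it because it is the cheapest option in my supermarket."))
```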
In short, the survey – so often declared dead (with the emergence of the internet, social networks and passive data) – remains more alive than ever and, paradoxically, could emerge strengthened by the arrival of AI.
References:
Brand, J., Israeli, A. and Ngwe, D. “Using Gen AI for early-stage market research.” Harvard Business Review, July 2025. https://hbr.org/2025/07/using-gen-ai-for-early-stage-market-research
Reveilhac, M. “Advancing survey research through AI and machine learning: Current applications and future directions” (conference session). European Survey Research Association (ESRA) Conference 2025, Utrecht, Netherlands. https://www.europeansurveyresearch.org/conf2025/prog.php?sess=137#main
von der Heyde, L., Haensch, A.-C., Wenz, A. and Ma, B. “United in diversity? Contextual biases in LLM‑based predictions of the 2024 European Parliament elections” (Version 2) [Preprint]. arXiv, 2024. https://doi.org/10.48550/arXiv.2409.09045