Fascinating Fakery

Michael Wagstaff • 2 June 2025

Is synthetic data the panacea for an industry struggling with data quality issues?

Market research is caught in a relentless race towards speed, affordability and scalability, driven by the increasingly urgent demands of modern business. However, this pursuit comes at a price: data quality. The industry today is littered with poorly engaged respondents, bots, rushed completions and outright fraudulent entries. Now, generative AI has entered the room, capable of creating convincing "synthetic respondents."


This raises a critical question: should market researchers embrace AI-generated responses, or is this a Pandora's box best left unopened?


The data quality struggle

Traditional surveys have long struggled with data quality. Speeders who race through surveys without thoughtful consideration, straight-liners who select the same answer repetitively and random clickers who choose options without engaging are not just annoyances; they actively undermine insight integrity. Many panellists are driven by incentives, turning data collection into a transactional, rather than reflective, exercise. Add to this the widespread issue of survey fatigue, exacerbated by lengthy, tedious or poorly designed questionnaires, and the conditions are ripe for low-quality responses. These concerns become especially evident in open-ended questions, where copy-pasted answers and shallow insights often dominate. Before even considering AI-generated text, much of the qualitative input we currently receive is already alarmingly thin.
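The screening problems described above lend themselves to simple heuristics. As a minimal sketch (the thresholds and function shape are illustrative assumptions, not an industry standard):

```python
# Flag low-quality survey completes - speeders and straight-liners -
# using simple illustrative heuristics.

def flag_respondent(answers, seconds_taken, median_seconds):
    """Return a list of quality flags for one respondent.

    answers        -- list of numeric scale responses (e.g. 1-5 Likert grid)
    seconds_taken  -- this respondent's completion time in seconds
    median_seconds -- median completion time across the sample
    """
    flags = []
    # Speeder: finished in under a third of the median time (assumed cutoff).
    if seconds_taken < median_seconds / 3:
        flags.append("speeder")
    # Straight-liner: every answer in the grid is identical.
    if len(set(answers)) == 1:
        flags.append("straight_liner")
    return flags
```

For example, a respondent who answers "3" to every item and finishes in 40 seconds against a 300-second median would be flagged on both counts, while a varied, normally paced respondent passes clean.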


Enter generative AI

Generative AI presents itself as both intriguing and potentially problematic. Synthetic respondents are essentially AI-generated outputs that simulate human responses. These virtual participants can convincingly articulate answers to survey questions, drawing on large language models trained on vast datasets of human-produced text. Academia and commercial businesses have already begun experimenting with this technology, attracted by its substantial promise: faster turnaround times, dramatically reduced costs and the ability to simulate niche or difficult-to-reach populations effortlessly. In addition, synthetic respondents offer the unique ability to rigorously scenario-test surveys, ensuring robustness before deployment to real-world audiences.
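In practice, a synthetic respondent is usually little more than a persona-conditioned prompt sent to a large language model. A minimal sketch of how such a prompt might be assembled follows; the persona fields and wording are illustrative assumptions, and the model call is a stub rather than any real API:

```python
# Sketch: building a persona-conditioned prompt for a "synthetic
# respondent". The persona fields are illustrative; call_model is a
# stand-in for whichever LLM API would actually be used.

def build_prompt(persona, question):
    """Compose a survey question as seen through an assumed persona."""
    return (
        f"You are a {persona['age']}-year-old {persona['occupation']} "
        f"living in {persona['location']}.\n"
        f"Answer the following survey question in one or two sentences, "
        f"in character:\n{question}"
    )

def call_model(prompt):
    # Placeholder: in real use this would call an LLM API.
    return "[synthetic answer would appear here]"

persona = {"age": 34, "occupation": "teacher", "location": "Leeds"}
prompt = build_prompt(persona, "How often do you shop online, and why?")
answer = call_model(prompt)
```

The point of the sketch is how thin the layer is: everything the "respondent" says is conditioned on a few persona fields plus whatever the model's training data associates with them, which is exactly where the validity and bias concerns below originate.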


Ethical minefield and bias

However, synthetic data represents an ethical and methodological minefield. Chief among these issues is validity: can an AI truly replicate nuanced human behaviour, complete with emotional responses, subtle biases and unpredictable cognitive patterns? AI-generated outputs might sound plausible, but they lack genuine lived experience and authentic emotional engagement. Furthermore, transparency becomes paramount. Shouldn't clients and stakeholders be explicitly informed when synthetic data has influenced - or replaced - real participant responses? Without transparency, trust and credibility in market research could be severely compromised.


Bias amplification poses another significant risk. AI models inherit and reflect the biases present within their training datasets. If unchecked, synthetic respondents might unwittingly perpetuate or even exacerbate existing stereotypes, shaping business strategies around fundamentally flawed insights. Equally troubling is the risk of overconfidence. Beautifully constructed but ultimately ungrounded responses can create a dangerous illusion of insight, potentially leading researchers down misguided strategic paths.


Legitimate uses

There are, however, legitimate and constructive uses for synthetic data, provided clear boundaries are maintained. Pre-testing survey logic and refining question wording are ideal candidates, offering a practical, low-risk entry point into synthetic applications. AI-generated responses can also be beneficial for creating "expected" outputs that train or validate analytical models. Additionally, synthetic respondents can spur creativity and idea generation in the exploratory phases of qualitative research.

But there are clear ethical lines that must not be crossed. Replacing genuine respondents entirely in quantitative studies is deeply problematic, undermining the very essence of market research - authentic human understanding. Equally concerning is mixing real and synthetic data without disclosure, a practice that risks deceit and erodes stakeholder trust. Using synthetic data to artificially "validate" findings also threatens methodological integrity, distorting reality and potentially leading to misguided business decisions.


The implications for the market research industry

The rise of synthetic data carries profound implications for market research professionals. AI literacy must now rise just as rapidly: the industry needs to take stock and learn how to critically assess AI-generated outputs, distinguishing genuine human insight from artificially intelligent mimicry. Data quality needs reframing: beyond merely eliminating poor respondents, researchers must now rigorously question the origins and authenticity of every data point. This evolution necessitates updated industry standards and ethical guidelines that explicitly address the legitimate use and disclosure of synthetic data.


While caution is essential, outright dismissal of synthetic respondents should be avoided - for the moment at least. Generative AI undoubtedly holds potential for innovation and efficiency in market research. The challenge lies in harnessing this potential responsibly. The industry should proceed with a balanced mix of curiosity and vigilance, prioritising transparency, rigorous methodology and the preservation of trust above all else.

