Fascinating Fakery
Is synthetic data the panacea for an industry struggling with data quality issues?

Market research is caught in a relentless race towards speed, affordability and scalability, driven by the increasingly urgent demands of modern business. However, this pursuit comes at a price: data quality. The industry today is littered with poorly engaged respondents, bots, rushed completions and outright fraudulent entries. Now, generative AI has entered the room, capable of creating convincing "synthetic respondents."
This raises a critical question: should market researchers embrace AI-generated responses, or is this a Pandora's box best left unopened?
The data quality struggle
Traditional surveys have long struggled with data quality. Speeders who race through surveys without thoughtful consideration, straight-liners who select the same answer repetitively and random clickers who choose options without engaging are not just annoyances; they actively undermine insight integrity. Many panellists are driven by incentives, turning data collection into a transactional, rather than reflective, exercise. Add to this the widespread issue of survey fatigue, exacerbated by lengthy, tedious or poorly designed questionnaires, and the conditions are ripe for low-quality responses. These concerns become especially evident in open-ended questions, where copy-pasted answers and shallow insights often dominate. Before even considering AI-generated text, much of the qualitative input we currently receive is already alarmingly thin.
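For readers who want a concrete picture, the sketch below shows one common way such checks are automated. It is a minimal illustration, not a production pipeline: it assumes responses sit in a pandas DataFrame with a hypothetical duration_seconds column and grid questions q1 to q5, and real screening combines many more signals with study-specific thresholds.

```python
import pandas as pd

def flag_suspect_respondents(df: pd.DataFrame, min_seconds: float = 120.0) -> pd.DataFrame:
    """Flag speeders and straight-liners; the threshold is study-specific."""
    likert_cols = ["q1", "q2", "q3", "q4", "q5"]

    # Speeders: completion time far below a plausible minimum.
    df["is_speeder"] = df["duration_seconds"] < min_seconds

    # Straight-liners: a single unique answer across the whole grid.
    df["is_straightliner"] = df[likert_cols].nunique(axis=1) == 1

    df["suspect"] = df["is_speeder"] | df["is_straightliner"]
    return df

# Example: two respondents; the second one both speeds and straight-lines.
sample = pd.DataFrame({
    "duration_seconds": [410, 45],
    "q1": [4, 3], "q2": [2, 3], "q3": [5, 3], "q4": [3, 3], "q5": [4, 3],
})
print(flag_suspect_respondents(sample)[["is_speeder", "is_straightliner"]])
```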
Enter generative AI
Generative AI presents itself as both intriguing and potentially problematic. Synthetic respondents are essentially AI-generated outputs that simulate human responses. These virtual participants can convincingly articulate answers to survey questions, drawing on large language models trained on vast datasets of human-produced text. Academia and commercial businesses have already begun experimenting with the technology, attracted by its substantial promise: faster turnaround times, dramatically reduced costs and the ability to simulate niche or difficult-to-reach populations effortlessly. In addition, synthetic respondents offer the unique ability to rigorously scenario-test surveys, ensuring robustness before deployment to real-world audiences.
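In practice, a synthetic respondent is often little more than a persona-conditioned prompt sent to a language model. The sketch below uses the OpenAI Python client as one illustration; the persona, question and model name are assumptions made for the example, not a recommended or standard setup.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Illustrative persona and question; real studies would vary both systematically.
persona = "a 34-year-old urban commuter who buys groceries online weekly"
question = "How do you decide which delivery slot to book?"

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model name; swap for whatever is available
    messages=[
        {"role": "system",
         "content": f"Answer survey questions as {persona}. Be concise."},
        {"role": "user", "content": question},
    ],
)
print(response.choices[0].message.content)
```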
Ethical minefield and bias
However, synthetic data represents an ethical and methodological minefield. Chief among these concerns is validity: can an AI truly replicate nuanced human behaviour, complete with emotional responses, subtle biases and unpredictable cognitive patterns? AI-generated outputs might sound plausible, but they lack genuine lived experience and authentic emotional engagement. Transparency is equally paramount: shouldn't clients and stakeholders be explicitly informed when synthetic data has influenced - or replaced - real participant responses? Without transparency, trust and credibility in market research could be severely compromised.
Bias amplification poses another significant risk. AI models inherit and reflect the biases present within their training datasets. If unchecked, synthetic respondents might unwittingly perpetuate or even exacerbate existing stereotypes, inadvertently shaping business strategies around fundamentally flawed insights. Equally troubling is the risk of overconfidence. Beautifully constructed but ultimately ungrounded responses can create a dangerous illusion of insight, potentially leading researchers down misguided strategic paths.
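One practical safeguard is to benchmark the composition of any synthetic sample against known population figures before trusting it. The sketch below illustrates the idea with a chi-square goodness-of-fit test; the counts and proportions are invented purely for demonstration.

```python
from scipy.stats import chisquare

# Hypothetical age-band counts from a synthetic sample (18-34, 35-54, 55+).
synthetic_counts = [480, 320, 200]
# Assumed census proportions for the same bands.
benchmark_share = [0.38, 0.37, 0.25]
expected = [sum(synthetic_counts) * p for p in benchmark_share]

# Small p-value: the synthetic mix diverges from the benchmark population.
stat, p_value = chisquare(synthetic_counts, f_exp=expected)
if p_value < 0.05:
    print(f"Synthetic sample diverges from benchmark (p={p_value:.3f})")
```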
Legitimate uses
There are, however, legitimate and constructive uses for synthetic data, provided clear boundaries are maintained. Pre-testing survey logic or refining wording is an ideal candidate, offering a practical, low-risk entry point into synthetic applications. AI-generated responses can also be beneficial for creating "expected" outputs that train or validate analytical models. Additionally, synthetic respondents can spur creativity and idea generation in the exploratory phases of qualitative research.
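The pre-testing use case is the easiest to make concrete. In the hedged sketch below, simulated respondents are pushed through a hypothetical skip-logic function to verify a routing invariant before any real participant sees the survey; the questions and rules are illustrative assumptions, not a real questionnaire.

```python
import random

def route(answers: dict) -> list[str]:
    """Hypothetical skip logic: only car owners see the brand question."""
    path = ["q1_owns_car"]
    if answers["q1_owns_car"] == "yes":
        path.append("q2_car_brand")
    path.append("q3_commute_mode")
    return path

for _ in range(1000):
    answers = {"q1_owns_car": random.choice(["yes", "no"])}
    path = route(answers)
    # Routing invariant: the brand question appears if and only if the
    # simulated respondent owns a car.
    assert ("q2_car_brand" in path) == (answers["q1_owns_car"] == "yes")

print("Routing logic held for 1,000 simulated respondents")
```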
But there are clear ethical lines that must not be crossed. Replacing genuine respondents entirely in quantitative studies is deeply problematic, undermining the very essence of market research - authentic human understanding. Equally concerning is mixing real and synthetic data without disclosure, a practice that verges on deception and erodes stakeholder trust. Using synthetic data to artificially "validate" findings also threatens methodological integrity, distorting reality and potentially leading to misguided business decisions.
The implications for the market research industry
The rise of synthetic data carries profound implications for market research professionals. As AI literacy increases rapidly, the industry needs to take stock and learn how to critically assess AI-generated outputs, distinguishing genuine human insight from artificially intelligent mimicry. Data quality also needs reframing: beyond merely eliminating poor respondents, researchers must now rigorously question the origins and authenticity of every data point. This evolution necessitates updated industry standards and ethical guidelines, explicitly addressing the legitimate use and disclosure of synthetic data.
While caution is essential, outright dismissal of synthetic respondents should be avoided - for the moment at least. Generative AI undoubtedly holds potential for innovation and efficiency in market research. The challenge lies in harnessing this potential responsibly. The industry should proceed with a balanced mix of curiosity and vigilance, prioritising transparency, rigorous methodology and the preservation of trust above all else.