Artificial Data, Real Insights: Evaluating Opportunities and Risks of Expanding the Data Ecosystem with Synthetic Data

Richard Timpone, Yongwei Yang

arXiv:2408.15260 · cs.HC · Published 2024-08-10

Synthetic Data is not new, but recent advances in Generative AI have raised interest in expanding the research toolbox, creating new opportunities and risks. This article provides a taxonomy of the full breadth of the Synthetic Data domain. We discuss its place in the research ecosystem by linking advances in computational social science with the idea of the Fourth Paradigm of scientific discovery, which integrates elements of the evolution from empirical to theoretical to computational models. Further, leveraging the framework of Truth, Beauty, and Justice, we discuss how evaluation criteria vary across use cases as the information is used to add value and draw insights. Building a framework to organize different types of synthetic data, we end by describing the opportunities and challenges, with detailed examples of using Generative AI to create synthetic quantitative and qualitative datasets, and by discussing the broader spectrum, including synthetic populations, expert systems, survey data replacement, and personabots.

Topics: Generative Models & Discovery

Tags: scientific-discovery

arXiv categories: cs.HC, cs.CY, stat.ME
