SynPAT: A System for Generating Synthetic Physical Theories with Data

Jonathan Lenchner, Karan Srivastava, Joao Goncalves, Mark Squillante, Lior Horesh

arXiv:2505.00878 · cs.SC · Published 2025-05-01 · Updated 2026-02-04

Machine-assisted methods for discovering physical laws from background theory and data have recently emerged, promising to advance our understanding of the physical world. However, training and benchmarking these systems remain challenging: real physical theories are limited in number. To address this need, we introduce SynPAT, a system for generating synthetic physical theories with accompanying data. SynPAT produces: (i) a consistent set of axioms forming a synthetic theory, (ii) a symbolic consequence of these axioms representing the discovery target, and (iii) noisy data approximating this consequence. Crucially, to mirror historically incorrect theories (e.g., Newtonian mechanics before Special Relativity), SynPAT can also generate theories whose axioms do not strictly entail, and in fact conflict with, the observed consequence, requiring a correction to the assumed axioms to bridge the gap. We detail SynPAT's methodology and benchmark several open-source symbolic regression systems on our generated theories and data.
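
To make the three outputs concrete, here is a minimal Python sketch of a hand-written toy theory. It is not SynPAT's generator or API: the two axioms, the function names `consequence` and `sample_noisy_data`, the sampling ranges, and the multiplicative Gaussian noise model are all assumptions chosen for illustration.

```python
import random

# Hypothetical toy "theory" (an assumption, not produced by SynPAT):
#   Axiom 1: F = m * a
#   Axiom 2: a = v / t
# Eliminating a gives the consequence (the discovery target): F = m * v / t.


def consequence(m, v, t):
    """Target law implied by the two toy axioms."""
    return m * v / t


def sample_noisy_data(n, noise=0.05, seed=0):
    """Sample n rows (m, v, t, F) from the consequence.

    F is perturbed by multiplicative Gaussian noise with relative
    standard deviation `noise` (an assumed noise model).
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        m = rng.uniform(1.0, 10.0)
        v = rng.uniform(1.0, 10.0)
        t = rng.uniform(1.0, 10.0)
        f = consequence(m, v, t) * (1.0 + rng.gauss(0.0, noise))
        rows.append((m, v, t, f))
    return rows


data = sample_noisy_data(100)
```

A symbolic regression system would receive rows like those in `data`, possibly together with the axioms, and be asked to recover the target law. In this sketch, the inconsistent case described in the abstract would correspond to sampling the data from a law that the stated axioms do not imply.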

Topics: Generative Models & Discovery

Tags: symbolic-regression

arXiv categories: cs.SC
