Training a generalizable diffusion model for seismic data processing using a large-scale open-source waveform dataset

Xinyue Gong, Sergey Fomel, Yangkang Chen

arXiv:2603.13645·physics.geo-ph·Published 2026-03-13

We introduce the Seismic Waveforms dataset for Automatic Neural-network processing (SWAN), a comprehensive and standardized benchmark designed to advance data-driven seismic signal processing. SWAN aggregates diverse synthetic and real seismic waveforms spanning a wide range of geological structures, noise conditions, propagation environments, and acquisition geometries, providing a unified foundation for training highly generalizable models. Leveraging this dataset, we develop and evaluate a conditionally constrained residual diffusion model for core seismic processing tasks, focusing on missing-trace reconstruction. Extensive experiments demonstrate that diffusion models trained on SWAN achieve state-of-the-art performance across heterogeneous testing scenarios, outperforming leading deep-learning and physics-based baselines on both synthetic benchmarks and field data examples. The results highlight SWAN's value as both a scalable training corpus and a rigorous evaluation framework, and illustrate the strong potential of diffusion-based architectures for robust, generalizable seismic data processing.

TopicsGenerative Models & Discovery

Tagsdiffusion-models

arXiv categoriesphysics.geo-ph

arXiv abstract page PDF