ColliderML: The First Release of an OpenDataDetector High-Luminosity Physics Benchmark Dataset
Doğa Elitez, Paul Gessinger, Daniel Murnane, Marcus Selchou Raaholt, Andreas Salzburger, Stine Kofoed Skov, Andreas Stefl, Anna Zaborowska
arXiv:2512.15230 · hep-ex · Published 2025-12-17
We introduce ColliderML, a large, open, experiment-agnostic dataset of fully simulated and digitised proton-proton collisions in High-Luminosity Large Hadron Collider conditions ($\sqrt{s} = 14$ TeV, mean pile-up $\mu = 200$). ColliderML provides one million events across ten Standard Model and Beyond Standard Model processes, plus extensive single-particle samples, all produced with modern next-to-leading-order matrix element calculation and showering, realistic per-event pile-up overlay, a validated OpenDataDetector geometry, and standard reconstructions. The release fills a major gap for machine learning (ML) research on detector-level data and is provided on the ML-friendly Hugging Face platform. We present the physics coverage and the generation, simulation, digitisation and reconstruction pipeline, describe the data format and access, and report initial collider physics benchmarks.
Topics: Particle & High Energy Physics
Tags: collider-physics
arXiv categories: hep-ex, cs.LG, physics.data-an, physics.ins-det