Taming Multi-Domain, -Fidelity Data: Towards Foundation Models for Atomistic Scale Simulations
Tomoya Shiota, Kenji Ishihara, Tuan Minh Do, Toshio Mori, Wataru Mizukami
arXiv:2412.13088 · physics.chem-ph · Published 2024-12-17 · Updated 2025-11-10
Machine learning interatomic potentials (MLIPs) are changing atomistic simulations in chemistry and materials science. However, constructing a single universal MLIP that accurately models both molecular and crystalline systems remains challenging. A central obstacle is the integration of diverse datasets generated under different computational conditions. We present Total Energy Alignment (TEA), an approach that enables the integration of heterogeneous quantum chemical datasets without redundant calculations. Using TEA, we trained MACE-Osaka24, the first open-source MLIP based on a unified dataset covering molecular and crystalline systems. This universal model performs strongly across diverse chemical systems: it predicts organic reaction barriers with accuracy similar to or better than that of specialized models, while maintaining state-of-the-art accuracy for inorganic systems. These advances pave the way for accelerated discovery in chemistry and materials science through genuine foundation models for chemistry.
Topics: Machine-Learned Potentials for Sulfides and Minerals
Tags: mace, mlip
arXiv categories: physics.chem-ph, cond-mat.mtrl-sci