A Simple and Scalable Kernel Density Approach for Reliable Uncertainty Quantification in Atomistic Machine Learning
Daniel Willimetz, Lukáš Grajciar
arXiv:2508.14613 · physics.chem-ph · Published 2025-08-20
Machine learning models are increasingly used to predict material properties and accelerate atomistic simulations, but the reliability of their predictions depends on the representativeness of the training data. We present a scalable, GPU-accelerated uncertainty quantification framework based on $k$-nearest-neighbor kernel density estimation (KDE) in a PCA-reduced descriptor space. This method efficiently detects sparsely sampled regions in large, high-dimensional datasets and provides a transferable, model-agnostic uncertainty metric without requiring costly retraining of model ensembles. The framework is validated across diverse case studies that vary in (i) chemistry, (ii) prediction models (including a foundational neural network), (iii) descriptors used for the KDE, and (iv) properties whose uncertainty is sought. In all cases, the KDE-based score reliably flags extrapolative configurations, correlates well with conventional ensemble-based uncertainties, and highlights regions of reduced prediction trustworthiness. The approach offers a practical route for improving the interpretability, robustness, and deployment readiness of ML models in materials science.
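To make the idea of a $k$-nearest-neighbor KDE score in a PCA-reduced descriptor space concrete, the following is a minimal CPU sketch in NumPy. It is not the authors' GPU-accelerated implementation: the function names (`fit_pca`, `knn_kde_score`), the Gaussian kernel, the fixed `bandwidth`, the value of `k`, the brute-force distance computation, and the synthetic "descriptors" are all assumptions chosen for illustration.

```python
import numpy as np


def fit_pca(X, n_components):
    """Return the mean and the leading principal axes of X (hypothetical helper)."""
    mean = X.mean(axis=0)
    # SVD of the centered data; rows of vt are the principal axes.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def knn_kde_score(train, query, k=10, bandwidth=2.0):
    """Average Gaussian kernel over the k nearest training points.

    High values indicate densely sampled regions; values near zero
    indicate sparsely sampled (extrapolative) regions.
    Kernel form and bandwidth are illustrative assumptions.
    """
    # Brute-force squared Euclidean distances, shape (n_query, n_train).
    d2 = ((query[:, None, :] - train[None, :, :]) ** 2).sum(axis=-1)
    # Keep the k smallest squared distances for each query point.
    knn_d2 = np.partition(d2, k - 1, axis=1)[:, :k]
    return np.exp(-knn_d2 / (2.0 * bandwidth**2)).mean(axis=1)


rng = np.random.default_rng(0)

# Synthetic 20-dimensional "descriptors"; the first three axes carry most variance.
scales = np.ones(20)
scales[:3] = [5.0, 4.0, 3.0]
train_desc = rng.normal(size=(500, 20)) * scales

# In-distribution queries, and out-of-distribution queries shifted along axis 0.
query_in = rng.normal(size=(50, 20)) * scales
query_out = query_in.copy()
query_out[:, 0] += 30.0

# Fit PCA on the training descriptors only, then project everything.
mean, axes = fit_pca(train_desc, n_components=3)
train_red = (train_desc - mean) @ axes.T
in_red = (query_in - mean) @ axes.T
out_red = (query_out - mean) @ axes.T

score_in = knn_kde_score(train_red, in_red)
score_out = knn_kde_score(train_red, out_red)

print("mean score, in-distribution:    ", score_in.mean())
print("mean score, out-of-distribution:", score_out.mean())
```

In this toy setup the shifted queries lie far from every training point in the reduced space, so their score is much lower than that of the in-distribution queries. Restricting the kernel sum to the $k$ nearest neighbors keeps the cost per query bounded, which is the property that makes the approach attractive for large datasets; a production version would replace the brute-force distance matrix with a batched or GPU neighbor search.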
Topics: Generative Models & Discovery
Tags: uncertainty-quantification
arXiv categories: physics.chem-ph, cond-mat.mtrl-sci