A Simple and Scalable Kernel Density Approach for Reliable Uncertainty Quantification in Atomistic Machine Learning

Daniel Willimetz, Lukáš Grajciar

arXiv:2508.14613 · physics.chem-ph · Published 2025-08-20

Machine learning models are increasingly used to predict material properties and accelerate atomistic simulations, but the reliability of their predictions depends on the representativeness of the training data. We present a scalable, GPU-accelerated uncertainty quantification framework based on $k$-nearest-neighbor kernel density estimation (KDE) in a PCA-reduced descriptor space. This method efficiently detects sparsely sampled regions in large, high-dimensional datasets and provides a transferable, model-agnostic uncertainty metric without requiring the retraining of costly model ensembles. The framework is validated across diverse case studies varying in: i) chemistry, ii) prediction models (including a foundational neural network), iii) descriptors used for the KDE, and iv) properties whose uncertainty is sought. In all cases, the KDE-based score reliably flags extrapolative configurations, correlates well with conventional ensemble-based uncertainties, and highlights regions of reduced prediction trustworthiness. The approach offers a practical route for improving the interpretability, robustness, and deployment readiness of ML models in materials science.
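
The idea described in the abstract, a $k$-nearest-neighbor KDE evaluated in a PCA-reduced descriptor space, can be illustrated with a minimal sketch. This is not the authors' implementation: it runs on the CPU with NumPy rather than on a GPU, and the Gaussian kernel, the fixed bandwidth, the negative-log-density score, the synthetic "descriptors", and the helper names `fit_pca` and `knn_kde_score` are all assumptions made for illustration.

```python
import numpy as np

def fit_pca(X, n_components):
    """Return the mean and the top principal axes of X (hypothetical helper)."""
    mean = X.mean(axis=0)
    # SVD of the centered data; rows of Vt are the principal axes.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def knn_kde_score(train, query, k=10, bandwidth=1.0):
    """Negative log of a Gaussian KDE restricted to the k nearest training points.

    Higher scores mean the query lies in a more sparsely sampled region.
    The kernel and bandwidth are illustrative assumptions.
    """
    # Pairwise squared Euclidean distances, shape (n_query, n_train).
    d2 = ((query[:, None, :] - train[None, :, :]) ** 2).sum(axis=-1)
    # The k smallest squared distances for each query point (unordered).
    knn_d2 = np.partition(d2, k - 1, axis=1)[:, :k]
    # Log of the mean kernel value, computed stably (log-mean-exp).
    a = -knn_d2 / (2.0 * bandwidth ** 2)
    m = a.max(axis=1, keepdims=True)
    log_density = m[:, 0] + np.log(np.exp(a - m).mean(axis=1))
    return -log_density

# Synthetic stand-in for descriptors: 20 features, 5 with large variance.
rng = np.random.default_rng(0)
scales = np.array([5.0, 4.0, 3.0, 2.5, 2.0] + [0.3] * 15)
train_desc = rng.normal(size=(500, 20)) * scales

mean, axes = fit_pca(train_desc, n_components=5)
train_z = (train_desc - mean) @ axes.T

# In-distribution queries vs. queries shifted far along the first feature.
in_dist = rng.normal(size=(50, 20)) * scales
out_dist = in_dist.copy()
out_dist[:, 0] += 40.0

score_in = knn_kde_score(train_z, (in_dist - mean) @ axes.T)
score_out = knn_kde_score(train_z, (out_dist - mean) @ axes.T)
print(score_in.mean(), score_out.mean())
```

In this toy setting the shifted queries receive a much higher score than the in-distribution ones, which is the qualitative behavior the abstract describes for extrapolative configurations. Restricting the kernel sum to the $k$ nearest neighbors keeps the cost per query bounded once neighbors are found; for large datasets the dense distance matrix used here would need to be replaced by a batched or approximate neighbor search.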

Topics: Generative Models & Discovery

Tags: uncertainty-quantification

arXiv categories: physics.chem-ph, cond-mat.mtrl-sci
