Thermodynamic Descriptors from Molecular Dynamics as Machine Learning Features for Extrapolable Property Prediction

Nuria H. Espejo, Pablo Llombart, Andrés González de Castilla, Jorge Ramirez, Jorge R. Espinosa, Adiran Garaizar

arXiv:2603.12017 · physics.chem-ph · Published 2026-03-12 · Updated 2026-03-18

The limited extrapolative power of structure-based machine learning (ML) models is a critical bottleneck in chemical discovery, particularly for industrial R&D, where navigating uncharted chemical space to find next-generation materials or drugs is paramount. These models, reliant on structural descriptors or graph neural networks (GNNs), often fail when predicting properties for molecules with novel chemotypes. Here, we introduce a physics-augmented ML framework that overcomes this limitation. Our approach replaces conventional structural inputs with thermodynamic properties such as cohesive energy, heat of vaporization, and density, derived directly from molecular dynamics (MD) simulations. While performing comparably to structure-based models on known organic compounds, our method uniquely maintains low error when extrapolating to dissimilar chemical spaces. Crucially, it accurately predicts boiling points for entire chemical classes absent from the training set, including inorganic compounds, salts, and molecules with elements such as Si, B, and Te. By learning from the intermolecular forces that govern phase transitions, our framework provides a more fundamental and generalizable strategy for molecular property prediction, enabling chemical exploration beyond established structural domains.
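The idea of regressing boiling points on simulation-derived thermodynamic descriptors rather than on structure can be illustrated with a minimal sketch. This is not the authors' model, data, or pipeline: the data are synthetic, the regressor is a plain ridge-regularised linear fit, and every name (`synthetic_set`, `fit_linear`, `predict`) is hypothetical. The toy target assumes a Trouton-like proportionality between heat of vaporization and boiling point (roughly 88 J/(mol·K)), and the cohesive energy is assumed to be approximately the heat of vaporization minus RT. The sketch only shows the shape of the workflow: train on one descriptor range, then evaluate on a range outside it.

```python
import numpy as np

rng = np.random.default_rng(0)

def synthetic_set(n, hvap_range):
    """Generate SYNTHETIC descriptors and boiling points (illustration only)."""
    hvap = rng.uniform(*hvap_range, size=n)            # heat of vaporization, kJ/mol
    cohesive = hvap - 2.5 + rng.normal(0.0, 0.3, n)    # assumed ~ Hvap - RT, kJ/mol
    density = rng.uniform(0.6, 1.6, n)                 # g/cm^3, uninformative in this toy
    # Toy target: Trouton-like rule Tb ~ Hvap / (88 J/(mol K)) plus noise, in K
    tb = hvap * 1000.0 / 88.0 + rng.normal(0.0, 5.0, n)
    return np.column_stack([cohesive, hvap, density]), tb

def fit_linear(X, y, ridge=1e-8):
    """Ridge-regularised least squares with an intercept column appended."""
    X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
    A = X_aug.T @ X_aug + ridge * np.eye(X_aug.shape[1])
    return np.linalg.solve(A, X_aug.T @ y)

def predict(w, X):
    """Apply the fitted weights (last entry is the intercept)."""
    X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
    return X_aug @ w

# Train on a low heat-of-vaporization range, test strictly outside it.
X_train, y_train = synthetic_set(200, (25.0, 40.0))
X_test, y_test = synthetic_set(100, (45.0, 60.0))

w = fit_linear(X_train, y_train)
mae_extrap = float(np.mean(np.abs(predict(w, X_test) - y_test)))
print(f"Extrapolation MAE on synthetic data: {mae_extrap:.1f} K")
```

Because the toy target depends on the descriptors through a simple physical relation, the fit carries over to the unseen range. Real MD-derived descriptors and real boiling points would be noisier and would likely call for a more flexible model than this linear one.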

Topics: Molecular Representation & Learning, Property Prediction & ADMET, Quantum Chemistry & Force Fields

Tags: chemical-space, gnn, molecular-dynamics, phase-transition, property-prediction

arXiv categories: physics.chem-ph
