Combining feature-based approaches with graph neural networks and symbolic regression for synergistic performance and interpretability

Rogério Almeida Gouvêa, Pierre-Paul De Breuck, Tatiane Pretto, Gian-Marco Rignanese, Marcos José Leite Santos

arXiv:2509.03547 · cond-mat.mtrl-sci · Published 2025-09-02 · Updated 2025-09-05

This study introduces MatterVial, an innovative hybrid framework for feature-based machine learning in materials science. MatterVial expands the feature space by combining latent representations from a diverse suite of pretrained graph neural network (GNN) models, including structure-based (MEGNet), composition-based (ROOST), and equivariant (ORB) graph networks, with computationally efficient GNN-approximated descriptors and novel features derived from symbolic regression. Our approach combines the chemical transparency of traditional feature-based models with the predictive power of deep learning architectures. When used to augment the feature-based model MODNet on Matbench tasks, this method yields significant error reductions and raises its performance to a level competitive with, and in several cases superior to, state-of-the-art end-to-end GNNs, with accuracy gains exceeding 40% on multiple tasks. An integrated interpretability module, employing surrogate models and symbolic regression, decodes the latent GNN-derived descriptors into explicit, physically meaningful formulas. This unified framework advances materials informatics by providing a high-performance, transparent tool that aligns with the principles of explainable AI, paving the way for more targeted and autonomous materials discovery.
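The two ideas in the abstract, augmenting handcrafted descriptors with GNN latent features and then decoding a latent feature into an explicit formula, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: closed-form ridge regression stands in for MODNet, a synthetic column stands in for a pretrained GNN embedding, and a brute-force search over squares and pairwise products of descriptors stands in for a real symbolic-regression engine. The descriptor names (`r_avg`, `chi_diff`, `v_cell`) and all helper functions are hypothetical.

```python
import itertools

import numpy as np


def augment_features(handcrafted, latent):
    """Concatenate handcrafted descriptors with GNN-derived latent features.

    Both inputs are (n_samples, n_features) arrays; here the latent block is
    synthetic and only stands in for a pretrained GNN embedding.
    """
    if handcrafted.shape[0] != latent.shape[0]:
        raise ValueError("feature blocks must have the same number of rows")
    return np.hstack([handcrafted, latent])


def ridge_fit(X, y, alpha=1e-3):
    """Closed-form ridge regression with an unpenalized intercept
    (a simple stand-in for a feature-based model such as MODNet)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return w, y_mean - x_mean @ w


def rmse(pred, target):
    return float(np.sqrt(np.mean((pred - target) ** 2)))


def best_formula(latent_col, handcrafted, names):
    """Brute-force stand-in for symbolic regression: fit latent_col as
    a * expr + b for each candidate expression and keep the lowest-RMSE one."""
    candidates = {}
    for i, name in enumerate(names):
        candidates[name] = handcrafted[:, i]
        candidates[f"{name}^2"] = handcrafted[:, i] ** 2
    for i, j in itertools.combinations(range(len(names)), 2):
        candidates[f"{names[i]}*{names[j]}"] = handcrafted[:, i] * handcrafted[:, j]

    best = None
    for expr, values in candidates.items():
        A = np.column_stack([values, np.ones_like(values)])
        coef, *_ = np.linalg.lstsq(A, latent_col, rcond=None)
        err = rmse(A @ coef, latent_col)
        if best is None or err < best[2]:
            best = (expr, coef, err)
    return best


# Synthetic data: three hypothetical descriptors per material.
rng = np.random.default_rng(0)
n = 200
names = ["r_avg", "chi_diff", "v_cell"]
handcrafted = rng.uniform(0.5, 2.0, size=(n, 3))
# Pretend GNN latent feature: a hidden nonlinear function of the descriptors.
latent = (2.0 * handcrafted[:, 0] * handcrafted[:, 1] + 0.1)[:, None]
y = 3.0 * latent[:, 0] - handcrafted[:, 2] + rng.normal(0.0, 0.01, n)

# Baseline: descriptors only. Augmented: descriptors plus the latent block.
w, b = ridge_fit(handcrafted, y)
rmse_base = rmse(handcrafted @ w + b, y)
X_aug = augment_features(handcrafted, latent)
w, b = ridge_fit(X_aug, y)
rmse_aug = rmse(X_aug @ w + b, y)

# Interpretability: decode the latent column into an explicit formula.
expr, coef, err = best_formula(latent[:, 0], handcrafted, names)
print(f"RMSE descriptors only: {rmse_base:.3f}, augmented: {rmse_aug:.3f}")
print(f"latent ~= {coef[0]:.2f} * {expr} + {coef[1]:.2f} (RMSE {err:.1e})")
```

On this synthetic data the augmented model fits far better than the descriptor-only baseline, because the target depends on a product that a linear model over raw descriptors cannot represent, and the formula search recovers `r_avg*chi_diff` as the hidden form of the latent column. Real GNN embeddings are high-dimensional and only approximately expressible, so a practical pipeline would use a proper symbolic-regression tool and held-out validation rather than in-sample RMSE.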

Topics: Large Language Models & Materials, Molecular Representation & Learning

Tags: gnn, materials-discovery, materials-science

arXiv categories: cond-mat.mtrl-sci, cs.LG
