Composition-Weighted Symbolic Regression for General-Purpose Property Prediction

Yang Huang, Jingrun Chen

arXiv:2605.02267 · cond-mat.mtrl-sci · Published 2026-05-04

We introduce a composition-weighted symbolic regression framework for interpretable prediction of materials properties directly from chemical composition. The method jointly learns analytical functional forms and task-dependent elemental weightings without predefined descriptors. By incorporating max/min operators, it naturally enforces constraints such as non-negative band gaps and bounded classification probabilities, unifying regression and classification tasks. Efficient search is achieved through a hybrid algorithm combining Monte Carlo tree search and genetic programming, with gradient-based refinement and parallel computation. Benchmarks on MatBench tasks show competitive accuracy relative to state-of-the-art black-box models while yielding explicit analytical expressions. Applied to III-V semiconductor alloys, the model produces smooth composition-dependent trends and learned elemental weights with chemically meaningful periodic behavior. This framework provides a scalable and interpretable route for materials discovery and property screening.
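To make the ideas of a composition-weighted feature, max/min output constraints, and gradient-based constant refinement concrete, here is a minimal sketch. It assumes a descriptor of the form s = Σ_i x_i · w(element_i), where x_i are mole fractions and w are per-element weights. That form, the hand-set weights, the fixed expression a·s + b, and the synthetic data are all hypothetical illustrations, not the authors' method. The search over functional forms (Monte Carlo tree search plus genetic programming) is not shown, and only the constants a and b are refined, whereas the paper learns the elemental weights as well.

```python
# Minimal sketch of a composition-weighted descriptor with max/min-bounded outputs.
# ASSUMPTIONS (hypothetical, not from the paper): the descriptor form, the element
# weights, the fixed expression a*s + b, and the toy data.


def normalize(composition):
    """Convert element amounts, e.g. {"Ga": 1, "As": 1}, to mole fractions."""
    total = sum(composition.values())
    return {el: n / total for el, n in composition.items()}


def weighted_feature(composition, weights):
    """Hypothetical composition-weighted feature: s = sum_i x_i * w[element_i]."""
    return sum(x * weights[el] for el, x in normalize(composition).items())


def band_gap_model(s, a, b):
    """max(0, .) guarantees a non-negative prediction, as a band gap must be."""
    return max(0.0, a * s + b)


def probability_model(s, a, b):
    """min(1, max(0, .)) keeps a classification output inside [0, 1]."""
    return min(1.0, max(0.0, a * s + b))


def refine(data, weights, a, b, lr=0.5, steps=2000):
    """Plain gradient descent on (a, b) for the mean squared error of band_gap_model."""
    n = len(data)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for comp, y in data:
            s = weighted_feature(comp, weights)
            z = a * s + b
            # Subgradient of max(0, z): 1 where z > 0, else 0.
            d = 2.0 * (max(0.0, z) - y) * (1.0 if z > 0 else 0.0)
            grad_a += d * s
            grad_b += d
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


# Hypothetical, hand-set element weights (the paper learns its weights).
WEIGHTS = {"Ga": 1.0, "In": 0.0, "As": 0.5}

# Synthetic Ga(x)In(1-x)As compositions with toy targets y = 2*s + 0.1 (not real data).
TOY_COMPS = [{"Ga": x, "In": 1.0 - x, "As": 1.0} for x in (0.0, 0.25, 0.5, 0.75, 1.0)]
TOY_DATA = [(c, 2.0 * weighted_feature(c, WEIGHTS) + 0.1) for c in TOY_COMPS]

if __name__ == "__main__":
    a, b = refine(TOY_DATA, WEIGHTS, a=1.0, b=0.0)
    print(f"refined a = {a:.4f}, b = {b:.4f}")
    print("clamped gap for a negative linear term:", band_gap_model(-1.0, a=1.0, b=0.0))
```

The subgradient treats max(0, z) as flat wherever z ≤ 0, so clamped samples contribute no gradient. The toy targets stay positive, so the clamp never activates during this fit and the constants converge toward the generating values a = 2, b = 0.1.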

Topics: Materials Science & Condensed Matter

Tags: materials-discovery, symbolic-regression

arXiv categories: cond-mat.mtrl-sci, physics.comp-ph
