UNATE: UNsupervised ATomic Embedding for crystal structures property prediction
Laura Solà-Garcia, Àlex Solé, Javier Ruiz-Hidalgo
arXiv:2605.25866 · cs.LG · Published 2026-05-25
Accurately predicting crystal properties is critical for accelerating materials discovery, but it is often limited by scarce labeled data and costly theoretical calculations. To alleviate this, we propose UNATE (Unsupervised Atomic Embedding), a framework that leverages structural information extracted from unlabeled crystal structures. UNATE integrates an unsupervised denoising autoencoder with self-supervised contrastive learning to learn robust atomic representations, which are then used as input features for downstream property prediction. Experimental results show that replacing raw atomic numbers with UNATE-pretrained node embeddings yields a 2.7% improvement over the full-data baseline. Notably, the benefits become more pronounced in scenarios with limited labeled data, reaching improvements of up to 10% when only 25% of the labeled data is used.
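The pairing of a denoising objective with a contrastive one can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the architecture, the noise model, or the contrastive loss, so the linear encoder and decoder, the Gaussian noise, the InfoNCE formulation, and every name below (`denoising_loss`, `info_nce_loss`, `combined_loss`, `weight`, `temperature`) are assumptions chosen only to show how the two terms might be combined.

```python
import numpy as np

def denoising_loss(clean, reconstructed):
    """Mean squared error between clean features and their reconstruction."""
    return float(np.mean((clean - reconstructed) ** 2))

def info_nce_loss(z_a, z_b, temperature=0.1):
    """InfoNCE loss (an assumed choice): row i of z_a and row i of z_b
    are the positive pair, all other rows of z_b act as negatives."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

def combined_loss(clean, reconstructed, z_a, z_b, weight=1.0):
    """Reconstruction term plus a weighted contrastive term.
    The additive weighting is hypothetical, not taken from the paper."""
    return denoising_loss(clean, reconstructed) + weight * info_nce_loss(z_a, z_b)

# Toy data: 8 "atoms" with 16 input features each (placeholder values).
rng = np.random.default_rng(0)
atoms = rng.normal(size=(8, 16))
noisy = atoms + 0.1 * rng.normal(size=atoms.shape)  # assumed Gaussian corruption

# Hypothetical linear encoder and decoder standing in for the real network.
w_enc = 0.1 * rng.normal(size=(16, 4))
w_dec = 0.1 * rng.normal(size=(4, 16))

z_clean = atoms @ w_enc   # embedding of the clean view
z_noisy = noisy @ w_enc   # embedding of the corrupted view
recon = z_noisy @ w_dec   # reconstruction from the corrupted view

loss = combined_loss(atoms, recon, z_clean, z_noisy)
```

In a setup like this, the learned embeddings (here `z_clean`) would replace raw atomic numbers as node features in the downstream property predictor, and `weight` would be tuned on validation data.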
Topics: Large Language Models & Materials
Tags: materials-discovery, property-prediction
arXiv categories: cs.LG, cond-mat.mtrl-sci, physics.class-ph