Data-driven active learning approaches for accelerating materials discovery

Jiaxin Chen, Tianjiao Wan, Hui Geng, Liang Xiong, Guohong Wang, Yihan Zhao, Longxiang Deng, Zijian Gao, Susu Fang, Zheng Luo, Huaimin Wang, Shanshan Wang, Kele Xu

arXiv:2601.06971 · cond-mat.mtrl-sci · Published 2026-01-11 · Updated 2026-02-01

Materials discovery is a cornerstone of modern technological advancement, yet it remains constrained by traditional trial-and-error paradigms and the inherent bias of human intuition. Artificial intelligence (AI) has emerged as a transformative tool in materials science by effectively modeling structure-property relationships. Despite substantial efforts to enhance model expressiveness, data efficiency remains an equally critical challenge, given the limited availability of experimental and computational resources. Active learning (AL), as a data-driven machine learning paradigm, has shown great promise for discovering novel materials and enabling the efficient navigation of vast materials spaces. In this review, we follow the evolution of sampling strategy design techniques in AL, from Bayesian optimization to advanced deep learning-based strategies. We then highlight how AL enhances data efficiency across various data regimes, ranging from task-specific settings with limited data to the development of general-purpose datasets and large-scale models. We further provide a systematic overview of AL applications throughout the materials research pipeline, including computational simulation, composition and structural design, process optimization, and self-driving laboratory systems. Finally, we pinpoint key challenges and future perspectives of AL in materials discovery.

Topics: Materials Science & Condensed Matter

Tags: materials-discovery

