Large Language Model Assisted Discovery of Optimal Dopants for Enhanced Thermoelectric Performance in CoSb$_3$ Based Skutterudites

Yagnik Bandyopadhyay, Dylan Noel Serrao, Houlong L. Zhuang

arXiv:2604.06048 · cond-mat.mtrl-sci · Published 2026-04-07

We present a data-driven approach for accelerating the discovery of high-performance CoSb$_3$-based skutterudites by curating a comprehensive dataset of compositions with various filler elements from over 300 research articles. Leveraging large language models (LLMs), we extract and embed compositional representations, which are then used to train a regression head for predicting the thermoelectric figure of merit. Compared to traditional deep neural networks relying on elemental descriptors such as atomic radii, our LLM-based model achieves significantly lower mean-squared error losses. We further employ the trained model to propose novel filler compositions with promising thermoelectric properties. Finally, we support these predicted candidates through density functional theory and molecular dynamics calculations to assess their electrical and thermal conductivity. This data-driven approach demonstrates the potential of combining natural language processing, machine learning, and quantum simulations for thermoelectric materials design.
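
For readers unfamiliar with the regression target, the thermoelectric figure of merit is conventionally defined as $zT = S^2 \sigma T / \kappa$, where $S$ is the Seebeck coefficient, $\sigma$ the electrical conductivity, $\kappa$ the total thermal conductivity, and $T$ the absolute temperature. The sketch below only evaluates that standard definition; the function name, argument names, and input numbers are illustrative assumptions and are not taken from the paper or its dataset.

```python
def figure_of_merit(seebeck_v_per_k, sigma_s_per_m, kappa_w_per_mk, temperature_k):
    """Return the dimensionless figure of merit zT = S^2 * sigma * T / kappa.

    All inputs are in SI units: V/K, S/m, W/(m K), and K.
    """
    if kappa_w_per_mk <= 0:
        raise ValueError("thermal conductivity must be positive")
    return seebeck_v_per_k ** 2 * sigma_s_per_m * temperature_k / kappa_w_per_mk


# Placeholder inputs chosen only to exercise the formula (hypothetical values,
# not results reported by the authors).
zt = figure_of_merit(
    seebeck_v_per_k=200e-6,   # 200 microvolts per kelvin
    sigma_s_per_m=1.0e5,
    kappa_w_per_mk=3.0,
    temperature_k=800.0,
)
print(f"zT = {zt:.3f}")
```

The formula makes clear why both conductivities matter for any proposed filler: a higher $\sigma$ raises $zT$, while a higher $\kappa$ lowers it.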

Topics: Property Prediction & QSPR

Tags: molecular-dynamics, thermal-conductivity

arXiv categories: cond-mat.mtrl-sci
