Generative Chemical Language Models for Energetic Materials Discovery

Andrew Salij, R. Seaton Ullberg, Megan C. Davis, Marc J. Cawkwell, Christopher J. Snyder, Cristina Garcia Cardona, Ivana Matanovic, Wilton J. M. Kort-Kamp

arXiv:2604.03304 · physics.chem-ph · Published 2026-03-30

The discovery of new energetic materials remains a pressing challenge hindered by the limited availability of high-quality data. To address this, we have developed generative molecular language models that have been pretrained on extensive chemical data and then fine-tuned with curated energetic materials datasets. This transfer-learning strategy extends the capabilities of chemical language models beyond the pharmacological space in which they have been predominantly developed, offering a framework applicable to other data-sparse discovery problems. Furthermore, we discuss the benefits of fragment-based molecular encodings for chemical language models, in particular in constructing synthetically accessible structures. Together, these advances provide a foundation for accelerating the design of next-generation energetic materials with demanding performance requirements.

Topics: Large Language Models & Materials

Tags: chemical-llm, molecular-llm

arXiv categories: physics.chem-ph, cond-mat.mtrl-sci, cs.AI, cs.CL, cs.LG
