polyRETRO: a Language Model Approach to predict Polymerization Class and Monomer(s) for a Target Polymer

Sakshi Agarwal, Wei Xiong, Rampi Ramprasad

arXiv:2512.05138 · cond-mat.soft · Published 2025-12-01

While machine learning has transformed polymer design by enabling rapid property prediction and candidate generation, translating these designs into experimentally realizable materials remains a critical challenge. Traditionally, the synthesis of target polymers has relied heavily on expert intuition and prior experience. The lack of automated retrosynthetic tools to assist chemists limits the practical impact of data-driven polymer discovery. To expedite lab-scale validation and beyond, we present a retrosynthetic framework that leverages large language models (LLMs) to guide polymer synthesis. Our approach, which we call polyRETRO, involves two key steps: 1) predicting the most likely polymerization reaction class of a target polymer and 2) identifying the underlying chemical transformation templates and the corresponding monomers, using primarily natural-language-based constructs. This LLM-driven framework enables direct retrosynthetic analysis given just the target polymer SMILES string. polyRETRO constitutes an initial step towards a scalable, interpretable, and generalizable approach to bridge the gap between computational design and experimental synthesis.
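The two-step flow described in the abstract can be pictured with a minimal Python sketch. This is not the authors' implementation: the prompt wording, the function names (`predict_reaction_class`, `predict_monomers`, `run_retro`), the `[*]` end-point notation for the repeat unit, and the canned `fake_llm` stand-in are all assumptions made for illustration. The only details taken from the abstract are that the input is a target polymer SMILES string and that class prediction precedes monomer identification.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interface: any function that maps a prompt to a text completion.
LLM = Callable[[str], str]


@dataclass
class RetroResult:
    polymer_smiles: str
    reaction_class: str
    monomers: List[str]


def predict_reaction_class(llm: LLM, polymer_smiles: str) -> str:
    """Step 1: ask the model for the most likely polymerization class."""
    prompt = (
        f"Target polymer SMILES: {polymer_smiles}\n"
        "Most likely polymerization reaction class:"
    )
    return llm(prompt).strip()


def predict_monomers(llm: LLM, polymer_smiles: str, reaction_class: str) -> List[str]:
    """Step 2: ask for monomers, conditioned on the class from step 1."""
    prompt = (
        f"Target polymer SMILES: {polymer_smiles}\n"
        f"Polymerization reaction class: {reaction_class}\n"
        "Monomer SMILES, separated by '.':"
    )
    raw = llm(prompt).strip()
    # '.' separates disconnected molecules in SMILES, so it doubles as a delimiter.
    return [m for m in raw.split(".") if m]


def run_retro(llm: LLM, polymer_smiles: str) -> RetroResult:
    """Chain the two steps: class first, then monomers."""
    reaction_class = predict_reaction_class(llm, polymer_smiles)
    monomers = predict_monomers(llm, polymer_smiles, reaction_class)
    return RetroResult(polymer_smiles, reaction_class, monomers)


def fake_llm(prompt: str) -> str:
    """Canned stand-in for a real model (assumption, for demonstration only)."""
    if prompt.rstrip().endswith("reaction class:"):
        return " ring-opening polymerization\n"
    return "O=C1CCCCCO1"  # epsilon-caprolactone


# Polycaprolactone repeat unit, written with [*] end points (assumed notation).
result = run_retro(fake_llm, "[*]OCCCCCC(=O)[*]")
print(result.reaction_class, result.monomers)
```

Conditioning the second prompt on the predicted class mirrors the ordering in the abstract. A real system would replace `fake_llm` with a trained or fine-tuned model and would need to validate the returned SMILES.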

Topics: polymer informatics, Polymer Genome

Tags: inverse-design, polymerization

arXiv categories: cond-mat.soft, cond-mat.mtrl-sci
