Steering Generative Models for Protein Design: Aligning and Conditioning Strategies

Filippo Stocco, Michele Garibbo, Noelia Ferruz

arXiv:2511.21476 · q-bio.BM · Published 2025-11-26 · Updated 2026-02-26

Generative artificial intelligence models learn probability distributions from data and produce novel samples that capture the salient properties of their training sets. Proteins are particularly attractive for such approaches given their abundant data and the versatility of their representations, ranging from sequences to structures and functions. This versatility has motivated the rapid development of generative models for protein design, enabling the generation of functional proteins and enzymes with unprecedented success. However, because these models mirror their training distribution, they tend to sample from its most probable modes, while low-probability regions, often encoding valuable properties, remain underexplored. To address this challenge, recent work has proposed strategies for steering generative models toward user-specified properties. In this review, we survey and categorize these strategies, distinguishing approaches that modify model parameters, such as reinforcement learning or supervised fine-tuning, from those that keep the model's parameters fixed, including conditional generation, retrieval-augmented strategies, Bayesian guidance, and tailored sampling methods. Together, these developments are beginning to enable the steering of generative models toward proteins with desired properties.

Topics: Generative Design & Molecule Optimization, Protein & Biomolecules

Tags: generative-model, protein-structure, reinforcement-learning

arXiv categories: q-bio.BM
