Context-Guided Diffusion for Out-of-Distribution Molecular and Protein Design

Leo Klarner, Tim G. J. Rudner, Garrett M. Morris, Charlotte M. Deane, Yee Whye Teh

arXiv:2407.11942 · q-bio.BM · Published 2024-07-16

Generative models have the potential to accelerate key steps in the discovery of novel molecular therapeutics and materials. Diffusion models have recently emerged as a powerful approach, excelling at unconditional sample generation and, with data-driven guidance, conditional generation within their training domain. Reliably sampling from high-value regions beyond the training data, however, remains an open challenge, with current methods predominantly focusing on modifying the diffusion process itself. In this paper, we develop context-guided diffusion (CGD), a simple plug-and-play method that leverages unlabeled data and smoothness constraints to improve the out-of-distribution generalization of guided diffusion models. We demonstrate that this approach leads to substantial performance gains across various settings, including continuous, discrete, and graph-structured diffusion processes with applications across drug discovery, materials science, and protein design.

Topics: Generative Design & Molecule Optimization, Protein & Biomolecules

Tags: diffusion-model, drug-discovery, generative-model, materials-science, protein-structure

arXiv categories: q-bio.BM, cs.LG, stat.ML
