Mask prior-guided denoising diffusion improves inverse protein folding

Peizhen Bai, Filip Miljković, Xianyuan Liu, Leonardo De Maria, Rebecca Croasdale-Wood, Owen Rackham, Haiping Lu

arXiv:2412.07815·q-bio.BM·Published 2024-12-10·Updated 2025-07-25

Inverse protein folding generates valid amino acid sequences that can fold into a desired protein structure, with recent deep-learning advances showing strong potential and competitive performance. However, challenges remain, such as predicting elements with high structural uncertainty, including disordered regions. To tackle such low-confidence residue prediction, we propose a Mask-prior-guided denoising Diffusion (MapDiff) framework that accurately captures both structural information and residue interactions for inverse protein folding. MapDiff is a discrete diffusion probabilistic model that iteratively generates amino acid sequences with reduced noise, conditioned on a given protein backbone. To incorporate structural information and residue interactions, we develop a graph-based denoising network with a mask-prior pre-training strategy. Moreover, in the generative process, we combine the denoising diffusion implicit model with Monte-Carlo dropout to reduce uncertainty. Evaluation on four challenging sequence design benchmarks shows that MapDiff substantially outperforms state-of-the-art methods. Furthermore, the in silico sequences generated by MapDiff closely resemble the physico-chemical and structural characteristics of native proteins across different protein families and architectures.

TopicsProtein & Biomolecules

Tagsprotein-structure

arXiv categoriesq-bio.BM, cs.LG

arXiv abstract pagePDF