CAP: Commutative Algebra Prediction of Protein-Nucleic Acid Binding Affinities

Mushal Zia, Faisal Suwayyid, Yuta Hozumi, JunJie Wee, Hongsong Feng, Guo-Wei Wei

arXiv:2510.22130 · q-bio.QM · Published 2025-10-25

Accurate prediction of protein-nucleic acid binding affinity is vital for deciphering genomic processes, yet existing approaches often struggle to reconcile high accuracy with interpretability and computational efficiency. In this study, we introduce commutative algebra prediction (CAP), which couples persistent Stanley-Reisner theory with advanced sequence embeddings to predict protein-nucleic acid binding affinities. CAP encodes proteins through transformer-learned embeddings that retain long-range evolutionary context, and it represents DNA and RNA with $k$-mer algebra embeddings derived from persistent facet ideals, which capture fine-scale nucleotide geometry. We demonstrate that CAP surpasses the SVSBI protein-nucleic acid benchmark and, in a further test, maintains reasonable performance on newly curated protein-RNA and protein-nucleic acid datasets. Because it relies only on primary sequences, CAP generalizes to any protein-nucleic acid pair with minimal preprocessing, enabling genome-scale analyses without 3D structural data and promising faster virtual screening for drug discovery and protein engineering.
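As a rough illustration of the $k$-mer decomposition that the nucleic acid representation starts from, the sketch below splits a sequence into overlapping $k$-mers, records where each one occurs, and builds a fixed-length count vector. This is only a generic $k$-mer preprocessing example, not the persistent facet ideal computation used in CAP. The function names `kmer_positions` and `kmer_count_vector`, the example sequence, and the choice $k = 2$ are hypothetical and do not come from the paper.

```python
from collections import defaultdict
from itertools import product


def kmer_positions(seq, k):
    """Map each k-mer to the list of 0-based start positions where it occurs."""
    seq = seq.upper()
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return dict(positions)


def kmer_count_vector(seq, k, alphabet="ACGT"):
    """Count vector over all k-mers of `alphabet`, in lexicographic product order.

    Use alphabet="ACGU" for RNA. K-mers containing other symbols are ignored.
    """
    pos = kmer_positions(seq, k)
    return [len(pos.get("".join(p), [])) for p in product(alphabet, repeat=k)]


# Hypothetical example sequence, chosen only for illustration.
example = "ACGTACGA"
pos2 = kmer_positions(example, 2)
vec2 = kmer_count_vector(example, 2)
print(pos2["AC"])   # start positions of the 2-mer "AC"
print(len(vec2))    # 4**2 entries for a 4-letter alphabet
```

The position lists are the kind of per-$k$-mer data that a downstream algebraic or topological featurization could consume, while the count vector is the simplest fixed-length summary of the same decomposition.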

Topics: Property Prediction & ADMET, Protein & Biomolecules

Tags: drug-discovery, protein-ligand

arXiv categories: q-bio.QM
