Contrastive Multi-Task Learning with Solvent-Aware Augmentation for Drug Discovery

Jing Lan, Hexiao Ding, Hongzhao Chen, Yufeng Jiang, Nga-Chun Ng, Gerald W. Y. Cheng, Zongxi Li, Jing Cai, Liang-ting Lin, Jung Sun Yoo

arXiv:2508.01799 · q-bio.BM · Published 2025-08-03 · Updated 2025-08-27

Accurate prediction of protein-ligand interactions is essential for computer-aided drug discovery. However, existing methods often fail to capture solvent-dependent conformational changes and lack the ability to jointly learn multiple related tasks. To address these limitations, we introduce a pre-training method that incorporates ligand conformational ensembles generated under diverse solvent conditions as augmented input. This design enables the model to learn both structural flexibility and environmental context in a unified manner. The training process integrates molecular reconstruction to capture local geometry, interatomic distance prediction to model spatial relationships, and contrastive learning to build solvent-invariant molecular representations. Together, these components lead to significant improvements, including a 3.7% gain in binding affinity prediction, an 82% success rate on the PoseBusters Astex docking benchmarks, and an area under the curve of 97.1% in virtual screening. The framework supports solvent-aware, multi-task modeling and produces consistent results across benchmarks. A case study further demonstrates sub-angstrom docking accuracy with a root-mean-square deviation of 0.157 angstroms, offering atomic-level insight into binding mechanisms and advancing structure-based drug design.
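The three training objectives named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`info_nce`, `pairwise_distances`, `multitask_loss`), the use of mean-squared error for the reconstruction and distance terms, the InfoNCE form of the contrastive term, the temperature, and the equal loss weights are all assumptions made for illustration. The sketch treats row i of `z_a` and row i of `z_b` as embeddings of the same ligand under two different solvent conditions, so that the contrastive term pulls those two views together.

```python
import numpy as np

def info_nce(z_a, z_b, temperature=0.1):
    """InfoNCE loss over a batch; row i of z_a and row i of z_b form a positive pair.

    Assumption: each pair is the same molecule embedded from conformers
    generated under two different solvent conditions.
    """
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature                    # cosine similarities, scaled
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

def pairwise_distances(coords):
    """Matrix of Euclidean distances between all atoms (coords has shape [n_atoms, 3])."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def multitask_loss(pred_coords, true_coords, pred_dist, z_a, z_b,
                   weights=(1.0, 1.0, 1.0)):
    """Weighted sum of reconstruction, distance, and contrastive terms.

    Hypothetical stand-ins: coordinate MSE for molecular reconstruction,
    distance-matrix MSE for interatomic distance prediction.
    """
    recon = float(np.mean((pred_coords - true_coords) ** 2))
    dist = float(np.mean((pred_dist - pairwise_distances(true_coords)) ** 2))
    contrast = info_nce(z_a, z_b)
    w_recon, w_dist, w_contrast = weights
    return w_recon * recon + w_dist * dist + w_contrast * contrast
```

In this sketch, perfect coordinate and distance predictions leave only the contrastive term, and that term is small when the two solvent views of each molecule map to nearly the same embedding while remaining distinct from other molecules in the batch.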

Topics: Property Prediction & ADMET, Protein & Biomolecules

Tags: drug-discovery, molecular-representation, protein-ligand

arXiv categories: q-bio.BM, cs.AI, cs.LG
