Developing a Machine-Learning Interatomic Potential for Non-Covalent Interactions in Proteins
Lejia Zeng, Xintong Zhang, Yuchan Pei, Lifeng Zhao, Lan Hua, Jincai Yang, Niu Huang
arXiv:2601.11628 · physics.chem-ph · Published 2026-01-14 · Updated 2026-01-27
Machine learning interatomic potentials (MLIPs) enable efficient modeling of molecular interactions with quantum mechanical (QM) accuracy. However, constructing robust and representative training datasets that capture subtle, system-specific interaction motifs remains challenging. We introduce PANIP (PAirwise Non-covalent Interaction Potential), an ensemble MLIP model built upon the NequIP framework and trained on non-covalent interactions (NCIs) between protein-derived fragments. PANIP is trained using an automated multi-fidelity active learning (MFAL) workflow, in which a representative training subset, termed PDB-FRAGID (PDB Fragment Interaction Dataset), is distilled from an otherwise prohibitively large pool of fragment dimers extracted from the Protein Data Bank (PDB). PANIP retains ωB97X-D3BJ/def2-TZVPP-level accuracy and achieves mean absolute errors below 0.2 kcal/mol on out-of-distribution systems, demonstrating excellent transferability across diverse NCI motifs. Compared to the widely used ANI-2x potential, PANIP delivers substantially lower errors, particularly for charged and strongly interacting dimers. Coupled with a fragmentation-based energy decomposition scheme, PANIP estimates protein-ligand binding energies at near force-field computational cost with QM-level accuracy, enabling its use as a fragment-based scoring function that rivals specialized docking scoring functions.
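The abstract does not spell out how the ensemble, the active-learning selection, or the fragment-based energy decomposition are implemented. The sketch below is only an illustration of the general ideas, under assumptions that are not taken from the paper: the binding energy is approximated as a plain sum of pairwise fragment-dimer interaction energies, each dimer energy is the mean over ensemble members, and ensemble disagreement (standard deviation) is used to flag dimers for further labeling. The names `ensemble_predict`, `binding_energy`, `select_uncertain`, the fragment labels, the toy models, and the threshold are all hypothetical.

```python
from statistics import mean, pstdev
from typing import Callable, Sequence

# A dimer is represented here only by two fragment labels (hypothetical);
# a real model would take atomic coordinates and species instead.
Dimer = tuple[str, str]
Model = Callable[[Dimer], float]  # returns an interaction energy in kcal/mol


def ensemble_predict(models: Sequence[Model], dimer: Dimer) -> tuple[float, float]:
    """Return (mean energy, population std) over all ensemble members."""
    preds = [m(dimer) for m in models]
    return mean(preds), pstdev(preds)


def binding_energy(models: Sequence[Model], dimers: Sequence[Dimer]) -> float:
    """Assumed pairwise-additive estimate: sum of ensemble-mean dimer energies."""
    return sum(ensemble_predict(models, d)[0] for d in dimers)


def select_uncertain(
    models: Sequence[Model], pool: Sequence[Dimer], threshold: float
) -> list[Dimer]:
    """Pick dimers whose ensemble disagreement exceeds the threshold."""
    return [d for d in pool if ensemble_predict(models, d)[1] > threshold]


# Toy stand-ins for trained ensemble members (assumed values, not real data).
def model_a(dimer: Dimer) -> float:
    return -1.0


def model_b(dimer: Dimer) -> float:
    # Disagrees with model_a only on the charged contact.
    return -3.0 if dimer == ("ARG", "LIG_COO") else -1.0


toy_models = [model_a, model_b]
contacts = [("ARG", "LIG_COO"), ("PHE", "LIG_RING")]

total = binding_energy(toy_models, contacts)
flagged = select_uncertain(toy_models, contacts, threshold=0.5)
print(f"estimated binding energy: {total:.2f} kcal/mol")
print(f"dimers flagged for labeling: {flagged}")
```

In this toy setup the two members disagree only on the charged dimer, so that dimer alone is flagged. A multi-fidelity workflow would presumably also decide at which level of theory to label a flagged dimer, which this sketch does not attempt.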
Topics: Materials Science & Condensed Matter
Tags: machine-learning-interatomic-potentials, mlip
arXiv categories: physics.chem-ph