Local-Global Multimodal Contrastive Learning for Molecular Property Prediction

Xiayu Liu, Zhengyi Lu, Yunhong Liao, Chan Fan, Hou-biao Li

arXiv:2601.22610 · cs.LG · Published 2026-01-30

Accurate molecular property prediction requires integrating complementary information from molecular structure and chemical semantics. In this work, we propose LGM-CL, a local-global multimodal contrastive learning framework that jointly models molecular graphs and textual representations derived from SMILES and chemistry-aware augmented texts. Local functional group information and global molecular topology are captured using AttentiveFP and Graph Transformer encoders, respectively, and aligned through self-supervised contrastive learning. In addition, chemically enriched textual descriptions are contrasted with original SMILES to incorporate physicochemical semantics in a task-agnostic manner. During fine-tuning, molecular fingerprints are further integrated via Dual Cross-attention multimodal fusion. Extensive experiments on MoleculeNet benchmarks demonstrate that LGM-CL achieves consistent and competitive performance across both classification and regression tasks, validating the effectiveness of unified local-global and multimodal representation learning.
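The local-global alignment described above can be illustrated with a small sketch. The abstract does not state which contrastive objective LGM-CL uses, so the symmetric InfoNCE loss below is an assumption chosen because it is a common choice for aligning two views. The function names, the temperature value, and the random arrays standing in for AttentiveFP (local) and Graph Transformer (global) embeddings are all hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def log_softmax(logits):
    """Numerically stable row-wise log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def symmetric_info_nce(local_emb, global_emb, temperature=0.1):
    """Symmetric InfoNCE loss between two views of the same batch of molecules.

    Row i of local_emb and row i of global_emb form the positive pair;
    every other row in the batch acts as a negative.
    Assumption: this objective is illustrative, not taken from the paper.
    """
    z_local = l2_normalize(local_emb)
    z_global = l2_normalize(global_emb)
    logits = z_local @ z_global.T / temperature  # (batch, batch) similarity matrix
    diag = np.arange(logits.shape[0])
    loss_l2g = -log_softmax(logits)[diag, diag].mean()    # local -> global
    loss_g2l = -log_softmax(logits.T)[diag, diag].mean()  # global -> local
    return 0.5 * (loss_l2g + loss_g2l)

# Random stand-ins for encoder outputs (8 molecules, 16-dimensional embeddings).
rng = np.random.default_rng(0)
local_emb = rng.normal(size=(8, 16))
noisy_copy = local_emb + 0.01 * rng.normal(size=(8, 16))
unrelated = rng.normal(size=(8, 16))

aligned_loss = symmetric_info_nce(local_emb, noisy_copy)
unaligned_loss = symmetric_info_nce(local_emb, unrelated)
print(f"aligned: {aligned_loss:.4f}, unaligned: {unaligned_loss:.4f}")
```

In this sketch, nearly identical views yield a lower loss than unrelated ones, which is the property a contrastive pretraining stage relies on. The same form could in principle be applied to the SMILES versus augmented-text pairs, though the paper's actual formulation may differ.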

Topics: Molecular Representation & Learning, Property Prediction & ADMET

Tags: molecular-representation, property-prediction

arXiv categories: cs.LG, cs.AI
