QComp: A QSAR-Based Data Completion Framework for Drug Discovery

Bingjia Yang, Yunsie Chung, Archer Y. Yang, Bo Yuan, Xiang Yu

arXiv:2405.11703·cs.LG·Published 2024-05-20

In drug discovery, in vitro and in vivo experiments reveal biochemical activities related to the efficacy and toxicity of compounds. The experimental data accumulate into massive, ever-evolving, and sparse datasets. Quantitative Structure-Activity Relationship (QSAR) models, which predict biochemical activities using only the structural information of compounds, face challenges in integrating the evolving experimental data as studies progress. We develop QSAR-Complete (QComp), a data completion framework to address this issue. Based on pre-existing QSAR models, QComp utilizes the correlation inherent in experimental data to enhance prediction accuracy across various tasks. Moreover, QComp emerges as a promising tool for guiding the optimal sequence of experiments by quantifying the reduction in statistical uncertainty for specific endpoints, thereby aiding in rational decision-making throughout the drug discovery process.
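The idea of using correlations among endpoints to refine QSAR predictions and to quantify uncertainty reduction can be pictured with a small sketch. It assumes, purely for illustration, that the residuals between experiment and QSAR prediction follow a multivariate Gaussian with a known covariance; the abstract does not specify the model. The function name `complete` and its arguments are hypothetical and not taken from the paper.

```python
import numpy as np

def complete(qsar_pred, cov, observed):
    """Condition a Gaussian centered on QSAR predictions on measured endpoints.

    Assumption (not from the paper): residuals are jointly Gaussian.
    qsar_pred: (d,) baseline QSAR predictions for one compound
    cov:       (d, d) covariance of residuals across endpoints
    observed:  dict {endpoint index: measured value}
    Returns (posterior mean, posterior variance) for all d endpoints.
    """
    qsar_pred = np.asarray(qsar_pred, dtype=float)
    cov = np.asarray(cov, dtype=float)
    d = qsar_pred.shape[0]
    obs = np.array(sorted(observed), dtype=int)
    mis = np.array([i for i in range(d) if i not in observed], dtype=int)

    # With no measurements, the QSAR prediction and prior variance stand.
    mean = qsar_pred.copy()
    var = np.diag(cov).copy()
    if obs.size == 0:
        return mean, var

    # Measured endpoints are taken as known exactly in this sketch.
    y = np.array([observed[i] for i in obs], dtype=float)
    mean[obs] = y
    var[obs] = 0.0

    if mis.size:
        S_oo = cov[np.ix_(obs, obs)]
        S_mo = cov[np.ix_(mis, obs)]
        # gain = S_mo @ inv(S_oo), computed with a linear solve.
        gain = np.linalg.solve(S_oo, S_mo.T).T
        # Shift unmeasured endpoints by the correlated residual.
        mean[mis] = qsar_pred[mis] + gain @ (y - qsar_pred[obs])
        # Conditional variance: prior variance minus the explained part.
        var[mis] = np.diag(cov[np.ix_(mis, mis)] - gain @ S_mo.T)
    return mean, var
```

For two endpoints with unit variance and correlation 0.8, measuring the first endpoint in this sketch lowers the variance of the second from 1.0 to 0.36. Comparing such variance reductions across candidate measurements is one way to picture how an experiment order could be ranked.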

Topics: Property Prediction & ADMET, Protein & Biomolecules

Tags: drug-discovery

arXiv categories: cs.LG
