Boltzmann Graph Ensemble Embeddings for Aptamer Libraries
Starlika Bauskar, Jade Jiao, Narayanan Kannan, Alexander Kimm, Justin M. Baker, Matthew J. Tyler, Andrea L. Bertozzi, Anne M. Andrews
arXiv:2510.21980 · cs.LG · Published 2025-10-24
Machine-learning methods in biochemistry commonly represent molecules as graphs of pairwise intermolecular interactions for property and structure prediction. Most methods operate on a single graph, typically the minimum free energy (MFE) structure, as a stand-in for the low-energy ensemble of conformations present at thermodynamic equilibrium. We introduce a thermodynamically parameterized exponential-family random graph model (ERGM) embedding that models molecules as Boltzmann-weighted ensembles of interaction graphs. We evaluate this embedding on SELEX datasets, where experimental biases (e.g., PCR amplification or sequencing noise) can obscure true aptamer-ligand affinity, producing anomalous candidates whose observed abundance diverges from their actual binding strength. We show that the proposed embedding enables robust community detection and subgraph-level explanations for aptamer-ligand affinity, even in the presence of biased observations. This approach may be used to identify low-abundance aptamer candidates for further experimental evaluation.
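To make the idea of a Boltzmann-weighted ensemble of interaction graphs concrete, the following is a minimal sketch and not the authors' implementation. It assumes each candidate structure is given as a list of base pairs with a free energy in kcal/mol, weights each structure by the standard Boltzmann factor exp(-E/RT), and averages the adjacency matrices under those weights. The toy structures, the energies, and the function names `boltzmann_weights` and `expected_adjacency` are all hypothetical and chosen only for illustration.

```python
import math

# Gas constant in kcal/(mol*K) and an assumed temperature of 37 C.
R_KCAL = 0.0019872
TEMP_K = 310.15


def boltzmann_weights(energies, temperature=TEMP_K):
    """Return normalized Boltzmann probabilities for free energies in kcal/mol."""
    beta = 1.0 / (R_KCAL * temperature)
    e_min = min(energies)  # shift by the minimum for numerical stability
    unnorm = [math.exp(-beta * (e - e_min)) for e in energies]
    z = sum(unnorm)  # partition function over the (shifted) ensemble
    return [u / z for u in unnorm]


def expected_adjacency(n, structures, weights):
    """Boltzmann-weighted average of the structures' adjacency matrices."""
    adj = [[0.0] * n for _ in range(n)]
    for pairs, w in zip(structures, weights):
        for i, j in pairs:
            adj[i][j] += w
            adj[j][i] += w
    return adj


# Hypothetical toy ensemble for an 8-nucleotide sequence. Each structure is a
# list of base pairs (0-based indices); the energies are made up.
structures = [
    [(0, 7), (1, 6)],  # lowest-energy (MFE-like) hairpin
    [(0, 7)],
    [(1, 5)],
    [],                # fully unpaired
]
energies = [-2.0, -1.2, -0.5, 0.0]

weights = boltzmann_weights(energies)
adj = expected_adjacency(8, structures, weights)
```

Here `adj[i][j]` is the ensemble probability that positions `i` and `j` interact, so a single-graph (MFE-only) representation corresponds to putting all of the weight on the first structure.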
Topics: Protein & Biomolecules
Tags: free-energy
arXiv categories: cs.LG, math.PR, q-bio.QM, stat.ML