Evaluating machine learning models for predicting pesticide toxicity to honey bees
Jakub Adamczyk, Jakub Poziemski, Pawel Siedlecki
arXiv:2503.24305 · cs.LG · Published 2025-03-31 · Updated 2026-01-09
Small molecules play a critical role in the biomedical, environmental, and agrochemical domains, each with distinct physicochemical requirements and success criteria. Although biomedical research benefits from extensive datasets and established benchmarks, agrochemical data remain scarce, particularly with respect to species-specific toxicity. This work focuses on ApisTox, the most comprehensive dataset of experimentally validated chemical toxicity to the honey bee (Apis mellifera), an ecologically vital pollinator. The primary goal of this study was to determine the suitability of diverse machine learning approaches for modeling such toxicity, including molecular fingerprints, graph kernels, and graph neural networks, as well as pretrained models. Comparative analysis with medicinal datasets from the MoleculeNet benchmark reveals that ApisTox represents a distinct chemical space. Performance degradation on non-medicinal datasets, such as ApisTox, demonstrates the limited generalizability of current state-of-the-art algorithms trained solely on biomedical data. Our study highlights the need for more diverse datasets and for targeted model development geared toward the agrochemical domain.
Topics: Molecular Representation & Learning
Tags: chemical-space, gnn, molecular-representation
arXiv categories: cs.LG, cs.AI