SOLD: SELFIES-based Objective-driven Latent Diffusion

Elbert Ho

arXiv:2509.25198 · cs.LG · Published 2025-09-03

Recently, machine learning has made a significant impact on de novo drug design. However, current approaches to creating novel molecules conditioned on a target protein typically generate molecules directly in 3D conformational space, which is often slow and overly complex. In this work, we propose SOLD (SELFIES-based Objective-driven Latent Diffusion), a novel latent diffusion model that generates molecules in a latent space derived from 1D SELFIES strings and conditioned on a target protein. In the process, we also train an innovative SELFIES transformer and propose a new way to balance losses when training multi-task machine learning models. Our model generates high-affinity molecules for the target protein in a simple and efficient way, while also leaving room for future improvements through the addition of more data.

Topics: Representation & Foundation Models

Tags: diffusion-model, selfies

arXiv categories: cs.LG
