Crystalite: A Lightweight Transformer for Efficient Crystal Modeling
Tin Hadži Veljković, Joshua Rosenthal, Ivor Lončarić, Jan-Willem van de Meent
arXiv:2604.02270·cs.LG·Published 2026-04-02
Generative models for crystalline materials often rely on equivariant graph neural networks, which capture geometric structure well but are costly to train and slow to sample. We present Crystalite, a lightweight diffusion Transformer for crystal modeling built around two simple inductive biases. The first is Subatomic Tokenization, a compact chemically structured atom representation that replaces high-dimensional one-hot encodings and is better suited to continuous diffusion. The second is the Geometry Enhancement Module (GEM), which injects periodic minimum-image pair geometry directly into attention through additive geometric biases. Together, these components preserve the simplicity and efficiency of a standard Transformer while making it better matched to the structure of crystalline materials. Crystalite achieves state-of-the-art results on crystal structure prediction benchmarks, and de novo generation performance, attaining the best S.U.N. discovery score among the evaluated baselines while sampling substantially faster than geometry-heavy alternatives.
TopicsGenerative Design & Molecule Optimization, Large Language Models & Materials, Molecular Representation & Learning
Tagscrystal-structure drug-discovery generative-model gnn structure-prediction
arXiv categoriescs.LG, cs.AI
arXiv abstract pagePDF