Graphlet Histogram Representation Database of Inorganic Crystals

Aaditya Panigrahi, Yanjun Liu, Omri Lesser, Krishnanand Mallayya, Eun-Ah Kim

arXiv:2606.10195 · cond-mat.mtrl-sci · Published 2026-06-08

Machine learning models for materials property prediction increasingly rely on representations learned end-to-end from large density-functional-theory databases, limiting their applicability when only scarce experimental data are available. Domain-knowledge-driven representations precomputed from crystal structures alone offer a data-efficient, interpretable alternative, but existing approaches capture at most composition or bonding connectivity and discard local structural geometry. Here, we present Graphlet-MP, a database of graphlet histogram representations for 149,082 inorganic crystals from the Materials Project (MP). Seventy-nine distributions describe each material over three hierarchical graphlet orders: atomic sites, bonded pairs, and bond-angle triplets, extracted via screened Voronoi tessellation from the crystallographic information file. We provide a complete technical specification of the representation, an Earth Mover's Distance metric for comparing materials in this space, and the full precomputed database. An accompanying open-source codebase enables users to generate graphlet histograms for arbitrary crystal structures, including experimentally determined ones, and to extend the database to new materials or target properties.
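As a rough illustration of how an Earth Mover's Distance can compare two histograms, the sketch below computes the one-dimensional EMD between two distributions binned on the same equally spaced grid, using the fact that in 1D it equals the summed absolute difference of the cumulative distributions times the bin width. This is a generic textbook formulation, not necessarily the metric specified in the paper; the function name `emd_1d`, the `bin_width` parameter, and the example bond-angle counts are all hypothetical.

```python
from itertools import accumulate


def emd_1d(hist_a, hist_b, bin_width=1.0):
    """1D Earth Mover's Distance between two histograms on identical, equally spaced bins.

    Generic illustration only; the paper's own metric may differ in binning,
    normalization, or how the per-distribution distances are combined.
    """
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must share the same bins")
    total_a, total_b = sum(hist_a), sum(hist_b)
    if total_a <= 0 or total_b <= 0:
        raise ValueError("histograms must have positive total mass")
    # Normalize counts to probabilities and accumulate them into CDFs.
    cdf_a = accumulate(h / total_a for h in hist_a)
    cdf_b = accumulate(h / total_b for h in hist_b)
    # In 1D, EMD is the area between the two CDFs.
    return bin_width * sum(abs(ca - cb) for ca, cb in zip(cdf_a, cdf_b))


# Hypothetical bond-angle counts for two materials on the same four bins.
angles_a = [0, 4, 0, 0]
angles_b = [0, 0, 4, 0]
distance = emd_1d(angles_a, angles_b)  # all mass shifted by exactly one bin
```

Unlike a bin-by-bin difference, this distance grows with how far the mass has to move, so two materials whose bond angles differ by a small shift score as closer than two whose angles differ by a large one.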

Topics: Quantum Chemistry & Force Fields

Tags: property-prediction

arXiv categories: cond-mat.mtrl-sci, physics.comp-ph
