The bliss of dimensionality: how an unsupervised criterion identifies optimal low-resolution representations of high-dimensional datasets

Margherita Mele, Daniel Campos Moreno, Raffaello Potestio

arXiv:2603.05214 · cond-mat.stat-mech · Published 2026-03-05

Selecting the optimal resolution for discretizing high-dimensional data is a central problem in physics and data analysis, particularly in unsupervised settings where the underlying distribution is unknown. The Relevance-Resolution (Res-Rel) framework addresses this issue through an information-theoretic trade-off between descriptive detail and statistical reliability. Here we provide a systematic validation of this approach by comparing its characteristic optima (the maximum-relevance point and the -1 slope, or information-theoretic, point) with the discretization that minimizes the Kullback-Leibler (KL) divergence from a known or physically motivated ground-truth distribution. Across unstructured and structured synthetic datasets, Gaussian clones of MNIST, and molecular dynamics simulations of the alanine dipeptide, we find that, as the dimensionality or informative content increases, the KL-optimal discretization consistently lies within the Res-Rel optimality region. Furthermore, in high-dimensional regimes the -1 slope criterion closely matches the KL divergence minimum. These results establish the quantitative consistency of unsupervised information-theoretic selection with distribution-based optimality.
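To make the trade-off described in the abstract concrete, the sketch below computes the two quantities for a given discretization. It assumes the definitions commonly used in the Res-Rel literature, which the abstract itself does not spell out: the resolution is the entropy of the empirical bin frequencies, H[s] = -Σ_s (k_s/N) log(k_s/N), and the relevance is the entropy of the occupancy values, H[k] = -Σ_k (k m_k/N) log(k m_k/N), where k_s is the number of samples in bin s and m_k is the number of bins holding exactly k samples. The function names `resolution_relevance` and `scan_bins`, and the choice of uniform one-dimensional binning, are illustrative assumptions and not the authors' implementation.

```python
import numpy as np


def resolution_relevance(labels):
    """Return (resolution H[s], relevance H[k]) in nats for bin labels.

    Assumes the standard Res-Rel definitions (not stated in the abstract):
    k_s = samples in bin s, m_k = number of bins with exactly k samples.
    """
    labels = np.asarray(labels)
    n = labels.size
    # Occupancy k_s of every non-empty bin.
    _, k_s = np.unique(labels, return_counts=True)
    p_s = k_s / n
    h_s = -np.sum(p_s * np.log(p_s))
    # Degeneracy m_k: how many bins share each occupancy value k.
    k_vals, m_k = np.unique(k_s, return_counts=True)
    p_k = k_vals * m_k / n
    h_k = -np.sum(p_k * np.log(p_k))
    return float(h_s), float(h_k)


def scan_bins(x, bin_counts):
    """Hypothetical helper: uniform 1D binning at several resolutions.

    Returns a list of (n_bins, H[s], H[k]) tuples; n_bins must be >= 2.
    """
    x = np.asarray(x, dtype=float)
    curve = []
    for b in bin_counts:
        edges = np.linspace(x.min(), x.max(), b + 1)
        # Inner edges only, so labels run from 0 to b - 1.
        labels = np.digitize(x, edges[1:-1])
        h_s, h_k = resolution_relevance(labels)
        curve.append((b, h_s, h_k))
    return curve
```

Scanning the number of bins traces a curve in the (H[s], H[k]) plane. Under the assumed definitions, the maximum-relevance point is the largest H[k] along that curve, and the -1 slope point is where the slope of H[k] against H[s] equals -1; both would be estimated numerically from the scanned values.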

Topics: Quantum Chemistry & Force Fields

Tags: molecular-dynamics

