A Unified Subject Map for 130 Years of Physics

Khoa Nguyen, Pragyan Pandey, Sophie Li, Eric Y. Ma

arXiv:2606.14043·physics.hist-ph·Published 2026-06-12

More than a century of physics is recorded in the American Physical Society (APS) archive, but the corpus cannot be analyzed as a single, time-resolved object because its subject metadata are fragmented across eras with no shared vocabulary. We close this gap by using a frontier large language model to retrospectively assign the modern Physics Subject Headings (PhySH) to the historical archive, yielding a unified subject map for every APS paper from 1893 to 2025. The resulting map not only reproduces century-scale disciplinary arcs but also resolves the fine-grained lifecycles of individual ideas, materials, techniques, and discoveries across a vocabulary of over 3,000 PhySH Concepts. The map turns a fragmented archive into a quantitative substrate for systematic search and for data-driven studies of how physics evolves.

Topics: Large Language Models & Materials

arXiv categories: physics.hist-ph, physics.data-an, physics.soc-ph
