Dynamical Decoupling of Generalization and Overfitting in Large Two-Layer Networks
Andrea Montanari, Pierfrancesco Urbani
arXiv:2502.21269 · stat.ML · Published 2025-02-28 · Updated 2025-10-29
Understanding the inductive bias and generalization properties of large overparametrized machine learning models requires characterizing the dynamics of the training algorithm. We study the learning dynamics of large two-layer neural networks via dynamical mean field theory, a well-established technique of non-equilibrium statistical physics. We show that, for large network width $m$ and a large number of samples per input dimension $n/d$, the training dynamics exhibits a separation of timescales which implies: $(i)$ The emergence of a slow time scale associated with the growth in Gaussian/Rademacher complexity of the network; $(ii)$ Inductive bias towards small complexity if the initialization has small enough complexity; $(iii)$ A dynamical decoupling between feature learning and overfitting regimes; $(iv)$ A non-monotone behavior of the test error, associated with a 'feature unlearning' regime at large times.
Topics: Scientific Machine Learning & PINNs
Tags: inductive-bias
arXiv categories: stat.ML, cond-mat.dis-nn, cs.LG