Title: Symmetries in Overparametrized Neural Networks: A Mean-Field View
Abstract: We develop a Mean-Field (MF) view of the learning dynamics of overparametrized Artificial Neural Networks (NNs) under distributional symmetries of the data w.r.t. the action of a general compact group G. To this end, we consider a class of generalized shallow NNs given by an ensemble of N multi-layer units, jointly trained using stochastic gradient descent (SGD) and possibly symmetry-leveraging (SL) techniques, such as Data Augmentation (DA), Feature Averaging (FA), or Equivariant Architectures (EA). We introduce the notions of weakly and strongly invariant laws on the parameter space of each single unit, corresponding, respectively, to G-invariant distributions and to distributions supported on parameters fixed by the group action (which encode EA). This allows us to define symmetric models compatible with taking N → ∞ and to interpret the asymptotic dynamics of DA, FA and EA in terms of the Wasserstein Gradient Flows describing their MF limits. In this setting, we study the attainability of a global optimum of the population risk under these different SL techniques, as well as the relations between their limiting training dynamics. We illustrate the validity of our findings as N grows in a teacher-student experimental setting, and we discuss some practical implications for finite networks in the studied class. Based on joint work with Javier Maass.
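For concreteness, here is a minimal sketch of the objects involved, assuming the prototypical mean-field parametrization with a single-unit map σ(x; θ) (the generalized multi-layer units considered in the work may differ in detail): the ensemble of N units and its MF limit read

\[
f_N(x) \;=\; \frac{1}{N}\sum_{i=1}^{N} \sigma(x;\theta_i),
\qquad
f_\mu(x) \;=\; \int \sigma(x;\theta)\,\mathrm{d}\mu(\theta),
\]

where, as N → ∞, the empirical law of the parameters converges to a measure μ whose training dynamics are described by a Wasserstein Gradient Flow of the population risk. In this notation, a law μ is weakly invariant when \((g\cdot)_{\#}\mu = \mu\) for every \(g \in G\), and strongly invariant when μ is supported on the fixed-point set \(\{\theta : g\cdot\theta = \theta \ \text{for all } g \in G\}\).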