Andrea Agazzi
Title: Clustering dynamics in mean-field models of transformers
Abstract
We consider a family of mean-field interacting particle systems modeling the layerwise evolution of information (represented as a set of “tokens”) in transformers, a common architecture used in deep learning. In this setting, tokens are interpreted as particles on the d-dimensional sphere, and their distribution evolves according to a Vlasov-type equation, where time plays the role of network depth. Numerical experiments reveal the tendency of these particle systems to organize into clustered/synchronized states, offering a potential explanation for how meaning emerges in these models. In this talk, I will introduce both deterministic and stochastic variants of these models and provide a rigorous characterization of this phenomenon.
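As a rough illustration of the kind of particle system described above, the following minimal sketch simulates tokens on a unit sphere under a softmax self-attention interaction. The abstract does not specify the exact equations, so everything here is an assumption made for illustration: the interaction form (identity query, key, and value matrices), the inverse-temperature parameter `beta`, the explicit Euler discretization with step `dt`, and the function names `attention_step`, `simulate`, and `min_pairwise_cosine`. Tokens are taken on the unit sphere embedded in `dim`-dimensional Euclidean space.

```python
import numpy as np

def attention_step(x, beta=1.0, dt=0.1):
    """One explicit Euler step of an ASSUMED softmax-attention flow on the unit sphere.

    x has shape (n, dim); each row is a token of unit norm.
    """
    logits = beta * (x @ x.T)                      # pairwise inner products
    logits -= logits.max(axis=1, keepdims=True)    # stabilize the softmax
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    drift = weights @ x                            # attention-weighted average of tokens
    # Project the drift onto the tangent space of the sphere at each token.
    drift -= np.sum(drift * x, axis=1, keepdims=True) * x
    x_new = x + dt * drift
    # Renormalize so tokens stay on the sphere after the Euler step.
    return x_new / np.linalg.norm(x_new, axis=1, keepdims=True)

def simulate(n=32, dim=3, beta=1.0, dt=0.1, steps=500, seed=0):
    """Run the assumed dynamics from random initial tokens; 'steps' plays the role of depth."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, dim))
    x /= np.linalg.norm(x, axis=1, keepdims=True)  # random initial tokens on the sphere
    for _ in range(steps):
        x = attention_step(x, beta=beta, dt=dt)
    return x

def min_pairwise_cosine(x):
    """Smallest cosine similarity between any two tokens (1.0 means fully clustered)."""
    return float((x @ x.T).min())

if __name__ == "__main__":
    final = simulate()
    print("min pairwise cosine after simulation:", min_pairwise_cosine(final))
```

Comparing `min_pairwise_cosine` on the initial and final configurations gives a crude measure of clustering: values close to 1 indicate that the tokens have nearly collapsed to a single point. A stochastic variant could be sketched by adding tangential noise in `attention_step`, though the form of the noise in the models discussed in the talk is not stated here.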