Prof. Aad van der Vaart (Delft University of Technology, the Netherlands): On the Bernstein-von Mises theorem
Abstract: The Bayesian statistical approach consists of updating a prior probability distribution on a set of unknown parameters to a posterior distribution by reweighting the prior with the likelihood of the observables. We are interested in this posterior distribution from the non-Bayesian point of view that the observables follow some fixed probability distribution. In this setting the classical Bernstein-von Mises theorem says that the posterior distribution of the parameter of a smoothly parametrised statistical model can be approximated by a certain normal distribution. A definitive mathematical formulation of the theorem was obtained in the 1960s and 1970s by Lucien Le Cam, but in some form the theorem goes back to Laplace, almost contemporary with the conception of Bayesian statistical inference.
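One standard way of writing the theorem (the notation below is illustrative, not taken from the talk: \(\Pi_n\) is the posterior, \(\hat\theta_n\) an efficient estimator such as the MLE, and \(I_{\theta_0}\) the Fisher information at the true parameter \(\theta_0\)) is as a total-variation approximation of the posterior by a normal distribution:

```latex
% Bernstein-von Mises theorem, one common formulation (illustrative notation):
% under smoothness of the model and a prior with positive density near \theta_0,
\bigl\| \Pi_n(\cdot \mid X_1,\dots,X_n)
  - N\bigl(\hat\theta_n,\,(n I_{\theta_0})^{-1}\bigr) \bigr\|_{TV}
  \;\xrightarrow{\;P_{\theta_0}\;}\; 0 .
```

The content of the statement is that, for large samples, the posterior forgets the prior and concentrates like the sampling distribution of an efficient estimator.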
Besides making some historical remarks (also about the name of the theorem), we review the role of the theorem in justifying Bayesian uncertainty quantification in the non-Bayesian, general statistical setup. This consists of using the spread of the posterior distribution (often in the form of so-called “credible sets”) as a measure of statistical uncertainty. We then discuss extensions of the theorem to cases of non-Gaussian limit experiments (e.g. densities with jumps, or missing-data models) and to Gaussian limits in infinite-dimensional settings (e.g. a regression function, a density function, or the initial value or potential function in a noisy inverse problem), together with uncertainty quantification in these setups.
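The link between credible sets and frequentist uncertainty can be made precise as follows (again in illustrative notation, not the speaker's): a credible set is defined through posterior mass, and the Bernstein-von Mises theorem transfers its coverage to the non-Bayesian setup:

```latex
% A credible set \hat C_n of level 1-\alpha carries posterior mass at least 1-\alpha:
\Pi_n\bigl(\hat C_n \mid X_1,\dots,X_n\bigr) \;\ge\; 1-\alpha .
% When the Bernstein-von Mises theorem holds, \hat C_n is asymptotically
% equivalent to the Wald confidence set centred at \hat\theta_n, whence its
% frequentist coverage  P_{\theta_0}\!\bigl(\theta_0 \in \hat C_n\bigr) \to 1-\alpha .
```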
For a genuinely infinite-dimensional parameter, equipped with a strong norm, the naive analogue of the theorem is known to fail, and credible sets must balance the centering and spread of the posterior distribution (bias and root-variance). However, versions of the Bernstein-von Mises theorem do apply to smooth functionals of the parameter. The appropriate assertion can be understood from the point of view of semiparametric information calculus and efficiency. We give some examples, including some from noisy estimation of PDEs and some involving the famous Dirichlet prior on a probability distribution, and discuss the (in)appropriateness of these results as a justification of uncertainty quantification by infinite-dimensional Bayesian methods.
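A semiparametric version of the theorem for a smooth real-valued functional \(\psi(\theta)\) of the infinite-dimensional parameter can be sketched as follows (illustrative notation: \(\hat\psi_n\) is an efficient estimator of \(\psi(\theta_0)\) and \(\tilde v_{\theta_0}\) the semiparametric efficiency bound for \(\psi\)):

```latex
% Bernstein-von Mises theorem for a smooth functional \psi (illustrative):
% the marginal posterior of \psi(\theta), centred and scaled, satisfies
\bigl\| \Pi_n\bigl(\sqrt{n}\,(\psi(\theta)-\hat\psi_n) \in \cdot
    \mid X_1,\dots,X_n\bigr)
  - N\bigl(0,\,\tilde v_{\theta_0}\bigr) \bigr\|_{TV}
  \;\xrightarrow{\;P_{\theta_0}\;}\; 0 .
```

In this form the normal limit has the efficient variance, so credible intervals for \(\psi(\theta)\) behave asymptotically like optimal confidence intervals, even though the full parameter admits no such approximation.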