I am a Lecturer in the Department of Mathematics at Imperial College London. Prior to this, I was a Chapman Fellow in Mathematics at Imperial, and I obtained my PhD from the University of Oxford under the supervision of Prof. Terry Lyons.
My research interests are in the areas of stochastic analysis, deep learning, and kernel methods. More precisely, I work in rough path theory, a modern mathematical framework providing a rigorous description of how oscillatory signals interact with non-linear, time-evolving stochastic systems, such as the brain or financial markets.
In practice, this theory can be used to develop algorithms for processing irregularly sampled, noisy and high-dimensional time series. The integration of these tools within existing machine learning pipelines can generate striking performance boosts in a wide range of contexts including stochastic control and optimisation, numerical methods for solving PDEs, time series generative modelling, causal discovery for stochastic processes, or even protein folding.
Moreover, using rough path theory, it is possible to show how a large class of neural network architectures for sequential data, such as RNNs, LSTMs or Transformers, can be interpreted as approximations of certain differential equations which, in the large width regime, behave like (signature) kernel machines.
LinkedIn / GitHub / Google Scholar.
Background on rough path theory
Lyons' theory of rough paths is a modern mathematical framework describing how oscillatory signals interact with non-linear dynamical systems. Notably, the theory extends Itô’s approach to SDEs beyond semimartingales and it has had a significant impact on the development of Hairer’s theory of regularity structures, providing a mathematically principled description of many stochastic PDEs arising from physics.
More recently, rough path theory has gained strong traction in many areas of machine learning, particularly those involving time series. The signature, a centrepiece of the theory, provides a top-down description of a signal; it captures essential information such as the order of events occurring across different channels, and filters out potentially superfluous information, such as the sampling rate of the signal.
A similar paradigm is provided by reservoir computing, where a trajectory is described through its interaction with a random dynamical system capable of storing information, while in rough path theory the random system is replaced by a deterministic system given by the signature.
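To make the signature's sensitivity to the order of events concrete, here is a minimal sketch in plain Python. It computes only the first two levels of the signature of a piecewise-linear path, updating them segment by segment with Chen's identity. The function name `signature_depth2` and the three example paths are illustrative assumptions, not part of any library.

```python
def signature_depth2(path):
    """Return levels 1 and 2 of the signature of a piecewise-linear path.

    `path` is a list of points, each a tuple of d floats.
    """
    d = len(path[0])
    level1 = [0.0] * d                        # total increment per channel
    level2 = [[0.0] * d for _ in range(d)]    # iterated integrals S^{ij}
    for start, end in zip(path, path[1:]):
        inc = [b - a for a, b in zip(start, end)]
        # Chen's identity for appending one linear segment:
        # S2 <- S2 + S1 (x) inc + (inc (x) inc) / 2
        for i in range(d):
            for j in range(d):
                level2[i][j] += level1[i] * inc[j] + 0.5 * inc[i] * inc[j]
        for i in range(d):
            level1[i] += inc[i]
    return level1, level2


# Two channels that both move from 0 to 1, in opposite orders.
x_then_y = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
y_then_x = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
# The first path again, sampled twice as densely.
resampled = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (1.0, 0.5), (1.0, 1.0)]

for name, p in [("x_then_y", x_then_y), ("y_then_x", y_then_x), ("resampled", resampled)]:
    print(name, signature_depth2(p))
```

Both orderings share the same first level (the total increments), but their second-level cross terms differ, which is how the signature records which channel moved first. The resampled path traces the same curve as `x_then_y`, so its signature is unchanged.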
To mitigate the exponential computational complexity in the dimension of the underlying streams, significant efforts have been made to scale signature methods to high-dimensional signals. Signature kernels provide a somewhat "dual" view of learning with signatures, one that scales to high-dimensional time series through well-designed kernel tricks based on a connection between signatures and certain classes of PDEs. Algorithms based on signature kernels have been used in a wide range of applications, including neuroscience, finance, cybersecurity and weather forecasting.
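As a rough illustration of the PDE connection, the sketch below approximates the signature kernel of two piecewise-linear paths by solving the Goursat problem ∂²k/∂s∂t = ⟨x'(s), y'(t)⟩ k, with k = 1 on the boundary, using a simple explicit finite-difference scheme on the grid given by the sample points. The function name `signature_kernel` and the choice of scheme are assumptions made for this example. It is a sketch, not the implementation used in any particular library.

```python
def signature_kernel(x, y):
    """Approximate the signature kernel of two piecewise-linear paths.

    `x` and `y` are lists of points (tuples of floats of equal dimension).
    The grid is the one induced by the sample points, so accuracy improves
    when the paths are sampled more finely.
    """
    m, n = len(x), len(y)
    # Boundary condition of the Goursat problem: k(0, t) = k(s, 0) = 1.
    k = [[1.0] * n for _ in range(m)]
    for i in range(m - 1):
        dx = [b - a for a, b in zip(x[i], x[i + 1])]
        for j in range(n - 1):
            dy = [b - a for a, b in zip(y[j], y[j + 1])]
            inc = sum(u * v for u, v in zip(dx, dy))  # <dx, dy> on this cell
            # Explicit update: k is integrated over the cell using the
            # average of the two known off-diagonal corners.
            k[i + 1][j + 1] = (k[i + 1][j] + k[i][j + 1]) * (1.0 + 0.5 * inc) - k[i][j]
    return k[-1][-1]


# A straight line in one dimension from 0 to 1, sampled at 201 points.
line = [(i / 200.0,) for i in range(201)]
print(signature_kernel(line, line))
```

The cost is quadratic in the number of sample points and only linear in the dimension of the paths, since the dimension enters solely through the inner products of increments. No truncated signature is ever formed.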
The symbiosis between differential equations and deep learning has become an active research area in recent years, mainly through the introduction of hybrid models known as neural differential equations. These models arise from the realisation that many standard neural network architectures can be interpreted as approximations of certain differential equations. Insights from rough path theory have enabled the development of continuous-time recurrent neural architectures, dubbed neural controlled/rough differential equations, which offer memory efficiency and high expressivity for function approximation, and which have established themselves as state-of-the-art models for handling irregular time series.
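The link between recurrent architectures and controlled differential equations can be sketched with an explicit Euler scheme for dz = f(z) dX, where X is the path interpolating the data. In the plain-Python sketch below, `euler_cde`, `tanh_field` and the fixed `WEIGHTS` are hypothetical names and values chosen for illustration. In a neural controlled differential equation the vector field would be a trained network.

```python
import math


def euler_cde(z0, path, vector_field):
    """Explicit Euler scheme for dz = f(z) dX along a piecewise-linear path.

    `vector_field(z)` must return a matrix (list of rows) with len(z) rows
    and as many columns as the path has channels.
    """
    z = list(z0)
    for start, end in zip(path, path[1:]):
        dx = [b - a for a, b in zip(start, end)]
        f = vector_field(z)
        # One recurrent-style update: new state = old state + f(z) @ dx.
        z = [zi + sum(fij * dxj for fij, dxj in zip(row, dx))
             for zi, row in zip(z, f)]
    return z


# Hypothetical fixed weights standing in for trained parameters.
# WEIGHTS[i][j] is the weight vector producing entry (i, j) of f(z).
WEIGHTS = [
    [[0.5, -0.3], [0.2, 0.1]],
    [[-0.4, 0.6], [0.3, 0.2]],
]


def tanh_field(z):
    """Toy vector field: each entry is tanh of a linear function of the state."""
    return [[math.tanh(sum(w * zk for w, zk in zip(ws, z))) for ws in row]
            for row in WEIGHTS]


# Irregularly sampled observations: first channel is time, second is the value.
irregular_path = [(0.0, 0.0), (0.1, 0.4), (0.7, 0.5), (1.0, 1.0)]
print(euler_cde([0.1, -0.2], irregular_path, tanh_field))
```

Because the update depends on the data only through increments of the path, uneven gaps between observations need no special treatment. Each step has the shape of a recurrent cell whose input is the increment rather than the raw observation.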
If you are interested in knowing more about applications of rough path theory to machine learning, please consider joining our Rough Paths Interest Group organised in partnership with The Alan Turing Institute and DataSig.
et al., 2021, The signature kernel is the solution of a Goursat PDE, SIAM Journal on Mathematics of Data Science, Vol:3, ISSN:2577-0187, Pages:873-899
Salvi C, Lemercier M, Gerasimovics A, Neural stochastic PDEs: resolution-invariant learning of continuous spatiotemporal dynamics, Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS 2022), NeurIPS
et al., Higher Order Kernel Mean Embeddings to Capture Filtrations of Stochastic Processes, Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS 2021)
et al., 2021, SigGPDE: scaling sparse Gaussian processes on sequential data, Thirty-eighth International Conference on Machine Learning (ICML-2021), PMLR, Pages:6233-6242, ISSN:2640-3498
et al., Neural Rough Differential Equations for Long Time Series, Thirty-eighth International Conference on Machine Learning (ICML 2021)
et al., Distribution Regression for Sequential Data, The 24th International Conference on Artificial Intelligence and Statistics (AISTATS 2021)
et al., Deep Signature Transforms, Advances in Neural Information Processing Systems 32 (NeurIPS 2019)