Understanding complex, irregularly sampled, high-dimensional streams of data is a central challenge in modern data science. Key to this understanding is the ability to extract actionable information from a stream and use it to make consequential decisions. Examples include summarising patients’ health records to evaluate the efficacy of treatments, or extracting information from the trajectories of stocks in order to design successful trading strategies.

Developed in the 1990s by Terry Lyons as a robust solution theory for non-linear control systems driven by irregular signals, rough path theory (or more generally rough analysis, MSC2020 code 60Lxx) offers a deterministic toolbox from which it is possible to recover many classical results in stochastic analysis, without the need to use arguments specific to probability theory. Its theoretical footprint has been substantial in the study of random phenomena, notably through its presence in Martin Hairer’s Fields Medal-winning work on regularity structures, which develops a rigorous framework to solve certain ill-posed stochastic PDEs.

Grounded in mathematical analysis but inspired by equations arising in probability theory, rough analysis has deep connections to branches of pure mathematics as diverse as differential geometry and algebraic combinatorics. More recently, rough analysis has become popular in Machine Learning (ML) as a tool to extract actionable information from high-dimensional, irregular time series. It can be a game changer for learning tasks with time-series data, as shown by the recent and ongoing explosion of ML papers using it. Successful applications of rough analysis have been obtained in various areas of data science, including healthcare, neuroscience, computer vision and quantitative finance.

One of the catalysing factors was the recent development of high-performance, scalable software libraries such as esig, iisignature and signatory. This two-day workshop will consist of 45-minute talks from invited speakers addressing 4 main topics (see the list below). Talks will be followed by questions and a discussion about future research directions.
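To give a flavour of what these libraries compute, the first two signature levels of a path are simple enough to work out by hand. The sketch below is a pure-NumPy illustration (the function name and the example path are ours, not from any of the libraries above): it builds the level-1 and level-2 signature of a piecewise-linear path segment by segment using Chen’s identity.

```python
import numpy as np

def signature_level2(path):
    """Signature of a piecewise-linear path up to level 2, built
    segment by segment with Chen's identity. `path` has shape
    (num_points, dim)."""
    dim = path.shape[1]
    s1 = np.zeros(dim)           # level 1: total increment
    s2 = np.zeros((dim, dim))    # level 2: iterated integrals
    for a, b in zip(path[:-1], path[1:]):
        delta = b - a
        # Chen's identity: the level-2 term of a concatenation is
        # S2(x) + S2(y) + S1(x) (outer) S1(y); a single linear
        # segment contributes delta (outer) delta / 2.
        s2 += np.outer(s1, delta) + np.outer(delta, delta) / 2
        s1 += delta
    return s1, s2

# An L-shaped path in the plane: one step right, then one step up.
path = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
s1, s2 = signature_level2(path)
# s1 is the total displacement (1, 1); the antisymmetric part of s2
# is the Levy area of the path, here 1/2.
```

The symmetric part of `s2` is determined by `s1` (the shuffle identity), while the antisymmetric part records the area swept by the path, which is the first piece of information the signature captures beyond the endpoints.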

In this workshop we wish to bring together experts and young researchers from diverse fields of mathematics and machine learning who share an interest and expertise in applying techniques from rough analysis to challenges in data science. The main themes of the workshop will be:

  • Interplay between rough paths and machine learning.
  • Applications of rough paths to mathematical finance.
  • Theoretical aspects of rough paths.
  • Regularity structures, stochastic PDEs and machine learning.

We hope that an in-person workshop at a world-renowned institution such as Imperial College London will result in new international and multidisciplinary collaborations, and that it will generate original research that uses advanced mathematics to tackle real-world problems.


Cris Salvi (Imperial), Thomas Cass (Imperial), Blanka Horvath (TUM), Emilio Ferrucci (Oxford).



Tuesday 26 July
08:50 – 09:00 Welcome
09:00 – 09:45 Patrick Bonnier
09:45 – 10:30 Emilio Ferrucci
10:30 – 11:00 Coffee break
11:00 – 11:45 Josef Teichmann (online)
11:45 – 12:30 Darrick Lee (online)
12:30 – 14:00 Lunch break
14:00 – 15:00 Martin Hairer
15:00 – 15:45 Maud Lemercier
15:45 – 16:15 Coffee break
16:15 – 17:00 Harald Oberhauser
17:00 – 17:45 James Foster
Wednesday 27 July
09:00 – 10:00 Terry Lyons
10:00 – 10:45 Joel Dyer
10:45 – 11:15 Coffee break
11:15 – 12:00 Christian Bayer
12:00 – 12:45 Horatio Boedihardjo
12:45 – 14:00 Lunch break
14:00 – 14:45 Nikolas Tapia (online)
14:45 – 15:30 Carlo Bellingeri (online)
15:30 – 16:00 Coffee break
16:00 – 16:45 Yue Wu
16:45 – 17:30 Andrew Alden


Titles and abstracts


Patrick Bonnier: Proper Scoring Rules, Divergences, and Entropies for Paths and Time Series
Many forecasts consist not of point predictions but concern the evolution of quantities. For example, a central bank might predict the interest rates during the next quarter, an epidemiologist might predict trajectories of infection rates, a clinician might predict the behaviour of medical markers over the next day, etc. The situation is further complicated since these forecasts sometimes only concern the approximate “shape of the future evolution” or “order of events”. Formally, such forecasts can be seen as probability measures on spaces of equivalence classes of paths modulo time-parametrization. We combine the statistical framework of proper scoring rules with classical mathematical results to derive a principled approach to decision making with such forecasts. In particular, we introduce notions of gradients, entropy, and divergence that are tailor-made to respect the underlying non-Euclidean structure.

Emilio Ferrucci: On the Wiener Chaos Expansion of the Signature of a Gaussian Process
This talk is based on joint work with Thomas Cass. We compute the Wiener chaos decomposition of the signature for a class of Gaussian processes, which contains fractional Brownian motion (fBm) with Hurst parameter H in (1/4, 1). At level 0, our result yields an expression for the expected signature of such processes, which determines their law [CL16]. In particular, this formula simultaneously extends both the one for fBm with H > 1/2 [BC07] and the one for Brownian motion (H = 1/2) [Faw03] to the general case H > 1/4. Other processes studied include continuous and centred Gaussian semimartingales.

Josef Teichmann: A representation theoretic viewpoint on signatures with a view towards regularization
By means of representation theory we construct families of kernels, which can be approximated by random feature selection. This sheds new light on randomized signature and regularization of learning procedures for signature approximations.

Darrick Lee:  Mapping Space Signatures
We introduce the mapping space signature, a generalization of the path signature for maps from higher dimensional cubical domains, which is motivated by the topological perspective of iterated integrals by K. T. Chen. We show that the mapping space signature shares many of the analytic and algebraic properties of the path signature; in particular, it is universal and characteristic with respect to Jacobian equivalence classes of cubical maps. This is joint work with Chad Giusti, Vidit Nanda, and Harald Oberhauser.

Martin Hairer: A concise proof of the BPHZ theorem for regularity structures

Maud Lemercier: Neural Stochastic PDEs
Stochastic partial differential equations (SPDEs) are the mathematical tool of choice for modelling spatiotemporal PDE-dynamics under the influence of randomness. In this talk, I will present a novel neural architecture to learn solution operators of PDEs with (possibly stochastic) forcing from partially observed data. The proposed Neural SPDE model is capable of processing incoming sequential information arriving irregularly in time and observed at arbitrary spatial resolutions. By performing operations in the spectral domain, I will show how a Neural SPDE can be evaluated by solving a fixed point problem. I will present numerical experiments on various semilinear SPDEs, including the stochastic Navier-Stokes equations, which demonstrate how the Neural SPDE model is capable of learning complex spatiotemporal dynamics in a resolution-invariant way, with better accuracy and lighter training data requirements compared to alternative models, and up to 3 orders of magnitude faster than traditional solvers.

Harald Oberhauser: Capturing graphs with hypoelliptic diffusions
A common way to capture graph structures is through random walks. The distribution of these random walks evolves according to a diffusion equation defined using the graph Laplacian. We extend this approach by leveraging classic mathematical results about hypoelliptic diffusions. This results in a novel tensor-valued graph operator, which we call the hypoelliptic graph Laplacian. We provide theoretical guarantees and efficient low-rank approximation algorithms.

James Foster: Cubature vs Markov Chain Monte Carlo for Bayesian Inference
Markov Chain Monte Carlo (MCMC) is widely regarded as the “go-to” approach for computing integrals with respect to posterior distributions in Bayesian inference. Whilst there are a large variety of MCMC methods, many prominent algorithms can be viewed as approximations of stochastic differential equations (SDEs). For example, the unadjusted Langevin algorithm (ULA) is obtained as an Euler discretization of the Langevin diffusion and has seen particular interest due to its scalability and connections to the optimization literature. On the other hand, “Cubature on Wiener Space” (Lyons and Victoir, 2004) provides a powerful alternative to Monte Carlo for simulating SDEs. In the cubature paradigm, SDE solutions are represented as a cloud of particles and propagated via deterministic cubature formulae. However, such formulae can dramatically increase the number of particles, and thus SDE cubature requires efficient “distribution compression” to be practical. Fortunately, there are now a range of kernel-based compression algorithms available in the machine learning literature – such as kernel herding, thinning and recombination. In this talk, we will show that by applying cubature to ULA and employing kernel herding, we can obtain a gradient-based particle method for Bayesian inference. We shall discuss the theory underpinning this algorithm and the key properties of the Langevin diffusion that enable numerical errors to be controlled over long time horizons. Finally, we compare the proposed Langevin cubature algorithm to ULA on a simple mixture model and observe significant computational benefits.
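The abstract above describes ULA as an Euler discretization of the Langevin diffusion. As a concrete illustration of that starting point (the standard recursion, not the speaker’s cubature algorithm), here is a minimal sketch for a one-dimensional standard Gaussian target; the step size, sample count and function names are our illustrative choices.

```python
import numpy as np

def ula(grad_log_density, x0, step, n_steps, rng):
    """Unadjusted Langevin algorithm: Euler discretization of the
    Langevin diffusion dX_t = grad log pi(X_t) dt + sqrt(2) dW_t."""
    x = np.array(x0, dtype=float)
    samples = np.empty((n_steps, x.size))
    for k in range(n_steps):
        noise = rng.standard_normal(x.size)
        x = x + step * grad_log_density(x) + np.sqrt(2.0 * step) * noise
        samples[k] = x
    return samples

# Target: standard 1D Gaussian, so grad log pi(x) = -x.
rng = np.random.default_rng(0)
samples = ula(lambda x: -x, x0=[5.0], step=0.05, n_steps=20000, rng=rng)

# After burn-in, the chain's mean and variance should be close to 0
# and 1; ULA carries an O(step) bias, so the match is only approximate.
burned = samples[2000:]
```

The O(step) bias visible here is exactly the discretization error the talk is concerned with controlling over long time horizons; cubature replaces the random increments by deterministic particle updates.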

Terry Lyons: From the mathematics of rough paths to more scalable data science
The mathematics of rough path theory creates a framework for understanding the interactions of complex and highly oscillatory systems, and generalises the Newtonian framework of controlled differential equations to include rough multimodal evolving systems. A key feature of this theory is the development of a strong analytic theory capturing, in concrete terms, the space of (polynomial) functions on these spaces of paths. This work was built on the ideas of K. T. Chen, who studied these spaces of functions to develop a cohomology theory on loop space. The core analysis came from L. C. Young. The generating function for these polynomial functions on path space is known as the signature, and it was established by Hambly and Lyons (Annals of Math 2010) that the signature of the path is a complete invariant of a path of finite length modulo the appropriate notion of re-parametrisation. Boedihardjo and … (Advances in Maths 2016) extended this result to rough paths.
    These results provide a new perspective and a graded feature set for describing complex streamed data. The first few terms in this series expansion allow high quality local descriptions of streams. These features are expensive to compute. But crucially for machine learning, they only need to be computed once and can be used in every training cycle. In this way they can form the basis for much more scalable machine learning algorithms. (Morrill, James, Cristopher Salvi, Patrick Kidger, and James Foster. Neural rough differential equations for long time series. In International Conference on Machine Learning, pp. 7829-7838. PMLR, 2021).
    We will survey this space.

Joel Dyer:  Simulation-based inference with path signatures
Computer simulations are used widely across scientific disciplines, often taking the form of stochastic black-box models consuming fixed parameters and generating a random output. In general for such models, no likelihood function is available, often due to the complexity of the simulators. Consequently, it is often convenient to adopt so-called likelihood-free or simulation-based inference methods that mimic conventional likelihood-based procedures using data simulated at different parameter values. While many such approaches exist for iid data, adapting these techniques to simulators that generate sequential data can be challenging. In this talk, we will discuss our recent work on simulation-based parameter inference for dynamic, stochastic simulators with the use of path signatures. We will argue that signatures and their recent kernelisation naturally and flexibly enable both approximate Bayesian and frequentist inference with time-series simulators of different kinds, with competitive empirical performance in a variety of benchmark experiments.

Christian Bayer: Optimal stopping with signatures
We propose a new method for solving optimal stopping problems (such as American option pricing in finance) under minimal assumptions on the underlying stochastic process X. We consider classic and randomized stopping times represented by linear and non-linear functionals of the rough path signature 𝕏^{<∞} associated to X, and prove that maximizing over these classes of signature stopping times, in fact, solves the original optimal stopping problem. Using the algebraic properties of the signature, we can then recast the problem as a (deterministic) optimization problem depending only on the (truncated) expected signature. By applying a deep neural network approach to approximate the non-linear signature functionals, we can efficiently solve the optimal stopping problem numerically. The only assumption on the process X is that it is a continuous (geometric) random rough path. Hence, the theory encompasses processes such as fractional Brownian motion, which fail to be either semi-martingales or Markov processes, and can be used, in particular, for American-type option pricing in fractional models, e.g. of financial or electricity markets. (Based on joint work with Paul Hager, Sebastian Riedel, and John Schoenmakers)

Horatio Boedihardjo:  A non-vanishing property for the signature of a bounded variation path
Given a bounded variation path, what can we say about its signature? It is classical that the signature is a group-like element and that its n-th term decays at a factorial rate in n. In this talk, we will show a third property: the sequence of signature terms cannot contain infinitely many zeros. Together with the result of Chang, Lyons and Ni, this means the signature of a reduced bounded variation path has an exact factorial decay rate. This work gives rise to many interesting questions, including what the complex version of the uniqueness theorem for signatures would be, and whether analogous non-vanishing results hold for general geometric rough paths (even though the non-vanishing property itself is not true for general rough paths).

Nikolas Tapia: Stability of Deep Neural Networks via discrete rough paths
Using rough path techniques, we provide a priori estimates for the output of Deep Residual Neural Networks in terms of both the input data and the (trained) network weights. As trained network weights are typically very rough when seen as functions of the layer, we propose to derive stability bounds in terms of the total p-variation of trained weights for any p∈[1,3]. Unlike the C1-theory underlying the neural ODE literature, our estimates remain bounded even in the limiting case of weights behaving like Brownian motions, as suggested in [arXiv:2105.12245]. Mathematically, we interpret residual neural networks as solutions to (rough) difference equations, and analyse them based on recent results on discrete-time signatures and rough path theory.
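The residual-network-as-difference-equation viewpoint can be sketched in a few lines. The toy network below is our own illustration (dimensions, weight scaling and activation are our choices, not the paper’s): each layer adds an increment to the hidden state, exactly the form x_{k+1} = x_k + f(x_k) W_k analysed in the talk.

```python
import numpy as np

def residual_forward(x, weights, activation=np.tanh):
    """Forward pass of a plain residual network, read as the
    difference equation x_{k+1} = x_k + f(x_k) W_k: each layer
    contributes an increment driven by that layer's weights."""
    for W in weights:
        x = x + activation(x) @ W
    return x

rng = np.random.default_rng(42)
dim, depth = 4, 50
# Small layer-wise weights, mimicking the rough-but-small increments
# of trained weights viewed as a function of the layer index.
weights = [0.1 * rng.standard_normal((dim, dim)) / np.sqrt(depth)
           for _ in range(depth)]
x0 = rng.standard_normal(dim)
out = residual_forward(x0, weights)
```

Reading the layer index as time, the stacked weights play the role of the driving path, which is what makes p-variation (rather than C1) bounds the natural notion of stability.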

Carlo Bellingeri: A Young-type Euler-Maclaurin Formula
Considered one of the key identities in classical analysis, the Euler-Maclaurin formula is one of the standard tools to relate sums and integrals, with remarkable applications in many areas of mathematics, though with little use in stochastic analysis. In this talk, we will show how the notion of signature can generalize this identity in the context of Young integration and discuss some possible applications.

Yue Wu:  A NRDE-based model for solving path-dependent PDEs
The path-dependent partial differential equation (PPDE) was first introduced for path-dependent derivatives in financial markets such as Asian, barrier, and lookback options; its semilinear type was later identified as a non-Markovian BSDE. The solution of a PPDE contains an infinite-dimensional spatial variable, which makes approximating the solution extremely challenging, if not impossible. In this talk, we propose a neural rough differential equation (NRDE) based model to learn (high-dimensional) path-dependent parabolic PDEs. The resulting continuous-time model for the PDE solution has the advantages of memory efficiency and coping with variable time frequency. Several numerical experiments are provided to validate the performance of the proposed model in comparison to strong baselines in the literature. This is joint work with Bowen Fang (University of Warwick, UK) and Hao Ni (UCL, UK).

Andrew Alden:  Model-Agnostic Pricing of Exotic Derivatives Using Signatures
Derivative pricing can be formulated as a higher-order distribution regression problem on stochastic processes. Pricing using this model-agnostic path-wise approach involves the use of the second-order maximum mean discrepancy (MMD), a notion of distances between stochastic processes based on path signatures. Computing this distance is resource-expensive and time-consuming. Motivated by the recent successes of using neural networks to price derivatives, we speed up the computation of the MMD to facilitate the use of neural networks to address the distribution regression problem. In this talk I will discuss how we reduce the run-time for computing the second-order MMD. I will also present the results which were obtained by combining distribution regression and neural networks to price three exotic derivatives. Finally, I will discuss the robustness of our path-wise pricing framework to stochastic model parameter misspecifications. This talk is based on joint work with Carmine Ventre, Blanka Horvath, and Gordon Lee.
