Publications

Lucchese L, Pakkanen M, Veraart A, 2024, The short-term predictability of returns in order book markets: a deep learning perspective, International Journal of Forecasting, ISSN: 0169-2070

This paper uses deep learning techniques to conduct a systematic large-scale analysis of order book-driven predictability in high-frequency returns. First, we introduce a new and robust representation of the order book, the volume representation. Next, we conduct an extensive empirical experiment to address various questions regarding predictability. We investigate if and how far ahead there is predictability, the importance of a robust data representation, the advantages of multi-horizon modeling, and the presence of universal trading patterns. We use model confidence sets, which provide a formalized statistical inference framework well suited to answer these questions. Our findings show that at high frequencies, predictability in mid-price returns is not just present but ubiquitous. The performance of the deep learning models is strongly dependent on the choice of order book representation, and in this respect, the volume representation appears to have multiple practical advantages.

Journal article

Leonte D, Veraart A, 2024, Simulation methods and error analysis for trawl processes and ambit fields, Mathematics and Computers in Simulation, Vol: 215, Pages: 518-542, ISSN: 0378-4754

Trawl processes are continuous-time, stationary and infinitely divisible processes which can describe a wide range of possible serial correlation patterns in data. In this paper, we introduce new simulation algorithms for trawl processes with monotonic trawl functions and establish their error bounds and convergence properties. We extensively analyse the computational complexity and practical implementation of these algorithms and discuss which one to use depending on the type of Lévy basis. We extend the above methodology to the simulation of kernel-weighted, volatility modulated trawl processes and develop a new simulation algorithm for ambit fields. Finally, we discuss how simulation schemes previously described in the literature can be combined with our methods for decreased computational cost.

Journal article

Nguyen M, Veraart A, Taisne B, Ting TC, Lallemant D et al., 2023, A dynamic extreme value model with applications to volcanic eruption forecasting, Mathematical Geosciences, ISSN: 1573-8868

Extreme events such as natural and economic disasters leave lasting impacts on society and motivate the analysis of extremes from data. While classical statistical tools based on Gaussian distributions focus on average behaviour and can lead to persistent biases when estimating extremes, extreme value theory (EVT) provides the mathematical foundations to accurately characterise extremes. This motivates the development of extreme value models for extreme event forecasting. In this paper, a dynamic extreme value model is proposed for forecasting volcano eruptions. This is inspired by one recently introduced for financial risk forecasting with high-frequency data. Using a case study of the Piton de la Fournaise volcano, it is shown that the modelling framework is widely applicable, flexible and holds strong promise for natural hazard forecasting. The value of using EVT-informed thresholds to identify and model extreme events is shown through forecast performance, and considerations to account for the range of observed events are discussed.

Journal article

Bennedsen M, Shephard N, Lunde A, Veraart A et al., 2023, Inference and forecasting for continuous-time integer-valued trawl processes, Journal of Econometrics, Vol: 236, ISSN: 0304-4076

This paper develops likelihood-based methods for estimation, inference, model selection, and forecasting of continuous-time integer-valued trawl processes. The full likelihood of integer-valued trawl processes is, in general, highly intractable, motivating the use of composite likelihood methods, where we consider the pairwise likelihood in lieu of the full likelihood. Maximizing the pairwise likelihood of the data yields an estimator of the parameter vector of the model, and we prove consistency and, in the short memory case, asymptotic normality of this estimator. When the underlying trawl process has long memory, the asymptotic behaviour of the estimator is more involved; we present some partial results for this case. The pairwise approach further allows us to develop probabilistic forecasting methods, which can be used to construct the predictive distribution of integer-valued time series. In a simulation study, we document the good finite sample performance of the likelihood-based estimator and the associated model selection procedure. Lastly, the methods are illustrated in an application to modelling and forecasting financial bid–ask spread data, where we find that it is beneficial to carefully model both the marginal distribution and the autocorrelation structure of the data.

Journal article

Benth FE, Schroers D, Veraart A, 2023, A feasible central limit theorem for realised covariation of SPDEs in the context of functional data, Annals of Applied Probability, ISSN: 1050-5164

Journal article

Li Y, Pakkanen M, Veraart A, 2023, Limit theorems for the realised semicovariances of multivariate Brownian semistationary processes, Stochastic Processes and their Applications, Vol: 155, Pages: 202-231, ISSN: 0304-4149

In this article, we will introduce the realised semicovariance for Brownian semistationary (BSS) processes, which is obtained from the decomposition of the realised covariance matrix into components based on the signs of the returns and study its in-fill asymptotic properties. More precisely, weak convergence in the space of càdlàg functions endowed with the Skorohod topology for the realised semicovariance of a general Gaussian process with stationary increments is proved first. The proof is based on the Breuer–Major theorem and on a moment bound for sums of products of non-linearly transformed Gaussian vectors. Furthermore, we establish a corresponding stable convergence. Finally, a central limit theorem for the realised semicovariance of multivariate BSS processes is established. These results extend the limit theorems for the realised covariation to a result for non-linear functionals.

Journal article

Courgeau V, Veraart A, 2022, Asymptotic theory for the inference of the latent trawl model for extreme values, Scandinavian Journal of Statistics: theory and applications, Vol: 49, Pages: 1448-1495, ISSN: 0303-6898

This article develops statistical inference methods and their asymptotic theory for the latent trawl model for extremes, which captures serial dependence in the time series of exceedances above a threshold. We review two methods based on pairwise likelihood and show that they underestimate the serial dependence in the extremes. We propose two generalized method of moments procedures based on auto-covariance matching to overcome this shortcoming. Out of those four inference approaches, two are single-stage strategies while the others have two stages, and we provide central limit theorems in the sense of weakly approaching sequences of distributions for all of them. This additional flexibility ensures good behavior between the estimators and estimates of the limiting distribution. In an empirical illustration using London air pollution data, we find that the two-stage auto-covariance matching scheme yields a high-quality inference. It comprises two interpretable steps and correctly captures the serial dependence structure of extremes while performing on par with other methods in terms of marginal fit.

Journal article

Gandy A, Jana K, Veraart A, 2022, Scoring predictions at extreme quantiles, AStA Advances in Statistical Analysis, Vol: 106, Pages: 527-544, ISSN: 0002-6018

Prediction of quantiles at extreme tails is of interest in numerous applications. Extreme value modelling provides various competing predictors for this point prediction problem. A common method of assessment of a set of competing predictors is to evaluate their predictive performance in a given situation. However, due to the extreme nature of this inference problem, it can be possible that the predicted quantiles are not seen in the historical records, particularly when the sample size is small. This situation poses a problem to the validation of the prediction with its realisation. In this article, we propose two non-parametric scoring approaches to assess extreme quantile prediction mechanisms. The proposed assessment methods are based on predicting a sequence of equally extreme quantiles on different parts of the data. We then use the quantile scoring function to evaluate the competing predictors. The performance of the scoring methods is compared with the conventional scoring method and the superiority of the former methods is demonstrated in a simulation study. The methods are then applied to reanalyse cyber Netflow data from Los Alamos National Laboratory and daily precipitation data at a station in California available from Global Historical Climatology Network.

Journal article

Courgeau V, Veraart A, 2022, High-frequency estimation of the Lévy-driven graph Ornstein-Uhlenbeck process, Electronic Journal of Statistics, Vol: 16, Pages: 4863-4925, ISSN: 1935-7524

We consider the Graph Ornstein-Uhlenbeck (GrOU) process observed on a non-uniform discrete time grid and introduce discretised maximum likelihood estimators with parameters specific to the whole graph or specific to each component of the graph. Under a high-frequency sampling scheme, we study the asymptotic behaviour of those estimators as the mesh size of the observation grid goes to zero. We prove two stable central limit theorems to the same distribution as in the continuously-observed case under both finite and infinite jump activity for the Lévy driving noise. In addition to providing the consistency of the estimators, the stable convergence allows us to consider probabilistic sparse inference procedures on the edges themselves when a graph structure is not explicitly available. It also preserves its asymptotic properties. In particular, we also show the asymptotic normality and consistency of an Adaptive Lasso scheme. We apply the new estimators to wind capacity factor measurements, i.e. the ratio between the wind power produced locally compared to its rated peak power, across fifty locations in Northern Spain and Portugal. We compare those estimators to the standard least squares estimator through a simulation study extending known univariate results across graph configurations, noise types and amplitudes.

Journal article

Leonte D, Veraart A, 2022, Simulation methods and error analysis for trawl processes and ambit fields

Trawl processes are continuous-time, stationary and infinitely divisible processes which can describe a wide range of possible serial correlation patterns in data. In this paper, we introduce new simulation algorithms for trawl processes with monotonic trawl functions and establish their error bounds and convergence properties. We extensively analyse the computational complexity and practical implementation of these algorithms and discuss which one to use depending on the type of Lévy basis. We extend the above methodology to the simulation of kernel-weighted, volatility modulated trawl processes and develop a new simulation algorithm for ambit fields. Finally, we discuss how simulation schemes previously described in the literature can be combined with our methods for decreased computational cost.

Working paper

Courgeau V, Veraart A, 2022, Likelihood theory for the Graph Ornstein-Uhlenbeck process, Statistical Inference for Stochastic Processes: an international journal devoted to time series analysis and the statistics of continuous time processes and dynamical systems, Vol: 25, Pages: 227-260, ISSN: 1387-0874

We consider the problem of modelling restricted interactions between continuously-observed time series as given by a known static graph (or network) structure. For this purpose, we define a parametric multivariate Graph Ornstein-Uhlenbeck (GrOU) process driven by a general Lévy process to study the momentum and network effects amongst nodes, effects that quantify the impact of a node on itself and that of its neighbours, respectively. We derive the maximum likelihood estimators (MLEs) and their usual properties (existence, uniqueness and efficiency) along with their asymptotic normality and consistency. Additionally, an Adaptive Lasso approach, or a penalised likelihood scheme, infers both the graph structure along with the GrOU parameters concurrently and is shown to satisfy similar properties. Finally, we show that the asymptotic theory extends to the case when stochastic volatility modulation of the driving Lévy process is considered.

Journal article

Benth FE, Schroers D, Veraart A, 2022, A weak law of large numbers for realised covariation in a Hilbert space setting, Stochastic Processes and their Applications, Vol: 145, Pages: 241-268, ISSN: 0304-4149

This article generalises the concept of realised covariation to Hilbert-space-valued stochastic processes. More precisely, based on high-frequency functional data, we construct an estimator of the trace-class operator-valued integrated volatility process arising in general mild solutions of Hilbert space-valued stochastic evolution equations in the sense of Da Prato and Zabczyk (2014). We prove a weak law of large numbers for this estimator, where the convergence is uniform on compacts in probability with respect to the Hilbert–Schmidt norm. In addition, we determine convergence rates for common stochastic volatility models in Hilbert spaces.

Journal article

Rowinska P, Veraart A, Gruet P, 2021, A multi-factor approach to modelling the impact of wind energy on electricity spot prices, Energy Economics, Vol: 104, Pages: 1-14, ISSN: 0140-9883

We introduce a four-factor arithmetic model for electricity baseload spot prices in Germany and Austria. The model consists of a deterministic seasonality and trend function, both short- and long-term stochastic components, and exogenous factors such as the daily wind energy production forecasts, the residual demand and the wind penetration index. We describe the short-term stochastic factor by a Lévy semi-stationary (LSS) process, and the long-term component is modelled as a Lévy process with increments belonging to the class of generalised hyperbolic distributions. We derive the corresponding futures prices and develop an inference methodology for our multi-factor model. The methodology allows us to infer the various factors in a step-wise procedure taking empirical spot prices, futures prices and wind energy production and total load data into account. Our empirical work shows that taking into account the impact of the wind energy generation on the prices improves the goodness of fit. Moreover, we demonstrate that the class of LSS processes can be used for modelling the exogenous variables including wind energy production, residual demand and the wind penetration index.

Journal article

Pakkanen MS, Passeggeri R, Sauri O, Veraart AED et al., 2021, Limit theorems for trawl processes, Electronic Journal of Probability, Vol: 26, Pages: 1-36, ISSN: 1083-6489

In this work we derive limit theorems for trawl processes. First, we study the asymptotic behavior of the partial sums of the discretized trawl process $(X_{i\Delta_n})_{i=0}^{\lfloor nt \rfloor - 1}$, under the assumption that as $n \uparrow \infty$, $\Delta_n \downarrow 0$ and $n\Delta_n \to \mu \in [0, +\infty]$. Second, we prove a general result on functional convergence in distribution of trawl processes. As an application of this result, we show that a trawl process whose Lévy measure tends to infinity converges in distribution, under suitable rescaling, to a Gaussian moving average process.

Journal article

Mancarella P, Moriarty J, Philpott A, Veraart A, Zachary S, Zwart B et al., 2021, Introduction: the mathematics of energy systems, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, Vol: 379, Pages: 1-5, ISSN: 1364-503X

The urgent need to decarbonize energy systems gives rise to many challenging areas of interdisciplinary research, bringing together mathematicians, physicists, engineers and economists. Renewable generation, especially wind and solar, is inherently highly variable and difficult to predict. The need to keep power and energy systems balanced on a second-by-second basis gives rise to problems of control and optimization, together with those of the management of liberalized energy markets. On the longer time scales of planning and investment, there are problems of physical and economic design. The papers in the present issue are written by some of the participants in a programme on the mathematics of energy systems which took place at the Isaac Newton Institute for Mathematical Sciences in Cambridge from January to May 2019—see http://www.newton.ac.uk/event/mes.

Journal article

Heinrich C, Pakkanen MS, Veraart AED, 2019, Hybrid simulation scheme for volatility modulated moving average fields, Mathematics and Computers in Simulation, Vol: 166, Pages: 224-244, ISSN: 0378-4754

We develop a simulation scheme for a class of spatial stochastic processes called volatility modulated moving averages. A characteristic feature of this model is that the behaviour of the moving average kernel at zero governs the roughness of realisations, whereas its behaviour away from zero determines the global properties of the process, such as long range dependence. Our simulation scheme takes this into account and approximates the moving average kernel by a power function around zero and by a step function elsewhere. For this type of approach the authors of [8], who considered an analogous model in one dimension, coined the expression hybrid simulation scheme. We derive the asymptotic mean square error of the simulation scheme and, in a simulation study, compare it with several other simulation techniques and exemplify its favourable performance.

Journal article

Passeggeri R, Veraart A, 2019, Mixing properties of multivariate infinitely divisible random fields, Journal of Theoretical Probability, Vol: 32, Pages: 1845-1879, ISSN: 0894-9840

In this work we present different results concerning mixing properties of multivariate infinitely divisible (ID) stationary random fields. First, we derive some necessary and sufficient conditions for mixing of stationary ID multivariate random fields in terms of their spectral representation. Second, we prove that (linear combinations of independent) mixed moving average fields are mixing. Further, using a simple modification of the proofs of our results we are able to obtain weak mixing versions of our results. Finally, we prove the equivalence of ergodicity and weak mixing for multivariate ID stationary random fields.

Journal article

Passeggeri R, Veraart A, 2019, Limit theorems for multivariate Brownian semistationary processes and feasible results, Advances in Applied Probability, Vol: 51, Pages: 667-716, ISSN: 0001-8678

In this paper we introduce the multivariate Brownian semistationary (BSS) process and study the joint asymptotic behaviour of its realised covariation using in-fill asymptotics. First, we present a central limit theorem for general multivariate Gaussian processes with stationary increments, which are not necessarily semimartingales. Then, we show weak laws of large numbers, central limit theorems and feasible results for BSS processes. An explicit example based on the so-called gamma kernels is also provided.

Journal article

Granelli A, Veraart A, 2019, A central limit theorem for the realised covariation of a bivariate Brownian semistationary process, Bernoulli, Vol: 25, Pages: 2245-2278, ISSN: 1350-7265

This article presents a weak law of large numbers and a central limit theorem for the scaled realised covariation of a bivariate Brownian semistationary process. The novelty of our results lies in the fact that we derive the suitable asymptotic theory both in a multivariate setting and outside the classical semimartingale framework. The proofs rely heavily on recent developments in Malliavin calculus.

Journal article

Veraart A, 2019, Modeling, simulation and inference for multivariate time series of counts using trawl processes, Journal of Multivariate Analysis, Vol: 169, Pages: 110-129, ISSN: 0047-259X

This article presents a new continuous-time modeling framework for multivariate time series of counts which have an infinitely divisible marginal distribution. The model is based on a mixed moving average process driven by Lévy noise, called a trawl process, where the serial correlation and the cross-sectional dependence are modeled independently of each other. Such processes can exhibit short or long memory. We derive a stochastic simulation algorithm and a statistical inference method for such processes. The new methodology is then applied to high frequency financial data, where we investigate the relationship between the number of limit order submissions and deletions in a limit order book.

Journal article

Deschatre T, Veraart A, 2018, A joint model for electricity spot prices and wind penetration with dependence in the extremes, Forecasting and risk management for renewable energy, Editors: Drobinski, Mougeot, Picard, Plougonven, Tankov

Book chapter

Passeggeri R, Veraart A, 2018, Mixing properties of multivariate infinitely divisible random fields, Journal of Theoretical Probability, Vol: 32, Pages: 1845-1879, ISSN: 0894-9840

In this work we present different results concerning mixing properties of multivariate infinitely divisible (ID) stationary random fields. First, we derive some necessary and sufficient conditions for mixing of stationary ID multivariate random fields in terms of their spectral representation. Second, we prove that (linear combinations of independent) mixed moving average fields are mixing. Further, using a simple modification of the proofs of our results, we are able to obtain weak mixing versions of our results. Finally, we prove the equivalence of ergodicity and weak mixing for multivariate ID stationary random fields.

Journal article

Noven R, Veraart A, Gandy A, 2018, A latent trawl process model for extreme values, Journal of Energy Markets, Vol: 11, Pages: 1-24, ISSN: 1756-3607

This paper presents a new model for characterising temporal dependence in exceedances above a threshold. The model is based on the class of trawl processes, which are stationary, infinitely divisible stochastic processes. The model for extreme values is constructed by embedding a trawl process in a hierarchical framework, which ensures that the marginal distribution is generalised Pareto, as expected from classical extreme value theory. We also consider a modified version of this model that works with a wider class of generalised Pareto distributions, and has the advantage of separating marginal and temporal dependence properties. The model is illustrated by applications to environmental time series, and it is shown that the model offers considerable flexibility in capturing the dependence structure of extreme value data.

Journal article

Nguyen M, Veraart A, 2018, Bridging between short-range and long-range dependence with mixed spatio-temporal Ornstein-Uhlenbeck processes, Stochastics: An International Journal of Probability and Stochastic Processes, Vol: 90, Pages: 1023-1052, ISSN: 1744-2508

While short-range dependence is widely assumed in the literature for its simplicity, long-range dependence is a feature that has been observed in data from finance, hydrology, geophysics and economics. In this paper, we extend a Lévy-driven spatio-temporal Ornstein-Uhlenbeck process by randomly varying its rate parameter to model both short-range and long-range dependence. This particular set-up allows for non-separable spatio-temporal correlations which are desirable for real applications, as well as flexible spatial covariances which arise from the shapes of influence regions. Theoretical properties such as spatio-temporal stationarity and second-order moments are established. An isotropic g-class is also used to illustrate how the memory of the process is related to the probability distribution of the rate parameter. We develop a simulation algorithm for the compound Poisson case which can be used to approximate other Lévy bases. The generalised method of moments is used for inference and simulation experiments are conducted with a view towards asymptotic properties.

Journal article

Rowinska P, Veraart A, Gruet P, 2018, A multifactor approach to modelling the impact of wind energy on electricity spot prices, Publisher: SSRN

We introduce a three-factor model of electricity spot prices, consisting of a deterministic seasonality and trend function as well as short- and long-term stochastic components, and derive a formula for futures prices. The long-term component is modelled as a Lévy process with increments belonging to the class of generalised hyperbolic distributions. We describe the short-term factor by Lévy semistationary processes: we start from a CARMA(2,1), i.e. a continuous-time ARMA model, and generalise it by adding a short-memory stochastic volatility. We further modify the model by including the information about the wind energy production as an exogenous variable. We fit our models to German and Austrian data including spot and futures prices as well as the wind energy production and total load data. Empirical studies reveal that taking into account the impact of the wind energy generation on the prices improves the goodness of fit.

Working paper

Granelli A, Veraart A, 2017, A weak law of large numbers for estimating the correlation in bivariate Brownian semistationary processes

Working paper

Veraart AED, 2017, Essentials of Probability Theory for Statisticians, Journal of the American Statistical Association, Vol: 112, Pages: 879-880, ISSN: 0162-1459

Journal article

Nguyen M, Veraart A, 2017, Modelling spatial heteroskedasticity by volatility modulated moving averages, Spatial Statistics, Vol: 20, Pages: 148-190, ISSN: 2211-6753

Spatial heteroskedasticity has been observed in many spatial data applications such as air pollution and vegetation. We propose a model, the volatility modulated moving average, to account for changing variances across space. This stochastic process is driven by Gaussian noise and involves a stochastic volatility field. It is conditionally non-stationary but unconditionally stationary: a useful property for theory and practice. We develop a discrete convolution algorithm as well as a two-step moments-matching estimation method for simulation and inference respectively. These are tested via simulation experiments and the consistency of the estimators is proved under suitable double asymptotics. To illustrate the advantages that this model has over the usual Gaussian moving average or process convolution, sea surface temperature anomaly data from the International Research Institute for Climate and Society are analysed.

Journal article

Veraart A, 2017, Book review of "Essentials of Probability Theory for Statisticians" by Michael A. Proschan and Pamela A. Shaw, Journal of the American Statistical Association, ISSN: 1537-274X

Journal article

Nguyen M, Veraart A, 2016, Spatio-temporal Ornstein-Uhlenbeck processes: theory, simulation and statistical inference, Scandinavian Journal of Statistics, Vol: 44, Pages: 46-80, ISSN: 1467-9469

Spatio-temporal modelling is an increasingly popular topic in Statistics. Our paper contributes to this line of research by developing the theory, simulation and inference for a spatio-temporal Ornstein-Uhlenbeck process. We conduct detailed simulation studies and demonstrate the practical relevance of these processes in an empirical study of radiation anomaly data. Finally, we describe how predictions can be carried out in the Gaussian setting.

Journal article

Professor Almut Veraart
