Imperial College London

Professor Axel Gandy

Faculty of Natural Sciences, Department of Mathematics

Head of Department of Mathematics & Chair in Statistics
 
 
 

Contact

 

+44 (0)20 7594 8518 | a.gandy | Website

 
 

Location

 

644 Huxley Building, South Kensington Campus



Publications


77 results found

Hilbers A, Brayshaw D, Gandy A, 2019, Importance subsampling: Improving power system planning under climate-based uncertainty, Applied Energy, Vol: 251, Pages: 1-12, ISSN: 0306-2619

Recent studies indicate that the effects of inter-annual climate-based variability in power system planning are significant and that long samples of demand & weather data (spanning multiple decades) should be considered. At the same time, modelling renewable generation such as solar and wind requires high temporal resolution to capture fluctuations in output levels. In many realistic power system models, using long samples at high temporal resolution is computationally unfeasible. This paper introduces a novel subsampling approach, referred to as importance subsampling, allowing the use of multiple decades of demand & weather data in power system planning models at reduced computational cost. The methodology can be applied in a wide class of optimisation-based power system simulations. A test case is performed on a model of the United Kingdom created using the open-source modelling framework Calliope and 36 years of hourly demand and wind data. Standard data reduction approaches such as using individual years or clustering into representative days lead to significant errors in estimates of optimal system design. Furthermore, the resultant power systems lead to supply capacity shortages, raising questions of generation capacity adequacy. In contrast, importance subsampling leads to accurate estimates of optimal system design at greatly reduced computational cost, with resultant power systems able to meet demand across all 36 years of demand & weather scenarios.
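A minimal sketch of the general idea, not the paper's exact implementation: give each time slice an importance weight (here a crude net-load proxy, which is an assumption made for illustration), sample slices with probability proportional to that weight, and attach importance weights so that weighted averages over the subsample approximate averages over the full multi-decade record.

```python
import numpy as np

def importance_subsample(demand, wind, n_keep, seed=None):
    """Illustrative importance subsampling of hourly time slices.

    Slices are drawn with probability proportional to an importance
    weight (here a crude net-load proxy) and each kept slice receives a
    weight so that weighted averages over the subsample approximate
    averages over the full multi-decade record.
    """
    rng = np.random.default_rng(seed)
    net_load = np.maximum(demand - wind, 0.0) + 1e-9  # assumed importance measure
    p = net_load / net_load.sum()                     # sampling probabilities
    idx = rng.choice(demand.size, size=n_keep, replace=True, p=p)
    weights = 1.0 / (demand.size * p[idx])            # importance weights
    return idx, weights

# Toy usage: 36 years of hourly data reduced to three "weeks" of slices
hours = 36 * 8760
rng = np.random.default_rng(1)
demand = 30 + 10 * rng.random(hours)
wind = 15 * rng.random(hours)
idx, w = importance_subsample(demand, wind, n_keep=3 * 168, seed=2)
print(np.average(demand[idx], weights=w), demand.mean())  # close to each other
```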

Journal article

Veraart LAM, Gandy A, 2019, Adjustable network reconstruction with applications to CDS exposures, Journal of Multivariate Analysis, Vol: 172, Pages: 193-209, ISSN: 0047-259X

This paper is concerned with reconstructing weighted directed networks from the total in- and out-weight of each node. This problem arises for example in the analysis of systemic risk of partially observed financial networks. Typically a wide range of networks is consistent with this partial information. We develop an empirical Bayesian methodology that can be adjusted such that the resulting networks are consistent with the observations and satisfy certain desired global topological properties such as a given mean density, extending the approach by Gandy and Veraart (2017). Furthermore we propose a new fitness-based model within this framework. We provide a case study based on a data set consisting of 89 fully observed financial networks of credit default swap exposures. We reconstruct those networks based on only partial information using the newly proposed as well as existing methods. To assess the quality of the reconstruction, we use a wide range of criteria, including measures on how well the degree distribution can be captured and higher order measures of systemic risk. We find that the empirical Bayesian approach performs best.

Journal article

Noven R, Veraart A, Gandy A, 2018, A latent trawl process model for extreme values, Journal of Energy Markets, Vol: 11, Pages: 1-24, ISSN: 1756-3607

This paper presents a new model for characterising temporal dependence in exceedances above a threshold. The model is based on the class of trawl processes, which are stationary, infinitely divisible stochastic processes. The model for extreme values is constructed by embedding a trawl process in a hierarchical framework, which ensures that the marginal distribution is generalised Pareto, as expected from classical extreme value theory. We also consider a modified version of this model that works with a wider class of generalised Pareto distributions, and has the advantage of separating marginal and temporal dependence properties. The model is illustrated by applications to environmental time series, and it is shown that the model offers considerable flexibility in capturing the dependence structure of extreme value data.

Journal article

Ding D, Gandy A, 2018, Tree-based Particle Smoothing Algorithms in a Hidden Markov Model

We provide a new strategy built on the divide-and-conquer approach by Lindsten et al. (2017) to investigate the smoothing problem in a hidden Markov model. We employ this approach to decompose a hidden Markov model into sub-models with intermediate target distributions based on an auxiliary tree structure and produce independent samples from the sub-models at the leaf nodes towards the original model of interest at the root. We review the target distribution in the sub-models suggested by Lindsten et al. and propose two new classes of target distributions, which are the estimates of the (joint) filtering distributions and the (joint) smoothing distributions. The first proposed type is straightforwardly constructible by running a filtering algorithm in advance. The algorithm using the second type of target distributions has the advantage of roughly retaining the marginals of all random variables invariant at all levels of the tree, at the cost of approximating the marginal smoothing distributions in advance. We further propose the construction of these target distributions using pre-generated Monte Carlo samples. We show empirically that the algorithms with the proposed intermediate target distributions give stable results comparable to those of the conventional smoothing methods in a linear Gaussian model and a non-linear model.

Working paper

Gandy A, Veraart LAM, 2017, A Bayesian methodology for systemic risk assessment in financial networks, Management Science, Vol: 63, Pages: 4428-4446, ISSN: 0025-1909

We develop a Bayesian methodology for systemic risk assessment in financial networks such as the interbank market. Nodes represent participants in the network and weighted directed edges represent liabilities. Often, for every participant, only the total liabilities and total assets within this network are observable. However, systemic risk assessment needs the individual liabilities. We propose a model for the individual liabilities, which, following a Bayesian approach, we then condition on the observed total liabilities and assets and, potentially, on certain observed individual liabilities. We construct a Gibbs sampler to generate samples from this conditional distribution. These samples can be used in stress testing, giving probabilities for the outcomes of interest. As one application we derive default probabilities of individual banks and discuss their sensitivity with respect to prior information included to model the network. An R package implementing the methodology is provided.
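The conditioning problem can be pictured with a toy move that is not the paper's Gibbs sampler: starting from any nonnegative liabilities matrix that already matches the observed totals, 2x2 "rectangle" shifts explore other matrices with exactly the same row sums (total liabilities) and column sums (total assets), and statistics of interest can be averaged over the matrices visited. The uniform shift below is an assumption made for brevity.

```python
import numpy as np

def rectangle_moves(L, n_steps, seed=None):
    """Random walk over nonnegative liabilities matrices with fixed margins.

    Each step picks two rows i, k and two columns j, l and shifts an amount d:
        L[i, j] += d;  L[k, l] += d;  L[i, l] -= d;  L[k, j] -= d
    which leaves every row sum (total liabilities) and column sum
    (total assets) unchanged. The diagonal stays zero (no self-exposure).
    """
    rng = np.random.default_rng(seed)
    L = np.asarray(L, dtype=float).copy()
    n = L.shape[0]
    for _ in range(n_steps):
        i, k = rng.choice(n, size=2, replace=False)
        j, l = rng.choice(n, size=2, replace=False)
        if i == j or i == l or k == j or k == l:
            continue                        # would touch a diagonal entry
        lo = -min(L[i, j], L[k, l])         # keep all four entries >= 0
        hi = min(L[i, l], L[k, j])
        d = rng.uniform(lo, hi)
        L[i, j] += d; L[k, l] += d
        L[i, l] -= d; L[k, j] -= d
    return L
```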

Journal article

Gandy A, Kvaløy JT, 2017, spcadjust: an R package for adjusting for estimation error in control charts, The R Journal, Vol: 9, Pages: 458-476, ISSN: 2073-4859

In practical applications of control charts the in-control state and the corresponding chart parameters are usually estimated based on some past in-control data. The estimation error then needs to be accounted for. In this paper we present an R package, spcadjust, which implements a bootstrap-based method for adjusting monitoring schemes to take into account the estimation error. By bootstrapping the past data this method guarantees, with a certain probability, a conditional performance of the chart. In spcadjust the method is implemented for various types of Shewhart, CUSUM and EWMA charts, various performance criteria, and both parametric and non-parametric bootstrap schemes. In addition to the basic charts, charts based on linear and logistic regression models for risk-adjusted monitoring are included, and it is easy for the user to add further charts. Use of the package is demonstrated by examples.
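spcadjust is an R package; the Python fragment below is only a toy rendering of the underlying bootstrap idea for a one-sided Shewhart chart with normal observations: re-estimate the chart parameters on bootstrap resamples of the past data, work out the control-limit multiplier each resample would have needed to hit a target false-alarm probability, and take an upper quantile of those multipliers as the adjusted limit.

```python
import numpy as np
from scipy.stats import norm

def adjusted_shewhart_limit(past, p0=0.001, coverage=0.9, B=1000, seed=None):
    """Bootstrap-adjusted control limit for a one-sided Shewhart chart.

    The naive chart signals when X > mu_hat + c0 * sd_hat with
    c0 = norm.ppf(1 - p0). Estimation error in (mu_hat, sd_hat) makes the
    conditional false-alarm probability exceed p0 too often; the adjusted
    multiplier is chosen so that, with probability `coverage` over the
    bootstrap distribution of the estimates, it stays below p0.
    """
    rng = np.random.default_rng(seed)
    past = np.asarray(past, dtype=float)
    mu_hat, sd_hat = past.mean(), past.std(ddof=1)
    z = norm.ppf(1 - p0)
    c_needed = np.empty(B)
    for b in range(B):
        boot = rng.choice(past, size=past.size, replace=True)
        mu_b, sd_b = boot.mean(), boot.std(ddof=1)
        # multiplier the re-estimated chart would need so that, against the
        # "true" parameters (mu_hat, sd_hat), its false-alarm probability is p0
        c_needed[b] = (mu_hat + z * sd_hat - mu_b) / sd_b
    return mu_hat, sd_hat, float(np.quantile(c_needed, coverage))

past = np.random.default_rng(0).normal(10, 2, size=200)   # past in-control data
mu, sd, c_adj = adjusted_shewhart_limit(past)
print("adjusted control limit:", mu + c_adj * sd)
```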

Journal article

Lau FDH, Gandy A, 2016, Enhancing football league tables, Significance, Vol: 13, Pages: 8-9, ISSN: 1740-9705

League tables are commonly used to represent the current state of a competition, in football and other sports. But they do not tell the full story. F. Din-Houn Lau and Axel Gandy suggest a few improvements.

Journal article

Gandy A, Lau F, 2016, The chopthin algorithm for resampling, IEEE Transactions on Signal Processing, Vol: 64, Pages: 4273-4281, ISSN: 1941-0476

Resampling is a standard step in particle filters and more generally sequential Monte Carlo methods. We present an algorithm, called chopthin, for resampling weighted particles. In contrast to standard resampling methods the algorithm does not produce a set of equally weighted particles; instead it merely enforces an upper bound on the ratio between the weights. Simulation studies show that the chopthin algorithm consistently outperforms standard resampling methods. The algorithm chops up particles with large weight and thins out particles with low weight, hence its name. It implicitly guarantees a lower bound on the effective sample size. The algorithm can be implemented efficiently, making it practically useful. We show that the expected computational effort is linear in the number of particles. Implementations for C++, R (on CRAN), Python and Matlab are available.
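An illustrative chop-and-thin step; the published algorithm and its C++/R/Python/Matlab implementations differ in detail, so this is only a sketch of the idea: heavy particles are split into several copies of smaller weight, light particles are kept with probability proportional to their weight, and the surviving weights end up within a bounded ratio of each other.

```python
import numpy as np

def chop_thin(particles, weights, tau, seed=None):
    """Toy chop-and-thin step for weighted particles.

    Particles with weight above tau are "chopped" into several copies of
    smaller weight; particles with weight at most tau are "thinned", i.e.
    kept with probability w / tau and, if kept, given weight tau. The step
    preserves weight in expectation and leaves all surviving weights in
    (tau / 2, tau], which bounds the ratio between any two weights.
    """
    rng = np.random.default_rng(seed)
    out_x, out_w = [], []
    for x, w in zip(particles, weights):
        if w > tau:
            k = int(np.ceil(w / tau))          # chop into k equal copies
            out_x.extend([x] * k)
            out_w.extend([w / k] * k)
        elif rng.uniform() < w / tau:          # thin: keep with prob w / tau
            out_x.append(x)
            out_w.append(tau)
    return np.asarray(out_x), np.asarray(out_w)

x = np.random.default_rng(0).normal(size=1000)
w = np.exp(-0.5 * x**2)                         # some uneven weights
new_x, new_w = chop_thin(x, w, tau=2 * w.mean())
print(w.sum(), new_w.sum())                     # similar totals, bounded weight ratio
```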

Journal article

Gandy A, Hahn G, 2016, QuickMMCTest -- quick multiple Monte Carlo testing, Statistics and Computing, Vol: 27, Pages: 823-832, ISSN: 1573-1375

Multiple hypothesis testing is widely used to evaluate scientific studies involving statistical tests. However, for many of these tests, p-values are not available and are thus often approximated using Monte Carlo tests such as permutation tests or bootstrap tests. This article presents a simple algorithm based on Thompson Sampling to test multiple hypotheses. It works with arbitrary multiple testing procedures, in particular with step-up and step-down procedures. Its main feature is to sequentially allocate Monte Carlo effort, generating more Monte Carlo samples for tests whose decisions are so far less certain. A simulation study demonstrates that for a low computational effort, the new approach yields a higher power and a higher degree of reproducibility of its results than previously suggested methods.
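A rough sketch of the allocation idea under a Bonferroni correction (the article's procedure is more general, so this is an assumption-laden simplification): each unknown p-value gets a Beta posterior from the Monte Carlo exceedances seen so far, the testing procedure is applied to repeated posterior draws, and new Monte Carlo samples go preferentially to hypotheses whose rejection decision flips across draws.

```python
import numpy as np

def allocate_effort(exceed, total, alpha, n_new, n_draws=200, seed=None):
    """Toy Thompson-sampling allocation of Monte Carlo samples.

    exceed[i] and total[i] are the Monte Carlo exceedances and samples
    drawn so far for hypothesis i. Each p-value gets a
    Beta(exceed + 1, total - exceed + 1) posterior; a Bonferroni test is
    applied to repeated posterior draws, and hypotheses whose rejection
    decision is unstable across draws receive more of the new samples.
    """
    rng = np.random.default_rng(seed)
    m = len(exceed)
    thr = alpha / m                                   # Bonferroni threshold
    draws = rng.beta(exceed + 1, total - exceed + 1, size=(n_draws, m))
    reject_freq = (draws <= thr).mean(axis=0)
    instability = reject_freq * (1 - reject_freq)     # largest when decision uncertain
    probs = np.full(m, 1 / m) if instability.sum() == 0 else instability / instability.sum()
    return rng.multinomial(n_new, probs)

exceed = np.array([0, 1, 3, 20, 60])   # e.g. 5 hypotheses, 100 samples each so far
total = np.full(5, 100)
print(allocate_effort(exceed, total, alpha=0.05, n_new=1000))
```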

Journal article

Gandy A, Hahn G, 2016, A framework for Monte Carlo based multiple testing, Scandinavian Journal of Statistics, Vol: 43, Pages: 1046-1063, ISSN: 1467-9469

We are concerned with multiple testing in the setting where p-values are unknown and can only be approximated using Monte Carlo simulation. This scenario occurs widely in practice. We are interested in obtaining the same rejections and non-rejections as the ones obtained if the p-values for all hypotheses had been available. The present article introduces a framework for this scenario by providing a generic algorithm for a general multiple testing procedure. We establish conditions which guarantee that the rejections and non-rejections obtained through Monte Carlo simulations are identical to the ones obtained with the p-values. Our framework is applicable to a general class of step-up and step-down procedures which includes many established multiple testing corrections such as the ones of Bonferroni, Holm, Sidak, Hochberg or Benjamini-Hochberg. Moreover, we show how to use our framework to improve algorithms available in the literature in such a way as to yield theoretical guarantees on their results. These modifications can easily be implemented in practice and lead to a particular way of reporting multiple testing results as three sets together with an error bound on their correctness, demonstrated exemplarily using a real biological dataset.

Journal article

Lee MLT, Gail M, Pfeiffer R, Satten G, Cai T, Gandy A et al., 2015, Risk assessment and evaluation of predictions, ISSN: 0930-0325

Conference paper

Gandy A, Hahn G, 2014, MMCTest - A Safe Algorithm for Implementing Multiple Monte Carlo Tests, Scandinavian Journal of Statistics, Vol: 41, Pages: 1083-1101, ISSN: 0303-6898

Journal article

Phinikettos I, Gandy A, 2014, An omnibus CUSUM chart for monitoring time to event data, Lifetime Data Analysis, Vol: 20, Pages: 481-494, ISSN: 1380-7870

Journal article

Lau FD-H, Gandy A, 2014, RMCMC: A system for updating Bayesian models, Computational Statistics & Data Analysis, Vol: 80, Pages: 99-110, ISSN: 0167-9473

A system to update estimates from a sequence of probability distributions is presented. The aim of the system is to quickly produce estimates with a user-specified bound on the Monte Carlo error. The estimates are based upon weighted samples stored in a database. The stored samples are maintained such that the accuracy of the estimates and quality of the samples are satisfactory. This maintenance involves varying the number of samples in the database and updating their weights. New samples are generated, when required, by a Markov chain Monte Carlo algorithm. The system is demonstrated using a football league model that is used to predict the end of season table. The correctness of the estimates and their accuracy are shown in a simulation using a linear Gaussian model.
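A much simplified picture of the maintenance loop (the RMCMC system itself manages a database and an MCMC algorithm): keep a store of weighted samples, report an estimate with its Monte Carlo standard error, and request fresh samples whenever that error exceeds the user-specified bound. `draw_new_samples` is a placeholder assumption for whatever kernel generates them.

```python
import numpy as np

def estimate_with_error(values, weights):
    """Weighted estimate and a rough Monte Carlo standard error."""
    w = weights / weights.sum()
    est = np.sum(w * values)
    ess = 1.0 / np.sum(w ** 2)                  # effective sample size
    var = np.sum(w * (values - est) ** 2)
    return est, np.sqrt(var / ess)

def maintain(values, weights, draw_new_samples, max_error, batch=500):
    """Top up the stored samples until the standard error meets the bound."""
    est, err = estimate_with_error(values, weights)
    while err > max_error:
        new_v, new_w = draw_new_samples(batch)  # placeholder for an MCMC kernel
        values = np.concatenate([values, new_v])
        weights = np.concatenate([weights, new_w])
        est, err = estimate_with_error(values, weights)
    return values, weights, est, err
```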

Journal article

Noven RC, Veraart AED, Gandy A, 2014, A Lévy-driven rainfall model with applications to futures pricing

Journal article

Lau FD-H, Gandy A, 2013, Optimality of Non-Restarting CUSUM Charts, Sequential Analysis: Design Methods and Applications, Vol: 32, Pages: 458-468, ISSN: 0747-4946

Journal article

Gandy A, Kvaløy JT, 2013, Guaranteed Conditional Performance of Control Charts via Bootstrap Methods, Scandinavian Journal of Statistics, Vol: n/a, ISSN: 0303-6898

To use control charts in practice, the in-control state usually has to be estimated. This estimation has a detrimental effect on the performance of control charts, which is often measured for example by the false alarm probability or the average run length. We suggest an adjustment of the monitoring schemes to overcome these problems. It guarantees, with a certain probability, a conditional performance given the estimated in-control state. The suggested method is based on bootstrapping the data used to estimate the in-control state. The method applies to different types of control charts, and also works with charts based on regression models, survival models, etc. If a nonparametric bootstrap is used, the method is robust to model errors. We show large sample properties of the adjustment. The usefulness of our approach is demonstrated through simulation studies.

Journal article

Gandy A, Lau FD-H, 2013, Non-restarting cumulative sum charts and control of the false discovery rate, Biometrika, Vol: 100, Pages: 261-268, ISSN: 0006-3444

Journal article

Lee MLT, Gail M, Pfeiffer R, Satten G, Cai T, Gandy A et al., 2013, Preface, Pages: V-VI, ISSN: 0930-0325

Conference paper

Henrion M, Mortlock DJ, Hand DJ, Gandy A et al., 2013, Classification and Anomaly Detection for Astronomical Survey Data, Springer Series in Astrostatistics, Pages: 149-184, ISBN: 9781461435075

We present two statistical techniques for astronomical problems: a star-galaxy separator for the UKIRT Infrared Deep Sky Survey (UKIDSS) and a novel anomaly detection method for cross-matched astronomical datasets. The star-galaxy separator is a statistical classification method which outputs class membership probabilities rather than class labels and allows the use of prior knowledge about the source populations. Deep Sloan Digital Sky Survey (SDSS) data from the multiply imaged Stripe 82 region are used to check the results from our classifier, which compares favourably with the UKIDSS pipeline classification algorithm. The anomaly detection method addresses the problem posed by objects having different sets of recorded variables in cross-matched datasets. This prevents the use of methods unable to handle missing values and makes direct comparison between objects difficult. For each source, our method computes anomaly scores in subspaces of the observed feature space and combines them into an overall anomaly score. The proposed technique is very general and can easily be used in applications other than astronomy. The properties and performance of our method are investigated using both real and simulated datasets.

Book chapter

Henrion M, Hand DJ, Gandy A, Mortlock DJ et al., 2013, CASOS: a Subspace Method for Anomaly Detection in High Dimensional Astronomical Databases, Statistical Analysis and Data Mining, Vol: 6, Pages: 53-72, ISSN: 1932-1864

Journal article

Gandy A, Trotta R, 2013, Special Issue on Astrostatistics, Statistical Analysis and Data Mining, Vol: 6, Pages: 1+, ISSN: 1932-1864

Journal article

Gandy A, Veraart L, 2012, The effect of estimation in high-dimensional portfolios, Mathematical Finance

Journal article

Gandy A, 2012, Performance monitoring of credit portfolios using survival analysis, International Journal of Forecasting, Vol: 28, Pages: 139-144, ISSN: 0169-2070

Journal article

Ashby D, Bird SM, Hunt I, Grant R, King T, Atkinson AC, Riani M, Gandy A, Kvaløy JT, Caan W, Eames M, Arjas E, Boehning D, Campbell MJ, Jacques RM, Fotheringham J, Maheswaran R, Nicholl J, Chacon JE, Montanero J, Fienberg SE, Gelman A, Geskus RB, Jankowski HK, Longford NT, Louis TA, Mateu J, Mengersen K, Morton T, Playford G, Smith I, Militino AF, Ugarte MD, Porcu E, Alonso Malaver C, Zini A, Scott EM, Gemmell JC, Stein A, Woodall WH et al., 2012, Discussion on the paper by Spiegelhalter, Sherlaw-Johnson, Bardsley, Blunt, Wood and Grigg, Journal of the Royal Statistical Society Series A (Statistics in Society), Vol: 175, Pages: 25-47, ISSN: 0964-1998

Journal article

Gandy A, Rubin-Delanchy P, 2011, An algorithm to compute the power of Monte Carlo tests with guaranteed precision

This article presents an algorithm that generates an exact (conservative) confidence interval of a specified length and coverage probability for the power of a Monte Carlo test (such as a bootstrap or permutation test). It is the first method that achieves this aim for almost any Monte Carlo test. The existing research on power estimation for Monte Carlo tests has focused on obtaining as accurate a result as possible for a fixed computational effort. However, the methods proposed do not provide any guarantee of precision, in the sense that they cannot report a confidence interval to accompany their estimate of the power. Conversely, in this article the computational effort is random. The algorithm operates until a confidence interval can be constructed that meets the requirements of the user, in terms of length and coverage probability. We show that, surprisingly, by generating two more datasets than what might have been assumed to be sufficient, the expected number of steps required by the algorithm is finite in many cases of practical interest. These include, for instance, any situation where the distribution of the p-value is absolutely continuous or if it is discrete with finite support. The algorithm is implemented in the R package simctest.
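The paper's algorithm also controls the resampling error inside each Monte Carlo test; ignoring that layer, the outer loop can be pictured as below: simulate datasets under the alternative, record whether the test rejects, and keep going until a Clopper-Pearson interval for the rejection probability is short enough. The `run_test` callable and the toy z-test are assumptions for illustration, not the simctest implementation.

```python
import numpy as np
from scipy.stats import beta

def power_ci(run_test, target_length, level=0.95, max_sims=100_000, seed=None):
    """Simulate until an exact CI for the power of a test is short enough.

    run_test(rng) should simulate one dataset under the alternative and
    return True if the test rejects; here it stands in for a full Monte
    Carlo test such as a bootstrap or permutation test.
    """
    rng = np.random.default_rng(seed)
    rejections, n, lo, hi = 0, 0, 0.0, 1.0
    a = (1 - level) / 2
    while hi - lo > target_length and n < max_sims:
        rejections += bool(run_test(rng))
        n += 1
        # Clopper-Pearson (exact) interval for the rejection probability
        lo = beta.ppf(a, rejections, n - rejections + 1) if rejections > 0 else 0.0
        hi = beta.ppf(1 - a, rejections + 1, n - rejections) if rejections < n else 1.0
    return rejections / n, (lo, hi)

# Toy usage: a one-sided z-test whose true power is about 0.5
reject = lambda rng: rng.normal(loc=1.96) > 1.96
print(power_ci(reject, target_length=0.05))
```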

Journal article

Phinikettos I, Gandy A, 2011, Fast computation of high-dimensional multivariate normal probabilities, Computational Statistics & Data Analysis, Vol: 55, Pages: 1521-1529, ISSN: 0167-9473

Journal article

Henrion M, Mortlock DJ, Hand DJ, Gandy A et al., 2011, A Bayesian approach to star-galaxy classification, Monthly Notices of the Royal Astronomical Society, Vol: 412, Pages: 2286-2302, ISSN: 0035-8711

Star–galaxy classification is one of the most fundamental data-processing tasks in survey astronomy and a critical starting point for the scientific exploitation of survey data. Star–galaxy classification for bright sources can be done with almost complete reliability, but for the numerous sources close to a survey’s detection limit each image encodes only limited morphological information about the source. In this regime, from which many of the new scientific discoveries are likely to come, it is vital to utilize all the available information about a source, both from multiple measurements and from prior knowledge about the star and galaxy populations. This also makes it clear that it is more useful and realistic to provide classification probabilities than decisive classifications. All these desiderata can be met by adopting a Bayesian approach to star–galaxy classification, and we develop a very general formalism for doing so. An immediate implication of applying Bayes’s theorem to this problem is that it is formally impossible to combine morphological measurements in different bands without using colour information as well; however, we develop several approximations that disregard colour information as much as possible. The resultant scheme is applied to data from the UKIRT Infrared Deep Sky Survey (UKIDSS) and tested by comparing the results to deep Sloan Digital Sky Survey (SDSS) Stripe 82 measurements of the same sources. The Bayesian classification probabilities obtained from the UKIDSS data agree well with the deep SDSS classifications both overall (a mismatch rate of 0.022 compared to 0.044 for the UKIDSS pipeline classifier) and close to the UKIDSS detection limit (a mismatch rate of 0.068 compared to 0.075 for the UKIDSS pipeline classifier). The Bayesian formalism developed here can be applied to improve the reliability of any star–galaxy classification schemes based on the measured values of morphology statistics alone.
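The core of such a scheme can be pictured with a one-dimensional toy example: combine a prior over the star and galaxy populations with per-band likelihoods of a morphology statistic and report the posterior probability of each class. The Gaussian likelihoods, their parameters and the flat prior are assumptions for illustration only, and the naive product over bands sidesteps the colour-information subtlety discussed in the abstract.

```python
import numpy as np
from scipy.stats import norm

def star_probability(morph_stats, prior_star=0.3):
    """Toy Bayesian star/galaxy posterior from per-band morphology statistics.

    Stars (compact) and galaxies (extended) each get an assumed Gaussian
    likelihood for the morphology statistic; the bands are combined by a
    naive product, and Bayes' theorem turns the prior class fractions
    into P(star | data).
    """
    like_star = norm.pdf(morph_stats, loc=0.0, scale=1.0).prod()   # point-like sources
    like_gal = norm.pdf(morph_stats, loc=2.0, scale=1.5).prod()    # extended sources
    num = prior_star * like_star
    return num / (num + (1 - prior_star) * like_gal)

print(star_probability(np.array([0.2, -0.1, 0.4])))   # measurements in three bands
```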

Journal article

Gandy A, Kvaløy JT, Bottle A, Zhou F et al., 2010, Risk-adjusted monitoring of time to event, Biometrika, Vol: 97, Pages: 375-388, ISSN: 0006-3444

Journal article

Jen MH, Johnston R, Jones K, Harris R, Gandy A et al., 2010, International variations in life expectancy: a spatio-temporal analysis, Tijdschrift voor Economische en Sociale Geografie, Vol: 101, Pages: 73-90, ISSN: 0040-747X

Journal article

