Imperial College London


Faculty of Natural Sciences, Department of Mathematics

Lecturer in Statistics



+44 (0)20 7594 2936, h.battey




545, Huxley Building, South Kensington Campus






19 results found

Rybak J, Battey H, 2021, Sparsity induced by covariance transformation: some deterministic and probabilistic results, Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, Vol: 477

Motivated by statistical challenges arising in modern scientific fields, notably genomics, this paper seeks embeddings in which relevant covariance models are sparse. The work exploits a bijective mapping between a strictly positive definite matrix and its orthonormal eigen-decomposition, and between an orthonormal eigenvector matrix and its principal matrix logarithm. This leads to a representation of covariance matrices in terms of skew-symmetric matrices, for which there is a natural basis representation, and through which sparsity is conveniently explored. This theoretical work establishes the possibility of exploiting sparsity in the new parameterisation and converting the conclusion back to the one of interest, a prospect of high relevance in statistics. The statistical aspects associated with this operation, while not a focus of the present work, are briefly discussed.

Journal article

Battey H, Cox DR, 2021, Some perspectives on inference and asymptotic analysis in high dimensions, Statistical Science

With very large amounts of data, important aspects of statistical analysis may appear largely descriptive in that the role of probability sometimes seems limited or totally absent. The main emphasis of the present paper lies on contexts where formulation in terms of a probabilistic model is feasible and fruitful but to be at all realistic large numbers of unknown parameters need consideration. Then many of the standard approaches to statistical analysis, for instance direct application of the method of maximum likelihood, or the use of flat priors, often encounter difficulties. After a brief discussion of broad conceptual issues and the use of asymptotic analysis in statistical inference, we provide some new perspectives on aspects of high-dimensional statistical theory, emphasizing particularly a number of important open problems.

Journal article

Nieto-Reyes A, Battey H, 2021, A topologically valid construction of depth for functional data, Journal of Multivariate Analysis

Journal article

Battey H, Cox DR, 2020, High-dimensional nuisance parameters: an example from parametric survival analysis, Information Geometry, Vol: 3, Pages: 119-148

Journal article

Beale N, Battey HS, Davison AC, MacKay RS et al., 2020, An unethical optimization principle, Royal Society Open Science, Vol: 7, Pages: 1-11

If an artificial intelligence aims to maximize risk-adjusted return, then under mild conditions it is disproportionately likely to pick an unethical strategy unless the objective function allows sufficiently for this risk. Even if the proportion η of available unethical strategies is small, the probability $p_U$ of picking an unethical strategy can become large; indeed, unless returns are fat-tailed, $p_U$ tends to unity as the strategy space becomes large. We define an unethical odds ratio, Υ (capital upsilon), that allows us to calculate $p_U$ from η, and we derive a simple formula for the limit of Υ as the strategy space becomes large. We discuss the estimation of Υ and $p_U$ in finite cases and how to deal with infinite strategy spaces. We show how the principle can be used to help detect unethical strategies and to estimate η. Finally we sketch some policy implications of this work.

Journal article

Hoeltgebaum H, Battey H, 2020, HCmodelSets: An R package for specifying sets of well-fitting models in high dimensions, The R Journal, Vol: 11, Pages: 370-379

In the context of regression with a large number of explanatory variables, Cox and Battey (2017) emphasize that if there are alternative reasonable explanations of the data that are statistically indistinguishable, one should aim to specify as many of these explanations as is feasible. The standard practice, by contrast, is to report a single model effective for prediction. The present paper illustrates the R implementation of the new ideas in the package HCmodelSets, using simple reproducible examples and real data. Results of some simulation experiments are also reported.

Journal article

Battey HS, 2019, On sparsity scales and covariance matrix transformations, Biometrika, Vol: 106, Pages: 605-617, ISSN: 0006-3444

We develop a theory of covariance and concentration matrix estimation on any given or estimated sparsity scale when the matrix dimension is larger than the sample size. Nonstandard sparsity scales are justified when such matrices are nuisance parameters, distinct from interest parameters, which should always have a direct subject-matter interpretation. The matrix logarithmic and inverse scales are studied as special cases, with the corollary that a constrained optimization-based approach is unnecessary for estimating a sparse concentration matrix. It is shown through simulations that for large unstructured covariance matrices, there can be appreciable advantages to estimating a sparse approximation to the log-transformed covariance matrix and converting the conclusions back to the scale of interest.

Journal article

Battey H, Cox DR, Jackson MV, 2019, On the linear in probability model for binary data, Royal Society Open Science, Vol: 6

Journal article

Battey HS, Cox DR, 2018, Large numbers of explanatory variables: a probabilistic assessment, Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, Vol: 474

Journal article

Avella M, Battey HS, Fan J, Li Q et al., 2018, Robust estimation of high-dimensional covariance and precision matrices, Biometrika, Vol: 105, Pages: 271-284, ISSN: 0006-3444

High-dimensional data are often most plausibly generated from distributions with complex structure and leptokurtosis in some or all components. Covariance and precision matrices provide a useful summary of such structure, yet the performance of popular matrix estimators typically hinges upon a sub-Gaussianity assumption. This paper presents robust matrix estimators whose performance is guaranteed for a much richer class of distributions. The proposed estimators, under a bounded fourth moment assumption, achieve the same minimax convergence rates as do existing methods under a sub-Gaussianity assumption. Consistency of the proposed estimators is also established under the weak assumption of bounded $2+\epsilon$ moments for $\epsilon \in (0,2)$. The associated convergence rates depend on $\epsilon$.

Journal article

Battey HS, Fan J, Liu H, Lu J, Zhu Z et al., 2018, Distributed testing and estimation in sparse high dimensional models, Annals of Statistics, Vol: 46, Pages: 1352-1382, ISSN: 0090-5364

This paper studies hypothesis testing and parameter estimation in the context of the divide-and-conquer algorithm. In a unified likelihood-based framework, we propose new test statistics and point estimators obtained by aggregating various statistics from k subsamples of size n/k, where n is the sample size. In both low dimensional and sparse high dimensional settings, we address the important question of how large k can be, as n grows large, such that the loss of efficiency due to the divide-and-conquer algorithm is negligible. In other words, the resulting estimators have the same inferential efficiencies and estimation rates as an oracle with access to the full sample. Thorough numerical results are provided to back up the theory.

Journal article

Cox DR, Battey HS, 2017, Large numbers of explanatory variables, a semi-descriptive analysis, Proceedings of the National Academy of Sciences of USA, Vol: 114, Pages: 8592-8595, ISSN: 0027-8424

Data with a relatively small number of study individuals and a very large number of potential explanatory features arise particularly, but by no means only, in genomics. A powerful method of analysis, the lasso [Tibshirani R (1996) J Roy Stat Soc B 58:267–288], takes account of an assumed sparsity of effects, that is, that most of the features are nugatory. Standard criteria for model fitting, such as the method of least squares, are modified by imposing a penalty for each explanatory variable used. There results a single model, leaving open the possibility that other sparse choices of explanatory features fit virtually equally well. The method suggested in this paper aims to specify simple models that are essentially equally effective, leaving detailed interpretation to the specifics of the particular study. The method hinges on the ability to make initially a very large number of separate analyses, allowing each explanatory feature to be assessed in combination with many other such features. Further stages allow the assessment of more complex patterns such as nonlinear and interactive dependences. The method has formal similarities to so-called partially balanced incomplete block designs introduced 80 years ago [Yates F (1936) J Agric Sci 26:424–455] for the study of large-scale plant breeding trials. The emphasis in this paper is strongly on exploratory analysis; the more formal statistical properties obtained under idealized assumptions will be reported separately.

Journal article

Battey HS, 2017, Eigen structure of a new class of structured covariance and inverse covariance matrices, Bernoulli, Vol: 23, Pages: 3166-3177

There is a one to one mapping between a p dimensional strictly positive definite covariance matrix Σ and its matrix logarithm L. We exploit this relationship to study the structure induced on Σ through a sparsity constraint on L. Consider L as a random matrix generated through a basis expansion, with the support of the basis coefficients taken as a simple random sample of size $s = s^*$ from the index set $[p(p+1)/2] = \{1, \ldots, p(p+1)/2\}$. We find that the expected number of non-unit eigenvalues of Σ, denoted $E[|A|]$, is approximated with near perfect accuracy by the solution of the equation $\frac{4p + p(p-1)}{2(p+1)} \left[ \log\left(\frac{p}{p-d}\right) - \frac{d}{2p(p-d)} \right] - s^* = 0$. Furthermore, the corresponding eigenvectors are shown to possess only $p - |A^c|$ nonzero entries. We use this result to elucidate the precise structure induced on Σ and $\Sigma^{-1}$. We demonstrate that a positive definite symmetric matrix whose matrix logarithm is sparse is significantly less sparse in the original domain. This finding has important implications in high dimensional statistics where it is important to exploit structure in order to construct consistent estimators in non-trivial norms. An estimator exploiting the structure of the proposed class is presented.

Journal article

Nieto-Reyes A, Battey HS, 2017, Statistical functional depth, Functional Statistics and Related Fields, Editors: Aneiros, Bongiorno, Cao, Vieu, Publisher: Springer, Pages: 197-202

This presentation is a summary of the paper [14], which formalizes the definition of statistical functional depth, with some extensions on the matter.

Book chapter

Battey H, Feng Q, Smith RJ, 2016, Improving confidence set estimation when parameters are weakly identified, Statistics and Probability Letters, Vol: 118, Pages: 117-123

We consider inference in weakly identified moment condition models when additional partially identifying moment inequality constraints are available. We detail the limiting distribution of the estimation criterion function and consequently propose a confidence set estimator for the true parameter.

Journal article

Nieto-Reyes A, Battey H, 2016, A topologically valid definition of depth for functional data, Statistical Science, Vol: 31, Pages: 61-79

The main focus of this work is on providing a formal definition of statistical depth for functional data on the basis of six properties, recognising topological features such as continuity, smoothness and contiguity. Amongst our depth defining properties is one that addresses the delicate challenge of inherent partial observability of functional data, with fulfillment giving rise to a minimal guarantee on the performance of the empirical depth beyond the idealised and practically infeasible case of full observability. As an incidental product, functional depths satisfying our definition achieve a robustness that is commonly ascribed to depth, despite the absence of a formal guarantee in the multivariate definition of depth. We demonstrate the fulfillment or otherwise of our properties for six widely used functional depth proposals, thereby providing a systematic basis for selection of a depth function.

Journal article

Battey H, Linton O, 2013, Nonparametric estimation of multivariate elliptic densities via finite mixture sieves, Journal of Multivariate Analysis, Vol: 123, Pages: 43-67, ISSN: 0047-259X

This paper considers the class of p-dimensional elliptic distributions (p ≥ 1) satisfying the consistency property (Kano, 1994) [23] and within this general framework presents a two-stage nonparametric estimator for the Lebesgue density based on Gaussian mixture sieves. Under the on-line Exponentiated Gradient (EG) algorithm of Helmbold et al. (1997) [20] and without restricting the mixing measure to have compact support, the estimator produces estimates converging uniformly in probability to the true elliptic density at a rate that is independent of the dimension of the problem, hence circumventing the familiar curse of dimensionality inherent to many semiparametric estimators. The rate performance of our estimator depends on the tail behaviour of the underlying mixing density (and hence that of the data) rather than smoothness properties. In fact, our method achieves a rate of at least $O_p(n^{-1/4})$, provided only some positive moment exists. When further moments exist, the rate improves, reaching $O_p(n^{-3/8})$ as the tails of the true density converge to those of a normal. Unlike the elliptic density estimator of Liebscher (2005) [25], our sieve estimator always yields an estimate that is a valid density, and is also attractive from a practical perspective as it accepts data as a stream, thus significantly reducing computational and storage requirements. Monte Carlo experimentation indicates encouraging finite sample performance over a range of elliptic densities. The estimator is also implemented in a binary classification task using the well-known Wisconsin breast cancer dataset.

Journal article

Battey H, Sancetta A, 2013, Conditional estimation for dependent functional data, Journal of Multivariate Analysis, Vol: 120, Pages: 1-17

Suppose we observe a Markov chain taking values in a functional space. We are interested in exploiting the time series dependence in these infinite dimensional data in order to make non-trivial predictions about the future. Making use of the Karhunen–Loève (KL) representation of functional random variables in terms of the eigenfunctions of the covariance operator, we present a deliberately over-simplified nonparametric model, which allows us to achieve dimensionality reduction by considering one dimensional nearest neighbour (NN) estimators for the transition distribution of the random coefficients of the KL expansion. Under regularity conditions, we show that the NN estimator is consistent even when the coefficients of the KL expansion are estimated from the observations. This also allows us to deduce the consistency of conditional regression function estimators for functional data. We show via simulations and two empirical examples that the proposed NN estimator outperforms the state of the art when data are generated both by the functional autoregressive (FAR) model of Bosq (2000) [8] and by more general data generating mechanisms.

Journal article

Beale N, Rand DG, Battey H, Croxson K, May RM, Nowak MA et al., 2011, Individual versus systemic risk and the Regulator's Dilemma, Proceedings of the National Academy of Sciences, Vol: 108, Pages: 12647-12652

The global financial crisis of 2007–2009 exposed critical weaknesses in the financial system. Many proposals for financial reform address the need for systemic regulation—that is, regulation focused on the soundness of the whole financial system and not just that of individual institutions. In this paper, we study one particular problem faced by a systemic regulator: the tension between the distribution of assets that individual banks would like to hold and the distribution across banks that best supports system stability if greater weight is given to avoiding multiple bank failures. By diversifying its risks, a bank lowers its own probability of failure. However, if many banks diversify their risks in similar ways, then the probability of multiple failures can increase. As more banks fail simultaneously, the economic disruption tends to increase disproportionately. We show that, in model systems, the expected systemic cost of multiple failures can be largely explained by two global parameters of risk exposure and diversity, which can be assessed in terms of the risk exposures of individual actors. This observation hints at the possibility of regulatory intervention to promote systemic stability by incentivizing a more diverse diversification among banks. Such intervention offers the prospect of an additional lever in the armory of regulators, potentially allowing some combination of improved system stability and reduced need for additional capital.

Journal article

