## Publications


Battey H, Cox DR, 2020, High-dimensional nuisance parameters: an example from parametric survival analysis, *Information Geometry*

Beale N, Battey HS, Davison AC, et al., 2020, An unethical optimization principle, *Royal Society Open Science*, Vol: 7, Pages: 1-11, ISSN: 2054-5703

If an artificial intelligence aims to maximize risk-adjusted return, then under mild conditions it is disproportionately likely to pick an unethical strategy unless the objective function allows sufficiently for this risk. Even if the proportion η of available unethical strategies is small, the probability pU of picking an unethical strategy can become large; indeed, unless returns are fat-tailed pU tends to unity as the strategy space becomes large. We define an unethical odds ratio, Υ (capital upsilon), that allows us to calculate pU from η, and we derive a simple formula for the limit of Υ as the strategy space becomes large. We discuss the estimation of Υ and pU in finite cases and how to deal with infinite strategy spaces. We show how the principle can be used to help detect unethical strategies and to estimate η. Finally we sketch some policy implications of this work.
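The large-strategy-space effect can be illustrated with a toy Monte Carlo sketch (ours, not the paper's model): returns are Gaussian, hence thin-tailed, and unethical strategies receive a mean advantage `boost`. All names and parameter values here are illustrative assumptions.

```python
import random

def p_unethical(n_strategies, eta, boost, trials, seed=0):
    """Monte Carlo estimate of the probability that the best-looking
    strategy is unethical, in a toy thin-tailed (Gaussian) model."""
    rng = random.Random(seed)
    n_bad = max(1, int(eta * n_strategies))
    hits = 0
    for _ in range(trials):
        # Best return among ethical and among unethical strategies.
        best_good = max(rng.gauss(0.0, 1.0) for _ in range(n_strategies - n_bad))
        best_bad = max(rng.gauss(boost, 1.0) for _ in range(n_bad))
        if best_bad > best_good:
            hits += 1
    return hits / trials

# With no return advantage the chance stays near eta; with a modest
# advantage it grows sharply even though unethical strategies are rare.
p0 = p_unethical(2000, 0.01, 0.0, 200)
p2 = p_unethical(2000, 0.01, 2.0, 200)
```

Even with only 1% of strategies unethical, a small mean advantage makes the optimizer's chosen strategy unethical in a large fraction of runs.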

Hoeltgebaum H, Battey H, 2020, HCmodelSets: An R package for specifying sets of well-fitting models in high dimensions, *The R Journal*, Vol: 11, Pages: 370-379, ISSN: 2073-4859

In the context of regression with a large number of explanatory variables, Cox and Battey (2017) emphasize that if there are alternative reasonable explanations of the data that are statistically indistinguishable, one should aim to specify as many of these explanations as is feasible. The standard practice, by contrast, is to report a single model effective for prediction. The present paper illustrates the R implementation of the new ideas in the package HCmodelSets, using simple reproducible examples and real data. Results of some simulation experiments are also reported.
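As a rough illustration of the underlying idea (in Python rather than R, with hypothetical names; the package itself proceeds by staged cross-classified screening), one can enumerate small candidate models and report every one whose fit is comparable to the best, instead of a single winner:

```python
import numpy as np
from itertools import combinations

def well_fitting_sets(X, y, size, tol=1.1):
    """Return every subset of `size` columns whose residual sum of
    squares is within `tol` times the best over subsets of that size.
    A toy sketch of reporting all comparable explanations."""
    rss = {}
    for cols in combinations(range(X.shape[1]), size):
        ix = list(cols)
        beta, *_ = np.linalg.lstsq(X[:, ix], y, rcond=None)
        r = y - X[:, ix] @ beta
        rss[cols] = float(r @ r)
    best = min(rss.values())
    return sorted(c for c, v in rss.items() if v <= tol * best)

# Two nearly collinear copies of the signal give two statistically
# indistinguishable one-variable models; both are reported.
rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)   # near-duplicate of x1
X = np.column_stack([x1, x2, rng.normal(size=n)])
y = x1 + 0.5 * rng.normal(size=n)
sets = well_fitting_sets(X, y, size=1, tol=1.1)
```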

Battey HS, 2019, On sparsity scales and covariance matrix transformations, *Biometrika*, Vol: 106, Pages: 605-617

We develop a theory of covariance and concentration matrix estimation on any given or estimated sparsity scale when the matrix dimension is larger than the sample size. Nonstandard sparsity scales are justified when such matrices are nuisance parameters, distinct from interest parameters, which should always have a direct subject-matter interpretation. The matrix logarithmic and inverse scales are studied as special cases, with the corollary that a constrained optimization-based approach is unnecessary for estimating a sparse concentration matrix. It is shown through simulations that for large unstructured covariance matrices, there can be appreciable advantages to estimating a sparse approximation to the log-transformed covariance matrix and converting the conclusions back to the scale of interest.
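A schematic version of estimation on the logarithmic scale can be written in a few lines (assumed names and tuning; the paper's estimator and choice of threshold are more refined): sparsify the matrix logarithm and map back, which preserves positive definiteness by construction.

```python
import numpy as np

def logm_sym(S):
    """Matrix logarithm of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V * np.log(w)) @ V.T

def expm_sym(L):
    """Matrix exponential of a symmetric matrix."""
    w, V = np.linalg.eigh(L)
    return (V * np.exp(w)) @ V.T

def log_scale_estimate(S, thresh):
    """Threshold the off-diagonal entries of log(S) and exponentiate
    back to the covariance scale."""
    L = logm_sym(S)
    off = L - np.diag(np.diag(L))
    L_sparse = np.diag(np.diag(L)) + off * (np.abs(off) > thresh)
    return expm_sym(L_sparse)

# A covariance whose logarithm is sparse; the round trip recovers it,
# and the output is symmetric positive definite whatever the threshold.
L_true = np.array([[0.5, 0.2, 0.0], [0.2, 0.3, 0.0], [0.0, 0.0, 0.1]])
Sigma = expm_sym(L_true)
Sigma_hat = log_scale_estimate(Sigma, thresh=0.05)
```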

Battey H, Cox DR, Jackson MV, 2019, On the linear in probability model for binary data, *Royal Society Open Science*, Vol: 6

Battey HS, Cox DR, 2018, Large numbers of explanatory variables: a probabilistic assessment, *Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences*, Vol: 474

Avella M, Battey HS, Fan J, et al., 2018, Robust estimation of high-dimensional covariance and precision matrices, *Biometrika*, Vol: 105, Pages: 271-284

High-dimensional data are often most plausibly generated from distributions with complex structure and leptokurtosis in some or all components. Covariance and precision matrices provide a useful summary of such structure, yet the performance of popular matrix estimators typically hinges upon a sub-Gaussianity assumption. This paper presents robust matrix estimators whose performance is guaranteed for a much richer class of distributions. The proposed estimators, under a bounded fourth moment assumption, achieve the same minimax convergence rates as do existing methods under a sub-Gaussianity assumption. Consistency of the proposed estimators is also established under the weak assumption of bounded 2 + ϵ moments for ϵ ∈ (0, 2). The associated convergence rates depend on ϵ.
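One simple device in this spirit, shown here as a sketch rather than the paper's estimator (which chooses its truncation level adaptively), is to truncate the centred products before averaging, so that a single gross outlier cannot dominate an entry of the matrix:

```python
import numpy as np

def truncated_covariance(X, tau):
    """Elementwise covariance estimate with products clipped to
    [-tau, tau] before averaging: a crude robustification suitable
    for heavy-tailed data with bounded low-order moments."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S = np.zeros((p, p))
    for j in range(p):
        for k in range(p):
            prod = np.clip(Xc[:, j] * Xc[:, k], -tau, tau)
            S[j, k] = prod.mean()
    return S

# A single gross outlier inflates the sample covariance but barely
# moves the truncated estimate.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
X[0, 0] = 100.0
Xc = X - X.mean(axis=0)
S_plain = Xc.T @ Xc / len(X)
S_robust = truncated_covariance(X, tau=10.0)
```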

Battey HS, Zhu Z, Fan J, et al., 2018, Distributed testing and estimation in sparse high dimensional models, *Annals of Statistics*, Vol: 46, Pages: 1352-1382

Cox DR, Battey HS, 2017, Large numbers of explanatory variables, a semi-descriptive analysis, *Proceedings of the National Academy of Sciences of the United States of America*, Vol: 114, Pages: 8592-8595

Data with a relatively small number of study individuals and a very large number of potential explanatory features arise particularly, but by no means only, in genomics. A powerful method of analysis, the lasso [Tibshirani R (1996) J Roy Stat Soc B 58:267–288], takes account of an assumed sparsity of effects, that is, that most of the features are nugatory. Standard criteria for model fitting, such as the method of least squares, are modified by imposing a penalty for each explanatory variable used. There results a single model, leaving open the possibility that other sparse choices of explanatory features fit virtually equally well. The method suggested in this paper aims to specify simple models that are essentially equally effective, leaving detailed interpretation to the specifics of the particular study. The method hinges on the ability to make initially a very large number of separate analyses, allowing each explanatory feature to be assessed in combination with many other such features. Further stages allow the assessment of more complex patterns such as nonlinear and interactive dependences. The method has formal similarities to so-called partially balanced incomplete block designs introduced 80 years ago [Yates F (1936) J Agric Sci 26:424–455] for the study of large-scale plant breeding trials. The emphasis in this paper is strongly on exploratory analysis; the more formal statistical properties obtained under idealized assumptions will be reported separately.
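A one-stage caricature of the idea (hypothetical code: the actual procedure uses several stages and formal assessment rather than this crude coefficient ranking) arranges the variables in a square array, fits a separate small regression along each row and each column, and retains a variable only if it looks relevant in both of its traversals:

```python
import numpy as np

def screen_square(X, y, keep=2):
    """Arrange the p columns of X in a k x k array, regress y on each
    row and each column separately, and retain variables ranked in the
    top `keep` (by absolute coefficient) in both of their traversals."""
    n, p = X.shape
    k = int(np.ceil(np.sqrt(p)))
    grid = np.full((k, k), -1)
    grid.flat[:p] = np.arange(p)

    def top(block):
        cols = [int(c) for c in block if c >= 0]
        beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
        order = np.argsort(-np.abs(beta))
        return {cols[i] for i in order[:keep]}

    kept_rows = set().union(*(top(grid[i, :]) for i in range(k)))
    kept_cols = set().union(*(top(grid[:, j]) for j in range(k)))
    return sorted(kept_rows & kept_cols)

# Signal variables 3 and 10 survive both traversals; most pure-noise
# variables fail at least one and are screened out.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 16))
y = X[:, 3] + X[:, 10] + 0.1 * rng.normal(size=200)
kept = screen_square(X, y, keep=2)
```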

Battey HS, 2017, Eigen structure of a new class of structured covariance and inverse covariance matrices, *Bernoulli*, Vol: 23, Pages: 3166-3177

There is a one-to-one mapping between a p-dimensional strictly positive definite covariance matrix Σ and its matrix logarithm L. We exploit this relationship to study the structure induced on Σ through a sparsity constraint on L. Consider L as a random matrix generated through a basis expansion, with the support of the basis coefficients taken as a simple random sample of size s = s* from the index set [p(p + 1)/2] = {1, . . . , p(p + 1)/2}. We find that the expected number of non-unit eigenvalues of Σ, denoted E[|A|], is approximated with near-perfect accuracy by the solution d of the equation

$$4p + \frac{p(p-1)}{2(p+1)}\left[\log\frac{p}{p-d} - \frac{d}{2p(p-d)}\right] - s^* = 0.$$

Furthermore, the corresponding eigenvectors are shown to possess only p − |Aᶜ| nonzero entries. We use this result to elucidate the precise structure induced on Σ and Σ⁻¹. We demonstrate that a positive definite symmetric matrix whose matrix logarithm is sparse is significantly less sparse in the original domain. This finding has important implications in high-dimensional statistics, where it is important to exploit structure in order to construct consistent estimators in non-trivial norms. An estimator exploiting the structure of the proposed class is presented.

Nieto-Reyes A, Battey HS, 2017, Statistical functional depth, *Functional Statistics and Related Fields*, Editors: Aneiros, Bongiorno, Cao, Vieu, Publisher: Springer, Pages: 197-202

This chapter summarizes the paper [14], which formalizes the definition of statistical functional depth, and presents some extensions.

Battey H, Feng Q, Smith RJ, 2016, Improving confidence set estimation when parameters are weakly identified, *Statistics and Probability Letters*, Vol: 118, Pages: 117-123

We consider inference in weakly identified moment condition models when additional partially identifying moment inequality constraints are available. We detail the limiting distribution of the estimation criterion function and consequently propose a confidence set estimator for the true parameter.

Nieto-Reyes A, Battey H, 2016, A Topologically Valid Definition of Depth for Functional Data, *Statistical Science*, Vol: 31, Pages: 61-79

The main focus of this work is on providing a formal definition of statistical depth for functional data on the basis of six properties, recognising topological features such as continuity, smoothness and contiguity. Amongst our depth defining properties is one that addresses the delicate challenge of inherent partial observability of functional data, with fulfillment giving rise to a minimal guarantee on the performance of the empirical depth beyond the idealised and practically infeasible case of full observability. As an incidental product, functional depths satisfying our definition achieve a robustness that is commonly ascribed to depth, despite the absence of a formal guarantee in the multivariate definition of depth. We demonstrate the fulfillment or otherwise of our properties for six widely used functional depth proposals, thereby providing a systematic basis for selection of a depth function.

Battey H, Linton O, 2013, Nonparametric estimation of multivariate elliptic densities via finite mixture sieves, *Journal of Multivariate Analysis*, Vol: 123, Pages: 43-67, ISSN: 0047-259X

This paper considers the class of p-dimensional elliptic distributions (p ≥ 1) satisfying the consistency property (Kano, 1994) [23] and within this general framework presents a two-stage nonparametric estimator for the Lebesgue density based on Gaussian mixture sieves. Under the on-line Exponentiated Gradient (EG) algorithm of Helmbold et al. (1997) [20] and without restricting the mixing measure to have compact support, the estimator produces estimates converging uniformly in probability to the true elliptic density at a rate that is independent of the dimension of the problem, hence circumventing the familiar curse of dimensionality inherent to many semiparametric estimators. The rate performance of our estimator depends on the tail behaviour of the underlying mixing density (and hence that of the data) rather than on smoothness properties. In fact, our method achieves a rate of at least O_p(n^{−1/4}), provided only that some positive moment exists. When further moments exist, the rate improves, reaching O_p(n^{−3/8}) as the tails of the true density converge to those of a normal. Unlike the elliptic density estimator of Liebscher (2005) [25], our sieve estimator always yields an estimate that is a valid density, and is also attractive from a practical perspective as it accepts data as a stream, thus significantly reducing computational and storage requirements. Monte Carlo experimentation indicates encouraging finite-sample performance over a range of elliptic densities. The estimator is also implemented in a binary classification task using the well-known Wisconsin breast cancer dataset.
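The EG update at the heart of the second stage can be sketched in a few lines. This is a simplification with assumed names and parameters: the mixture components are a fixed univariate Gaussian grid here, whereas the paper's sieve grows with the sample and its components are elliptic.

```python
import math, random

def eg_mixture_weights(data, centers, sigma=1.0, eta=0.1):
    """Online Exponentiated Gradient update of mixture weights over a
    fixed grid of Gaussian components: each weight is multiplied by
    exp(eta * component density / mixture density) and renormalized."""
    w = [1.0 / len(centers)] * len(centers)
    for x in data:
        dens = [math.exp(-0.5 * ((x - c) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
                for c in centers]
        mix = sum(wi * di for wi, di in zip(w, dens))
        w = [wi * math.exp(eta * di / mix) for wi, di in zip(w, dens)]
        z = sum(w)
        w = [wi / z for wi in w]
    return w

# Data concentrated near zero: the weight on the centre-0 component
# comes to dominate, and the weights remain a valid probability vector.
rng = random.Random(1)
data = [rng.gauss(0.0, 0.5) for _ in range(500)]
w = eg_mixture_weights(data, centers=[-3.0, 0.0, 3.0])
```

Because the update only reweights components, every intermediate estimate is itself a valid density, matching the streaming behaviour described in the abstract.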

Battey H, Sancetta A, 2013, Conditional estimation for dependent functional data, *Journal of Multivariate Analysis*, Vol: 120, Pages: 1-17

Suppose we observe a Markov chain taking values in a functional space. We are interested in exploiting the time series dependence in these infinite dimensional data in order to make non-trivial predictions about the future. Making use of the Karhunen–Loève (KL) representation of functional random variables in terms of the eigenfunctions of the covariance operator, we present a deliberately over-simplified nonparametric model, which allows us to achieve dimensionality reduction by considering one dimensional nearest neighbour (NN) estimators for the transition distribution of the random coefficients of the KL expansion. Under regularity conditions, we show that the NN estimator is consistent even when the coefficients of the KL expansion are estimated from the observations. This also allows us to deduce the consistency of conditional regression function estimators for functional data. We show via simulations and two empirical examples that the proposed NN estimator outperforms the state of the art when data are generated both by the functional autoregressive (FAR) model of Bosq (2000) [8] and by more general data generating mechanisms.
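After the dimension reduction, prediction proceeds coefficient by coefficient via one-dimensional nearest-neighbour regression on lagged values. A minimal sketch of that final step (function name ours):

```python
def nn_predict(series, x, k=3):
    """Predict the next value of a scalar series by averaging the
    successors of the k historical values closest to x."""
    idx = sorted(range(len(series) - 1), key=lambda t: abs(series[t] - x))[:k]
    return sum(series[t + 1] for t in idx) / len(idx)

# Deterministic halving series: the successor of the value nearest
# to 0.5 is 0.25, so the 1-NN prediction is 0.25.
series = [1.0, 0.5, 0.25, 0.125, 0.0625, 0.03125]
pred = nn_predict(series, 0.5, k=1)
```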

Beale N, Rand DG, Battey H, et al., 2011, Individual versus systemic risk and the Regulator's Dilemma, *Proceedings of the National Academy of Sciences*, Vol: 108, Pages: 12647-12652

The global financial crisis of 2007–2009 exposed critical weaknesses in the financial system. Many proposals for financial reform address the need for systemic regulation—that is, regulation focused on the soundness of the whole financial system and not just that of individual institutions. In this paper, we study one particular problem faced by a systemic regulator: the tension between the distribution of assets that individual banks would like to hold and the distribution across banks that best supports system stability if greater weight is given to avoiding multiple bank failures. By diversifying its risks, a bank lowers its own probability of failure. However, if many banks diversify their risks in similar ways, then the probability of multiple failures can increase. As more banks fail simultaneously, the economic disruption tends to increase disproportionately. We show that, in model systems, the expected systemic cost of multiple failures can be largely explained by two global parameters of risk exposure and diversity, which can be assessed in terms of the risk exposures of individual actors. This observation hints at the possibility of regulatory intervention to promote systemic stability by incentivizing a more diverse diversification among banks. Such intervention offers the prospect of an additional lever in the armory of regulators, potentially allowing some combination of improved system stability and reduced need for additional capital.

This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.