Most members of this group belong to the Statistics Section and the Biomaths research group of the Department of Mathematics. Below is a list of the research areas in which members of the group are currently working, or to which they would like to apply the mathematical and statistical methods they have developed.

Research areas

Publications

  • Journal article
    Battey HS, 2019, On sparsity scales and covariance matrix transformations, Biometrika, ISSN: 0006-3444

    We develop a theory of covariance and concentration matrix estimation on any given or estimated sparsity scale when the matrix dimension is larger than the sample size. Non-standard sparsity scales are justified when such matrices are nuisance parameters, distinct from interest parameters, which should always have a direct subject-matter interpretation. The matrix logarithmic and inverse scales are studied as special cases, with the corollary that a constrained optimization-based approach is unnecessary for estimating a sparse concentration matrix. It is shown through simulations that, for large unstructured covariance matrices, there can be appreciable advantages to estimating a sparse approximation to the log-transformed covariance matrix and converting the conclusions back to the scale of interest.
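
    As an illustration of the pipeline sketched in this abstract, the following minimal Python snippet sparsifies a sample covariance matrix on the matrix-logarithmic scale and maps the result back; the hard-thresholding rule, the threshold value and the ridge term are placeholder assumptions chosen to make a runnable toy example, not the paper's estimator.

    ```python
    import numpy as np
    from scipy.linalg import logm, expm

    def sparse_log_scale_covariance(X, threshold=0.2, ridge=1e-3):
        """Sparsify the sample covariance on the matrix-log scale, then map
        back. Threshold and ridge values are illustrative assumptions."""
        S = np.cov(X, rowvar=False)
        S += ridge * np.eye(S.shape[0])      # keep S positive definite when p > n
        L = logm(S).real                     # matrix logarithm of the covariance
        keep = np.abs(L) >= threshold        # hard-threshold on the log scale
        np.fill_diagonal(keep, True)         # always retain the diagonal
        return expm(np.where(keep, L, 0.0))  # convert back to the covariance scale

    rng = np.random.default_rng(0)
    X = rng.standard_normal((40, 50))        # n = 40 observations, p = 50 variables
    Sigma_hat = sparse_log_scale_covariance(X)
    ```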

  • Journal article
    Clarke JM, Warren LR, Arora S, Barahona M, Darzi AW et al., 2018, Guiding interoperable electronic health records through patient-sharing networks, npj Digital Medicine, Vol: 1, ISSN: 2398-6352

    Effective sharing of clinical information between care providers is a critical component of a safe, efficient health system. National data-sharing systems may be costly, politically contentious and do not reflect local patterns of care delivery. This study examines hospital attendances in England from 2013 to 2015 to identify instances of patient sharing between hospitals. Of 19.6 million patients receiving care from 155 hospital care providers, 130 million presentations were identified. On 14.7 million occasions (12%), patients attended a different hospital to the one they attended on their previous interaction. A network of hospitals was constructed based on the frequency of patient sharing between hospitals, and partitioned using the Louvain algorithm into ten distinct data-sharing communities, improving the continuity of data sharing in such instances from 0 to 65–95%. Locally implemented data-sharing communities of hospitals may achieve effective accessibility of clinical information without a large-scale national interoperable information system.
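
    As a toy illustration of the partitioning step, the sketch below builds a small weighted patient-sharing graph and applies Louvain community detection via networkx; the hospitals and sharing counts are invented for the example.

    ```python
    import networkx as nx
    from networkx.algorithms.community import louvain_communities  # networkx >= 2.8

    # Toy patient-sharing network: nodes are hospitals, edge weights count
    # patients attending both hospitals. All numbers are invented.
    edges = [
        ("A", "B", 9500), ("A", "C", 8700), ("B", "C", 6100),  # tight triangle
        ("D", "E", 9100), ("D", "F", 7800), ("E", "F", 8300),  # second triangle
        ("C", "D", 150),                                       # weak cross-link
    ]
    G = nx.Graph()
    G.add_weighted_edges_from(edges)

    # Louvain modularity optimisation, as used in the study to partition
    # hospitals into data-sharing communities.
    for i, community in enumerate(louvain_communities(G, weight="weight", seed=1)):
        print(f"Data-sharing community {i + 1}: {sorted(community)}")
    ```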

  • Journal article
    Battey HS, Cox DR, 2018, Large numbers of explanatory variables: a probabilistic assessment, Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, Vol: 474, ISSN: 1364-5021

    Recently, Cox and Battey (2017 Proc. Natl Acad. Sci. USA 114, 8592–8595 (doi:10.1073/pnas.1703764114)) outlined a procedure for regression analysis when there are a small number of study individuals and a large number of potential explanatory variables, but relatively few of the latter have a real effect. The present paper reports more formal statistical properties. The results are intended primarily to guide the choice of key tuning parameters.

  • Journal article
    Avella M, Battey HS, Fan J, Li Q et al., 2018, Robust estimation of high-dimensional covariance and precision matrices, Biometrika, Vol: 105, Pages: 271-284

    High-dimensional data are often most plausibly generated from distributions with complex structure and leptokurtosis in some or all components. Covariance and precision matrices provide a useful summary of such structure, yet the performance of popular matrix estimators typically hinges upon a sub-Gaussianity assumption. This paper presents robust matrix estimators whose performance is guaranteed for a much richer class of distributions. The proposed estimators, under a bounded fourth moment assumption, achieve the same minimax convergence rates as do existing methods under a sub-Gaussianity assumption. Consistency of the proposed estimators is also established under the weak assumption of bounded 2+ϵ moments for ϵ ∈ (0, 2). The associated convergence rates depend on ϵ.
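
    The sketch below shows one standard robust ingredient of this kind, an elementwise truncated second-moment estimator with bounded influence; the robust centring, the truncation level and its default formula are placeholder assumptions, not the estimators and tuning analysed in the paper.

    ```python
    import numpy as np

    def truncated_covariance(X, tau=None):
        """Average elementwise-truncated products of centred observations,
        bounding the influence of heavy-tailed outliers. The default tau
        is an illustrative choice, not the paper's tuning."""
        n, p = X.shape
        Xc = X - np.median(X, axis=0)             # robust centring (an assumption)
        prods = Xc[:, :, None] * Xc[:, None, :]   # n x p x p array of products
        if tau is None:
            tau = np.sqrt(n / np.log(p)) * np.median(np.abs(prods))
        return np.clip(prods, -tau, tau).mean(axis=0)

    rng = np.random.default_rng(0)
    X = rng.standard_t(df=5, size=(200, 10))      # heavy tails, finite 4th moment
    Sigma_hat = truncated_covariance(X)
    ```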

  • Journal article
    Battey HS, Zhu Z, Fan J, Lu J, Liu H et al., 2018, Distributed testing and estimation in sparse high dimensional models, Annals of Statistics, Vol: 46, Pages: 1352-1382

    This paper studies hypothesis testing and parameter estimation in the context of the divide-and-conquer algorithm. In a unified likelihood-based framework, we propose new test statistics and point estimators obtained by aggregating various statistics from k subsamples of size n/k, where n is the sample size. In both low dimensional and sparse high dimensional settings, we address the important question of how large k can be, as n grows large, such that the loss of efficiency due to the divide-and-conquer algorithm is negligible. In other words, the resulting estimators have the same inferential efficiencies and estimation rates as an oracle with access to the full sample. Thorough numerical results are provided to back up the theory.
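
    A minimal sketch of the divide-and-conquer aggregation idea follows, in the low-dimensional case with ordinary least squares as the per-block estimator; the paper's debiased high-dimensional estimators and test statistics are more involved, and k = 20 is an arbitrary illustrative choice.

    ```python
    import numpy as np

    def divide_and_conquer_ols(X, y, k):
        """Split the sample into k blocks of size ~n/k, fit OLS on each
        block, and average the k estimates."""
        betas = [np.linalg.lstsq(Xb, yb, rcond=None)[0]
                 for Xb, yb in zip(np.array_split(X, k), np.array_split(y, k))]
        return np.mean(betas, axis=0)

    # Toy check: the aggregated estimate tracks the full-sample OLS fit.
    rng = np.random.default_rng(0)
    n, p = 10_000, 5
    X = rng.standard_normal((n, p))
    y = X @ np.arange(1.0, p + 1.0) + rng.standard_normal(n)
    beta_dc = divide_and_conquer_ols(X, y, k=20)
    beta_full = np.linalg.lstsq(X, y, rcond=None)[0]
    print(np.max(np.abs(beta_dc - beta_full)))    # small when k << n
    ```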

  • Journal article
    Aryaman J, Johnston IG, Jones NS, 2017, Mitochondrial DNA Density Homeostasis Accounts for a Threshold Effect in a Cybrid Model of a Human Mitochondrial Disease, Biochemical Journal, Vol: 474, Pages: 4019-4034, ISSN: 1470-8728

    Mitochondrial dysfunction is involved in a wide array of devastating diseases, but the heterogeneity and complexity of the symptoms of these diseases challenges theoretical understanding of their causation. With the explosion of omics data, we have the unprecedented opportunity to gain deep understanding of the biochemical mechanisms of mitochondrial dysfunction. This goal raises the outstanding need to make these complex datasets interpretable. Quantitative modelling allows us to translate such datasets into intuition and suggest rational biomedical treatments. Taking an interdisciplinary approach, we use a recently published large-scale dataset and develop a descriptive and predictive mathematical model of progressive increase in mutant load of the MELAS 3243A>G mtDNA mutation. The experimentally observed behaviour is surprisingly rich, but we find that our simple, biophysically motivated model intuitively accounts for this heterogeneity and yields a wealth of biological predictions. Our findings suggest that cells attempt to maintain wild-type mtDNA density through cell volume reduction, and thus power demand reduction, until a minimum cell volume is reached. Thereafter, cells toggle from demand reduction to supply increase, up-regulating energy production pathways. Our analysis provides further evidence for the physiological significance of mtDNA density and emphasizes the need for performing single-cell volume measurements jointly with mtDNA quantification. We propose novel experiments to verify the hypotheses made here to further develop our understanding of the threshold effect and connect with rational choices for mtDNA disease therapies.

  • Journal article
    Fulcher B, Jones NS, 2017, hctsa: A computational framework for automated time-series phenotyping using massive feature extraction, Cell Systems, Vol: 5, Pages: 527-531.e3, ISSN: 2405-4712

    Phenotype measurements frequently take the form of time series, but we currently lack a systematic method for relating these complex data streams to scientifically meaningful outcomes, such as relating the movement dynamics of organisms to their genotype or measurements of brain dynamics of a patient to their disease diagnosis. Previous work addressed this problem by comparing implementations of thousands of diverse scientific time-series analysis methods in an approach termed highly comparative time-series analysis. Here, we introduce hctsa, a software tool for applying this methodological approach to data. hctsa includes an architecture for computing over 7,700 time-series features and a suite of analysis and visualization algorithms to automatically select useful and interpretable time-series features for a given application. Using exemplar applications to high-throughput phenotyping experiments, we show how hctsa allows researchers to leverage decades of time-series research to quantify and understand informative structure in time-series data.
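
    The sketch below conveys the highly comparative idea on a toy scale: compute a bank of time-series features for labelled recordings and rank the features by how well they separate two classes. The four features, the simulated data and the separation score are stand-ins for illustration; hctsa itself is a MATLAB framework computing over 7,700 features.

    ```python
    import numpy as np

    # A tiny stand-in feature bank (hctsa computes over 7,700 features).
    FEATURES = {
        "mean": np.mean,
        "std": np.std,
        "abs_lag1_autocorr": lambda x: abs(np.corrcoef(x[:-1], x[1:])[0, 1]),
        "rms_successive_diff": lambda x: np.sqrt(np.mean(np.diff(x) ** 2)),
    }

    def feature_matrix(series_list):
        return np.array([[f(x) for f in FEATURES.values()] for x in series_list])

    def ar1(rng, n=500, phi=0.8):
        e, x = rng.standard_normal(n), np.zeros(n)
        for t in range(1, n):
            x[t] = phi * x[t - 1] + e[t]
        return x

    rng = np.random.default_rng(0)
    # Two toy "phenotypes": white noise versus a smoother AR(1) process.
    F0 = feature_matrix([rng.standard_normal(500) for _ in range(30)])
    F1 = feature_matrix([ar1(rng) for _ in range(30)])

    # Rank features by a two-sample separation score (t-like statistic).
    score = np.abs(F0.mean(0) - F1.mean(0)) / np.sqrt(F0.var(0) / 30 + F1.var(0) / 30)
    for name, s in sorted(zip(FEATURES, score), key=lambda pair: -pair[1]):
        print(f"{name}: {s:.1f}")
    ```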

  • Journal article
    Cox DR, Battey HS, 2017, Large numbers of explanatory variables, a semi-descriptive analysis, Proceedings of the National Academy of Sciences of the United States of America, Vol: 114, Pages: 8592-8595

    Data with a relatively small number of study individuals and a very large number of potential explanatory features arise particularly, but by no means only, in genomics. A powerful method of analysis, the lasso [Tibshirani R (1996) J Roy Stat Soc B 58:267–288], takes account of an assumed sparsity of effects, that is, that most of the features are nugatory. Standard criteria for model fitting, such as the method of least squares, are modified by imposing a penalty for each explanatory variable used. There results a single model, leaving open the possibility that other sparse choices of explanatory features fit virtually equally well. The method suggested in this paper aims to specify simple models that are essentially equally effective, leaving detailed interpretation to the specifics of the particular study. The method hinges on the ability to make initially a very large number of separate analyses, allowing each explanatory feature to be assessed in combination with many other such features. Further stages allow the assessment of more complex patterns such as nonlinear and interactive dependences. The method has formal similarities to so-called partially balanced incomplete block designs introduced 80 years ago [Yates F (1936) J Agric Sci 26:424–455] for the study of large-scale plant breeding trials. The emphasis in this paper is strongly on exploratory analysis; the more formal statistical properties obtained under idealized assumptions will be reported separately.
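
    A heavily simplified Python sketch of a first traversal in this spirit follows: candidate variables are arranged in a square grid, the outcome is regressed on each row and each column of variables in small separate analyses, and variables ranked highly in both of their analyses are retained. The grid padding, the ranking by absolute coefficient and the number retained per analysis are simplifying assumptions, not the procedure's actual specification.

    ```python
    import numpy as np

    def grid_traversal_screen(X, y, keep=2):
        """Arrange p variables in a k x k grid, regress y on each row and
        each column of variables, and keep variables ranked in the top
        `keep` (by absolute coefficient) of both their small analyses."""
        n, p = X.shape
        k = int(np.ceil(np.sqrt(p)))
        grid = (np.arange(k * k) % p).reshape(k, k)   # wrap to pad the grid

        def top_in_group(cols):
            beta = np.linalg.lstsq(X[:, cols], y, rcond=None)[0]
            return {int(cols[j]) for j in np.argsort(-np.abs(beta))[:keep]}

        row_hits = set().union(*(top_in_group(r) for r in grid))
        col_hits = set().union(*(top_in_group(c) for c in grid.T))
        return sorted(row_hits & col_hits)            # survivors of both passes

    rng = np.random.default_rng(0)
    n, p = 100, 64
    X = rng.standard_normal((n, p))
    y = 3 * X[:, 5] - 2 * X[:, 40] + rng.standard_normal(n)
    print(grid_traversal_screen(X, y))                # ideally contains 5 and 40
    ```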

  • Journal article
    Griffie J, Shlomovich L, Williamson D, Shannon M, Aarons J, Khuon S, Burn G, Boelen L, Peters R, Cope A, Cohen E, Rubin-Delanchy P, Owen D et al., 2017, 3D Bayesian cluster analysis of super-resolution data reveals LAT recruitment to the T cell synapse, Scientific Reports, Vol: 7, ISSN: 2045-2322

    Single-molecule localisation microscopy (SMLM) allows the localisation of fluorophores with a precision of 10–30 nm, revealing the cell’s nanoscale architecture at the molecular level. Recently, SMLM has been extended to 3D, providing a unique insight into cellular machinery. Although cluster analysis techniques have been developed for 2D SMLM data sets, few have been applied to 3D. This lack of quantification tools can be explained by the relative novelty of imaging techniques such as interferometric photo-activated localisation microscopy (iPALM). Also, existing methods that could be extended to 3D SMLM are usually subject to user-defined analysis parameters, which remains a major drawback. Here, we present a new open source cluster analysis method for 3D SMLM data, free of user-definable parameters, relying on a model-based Bayesian approach which takes full account of the individual localisation precisions in all three dimensions. The accuracy and reliability of the method is validated using simulated data sets. This tool is then deployed on novel experimental data as a proof of concept, illustrating the recruitment of LAT to the T-cell immunological synapse in data acquired by iPALM, providing ~10 nm isotropic resolution.

  • Journal article
    Aryaman J, Hoitzing H, Burgstaller J, Johnston I, Jones NS et al., 2017, Mitochondrial heterogeneity, metabolic scaling and cell death, Bioessays, Vol: 39, ISSN: 1521-1878

    Heterogeneity in mitochondrial content has been previously suggested as a major contributor to cellular noise, with multiple studies indicating its direct involvement in biomedically important cellular phenomena. A recently published dataset explored the connection between mitochondrial functionality and cell physiology, where a non-linearity between mitochondrial functionality and cell size was found. Using mathematical models, we suggest that a combination of metabolic scaling and a simple model of cell death may account for these observations. However, our findings also suggest the existence of alternative competing hypotheses, such as a non-linearity between cell death and cell size. While we find that the proposed non-linear coupling between mitochondrial functionality and cell size provides a compelling alternative to previous attempts to link mitochondrial heterogeneity and cell physiology, we emphasise the need to account for alternative causal variables, including cell cycle, size, mitochondrial density and death, in future studies of mitochondrial physiology.

This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.
