Most of the members of this group are from the Statistics Section and Biomaths research group of the Department of Mathematics. Below you can find a list of research areas that members of this group are currently working on and/or would like to work on by applying their developed mathematical and statistical methods.

Research areas




Search results

  • Journal article
    Beaney T, Clarke J, Woodcock T, McCarthy R, Saravanakumar K, Barahona M, Blair M, Hargreaves D et al., 2021,

    Patterns of healthcare utilisation in children and young people: a retrospective cohort study using routinely collected healthcare data in Northwest London

    , BMJ Open, Vol: 11, Pages: 1-14, ISSN: 2044-6055

    Objectives: With a growing role for health services in managing population health, there is a need for early identification of populations with high need. Segmentation approaches partition the population based on demographics, long-term conditions (LTCs) or healthcare utilisation but have mostly been applied to adults. Our study uses segmentation methods to distinguish patterns of healthcare utilisation in children and young people (CYP) and to explore predictors of segment membership.
    Design: Retrospective cohort study.
    Setting: Routinely collected primary and secondary healthcare data in Northwest London from the Discover database.
    Participants: 378,309 CYP aged 0-15 years registered to a general practice in Northwest London with one full year of follow-up.
    Primary and secondary outcome measures: Assignment of each participant to a segment defined by seven healthcare variables representing primary and secondary care attendances, and description of utilisation patterns by segment. Predictors of segment membership described by age, sex, ethnicity, deprivation and LTCs.
    Results: Participants were grouped into six segments based on healthcare utilisation. Three segments predominantly used primary care; two moderate utilisation segments differed in use of emergency or elective care, and a high utilisation segment, representing 16,632 (4.4%) children, accounted for the highest mean presentations across all service types. The two smallest segments, representing 13.3% of the population, accounted for 62.5% of total costs. Younger age, residence in areas of higher deprivation, and presence of one or more LTCs were associated with membership of higher utilisation segments, but 75.0% of those in the highest utilisation segment had no LTC.
    Conclusions: This article identifies six segments of healthcare utilisation in CYP and predictors of segment membership. Demographics and LTCs may not explain utilisation patterns as strongly as in adults which may limit the use of routine data in predicting ut

  • Journal article
    Liu Z, Peach R, Lawrance E, Noble A, Ungless M, Barahona M et al., 2021,

    Listening to mental health crisis needs at scale: using Natural Language Processing to understand and evaluate a mental health crisis text messaging service

    , Frontiers in Digital Health, Vol: 3, Pages: 1-14, ISSN: 2673-253X

    The current mental health crisis is a growing public health issue requiring a large-scale response that cannot be met with traditional services alone. Digital support tools are proliferating, yet most are not systematically evaluated, and we know little about their users and their needs. Shout is a free mental health text messaging service run by the charity Mental Health Innovations, which provides support for individuals in the UK experiencing mental or emotional distress and seeking help. Here we study a large data set of anonymised text message conversations and post-conversation surveys compiled through Shout. This data provides an opportunity to hear at scale from those experiencing distress; to better understand mental health needs for people not using traditional mental health services; and to evaluate the impact of a novel form of crisis support. We use natural language processing (NLP) to assess the adherence of volunteers to conversation techniques and formats, and to gain insight into demographic user groups and their behavioural expressions of distress. Our textual analyses achieve accurate classification of conversation stages (weighted accuracy = 88%), behaviours (1-hamming loss = 95%) and texter demographics (weighted accuracy = 96%), exemplifying how the application of NLP to frontline mental health data sets can aid with post-hoc analysis and evaluation of quality of service provision in digital mental health services.

  • Conference paper
    Liu Z, Barahona M, 2021,

    Similarity measure for sparse time course data based on Gaussian processes

    , Uncertainty in Artificial Intelligence 2021, Publisher: PMLR, Pages: 1332-1341

    We propose a similarity measure for sparsely sampled time course data in the form of a log-likelihood ratio of Gaussian processes (GP). The proposed GP similarity is similar to a Bayes factor and provides enhanced robustness to noise in sparse time series, such as those found in various biological settings, e.g., gene transcriptomics. We show that the GP measure is equivalent to the Euclidean distance when the noise variance in the GP is negligible compared to the noise variance of the signal. Our numerical experiments on both synthetic and real data show improved performance of the GP similarity when used in conjunction with two distance-based clustering methods.

  • Journal article
    Myall AC, Peach RL, Weiße AY, Davies F, Mookerjee S, Holmes A, Barahona M et al., 2021,

    Network memory in the movement of hospital patients carrying drug-resistant bacteria

    , Applied Network Science, Vol: 6, ISSN: 2364-8228

    Hospitals constitute highly interconnected systems that bring into contact an abundance of infectious pathogens and susceptible individuals, thus making infection outbreaks both common and challenging. In recent years, there has been a sharp incidence of antimicrobial-resistance amongst healthcare-associated infections, a situation now considered endemic in many countries. Here we present network-based analyses of a data set capturing the movement of patients harbouring drug-resistant bacteria across three large London hospitals. We show that there are substantial memory effects in the movement of hospital patients colonised with drug-resistant bacteria. Such memory effects break first-order Markovian transitive assumptions and substantially alter the conclusions from the analysis, specifically on node rankings and the evolution of diffusive processes. We capture variable length memory effects by constructing a lumped-state memory network, which we then use to identify overlapping communities of wards. We find that these communities of wards display a quasi-hierarchical structure at different levels of granularity which is consistent with different aspects of patient flows related to hospital locations and medical specialties.

  • Journal article
    Saavedra-Garcia P, Roman-Trufero M, Al-Sadah HA, Blighe K, Lopez-Jimenez E, Christoforou M, Penfold L, Capece D, Xiong X, Miao Y, Parzych K, Caputo V, Siskos AP, Encheva V, Liu Z, Thiel D, Kaiser MF, Piazza P, Chaidos A, Karadimitris A, Franzoso G, Snijder AP, Keun HC, Oyarzún DA, Barahona M, Auner H et al., 2021,

    Systems level profiling of chemotherapy-induced stress resolution in cancer cells reveals druggable trade-offs

    , Proceedings of the National Academy of Sciences of USA, Vol: 118, ISSN: 0027-8424

    Cancer cells can survive chemotherapy-induced stress, but how they recover from it is not known. Using a temporal multiomics approach, we delineate the global mechanisms of proteotoxic stress resolution in multiple myeloma cells recovering from proteasome inhibition. Our observations define layered and protracted programmes for stress resolution that encompass extensive changes across the transcriptome, proteome, and metabolome. Cellular recovery from proteasome inhibition involved protracted and dynamic changes of glucose and lipid metabolism and suppression of mitochondrial function. We demonstrate that recovering cells are more vulnerable to specific insults than acutely stressed cells and identify the general control nonderepressable 2 (GCN2)-driven cellular response to amino acid scarcity as a key recovery-associated vulnerability. Using a transcriptome analysis pipeline, we further show that GCN2 is also a stress-independent bona fide target in transcriptional signature-defined subsets of solid cancers that share molecular characteristics. Thus, identifying cellular trade-offs tied to the resolution of chemotherapy-induced stress in tumour cells may reveal new therapeutic targets and routes for cancer therapy optimisation.

  • Journal article
    Battey HS, 2019,

    On sparsity scales and covariance matrix transformations

    , Biometrika, Vol: 106, Pages: 605-617, ISSN: 0006-3444

    We develop a theory of covariance and concentration matrix estimation on any given or estimated sparsity scale when the matrix dimension is larger than the sample size. Nonstandard sparsity scales are justified when such matrices are nuisance parameters, distinct from interest parameters, which should always have a direct subject-matter interpretation. The matrix logarithmic and inverse scales are studied as special cases, with the corollary that a constrained optimization-based approach is unnecessary for estimating a sparse concentration matrix. It is shown through simulations that for large unstructured covariance matrices, there can be appreciable advantages to estimating a sparse approximation to the log-transformed covariance matrix and converting the conclusions back to the scale of interest.

  • Journal article
    Clarke JM, Warren LR, Arora S, Barahona M, Darzi AW et al., 2018,

    Guiding interoperable electronic health records through patient-sharing networks.

    , NPJ digital medicine, Vol: 1, Pages: 65-65, ISSN: 2398-6352

    Effective sharing of clinical information between care providers is a critical component of a safe, efficient health system. National data-sharing systems may be costly, politically contentious and do not reflect local patterns of care delivery. This study examines hospital attendances in England from 2013 to 2015 to identify instances of patient sharing between hospitals. Of 19.6 million patients receiving care from 155 hospital care providers, 130 million presentations were identified. On 14.7 million occasions (12%), patients attended a different hospital to the one they attended on their previous interaction. A network of hospitals was constructed based on the frequency of patient sharing between hospitals which was partitioned using the Louvain algorithm into ten distinct data-sharing communities, improving the continuity of data sharing in such instances from 0 to 65-95%. Locally implemented data-sharing communities of hospitals may achieve effective accessibility of clinical information without a large-scale national interoperable information system.

  • Journal article
    Battey HS, Cox DR, 2018,

    Large numbers of explanatory variables: a probabilistic assessment

    , Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, Vol: 474

  • Journal article
    Avella M, Battey HS, Fan J, Li Q et al., 2018,

    Robust estimation of high-dimensional covariance and precision matrices

    , Biometrika, Vol: 105, Pages: 271-284, ISSN: 0006-3444

    High-dimensional data are often most plausibly generated from distributions with complex structure and leptokurtosis in some or all components. Covariance and precision matrices provide a useful summary of such structure, yet the performance of popular matrix estimators typically hinges upon a sub-Gaussianity assumption. This paper presents robust matrix estimators whose performance is guaranteed for a much richer class of distributions. The proposed estimators, under a bounded fourth moment assumption, achieve the same minimax convergence rates as do existing methods under a sub-Gaussianity assumption. Consistency of the proposed estimators is also established under the weak assumption of bounded 2+ϵ moments for ϵ ∈ (0,2). The associated convergence rates depend on ϵ.

  • Journal article
    Battey HS, Fan J, Liu H, Lu J, Zhu Z et al., 2018,

    Distributed testing and estimation in sparse high dimensional models

    , Annals of Statistics, Vol: 46, Pages: 1352-1382, ISSN: 0090-5364

    This paper studies hypothesis testing and parameter estimation in the context of the divide-and-conquer algorithm. In a unified likelihood-based framework, we propose new test statistics and point estimators obtained by aggregating various statistics from k subsamples of size n/k, where n is the sample size. In both low dimensional and sparse high dimensional settings, we address the important question of how large k can be, as n grows large, such that the loss of efficiency due to the divide-and-conquer algorithm is negligible. In other words, the resulting estimators have the same inferential efficiencies and estimation rates as an oracle with access to the full sample. Thorough numerical results are provided to back up the theory.
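
The divide-and-conquer idea in the abstract above (aggregate statistics from k subsamples of size n/k and compare with an estimator that sees the full sample) can be illustrated with a minimal sketch. This is not the paper's estimator or test statistic: it assumes a plain low-dimensional linear model, ordinary least squares on each subsample, and simple averaging as the aggregation step. The function names, the simulated data and the values of n, d and k are all hypothetical.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficient estimate for the model y = X @ beta + noise."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def divide_and_conquer_ols(X, y, k):
    """Average the OLS estimates from k disjoint subsamples of size about n/k.

    Simple averaging is an assumption made for illustration only.
    """
    X_parts = np.array_split(X, k)
    y_parts = np.array_split(y, k)
    estimates = [ols(Xj, yj) for Xj, yj in zip(X_parts, y_parts)]
    return np.mean(estimates, axis=0)

# Simulated data (hypothetical): n observations, d coefficients, k subsamples.
rng = np.random.default_rng(0)
n, d, k = 10_000, 5, 20
beta_true = np.arange(1.0, d + 1.0)
X = rng.normal(size=(n, d))
y = X @ beta_true + rng.normal(size=n)

beta_full = ols(X, y)                      # "oracle" with access to the full sample
beta_dc = divide_and_conquer_ols(X, y, k)  # aggregated subsample estimates

# Largest coefficient-wise gap between the two estimators.
gap = np.max(np.abs(beta_dc - beta_full))
print("max |beta_dc - beta_full| =", gap)
```

With k small relative to n, the averaged estimate should stay close to the full-sample one in this toy setting; how large k may grow before efficiency is lost is the question the paper addresses and is not answered by this sketch.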

This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.
