Professor Mauricio Barahona

Faculty of Natural Sciences, Department of Mathematics

Director of Research, Chair in Biomathematics

Contact

m.barahona Website

Location

6M31 Huxley Building, South Kensington Campus

Publications

Myall A, Peach R, Wan Y, Mookerjee S, Jauneikaite E, Bolt F, Price J, Davies F, Weisse A, Holmes AH, Barahona M, et al., 2022, Improved contact tracing using network analysis and spatial-temporal proximity, iMED conference, Publisher: Elsevier, Pages: S20-S20, ISSN: 1201-9712

Conference paper

Jha S, Mayer E, Barahona M, 2022, Improving information fusion on multimodal clinical data in classification settings, Pages: 154-159

Clinical data often exists in different forms across the lifetime of a patient's interaction with the healthcare system: structured, unstructured or semi-structured data in the form of laboratory readings, clinical notes, diagnostic codes, imaging and audio data of various kinds, and other observational data. Formulating a representation model that aggregates information from these heterogeneous sources may allow us to jointly model data with more predictive signal than noise and help inform our model with useful constraints learned from better data. Multimodal fusion approaches help produce representations combined from heterogeneous modalities, which can be used for clinical prediction tasks. Representations produced through different fusion techniques require different training strategies. We investigate the advantage of adding narrative clinical text to structured modalities for classification tasks in the clinical domain. We show that while there is a competitive advantage in combined representations of clinical data, the approach can be helped by training guidance customized to each modality. We show empirical results across binary/multiclass settings, single/multitask settings and unified/multimodal learning rate settings for early and late information fusion of clinical data.

Conference paper

Beaney T, Clarke J, Woodcock T, McCarthy R, Saravanakumar K, Barahona M, Blair M, Hargreaves D, et al., 2021, Patterns of healthcare utilisation in children and young people: a retrospective cohort study using routinely collected healthcare data in Northwest London, BMJ Open, Vol: 11, Pages: 1-14, ISSN: 2044-6055

Objectives: With a growing role for health services in managing population health, there is a need for early identification of populations with high need. Segmentation approaches partition the population based on demographics, long-term conditions (LTCs) or healthcare utilisation but have mostly been applied to adults. Our study uses segmentation methods to distinguish patterns of healthcare utilisation in children and young people (CYP) and to explore predictors of segment membership. Design: Retrospective cohort study. Setting: Routinely collected primary and secondary healthcare data in Northwest London from the Discover database. Participants: 378,309 CYP aged 0-15 years registered to a general practice in Northwest London with one full year of follow-up. Primary and secondary outcome measures: Assignment of each participant to a segment defined by seven healthcare variables representing primary and secondary care attendances, and description of utilisation patterns by segment. Predictors of segment membership described by age, sex, ethnicity, deprivation and LTCs. Results: Participants were grouped into six segments based on healthcare utilisation. Three segments predominantly used primary care; two moderate utilisation segments differed in use of emergency or elective care; and a high utilisation segment, representing 16,632 (4.4%) children, accounted for the highest mean presentations across all service types. The two smallest segments, representing 13.3% of the population, accounted for 62.5% of total costs. Younger age, residence in areas of higher deprivation, and presence of one or more LTCs were associated with membership of higher utilisation segments, but 75.0% of those in the highest utilisation segment had no LTC. Conclusions: This article identifies six segments of healthcare utilisation in CYP and predictors of segment membership. Demographics and LTCs may not explain utilisation patterns as strongly as in adults, which may limit the use of routine data in predicting ut

Journal article

Liu Z, Peach R, Lawrance E, Noble A, Ungless M, Barahona M, et al., 2021, Listening to mental health crisis needs at scale: using Natural Language Processing to understand and evaluate a mental health crisis text messaging service, Frontiers in Digital Health, Vol: 3, Pages: 1-14, ISSN: 2673-253X

The current mental health crisis is a growing public health issue requiring a large-scale response that cannot be met with traditional services alone. Digital support tools are proliferating, yet most are not systematically evaluated, and we know little about their users and their needs. Shout is a free mental health text messaging service run by the charity Mental Health Innovations, which provides support for individuals in the UK experiencing mental or emotional distress and seeking help. Here we study a large data set of anonymised text message conversations and post-conversation surveys compiled through Shout. This data provides an opportunity to hear at scale from those experiencing distress; to better understand mental health needs for people not using traditional mental health services; and to evaluate the impact of a novel form of crisis support. We use natural language processing (NLP) to assess the adherence of volunteers to conversation techniques and formats, and to gain insight into demographic user groups and their behavioural expressions of distress. Our textual analyses achieve accurate classification of conversation stages (weighted accuracy = 88%), behaviours (1 - Hamming loss = 95%) and texter demographics (weighted accuracy = 96%), exemplifying how the application of NLP to frontline mental health data sets can aid with post-hoc analysis and evaluation of quality of service provision in digital mental health services.

Journal article
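The behaviour-classification result above is reported as 1 - Hamming loss, which is a multi-label metric. Below is a minimal sketch of how that score is computed. It assumes behaviours are encoded as a 0/1 indicator matrix, with one row per conversation and one column per behaviour label. The example labels are hypothetical and are not drawn from the Shout data set.

```python
import numpy as np


def one_minus_hamming_loss(y_true, y_pred):
    """Return 1 - Hamming loss for multi-label 0/1 indicator arrays.

    The Hamming loss is the fraction of individual (sample, label) entries
    that are predicted wrongly, so 1 - loss is the fraction predicted correctly.
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    return 1.0 - np.mean(y_true != y_pred)


# Hypothetical example: 3 conversations x 4 behaviour labels, 2 of 12 entries wrong.
y_true = [[1, 0, 0, 1], [0, 1, 0, 0], [1, 1, 0, 0]]
y_pred = [[1, 0, 0, 0], [0, 1, 0, 0], [1, 1, 1, 0]]
print(one_minus_hamming_loss(y_true, y_pred))
```

For indicator arrays, scikit-learn's `sklearn.metrics.hamming_loss` returns the same loss term.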

Liu Z, Barahona M, 2021, Similarity measure for sparse time course data based on Gaussian processes, Uncertainty in Artificial Intelligence 2021, Publisher: PMLR, Pages: 1332-1341

We propose a similarity measure for sparsely sampled time course data in the form of a log-likelihood ratio of Gaussian processes (GP). The proposed GP similarity is similar to a Bayes factor and provides enhanced robustness to noise in sparse time series, such as those found in various biological settings, e.g., gene transcriptomics. We show that the GP measure is equivalent to the Euclidean distance when the noise variance in the GP is negligible compared to the noise variance of the signal. Our numerical experiments on both synthetic and real data show improved performance of the GP similarity when used in conjunction with two distance-based clustering methods.

Conference paper
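The abstract describes the similarity as a log-likelihood ratio of Gaussian processes. One natural reading is sketched below under stated assumptions. It compares the marginal likelihood of both series under a single shared GP against the product of their likelihoods under two independent GPs, much like a Bayes factor. The squared-exponential kernel and the fixed hyperparameters (`variance`, `lengthscale`, `noise`) are assumptions for illustration. The paper's exact construction and its hyperparameter fitting may differ.

```python
import numpy as np


def rbf_kernel(t1, t2, variance, lengthscale):
    """Squared-exponential covariance between two vectors of time points."""
    d = t1[:, None] - t2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)


def gp_log_marginal(t, y, variance, lengthscale, noise):
    """Log marginal likelihood of y observed at times t under a zero-mean GP."""
    K = rbf_kernel(t, t, variance, lengthscale) + noise * np.eye(len(t))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # alpha = K^{-1} y
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))                 # equals -0.5 * log det K
            - 0.5 * len(t) * np.log(2 * np.pi))


def gp_similarity(t1, y1, t2, y2, variance=1.0, lengthscale=1.0, noise=0.1):
    """Log-likelihood ratio: one shared latent function vs two independent ones.

    Positive values mean the two sparse series are better explained by a
    single underlying curve; negative values favour separate curves.
    """
    t = np.concatenate([t1, t2])
    y = np.concatenate([y1, y2])
    shared = gp_log_marginal(t, y, variance, lengthscale, noise)
    separate = (gp_log_marginal(t1, y1, variance, lengthscale, noise)
                + gp_log_marginal(t2, y2, variance, lengthscale, noise))
    return shared - separate


# Hypothetical sparse time courses sampled at irregular times.
t = np.array([0.0, 1.0, 3.0, 6.0])
y_a = np.sin(t)
y_b = np.sin(t) + 0.05          # nearly the same profile
y_c = -np.sin(t)                # opposite profile
print(gp_similarity(t, y_a, t, y_b), gp_similarity(t, y_a, t, y_c))
```

The measure is symmetric in its two arguments, so it can be dropped into distance-based clustering after a suitable monotone transformation.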

Ming DK, Myall AC, Hernandez B, Weiße AY, Peach RL, Barahona M, Rawson TM, Holmes AH, et al., 2021, Informing antimicrobial management in the context of COVID-19: understanding the longitudinal dynamics of C-reactive protein and procalcitonin, BMC Infectious Diseases, Vol: 21

Background: To characterise the longitudinal dynamics of C-reactive protein (CRP) and procalcitonin (PCT) in a cohort of hospitalised patients with COVID-19 and support antimicrobial decision-making. Methods: Longitudinal CRP and PCT concentrations and trajectories of 237 hospitalised patients with COVID-19 were modelled. The dataset comprised 2,021 data points for CRP and 284 points for PCT. Pairwise comparisons were performed between (i) those with or without significant bacterial growth from cultures, and (ii) those who survived or died in hospital. Results: CRP concentrations were higher over time in COVID-19 patients with positive microbiology (day 9: 236 vs 123 mg/L, p < 0.0001) and in those who died (day 8: 226 vs 152 mg/L, p < 0.0001), but only after day 7 of COVID-related symptom onset. Failure of CRP to decrease in the first week of hospital admission was associated with significantly higher odds of death. PCT concentrations were higher in patients with COVID-19 and positive microbiology or in those who died, although these differences were not statistically significant. Conclusions: Both the absolute CRP concentration and the trajectory during the first week of hospital admission are important factors predicting microbiology culture positivity and outcome in patients hospitalised with COVID-19. Further work is needed to describe the role of PCT for co-infection. Understanding relationships of these biomarkers can support development of risk models and inform optimal antimicrobial strategies.

Citations: 11

Journal article

Wan Y, Myall AC, Boonyasiri A, Bolt F, Ledda A, Mookerjee S, Weiße AY, Getino M, Turton JF, Abbas H, Prakapaite R, Sabnis A, Abdolrasoulia A, Malpartida-Cardenas K, Miglietta L, Donaldson H, Gilchrist M, Hopkins KL, Ellington MJ, Otter JA, Larrouy-Maumus G, Edwards AM, Rodriguez-Manzano J, Didelot X, Barahona M, Holmes AH, Jauneikaite E, Davies F, et al., 2021, Integrated analysis of patient networks and plasmid genomes reveals a regional, multi-species outbreak of carbapenemase-producing Enterobacterales carrying both blaIMP and mcr-9 genes

Background: Carbapenemase-producing Enterobacterales (CPE) are challenging in the healthcare setting, with resistance to multiple classes of antibiotics and a high associated mortality. The incidence of CPE is rising globally, despite enhanced awareness and control efforts. This study describes an investigation of the emergence of IMP-encoding CPE amongst diverse Enterobacterales species between 2016 and 2019 in patients across a London regional hospital network. Methods: We carried out a network analysis of patient pathways, using electronic health records, to identify contacts between IMP-encoding CPE-positive patients. Genomes of IMP-encoding CPE isolates were analysed and overlaid with patient contacts to imply potential transmission events. Results: Genomic analysis of 84 Enterobacterales isolates revealed diverse species (predominantly Klebsiella spp, Enterobacter spp, E. coli), of which 86% (72/84) harboured an IncHI2 plasmid, which carried both blaIMP and the mobile colistin resistance gene mcr-9 (68/72). Phylogenetic analysis of IncHI2 plasmids identified three lineages which showed significant association with patient contact and movements between four hospital sites and across medical specialities, which had been missed on initial investigations. Conclusions: Combined, our patient network and plasmid analyses demonstrate an interspecies, plasmid-med

Journal article

Wong T, Barahona M, 2021, Dimensionality reduction and data integration for scRNA-seq data based on integrative hierarchical Poisson factorisation

Single-cell RNA sequencing (scRNA-seq) data sets consist of high-dimensional, sparse and noisy feature vectors, and pose a challenge for classic methods for dimensionality reduction. Such problems are compounded when dealing with composite data sets formed by different batches. We introduce Integrative Hierarchical Poisson Factorisation (IHPF), an extension of HPF that makes use of a noise ratio hyper-parameter to tune the variability attributed to batches vs. biological sources (cell phenotypes). We exemplify the application of IHPF under different data integration scenarios with varying alignments of batches and cell diversity, and show that IHPF produces latent factors that can be advantageously applied for cell clustering and visualisation. In addition, the extracted factors have a dual block structure in both cell and gene spaces with enhanced biological interpretability.

Journal article
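IHPF itself is a hierarchical Bayesian model with a batch noise-ratio hyper-parameter, which is too much for a short example. The sketch below uses a deliberately simpler stand-in: maximum-likelihood Poisson matrix factorisation, the non-Bayesian counterpart of HPF, fitted with the standard Lee-Seung multiplicative updates. It has no priors, no batch structure and no noise ratio. The toy count matrix is synthetic.

```python
import numpy as np


def poisson_factorise(X, k, n_iter=200, seed=0, eps=1e-10):
    """Maximum-likelihood Poisson factorisation X ~ Poisson(W @ H) of a count matrix.

    X is (cells x genes). Lee-Seung multiplicative updates never increase the
    generalised KL divergence (i.e. never decrease the Poisson log-likelihood)
    and keep W and H non-negative.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + eps      # cell-by-factor loadings
    H = rng.random((k, m)) + eps      # factor-by-gene loadings
    for _ in range(n_iter):
        H *= (W.T @ (X / (W @ H + eps))) / (W.sum(axis=0)[:, None] + eps)
        W *= ((X / (W @ H + eps)) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H


def generalised_kl(X, WH, eps=1e-10):
    """Generalised KL divergence D(X || WH); lower means a better Poisson fit."""
    return np.sum(X * np.log((X + eps) / (WH + eps)) - X + WH)


# Hypothetical toy counts: 30 "cells", 12 "genes", generated from 2 factors.
rng = np.random.default_rng(1)
X = rng.poisson(rng.gamma(1.0, 1.0, (30, 2)) @ rng.gamma(1.0, 2.0, (2, 12)))
W, H = poisson_factorise(X.astype(float), k=2)
print(generalised_kl(X, W @ H))
```

The rows of `W` play the role of low-dimensional cell representations that could be clustered or visualised. Separating batch from biological variability is what IHPF adds on top.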

Mersmann S, Stromich L, Song F, Wu N, Vianello F, Barahona M, Yaliraki S, et al., 2021, ProteinLens: a web-based application for the analysis of allosteric signalling on atomistic graphs of biomolecules, Nucleic Acids Research, Vol: 49, Pages: W551-W558, ISSN: 0305-1048

The investigation of allosteric effects in biomolecular structures is of great current interest in diverse areas, from fundamental biological enquiry to drug discovery. Here we present ProteinLens, a user-friendly and interactive web application for the investigation of allosteric signalling based on atomistic graph-theoretical methods. Starting from the PDB file of a biomolecule (or a biomolecular complex), ProteinLens obtains an atomistic, energy-weighted graph description of the structure of the biomolecule, and subsequently provides a systematic analysis of allosteric signalling and communication across the structure using two computationally efficient methods: Markov Transients and bond-to-bond propensities. ProteinLens scores and ranks every bond and residue according to the speed and magnitude of the propagation of fluctuations emanating from any site of choice (e.g. the active site). The results are presented through statistical quantile scores visualised with interactive plots and adjustable 3D structure viewers, which can also be downloaded. ProteinLens thus allows the investigation of signalling in biomolecular structures of interest to aid the detection of allosteric sites and pathways. ProteinLens is implemented in Python/SQL and freely available to use at: www.proteinlens.io.

Journal article

Chrysostomou S, Roy R, Prischi F, Thamlikitkul L, Chapman KL, Mufti U, Peach R, Ding L, Hancock D, Moore C, Molina-Arcas M, Mauri F, Pinato DJ, Abrahams JM, Ottaviani S, Castellano L, Giamas G, Pascoe J, Moonamale D, Pirrie S, Gaunt C, Billingham L, Steven NM, Cullen M, Hrouda D, Winkler M, Post J, Cohen P, Salpeter SJ, Bar V, Zundelevich A, Golan S, Leibovici D, Lara R, Klug DR, Yaliraki SN, Barahona M, Wang Y, Downward J, Skehel JM, Ali MMU, Seckl MJ, Pardo OE, et al., 2021, Repurposed floxacins targeting RSK4 prevent chemoresistance and metastasis in lung and bladder cancer, Science Translational Medicine, Vol: 13, ISSN: 1946-6234

Lung and bladder cancers are mostly incurable because of the early development of drug resistance and metastatic dissemination. Hence, improved therapies that tackle these two processes are urgently needed to improve clinical outcome. We have identified RSK4 as a promoter of drug resistance and metastasis in lung and bladder cancer cells. Silencing this kinase, through either RNA interference or CRISPR, sensitized tumor cells to chemotherapy and hindered metastasis in vitro and in vivo in a tail vein injection model. Drug screening revealed several floxacin antibiotics as potent RSK4 activation inhibitors, and trovafloxacin reproduced all effects of RSK4 silencing in vitro and in/ex vivo using lung cancer xenograft and genetically engineered mouse models and bladder tumor explants. Through x-ray structure determination and Markov transient and deuterium exchange analyses, we identified the allosteric binding site and revealed how this compound blocks RSK4 kinase activation through binding to an allosteric site and mimicking a kinase autoinhibitory mechanism involving RSK4's hydrophobic motif. Last, we show that patients undergoing chemotherapy and adhering to prophylactic levofloxacin in the large placebo-controlled randomized phase 3 SIGNIFICANT trial had significantly increased (P = 0.048) long-term overall survival times. Hence, we suggest that RSK4 inhibition may represent an effective therapeutic strategy for treating lung and bladder cancer.

Citations: 10

Journal article

Laumann F, von Kuegelgen J, Barahona M, 2021, Kernel two-sample and independence tests for non-stationary random processes, ITISE 2021 (7th International conference on Time Series and Forecasting), Publisher: https://www.mdpi.com/2673-4591/5/1/31, Pages: 1-13

Two-sample and independence tests with the kernel-based MMD and HSIC have shown remarkable results on i.i.d. data and stationary random processes. However, these statistics are not directly applicable to non-stationary random processes, a prevalent form of data in many scientific disciplines. In this work, we extend the application of MMD and HSIC to non-stationary settings by assuming access to independent realisations of the underlying random process. These realisations (in the form of non-stationary time-series measured on the same temporal grid) can then be viewed as i.i.d. samples from a multivariate probability distribution, to which MMD and HSIC can be applied. We further show how to choose suitable kernels over these high-dimensional spaces by maximising the estimated test power with respect to the kernel hyper-parameters. In experiments on synthetic data, we demonstrate superior performance of our proposed approaches in terms of test power when compared to current state-of-the-art functional or multivariate two-sample and independence tests. Finally, we employ our methods on a real socio-economic dataset as an example application.

Conference paper
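The key step in the abstract treats each whole time series as one i.i.d. sample in R^T, given independent realisations recorded on the same temporal grid. A standard kernel two-sample test then applies. Below is a minimal sketch of the MMD half of that idea (HSIC is not shown). It uses the unbiased MMD^2 estimator with a Gaussian kernel and a permutation null. The paper selects kernel hyper-parameters by maximising estimated test power; this sketch substitutes the simpler median heuristic. The data are synthetic.

```python
import numpy as np


def sq_dists(A, B):
    """Pairwise squared Euclidean distances between rows of A and rows of B."""
    d = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.maximum(d, 0.0)


def mmd2_unbiased(K, n):
    """Unbiased MMD^2 from a joint Gram matrix whose first n rows are X, the rest Y."""
    m = K.shape[0] - n
    Kxx, Kyy, Kxy = K[:n, :n], K[n:, n:], K[:n, n:]
    sxx = (Kxx.sum() - np.trace(Kxx)) / (n * (n - 1))
    syy = (Kyy.sum() - np.trace(Kyy)) / (m * (m - 1))
    return sxx + syy - 2 * Kxy.mean()


def mmd_permutation_test(X, Y, n_perm=500, seed=0):
    """Kernel two-sample test where each row is one realisation on a shared time grid.

    Each length-T time series is treated as a single point in R^T, so the
    realisations are i.i.d. samples of a multivariate distribution.
    """
    rng = np.random.default_rng(seed)
    Z = np.vstack([X, Y])
    n = len(X)
    D2 = sq_dists(Z, Z)
    # Median heuristic bandwidth (a simple default, not test-power maximisation).
    bandwidth2 = np.median(D2[np.triu_indices_from(D2, k=1)])
    K = np.exp(-D2 / (2 * bandwidth2))
    observed = mmd2_unbiased(K, n)
    null = np.empty(n_perm)
    for b in range(n_perm):
        idx = rng.permutation(len(Z))       # shuffle sample labels under H0
        null[b] = mmd2_unbiased(K[np.ix_(idx, idx)], n)
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, p_value


# Hypothetical non-stationary realisations on a common grid of 20 time points.
rng = np.random.default_rng(0)
grid = np.linspace(0, 1, 20)
X = np.sin(2 * np.pi * grid) + 0.3 * rng.standard_normal((40, 20))
Y = np.sin(2 * np.pi * grid) + 2 * grid + 0.3 * rng.standard_normal((40, 20))
stat, p = mmd_permutation_test(X, Y)
print(stat, p)
```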

Clarke JM, Beaney T, Majeed A, Darzi A, Barahona M, et al., 2021, Defining Integrated Care Systems Through Patient Data From Referral Networks in the English National Health Service: A Graph-Based Clustering Study.

Background: Integrated Care Systems (ICSs) are being introduced into the National Health Service (NHS) in England to replace Sustainability and Transformation Partnerships (STPs). They aim to improve care through place-based collaboration between primary, secondary and community providers. It is important that new organisational configurations adequately reflect existing patterns of patient care to minimise disruption resulting from patients crossing between ICSs. Methods: All planned outpatient hospital clinic appointments from 1st April 2017 to 31st March 2018 for patients resident in England to NHS hospitals in England were identified from Hospital Episode Statistics. Markov Multiscale Community Detection (MMCD), an unsupervised network clustering technique, was used to identify natural communities of GP practices, hospitals and geographic regions according to patterns of GP practice registration and outpatient clinic referral. Two primary measures of care coverage were calculated: the proportion of patients registered to a GP practice in a different community from the one in which they reside, and the proportion of outpatient clinic appointments to hospitals in a different community from the referring GP practice. Results: 109,830,647 outpatient clinic appointments were identified for 20,992,695 patients. A configuration of 42 ICSs was identified from MMCD to match the 42 STPs of the current configuration. In the current STP configuration, 534,946 patients (2.6%) were registered to a GP practice in a different STP from their residence, compared to 334,192 (1.6%) in the optimal MMCD configuration. 16,110,267 hospital clinic appointments (14.7%) occurred in a different STP to the referring GP practice, compared to 11,518,735 (10.5%) in the MMCD c

Journal article
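MMCD builds on the Markov Stability framework for multiscale community detection. Below is a minimal sketch of its quality function for an undirected graph. It assumes the continuous-time formulation with the random-walk Laplacian, which is one of several standard variants. A partition scores highly at Markov time t if a random walker that starts in a community is still likely to be in it after time t. The full method also optimises this score over partitions (for example with the Louvain heuristic) and scans t to find robust scales; neither step is shown. The six-node graph is hypothetical, and the study's referral network is constructed differently.

```python
import numpy as np
from scipy.linalg import expm


def markov_stability(A, partition, t):
    """Continuous-time Markov Stability r(t, H) of a partition of an undirected graph.

    r(t, H) = trace(H^T [Pi exp(-t L) - pi pi^T] H), with the random-walk
    Laplacian L = I - D^{-1} A, stationary distribution pi = d / sum(d),
    Pi = diag(pi), and H the node-by-community indicator matrix.
    """
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)
    pi = d / d.sum()
    L = np.eye(len(A)) - A / d[:, None]
    P_t = expm(-t * L)                              # transition probabilities after time t
    labels = np.asarray(partition)
    H = (labels[:, None] == np.unique(labels)[None, :]).astype(float)
    R = np.diag(pi) @ P_t - np.outer(pi, pi)        # autocovariance of the walk
    return float(np.trace(H.T @ R @ H))


# Hypothetical graph: two triangles (nodes 0-2 and 3-5) joined by the edge 2-3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1
print(markov_stability(A, [0, 0, 0, 1, 1, 1], t=1.0),
      markov_stability(A, [0, 1, 0, 1, 0, 1], t=1.0))
```

The partition into the two triangles should score higher than the alternating one, because a walker rarely crosses the single bridging edge within one unit of Markov time.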

Myall AC, Peach RL, Weiße AY, Davies F, Mookerjee S, Holmes A, Barahona M, et al., 2021, Network memory in the movement of hospital patients carrying drug-resistant bacteria, Applied Network Science, Vol: 6, ISSN: 2364-8228

Hospitals constitute highly interconnected systems that bring into contact an abundance of infectious pathogens and susceptible individuals, thus making infection outbreaks both common and challenging. In recent years, there has been a sharp rise in the incidence of antimicrobial resistance amongst healthcare-associated infections, a situation now considered endemic in many countries. Here we present network-based analyses of a data set capturing the movement of patients harbouring drug-resistant bacteria across three large London hospitals. We show that there are substantial memory effects in the movement of hospital patients colonised with drug-resistant bacteria. Such memory effects break first-order Markovian transitive assumptions and substantially alter the conclusions from the analysis, specifically on node rankings and the evolution of diffusive processes. We capture variable-length memory effects by constructing a lumped-state memory network, which we then use to identify overlapping communities of wards. We find that these communities of wards display a quasi-hierarchical structure at different levels of granularity, which is consistent with different aspects of patient flows related to hospital locations and medical specialties.

Journal article
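The memory effect described above can be made concrete. Compare a first-order Markov model of ward transitions with one that also conditions on the previous ward. The sketch below is a fixed second-order model, assumed here for illustration. The paper's lumped-state memory network handles variable-length memory and is not reproduced. Ward names and pathways are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probs(trajectories, order=1):
    """Estimate P(next ward | previous `order` wards) from ward sequences.

    order=1 gives the usual first-order (memoryless) Markov model; order=2 gives
    a second-order model whose states are (previous ward, current ward) pairs.
    """
    counts = defaultdict(Counter)
    for path in trajectories:
        for i in range(order, len(path)):
            state = tuple(path[i - order:i])
            counts[state][path[i]] += 1
    probs = {}
    for state, nxt in counts.items():
        total = sum(nxt.values())
        probs[state] = {ward: c / total for ward, c in nxt.items()}
    return probs


# Hypothetical patient pathways: patients reaching ward B from A move on to C,
# while those reaching B from D move on to E.
trajectories = [["A", "B", "C"]] * 5 + [["D", "B", "E"]] * 5
first = transition_probs(trajectories, order=1)
second = transition_probs(trajectories, order=2)
print(first[("B",)])        # memoryless view: C and E equally likely from B
print(second[("A", "B")])   # with memory: patients from A always go on to C
```

In the first-order model, patients arriving at B from A appear to have a 50% chance of moving on to E, a path none of them took. This is the kind of spurious transitivity that a memory network avoids.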

Saavedra-Garcia P, Roman-Trufero M, Al-Sadah HA, Blighe K, Lopez-Jimenez E, Christoforou M, Penfold L, Capece D, Xiong X, Miao Y, Parzych K, Caputo V, Siskos AP, Encheva V, Liu Z, Thiel D, Kaiser MF, Piazza P, Chaidos A, Karadimitris A, Franzoso G, Snijder AP, Keun HC, Oyarzún DA, Barahona M, Auner H, et al., 2021, Systems level profiling of chemotherapy-induced stress resolution in cancer cells reveals druggable trade-offs, Proceedings of the National Academy of Sciences of USA, Vol: 118, ISSN: 0027-8424

Cancer cells can survive chemotherapy-induced stress, but how they recover from it is not known. Using a temporal multiomics approach, we delineate the global mechanisms of proteotoxic stress resolution in multiple myeloma cells recovering from proteasome inhibition. Our observations define layered and protracted programmes for stress resolution that encompass extensive changes across the transcriptome, proteome, and metabolome. Cellular recovery from proteasome inhibition involved protracted and dynamic changes of glucose and lipid metabolism and suppression of mitochondrial function. We demonstrate that recovering cells are more vulnerable to specific insults than acutely stressed cells and identify the general control nonderepressible 2 (GCN2)-driven cellular response to amino acid scarcity as a key recovery-associated vulnerability. Using a transcriptome analysis pipeline, we further show that GCN2 is also a stress-independent bona fide target in transcriptional signature-defined subsets of solid cancers that share molecular characteristics. Thus, identifying cellular trade-offs tied to the resolution of chemotherapy-induced stress in tumour cells may reveal new therapeutic targets and routes for cancer therapy optimisation.

Journal article

Peach RL, Arnaudon A, Schmidt JA, Palasciano HA, Bernier NR, Jelfs KE, Yaliraki SN, Barahona M, et al., 2021, HCGA: Highly comparative graph analysis for network phenotyping, Patterns, Vol: 2, Pages: 100227-100227, ISSN: 2666-3899

Networks are widely used as mathematical models of complex systems across many scientific disciplines, not only in biology and medicine but also in the social sciences, physics, computing and engineering. Decades of work have produced a vast corpus of research characterising the topological, combinatorial, statistical and spectral properties of graphs. Each graph property can be thought of as a feature that captures important (and sometimes overlapping) characteristics of a network. In the analysis of real-world graphs, it is crucial to integrate systematically a large number of diverse graph features in order to characterise and classify networks, as well as to aid network-based scientific discovery. In this paper, we introduce HCGA, a framework for highly comparative analysis of graph data sets that computes several thousands of graph features from any given network. HCGA also offers a suite of statistical learning and data analysis tools for automated identification and selection of important and interpretable features underpinning the characterisation of graph data sets. We show that HCGA outperforms other methodologies on supervised classification tasks on benchmark data sets whilst retaining the interpretability of network features. We also illustrate how HCGA can be used for network-based discovery through two examples where data is naturally represented as graphs: the clustering of a data set of images of neuronal morphologies, and a regression problem to predict charge transfer in organic semiconductors based on their structure. HCGA is an open platform that can be expanded to include further graph properties and statistical learning tools to allow researchers to leverage the wide breadth of graph-theoretical research to quantitatively analyse and draw insights from network data.

Journal article
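HCGA computes several thousand features per graph, but the underlying idea can be shown at small scale. Below is a minimal sketch that uses only NumPy and not HCGA's own code or API. It turns each graph into a fixed-length vector of a few topological and spectral features, so that a collection of graphs becomes an ordinary feature matrix for statistical learning. The particular features chosen here are an assumption for illustration.

```python
import numpy as np


def graph_features(A):
    """A few classic features of an undirected, unweighted graph from its adjacency matrix."""
    A = np.asarray(A, dtype=float)
    n = len(A)
    deg = A.sum(axis=1)
    m = deg.sum() / 2
    triangles = np.trace(A @ A @ A) / 6          # each triangle is counted 6 times
    triples = np.sum(deg * (deg - 1)) / 2        # connected triples (paths of length 2)
    return {
        "n_nodes": n,
        "n_edges": m,
        "density": 2 * m / (n * (n - 1)),
        "mean_degree": deg.mean(),
        "max_degree": deg.max(),
        "n_triangles": triangles,
        "transitivity": 3 * triangles / triples if triples else 0.0,
        "spectral_radius": np.max(np.abs(np.linalg.eigvalsh(A))),
    }


# Build a feature matrix (rows: graphs, columns: features) for hypothetical graphs.
K4 = np.ones((4, 4)) - np.eye(4)                        # complete graph on 4 nodes
P4 = np.diag(np.ones(3), 1) + np.diag(np.ones(3), -1)   # path on 4 nodes
names = sorted(graph_features(K4))
F = np.array([[graph_features(G)[k] for k in names] for G in (K4, P4)])
print(names)
print(F)
```

A matrix like `F` can be passed to any standard classifier, and the column names keep the learned model interpretable in terms of graph properties.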

Myall A, Peach RL, Wan Y, Mookerjee S, Jauneikaite E, Bolt F, Price J, Davies F, Weiße AY, Holmes A, Barahona M, et al., 2021, Characterising contact in disease outbreaks via a network model of spatial-temporal proximity

Contact tracing is a key tool in epidemiology to identify and control outbreaks of infectious diseases. Existing contact tracing methodologies produce contact maps of individuals based on a binary definition of contact which can be hampered by missing data and indirect contacts. Here, we present a Spatial-temporal Epidemiological Proximity (StEP) model to recover contact maps in disease outbreaks based on movement data. The StEP model accounts for imperfect data by considering probabilistic contacts between individuals based on spatial-temporal proximity of their movement trajectories, creating a robust movement network despite possible missing data and unseen transmission routes. Using real-world data we showcase the potential of StEP for contact tracing with outbreaks of multidrug-resistant bacteria and COVID-19 in a large hospital group in London, UK. In addition to the core structure of contacts that can be recovered using traditional methods of contact tracing, the StEP model reveals missing contacts that connect seemingly separate outbreaks. Comparison with genomic data further confirmed that these recovered contacts indeed improve characterisation of disease transmission and so highlights how the StEP framework can inform effective strategies of infection control and prevention.

Journal article

Qian Y, Expert P, Panzarasa P, Barahona M, et al., 2021, Geometric graphs from data to aid classification tasks with Graph Convolutional Networks, Patterns, Vol: 2, Pages: 100237-100237, ISSN: 2666-3899

Journal article

Maes A, Barahona M, Clopath C, 2021, Learning compositional sequences with multiple time scales through a hierarchical network of spiking neurons, PLoS Computational Biology, Vol: 17, ISSN: 1553-734X

Sequential behaviour is often compositional and organised across multiple time scales: a set of individual elements developing on short time scales (motifs) are combined to form longer functional sequences (syntax). Such organisation leads to a natural hierarchy that can be used advantageously for learning, since the motifs and the syntax can be acquired independently. Despite mounting experimental evidence for hierarchical structures in neuroscience, models for temporal learning based on neuronal networks have mostly focused on serial methods. Here, we introduce a network model of spiking neurons with a hierarchical organisation aimed at sequence learning on multiple time scales. Using biophysically motivated neuron dynamics and local plasticity rules, the model can learn motifs and syntax independently. Furthermore, the model can relearn sequences efficiently and store multiple sequences. Compared to serial learning, the hierarchical model displays faster learning, more flexible relearning, increased capacity, and higher robustness to perturbations. The hierarchical model redistributes the variability: it achieves high motif fidelity at the cost of higher variability in the between-motif timings.

Journal article

Kuntz Nussio J, Thomas P, Stan G, Barahona M, et al., 2021, Approximations of countably-infinite linear programs over bounded measure spaces, SIAM Journal on Optimization, Vol: 31, Pages: 604-625, ISSN: 1052-6234

We study a class of countably-infinite-dimensional linear programs (CILPs) whose feasible sets are bounded subsets of appropriately defined spaces of measures. The optimal value, optimal points, and minimal points of these CILPs can be approximated by solving finite-dimensional linear programs. We show how to construct finite-dimensional programs that lead to approximations with easy-to-evaluate error bounds, and we prove that the errors converge to zero as the size of the finite-dimensional programs approaches that of the original problem. We discuss the use of our methods in the computation of the stationary distributions, occupation measures, and exit distributions of Markov chains.

Journal article
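The abstract lists stationary distributions of Markov chains as one application. The sketch below shows only the simplest form of the underlying idea: replacing a countably infinite state space by a finite one. It applies a naive truncation to a birth-death chain (an M/M/1-type queue), solves the result as a finite linear system, and compares it with the known geometric stationary law. It does not implement the paper's finite-dimensional linear programs or their error bounds. The rates and truncation level are arbitrary choices.

```python
import numpy as np


def truncated_stationary(lam, mu, N):
    """Stationary distribution of a birth-death chain restricted to states 0..N.

    Arrivals (rate lam) that would leave state N are blocked; departures occur
    at rate mu. The infinite chain is approximated by this finite one.
    """
    Q = np.zeros((N + 1, N + 1))                 # generator matrix
    for n in range(N + 1):
        if n < N:
            Q[n, n + 1] = lam
        if n > 0:
            Q[n, n - 1] = mu
        Q[n, n] = -Q[n].sum()                    # rows of a generator sum to zero
    # Solve pi Q = 0 together with sum(pi) = 1 as one stacked linear system.
    M = np.vstack([Q.T, np.ones(N + 1)])
    b = np.zeros(N + 2)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(M, b, rcond=None)
    return pi


lam, mu, N = 1.0, 2.0, 30
rho = lam / mu
pi = truncated_stationary(lam, mu, N)
exact = (1 - rho) * rho ** np.arange(N + 1)   # geometric law of the infinite chain
print(np.max(np.abs(pi - exact)))
```

Here the truncation error shrinks like rho^(N+1). Controlling such errors with computable bounds for general chains is what the finite-dimensional programs in the paper address.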

Peach R, Greenbury S, Johnston I, Yaliraki S, Lefevre D, Barahona M, et al., 2021, Understanding learner behaviour in online courses with Bayesian modelling and time series characterisation, Scientific Reports, Vol: 11, ISSN: 2045-2322

The intrinsic temporality of learning demands the adoption of methodologies capable of exploiting time-series information. In this study we leverage the sequence data framework and show how data-driven analysis of temporal sequences of task completion in online courses can be used to characterise personal and group learners’ behaviours, and to identify critical tasks and course sessions in a given course design. We also introduce a recently developed probabilistic Bayesian model to learn sequential behaviours of students and predict student performance. The application of our data-driven sequence-based analyses to data from learners undertaking an online Business Management course reveals distinct behaviours within the cohort of learners, identifying learners or groups of learners that deviate from the nominal order expected in the course. Using course grades a posteriori, we explore differences in behaviour between high and low performing learners. We find that high performing learners follow the progression between weekly sessions more regularly than low performing learners, yet within each weekly session high performing learners are less tied to the nominal task order. We then model the sequences of high and low performance students using the probabilistic Bayesian model and show that we can learn engagement behaviours associated with performance. We also show that the data sequence framework can be used for task-centric analysis; we identify critical junctures and differences among types of tasks within the course design. We find that non-rote learning tasks, such as interactive tasks or discussion posts, are correlated with higher performance. We discuss the application of such analytical techniques as an aid to course design, intervention, and student supervision.

Journal article

Dusad V, Thiel D, Barahona M, Keun H, Oyarzun D, et al., 2021, Opportunities at the interface of network science and metabolic modelling, Frontiers in Bioengineering and Biotechnology, Vol: 8, ISSN: 2296-4185

Metabolism plays a central role in cell physiology because it provides the molecular machinery for growth. At the genome scale, metabolism is made up of thousands of reactions interacting with one another. Untangling this complexity is key to understanding how cells respond to genetic, environmental, or therapeutic perturbations. Here we discuss the roles of two complementary strategies for the analysis of genome-scale metabolic models: Flux Balance Analysis (FBA) and network science. While FBA estimates metabolic flux on the basis of an optimization principle, network approaches reveal emergent properties of the global metabolic connectivity. We highlight how the integration of both approaches promises to deliver insights on the structure and function of metabolic systems with wide-ranging implications in discovery science, precision medicine and industrial biotechnology.

Journal article

Altuncu T, Yaliraki S, Barahona M, 2021, Graph-based topic extraction from vector embeddings of text documents: application to a corpus of news articles, Complex Networks & Their Applications IX, Editors: Benito, Cherifi, Cherifi, Moro, Rocha, Sales-Pardo, Publisher: Springer International Publishing, Pages: 154-166, ISBN: 978-3-030-65351-4

Production of news content is growing at an astonishing rate. To help manage and monitor the sheer amount of text, there is an increasing need to develop efficient methods that can provide insights into emerging content areas, and stratify unstructured corpora of text into ‘topics’ that stem intrinsically from content similarity. Here we present an unsupervised framework that brings together powerful vector embeddings from natural language processing with tools from multiscale graph partitioning that can reveal natural partitions at different resolutions without making a priori assumptions about the number of clusters in the corpus. We show the advantages of graph-based clustering through end-to-end comparisons with other popular clustering and topic modelling methods, and also evaluate different text vector embeddings, from classic Bag-of-Words to Doc2Vec to the recent transformer-based model BERT. This comparative work is showcased through an analysis of a corpus of US news coverage during the presidential election year of 2016.

Book chapter

Schreglmann SR, Wang D, Peach RL, Li J, Zhang X, Latorre A, Rhodes E, Panella E, Cassara AM, Boyden ES, Barahona M, Santaniello S, Rothwell J, Bhatia KP, Grossman N et al., 2021, Non-invasive suppression of essential tremor via phase-locked disruption of its temporal coherence, Nature Communications, Vol: 12

Journal article

Price JR, Mookerjee S, Dyakova E, Myall A, Leung W, Weiße AY, Shersing Y, Brannigan ET, Galletly T, Muir D, Randell P, Davies F, Bolt F, Barahona M, Otter JA, Holmes AH et al., 2021, Development and delivery of a real-time hospital-onset COVID-19 surveillance system using network analysis, Clinical Infectious Diseases, Vol: 72, Pages: 82-89, ISSN: 1058-4838

Background: Understanding nosocomial acquisition, outbreaks and transmission chains in real time will be fundamental to ensuring infection prevention measures are effective in controlling COVID-19 in healthcare. We report the design and implementation of a hospital-onset COVID-19 infection (HOCI) surveillance system for an acute healthcare setting to target prevention interventions.
Methods: The study took place in a large teaching hospital group in London, UK. All patients tested for SARS-CoV-2 between 4th March and 14th April 2020 were included. Utilising data routinely collected through electronic healthcare systems, we developed a novel surveillance system for determining and reporting HOCI incidence and providing real-time network analysis. We provided daily reports on incidence and trends over time to support HOCI investigation, and generated geo-temporal reports using network analysis to interrogate admission pathways for common epidemiological links to infer transmission chains. By working with stakeholders, the reports were co-designed for end users.
Results: Real-time surveillance reports revealed: changing rates of HOCI throughout the course of the COVID-19 epidemic; key wards fuelling probable transmission events; HOCIs over-represented in particular specialities managing high-risk patients; the importance of integrating analysis of individual prior pathways; and the value of co-design in producing data visualisation. Our surveillance system can effectively support national surveillance.
Conclusions: Through early analysis of the novel surveillance system we have provided a description of HOCI rates and trends over time using real-time shifting denominator data. We demonstrate the importance of including the analysis of patient pathways and networks in characterising risk of transmission and targeting infection control interventions.

Journal article

Kuntz J, Thomas P, Stan G-B, Barahona M et al., 2021, Stationary Distributions of Continuous-Time Markov Chains: A Review of Theory and Truncation-Based Approximations, SIAM Review, Vol: 63, Pages: 3-64, ISSN: 0036-1445

Journal article

Clarke J, Murray A, Markar S, Barahona M, Kinross J et al., 2020, A new geographic model of care to manage the post-COVID-19 elective surgery aftershock in England: a retrospective observational study, BMJ Open, Vol: 10, Pages: 1-9, ISSN: 2044-6055

Objectives: The suspension of elective surgery during the COVID-19 pandemic is unprecedented and has resulted in record volumes of patients waiting for operations. Novel approaches that maximise capacity and efficiency of surgical care are urgently required. This study applies Markov Multiscale Community Detection (MMCD), an unsupervised graph-based clustering framework, to identify new surgical care models based on pooled waiting lists delivered across an expanded network of surgical providers.
Design: Retrospective observational study using Hospital Episode Statistics.
Setting: Public and private hospitals providing surgical care to National Health Service (NHS) patients in England.
Participants: All adult patients resident in England undergoing NHS-funded planned surgical procedures between 1st April 2017 and 31st March 2018.
Main outcome measures: The identification of the most common planned surgical procedures in England (High Volume Procedures, HVP) and the proportion of low, medium and high-risk patients undergoing each HVP. The mapping of hospitals providing surgical care onto optimised groupings based on patient usage data.
Results: A total of 7,811,891 planned operations were identified in 4,284,925 adults during the one-year period of our study. The 28 most common surgical procedures accounted for a combined 3,907,474 operations (50.0% of the total). 2,412,613 (61.7%) of these most common procedures involved ‘low risk’ patients. Patients travelled an average of 11.3 km for these procedures. Based on the data, MMCD partitioned England into 45, 16 and 7 mutually exclusive and collectively exhaustive natural surgical communities of increasing coarseness. The coarser partitions into 16 and 7 surgical communities were shown to be associated with balanced supply and demand for surgical care within communities.
Conclusions: Pooled waiting lists for low risk elective procedures and patients across integrated, expanded natural surgical community networks have the pot

Journal article

Clarke J, Beaney T, Majeed A, Darzi A, Barahona M et al., 2020, Identifying Naturally Occurring Communities of Primary Care Providers in the English National Health Service in London, AcademyHealth Annual Research Meeting (ARM), Publisher: WILEY, Pages: 107-108, ISSN: 0017-9124

Conference paper

Clarke J, Beaney T, Majeed A, Darzi A, Barahona M et al., 2020, Methods Research: Identifying Naturally Occurring Communities of Primary Care Providers in the English National Health Service in London, Health Services Research, Vol: 55, Pages: 107-108, ISSN: 0017-9124

Research Objective: Primary care networks (PCNs) are a new organizational hierarchy with wide-ranging responsibilities introduced in the NHS Long Term Plan. The vision is that they represent “natural” communities of primary care practices (PCPs) with boundaries that make sense to practices, other health care providers, and local communities. Our study aims to identify natural communities of PCPs based on patient registration patterns, using network analysis methods and unsupervised clustering to create catchments for these communities.
Study Design: We used a series of novel methods for unsupervised graph clustering. A cosine similarity matrix was constructed representing the similarity between each pair of PCPs, based on the registration of patients in each Lower Super Output Area (LSOA), a geographic division similar to census block groups. Unsupervised graph partitioning using Markov multiscale community detection was conducted to identify communities of PCPs. Catchment areas for each PCN were assigned based on majority attendance from each LSOA.
Population Studied: Patients resident in and attending PCPs in London, identified from Hospital Episode Statistics from 2017 to 2018.
Principal Findings: 3,428,322 unique patients attended 1,334 GPs in 4,835 LSOAs in London. Our model grouped 1,291 PCPs (96.8%) and 4,721 LSOAs (97.6%) into 165 mutually exclusive PCNs. The median PCN list size was 53,490, with a lower quartile of 38,079 patients and an upper quartile of 72,982 patients. A median of 70.1% of patients attended a GP within their allocated PCN, ranging from 44.6% to 91.4%.
Conclusions: With PCNs expected to take a role in population health management and with community providers expected to reconfigure around them, it is vital we recognize how PCNs represent their communities. We find that stable, representati


Journal article

Clarke J, Beaney T, Majeed A, Darzi A, Barahona M et al., 2020, Identifying naturally occurring communities of primary care providers in the English National Health Service in London, BMJ Open, Vol: 10, Pages: 1-7, ISSN: 2044-6055

Objectives - Primary Care Networks (PCNs) are a new organisational hierarchy with wide-ranging responsibilities introduced in the National Health Service (NHS) Long Term Plan. The vision is that they represent ‘natural’ communities of general practices (GP practices) working together at scale and covering a geography that makes sense to practices, other healthcare providers and local communities. Our study aims to identify natural communities of GP practices based on patient registration patterns, using Markov Multiscale Community Detection, an unsupervised network-based clustering technique, to create catchments for these communities.
Design - Retrospective observational study using Hospital Episode Statistics: patient-level administrative records of inpatient, outpatient and emergency department attendances to hospital.
Setting - General practices in the 32 Clinical Commissioning Groups of Greater London.
Participants - All adult patients resident in and registered to GP practices in Greater London who had one or more outpatient encounters at NHS hospital trusts between 1st April 2017 and 31st March 2018.
Main outcome measures - The allocation of GP practices in Greater London to PCNs based on the registrations of patients resident in each Lower Super Output Area (LSOA) of Greater London. The population size and coverage of each proposed PCN.
Results - 3,428,322 unique patients attended 1,334 GPs in 4,835 LSOAs in Greater London. Our model grouped 1,291 GPs (96.8%) and 4,721 LSOAs (97.6%) into 165 mutually exclusive PCNs. The median PCN list size was 53,490, with a lower quartile of 38,079 patients and an upper quartile of 72,982 patients. A median of 70.1% of patients attended a GP within their allocated PCN, ranging from 44.6% to 91.4%.
Conclusions - With PCNs expected to take a role in population health management and with community providers expected to reconfigure around them, it is vital we recognise how PCNs represent their communities. O

Journal article

Arnaudon A, Peach R, Barahona M, 2020, Scale-dependent measure of network centrality from diffusion dynamics, Physical Review Research, Vol: 2, ISSN: 2643-1564

Classic measures of graph centrality capture distinct aspects of node importance, from the local (e.g., degree) to the global (e.g., closeness). Here we exploit the connection between diffusion and geometry to introduce a multiscale centrality measure. A node is defined to be central if it breaks the metricity of the diffusion as a consequence of the effective boundaries and inhomogeneities in the graph. Our measure is naturally multiscale, as it is computed relative to graph neighbourhoods within the varying time horizon of the diffusion. We find that the centrality of nodes can differ widely at different scales. In particular, our measure correlates with degree (i.e., hubs) at small scales and with closeness (i.e., bridges) at large scales, and also reveals the existence of multi-centric structures in complex networks. By examining centrality across scales, our measure thus provides an evaluation of node importance relative to local and global processes on the network.

Journal article

This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.
