Beaney T, Clarke J, Woodcock T, et al., 2021, Patterns of healthcare utilisation in children and young people: a retrospective cohort study using routinely collected healthcare data in Northwest London, BMJ Open, Vol: 11, Pages: 1-14, ISSN: 2044-6055
Objectives: With a growing role for health services in managing population health, there is a need for early identification of populations with high need. Segmentation approaches partition the population based on demographics, long-term conditions (LTCs) or healthcare utilisation but have mostly been applied to adults. Our study uses segmentation methods to distinguish patterns of healthcare utilisation in children and young people (CYP) and to explore predictors of segment membership. Design: Retrospective cohort study. Setting: Routinely collected primary and secondary healthcare data in Northwest London from the Discover database. Participants: 378,309 CYP aged 0-15 years registered to a general practice in Northwest London with one full year of follow-up. Primary and secondary outcome measures: Assignment of each participant to a segment defined by seven healthcare variables representing primary and secondary care attendances, and description of utilisation patterns by segment. Predictors of segment membership described by age, sex, ethnicity, deprivation and LTCs. Results: Participants were grouped into six segments based on healthcare utilisation. Three segments predominantly used primary care; two moderate utilisation segments differed in use of emergency or elective care, and a high utilisation segment, representing 16,632 (4.4%) children, accounted for the highest mean presentations across all service types. The two smallest segments, representing 13.3% of the population, accounted for 62.5% of total costs. Younger age, residence in areas of higher deprivation, and presence of one or more LTCs were associated with membership of higher utilisation segments, but 75.0% of those in the highest utilisation segment had no LTC. Conclusions: This article identifies six segments of healthcare utilisation in CYP and predictors of segment membership. Demographics and LTCs may not explain utilisation patterns as strongly as in adults which may limit the use of routine data in predicting ut
Liu Z, Peach R, Lawrance E, et al., 2021, Listening to mental health crisis needs at scale: using Natural Language Processing to understand and evaluate a mental health crisis text messaging service, Frontiers in Digital Health, Vol: 3, Pages: 1-14, ISSN: 2673-253X
The current mental health crisis is a growing public health issue requiring a large-scale response that cannot be met with traditional services alone. Digital support tools are proliferating, yet most are not systematically evaluated, and we know little about their users and their needs. Shout is a free mental health text messaging service run by the charity Mental Health Innovations, which provides support for individuals in the UK experiencing mental or emotional distress and seeking help. Here we study a large data set of anonymised text message conversations and post-conversation surveys compiled through Shout. This data provides an opportunity to hear at scale from those experiencing distress; to better understand mental health needs for people not using traditional mental health services; and to evaluate the impact of a novel form of crisis support. We use natural language processing (NLP) to assess the adherence of volunteers to conversation techniques and formats, and to gain insight into demographic user groups and their behavioural expressions of distress. Our textual analyses achieve accurate classification of conversation stages (weighted accuracy = 88%), behaviours (1-hamming loss = 95%) and texter demographics (weighted accuracy = 96%), exemplifying how the application of NLP to frontline mental health data sets can aid with post-hoc analysis and evaluation of quality of service provision in digital mental health services.
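The behaviour score quoted in the abstract above is reported as 1 minus the Hamming loss, a standard multi-label metric. As a rough illustration only, the sketch below computes it on a few made-up label vectors; the data and the function name `hamming_loss` are hypothetical and are not taken from the study.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of individual label slots that are predicted incorrectly."""
    total = 0
    wrong = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            total += 1
            wrong += int(t != p)
    return wrong / total


# Hypothetical multi-label annotations: one row per message, one column per behaviour.
y_true = [[1, 0, 1, 0], [0, 1, 0, 0], [1, 1, 0, 1]]
y_pred = [[1, 0, 0, 0], [0, 1, 0, 0], [1, 1, 0, 1]]

# One wrong slot out of twelve, so the score is 11/12.
score = 1 - hamming_loss(y_true, y_pred)
print(round(score, 4))
```

Unlike exact-match accuracy, this score gives partial credit when only some of the labels attached to a message are wrong.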
Liu Z, Barahona M, 2021, Similarity measure for sparse time course data based on Gaussian processes, Uncertainty in Artificial Intelligence 2021, Publisher: PMLR, Pages: 1332-1341
We propose a similarity measure for sparsely sampled time course data in the form of a log-likelihood ratio of Gaussian processes (GP). The proposed GP similarity is similar to a Bayes factor and provides enhanced robustness to noise in sparse time series, such as those found in various biological settings, e.g., gene transcriptomics. We show that the GP measure is equivalent to the Euclidean distance when the noise variance in the GP is negligible compared to the noise variance of the signal. Our numerical experiments on both synthetic and real data show improved performance of the GP similarity when used in conjunction with two distance-based clustering methods.
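One way to picture a log-likelihood ratio of Gaussian processes is sketched below. It is a minimal illustration under stated assumptions, not the authors' definition: the squared-exponential kernel, the fixed hyperparameters, and the particular ratio (joint model versus two independent models) are all choices made here for the example, and the function names are hypothetical.

```python
import numpy as np


def rbf_kernel(t1, t2, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between two sets of time points."""
    diff = t1[:, None] - t2[None, :]
    return signal_var * np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_log_marginal(t, y, noise_var=0.1, length_scale=1.0, signal_var=1.0):
    """Log marginal likelihood of y under a zero-mean GP with Gaussian noise."""
    K = rbf_kernel(t, t, length_scale, signal_var) + noise_var * np.eye(len(t))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * len(t) * np.log(2 * np.pi)


def gp_similarity(t1, y1, t2, y2, **hyper):
    """Assumed form: log p(y1, y2 | one shared GP) - log p(y1) - log p(y2)."""
    joint = gp_log_marginal(np.concatenate([t1, t2]), np.concatenate([y1, y2]), **hyper)
    return joint - gp_log_marginal(t1, y1, **hyper) - gp_log_marginal(t2, y2, **hyper)


# Two sparsely and irregularly sampled series (hypothetical data).
t_a = np.array([0.0, 1.0, 2.5, 4.0])
t_b = np.array([0.5, 1.5, 3.0, 4.5])
same = gp_similarity(t_a, np.sin(t_a), t_b, np.sin(t_b))
opposite = gp_similarity(t_a, np.sin(t_a), t_b, -np.sin(t_b))
print(same, opposite)
```

Because the two series never need to share time points, a measure of this kind can compare samples taken on different sparse grids, which a plain Euclidean distance cannot do directly.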
Ming DK, Myall AC, Hernandez B, et al., 2021, Informing antimicrobial management in the context of COVID-19: understanding the longitudinal dynamics of C-reactive protein and procalcitonin, BMC Infectious Diseases, Vol: 21
Background: To characterise the longitudinal dynamics of C-reactive protein (CRP) and Procalcitonin (PCT) in a cohort of hospitalised patients with COVID-19 and support antimicrobial decision-making. Methods: Longitudinal CRP and PCT concentrations and trajectories of 237 hospitalised patients with COVID-19 were modelled. The dataset comprised 2,021 data points for CRP and 284 points for PCT. Pairwise comparisons were performed between: (i) those with or without significant bacterial growth from cultures, and (ii) those who survived or died in hospital. Results: CRP concentrations were higher over time in COVID-19 patients with positive microbiology (day 9: 236 vs 123 mg/L, p < 0.0001) and in those who died (day 8: 226 vs 152 mg/L, p < 0.0001) but only after day 7 of COVID-related symptom onset. Failure of CRP to fall in the first week of hospital admission was associated with significantly higher odds of death. PCT concentrations were higher in patients with COVID-19 and positive microbiology or in those who died, although these differences were not statistically significant. Conclusions: Both the absolute CRP concentration and the trajectory during the first week of hospital admission are important factors predicting microbiology culture positivity and outcome in patients hospitalised with COVID-19. Further work is needed to describe the role of PCT for co-infection. Understanding relationships of these biomarkers can support development of risk models and inform optimal antimicrobial strategies.
Boonyasiri A, Myall AC, Wan Y, et al., 2021, Integrated patient network and genomic plasmid analysis reveal a regional, multi-species outbreak of carbapenemase-producing Enterobacterales carrying both blaIMP and mcr-9 genes
The incidence of carbapenemase-producing Enterobacterales (CPE) is rising globally, yet Imipenemase (IMP) carbapenemases remain relatively rare. This study describes an investigation of the emergence of IMP-encoding CPE amongst diverse Enterobacterales species between 2016 and 2019 in patients across a London regional hospital network. A network analysis approach to patient pathways, using routinely collected electronic health records, identified previously unrecognised contacts between patients who were IMP CPE positive on screening, implying potential bacterial transmission events. Whole genome sequencing of 85 Enterobacterales isolates from these patients revealed that 86% (73/85) were diverse species (predominantly Klebsiella spp, Enterobacter spp, E. coli) and harboured an IncHI2 plasmid, which carried both blaIMP and the putative mobile colistin resistance gene mcr-9. Detailed phylogenetic analysis identified two distinct IncHI2 plasmid lineages, A and B, both of which showed significant association with patient movements between four hospital sites and across medical specialities. Combined, our patient network and plasmid analyses demonstrate an interspecies, plasmid-mediated outbreak of blaIMP CPE, which remained unidentified during standard microbiology and infection control investigations. With whole genome sequencing (WGS) technologies and large-data incorporation, the outbreak investigation approach proposed here provides a framework for real-time identification of key factors causing pathogen spread. Analysing outbreaks at the plasmid level reveal
Myall A, Price JR, Peach RL, et al., 2021, Predicting hospital-onset COVID-19 infections using dynamic networks of patient contacts: an observational study
Background: Real-time prediction is key to prevention and control of healthcare-associated infections. Contacts between individuals drive infections, yet most prediction frameworks fail to capture the dynamics of contact. We develop a real-time machine learning framework that incorporates dynamic patient contact networks to predict patient-level hospital-onset COVID-19 infections (HOCIs), which we test and validate on international multi-site datasets spanning epidemic and endemic periods. Methods: Our framework extracts dynamic contact networks from routinely collected hospital data and combines them with patient clinical attributes and background contextual hospital data to forecast the infection status of individual patients. We train and test the HOCI prediction framework using 51,157 hospital patients admitted to a UK (London) National Health Service (NHS) Trust from 01 April 2020 to 01 April 2021, spanning UK COVID-19 surges 1 and 2. We then validate the framework by applying it to data from a non-UK (Geneva) hospital site during an epidemic surge (40,057 total inpatients) and to data from the same London Trust from a subsequent period post surge 2, when COVID-19 had become endemic (43,375 total inpatients). Findings: Based on the training data (London data spanning surges 1 and 2), the framework achieved high predictive performance using all variables (AUC-ROC 0·89 [0·88-0·90]) but was almost as predictive using only contact network variables (AUC-ROC 0·88 [0·86-0·90]), and more so than using only hospital contextual (AUC-ROC 0·82 [0·80-0·84]) or patient clinical (AUC-ROC 0·64 [0·62-0
Maes A, Barahona M, Clopath C, 2021, Long- and short-term history effects in a spiking network model of statistical learning
The statistical structure of the environment is often important when making decisions. There are multiple theories of how the brain represents statistical structure. One such theory states that neural activity spontaneously samples from probability distributions. In other words, the network spends more time in states which encode high-probability stimuli. Existing spiking network models implementing sampling lack the ability to learn the statistical structure from observed stimuli and instead often hard-code a dynamics. Here, we focus on how arbitrary prior knowledge about the external world can both be learned and spontaneously recollected. We present a model based upon learning the inverse of the cumulative distribution function. Learning is entirely unsupervised using biophysical neurons and biologically plausible learning rules. We show how this prior knowledge can then be accessed to compute expectations and signal surprise in downstream networks. Sensory history effects emerge from the model as a consequence of ongoing learning.
Single-cell RNA sequencing (scRNA-seq) data sets consist of high-dimensional, sparse and noisy feature vectors, and pose a challenge for classic methods for dimensionality reduction. We show that application of Hierarchical Poisson Factorisation (HPF) to scRNA-seq data produces robust factors, and outperforms other popular methods. To account for batch variability in composite data sets, we introduce Integrative Hierarchical Poisson Factorisation (IHPF), an extension of HPF that makes use of a noise ratio hyper-parameter to tune the variability attributed to technical (batches) vs. biological (cell phenotypes) sources. We exemplify the advantageous application of IHPF under data integration scenarios with varying alignments of technical noise and cell diversity, and show that IHPF produces latent factors with a dual block structure in both cell and gene spaces for enhanced biological interpretability.
Mersmann S, Stromich L, Song F, et al., 2021, ProteinLens: a web-based application for the analysis of allosteric signalling on atomistic graphs of biomolecules, Nucleic Acids Research, Vol: 49, Pages: W551-W558, ISSN: 0305-1048
The investigation of allosteric effects in biomolecular structures is of great current interest in diverse areas, from fundamental biological enquiry to drug discovery. Here we present ProteinLens, a user-friendly and interactive web application for the investigation of allosteric signalling based on atomistic graph-theoretical methods. Starting from the PDB file of a biomolecule (or a biomolecular complex) ProteinLens obtains an atomistic, energy-weighted graph description of the structure of the biomolecule, and subsequently provides a systematic analysis of allosteric signalling and communication across the structure using two computationally efficient methods: Markov Transients and bond-to-bond propensities. ProteinLens scores and ranks every bond and residue according to the speed and magnitude of the propagation of fluctuations emanating from any site of choice (e.g. the active site). The results are presented through statistical quantile scores visualised with interactive plots and adjustable 3D structure viewers, which can also be downloaded. ProteinLens thus allows the investigation of signalling in biomolecular structures of interest to aid the detection of allosteric sites and pathways. ProteinLens is implemented in Python/SQL and freely available to use at: www.proteinlens.io.
Chrysostomou S, Roy R, Prischi F, et al., 2021, Repurposed floxacins targeting RSK4 prevent chemoresistance and metastasis in lung and bladder cancer, Science Translational Medicine, Vol: 13, ISSN: 1946-6234
Lung and bladder cancers are mostly incurable because of the early development of drug resistance and metastatic dissemination. Hence, improved therapies that tackle these two processes are urgently needed to improve clinical outcome. We have identified RSK4 as a promoter of drug resistance and metastasis in lung and bladder cancer cells. Silencing this kinase, through either RNA interference or CRISPR, sensitized tumor cells to chemotherapy and hindered metastasis in vitro and in vivo in a tail vein injection model. Drug screening revealed several floxacin antibiotics as potent RSK4 activation inhibitors, and trovafloxacin reproduced all effects of RSK4 silencing in vitro and in/ex vivo using lung cancer xenograft and genetically engineered mouse models and bladder tumor explants. Through x-ray structure determination and Markov transient and Deuterium exchange analyses, we identified the allosteric binding site and revealed how this compound blocks RSK4 kinase activation through binding to an allosteric site and mimicking a kinase autoinhibitory mechanism involving the RSK4's hydrophobic motif. Last, we show that patients undergoing chemotherapy and adhering to prophylactic levofloxacin in the large placebo-controlled randomized phase 3 SIGNIFICANT trial had significantly increased (P = 0.048) long-term overall survival times. Hence, we suggest that RSK4 inhibition may represent an effective therapeutic strategy for treating lung and bladder cancer.
Laumann F, Kügelgen JV, Barahona M, 2021, Kernel two-sample and independence tests for non-stationary random processes, ITISE 2021 (7th International conference on Time Series and Forecasting), Publisher: https://www.mdpi.com/2673-4591/5/1/31, Pages: 1-13
Two-sample and independence tests with the kernel-based MMD and HSIC have shown remarkable results on i.i.d. data and stationary random processes. However, these statistics are not directly applicable to non-stationary random processes, a prevalent form of data in many scientific disciplines. In this work, we extend the application of MMD and HSIC to non-stationary settings by assuming access to independent realisations of the underlying random process. These realisations - in the form of non-stationary time-series measured on the same temporal grid - can then be viewed as i.i.d. samples from a multivariate probability distribution, to which MMD and HSIC can be applied. We further show how to choose suitable kernels over these high-dimensional spaces by maximising the estimated test power with respect to the kernel hyper-parameters. In experiments on synthetic data, we demonstrate superior performance of our proposed approaches in terms of test power when compared to current state-of-the-art functional or multivariate two-sample and independence tests. Finally, we employ our methods on a real socio-economic dataset as an example application.
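The idea of treating each realisation, measured on a shared time grid, as one multivariate sample can be sketched with a biased MMD estimate. This is an illustration under assumptions: a Gaussian kernel is used, and the bandwidth comes from the median heuristic rather than from the power-maximising selection the abstract describes. All function names and the synthetic data are hypothetical.

```python
import numpy as np


def sq_dists(X, Y):
    """Pairwise squared Euclidean distances between rows of X and rows of Y."""
    sq = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.maximum(sq, 0.0)  # guard against small negative round-off


def gaussian_gram(X, Y, bandwidth):
    """Gaussian kernel matrix between two sets of realisations."""
    return np.exp(-sq_dists(X, Y) / (2.0 * bandwidth ** 2))


def median_bandwidth(Z):
    """Median heuristic (a stand-in for the paper's power-based kernel choice)."""
    d2 = sq_dists(Z, Z)
    return np.sqrt(np.median(d2[np.triu_indices_from(d2, k=1)]))


def mmd2_biased(X, Y, bandwidth):
    """Biased estimate of the squared MMD; each row is one realisation."""
    return (gaussian_gram(X, X, bandwidth).mean()
            + gaussian_gram(Y, Y, bandwidth).mean()
            - 2.0 * gaussian_gram(X, Y, bandwidth).mean())


# Hypothetical non-stationary data: random walks, one group with an added trend.
rng = np.random.default_rng(0)
n_real, n_steps = 40, 25
X = rng.standard_normal((n_real, n_steps)).cumsum(axis=1)
Y = rng.standard_normal((n_real, n_steps)).cumsum(axis=1) + 0.3 * np.arange(n_steps)
bw = median_bandwidth(np.vstack([X, Y]))
print(mmd2_biased(X, Y, bw))
```

In a full test the statistic would be compared against a null distribution, for example one obtained by permuting the group labels of the realisations.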
Peach R, Arnaudon A, Barahona M, 2021, Relative, local and global dimension in complex networks
Dimension is a fundamental property of objects and the space in which they are embedded. Yet ideal notions of dimension, as in Euclidean spaces, do not always translate to physical spaces, which can be constrained by boundaries and distorted by inhomogeneities, or to intrinsically discrete systems such as networks. To take into account locality, finiteness and discreteness, dynamical processes can be used to probe the space geometry and define its dimension. Here we show that each point in space can be assigned a relative dimension with respect to the source of a diffusive process, a concept that provides a scale-dependent definition for local and global dimension also applicable to networks. To showcase its application to physical systems, we demonstrate that the local dimension of structural protein graphs correlates with structural flexibility, and the relative dimension with respect to the active site uncovers regions involved in allosteric communication. In simple models of epidemics on networks, the relative dimension is predictive of the spreading capability of nodes, and identifies scales at which the graph structure is predictive of infectivity.
Myall AC, Peach RL, Weiße AY, et al., 2021, Network memory in the movement of hospital patients carrying drug-resistant bacteria, Applied Network Science, Vol: 6, ISSN: 2364-8228
Hospitals constitute highly interconnected systems that bring into contact an abundance of infectious pathogens and susceptible individuals, thus making infection outbreaks both common and challenging. In recent years, there has been a sharp rise in the incidence of antimicrobial resistance amongst healthcare-associated infections, a situation now considered endemic in many countries. Here we present network-based analyses of a data set capturing the movement of patients harbouring drug-resistant bacteria across three large London hospitals. We show that there are substantial memory effects in the movement of hospital patients colonised with drug-resistant bacteria. Such memory effects break first-order Markovian transitive assumptions and substantially alter the conclusions from the analysis, specifically on node rankings and the evolution of diffusive processes. We capture variable length memory effects by constructing a lumped-state memory network, which we then use to identify overlapping communities of wards. We find that these communities of wards display a quasi-hierarchical structure at different levels of granularity which is consistent with different aspects of patient flows related to hospital locations and medical specialties.
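What it means for memory to break a first-order Markov assumption can be seen on a toy example. The sketch below estimates transition probabilities conditioned on the last one or two wards visited; the ward names and paths are invented, and a fixed second-order model is used here for simplicity instead of the variable-length, lumped-state memory network described in the abstract.

```python
from collections import Counter, defaultdict


def transition_probs(paths, order=1):
    """Estimate P(next ward | last `order` wards) from observed patient paths."""
    counts = defaultdict(Counter)
    for path in paths:
        for i in range(order, len(path)):
            history = tuple(path[i - order:i])
            counts[history][path[i]] += 1
    return {
        history: {ward: c / sum(nxt.values()) for ward, c in nxt.items()}
        for history, nxt in counts.items()
    }


# Hypothetical movements: patients pass through ward C and return where they came from.
paths = [["A", "C", "A"], ["B", "C", "B"]]

first_order = transition_probs(paths, order=1)
second_order = transition_probs(paths, order=2)

print(first_order[("C",)])        # memoryless view: A and B look equally likely after C
print(second_order[("A", "C")])   # with one step of memory the return to A is certain
```

A diffusion simulated on the first-order network would mix patients from A and B at ward C, whereas the second-order view keeps the two flows separate.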
Saavedra-Garcia P, Roman-Trufero M, Al-Sadah HA, et al., 2021, Systems level profiling of chemotherapy-induced stress resolution in cancer cells reveals druggable trade-offs, Proceedings of the National Academy of Sciences of USA, Vol: 118, ISSN: 0027-8424
Cancer cells can survive chemotherapy-induced stress, but how they recover from it is not known. Using a temporal multiomics approach, we delineate the global mechanisms of proteotoxic stress resolution in multiple myeloma cells recovering from proteasome inhibition. Our observations define layered and protracted programmes for stress resolution that encompass extensive changes across the transcriptome, proteome, and metabolome. Cellular recovery from proteasome inhibition involved protracted and dynamic changes of glucose and lipid metabolism and suppression of mitochondrial function. We demonstrate that recovering cells are more vulnerable to specific insults than acutely stressed cells and identify the general control nonderepressable 2 (GCN2)-driven cellular response to amino acid scarcity as a key recovery-associated vulnerability. Using a transcriptome analysis pipeline, we further show that GCN2 is also a stress-independent bona fide target in transcriptional signature-defined subsets of solid cancers that share molecular characteristics. Thus, identifying cellular trade-offs tied to the resolution of chemotherapy-induced stress in tumour cells may reveal new therapeutic targets and routes for cancer therapy optimisation.
Myall A, Peach RL, Wan Y, et al., 2021, Characterising contact in disease outbreaks via a network model of spatial-temporal proximity
Contact tracing is a key tool in epidemiology to identify and control outbreaks of infectious diseases. Existing contact tracing methodologies produce contact maps of individuals based on a binary definition of contact which can be hampered by missing data and indirect contacts. Here, we present a Spatial-temporal Epidemiological Proximity (StEP) model to recover contact maps in disease outbreaks based on movement data. The StEP model accounts for imperfect data by considering probabilistic contacts between individuals based on spatial-temporal proximity of their movement trajectories, creating a robust movement network despite possible missing data and unseen transmission routes. Using real-world data we showcase the potential of StEP for contact tracing with outbreaks of multidrug-resistant bacteria and COVID-19 in a large hospital group in London, UK. In addition to the core structure of contacts that can be recovered using traditional methods of contact tracing, the StEP model reveals missing contacts that connect seemingly separate outbreaks. Comparison with genomic data further confirmed that these recovered contacts indeed improve characterisation of disease transmission and so highlights how the StEP framework can inform effective strategies of infection control and prevention.
Peach RL, Arnaudon A, Schmidt JA, et al., 2021, HCGA: Highly comparative graph analysis for network phenotyping, Patterns, Vol: 2, Pages: 100227-100227, ISSN: 2666-3899
Networks are widely used as mathematical models of complex systems across many scientific disciplines, not only in biology and medicine but also in the social sciences, physics, computing and engineering. Decades of work have produced a vast corpus of research characterising the topological, combinatorial, statistical and spectral properties of graphs. Each graph property can be thought of as a feature that captures important (and sometimes overlapping) characteristics of a network. In the analysis of real-world graphs, it is crucial to integrate systematically a large number of diverse graph features in order to characterise and classify networks, as well as to aid network-based scientific discovery. In this paper, we introduce HCGA, a framework for highly comparative analysis of graph data sets that computes several thousands of graph features from any given network. HCGA also offers a suite of statistical learning and data analysis tools for automated identification and selection of important and interpretable features underpinning the characterisation of graph data sets. We show that HCGA outperforms other methodologies on supervised classification tasks on benchmark data sets whilst retaining the interpretability of network features. We also illustrate how HCGA can be used for network-based discovery through two examples where data is naturally represented as graphs: the clustering of a data set of images of neuronal morphologies, and a regression problem to predict charge transfer in organic semiconductors based on their structure. HCGA is an open platform that can be expanded to include further graph properties and statistical learning tools to allow researchers to leverage the wide breadth of graph-theoretical research to quantitatively analyse and draw insights from network data.
Qian Y, Expert P, Panzarasa P, et al., 2021, Geometric graphs from data to aid classification tasks with Graph Convolutional Networks, Patterns, Vol: 2, Pages: 100237-100237, ISSN: 2666-3899
Maes A, Barahona M, Clopath C, 2021, Learning compositional sequences with multiple time scales through a hierarchical network of spiking neurons, PLoS Computational Biology, Vol: 17, ISSN: 1553-734X
Sequential behaviour is often compositional and organised across multiple time scales: a set of individual elements developing on short time scales (motifs) are combined to form longer functional sequences (syntax). Such organisation leads to a natural hierarchy that can be used advantageously for learning, since the motifs and the syntax can be acquired independently. Despite mounting experimental evidence for hierarchical structures in neuroscience, models for temporal learning based on neuronal networks have mostly focused on serial methods. Here, we introduce a network model of spiking neurons with a hierarchical organisation aimed at sequence learning on multiple time scales. Using biophysically motivated neuron dynamics and local plasticity rules, the model can learn motifs and syntax independently. Furthermore, the model can relearn sequences efficiently and store multiple sequences. Compared to serial learning, the hierarchical model displays faster learning, more flexible relearning, increased capacity, and higher robustness to perturbations. The hierarchical model redistributes the variability: it achieves high motif fidelity at the cost of higher variability in the between-motif timings.
Kuntz Nussio J, Thomas P, Stan G, et al., 2021, Approximations of countably-infinite linear programs over bounded measure spaces, SIAM Journal on Optimization, Vol: 31, Pages: 604-625, ISSN: 1052-6234
We study a class of countably-infinite-dimensional linear programs (CILPs) whose feasible sets are bounded subsets of appropriately defined spaces of measures. The optimal value, optimal points, and minimal points of these CILPs can be approximated by solving finite-dimensional linear programs. We show how to construct finite-dimensional programs that lead to approximations with easy-to-evaluate error bounds, and we prove that the errors converge to zero as the size of the finite-dimensional programs approaches that of the original problem. We discuss the use of our methods in the computation of the stationary distributions, occupation measures, and exit distributions of Markov chains.
Peach R, Greenbury S, Johnston I, et al., 2021, Understanding learner behaviour in online courses with Bayesian modelling and time series characterisation, Scientific Reports, Vol: 11, ISSN: 2045-2322
The intrinsic temporality of learning demands the adoption of methodologies capable of exploiting time-series information. In this study we leverage the sequence data framework and show how data-driven analysis of temporal sequences of task completion in online courses can be used to characterise personal and group learners’ behaviors, and to identify critical tasks and course sessions in a given course design. We also introduce a recently developed probabilistic Bayesian model to learn sequential behaviours of students and predict student performance. The application of our data-driven sequence-based analyses to data from learners undertaking an on-line Business Management course reveals distinct behaviors within the cohort of learners, identifying learners or groups of learners that deviate from the nominal order expected in the course. Using course grades a posteriori, we explore differences in behavior between high and low performing learners. We find that high performing learners follow the progression between weekly sessions more regularly than low performing learners, yet within each weekly session high performing learners are less tied to the nominal task order. We then model the sequences of high and low performance students using the probabilistic Bayesian model and show that we can learn engagement behaviors associated with performance. We also show that the data sequence framework can be used for task-centric analysis; we identify critical junctures and differences among types of tasks within the course design. We find that non-rote learning tasks, such as interactive tasks or discussion posts, are correlated with higher performance. We discuss the application of such analytical techniques as an aid to course design, intervention, and student supervision.
Dusad V, Thiel D, Barahona M, et al., 2021, Opportunities at the interface of network science and metabolic modelling, Frontiers in Bioengineering and Biotechnology, Vol: 8, ISSN: 2296-4185
Metabolism plays a central role in cell physiology because it provides the molecular machinery for growth. At the genome-scale, metabolism is made up of thousands of reactions interacting with one another. Untangling this complexity is key to understanding how cells respond to genetic, environmental, or therapeutic perturbations. Here we discuss the roles of two complementary strategies for the analysis of genome-scale metabolic models: Flux Balance Analysis (FBA) and network science. While FBA estimates metabolic flux on the basis of an optimization principle, network approaches reveal emergent properties of the global metabolic connectivity. We highlight how the integration of both approaches promises to deliver insights on the structure and function of metabolic systems with wide-ranging implications in discovery science, precision medicine and industrial biotechnology.
Altuncu T, Yaliraki S, Barahona M, 2021, Graph-based topic extraction from vector embeddings of text documents: application to a corpus of news articles, Complex Networks & Their Applications IX, Editors: Benito, Cherifi, Cherifi, Moro, Rocha, Sales-Pardo, Publisher: Springer International Publishing, Pages: 154-166, ISBN: 978-3-030-65351-4
Production of news content is growing at an astonishing rate. To help manage and monitor the sheer amount of text, there is an increasing need to develop efficient methods that can provide insights into emerging content areas, and stratify unstructured corpora of text into ‘topics’ that stem intrinsically from content similarity. Here we present an unsupervised framework that brings together powerful vector embeddings from natural language processing with tools from multiscale graph partitioning that can reveal natural partitions at different resolutions without making a priori assumptions about the number of clusters in the corpus. We show the advantages of graph-based clustering through end-to-end comparisons with other popular clustering and topic modelling methods, and also evaluate different text vector embeddings, from classic Bag-of-Words to Doc2Vec to the recent transformer-based model BERT. This comparative work is showcased through an analysis of a corpus of US news coverage during the presidential election year of 2016.
Schreglmann SR, Wang D, Peach RL, et al., 2021, Non-invasive suppression of essential tremor via phase-locked disruption of its temporal coherence, Nature Communications, Vol: 12
Qian Y, Expert P, Rieu T, et al., 2021, Quantifying the alignment of graph and features in deep learning, IEEE Transactions on Neural Networks and Learning Systems, Pages: 1-10, ISSN: 1045-9227
We show that the classification performance of graph convolutional networks (GCNs) is related to the alignment between features, graph, and ground truth, which we quantify using a subspace alignment measure (SAM) corresponding to the Frobenius norm of the matrix of pairwise chordal distances between three subspaces associated with features, graph, and ground truth. The proposed measure is based on the principal angles between subspaces and has both spectral and geometrical interpretations. We showcase the relationship between the SAM and the classification performance through the study of limiting cases of GCNs and systematic randomizations of both features and graph structure applied to a constructive example and several examples of citation networks of different origins. The analysis also reveals the relative importance of the graph and features for classification purposes.
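The measure described in this abstract can be illustrated with a minimal sketch. It assumes that the chordal distance between two subspaces is the square root of the sum of squared sines of their principal angles, and that each subspace is given as a matrix whose columns span it. The function names and the toy inputs are hypothetical and are not taken from the paper.

```python
import numpy as np

def chordal_distance(A, B):
    """Chordal distance between the column spaces of A and B (assumed definition)."""
    # Orthonormal bases for each subspace via reduced QR.
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    # Singular values of Qa^T Qb are the cosines of the principal angles.
    cosines = np.clip(np.linalg.svd(Qa.T @ Qb, compute_uv=False), 0.0, 1.0)
    # sqrt(sum of sin^2(theta_i)) over the principal angles.
    return float(np.sqrt(np.sum(1.0 - cosines**2)))

def subspace_alignment_measure(X, G, Y):
    """Frobenius norm of the 3x3 matrix of pairwise chordal distances."""
    bases = [X, G, Y]  # features, graph, ground truth (hypothetical inputs)
    D = np.array([[chordal_distance(P, Q) for Q in bases] for P in bases])
    return float(np.linalg.norm(D, "fro"))

# Toy example in R^4: two identical subspaces and one orthogonal to both.
I4 = np.eye(4)
X, G, Y = I4[:, :2], I4[:, :2], I4[:, 2:]
sam = subspace_alignment_measure(X, G, Y)
```

Under these assumptions a smaller value indicates closer alignment between the three subspaces; how the paper constructs the subspaces from features, graph and labels is not specified in the abstract and is left out of the sketch.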
Price JR, Mookerjee S, Dyakova E, et al., 2021, Development and delivery of a real-time hospital-onset COVID-19 surveillance system using network analysis, Clinical Infectious Diseases, Vol: 72, Pages: 82-89, ISSN: 1058-4838
Background - Understanding nosocomial acquisition, outbreaks and transmission chains in real-time will be fundamental to ensuring infection prevention measures are effective in controlling COVID-19 in healthcare. We report the design and implementation of a hospital-onset COVID-19 infection (HOCI) surveillance system for an acute healthcare setting to target prevention interventions. Methods - The study took place in a large teaching hospital group in London, UK. All patients tested for SARS-CoV-2 between 4th March and 14th April 2020 were included. Utilising data routinely collected through electronic healthcare systems we developed a novel surveillance system for determining and reporting HOCI incidence and providing real-time network analysis. We provided daily reports on incidence and trends over time to support HOCI investigation, and generated geo-temporal reports using network analysis to interrogate admission pathways for common epidemiological links to infer transmission chains. By working with stakeholders the reports were co-designed for end users. Results - Real-time surveillance reports revealed: changing rates of HOCI throughout the course of the COVID-19 epidemic; key wards fuelling probable transmission events; HOCIs over-represented in particular specialities managing high-risk patients; the importance of integrating analysis of individual prior pathways; and the value of co-design in producing data visualisation. Our surveillance system can effectively support national surveillance. Conclusions - Through early analysis of the novel surveillance system we have provided a description of HOCI rates and trends over time using real-time shifting denominator data. We demonstrate the importance of including the analysis of patient pathways and networks in characterising risk of transmission and targeting infection control interventions.
Kuntz J, Thomas P, Stan G-B, et al., 2021, Stationary Distributions of Continuous-Time Markov Chains: A Review of Theory and Truncation-Based Approximations, SIAM Review, Vol: 63, Pages: 3-64, ISSN: 0036-1445
Strömich L, Wu N, Barahona M, et al., 2020, Allosteric hotspots in the main protease of SARS-CoV-2
Inhibiting the main protease of SARS-CoV-2 is of great interest in tackling the COVID-19 pandemic caused by the virus. Most efforts have been centred on inhibiting the binding site of the enzyme. However, considering allosteric sites, distant from the active or orthosteric site, broadens the search space for drug candidates and confers the advantages of allosteric drug targeting. Here, we report the allosteric communication pathways in the main protease dimer by using two novel fully atomistic graph theoretical methods: bond-to-bond propensity analysis, which has been previously successful in identifying allosteric sites without a priori knowledge in benchmark data sets, and Markov transient analysis, which has previously aided in finding novel drug targets in catalytic protein families. We further score the highest-ranking sites against random sites in similar distances through statistical bootstrapping and identify four statistically significant putative allosteric sites as good candidates for alternative drug targeting.
Clarke J, Murray A, Markar S, et al., 2020, A new geographic model of care to manage the post-COVID-19 elective surgery aftershock in England: a retrospective observational study, BMJ Open, Vol: 10, Pages: 1-9, ISSN: 2044-6055
Objectives - The suspension of elective surgery during the COVID pandemic is unprecedented and has resulted in record volumes of patients waiting for operations. Novel approaches that maximise capacity and efficiency of surgical care are urgently required. This study applies Markov Multiscale Community Detection (MMCD), an unsupervised graph-based clustering framework, to identify new surgical care models based on pooled waiting lists delivered across an expanded network of surgical providers. Design - Retrospective observational study using Hospital Episode Statistics. Setting - Public and private hospitals providing surgical care to National Health Service (NHS) patients in England. Participants - All adult patients resident in England undergoing NHS-funded planned surgical procedures between 1st April 2017 and 31st March 2018. Main outcome measures - The identification of the most common planned surgical procedures in England (High Volume Procedures – HVP) and proportion of low, medium and high-risk patients undergoing each HVP. The mapping of hospitals providing surgical care onto optimised groupings based on patient usage data. Results - A total of 7,811,891 planned operations were identified in 4,284,925 adults during the one-year period of our study. The 28 most common surgical procedures accounted for a combined 3,907,474 operations (50.0% of the total). 2,412,613 (61.7%) of these most common procedures involved ‘low risk’ patients. Patients travelled an average of 11.3 km for these procedures. Based on the data, MMCD partitioned England into 45, 16 and 7 mutually exclusive and collectively exhaustive natural surgical communities of increasing coarseness. The coarser partitions into 16 and 7 surgical communities were shown to be associated with balanced supply and demand for surgical care within communities. Conclusions - Pooled waiting lists for low risk elective procedures and patients across integrated, expanded natural surgical community networks have the pot
Clarke J, Beaney T, Majeed A, et al., 2020, Identifying Naturally Occurring Communities of Primary Care Providers in the English National Health Service in London, AcademyHealth Annual Research Meeting (ARM), Publisher: WILEY, Pages: 107-108, ISSN: 0017-9124
Clarke J, Beaney T, Majeed A, et al., 2020, Identifying naturally occurring communities of primary care providers in the English National Health Service in London, BMJ Open, Vol: 10, Pages: 1-7, ISSN: 2044-6055
Objectives - Primary Care Networks (PCNs) are a new organisational hierarchy with wide-ranging responsibilities introduced in the National Health Service (NHS) Long Term Plan. The vision is that they represent ‘natural’ communities of general practices (GP practices) working together at scale and covering a geography that makes sense to practices, other healthcare providers and local communities. Our study aims to identify natural communities of GP practices based on patient registration patterns using Markov Multiscale Community Detection, an unsupervised network-based clustering technique, to create catchments for these communities. Design - Retrospective observational study using Hospital Episode Statistics – patient-level administrative records of inpatient, outpatient and emergency department attendances to hospital. Setting - General practices in the 32 Clinical Commissioning Groups of Greater London. Participants - All adult patients resident in and registered to a GP practice in Greater London that had one or more outpatient encounters at NHS hospital trusts between 1st April 2017 and 31st March 2018. Main outcome measures - The allocation of GP practices in Greater London to PCNs based on the registrations of patients resident in each Lower Super Output Area (LSOA) of Greater London. The population size and coverage of each proposed PCN. Results - 3,428,322 unique patients attended 1,334 GPs in 4,835 LSOAs in Greater London. Our model grouped 1,291 GPs (96.8%) and 4,721 LSOAs (97.6%), into 165 mutually exclusive PCNs. The median PCN list size was 53,490, with a lower quartile of 38,079 patients and an upper quartile of 72,982 patients. A median of 70.1% of patients attended a GP within their allocated PCN, ranging from 44.6% to 91.4%. Conclusions - With PCNs expected to take a role in population health management and with community providers expected to reconfigure around them, it is vital we recognise how PCNs represent their communities. O