Imperial College London

Professor Mauricio Barahona

Faculty of Natural Sciences, Department of Mathematics

Chair in Biomathematics
 
 
 

Contact

 

m.barahona Website

 
 

Location

 

6M31, Huxley Building, South Kensington Campus



 

Publications


173 results found

Saavedra-Garcia P, Roman-Trufero M, Al-Sadah HA, Blighe K, Lopez-Jimenez E, Christoforou M, Penfold L, Capece D, Xiong X, Miao Y, Parzych K, Caputo V, Siskos AP, Encheva V, Liu Z, Thiel D, Kaiser MF, Piazza P, Chaidos A, Karadimitris A, Franzoso G, Snijder AP, Keun HC, Oyarzún DA, Barahona M, Auner H et al., 2021, Systems level profiling of chemotherapy-induced stress resolution in cancer cells reveals druggable trade-offs, Proceedings of the National Academy of Sciences of USA, Vol: 118, ISSN: 0027-8424

Cancer cells can survive chemotherapy-induced stress, but how they recover from it is not known. Using a temporal multiomics approach, we delineate the global mechanisms of proteotoxic stress resolution in multiple myeloma cells recovering from proteasome inhibition. Our observations define layered and protracted programmes for stress resolution that encompass extensive changes across the transcriptome, proteome, and metabolome. Cellular recovery from proteasome inhibition involved protracted and dynamic changes of glucose and lipid metabolism and suppression of mitochondrial function. We demonstrate that recovering cells are more vulnerable to specific insults than acutely stressed cells and identify the general control nonderepressable 2 (GCN2)-driven cellular response to amino acid scarcity as a key recovery-associated vulnerability. Using a transcriptome analysis pipeline, we further show that GCN2 is also a stress-independent bona fide target in transcriptional signature-defined subsets of solid cancers that share molecular characteristics. Thus, identifying cellular trade-offs tied to the resolution of chemotherapy-induced stress in tumour cells may reveal new therapeutic targets and routes for cancer therapy optimisation.

Journal article

Mersmann S, Stromich L, Song F, Wu N, Vianello F, Barahona M, Yaliraki S et al., 2021, ProteinLens: a web-based application for the analysis of allosteric signalling on atomistic graphs of biomolecules, Nucleic Acids Research, ISSN: 0305-1048

Journal article

Myall A, Peach RL, Wan Y, Mookerjee S, Jauneikaite E, Bolt F, Price J, Davies F, Weiße AY, Holmes A, Barahona M et al., 2021, Characterising contact in disease outbreaks via a network model of spatial-temporal proximity

Contact tracing is a key tool in epidemiology to identify and control outbreaks of infectious diseases. Existing contact tracing methodologies produce contact maps of individuals based on a binary definition of contact which can be hampered by missing data and indirect contacts. Here, we present a Spatial-temporal Epidemiological Proximity (StEP) model to recover contact maps in disease outbreaks based on movement data. The StEP model accounts for imperfect data by considering probabilistic contacts between individuals based on spatial-temporal proximity of their movement trajectories, creating a robust movement network despite possible missing data and unseen transmission routes. Using real-world data we showcase the potential of StEP for contact tracing with outbreaks of multidrug-resistant bacteria and COVID-19 in a large hospital group in London, UK. In addition to the core structure of contacts that can be recovered using traditional methods of contact tracing, the StEP model reveals missing contacts that connect seemingly separate outbreaks. Comparison with genomic data further confirmed that these recovered contacts indeed improve characterisation of disease transmission and so highlights how the StEP framework can inform effective strategies of infection control and prevention.

Journal article

Qian Y, Expert P, Panzarasa P, Barahona M et al., 2021, Geometric graphs from data to aid classification tasks with Graph Convolutional Networks, Patterns, Vol: 2, Pages: 100237-100237, ISSN: 2666-3899

Journal article

Peach RL, Arnaudon A, Schmidt JA, Palasciano HA, Bernier NR, Jelfs KE, Yaliraki SN, Barahona M et al., 2021, HCGA: Highly comparative graph analysis for network phenotyping, Patterns, Vol: 2, Pages: 100227-100227, ISSN: 2666-3899

Networks are widely used as mathematical models of complex systems across many scientific disciplines, not only in biology and medicine but also in the social sciences, physics, computing and engineering. Decades of work have produced a vast corpus of research characterising the topological, combinatorial, statistical and spectral properties of graphs. Each graph property can be thought of as a feature that captures important (and sometimes overlapping) characteristics of a network. In the analysis of real-world graphs, it is crucial to integrate systematically a large number of diverse graph features in order to characterise and classify networks, as well as to aid network-based scientific discovery. In this paper, we introduce HCGA, a framework for highly comparative analysis of graph data sets that computes several thousands of graph features from any given network. HCGA also offers a suite of statistical learning and data analysis tools for automated identification and selection of important and interpretable features underpinning the characterisation of graph data sets. We show that HCGA outperforms other methodologies on supervised classification tasks on benchmark data sets whilst retaining the interpretability of network features. We also illustrate how HCGA can be used for network-based discovery through two examples where data is naturally represented as graphs: the clustering of a data set of images of neuronal morphologies, and a regression problem to predict charge transfer in organic semiconductors based on their structure. HCGA is an open platform that can be expanded to include further graph properties and statistical learning tools to allow researchers to leverage the wide breadth of graph-theoretical research to quantitatively analyse and draw insights from network data.

Journal article

Myall AC, Peach RL, Weiße AY, Davies F, Mookerjee S, Holmes A, Barahona M et al., 2021, Network memory in the movement of hospital patients carrying drug-resistant bacteria, Applied Network Science, ISSN: 2364-8228

Hospitals constitute highly interconnected systems that bring into contact an abundance of infectious pathogens and susceptible individuals, thus making infection outbreaks both common and challenging. In recent years, there has been a sharp incidence of antimicrobial-resistance amongst healthcare-associated infections, a situation now considered endemic in many countries. Here we present network-based analyses of a data set capturing the movement of patients harbouring drug-resistant bacteria across three large London hospitals. We show that there are substantial memory effects in the movement of hospital patients colonised with drug-resistant bacteria. Such memory effects break first-order Markovian transitive assumptions and substantially alter the conclusions from the analysis, specifically on node rankings and the evolution of diffusive processes. We capture variable length memory effects by constructing a lumped-state memory network, which we then use to identify overlapping communities of wards. We find that these communities of wards display a quasi-hierarchical structure at different levels of granularity which is consistent with different aspects of patient flows related to hospital locations and medical specialties.

Journal article

Liu Z, Barahona M, 2021, Similarity Measure for Sparse Time Course Data Based on Gaussian Processes

We propose a similarity measure for sparsely sampled time course data in the form of a log-likelihood ratio of Gaussian processes (GP). The proposed GP similarity is similar to a Bayes factor and provides enhanced robustness to noise in sparse time series, such as those found in various biological settings, e.g., gene transcriptomics. We show that the GP measure is equivalent to the Euclidean distance when the noise variance in the GP is negligible compared to the noise variance of the signal. Our numerical experiments on both synthetic and real data show improved performance of the GP similarity when used in conjunction with two distance-based clustering methods.

Journal article

Maes A, Barahona M, Clopath C, 2021, Learning compositional sequences with multiple time scales through a hierarchical network of spiking neurons, PLOS Computational Biology, Vol: 17, ISSN: 1553-734X

Journal article

Kuntz Nussio J, Thomas P, Stan G, Barahona M et al., 2021, Approximations of countably-infinite linear programs over bounded measure spaces, SIAM Journal on Optimization, Vol: 31, Pages: 604-625, ISSN: 1052-6234

We study a class of countably-infinite-dimensional linear programs (CILPs) whose feasible sets are bounded subsets of appropriately defined spaces of measures. The optimal value, optimal points, and minimal points of these CILPs can be approximated by solving finite-dimensional linear programs. We show how to construct finite-dimensional programs that lead to approximations with easy-to-evaluate error bounds, and we prove that the errors converge to zero as the size of the finite-dimensional programs approaches that of the original problem. We discuss the use of our methods in the computation of the stationary distributions, occupation measures, and exit distributions of Markov chains.

Journal article

Peach R, Greenbury S, Johnston I, Yaliraki S, Lefevre D, Barahona M et al., 2021, Understanding learner behaviour in online courses with Bayesian modelling and time series characterisation, Scientific Reports, Vol: 11, ISSN: 2045-2322

The intrinsic temporality of learning demands the adoption of methodologies capable of exploiting time-series information. In this study we leverage the sequence data framework and show how data-driven analysis of temporal sequences of task completion in online courses can be used to characterise personal and group learners’ behaviors, and to identify critical tasks and course sessions in a given course design. We also introduce a recently developed probabilistic Bayesian model to learn sequential behaviours of students and predict student performance. The application of our data-driven sequence-based analyses to data from learners undertaking an on-line Business Management course reveals distinct behaviors within the cohort of learners, identifying learners or groups of learners that deviate from the nominal order expected in the course. Using course grades a posteriori, we explore differences in behavior between high and low performing learners. We find that high performing learners follow the progression between weekly sessions more regularly than low performing learners, yet within each weekly session high performing learners are less tied to the nominal task order. We then model the sequences of high and low performance students using the probabilistic Bayesian model and show that we can learn engagement behaviors associated with performance. We also show that the data sequence framework can be used for task-centric analysis; we identify critical junctures and differences among types of tasks within the course design. We find that non-rote learning tasks, such as interactive tasks or discussion posts, are correlated with higher performance. We discuss the application of such analytical techniques as an aid to course design, intervention, and student supervision.

Journal article

Dusad V, Thiel D, Barahona M, Keun H, Oyarzun D et al., 2021, Opportunities at the interface of network science and metabolic modelling, Frontiers in Bioengineering and Biotechnology, Vol: 8, ISSN: 2296-4185

Metabolism plays a central role in cell physiology because it provides the molecular machinery for growth. At the genome-scale, metabolism is made up of thousands of reactions interacting with one another. Untangling this complexity is key to understanding how cells respond to genetic, environmental, or therapeutic perturbations. Here we discuss the roles of two complementary strategies for the analysis of genome-scale metabolic models: Flux Balance Analysis (FBA) and network science. While FBA estimates metabolic flux on the basis of an optimization principle, network approaches reveal emergent properties of the global metabolic connectivity. We highlight how the integration of both approaches promises to deliver insights on the structure and function of metabolic systems with wide-ranging implications in discovery science, precision medicine and industrial biotechnology.

Journal article

Altuncu T, Yaliraki S, Barahona M, 2021, Graph-based topic extraction from vector embeddings of text documents: application to a corpus of news articles, Complex Networks & Their Applications IX, Editors: Benito, Cherifi, Cherifi, Moro, Rocha, Sales-Pardo, Publisher: Springer International Publishing, Pages: 154-166, ISBN: 978-3-030-65351-4

Production of news content is growing at an astonishing rate. To help manage and monitor the sheer amount of text, there is an increasing need to develop efficient methods that can provide insights into emerging content areas, and stratify unstructured corpora of text into ‘topics’ that stem intrinsically from content similarity. Here we present an unsupervised framework that brings together powerful vector embeddings from natural language processing with tools from multiscale graph partitioning that can reveal natural partitions at different resolutions without making a priori assumptions about the number of clusters in the corpus. We show the advantages of graph-based clustering through end-to-end comparisons with other popular clustering and topic modelling methods, and also evaluate different text vector embeddings, from classic Bag-of-Words to Doc2Vec to the recent transformer-based model BERT. This comparative work is showcased through an analysis of a corpus of US news coverage during the presidential election year of 2016.

Book chapter

Schreglmann SR, Wang D, Peach RL, Li J, Zhang X, Latorre A, Rhodes E, Panella E, Cassara AM, Boyden ES, Barahona M, Santaniello S, Rothwell J, Bhatia KP, Grossman N et al., 2021, Non-invasive suppression of essential tremor via phase-locked disruption of its temporal coherence, Nature Communications, Vol: 12, ISSN: 2041-1723

Journal article

Qian Y, Expert P, Rieu T, Panzarasa P, Barahona M et al., 2021, Quantifying the alignment of graph and features in deep learning, IEEE Transactions on Neural Networks and Learning Systems, Pages: 1-10, ISSN: 1045-9227

We show that the classification performance of graph convolutional networks (GCNs) is related to the alignment between features, graph, and ground truth, which we quantify using a subspace alignment measure (SAM) corresponding to the Frobenius norm of the matrix of pairwise chordal distances between three subspaces associated with features, graph, and ground truth. The proposed measure is based on the principal angles between subspaces and has both spectral and geometrical interpretations. We showcase the relationship between the SAM and the classification performance through the study of limiting cases of GCNs and systematic randomizations of both features and graph structure applied to a constructive example and several examples of citation networks of different origins. The analysis also reveals the relative importance of the graph and features for classification purposes.

Journal article

Price JR, Mookerjee S, Dyakova E, Myall A, Leung W, Weiße AY, Shersing Y, Brannigan ET, Galletly T, Muir D, Randell P, Davies F, Bolt F, Barahona M, Otter JA, Holmes AH et al., 2021, Development and delivery of a real-time hospital-onset COVID-19 surveillance system using network analysis, Clinical Infectious Diseases, Vol: 72, Pages: 82-89, ISSN: 1058-4838

Background: Understanding nosocomial acquisition, outbreaks and transmission chains in real-time will be fundamental to ensuring infection prevention measures are effective in controlling COVID-19 in healthcare. We report the design and implementation of a hospital-onset COVID-19 infection (HOCI) surveillance system for an acute healthcare setting to target prevention interventions. Methods: The study took place in a large teaching hospital group in London, UK. All patients tested for SARS-CoV-2 between 4th March and 14th April 2020 were included. Utilising data routinely collected through electronic healthcare systems we developed a novel surveillance system for determining and reporting HOCI incidence and providing real-time network analysis. We provided daily reports on incidence and trends over time to support HOCI investigation, and generated geo-temporal reports using network analysis to interrogate admission pathways for common epidemiological links to infer transmission chains. By working with stakeholders the reports were co-designed for end users. Results: Real-time surveillance reports revealed: changing rates of HOCI throughout the course of the COVID-19 epidemic; key wards fuelling probable transmission events; HOCIs over-represented in particular specialities managing high-risk patients; the importance of integrating analysis of individual prior pathways; and the value of co-design in producing data visualisation. Our surveillance system can effectively support national surveillance. Conclusions: Through early analysis of the novel surveillance system we have provided a description of HOCI rates and trends over time using real-time shifting denominator data. We demonstrate the importance of including the analysis of patient pathways and networks in characterising risk of transmission and targeting infection control interventions.

Journal article

Kuntz J, Thomas P, Stan G-B, Barahona M et al., 2021, Stationary Distributions of Continuous-Time Markov Chains: A Review of Theory and Truncation-Based Approximations, SIAM Review, Vol: 63, Pages: 3-64, ISSN: 0036-1445

Journal article

Strömich L, Wu N, Barahona M, Yaliraki S et al., 2020, Allosteric hotspots in the main protease of SARS-CoV-2

Inhibiting the main protease of SARS-CoV-2 is of great interest in tackling the COVID-19 pandemic caused by the virus. Most efforts have been centred on inhibiting the binding site of the enzyme. However, considering allosteric sites, distant from the active or orthosteric site, broadens the search space for drug candidates and confers the advantages of allosteric drug targeting. Here, we report the allosteric communication pathways in the main protease dimer by using two novel fully atomistic graph theoretical methods: bond-to-bond propensity analysis, which has been previously successful in identifying allosteric sites without a priori knowledge in benchmark data sets, and Markov transient analysis, which has previously aided in finding novel drug targets in catalytic protein families. We further score the highest-ranking sites against random sites in similar distances through statistical bootstrapping and identify four statistically significant putative allosteric sites as good candidates for alternative drug targeting.

Journal article

Clarke J, Murray A, Markar S, Barahona M, Kinross J et al., 2020, A new geographic model of care to manage the post-COVID-19 elective surgery aftershock in England: a retrospective observational study, BMJ Open, Vol: 10, Pages: 1-9, ISSN: 2044-6055

Objectives: The suspension of elective surgery during the COVID pandemic is unprecedented and has resulted in record volumes of patients waiting for operations. Novel approaches that maximise capacity and efficiency of surgical care are urgently required. This study applies Markov Multiscale Community Detection (MMCD), an unsupervised graph-based clustering framework, to identify new surgical care models based on pooled waiting lists delivered across an expanded network of surgical providers. Design: Retrospective observational study using Hospital Episode Statistics. Setting: Public and private hospitals providing surgical care to National Health Service (NHS) patients in England. Participants: All adult patients resident in England undergoing NHS-funded planned surgical procedures between 1st April 2017 and 31st March 2018. Main outcome measures: The identification of the most common planned surgical procedures in England (High Volume Procedures – HVP) and proportion of low, medium and high-risk patients undergoing each HVP. The mapping of hospitals providing surgical care onto optimised groupings based on patient usage data. Results: A total of 7,811,891 planned operations were identified in 4,284,925 adults during the one-year period of our study. The 28 most common surgical procedures accounted for a combined 3,907,474 operations (50.0% of the total). 2,412,613 (61.7%) of these most common procedures involved ‘low risk’ patients. Patients travelled an average of 11.3 km for these procedures. Based on the data, MMCD partitioned England into 45, 16 and 7 mutually exclusive and collectively exhaustive natural surgical communities of increasing coarseness. The coarser partitions into 16 and 7 surgical communities were shown to be associated with balanced supply and demand for surgical care within communities. Conclusions: Pooled waiting lists for low risk elective procedures and patients across integrated, expanded natural surgical community networks have the pot

Journal article

Laumann F, Kügelgen JV, Barahona M, 2020, Kernel Two-Sample and Independence Tests for Non-Stationary Random Processes

Two-sample and independence tests with the kernel-based MMD and HSIC have shown remarkable results on i.i.d. data and stationary random processes. However, these statistics are not directly applicable to non-stationary random processes, a prevalent form of data in many scientific disciplines. In this work, we extend the application of MMD and HSIC to non-stationary settings by assuming access to independent realisations of the underlying random process. These realisations - in the form of non-stationary time-series measured on the same temporal grid - can then be viewed as i.i.d. samples from a multivariate probability distribution, to which MMD and HSIC can be applied. We further show how to choose suitable kernels over these high-dimensional spaces by maximising the estimated test power with respect to the kernel hyper-parameters. In experiments on synthetic data, we demonstrate superior performance of our proposed approaches in terms of test power when compared to current state-of-the-art functional or multivariate two-sample and independence tests. Finally, we employ our methods on a real socio-economic dataset as an example application.

Journal article

Clarke J, Beaney T, Majeed A, Darzi A, Barahona M et al., 2020, Identifying Naturally Occurring Communities of Primary Care Providers in the English National Health Service in London, AcademyHealth Annual Research Meeting (ARM), Publisher: Wiley, Pages: 107-108, ISSN: 0017-9124

Conference paper

Clarke J, Beaney T, Majeed A, Darzi A, Barahona M et al., 2020, Identifying naturally occurring communities of primary care providers in the English National Health Service in London, BMJ Open, Vol: 10, Pages: 1-7, ISSN: 2044-6055

Objectives - Primary Care Networks (PCNs) are a new organisational hierarchy with wide-ranging responsibilities introduced in the National Health Service (NHS) Long Term Plan. The vision is that they represent ‘natural’ communities of general practices (GP practices) working together at scale and covering a geography that makes sense to practices, other healthcare providers and local communities. Our study aims to identify natural communities of GP practices based on patient registration patterns using Markov Multiscale Community Detection, an unsupervised network-based clustering technique to create catchments for these communities. Design - Retrospective observational study using Hospital Episode Statistics – patient-level administrative records of inpatient, outpatient and emergency department attendances to hospital. Setting - General practices in the 32 Clinical Commissioning Groups of Greater London. Participants - All adult patients resident in and registered to a GP practice in Greater London that had one or more outpatient encounters at NHS hospital trusts between 1st April 2017 and 31st March 2018. Main outcome measures - The allocation of GP practices in Greater London to PCNs based on the registrations of patients resident in each Lower Super Output Area (LSOA) of Greater London. The population size and coverage of each proposed PCN. Results - 3,428,322 unique patients attended 1,334 GPs in 4,835 LSOAs in Greater London. Our model grouped 1,291 GPs (96.8%) and 4,721 LSOAs (97.6%), into 165 mutually exclusive PCNs. The median PCN list size was 53,490, with a lower quartile of 38,079 patients and an upper quartile of 72,982 patients. A median of 70.1% of patients attended a GP within their allocated PCN, ranging from 44.6% to 91.4%. Conclusions - With PCNs expected to take a role in population health management and with community providers expected to reconfigure around them, it is vital we recognise how PCNs represent their communities. O

Journal article

Arnaudon A, Peach R, Barahona M, 2020, Scale-dependent measure of network centrality from diffusion dynamics, Physical Review Research, Vol: 2, ISSN: 2643-1564

Classic measures of graph centrality capture distinct aspects of node importance, from the local (e.g., degree) to the global (e.g., closeness). Here we exploit the connection between diffusion and geometry to introduce a multiscale centrality measure. A node is defined to be central if it breaks the metricity of the diffusion as a consequence of the effective boundaries and inhomogeneities in the graph. Our measure is naturally multiscale, as it is computed relative to graph neighbourhoods within the varying time horizon of the diffusion. We find that the centrality of nodes can differ widely at different scales. In particular, our measure correlates with degree (i.e., hubs) at small scales and with closeness (i.e., bridges) at large scales, and also reveals the existence of multi-centric structures in complex networks. By examining centrality across scales, our measure thus provides an evaluation of node importance relative to local and global processes on the network.

Journal article

Yu YW, Delvenne J-C, Yaliraki SN, Barahona M et al., 2020, Severability of mesoscale components and local time scales in dynamical networks

A major goal of dynamical systems theory is the search for simplified descriptions of the dynamics of a large number of interacting states. For overwhelmingly complex dynamical systems, the derivation of a reduced description on the entire dynamics at once is computationally infeasible. Other complex systems are so expansive that despite the continual onslaught of new data only partial information is available. To address this challenge, we define and optimise for a local quality function severability for measuring the dynamical coherency of a set of states over time. The theoretical underpinnings of severability lie in our local adaptation of the Simon-Ando-Fisher time-scale separation theorem, which formalises the intuition of local wells in the Markov landscape of a dynamical process, or the separation between a microscopic and a macroscopic dynamics. Finally, we demonstrate the practical relevance of severability by applying it to examples drawn from power networks, image segmentation, social networks, metabolic networks, and word association.

Journal article

Beaney T, Clarke J, Barahona M, Majeed A et al., 2020, A primary care network analysis: natural communities of general practices in London, Publisher: Royal College of General Practitioners, ISSN: 0960-1643

BACKGROUND: Primary care networks (PCNs) are a new organisational hierarchy introduced in the NHS Long Term Plan with wide-ranging responsibilities. The vision is that they represent 'natural' communities of general practices with boundaries that make sense to practices, other healthcare providers, and local communities. AIM: Our study aims to identify natural communities of general practices based on patient registration patterns, using network analysis methods and unsupervised clustering to create catchments for these communities. METHOD: Patients resident in and attending GP practices in London were identified from Hospital Episode Statistics from 2017 to 2018. We used a series of novel methods for unsupervised graph clustering. A cosine similarity matrix was constructed representing similarities between each general practice to each other, based on registration of patients in each Lower Super Output Area (LSOA). Unsupervised graph partitioning using Markov Multiscale Community Detection was conducted to identify communities of general practices. Catchments were assigned to each PCN based on the majority attendance from an LSOA. RESULTS: In total 3 428 322 unique patients attended 1334 general practices in 4835 LSOAs in London. The model grouped 1291 general practices (96.8%) and 4721 LSOAs (97.6%), into 165 mutually exclusive PCNs. The median PCN list size was 53 490 and a median of 70.1% of patients attended a general practice within their allocated PCN, ranging from 44.6% to 91.4%. CONCLUSION: With PCNs expected to take a role in population health management and with community providers expected to reconfigure around them, it is vital we recognise how PCNs represent their communities. This method may be used by policymakers to understand the populations and geography shared between networks.

Conference paper

Lamprinakou S, McCoy E, Barahona M, Gandy A, Flaxman S, Filippi S et al., 2020, BART-based inference for Poisson processes

The effectiveness of Bayesian Additive Regression Trees (BART) has been demonstrated in a variety of contexts including non parametric regression and classification. Here we introduce a BART scheme for estimating the intensity of inhomogeneous Poisson Processes. Poisson intensity estimation is a vital task in various applications including medical imaging, astrophysics and network traffic analysis. Our approach enables full posterior inference of the intensity in a nonparametric regression setting. We demonstrate the performance of our scheme through simulation studies on synthetic and real datasets in one and two dimensions, and compare our approach to alternative approaches.

Journal article

Laumann F, Kügelgen JV, Barahona M, 2020, Non-linear interlinkages and key objectives amongst the Paris Agreement and the Sustainable Development Goals

The United Nations' ambitions to combat climate change and prosper human development are manifested in the Paris Agreement and the Sustainable Development Goals (SDGs), respectively. These are inherently inter-linked as progress towards some of these objectives may accelerate or hinder progress towards others. We investigate how these two agendas influence each other by defining networks of 18 nodes, consisting of the 17 SDGs and climate change, for various groupings of countries. We compute a non-linear measure of conditional dependence, the partial distance correlation, given any subset of the remaining 16 variables. These correlations are treated as weights on edges, and weighted eigenvector centralities are calculated to determine the most important nodes. We find that SDG 6, clean water and sanitation, and SDG 4, quality education, are most central across nearly all groupings of countries. In developing regions, SDG 17, partnerships for the goals, is strongly connected to the progress of other objectives in the two agendas whilst, somewhat surprisingly, SDG 8, decent work and economic growth, is not as important in terms of eigenvector centrality.

Journal article
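The last step described in the abstract above (treating dependence scores as edge weights and ranking nodes by weighted eigenvector centrality) can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `eigenvector_centrality`, the four node labels and every weight in `W` are hypothetical values made up for illustration, and the partial distance correlation itself is not computed here. The sketch assumes a symmetric, non-negative weight matrix for a connected network.

```python
def eigenvector_centrality(weights, iterations=1000, tol=1e-12):
    """Leading eigenvector of a symmetric, non-negative weight matrix.

    Uses power iteration on (W + I). The identity shift leaves the
    eigenvectors unchanged but avoids oscillation on bipartite-like graphs.
    """
    n = len(weights)
    x = [1.0 / n] * n
    for _ in range(iterations):
        # y = (W + I) x
        y = [x[i] + sum(weights[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = sum(v * v for v in y) ** 0.5
        y = [v / norm for v in y]  # rescale to unit Euclidean length
        if max(abs(a - b) for a, b in zip(x, y)) < tol:
            return y
        x = y
    return x


# Hypothetical toy network: labels and weights are invented, not taken from the paper.
labels = ["node A", "node B", "node C", "node D"]
W = [
    [0.0, 0.8, 0.6, 0.5],
    [0.8, 0.0, 0.1, 0.0],
    [0.6, 0.1, 0.0, 0.0],
    [0.5, 0.0, 0.0, 0.0],
]

scores = eigenvector_centrality(W)
for label, score in sorted(zip(labels, scores), key=lambda pair: -pair[1]):
    print(f"{label}: {score:.3f}")
```

In this toy matrix the node with the strongest links to all others receives the highest score, which is the sense in which the abstract calls a goal "central".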

Gosztolai A, Barahona M, 2020, Cellular memory enhances bacterial chemotactic navigation in rugged environments, Communications Physics, Vol: 3, ISSN: 2399-3650

The response of microbes to external signals is mediated by biochemical networks with intrinsic time scales. These time scales give rise to a memory that impacts cellular behaviour. Here we study theoretically the role of cellular memory in Escherichia coli chemotaxis. Using an agent-based model, we show that cells with memory navigating rugged chemoattractant landscapes can enhance their drift speed by extracting information from environmental correlations. Maximal advantage is achieved when the memory is comparable to the time scale of fluctuations as perceived during swimming. We derive an analytical approximation for the drift velocity in rugged landscapes that explains the enhanced velocity, and recovers standard Keller–Segel gradient-sensing results in the limits when memory and fluctuation time scales are well separated. Our numerics also show that cellular memory can induce bet-hedging at the population level resulting in long-lived, multi-modal distributions in heterogeneous landscapes.

Journal article

Peach RL, Arnaudon A, Barahona M, 2020, Semi-supervised classification on graphs using explicit diffusion dynamics, Foundations of Data Science, Vol: 2, Pages: 19-33, ISSN: 2639-8001

Classification tasks based on feature vectors can be significantly improved by including within deep learning a graph that summarises pairwise relationships between the samples. Intuitively, the graph acts as a conduit to channel and bias the inference of class labels. Here, we study classification methods that consider the graph as the originator of an explicit graph diffusion. We show that appending graph diffusion to feature-based learning as a posteriori refinement achieves state-of-the-art classification accuracy. This method, which we call Graph Diffusion Reclassification (GDR), uses overshooting events of a diffusive graph dynamics to reclassify individual nodes. The method uses intrinsic measures of node influence, which are distinct for each node, and allows the evaluation of the relationship and importance of features and graph for classification. We also present diff-GCN, a simple extension of Graph Convolutional Neural Network (GCN) architectures that leverages explicit diffusion dynamics, and allows the natural use of directed graphs. To showcase our methods, we use benchmark datasets of documents with associated citation data.

Journal article

Peach RL, Saman D, Yaliraki SN, Klug DR, Ying L, Willison KR, Barahona M et al., 2020, Unsupervised Graph-Based Learning Predicts Mutations That Alter Protein Dynamics

Proteins exhibit complex dynamics across a vast range of time and length scales, from the atomistic to the conformational. Adenylate kinase (ADK) showcases the biological relevance of such inherently coupled dynamics across scales: single mutations can affect large-scale protein motions and enzymatic activity. Here we present a combined computational and experimental study of multiscale structure and dynamics in proteins, using ADK as our system of choice. We show how a computationally efficient method for unsupervised graph partitioning can be applied to atomistic graphs derived from protein structures to reveal intrinsic, biochemically relevant substructures at all scales, without re-parameterisation or a priori coarse-graining. We subsequently perform full alanine and arginine in silico mutagenesis scans of the protein, and score all mutations according to the disruption they induce on the large-scale organisation. We use our calculations to guide Förster Resonance Energy Transfer (FRET) experiments on ADK, and show that mutating residue D152 to alanine or residue V164 to arginine induces a large dynamical shift of the protein structure towards a closed state, in accordance with our predictions. Our computations also predict a graded effect of different mutations at the D152 site as a result of increased coherence between the core and binding domains, an effect confirmed quantitatively through a high correlation (R² = 0.93) with the FRET ratio between closed and open populations measured on six mutants.

Journal article

Greenbury S, Barahona M, Johnston I, 2020, HyperTraPS: Inferring probabilistic patterns of trait acquisition in evolutionary and disease progression pathways, Cell Systems, Vol: 10, Pages: 39-51, ISSN: 2405-4712

The explosion of data throughout the biomedical sciences provides unprecedented opportunities to learn about the dynamics of evolution and disease progression, but harnessing these large and diverse datasets remains challenging. Here, we describe a highly generalisable statistical platform to infer the dynamic pathways by which many, potentially interacting, discrete traits are acquired or lost over time in biomedical systems. The platform uses HyperTraPS (hypercubic transition path sampling) to learn progression pathways from cross-sectional, longitudinal, or phylogenetically-linked data with unprecedented efficiency, readily distinguishing multiple competing pathways, and identifying the most parsimonious mechanisms underlying given observations. Its Bayesian structure quantifies uncertainty in pathway structure and allows interpretable predictions of behaviours, such as which symptom a patient will acquire next. We exploit the model’s topology to provide visualisation tools for intuitive assessment of multiple, variable pathways. We apply the method to ovarian cancer progression and the evolution of multidrug resistance in tuberculosis, demonstrating its power to reveal previously undetected dynamic pathways.

Journal article

This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.
