Publications

Journal article

Hoffmann T, Peel L, Lambiotte R, Jones N et al., 2020,

Community detection in networks without observing edges

, Science Advances, Vol: 6, ISSN: 2375-2548

We develop a Bayesian hierarchical model to identify communities of time series. Fitting the model provides an end-to-end community detection algorithm that does not extract information as a sequence of point estimates but propagates uncertainties from the raw data to the community labels. Our approach naturally supports multiscale community detection as well as the selection of an optimal scale using model comparison. We study the properties of the algorithm using synthetic data and apply it to daily returns of constituents of the S&P100 index as well as climate data from US cities.

Journal article

Liu Z, Barahona M, 2020,

Graph-based data clustering via multiscale community detection

, Applied Network Science, Vol: 5, Pages: 1-20, ISSN: 2364-8228

We present a graph-theoretical approach to data clustering, which combines the creation of a graph from the data with Markov Stability, a multiscale community detection framework. We show how the multiscale capabilities of the method allow the estimation of the number of clusters, as well as alleviating the sensitivity to the parameters in graph construction. We use both synthetic and benchmark real datasets to compare and evaluate several graph construction methods and clustering algorithms, and show that multiscale graph-based clustering achieves improved performance compared to popular clustering methods without the need to set externally the number of clusters.

Journal article

Tonn MK, Thomas P, Barahona M, Oyarzún DA et al., 2020,

Computation of Single-Cell Metabolite Distributions Using Mixture Models.

, Front Cell Dev Biol, Vol: 8, ISSN: 2296-634X

Metabolic heterogeneity is widely recognized as the next challenge in our understanding of non-genetic variation. A growing body of evidence suggests that metabolic heterogeneity may result from the inherent stochasticity of intracellular events. However, metabolism has been traditionally viewed as a purely deterministic process, on the basis that highly abundant metabolites tend to filter out stochastic phenomena. Here we bridge this gap with a general method for prediction of metabolite distributions across single cells. By exploiting the separation of time scales between enzyme expression and enzyme kinetics, our method produces estimates for metabolite distributions without the lengthy stochastic simulations that would be typically required for large metabolic models. The metabolite distributions take the form of Gaussian mixture models that are directly computable from single-cell expression data and standard deterministic models for metabolic pathways. The proposed mixture models provide a systematic method to predict the impact of biochemical parameters on metabolite distributions. Our method lays the groundwork for identifying the molecular processes that shape metabolic heterogeneity and its functional implications in disease.

Journal article

Hodges M, Yaliraki SN, Barahona M, 2019,

Edge-based formulation of elastic network models

, Physical Review Research, Pages: 033211-033211

We present an edge-based framework for the study of geometric elastic network models to model mechanical interactions in physical systems. We use a formulation in the edge space, instead of the usual node-centric approach, to characterise edge fluctuations of geometric networks defined in d-dimensional space and define the edge mechanical embeddedness, an edge mechanical susceptibility measuring the force felt on each edge given a force applied on the whole system. We further show that this formulation can be directly related to the infinitesimal rigidity of the network, which additionally permits three- and four-centre forces to be included in the network description. We exemplify the approach in protein systems, at both the residue and atomistic levels of description.

Journal article

McGrath T, Spreckley E, Rodriguez A, Viscomi C, Alamshah A, Akalestou E, Murphy K, Jones N et al., 2019,

The homeostatic dynamics of feeding behaviour identify novel mechanisms of anorectic agents

, PLoS Biology, Vol: 17, Pages: 1-30, ISSN: 1544-9173

Better understanding of feeding behaviour will be vital in reducing obesity and metabolic syndrome, but we lack a standard model that captures the complexity of feeding behaviour. We construct an accurate stochastic model of rodent feeding at the bout level in order to perform quantitative behavioural analysis. Analysing the different effects on feeding behaviour of PYY3-36, lithium chloride, GLP-1 and leptin shows the precise behavioural changes caused by each anorectic agent. Our analysis demonstrates that the changes in feeding behaviour evoked by the anorectic agents investigated do not mimic the behaviour of well-fed animals, and that the intermeal interval is influenced by fullness. We show how robust homeostatic control of feeding thwarts attempts to reduce food intake, and how this might be overcome. In silico experiments suggest that introducing a minimum intermeal interval or modulating upper gut emptying can be as effective as anorectic drug administration.

Journal article

Latorre-Pellicer A, Lechuga-Vieco AV, Johnston IG, Hämäläinen RH, Pellico J, Justo-Méndez R, Fernández-Toro JM, Clavería C, Guaras A, Sierra R, Llop J, Torres M, Criado LM, Suomalainen A, Jones NS, Ruíz-Cabello J, Enríquez JA et al., 2019,

Regulation of mother-to-offspring transmission of mtDNA heteroplasmy

, Cell Metabolism, Vol: 30, Pages: 1120-1130.e5, ISSN: 1550-4131

mtDNA is present in multiple copies in each cell derived from the expansions of those in the oocyte. Heteroplasmy, more than one mtDNA variant, may be generated by mutagenesis, paternal mtDNA leakage, and novel medical technologies aiming to prevent inheritance of mtDNA-linked diseases. Heteroplasmy phenotypic impact remains poorly understood. Mouse studies led to contradictory models of random drift or haplotype selection for mother-to-offspring transmission of mtDNA heteroplasmy. Here, we show that mtDNA heteroplasmy affects embryo metabolism, cell fitness, and induced pluripotent stem cell (iPSC) generation. Thus, genetic and pharmacological interventions affecting oxidative phosphorylation (OXPHOS) modify competition among mtDNA haplotypes during oocyte development and/or at early embryonic stages. We show that heteroplasmy behavior can fall on a spectrum from random drift to strong selection, depending on mito-nuclear interactions and metabolic factors. Understanding heteroplasmy dynamics and its mechanisms provides novel knowledge of a fundamental biological process and enhances our ability to mitigate risks in clinical applications affecting mtDNA transmission.

Book chapter

Schaub MT, Delvenne J-C, Lambiotte R, Barahona M et al., 2019,

Structured networks and coarse-grained descriptions: a dynamical perspective

, Advances in Network Clustering and Blockmodeling, Editors: Doreian, Batagelj, Ferligoj, Publisher: John Wiley and Sons, Ltd, Pages: 333-361, ISBN: 9781119224709

This chapter discusses the interplay between structure and dynamics in complex networks. Given a particular network with an endowed dynamics, our goal is to find partitions aligned with the dynamical process acting on top of the network. We thus aim to gain a reduced description of the system that takes into account both its structure and dynamics. In the first part, we introduce the general mathematical setup for the types of dynamics we consider throughout the chapter. We provide two guiding examples, namely consensus dynamics and diffusion processes (random walks), motivating their connection to social network analysis, and provide a brief discussion on the general dynamical framework and its possible extensions. In the second part, we focus on the influence of graph structure on the dynamics taking place on the network, focusing on three concepts that allow us to gain insight into this notion. First, we describe how time scale separation can appear in the dynamics on a network as a consequence of graph structure. Second, we discuss how the presence of particular symmetries in the network gives rise to invariant dynamical subspaces that can be precisely described by graph partitions. Third, we show how this dynamical viewpoint can be extended to study dynamics on networks with signed edges, which allows us to discuss connections to concepts in social network analysis, such as structural balance. In the third part, we discuss how to use dynamical processes unfolding on the network to detect meaningful network substructures. We then show how such dynamical measures can be related to seemingly different algorithms for community detection and coarse-graining proposed in the literature. We conclude with a brief summary and highlight interesting open future directions.

Journal article

Lubba CH, Sethi SS, Knaute P, Schultz SR, Fulcher BD, Jones NS et al., 2019,

catch22: CAnonical time-series CHaracteristics

, Data Mining and Knowledge Discovery, Vol: 33, Pages: 1821-1852, ISSN: 1384-5810

Capturing the dynamical properties of time series concisely as interpretable feature vectors can enable efficient clustering and classification for time-series applications across science and industry. Selecting an appropriate feature-based representation of time series for a given application can be achieved through systematic comparison across a comprehensive time-series feature library, such as those in the hctsa toolbox. However, this approach is computationally expensive and involves evaluating many similar features, limiting the widespread adoption of feature-based representations of time series for real-world applications. In this work, we introduce a method to infer small sets of time-series features that (i) exhibit strong classification performance across a given collection of time-series problems, and (ii) are minimally redundant. Applying our method to a set of 93 time-series classification datasets (containing over 147,000 time series) and using a filtered version of the hctsa feature library (4791 features), we introduce a set of 22 CAnonical Time-series CHaracteristics, catch22, tailored to the dynamics typically encountered in time-series data-mining tasks. This dimensionality reduction, from 4791 to 22, is associated with an approximately 1000-fold reduction in computation time and near linear scaling with time-series length, despite an average reduction in classification accuracy of just 7%. catch22 captures a diverse and interpretable signature of time series in terms of their properties, including linear and non-linear autocorrelation, successive differences, value distributions and outliers, and fluctuation scaling properties. We provide an efficient implementation of catch22, accessible from many programming environments, that facilitates feature-based time-series analysis for scientific, industrial, financial and medical applications using a common language of interpretable time-series properties.

Journal article

Peach R, Yaliraki S, Lefevre D, Barahona M et al., 2019,

Data-driven unsupervised clustering of online learner behaviour

, npj Science of Learning, Vol: 4, ISSN: 2056-7936

The widespread adoption of online courses opens opportunities for analysing learner behaviour and optimising web-based learning adapted to observed usage. Here we introduce a mathematical framework for the analysis of time series of online learner engagement, which allows the identification of clusters of learners with similar online temporal behaviour directly from the raw data without prescribing a priori subjective reference behaviours. The method uses a dynamic time warping kernel to create a pairwise similarity between time series of learner actions, and combines it with an unsupervised multiscale graph clustering algorithm to identify groups of learners with similar temporal behaviour. To showcase our approach, we analyse task completion data from a cohort of learners taking an online post-graduate degree at Imperial Business School. Our analysis reveals clusters of learners with statistically distinct patterns of engagement, from distributed to massed learning, with different levels of regularity, adherence to pre-planned course structure and task completion. The approach also reveals outlier learners with highly sporadic behaviour. A posteriori comparison against student performance shows that, whereas high performing learners are spread across clusters with diverse temporal engagement, low performers are located significantly in the massed learning cluster, and our unsupervised clustering identifies low performers more accurately than common machine learning classification methods trained on temporal statistics of the data. Finally, we test the applicability of the method by analysing two additional datasets: a different cohort of the same course, and time series of different format from another university.

Book chapter

Altuncu MT, Sorin E, Symons JD, Mayer E, Yaliraki SN, Toni F, Barahona M et al., 2019,

Extracting information from free text through unsupervised graph-based clustering: an application to patient incident records

The large volume of text in electronic healthcare records often remains underused due to a lack of methodologies to extract interpretable content. Here we present an unsupervised framework for the analysis of free text that combines text-embedding with paragraph vectors and graph-theoretical multiscale community detection. We analyse text from a corpus of patient incident reports from the National Health Service in England to find content-based clusters of reports in an unsupervised manner and at different levels of resolution. Our unsupervised method extracts groups with high intrinsic textual consistency and compares well against categories hand-coded by healthcare personnel. We also show how to use our content-driven clusters to improve the supervised prediction of the degree of harm of the incident based on the text of the report. Finally, we discuss future directions to monitor reports over time, and to detect emerging trends outside pre-existing categories.

Imperial College London

Biomathematics Group
