Altuncu MT, Mayer E, Yaliraki SN, et al., 2018, From Text to Topics in Healthcare Records: An Unsupervised Graph Partitioning Methodology, 2018 KDD Conference Proceedings - MLMH: Machine Learning for Medicine and Healthcare
Electronic Healthcare Records contain large volumes of unstructured data, including extensive free text. Yet this source of detailed information often remains under-used because of a lack of methodologies to extract interpretable content in a timely manner. Here we apply network-theoretical tools to analyse free text in Hospital Patient Incident reports from the National Health Service, to find clusters of documents with similar content in an unsupervised manner at different levels of resolution. We combine deep neural network paragraph vector text-embedding with multiscale Markov Stability community detection applied to a sparsified similarity graph of document vectors, and showcase the approach on incident reports from Imperial College Healthcare NHS Trust, London. The multiscale community structure reveals different levels of meaning in the topics of the dataset, as shown by descriptive terms extracted from the clusters of records. We also compare a posteriori against hand-coded categories assigned by healthcare personnel, and show that our approach outperforms LDA-based models. Our content clusters exhibit good correspondence with two levels of hand-coded categories, yet they also provide further medical detail in certain areas and reveal complementary descriptors of incidents beyond the external classification taxonomy.
Altuncu MT, Yaliraki SN, Barahona M, 2018, Content-driven, unsupervised clustering of news articles through multiscale graph partitioning, KDD 2018 - Workshop on Data Science Journalism & Media (DSJM)
The explosion in the amount of news and journalistic content being generated across the globe, coupled with extended and instantaneous access to information through online media, makes it difficult and time-consuming to monitor news developments and opinion formation in real time. There is an increasing need for tools that can pre-process, analyse and classify raw text to extract interpretable content; specifically, identifying topics and content-driven groupings of articles. We present here such a methodology that brings together powerful vector embeddings from Natural Language Processing with tools from Graph Theory that exploit diffusive dynamics on graphs to reveal natural partitions across scales. Our framework uses a recent deep neural network text analysis methodology (Doc2vec) to represent text in vector form and then applies a multi-scale community detection method (Markov Stability) to partition a similarity graph of document vectors. The method allows us to obtain clusters of documents with similar content, at different levels of resolution, in an unsupervised manner. We showcase our approach with the analysis of a corpus of 9,000 news articles published by Vox Media over one year. Our results show consistent groupings of documents according to content without a priori assumptions about the number or type of clusters to be found. The multilevel clustering reveals a quasi-hierarchy of topics and subtopics with increased intelligibility and improved topic coherence as compared to external taxonomy services and standard topic detection methods.
Cells adapt their metabolic fluxes in response to changes in the environment. We present a framework for the systematic construction of flux-based graphs derived from organism-wide metabolic networks. Our graphs encode the directionality of metabolic flows via edges that represent the flow of metabolites from source to target reactions. The methodology can be applied in the absence of a specific biological context by modelling fluxes probabilistically, or can be tailored to different environmental conditions by incorporating flux distributions computed through constraint-based approaches such as Flux Balance Analysis. We illustrate our approach on the central carbon metabolism of Escherichia coli and on a metabolic model of human hepatocytes. The flux-dependent graphs under various environmental conditions and genetic perturbations exhibit systemic changes in their topological and community structure, which capture the re-routing of metabolic flows and the varying importance of specific reactions and pathways. By integrating constraint-based models and tools from network science, our framework allows the study of context-specific metabolic responses at a system level beyond standard pathway descriptions.
Hodges M, Barahona M, Yaliraki SN, 2018, Allostery and cooperativity in multimeric proteins: bond-to-bond propensities in ATCase, SCIENTIFIC REPORTS, Vol: 8, ISSN: 2045-2322
Kuntz J, Thomas P, Stan G-B, et al., 2018, The exit time finite state projection scheme: bounding exit distributions and occupation measures of continuous-time Markov chains
We introduce the exit time finite state projection (ETFSP) scheme, a truncation-based method that yields approximations to the exit distribution and occupation measure associated with the time of exit from a domain (i.e., the time of first passage to the complement of the domain) of a continuous-time Markov chain. We prove that: (i) the computed approximations bound the desired measures from below; (ii) the total variation distances between the approximations and the measures decrease monotonically as states are added to the truncation; and (iii) the scheme converges in the sense that as the truncation tends to the entire state space, the total variation distances tend to zero. Furthermore, we give a computable bound on the total variation distance between the exit distribution and its approximation, and we delineate the cases in which the bound is sharp. To establish the theoretical properties of the ETFSP scheme, we revisit the related finite state projection scheme taking a probabilistic viewpoint and proving the latter's convergence. We demonstrate the use of the ETFSP scheme by applying it to two biological examples: the computation of the first passage time associated with the expression of a gene, and the fixation times of competing species subject to demographic noise.
O'Clery N, Yuan Y, Stan G-B, et al., 2018, Global Network Prediction from Local Node Dynamics
The study of dynamical systems on networks, describing complex interactive processes, provides insight into how network structure affects global behaviour. Yet many methods for network dynamics fail to cope with large or partially-known networks, a ubiquitous situation in real-world applications. Here we propose a localised method, applicable to a broad class of dynamical models on networks, whereby individual nodes monitor and store the evolution of their own state and use these values to approximate, via a simple computation, their own steady state solution. Hence the nodes predict their own final state without actually reaching it. Furthermore, the localised formulation enables nodes to compute global network metrics without knowledge of the full network structure. The method can be used to compute global rankings in the network from local information; to detect communities from fast, local transient dynamics; and to identify key nodes that compute global network metrics ahead of others. We illustrate some of the applications of the algorithm by efficiently performing web-page ranking for a large internet network and identifying the dynamic roles of inter-neurons in the C. elegans neural network. The mathematical formulation is simple, widely applicable and easily scalable to real-world datasets, suggesting how local computation can provide an approach to the study of large-scale network dynamics.
Schaub MT, Delvenne J-C, Lambiotte R, et al., 2018, Structured networks and coarse-grained descriptions: a dynamical perspective
This chapter discusses the interplay between structure and dynamics in complex networks. Given a particular network with an endowed dynamics, our goal is to find partitions aligned with the dynamical process acting on top of the network. We thus aim to gain a reduced description of the system that takes into account both its structure and dynamics. In the first part, we introduce the general mathematical setup for the types of dynamics we consider throughout the chapter. We provide two guiding examples, namely consensus dynamics and diffusion processes (random walks), motivating their connection to social network analysis, and provide a brief discussion on the general dynamical framework and its possible extensions. In the second part, we focus on the influence of graph structure on the dynamics taking place on the network, focusing on three concepts that allow us to gain insight into this notion. First, we describe how time scale separation can appear in the dynamics on a network as a consequence of graph structure. Second, we discuss how the presence of particular symmetries in the network gives rise to invariant dynamical subspaces that can be precisely described by graph partitions. Third, we show how this dynamical viewpoint can be extended to study dynamics on networks with signed edges, which allow us to discuss connections to concepts in social network analysis, such as structural balance. In the third part, we discuss how to use dynamical processes unfolding on the network to detect meaningful network substructures. We then show how such dynamical measures can be related to seemingly different algorithms for community detection and coarse-graining proposed in the literature. We conclude with a brief summary and highlight interesting open future directions.
Schaub MT, Delvenne J-C, Lambiotte R, et al., 2018, Multiscale dynamical embeddings of complex networks
Complex systems and relational data are often abstracted as dynamical processes on networks. To understand, predict and control their behavior, a crucial step is to extract reduced descriptions of such networks. Inspired by notions from Control Theory, here we propose a time-dependent dynamical similarity measure between nodes, which quantifies the effect that a node input has on the network over time. This dynamical similarity induces an embedding that can be employed for several analysis tasks. Here we focus on (i) dimensionality reduction, by projecting nodes onto a low dimensional space capturing dynamic similarity at different time scales, and (ii) how to exploit our embeddings to uncover functional modules. We exemplify our ideas through case studies focusing on directed networks without strong connectivity, and signed networks. We further highlight how several ideas from community detection can be generalized in terms of our embedding perspective and linked to ideas from Control Theory.
Tomazou M, Barahona M, Polizzi KM, et al., 2018, Computational Re-design of Synthetic Genetic Oscillators for Independent Amplitude and Frequency Modulation, CELL SYSTEMS, Vol: 6, Pages: 508-+, ISSN: 2405-4712
Social media are being increasingly used for health promotion, yet the landscape of users, messages and interactions in such fora is poorly understood. Studies of social media and diabetes have focused mostly on patients, or public agencies addressing it, but have not looked broadly at all the participants or the diversity of content they contribute. We study Twitter conversations about diabetes through the systematic analysis of 2.5 million tweets collected over 8 months and the interactions between their authors. We address three questions: (1) what themes arise in these tweets?; (2) who are the most influential users?; (3) which type of users contribute to which themes? We answer these questions using a mixed-methods approach, integrating techniques from anthropology, network science and information retrieval such as thematic coding, temporal network analysis, and community and topic detection. Diabetes-related tweets fall within broad thematic groups: health information, news, social interaction, and commercial. At the same time, humorous messages and references to popular culture appear consistently, more than any other type of tweet. We classify authors according to their temporal 'hub' and 'authority' scores. Whereas the hub landscape is diffuse and fluid over time, top authorities are highly persistent across time and comprise bloggers, advocacy groups and NGOs related to diabetes, as well as for-profit entities without specific diabetes expertise. Top authorities fall into seven interest communities as derived from their Twitter follower network. Our findings have implications for public health professionals and policy makers who seek to use social media as an engagement tool and to inform policy design.
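The 'hub' and 'authority' scores mentioned above can be illustrated with a minimal sketch of the classic static hub/authority (HITS) power iteration. This is not the temporal variant used in the study; the function name `hits`, the toy node names, and the convention that an edge (u, v) means "u points to v" are all assumptions made for illustration.

```python
def hits(edges, nodes, iterations=50):
    """Static hub/authority scores by power iteration (illustrative sketch).

    edges: list of (source, target) pairs, read as "source points to target".
    Returns two dicts (hub, auth), each normalised to sum to 1.
    """
    hub = {v: 1.0 for v in nodes}
    auth = {v: 0.0 for v in nodes}
    for _ in range(iterations):
        # A node's authority is the total hub score of the nodes pointing to it.
        auth = {v: 0.0 for v in nodes}
        for src, dst in edges:
            auth[dst] += hub[src]
        norm = sum(auth.values()) or 1.0
        auth = {v: s / norm for v, s in auth.items()}
        # A node's hub score is the total authority of the nodes it points to.
        hub = {v: 0.0 for v in nodes}
        for src, dst in edges:
            hub[src] += auth[dst]
        norm = sum(hub.values()) or 1.0
        hub = {v: s / norm for v, s in hub.items()}
    return hub, auth


# Toy example (hypothetical data): two accounts both point to a third one.
nodes = ["a", "b", "c"]
edges = [("a", "c"), ("b", "c")]
hub, auth = hits(edges, nodes)
```

In this toy graph all authority concentrates on "c", while "a" and "b" share the hub score equally.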
Branch T, Barahona M, Dodson CA, et al., 2017, Kinetic Analysis Reveals the Identity of A beta-Metal Complex Responsible for the Initial Aggregation of A beta in the Synapse, ACS CHEMICAL NEUROSCIENCE, Vol: 8, Pages: 1970-1979, ISSN: 1948-7193
Colijn C, Jones N, Johnston IG, et al., 2017, Toward Precision Healthcare: Context and Mathematical Challenges, FRONTIERS IN PHYSIOLOGY, Vol: 8, ISSN: 1664-042X
Dattani J, Barahona M, 2017, Stochastic models of gene transcription with upstream drives: Exact solution and sample path characterisation, Journal of the Royal Society Interface, Vol: 14, ISSN: 1742-5689
Gene transcription is a highly stochastic and dynamic process. As a result, the mRNA copy number of a given gene is heterogeneous both between cells and across time. We present a framework to model gene transcription in populations of cells with time-varying (stochastic or deterministic) transcription and degradation rates. Such rates can be understood as upstream cellular drives representing the effect of different aspects of the cellular environment. We show that the full solution of the master equation contains two components: a model-specific, upstream effective drive, which encapsulates the effect of cellular drives (e.g., entrainment, periodicity or promoter randomness), and a downstream transcriptional Poissonian part, which is common to all models. Our analytical framework treats cell-to-cell and dynamic variability consistently, unifying several approaches in the literature. We apply the obtained solution to characterise different models of experimental relevance, and to explain the influence on gene transcription of synchrony, stationarity, ergodicity, as well as the effect of time-scales and other dynamic characteristics of drives. We also show how the solution can be applied to the analysis of noise sources in single-cell data, and to reduce the computational cost of stochastic simulations.
Gosztolai A, Schumacher J, Behrends V, et al., 2017, GlnK Facilitates the Dynamic Regulation of Bacterial Nitrogen Assimilation, Biophysical Journal, Vol: 112, Pages: 2219-2230, ISSN: 0006-3495
Ammonium assimilation in Escherichia coli is regulated by two paralogous proteins (GlnB and GlnK), which orchestrate interactions with regulators of gene expression, transport proteins, and metabolic pathways. Yet how they conjointly modulate the activity of glutamine synthetase, the key enzyme for nitrogen assimilation, is poorly understood. We combine experiments and theory to study the dynamic roles of GlnB and GlnK during nitrogen starvation and upshift. We measure time-resolved in vivo concentrations of metabolites, total and posttranslationally modified proteins, and develop a concise biochemical model of GlnB and GlnK that incorporates competition for active and allosteric sites, as well as functional sequestration of GlnK. The model predicts the responses of glutamine synthetase, GlnB, and GlnK under time-varying external ammonium level in the wild-type and two genetic knock-outs. Our results show that GlnK is tightly regulated under nitrogen-rich conditions, yet it is expressed during ammonium run-out and starvation. This suggests a role for GlnK as a buffer of nitrogen shock after starvation, and provides a further functional link between nitrogen and carbon metabolisms.
Kiselev VY, Kirschner K, Schaub MT, et al., 2017, SC3: consensus clustering of single-cell RNA-seq data, NATURE METHODS, Vol: 14, Pages: 483-+, ISSN: 1548-7091
Kuntz J, Thomas P, Stan G-B, et al., 2017, Rigorous bounds on the stationary distributions of the chemical master equation via mathematical programming
The stochastic dynamics of networks of biochemical reactions in living cells are typically modelled using chemical master equations (CMEs). The stationary distributions of CMEs are seldom solvable analytically, and few methods exist that yield numerical estimates with computable error bounds. Here, we present two such methods based on mathematical programming techniques. First, we use semidefinite programming to obtain increasingly tighter upper and lower bounds on the moments of the stationary distribution for networks with rational propensities. Second, we employ linear programming to compute convergent upper and lower bounds on the stationary distributions themselves. The bounds obtained provide a computational test for the uniqueness of the stationary distribution. In the unique case, the bounds collectively form an approximation of the stationary distribution accompanied with a computable $\ell^1$-error bound. In the non-unique case, we explain how to adapt our approach so that it yields approximations of the ergodic distributions, also accompanied with computable error bounds. We illustrate our methodology through two biological examples: Schlögl's model and a toggle switch model.
Liu Z, Barahona M, 2017, Geometric multiscale community detection: Markov stability and vector partitioning, Journal of Complex Networks, Vol: 6, Pages: 157-172, ISSN: 2051-1329
Multiscale community detection can be viewed from a dynamical perspective within the Markov stability framework, which uses the diffusion of a Markov process on the graph to uncover intrinsic network substructures across all scales. Here we reformulate multiscale community detection as a max-sum length vector partitioning problem with respect to the set of time-dependent node vectors expressed in terms of eigenvectors of the transition matrix. This formulation provides a geometric interpretation of Markov stability in terms of a time-dependent spectral embedding, where the Markov time acts as an inhomogeneous geometric resolution factor that zooms the components of the node vectors at different rates. Our geometric formulation encompasses both modularity and the multi-resolution Potts model, which are shown to correspond to vector partitioning in a pseudo-Euclidean space, and is also linked to spectral partitioning methods, where the number of eigenvectors used corresponds to the dimensionality of the underlying embedding vector space. Inspired by the Louvain optimization for community detection, we then propose an algorithm based on a graph-theoretical heuristic for the vector partitioning problem. We apply the algorithm to the spectral optimization of modularity and Markov stability community detection. The spectral embedding based on the transition matrix eigenvectors leads to improved partitions with higher information content and higher modularity than the eigen-decomposition of the modularity matrix. We illustrate the results with random network benchmarks.
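To make the Markov stability quality function concrete, here is a minimal sketch under stated assumptions: it uses the discrete-time random walk on an undirected, unweighted graph and scores a given partition as the trace of the clustered autocovariance, r(t) = sum over same-community pairs (i, j) of pi_i (M^t)_ij - pi_i pi_j, where M = D^{-1} A and pi is the stationary distribution. The function name `markov_stability` and the toy graph are illustrative; the paper's vector partitioning algorithm is not reproduced here.

```python
def markov_stability(adjacency, labels, t):
    """Discrete-time Markov stability of a partition (illustrative sketch).

    adjacency: symmetric 0/1 matrix as a list of lists, no isolated nodes.
    labels: community label for each node.
    t: non-negative integer Markov time (number of random-walk steps).
    """
    n = len(adjacency)
    degrees = [sum(row) for row in adjacency]
    total = sum(degrees)
    # Stationary distribution of the random walk on an undirected graph.
    pi = [d / total for d in degrees]
    # Random-walk transition matrix M = D^{-1} A.
    M = [[adjacency[i][j] / degrees[i] for j in range(n)] for i in range(n)]
    # P = M^t by repeated multiplication (P is the identity when t == 0).
    P = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(t):
        P = [[sum(P[i][k] * M[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
    # Sum the autocovariance over all pairs of nodes in the same community.
    return sum(
        pi[i] * P[i][j] - pi[i] * pi[j]
        for i in range(n)
        for j in range(n)
        if labels[i] == labels[j]
    )


# Toy graph (hypothetical): two triangles joined by a single edge.
two_triangles = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
labels = [0, 0, 0, 1, 1, 1]
stability_at_one = markov_stability(two_triangles, labels, 1)
```

At t = 1 this quantity coincides with Newman-Girvan modularity, which is one way to see that modularity is a special case within the framework; scanning t acts as a resolution parameter.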
Vangelov B, Barahona M, 2017, Modelling the Dynamics of Biological Systems with the Geometric Hidden Markov Model
Many biological processes can be described geometrically in a simple way: stem cell differentiation can be represented as a branching tree and cell division can be depicted as a cycle. In this paper we introduce the geometric hidden Markov model (GHMM), a dynamical model whose goal is to capture the low-dimensional characteristics of biological processes from multivariate time series data. The framework integrates a graph-theoretical algorithm for dimensionality reduction with a latent variable model for sequential data. We analyzed time series data generated by an in silico model of a biomolecular circuit, the repressilator. The trained model has a simple structure: the latent Markov chain corresponds to a two-dimensional lattice. We show that the short-term and long-term predictions of the GHMM reflect the oscillatory behaviour of the genetic circuit. Analysis of the inferred model with community detection methods leads to a coarse-grained representation of the process.
Amor B, Vuik S, Callahan R, et al., 2016, Community detection and role identification in directed networks: understanding the Twitter network of the care.data debate, Dynamic Networks and Cyber-Security, Editors: Adams, Heard, Publisher: World Scientific, Pages: 111-136, ISBN: 978-1-60558752-3
With the rise of social media as an important channel for the debate and discussion of public affairs, online social networks such as Twitter have become important platforms for public information and engagement by policy makers. To communicate effectively through Twitter, policy makers need to understand how influence and interest propagate within its network of users. In this chapter we use graph-theoretic methods to analyse the Twitter debate surrounding NHS England's controversial care.data scheme. Directionality is a crucial feature of the Twitter social graph - information flows from the followed to the followers - but is often ignored in social network analyses; our methods are based on the behaviour of dynamic processes on the network and can be applied naturally to directed networks. We uncover robust communities of users and show that these communities reflect how information flows through the Twitter network. We are also able to classify users by their differing roles in directing the flow of information through the network. Our methods and results will be useful to policy makers who would like to use Twitter effectively as a communication medium.
Amor BRC, Schaub MT, Yaliraki S, et al., 2016, Prediction of allosteric sites and mediating interactions through bond-to-bond propensities, Nature Communications, Vol: 7, ISSN: 2041-1723
Allostery is a fundamental mechanism of biological regulation, in which binding of a molecule at a distant location affects the active site of a protein. Allosteric sites provide targets to fine-tune protein activity, yet we lack computational methodologies to predict them. Here we present an efficient graph-theoretical framework to reveal allosteric interactions (atoms and communication pathways strongly coupled to the active site) without a priori information of their location. Using an atomistic graph with energy-weighted covalent and weak bonds, we define a bond-to-bond propensity quantifying the non-local effect of instantaneous bond fluctuations propagating through the protein. Significant interactions are then identified using quantile regression. We exemplify our method with three biologically important proteins: caspase-1, CheY, and h-Ras, correctly predicting key allosteric interactions, whose significance is additionally confirmed against a reference set of 100 proteins. The almost-linear scaling of our method renders it suitable for high-throughput searches for candidate allosteric sites.
Bacik KA, Schaub MT, Beguerisse-Diaz M, et al., 2016, Flow-Based Network Analysis of the Caenorhabditis elegans Connectome, PLOS Computational Biology, Vol: 12, ISSN: 1553-734X
We exploit flow propagation on the directed neuronal network of the nematode C. elegans to reveal dynamically relevant features of its connectome. We find flow-based groupings of neurons at different levels of granularity, which we relate to functional and anatomical constituents of its nervous system. A systematic in silico evaluation of the full set of single and double neuron ablations is used to identify deletions that induce the most severe disruptions of the multi-resolution flow structure. Such ablations are linked to functionally relevant neurons, and suggest potential candidates for further in vivo investigation. In addition, we use the directional patterns of incoming and outgoing network flows at all scales to identify flow profiles for the neurons in the connectome, without pre-imposing a priori categories. The four flow roles identified are linked to signal propagation motivated by biological input-response scenarios.
Beguerisse Diaz M, Desikan R, Barahona M, 2016, Linear models of activation cascades: analytical solutions and coarse-graining of delayed signal transduction, Journal of the Royal Society Interface, Vol: 13, ISSN: 1742-5689
Cellular signal transduction usually involves activation cascades, the sequential activation of a series of proteins following the reception of an input signal. Here we study the classic model of weakly activated cascades and obtain analytical solutions for a variety of inputs. We show that in the special but important case of optimal-gain cascades (i.e., when the deactivation rates are identical) the downstream output of the cascade can be represented exactly as a lumped nonlinear module containing an incomplete gamma function with real parameters that depend on the rates and length of the cascade, as well as parameters of the input signal. The expressions obtained can be applied to the non-identical case when the deactivation rates are random to capture the variability in the cascade outputs. We also show that cascades can be rearranged so that blocks with similar rates can be lumped and represented through our nonlinear modules. Our results can be used both to represent cascades in computational models of differential equations and to fit data efficiently, by reducing the number of equations and parameters involved. In particular, the length of the cascade appears as a real-valued parameter and can thus be fitted in the same manner as Hill coefficients. Finally, we show how the obtained nonlinear modules can be used instead of delay differential equations to model delays in signal transduction.
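As a worked instance of how an incomplete gamma function arises in the identical-rate case, consider the following sketch under stated assumptions: a linear cascade of $n$ stages with activation rates $\alpha_i$, a common deactivation rate $\beta$, zero initial conditions, and a unit step input. The symbols $\alpha_i$, $\beta$, $x_i$ and the choice of a step input are illustrative and are not taken from the paper.

```latex
\begin{aligned}
\frac{dx_i}{dt} &= \alpha_i\, x_{i-1}(t) - \beta\, x_i(t),
  \qquad i = 1,\dots,n, \quad x_0(t) = 1 \ (t \ge 0), \quad x_i(0) = 0, \\
X_n(s) &= \frac{1}{s}\,\prod_{i=1}^{n}\frac{\alpha_i}{s+\beta}, \\
x_n(t) &= \Big(\prod_{i=1}^{n}\alpha_i\Big)
          \int_0^t \frac{\tau^{\,n-1}e^{-\beta\tau}}{\Gamma(n)}\,d\tau
        = \frac{\prod_{i=1}^{n}\alpha_i}{\beta^{\,n}}\,
          \frac{\gamma(n,\beta t)}{\Gamma(n)} .
\end{aligned}
```

Here $\gamma$ denotes the lower incomplete gamma function. In the final expression, the factor $\gamma(n,\beta t)/\Gamma(n)$ remains well defined for non-integer $n$, which is consistent with the abstract's remark that the cascade length can be treated as a real-valued parameter.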
Georgiou PS, Yaliraki SN, Drakakis EM, et al., 2016, Window functions and sigmoidal behaviour of memristive systems, INTERNATIONAL JOURNAL OF CIRCUIT THEORY AND APPLICATIONS, Vol: 44, Pages: 1685-1696, ISSN: 0098-9886
Kuntz J, Ottobre M, Stan G-B, et al., 2016, BOUNDING STATIONARY AVERAGES OF POLYNOMIAL DIFFUSIONS VIA SEMIDEFINITE PROGRAMMING, SIAM JOURNAL ON SCIENTIFIC COMPUTING, Vol: 38, Pages: A3891-A3920, ISSN: 1064-8275
Schaub MT, O'Clery N, Billeh YN, et al., 2016, Graph partitions and cluster synchronization in networks of oscillators, CHAOS, Vol: 26, ISSN: 1054-1500
Branch T, Barahona M, Ying L, 2015, Secondary Metal Binding to Amyloid-Beta Monomer is Insignificant under Synaptic Conditions, 59th Annual Meeting of the Biophysical-Society, Publisher: CELL PRESS, Pages: 385A-385A, ISSN: 0006-3495
Branch T, Girvan P, Barahona M, et al., 2015, Introduction of a Fluorescent Probe to Amyloid-beta to Reveal Kinetic Insights into Its Interactions with Copper(II), ANGEWANDTE CHEMIE-INTERNATIONAL EDITION, Vol: 54, Pages: 1227-1230, ISSN: 1433-7851
Branch T, Girvan P, Barahona M, et al., 2015, Kinetics of amyloid-beta/metal ions interactions in the synaptic cleft: experiment and simulation, 10th European-Biophysical-Societies-Association (EBSA) European Biophysics Congress, Publisher: SPRINGER, Pages: S230-S230, ISSN: 0175-7571
Noseda M, Harada M, McSweeney S, et al., 2015, PDGFRα demarcates the cardiogenic clonogenic Sca1(+) stem/progenitor cell in adult murine myocardium, Nature Communications, Vol: 6, Pages: 6930-6930, ISSN: 2041-1723
Cardiac progenitor/stem cells in adult hearts represent an attractive therapeutic target for heart regeneration, though (inter)-relationships among reported cells remain obscure. Using single-cell qRT-PCR and clonal analyses, here we define four subpopulations of cardiac progenitor/stem cells in adult mouse myocardium all sharing stem cell antigen-1 (Sca1), based on side population (SP) phenotype, PECAM-1 (CD31) and platelet-derived growth factor receptor-α (PDGFRα) expression. SP status predicts clonogenicity and cardiogenic gene expression (Gata4/6, Hand2 and Tbx5/20), properties segregating more specifically to PDGFRα(+) cells. Clonal progeny of single Sca1(+) SP cells show cardiomyocyte, endothelial and smooth muscle lineage potential after cardiac grafting, augmenting cardiac function although durable engraftment is rare. PDGFRα(-) cells are characterized by Kdr/Flk1, Cdh5, CD31 and lack of clonogenicity. PDGFRα(+)/CD31(-) cells derive from cells formerly expressing Mesp1, Nkx2-5, Isl1, Gata5 and Wt1, distinct from PDGFRα(-)/CD31(+) cells (Gata5 low; Flk1 and Tie2 high). Thus, PDGFRα demarcates the clonogenic cardiogenic Sca1(+) stem/progenitor cell.
Schaub MT, Billeh YN, Anastassiou CA, et al., 2015, Emergence of Slow-Switching Assemblies in Structured Neuronal Networks, PLOS COMPUTATIONAL BIOLOGY, Vol: 11, ISSN: 1553-734X
This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.