Wilson T, Vo DHT, Thorne T, 2023, Identifying Subpopulations of Cells in Single-Cell Transcriptomic Data: A Bayesian Mixture Modeling Approach to Zero Inflation of Counts, J Comput Biol, Vol: 30, Pages: 1059-1074
In the study of single-cell RNA-seq (scRNA-Seq) data, a key component of the analysis is to identify subpopulations of cells in the data. A variety of approaches to this have been considered, and although many machine learning-based methods have been developed, these rarely give an estimate of uncertainty in the cluster assignment. To allow for this, probabilistic models have been developed, but scRNA-Seq data exhibit a phenomenon known as dropout, whereby a large proportion of the observed read counts are zero. This poses challenges in developing probabilistic models that appropriately model the data. We develop a novel Dirichlet process mixture model that employs both a mixture at the cell level to model multiple populations of cells and a zero-inflated negative binomial mixture of counts at the transcript level. By taking a Bayesian approach, we are able to model the expression of genes within clusters, and to quantify uncertainty in cluster assignments. It is shown that this approach outperforms previous approaches that applied multinomial distributions to model scRNA-Seq counts and negative binomial models that do not take into account zero inflation. Applied to a publicly available data set of scRNA-Seq counts of multiple cell types from the mouse cortex and hippocampus, we demonstrate how our approach can be used to distinguish subpopulations of cells as clusters in the data, and to identify gene sets that are indicative of membership of a subpopulation.
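As an illustration of the zero-inflated negative binomial count distribution mentioned in the abstract above, here is a minimal sketch of its probability mass function. It is not the authors' implementation: the function names and the mean/dispersion parameterisation are assumptions chosen for clarity, and the Dirichlet process mixture over cells is not shown.

```python
import math


def nb_pmf(k, mean, dispersion):
    """Negative binomial pmf parameterised by its mean and size (dispersion).

    Assumed parameterisation: success probability p = r / (r + mean),
    which gives the distribution the requested mean (requires mean > 0).
    """
    r = dispersion
    p = r / (r + mean)
    log_pmf = (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(p) + k * math.log1p(-p)
    )
    return math.exp(log_pmf)


def zinb_pmf(k, mean, dispersion, pi_zero):
    """Zero-inflated negative binomial pmf.

    With probability pi_zero the count is a structural zero (dropout);
    otherwise it is drawn from the negative binomial above.
    """
    base = (1.0 - pi_zero) * nb_pmf(k, mean, dispersion)
    return pi_zero + base if k == 0 else base


# Hypothetical parameter values, for illustration only.
total = sum(zinb_pmf(k, mean=2.0, dispersion=1.5, pi_zero=0.3) for k in range(200))
```

The extra mass `pi_zero` at zero is what lets the model account for more zeros than a plain negative binomial with the same mean and dispersion would predict.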
Perryman R, Renziehausen A, Shaye H, et al., 2023, Inhibition of the angiotensin II type 2 receptor AT2R is a novel therapeutic strategy for glioblastoma, Proceedings of the National Academy of Sciences of USA, Vol: 119, ISSN: 0027-8424
Glioblastoma (GBM) is an aggressive malignant primary brain tumor with limited therapeutic options. We show that the angiotensin II (AngII) type 2 receptor (AT2R) is a novel therapeutic target for GBM and that AngII, endogenously produced in GBM cells, promotes proliferation through AT2R. We repurposed EMA401, an AT2R antagonist originally developed as a peripherally restricted analgesic, for GBM and showed that it inhibits the proliferation of AT2R-expressing GBM spheroids and blocks their invasiveness and angiogenic capacity. The crystal structure of AT2R bound to EMA401 was determined and revealed the receptor to be in an active-like conformation with helix-VIII blocking G protein or β-arrestin recruitment. The architecture and interactions of EMA401 in AT2R differ drastically from complexes of AT2R with other relevant compounds. To enhance central nervous system (CNS) penetration of EMA401, we exploited the crystal structure to design an angiopep-2 tethered EMA401 derivative, A3E. A3E exhibited enhanced CNS penetration, leading to reduced tumor volume, inhibition of proliferation and increased levels of apoptosis in an orthotopic xenograft model of GBM.
Thorne T, Kirk PDW, Harrington HA, 2022, Topological approximate Bayesian computation for parameter inference of an angiogenesis model, BIOINFORMATICS, Vol: 38, Pages: 2529-2535, ISSN: 1367-4803
In the study of single cell RNA-seq data, a key component of the analysis is to identify sub-populations of cells in the data. A variety of approaches to this have been considered, and although many machine learning based methods have been developed, these rarely give an estimate of uncertainty in the cluster assignment. To allow for this probabilistic models have been developed, but single cell RNA-seq data exhibit a phenomenon known as dropout, whereby a large proportion of the observed read counts are zero. This poses challenges in developing probabilistic models that appropriately model the data. We develop a novel Dirichlet process mixture model which employs both a mixture at the cell level to model multiple populations of cells, and a zero-inflated negative binomial mixture of counts at the transcript level. By taking a Bayesian approach we are able to model the expression of genes within clusters, and to quantify uncertainty in cluster assignments. It is shown that this approach out-performs previous approaches that applied multinomial distributions to model single cell RNA-seq counts and negative binomial models that do not take into account zero-inflation. Applied to a publicly available data set of single cell RNA-seq counts of multiple cell types from the mouse cortex and hippocampus, we demonstrate how our approach can be used to distinguish sub-populations of cells as clusters in the data, and to identify gene sets that are indicative of membership of a sub-population. The methodology is implemented as an open source Snakemake pipeline available from https://github.com/tt104/scmixture.
Babtie AC, Stumpf MPH, Thorne T, 2020, Gene Regulatory Network Inference, Systems Medicine: Integrative, Qualitative and Computational Approaches: Volume 1-3, Pages: 86-95, ISBN: 9780128160787
Transcriptomic data quantifying gene expression states for single cells or cell populations at a genomic level is now readily available. Changes in transcriptional state during cell development and function are governed by gene regulatory networks, comprising a collection of genes and regulatory interactions between these genes (or gene products). Network inference algorithms aim to infer functional interactions between genes from experimentally observed expression profiles, and identify the structure of the underlying regulatory networks. Here we describe popular classes of network inference algorithms, highlighting their respective strengths and weaknesses, along with some general challenges faced by these methods. Analyzing inferred network structures can provide insight into the genes, transcriptional changes, and regulatory interactions that play key roles in biological and disease-related processes of interest.
Liang H, Ganeshbabu U, Thorne T, 2020, A Dynamic Bayesian Network Approach for Analysing Topic-Sentiment Evolution, IEEE ACCESS, Vol: 8, Pages: 54164-54174, ISSN: 2169-3536
Gafson AR, Savva C, Thorne T, et al., 2019, Breaking the cycle: reversal of flux in the tricarboxylic acid cycle by dimethyl fumarate, Neurology, Neuroimmunology and Neuroinflammation, Vol: 6, ISSN: 2332-7812
Objective: To infer possible molecular effectors of therapeutic effects and adverse events for the pro-drug dimethyl fumarate (DMF, Tecfidera) in the plasma of relapsing-remitting MS patients (RRMS) based on untargeted blood plasma metabolomics. Methods: Blood samples were collected from 27 RRMS patients at baseline and six weeks after initiation of treatment with DMF (BG-12; Tecfidera). Patients were separated into a discovery (n=15) and a validation cohort (n=12). Ten healthy controls were also recruited and blood samples were collected over the same time intervals. Untargeted metabolomic profiling using ultrahigh performance liquid chromatography-tandem mass spectrometry (UPLC-MS) was performed on plasma samples from the discovery cohort and healthy controls at Metabolon Inc. (Durham, NC). UPLC-MS was then performed on samples from the validation cohort at the National Phenome Centre (Imperial College, UK). Plasma neurofilament concentration (NfL) was also assayed for all subjects using the Simoa platform (Quanterix, Lexington, MA). Time course and cross-sectional statistical analyses were performed to identify pharmacodynamic changes in the metabolome secondary to DMF and relate these to adverse events. Results: In the discovery cohort, tricarboxylic acid (TCA) cycle intermediates fumarate and succinate and TCA cycle metabolites succinyl-carnitine and methyl succinyl-carnitine were increased 6 weeks after the start of treatment (q < 0.05). We confirmed that methyl succinyl-carnitine was also increased in the validation cohort 6 weeks after the start of treatment (q < 0.05). Changes in concentrations of these metabolites were not seen over a similar time period in blood from the untreated healthy control population. Increased succinyl-carnitine and methyl succinyl-carnitine were associated with adverse events from DMF (flushing, abdominal symptoms). The mean plasma NfL concentration before treatment was higher in the RRMS patients than in the healthy contro
Gafson AR, Thorne T, McKechnie CIJ, et al., 2018, Lipoprotein markers associated with disability from multiple sclerosis, Scientific Reports, Vol: 8, ISSN: 2045-2322
Altered lipid metabolism is a feature of chronic inflammatory disorders. Increased plasma lipids and lipoproteins have been associated with multiple sclerosis (MS) disease activity. Our objective was to characterise the specific lipids and associated plasma lipoproteins increased in MS and to test for an association with disability. Plasma samples were collected from 27 RRMS patients (median EDSS, 1.5, range 1–7) and 31 healthy controls. Concentrations of lipids within lipoprotein sub-classes were determined from NMR spectra. Plasma cytokines were measured using the MesoScale Discovery V-PLEX kit. Associations were tested using multivariate linear regression. Differences between the patient and volunteer groups were found for lipids within VLDL and HDL lipoprotein sub-fractions (p<0.05). Multivariate regression demonstrated a high correlation between lipids within VLDL sub-classes and the Expanded Disability Status Scale (EDSS) (p<0.05). An optimal model for EDSS included free cholesterol carried by VLDL-2, gender and age (R²=0.38, p<0.05). Free cholesterol carried by VLDL-2 was highly correlated with plasma cytokines CCL-17 and IL-7 (R²=0.78, p<0.0001). These results highlight relationships between disability, inflammatory responses and systemic lipid metabolism in RRMS. Altered lipid metabolism with systemic inflammation may contribute to immune activation.
Thorne T, 2018, Approximate inference of gene regulatory network models from RNA-Seq time series data, BMC BIOINFORMATICS, Vol: 19, ISSN: 1471-2105
Background: Inference of gene regulatory network structures from RNA-Seq data is challenging due to the nature of the data, as measurements take the form of counts of reads mapped to a given gene. Here we present a model for RNA-Seq time series data that applies a negative binomial distribution for the observations, and uses sparse regression with a horseshoe prior to learn a dynamic Bayesian network of interactions between genes. We use a variational inference scheme to learn approximate posterior distributions for the model parameters. Results: The methodology is benchmarked on synthetic data designed to replicate the distribution of real world RNA-Seq data. We compare our method to other sparse regression approaches and find improved performance in learning directed networks. We demonstrate an application of our method to a publicly available human neuronal stem cell differentiation RNA-Seq time series data set to infer the underlying network structure. Conclusions: Our method is able to improve performance on synthetic data by explicitly modelling the statistical distribution of the data when learning networks from RNA-Seq time series. Applying approximate inference techniques we can learn network structures quickly with only moderate computing resources.
Thorne TW, 2016, NetDiff – Bayesian model selection for differential gene regulatory network inference, Scientific Reports, Vol: 6, ISSN: 2045-2322
Differential networks allow us to better understand the changes in cellular processes that are exhibited in conditions of interest, identifying variations in gene regulation or protein interaction between, for example, cases and controls, or in response to external stimuli. Here we present a novel methodology for the inference of differential gene regulatory networks from gene expression microarray data. Specifically we apply a Bayesian model selection approach to compare models of conserved and varying network structure, and use Gaussian graphical models to represent the network structures. We apply a variational inference approach to the learning of Gaussian graphical models of gene regulatory networks, that enables us to perform Bayesian model selection that is significantly more computationally efficient than Markov Chain Monte Carlo approaches. Our method is demonstrated to be more robust than independent analysis of data from multiple conditions when applied to synthetic network data, generating fewer false positive predictions of differential edges. We demonstrate the utility of our approach on real world gene expression microarray data by applying it to existing data from amyotrophic lateral sclerosis cases with and without mutations in C9orf72, and controls, where we are able to identify differential network interactions for further investigation.
Thorne T, 2015, Empirical likelihood tests for nonparametric detection of differential expression from RNA-seq data, STATISTICAL APPLICATIONS IN GENETICS AND MOLECULAR BIOLOGY, Vol: 14, Pages: 575-583, ISSN: 2194-6302
Zurauskiene J, Kirk P, Thorne T, et al., 2014, Bayesian non-parametric approaches to reconstructing oscillatory systems and the Nyquist limit, PHYSICA A-STATISTICAL MECHANICS AND ITS APPLICATIONS, Vol: 407, Pages: 33-42, ISSN: 0378-4371
Zurauskiene J, Kirk P, Thorne T, et al., 2014, Derivative processes for modelling metabolic fluxes, Bioinformatics, Vol: 30, Pages: 1892-1898, ISSN: 1367-4803
Motivation: One of the challenging questions in modelling biological systems is to characterize the functional forms of the processes that control and orchestrate molecular and cellular phenotypes. Recently proposed methods for the analysis of metabolic pathways, for example, dynamic flux estimation, can only provide estimates of the underlying fluxes at discrete time points but fail to capture the complete temporal behaviour. To describe the dynamic variation of the fluxes, we additionally require the assumption of specific functional forms that can capture the temporal behaviour. However, it also remains unclear how to address the noise which might be present in experimentally measured metabolite concentrations. Results: Here we propose a novel approach to modelling metabolic fluxes: derivative processes that are based on multiple-output Gaussian processes (MGPs), which are a flexible non-parametric Bayesian modelling technique. The main advantages that follow from the MGP approach include the natural non-parametric representation of the fluxes and the ability to impute the missing data in between the measurements. Our derivative process approach allows us to model changes in metabolite derivative concentrations and to characterize the temporal behaviour of metabolic fluxes from time course data. Because the derivative of a Gaussian process is itself a Gaussian process, we can readily link metabolite concentrations to metabolic fluxes and vice versa. Here we discuss how this can be implemented in an MGP framework and illustrate its application to simple models, including nitrogen metabolism in Escherichia coli.
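The statement in the abstract above that the derivative of a Gaussian process is itself a Gaussian process can be written out as follows. This is the standard result for a process with a differentiable mean and a twice-differentiable covariance function, not the paper's specific multiple-output model; the symbols x, m and k are notation assumed here for a metabolite concentration, its mean function and its covariance function.

```latex
% Assumed notation: x(t) is a metabolite concentration modelled as a GP.
x(t) \sim \mathcal{GP}\bigl(m(t),\, k(t, t')\bigr)
\quad\Longrightarrow\quad
\frac{dx}{dt}(t) \sim \mathcal{GP}\!\left(\frac{dm}{dt}(t),\;
  \frac{\partial^{2} k(t, t')}{\partial t\, \partial t'}\right),
\qquad
\operatorname{cov}\!\left(x(t),\, \frac{dx}{dt}(t')\right)
  = \frac{\partial k(t, t')}{\partial t'} .
```

Because the process and its derivative are jointly Gaussian with these covariances, observations of concentrations can be conditioned on to obtain a posterior over their rates of change.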
Kirk P, Thorne T, Stumpf MPH, 2013, Model selection in systems and synthetic biology, CURRENT OPINION IN BIOTECHNOLOGY, Vol: 24, Pages: 767-774, ISSN: 0958-1669
Thorne T, Fratta P, Hanna MG, et al., 2013, Graphical modelling of molecular networks underlying sporadic inclusion body myositis, MOLECULAR BIOSYSTEMS, Vol: 9, Pages: 1736-1742, ISSN: 1742-206X
Thorne T, Stumpf MPH, 2012, Inference of temporally varying Bayesian Networks, BIOINFORMATICS, Vol: 28, Pages: 3298-3305, ISSN: 1367-4803
Barnes CP, Filippi S, Stumpf MPH, et al., 2012, Considerate approaches to constructing summary statistics for ABC model selection, STATISTICS AND COMPUTING, Vol: 22, Pages: 1181-1197, ISSN: 0960-3174
Thorne T, Stumpf MPH, 2012, Graph spectral analysis of protein interaction network evolution, JOURNAL OF THE ROYAL SOCIETY INTERFACE, Vol: 9, Pages: 2653-2666, ISSN: 1742-5689
Kaloriti D, Tillmann A, Cook E, et al., 2012, Combinatorial stresses kill pathogenic Candida species, MEDICAL MYCOLOGY, Vol: 50, Pages: 699-709, ISSN: 1369-3786
Harrington HA, Ho KL, Thorne T, et al., 2012, Parameter-free model discrimination criterion based on steady-state coplanarity, PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, Vol: 109, Pages: 15746-15751, ISSN: 0027-8424
You T, Ingram P, Jacobsen MD, et al., 2012, A systems biology analysis of long and short-term memories of osmotic stress adaptation in fungi, BMC Res Notes, Vol: 5
BACKGROUND: Saccharomyces cerevisiae senses hyperosmotic conditions via the HOG signaling network that activates the stress-activated protein kinase, Hog1, and modulates metabolic fluxes and gene expression to generate appropriate adaptive responses. The integral control mechanism by which Hog1 modulates glycerol production remains uncharacterized. An additional Hog1-independent mechanism retains intracellular glycerol for adaptation. Candida albicans also adapts to hyperosmolarity via a HOG signaling network. However, it remains unknown whether Hog1 exerts integral or proportional control over glycerol production in C. albicans. RESULTS: We combined modeling and experimental approaches to study osmotic stress responses in S. cerevisiae and C. albicans. We propose a simple ordinary differential equation (ODE) model that highlights the integral control that Hog1 exerts over glycerol biosynthesis in these species. If integral control arises from a separation of time scales (i.e. rapid HOG activation of glycerol production capacity which decays slowly under hyperosmotic conditions), then the model predicts that glycerol production rates increase upon adaptation to a first stress, and this makes the cell adapt faster to a second hyperosmotic stress. It appears as if the cell is able to remember a stress history that is longer than the timescale of signal transduction. This is termed the long-term stress memory. Our experimental data verify this. Like S. cerevisiae, C. albicans minimizes glycerol efflux during adaptation to hyperosmolarity. Also, transient activation of intermediate kinases in the HOG pathway results in a short-term memory in the signaling pathway. This determines the amplitude of Hog1 phosphorylation under a periodic sequence of stress and non-stressed intervals. Our model suggests that the long-term memory also affects the way a cell responds to periodic stress conditions. Hence, during osmohomeostasis, short-term memory is dependent upon long-term me
Liepe J, Taylor H, Barnes CP, et al., 2012, Calibrating spatio-temporal models of leukocyte dynamics against in vivo live-imaging data using approximate Bayesian computation, INTEGRATIVE BIOLOGY, Vol: 4, Pages: 335-345, ISSN: 1757-9694
Thorne TW, Ho H-L, Huvet M, et al., 2011, Prediction of putative protein interactions through evolutionary analysis of osmotic stress response in the model yeast Saccharomyces cerevisae, FUNGAL GENETICS AND BIOLOGY, Vol: 48, Pages: 504-511, ISSN: 1087-1845
Huvet M, Toni T, Sheng X, et al., 2010, The evolution of the Phage shock protein (Psp) response system: interplay between protein function, genomic organization and system function, Mol Biol Evol
Kelly WP, Thorne TW, Stumpf MPH, 2009, Statistical Null Models for Biological Network Analysis, Statistical and Evolutionary Analysis of Biological Networks
Stumpf MPH, Thorne T, de Silva E, et al., 2008, Estimating the size of the human interactome, PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, Vol: 105, Pages: 6959-6964, ISSN: 0027-8424
Thorne T, Stumpf MPH, 2007, Generating confidence intervals on biological networks, BMC BIOINFORMATICS, Vol: 8, ISSN: 1471-2105
Stumpf MPH, Kelly WP, Thorne T, et al., 2007, Evolution at the system level: the natural history of protein interaction networks, TRENDS IN ECOLOGY & EVOLUTION, Vol: 22, Pages: 366-373, ISSN: 0169-5347
de Silva E, Thorne T, Ingram P, et al., 2006, The effects of incomplete protein interaction data on structural and evolutionary inferences, BMC BIOLOGY, Vol: 4
Stumpf MPH, Thorne TW, 2006, Multi-model inference of network properties from incomplete data, Journal of Integrative Bioinformatics, ISSN: 1613-4516