Publications
Zuber V, Cameron A, Myserlis E, et al., 2021, Leveraging genetic data to elucidate the relationship between Covid-19 and ischemic stroke, Journal of the American Heart Association, Vol: 10, Pages: 1-24, ISSN: 2047-9980
Background: The relationship between coronavirus disease 2019 (Covid-19) and ischemic stroke is poorly understood due to potential unmeasured confounding and reverse causation. We aimed to leverage genetic data to triangulate reported associations. Methods and Results: Analyses primarily focused on critical Covid-19, defined as hospitalization with Covid-19 requiring respiratory support or resulting in death. Cross-trait linkage disequilibrium score regression was used to estimate genetic correlations of critical Covid-19 with ischemic stroke, other related cardiovascular outcomes, and risk factors common to both Covid-19 and cardiovascular disease (body mass index, smoking and chronic inflammation, estimated using C-reactive protein). Mendelian randomization analysis was performed to investigate whether liability to critical Covid-19 was associated with increased risk of any cardiovascular outcome for which genetic correlation was identified. There was evidence of genetic correlation between critical Covid-19 and ischemic stroke (rg=0.29, false discovery rate (FDR)=0.012), body mass index (rg=0.21, FDR=0.00002) and C-reactive protein (rg=0.20, FDR=0.00035), but no other trait investigated. In Mendelian randomization, liability to critical Covid-19 was associated with increased risk of ischemic stroke (odds ratio [OR] per logOR increase in genetically predicted critical Covid-19 liability 1.03, 95% confidence interval 1.00-1.06, p-value=0.03). Similar estimates were obtained for ischemic stroke subtypes. Consistent estimates were also obtained when performing statistical sensitivity analyses more robust to the inclusion of pleiotropic variants, including multivariable Mendelian randomization analyses adjusting for potential genetic confounding through body mass index, smoking and chronic inflammation. There was no evidence to suggest that genetic liability to ischemic stroke increased the risk of critical Covid-19. Conclusions: These data support that liability to critica
Alexopoulos A, Bottolo L, 2021, Bayesian Variable Selection for Gaussian Copula Regression Models, Journal of Computational and Graphical Statistics, Vol: 30, Pages: 578-593, ISSN: 1061-8600
- Citations: 5
Ruffieux H, Fairfax BP, Nassiri I, et al., 2021, EPISPOT: An epigenome-driven approach for detecting and interpreting hotspots in molecular QTL studies, American Journal of Human Genetics, Vol: 108, Pages: 983-1000, ISSN: 0002-9297
- Citations: 3
Ruffieux H, Fairfax BP, Nassiri I, et al., 2020, EPISPOT: an epigenome-driven approach for detecting and interpreting hotspots in molecular QTL studies
We present EPISPOT, a fully joint framework which exploits large panels of epigenetic annotations as variant-level information to enhance molecular quantitative trait locus (QTL) mapping. Thanks to a purpose-built Bayesian inferential algorithm, EPISPOT accommodates functional information for both cis and trans actions, including QTL hotspot effects. It effectively couples simultaneous QTL analysis of thousands of genetic variants and molecular traits, and hypothesis-free selection of biologically interpretable annotations which directly contribute to the QTL effects. This unified, epigenome-aided learning boosts statistical power and sheds light on the regulatory basis of the uncovered hits; EPISPOT therefore marks an essential step towards improving the challenging detection and functional interpretation of trans-acting genetic variants and hotspots. We illustrate the advantages of EPISPOT in simulations emulating real-data conditions and in a monocyte expression QTL study, which confirms known hotspots and finds other signals, as well as plausible mechanisms of action. In particular, by highlighting the role of monocyte DNase-I sensitivity sites from > 150 epigenetic annotations, we clarify the mediation effects and cell-type specificity of major hotspots close to the lysozyme gene. Our approach forgoes the daunting and underpowered task of one-annotation-at-a-time enrichment analyses for prioritising cis and trans QTL hits and is tailored to any transcriptomic, proteomic or metabolomic QTL problem. By enabling principled epigenome-driven QTL mapping transcriptome-wide, EPISPOT helps progress towards a better functional understanding of genetic regulation.
Lawler K, Huang-Doran I, Sonoyama T, et al., 2020, Leptin-Mediated Changes in the Human Metabolome, Journal of Clinical Endocrinology & Metabolism, Vol: 105, Pages: 2541-2552, ISSN: 0021-972X
- Citations: 13
Steele H, Gomez-Duran A, Pyle A, et al., 2020, Metabolic effects of bezafibrate in mitochondrial disease, EMBO Molecular Medicine, Vol: 12, ISSN: 1757-4676
- Citations: 34
Zuber V, Gill D, Ala-Korpela M, et al., 2020, High-throughput multivariable Mendelian randomization analysis prioritizes apolipoprotein B as key lipid risk factor for coronary artery disease
Background: Genetic variants can be used to prioritize risk factors as potential therapeutic targets via Mendelian randomization (MR). An agnostic statistical framework using Bayesian model averaging (MR-BMA) can disentangle the causal role of correlated risk factors with shared genetic predictors. Here, our objective is to identify lipoprotein measures as mediators between lipid-associated genetic variants and coronary artery disease (CAD) for the purpose of detecting therapeutic targets for CAD. Methods: As risk factors we consider 30 lipoprotein measures and metabolites derived from a high-throughput metabolomics study including 24,925 participants. We fit multivariable MR models of genetic associations with CAD estimated in 453,595 participants (including 113,937 cases) regressed on genetic associations with the risk factors. MR-BMA assigns to each combination of risk factors a model score quantifying how well the genetic associations with CAD are explained. Risk factors are ranked by their marginal score and selected using false discovery rate (FDR) criteria. We perform sensitivity and replication analyses varying the dataset for genetic associations with CAD. Results: In the main analysis, the top combination of risk factors ranked by the model score contains apolipoprotein B (ApoB) only. ApoB is also the highest ranked risk factor with respect to the marginal score (FDR < 0.005). Additionally, ApoB is selected in all replication analyses. No other measure of cholesterol or triglyceride is consistently selected otherwise. Conclusions: Our agnostic genetic investigation prioritizes ApoB across all datasets co
Chen H, Moreno-Moral A, Pesce F, et al., 2019, Author Correction: WWP2 regulates pathological cardiac fibrosis by modulating SMAD2 signaling, Nature Communications, Vol: 10, ISSN: 2041-1723
Ochoa E, Zuber V, Fernandez-Jimenez N, et al., 2019, MethylCal: Bayesian calibration of methylation levels, Nucleic Acids Research, Vol: 47, Pages: 1-14, ISSN: 0305-1048
Bisulfite amplicon sequencing has become the primary choice for single-base methylation quantification of multiple targets in parallel. The main limitation of this technology is a preferential amplification of an allele and strand in the PCR due to methylation state. This effect, known as ‘PCR bias’, causes inaccurate estimation of the methylation levels and calibration methods based on standard controls have been proposed to correct for it. Here, we present a Bayesian calibration tool, MethylCal, which can analyse jointly all CpGs within a CpG island (CGI) or a Differentially Methylated Region (DMR), avoiding ‘one-at-a-time’ CpG calibration. This enables more precise modeling of the methylation levels observed in the standard controls. It also provides accurate predictions of the methylation levels not considered in the controlled experiment, a feature that is paramount in the derivation of the corrected methylation degree. We tested the proposed method on eight independent assays (two CpG islands and six imprinting DMRs) and demonstrated its benefits, including the ability to detect outliers. We also evaluated MethylCal’s calibration in two practical cases, a clinical diagnostic test on 18 patients potentially affected by Beckwith–Wiedemann syndrome, and 17 individuals with celiac disease. The calibration of the methylation levels obtained by MethylCal allows a clearer identification of patients undergoing loss or gain of methylation in borderline cases and could influence further clinical or treatment decisions.
Chen H, Moreno-Moral A, Pesce F, et al., 2019, WWP2 regulates pathological cardiac fibrosis by modulating SMAD2 signaling, Nature Communications, Vol: 10, Pages: 1-19, ISSN: 2041-1723
Cardiac fibrosis is a final common pathology in inherited and acquired heart diseases that causes cardiac electrical and pump failure. Here, we use systems genetics to identify a pro-fibrotic gene network in the diseased heart and show that this network is regulated by the E3 ubiquitin ligase WWP2, specifically by the WWP2-N terminal isoform. Importantly, the WWP2-regulated pro-fibrotic gene network is conserved across different cardiac diseases characterized by fibrosis: human and murine dilated cardiomyopathy and repaired tetralogy of Fallot. Transgenic mice lacking the N-terminal region of the WWP2 protein show improved cardiac function and reduced myocardial fibrosis in response to pressure overload or myocardial infarction. In primary cardiac fibroblasts, WWP2 positively regulates the expression of pro-fibrotic markers and extracellular matrix genes. TGFβ1 stimulation promotes nuclear translocation of the WWP2 isoforms containing the N-terminal region and their interaction with SMAD2. WWP2 mediates the TGFβ1-induced nucleocytoplasmic shuttling and transcriptional activity of SMAD2.
Adamowicz M, Morgan CC, Haubner BJ, et al., 2018, Functionally conserved noncoding regulators of cardiomyocyte proliferation and regeneration in mouse and human, Circulation: Cardiovascular Genetics, Vol: 11, ISSN: 1942-325X
Background: The adult mammalian heart has little regenerative capacity after myocardial infarction (MI), whereas neonatal mouse heart regenerates without scarring or dysfunction. However, the underlying pathways are poorly defined. We sought to derive insights into the pathways regulating neonatal development of the mouse heart and cardiac regeneration post-MI. Methods and Results: Total RNA-seq of mouse heart through the first 10 days of postnatal life (referred to as P3, P5, P10) revealed a previously unobserved transition in microRNA (miRNA) expression between P3 and P5 associated specifically with altered expression of protein-coding genes on the focal adhesion pathway and cessation of cardiomyocyte cell division. We found profound changes in the coding and noncoding transcriptome after neonatal MI, with evidence of essentially complete healing by P10. Over two-thirds of each of the messenger RNAs, long noncoding RNAs, and miRNAs that were differentially expressed in the post-MI heart were differentially expressed during normal postnatal development, suggesting a common regulatory pathway for normal cardiac development and post-MI cardiac regeneration. We selected exemplars of miRNAs implicated in our data set as regulators of cardiomyocyte proliferation. Several of these showed evidence of a functional influence on mouse cardiomyocyte cell division. In addition, a subset of these miRNAs, miR-144-3p, miR-195a-5p, miR-451a, and miR-6240 showed evidence of functional conservation in human cardiomyocytes. Conclusions: The sets of messenger RNAs, miRNAs, and long noncoding RNAs that we report here merit further investigation as gatekeepers of cell division in the postnatal heart and as targets for extension of the period of cardiac regeneration beyond the neonatal period.
Inshaw JRJ, Walker NM, Wallace C, et al., 2018, The chromosome 6q22.33 region is associated with age at diagnosis of type 1 diabetes and disease risk in those diagnosed under 5 years of age, Diabetologia, Vol: 61, Pages: 147-157, ISSN: 0012-186X
- Citations: 23
Rackham OJL, Langley SR, Oates T, et al., 2017, A Bayesian Approach for Analysis of Whole-Genome Bisulfite Sequencing Data Identifies Disease-Associated Changes in DNA Methylation, Genetics, Vol: 205, Pages: 1443-1458, ISSN: 0016-6731
DNA methylation is a key epigenetic modification involved in gene regulation whose contribution to disease susceptibility remains to be fully understood. Here, we present a novel Bayesian smoothing approach (called ABBA) to detect differentially methylated regions (DMRs) from whole-genome bisulfite sequencing (WGBS). We also show how this approach can be leveraged to identify disease-associated changes in DNA methylation, suggesting mechanisms through which these alterations might affect disease. From a data modeling perspective, ABBA has the distinctive feature of automatically adapting to different correlation structures in CpG methylation levels across the genome while taking into account the distance between CpG sites as a covariate. Our simulation study shows that ABBA has greater power to detect DMRs than existing methods, providing an accurate identification of DMRs in the large majority of simulated cases. To empirically demonstrate the method’s efficacy in generating biological hypotheses, we performed WGBS of primary macrophages derived from an experimental rat system of glomerulonephritis and used ABBA to identify >1000 disease-associated DMRs. Investigation of these DMRs revealed differential DNA methylation localized to a 600 bp region in the promoter of the Ifitm3 gene. This was confirmed by ChIP-seq and RNA-seq analyses, showing differential transcription factor binding at the Ifitm3 promoter by JunD (an established determinant of glomerulonephritis), and a consistent change in Ifitm3 expression. Our ABBA analysis allowed us to propose a new role for Ifitm3 in the pathogenesis of glomerulonephritis via a mechanism involving promoter hypermethylation that is associated with Ifitm3 repression in the rat strain susceptible to glomerulonephritis.
Imprialou M, Petretto E, Bottolo L, 2016, Expression QTLs Mapping and Analysis: A Bayesian Perspective, Systems Genetics, Publisher: Humana Press, Pages: 189-215, ISBN: 978-1-4939-6425-3
The aim of expression Quantitative Trait Locus (eQTL) mapping is the identification of DNA sequence variants that explain variation in gene expression. Given the recent yield of trait-associated genetic variants identified by large-scale genome-wide association analyses (GWAS), eQTL mapping has become a useful tool to understand the functional context where these variants operate and eventually narrow down functional gene targets for disease. Despite its extensive application to complex (polygenic) traits and disease, the majority of eQTL studies still rely on univariate data modeling strategies, i.e., testing for association of all transcript-marker pairs. However, these "one-at-a-time" strategies are (1) unable to control the number of false-positives when an intricate Linkage Disequilibrium structure is present and (2) often underpowered to detect the full spectrum of trans-acting regulatory effects. Here we present our viewpoint on the most recent advances on eQTL mapping approaches, with a focus on Bayesian methodology. We review the advantages of the Bayesian approach over frequentist methods and provide an empirical example of polygenic eQTL mapping to illustrate the different properties of frequentist and Bayesian methods. Finally, we discuss how multivariate eQTL mapping approaches have distinctive features with respect to detection of polygenic effects, accuracy, and interpretability of the results.
Bouganis C, Mingas G, Bottolo L, 2016, Particle MCMC algorithms and architectures for accelerating inference in state-space models, International Journal of Approximate Reasoning, Vol: 83, Pages: 413-433, ISSN: 1873-4731
Particle Markov Chain Monte Carlo (pMCMC) is a stochastic algorithm designed to generate samples from a probability distribution, when the density of the distribution does not admit a closed form expression. pMCMC is most commonly used to sample from the Bayesian posterior distribution in State-Space Models (SSMs), a class of probabilistic models used in numerous scientific applications. Nevertheless, this task is prohibitive when dealing with complex SSMs with massive data, due to the high computational cost of pMCMC and its poor performance when the posterior exhibits multi-modality. This paper aims to address both issues by: 1) Proposing a novel pMCMC algorithm (denoted ppMCMC), which uses multiple Markov chains (instead of the one used by pMCMC) to improve sampling
Wang M, Sips P, Khin E, et al., 2016, Wars2 is a determinant of angiogenesis, Nature Communications, Vol: 7, ISSN: 2041-1723
Coronary flow (CF) measured ex vivo is largely determined by capillary density that reflects angiogenic vessel formation in the heart in vivo. Here we exploit this relationship and show that CF in the rat is influenced by a locus on rat chromosome 2 that is also associated with cardiac capillary density. Mitochondrial tryptophanyl-tRNA synthetase (Wars2), encoding an L53F protein variant within the ATP-binding motif, is prioritized as the candidate at the locus by integrating genomic data sets. WARS2(L53F) has low enzyme activity and inhibition of WARS2 in endothelial cells reduces angiogenesis. In the zebrafish, inhibition of wars2 results in trunk vessel deficiencies, disordered endocardial-myocardial contact and impaired heart function. Inhibition of Wars2 in the rat causes cardiac angiogenesis defects and diminished cardiac capillary density. Our data demonstrate a pro-angiogenic function for Wars2 both within and outside the heart that may have translational relevance given the association of WARS2 with common human diseases.
Johnson MR, Shkura K, Langley SR, et al., 2016, Systems genetics identifies a convergent gene network for cognition and neurodevelopmental disease, Nature Neuroscience, Vol: 19, Pages: 223-232, ISSN: 1546-1726
Genetic determinants of cognition are poorly characterized, and their relationship to genes that confer risk for neurodevelopmental disease is unclear. Here we performed a systems-level analysis of genome-wide gene expression data to infer gene-regulatory networks conserved across species and brain regions. Two of these networks, M1 and M3, showed replicable enrichment for common genetic variants underlying healthy human cognitive abilities, including memory. Using exome sequence data from 6,871 trios, we found that M3 genes were also enriched for mutations ascertained from patients with neurodevelopmental disease generally, and intellectual disability and epileptic encephalopathy in particular. M3 consists of 150 genes whose expression is tightly developmentally regulated, but which are collectively poorly annotated for known functional pathways. These results illustrate how systems-level analyses can reveal previously unappreciated relationships between neurodevelopmental disease–associated genes in the developed human brain, and provide empirical support for a convergent gene-regulatory network influencing cognition and neurodevelopmental disease.
Liquet B, Bottolo L, Campanella G, et al., 2016, R2GUESS: a graphics processing unit-based R package for Bayesian variable selection regression of multivariate responses, Journal of Statistical Software, Vol: 69, ISSN: 1548-7660
Technological advances in molecular biology over the past decade have given rise to high dimensional and complex datasets offering the possibility to investigate biological associations between a range of genomic features and complex phenotypes. The analysis of this novel type of data generated unprecedented computational challenges which ultimately led to the definition and implementation of computationally efficient statistical models that were able to scale to genome-wide data, including Bayesian variable selection approaches. While extensive methodological work has been carried out in this area, only few methods capable of handling hundreds of thousands of predictors were implemented and distributed. Among these we recently proposed GUESS, a computationally optimised algorithm making use of graphics processing unit capabilities, which can accommodate multiple outcomes. In this paper we propose R2GUESS, an R package wrapping the original C++ source code. In addition to providing a user-friendly interface of the original code automating its parametrisation, and data handling, R2GUESS also incorporates many features to explore the data, to extend statistical inferences from the native algorithm (e.g., effect size estimation, significance assessment), and to visualize outputs from the algorithm. We first detail the model and its parametrisation, and describe in details its optimised implementation. Based on two examples we finally illustrate its statistical performances and flexibility.
Lewin A, Saadi H, Peters JE, et al., 2015, MT-HESS: an efficient Bayesian approach for simultaneous association detection in OMICS datasets, with application to eQTL mapping in multiple tissues, Bioinformatics, Vol: 32, Pages: 523-532, ISSN: 1367-4803
Motivation: Analysing the joint association between a large set of responses and predictors is a fundamental statistical task in integrative genomics, exemplified by numerous expression Quantitative Trait Loci (eQTL) studies. Of particular interest are the so-called ‘hotspots’, important genetic variants that regulate the expression of many genes. Recently, attention has focussed on whether eQTLs are common to several tissues, cell-types or, more generally, conditions or whether they are specific to a particular condition. Results: We have implemented MT-HESS, a Bayesian hierarchical model that analyses the association between a large set of predictors, e.g. SNPs, and many responses, e.g. gene expression, in multiple tissues, cells or conditions. Our Bayesian sparse regression algorithm goes beyond ‘one-at-a-time’ association tests between SNPs and responses and uses a fully multivariate model search across all linear combinations of SNPs, coupled with a model of the correlation between condition/tissue-specific responses. In addition, we use a hierarchical structure to leverage shared information across different genes, thus improving the detection of hotspots. We show the increase of power resulting from our new approach in an extensive simulation study. Our analysis of two case studies highlights new hotspots that would remain undetected by standard approaches and shows how greater prediction power can be achieved when several tissues are jointly considered.
Rackham OJL, Dellaportas P, Petretto E, et al., 2015, WGBSSuite: simulating whole-genome bisulphite sequencing data and benchmarking differential DNA methylation analysis tools, Bioinformatics, Vol: 31, Pages: 2371-2373, ISSN: 1367-4803
Motivation: As the number of studies looking at differences between DNA methylation increases, there is a growing demand to develop and benchmark statistical methods to analyse these data. To date no objective approach for the comparison of these methods has been developed and as such it remains difficult to assess which analysis tool is most appropriate for a given experiment. As a result, there is an unmet need for a DNA methylation data simulator that can accurately reproduce a wide range of experimental setups, and can be routinely used to compare the performance of different statistical models. Results: We have developed WGBSSuite, a flexible stochastic simulation tool that generates single-base resolution DNA methylation data genome-wide. Several simulator parameters can be derived directly from real datasets provided by the user in order to mimic real case scenarios. Thus, it is possible to choose the most appropriate statistical analysis tool for a given simulated design. To show the usefulness of our simulator, we also report a benchmark of commonly used methods for differential methylation analysis. Availability and implementation: WGBS code and documentation are available under GNU licence at http://www.wgbssuite.org.uk/ Contact: owen.rackham@imperial.ac.uk or l.bottolo@imperial.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
Johnson MR, Behmoaras J, Bottolo L, et al., 2015, Systems genetics identifies Sestrin 3 as a regulator of a proconvulsant gene network in human epileptic hippocampus, Nature Communications, Vol: 6, ISSN: 2041-1723
Gene-regulatory network analysis is a powerful approach to elucidate the molecular processes and pathways underlying complex disease. Here we employ systems genetics approaches to characterize the genetic regulation of pathophysiological pathways in human temporal lobe epilepsy (TLE). Using surgically acquired hippocampi from 129 TLE patients, we identify a gene-regulatory network genetically associated with epilepsy that contains a specialized, highly expressed transcriptional module encoding proconvulsive cytokines and Toll-like receptor signalling genes. RNA sequencing analysis in a mouse model of TLE using 100 epileptic and 100 control hippocampi shows the proconvulsive module is preserved across-species, specific to the epileptic hippocampus and upregulated in chronic epilepsy. In the TLE patients, we map the trans-acting genetic control of this proconvulsive module to Sestrin 3 (SESN3), and demonstrate that SESN3 positively regulates the module in macrophages, microglia and neurons. Morpholino-mediated Sesn3 knockdown in zebrafish confirms the regulation of the transcriptional module, and attenuates chemically induced behavioural seizures in vivo.
Falchi M, El-Sayed Moustafa JS, Takousis P, et al., 2014, Low copy number of the salivary amylase gene predisposes to obesity, Nature Genetics, Vol: 46, Pages: 492-497, ISSN: 1061-4036
Common multi-allelic copy number variants (CNVs) appear enriched for phenotypic associations compared to their biallelic counterparts [1,2,3,4]. Here we investigated the influence of gene dosage effects on adiposity through a CNV association study of gene expression levels in adipose tissue. We identified significant association of a multi-allelic CNV encompassing the salivary amylase gene (AMY1) with body mass index (BMI) and obesity, and we replicated this finding in 6,200 subjects. Increased AMY1 copy number was positively associated with both amylase gene expression (P = 2.31 × 10^−14) and serum enzyme levels (P < 2.20 × 10^−16), whereas reduced AMY1 copy number was associated with increased BMI (change in BMI per estimated copy = −0.15 (0.02) kg/m^2; P = 6.93 × 10^−10) and obesity risk (odds ratio (OR) per estimated copy = 1.19, 95% confidence interval (CI) = 1.13–1.26; P = 1.46 × 10^−10). The OR value of 1.19 per copy of AMY1 translates into about an eightfold difference in risk of obesity between subjects in the top (copy number > 9) and bottom (copy number < 4) 10% of the copy number distribution. Our study provides a first genetic link between carbohydrate metabolism and BMI and demonstrates the power of integrated genomic approaches beyond genome-wide association studies.
Xiao X, Moreno-Moral A, Rotival M, et al., 2014, Multi-tissue analysis of co-expression networks by higher-order generalized singular value decomposition identifies functionally coherent transcriptional modules, PLoS Genetics, Vol: 10, ISSN: 1553-7390
Recent high-throughput efforts such as ENCODE have generated a large body of genome-scale transcriptional data in multiple conditions (e.g., cell-types and disease states). Leveraging these data is especially important for network-based approaches to human disease, for instance to identify coherent transcriptional modules (subnetworks) that can inform functional disease mechanisms and pathological pathways. Yet, genome-scale network analysis across conditions is significantly hampered by the paucity of robust and computationally-efficient methods. Building on the Higher-Order Generalized Singular Value Decomposition, we introduce a new algorithmic approach for efficient, parameter-free and reproducible identification of network-modules simultaneously across multiple conditions. Our method can accommodate weighted (and unweighted) networks of any size and can similarly use co-expression or raw gene expression input data, without hinging upon the definition and stability of the correlation used to assess gene co-expression. In simulation studies, we demonstrated distinctive advantages of our method over existing methods, which was able to recover accurately both common and condition-specific network-modules without entailing ad-hoc input parameters as required by other approaches. We applied our method to genome-scale and multi-tissue transcriptomic datasets from rats (microarray-based) and humans (mRNA-sequencing-based) and identified several common and tissue-specific subnetworks with functional significance, which were not detected by other methods. In humans we recapitulated the crosstalk between cell-cycle progression and cell-extracellular matrix interactions processes in ventricular zones during neocortex expansion and further, we uncovered pathways related to development of later cognitive functions in the cortical plate of the developing brain which were previously unappreciated. Analyses of seven rat tissues identified a multi-tissue subnetwork of co-express
Bottolo L, Chadeau-Hyam M, Hastie DI, et al., 2013, GUESS-ing polygenic associations with multiple phenotypes using a GPU-based evolutionary stochastic search algorithm, PLoS Genetics, Vol: 9, ISSN: 1553-7390
Genome-wide association studies (GWAS) yielded significant advances in defining the genetic architecture of complex traits and disease. Still, a major hurdle of GWAS is narrowing down multiple genetic associations to a few causal variants for functional studies. This becomes critical in multi-phenotype GWAS where detection and interpretability of complex SNP(s)-trait(s) associations are complicated by complex Linkage Disequilibrium patterns between SNPs and correlation between traits. Here we propose a computationally efficient algorithm (GUESS) to explore complex genetic-association models and maximize genetic variant detection. We integrated our algorithm with a new Bayesian strategy for multi-phenotype analysis to identify the specific contribution of each SNP to different trait combinations and study genetic regulation of lipid metabolism in the Gutenberg Health Study (GHS). Despite the relatively small size of GHS (n = 3,175), when compared with the largest published meta-GWAS (n>100,000), GUESS recovered most of the major associations and was better at refining multi-trait associations than alternative methods. Amongst the new findings provided by GUESS, we revealed a strong association of SORT1 with TG-APOB and LIPC with TG-HDL phenotypic groups, which were overlooked in the larger meta-GWAS and not revealed by competing approaches, associations that we replicated in two independent cohorts. Moreover, we demonstrated the increased power of GUESS over alternative multi-phenotype approaches, both Bayesian and non-Bayesian, in a simulation study that mimics real-case scenarios. We showed that our parallel implementation based on Graphics Processing Units outperforms alternative multi-phenotype methods. Beyond multivariate modelling of multi-phenotypes, our Bayesian model employs a flexible hierarchical prior structure for genetic effects that adapts to any correlation structure of the predictors and increases the power to identify associated variants. This pr
Chadeau-Hyam M, Campanella G, Jombart T, et al., 2013, Deciphering the complex: Methodological overview of statistical models to derive OMICS-based biomarkers, ENVIRONMENTAL AND MOLECULAR MUTAGENESIS, Vol: 54, Pages: 542-557, ISSN: 0893-6692
- Citations: 90
Langley SR, Bottolo L, Kunes J, et al., 2013, Systems-level approaches reveal conservation of trans-regulated genes in the rat and genetic determinants of blood pressure in humans, CARDIOVASCULAR RESEARCH, Vol: 97, Pages: 653-665, ISSN: 0008-6363
- Citations: 24
Froguel P, Ndiaye NC, Bonnefond A, et al., 2012, A Genome-Wide Association Study Identifies rs2000999 as a Strong Genetic Determinant of Circulating Haptoglobin Levels, PLOS One, Vol: 7, ISSN: 1932-6203
Haptoglobin is an acute phase inflammatory marker. Its main function is to bind hemoglobin released from erythrocytes to aid its elimination, and thereby haptoglobin prevents the generation of reactive oxygen species in the blood. Haptoglobin levels have been repeatedly associated with a variety of inflammation-linked infectious and non-infectious diseases, including malaria, tuberculosis, human immunodeficiency virus, hepatitis C, diabetes, carotid atherosclerosis, and acute myocardial infarction. However, a comprehensive genetic assessment of the inter-individual variability of circulating haptoglobin levels has not been conducted so far. We used a genome-wide association study initially conducted in 631 French children followed by a replication in three additional European sample sets and we identified a common single nucleotide polymorphism (SNP), rs2000999 located in the Haptoglobin gene (HP) as a strong genetic predictor of circulating Haptoglobin levels (P(overall) = 8.1 x 10(-59)), explaining 45.4% of its genetic variability (11.8% of Hp global variance). The functional relevance of rs2000999 was further demonstrated by its specific association with HP mRNA levels (β = 0.23 ± 0.08, P = 0.007). Finally, SNP rs2000999 was associated with decreased total and low-density lipoprotein cholesterol in 8,789 European children (P(total cholesterol) = 0.002 and P(LDL) = 0.0008). Given the central position of haptoglobin in many inflammation-related metabolic pathways, the relevance of rs2000999 genotyping when evaluating haptoglobin concentration should be further investigated in order to improve its diagnostic/therapeutic and/or prevention impact.
El-Sayed Moustafa JS, Eleftherohorinou H, de Smith AJ, et al., 2012, Novel association approach for variable number tandem repeats (VNTRs) identifies DOCK5 as a susceptibility gene for severe obesity, Hum Mol Genet, Vol: 21, Pages: 3727-3738, ISSN: 1460-2083
Variable number tandem repeats (VNTRs) constitute a relatively under-examined class of genomic variants in the context of complex disease because of their sequence complexity and the challenges in assaying them. Recent large-scale genome-wide copy number variant mapping and association efforts have highlighted the need for improved methodology for association studies using these complex polymorphisms. Here we describe the in-depth investigation of a complex region on chromosome 8p21.2 encompassing the dedicator of cytokinesis 5 (DOCK5) gene. The region includes two VNTRs of complex sequence composition which flank a common 3975 bp deletion, all three of which were genotyped by polymerase chain reaction and fragment analysis in a total of 2744 subjects. We have developed a novel VNTR association method named VNTRtest, suitable for association analysis of multi-allelic loci with binary and quantitative outcomes, and have used this approach to show significant association of the DOCK5 VNTRs with childhood and adult severe obesity (P(empirical)= 8.9 x 10(-8) and P= 3.1 x 10(-3), respectively) which we estimate explains ~0.8% of the phenotypic variance. We also identified an independent association between the 3975 base pair (bp) deletion and obesity, explaining a further 0.46% of the variance (P(combined)= 1.6 x 10(-3)). Evidence for association between DOCK5 transcript levels and the 3975 bp deletion (P= 0.027) and both VNTRs (P(empirical)= 0.015) was also identified in adipose tissue from a Swedish family sample, providing support for a functional effect of the DOCK5 deletion and VNTRs. These findings highlight the potential role of DOCK5 in human obesity and illustrate a novel approach for analysis of the contribution of VNTRs to disease susceptibility through association studies.
Bottolo L, Petretto E, Blankenberg S, et al., 2011, Bayesian Detection of Expression Quantitative Trait Loci Hot Spots, GENETICS, Vol: 189, Pages: 1449-+, ISSN: 0016-6731
- Citations: 34
Richardson S, Bottolo L, Rosenthal JS, 2011, Bayesian models for sparse regression analysis of high dimensional data, Bayesian Statistics 9, Editors: Bernardo, Bayarri, Berger, Dawid, Heckerman, Smith, West, Publisher: OUP Oxford, ISBN: 9780199694587