David A, Islam S, Tankhilevich E, et al., 2021, The AlphaFold database of protein structures: a biologist’s guide, Journal of Molecular Biology, Vol: 434, Pages: 167336-167336, ISSN: 0022-2836
AlphaFold, the deep learning algorithm developed by DeepMind, recently released the three-dimensional models of the whole human proteome to the scientific community. Here we discuss the advantages, limitations and the still unsolved challenges of the AlphaFold models from the perspective of a biologist, who may not be an expert in structural biology.
Casadio R, Lenhard B, Sternberg MJE, 2021, Computational Resources for Molecular Biology 2021, Journal of Molecular Biology, Vol: 433, ISSN: 0022-2836
David A, Khanna T, Hanna G, et al., 2021, Missense3D-DB web catalogue: an atom-based analysis and repository of 4M human protein-coding genetic variants, Human Genetics, Vol: 140, Pages: 805-812, ISSN: 0340-6717
The interpretation of human genetic variation is one of the greatest challenges of modern genetics. New approaches are urgently needed to prioritize variants, especially those that are rare or lack a definitive clinical interpretation. We examined 10,136,597 human missense genetic variants from GnomAD, ClinVar and UniProt. We were able to perform large-scale atom-based mapping and phenotype interpretation of 3,960,015 of these variants onto 18,874 experimental and 84,818 in house predicted three-dimensional coordinates of the human proteome. We demonstrate that 14% of amino acid substitutions from the GnomAD database that could be structurally analysed are predicted to affect protein structure (n = 568,548, of which 566,439 rare or extremely rare) and may, therefore, have a yet unknown disease-causing effect. The same is true for 19.0% (n = 6266) of variants of unknown clinical significance or conflicting interpretation reported in the ClinVar database. The results of the structural analysis are available in the dedicated web catalogue Missense3D-DB (http://missense3d.bc.ic.ac.uk/). For each of the 4 M variants, the results of the structural analysis are presented in a friendly concise format that can be included in clinical genetic reports. A detailed report of the structural analysis is also available for the non-experts in structural biology. Population frequency and predictions from SIFT and PolyPhen are included for a more comprehensive variant interpretation. This is the first large-scale atom-based structural interpretation of human genetic variation and offers geneticists and the biomedical community a new approach to genetic variant interpretation.
David A, Barbié V, Attimonelli M, et al., 2021, Annotation and curation of human genomic variations: an ELIXIR Implementation Study [version 1; peer review: 1 approved with reservations], F1000Research, Vol: 9, Pages: 1-11, ISSN: 2046-1402
Background: ELIXIR is an intergovernmental organization, primarily based around European countries, established to host life science resources, including databases, software tools, training material and cloud storage for the scientific community under a single infrastructure. Methods: In 2018, ELIXIR commissioned an international survey on the usage of databases and tools for annotating and curating human genomic variants with the aim of improving ELIXIR resources. The 27-question survey was made available on-line between September and December 2018 to rank the importance and explore the usage and limitations of a wide range of databases and tools for annotating and curating human genomic variants, including resources specific for next generation sequencing, research into mitochondria and protein structure. Results: Eighteen countries participated in the survey and a total of 92 questionnaires were collected and analysed. Most respondents (89%, n=82) were from academia or a research environment. 51% (n=47) of respondents gave answers on behalf of a small research group (<10 people), 33% (n=30) in relation to individual work and 16% (n=15) on behalf of a large group (>10 people). The survey showed that the scientific community considers several resources supported by ELIXIR crucial or very important. Moreover, it showed that the work done by ELIXIR is greatly valued. In particular, most respondents acknowledged the importance of key features and benefits promoted by ELIXIR, such as the verified scientific quality and maintenance of ELIXIR-approved resources. Conclusions: ELIXIR is a “one-stop-shop” that helps researchers identify the most suitable, robust and well-maintained bioinformatics resources for delivering their research tasks.
Singh A, Dauzhenka T, Kundrotas PJ, et al., 2020, Application of docking methodologies to modeled proteins, Proteins: Structure, Function, and Bioinformatics, Vol: 88, Pages: 1180-1188, ISSN: 0887-3585
Protein docking is essential for structural characterization of protein interactions. Besides providing the structure of protein complexes, modeling of proteins and their complexes is important for understanding the fundamental principles and specific aspects of protein interactions. The accuracy of protein modeling, in general, is still less than that of the experimental approaches. Thus, it is important to investigate the applicability of docking techniques to modeled proteins. We present new comprehensive benchmark sets of protein models for the development and validation of protein docking, as well as a systematic assessment of free and template‐based docking techniques on these sets. As opposed to previous studies, the benchmark sets reflect the real case modeling/docking scenario where the accuracy of the models is assessed by the modeling procedure, without reference to the native structure (which would be unknown in practical applications). We also expanded the analysis to include docking of protein pairs where proteins have different structural accuracy. The results show that, in general, the template‐based docking is less sensitive to the structural inaccuracies of the models than the free docking. The near‐native docking poses generated by the template‐based approach, typically, also have higher ranks than those produced by the free docking (although the free docking is indispensable in modeling the multiplicity of protein interactions in a crowded cellular environment). The results show that docking techniques are applicable to protein models in a broad range of modeling accuracy. The study provides clear guidelines for practical applications of docking to protein models.
Wodak SJ, Velankar S, Sternberg MJE, 2020, Modeling protein interactions and complexes in CAPRI 7th CAPRI evaluation meeting April 3-5 EMBL-EBI, Hinxton UK., Proteins: Structure, Function, and Bioinformatics, Vol: 88, Pages: 913-915, ISSN: 0887-3585
Mancini A, Howard SR, Marelli F, et al., 2020, LGR4 deficiency results in delayed puberty through impaired Wnt/β-catenin signaling, JCI insight, Vol: 5, Pages: 1-17, ISSN: 2379-3708
The initiation of puberty is driven by an upsurge in hypothalamic gonadotropin-releasing hormone (GnRH) secretion. In turn, GnRH secretion upsurge depends on the development of a complex GnRH neuroendocrine network during embryonic life. Although delayed puberty (DP) affects up to 2% of the population, is highly heritable, and is associated with adverse health outcomes, the genes underlying DP remain largely unknown. We aimed to discover regulators by whole-exome sequencing of 160 individuals of 67 multigenerational families in our large, accurately phenotyped DP cohort. LGR4 was the only gene remaining after analysis that was significantly enriched for potentially pathogenic, rare variants in 6 probands. Expression analysis identified specific Lgr4 expression at the site of GnRH neuron development. LGR4 mutant proteins showed impaired Wnt/β-catenin signaling, owing to defective protein expression, trafficking, and degradation. Mice deficient in Lgr4 had significantly delayed onset of puberty and fewer GnRH neurons compared with WT, whereas lgr4 knockdown in zebrafish embryos prevented formation and migration of GnRH neurons. Further, genetic lineage tracing showed strong Lgr4-mediated Wnt/β-catenin signaling pathway activation during GnRH neuron development. In conclusion, our results show that LGR4 deficiency impairs Wnt/β-catenin signaling with observed defects in GnRH neuron development, resulting in a DP phenotype.
David A, Sternberg M, 2020, Structure, function and variants analysis of the androgen-regulated TMPRSS2, a drug target candidate for COVID-19 infection, bioRxiv
Lenhard B, Sternberg MJE, 2020, Computational Resources for Molecular Biology: Special Issue 2020, Journal of Molecular Biology, Vol: 432, Pages: 3361-3363, ISSN: 0022-2836
David A, 2020, A polygenic biomarker to identify patients with severe hypercholesterolemia of polygenic origin, Molecular Genetics and Genomic Medicine, Vol: 8, Pages: 1-9, ISSN: 2324-9269
Background: Severe hypercholesterolemia (HC, LDL‐C > 4.9 mmol/L) affects over 30 million people worldwide. In this study, we validated a new polygenic risk score (PRS) for LDL‐C. Methods: Summary statistics from the Global Lipid Genome Consortium and genotype data from two large populations were used. Results: A 36‐SNP PRS was generated using data for 2,197 white Americans. In a replication cohort of 4,787 Finns, the PRS was strongly associated with the LDL‐C trait and explained 8% of its variability (p = 10^−41). After risk categorization, the risk of having HC was higher in the high‐ versus low‐risk group (RR = 4.17, p < 1 × 10^−7). Compared to a 12‐SNP LDL‐C raising score (currently used in the United Kingdom), the PRS explained more LDL‐C variability (8% vs. 6%). Among Finns with severe HC, 53% (66/124) versus 44% (55/124) were classified as high risk by the PRS and LDL‐C raising score, respectively. Moreover, 54% of individuals with severe HC defined as low risk by the LDL‐C raising score were reclassified to intermediate or high risk by the new PRS. Conclusion: The new PRS has a better predictive role in identifying HC of polygenic origin compared to the currently available method and can better stratify patients into diagnostic and therapeutic algorithms.
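As an illustration only: the abstract does not say how the 36 SNPs are weighted, so the sketch below assumes the common additive form of a polygenic risk score, in which each SNP contributes its risk-allele count (0, 1 or 2) multiplied by a per-SNP effect size. The function name, the three-SNP genotype and the weights are hypothetical and are not taken from the paper.

```python
def polygenic_risk_score(dosages, weights):
    """Additive PRS: sum over SNPs of risk-allele count times effect size.

    dosages -- risk-allele counts per SNP (0, 1 or 2) for one individual
    weights -- per-SNP effect sizes, in the same order (hypothetical values here)
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    if any(d not in (0, 1, 2) for d in dosages):
        raise ValueError("each dosage must be 0, 1 or 2")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical three-SNP example (a real score would use all SNPs in the panel).
example_score = polygenic_risk_score([0, 1, 2], [0.10, 0.05, 0.20])
```

Individuals would then typically be grouped into risk categories by comparing their score with cut-offs derived from a reference population; those cut-offs are not given in the abstract and are not assumed here.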
Singh A, Dauzhenka T, Kundrotas P, et al., 2020, Application of Docking to Protein Models, 64th Annual Meeting of the Biophysical-Society, Publisher: CELL PRESS, Pages: 360A-360A, ISSN: 0006-3495
PDBe-KB consortium, 2020, PDBe-KB: a community-driven resource for structural and functional annotations, Nucleic Acids Research, Vol: 48, Pages: D344-D353, ISSN: 0305-1048
The Protein Data Bank in Europe-Knowledge Base (PDBe-KB, https://pdbe-kb.org) is a community-driven, collaborative resource for literature-derived, manually curated and computationally predicted structural and functional annotations of macromolecular structure data, contained in the Protein Data Bank (PDB). The goal of PDBe-KB is two-fold: (i) to increase the visibility and reduce the fragmentation of annotations contributed by specialist data resources, and to make these data more findable, accessible, interoperable and reusable (FAIR) and (ii) to place macromolecular structure data in their biological context, thus facilitating their use by the broader scientific community in fundamental and applied research. Here, we describe the guidelines of this collaborative effort, the current status of contributed data, and the PDBe-KB infrastructure, which includes the data exchange format, the deposition system for added value annotations, the distributable database containing the assembled data, and programmatic access endpoints. We also describe a series of novel web pages (the PDBe-KB aggregated views of structure data) which combine information on macromolecular structures from many PDB entries. We have recently released the first set of pages in this series, which provide an overview of available structural and functional information for a protein of interest, referenced by a UniProtKB accession.
Waman VP, Blundell TL, Buchan DWA, et al., 2020, The Genome3D Consortium for Structural Annotations of Selected Model Organisms., Methods Mol Biol, Vol: 2165, Pages: 27-67
The Genome3D consortium is a collaborative project involving protein structure prediction and annotation resources developed by six world-leading structural bioinformatics groups, based in the United Kingdom (namely Blundell, Murzin, Gough, Sternberg, Orengo, and Jones). The main objective of Genome3D is to serve as a common portal that provides both predicted models and annotations of proteins in model organisms, using several resources developed by these labs such as CATH-Gene3D, DOMSERF, pDomTHREADER, PHYRE, SUPERFAMILY, FUGUE/TOCATTA, and VIVACE. These resources primarily use SCOP- and/or CATH-based protein domain assignments. Another objective of Genome3D is to compare structural classifications of protein domains in CATH and SCOP databases and to provide a consensus mapping of CATH and SCOP protein superfamilies. CATH/SCOP mapping analyses led to the identification of a total of 1429 consensus superfamilies. Currently, Genome3D provides structural annotations for ten model organisms, including Homo sapiens, Arabidopsis thaliana, Mus musculus, Escherichia coli, Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, Plasmodium falciparum, Staphylococcus aureus, and Schizosaccharomyces pombe. Thus, Genome3D serves as a common gateway to each structure prediction/annotation resource and allows users to perform comparative assessment of the predictions. It thus helps researchers to broaden their perspective on structure/function predictions of their query protein of interest in selected model organisms.
Leal Ayala LG, David A, Jarvelin MR, et al., 2019, Identification of disease-associated loci using machine learning for genotype and network data integration, Bioinformatics, Vol: 35, Pages: 5182-5190, ISSN: 1367-4803
Motivation: Integration of different omics data could markedly help to identify biological signatures, understand the missing heritability of complex diseases and ultimately achieve personalized medicine. Standard regression models used in Genome-Wide Association Studies (GWAS) identify loci with a strong effect size, whereas GWAS meta-analyses are often needed to capture weak loci contributing to the missing heritability. Development of novel machine learning algorithms for merging genotype data with other omics data is highly needed as it could enhance the prioritization of weak loci. Results: We developed cNMTF (corrected non-negative matrix tri-factorization), an integrative algorithm based on clustering techniques of biological data. This method assesses the inter-relatedness between genotypes, phenotypes, the damaging effect of the variants and gene networks in order to identify loci-trait associations. cNMTF was used to prioritize genes associated with lipid traits in two population cohorts. We replicated 129 genes reported in GWAS world-wide and provided evidence that supports 85% of our findings (226 out of 265 genes), including recent associations in literature (NLGN1), regulators of lipid metabolism (DAB1) and pleiotropic genes for lipid traits (CARM1). Moreover, cNMTF performed efficiently against strong population structures by accounting for the individuals’ ancestry. As the method is flexible in the incorporation of diverse omics data sources, it can be easily adapted to the user’s research needs.
Sillitoe I, Andreeva A, Blundell TL, et al., 2019, Genome3D: integrating a collaborative data pipeline to expand the depth and breadth of consensus protein structure annotation, Nucleic Acids Research, Vol: 48, Pages: D314-D319, ISSN: 0305-1048
Genome3D (https://www.genome3d.eu) is a freely available resource that provides consensus structural annotations for representative protein sequences taken from a selection of model organisms. Since the last NAR update in 2015, the method of data submission has been overhauled, with annotations now being 'pushed' to the database via an API. As a result, contributing groups are now able to manage their own structural annotations, making the resource more flexible and maintainable. The new submission protocol brings a number of additional benefits including: providing instant validation of data and avoiding the requirement to synchronise releases between resources. It also makes it possible to implement the submission of these structural annotations as an automated part of existing internal workflows. In turn, these improvements facilitate Genome3D being opened up to new prediction algorithms and groups. For the latest release of Genome3D (v2.1), the underlying dataset of sequences used as prediction targets has been updated using the latest reference proteomes available in UniProtKB. A number of new reference proteomes have also been added of particular interest to the wider scientific community: cow, pig, wheat and Mycobacterium tuberculosis. These additions, along with improvements to the underlying predictions from contributing resources, have ensured that the number of annotations in Genome3D has nearly doubled since the last NAR update article. The new API has also been used to facilitate the dissemination of Genome3D data into InterPro, thereby widening the visibility of both the annotation data and annotation algorithms.
Lenhard B, Sternberg MJE, 2019, Computation resources for molecular biology: Special issue 2019, Journal of Molecular Biology, Vol: 431, Pages: 2395-2397, ISSN: 0022-2836
Ittisoponpisan S, Islam S, Khanna T, et al., 2019, Can predicted protein 3D-structures provide reliable insights into whether missense variants are disease-associated?, Journal of Molecular Biology, Vol: 431, Pages: 2197-2212, ISSN: 0022-2836
Knowledge of protein structure can be used to predict the phenotypic consequence of a missense variant. Since structural coverage of the human proteome can be roughly tripled to over 50% of the residues if homology-predicted structures are included in addition to experimentally determined coordinates, it is important to assess the reliability of using predicted models when analyzing missense variants. Accordingly, we assess whether a missense variant is structurally damaging by using experimental and predicted structures. We considered 606 experimental structures and show that 40% of the 1965 disease-associated missense variants analyzed have a structurally damaging change in the mutant structure. Only 11% of the 2134 neutral variants are structurally damaging. Importantly, similar results are obtained when 1052 structures predicted using Phyre2 algorithm were used, even when the model shares low (< 40%) sequence identity to the template. Thus, structure-based analysis of the effects of missense variants can be effectively applied to homology models. Our in-house pipeline, Missense3D, for structurally assessing missense variants was made available at http://www.sbg.bio.ic.ac.uk/~missense3d
Ofoegbu T, David A, Kelley L, et al., 2019, PhyreRisk: a dynamic web application to bridge genomics, proteomics and 3D structural data to guide interpretation of human genetic variants, Journal of Molecular Biology, Vol: 431, Pages: 2460-2466, ISSN: 0022-2836
PhyreRisk is an open-access, publicly accessible web application for interactively bridging genomic, proteomic and structural data facilitating the mapping of human variants onto protein structures. A major advance over other tools for sequence-structure variant mapping is that PhyreRisk provides information on 20,214 human canonical proteins and an additional 22,271 alternative protein sequences (isoforms). Specifically, PhyreRisk provides structural coverage (partial or complete) for 70% (14,035 of 20,214 canonical proteins) of the human proteome, by storing 18,874 experimental structures and 84,818 pre-built models of canonical proteins and their isoforms generated using our in house Phyre2. PhyreRisk reports 55,732 experimentally, multi-validated protein interactions from IntAct and 24,260 experimental structures of protein complexes. Another major feature of PhyreRisk is that, rather than presenting a limited set of precomputed variant-structure mapping of known genetic variants, it allows the user to explore novel variants using, as input, genomic coordinates formats (Ensembl, VCF, reference SNP ID and HGVS notations) and Human Build GRCh37 and GRCh38. PhyreRisk also supports mapping variants using amino acid coordinates and searching for genes or proteins of interest. PhyreRisk is designed to empower researchers to translate genetic data into protein structural information, thereby providing a more comprehensive appreciation of the functional impact of variants. PhyreRisk is freely available at http://phyrerisk.bc.ic.ac.uk
Ciezarek AG, Osborne OG, Shipley ON, et al., 2019, Phylotranscriptomic Insights into the Diversification of Endothermic Thunnus Tunas, Molecular Biology and Evolution, Vol: 36, Pages: 84-96, ISSN: 0737-4038
Birds, mammals, and certain fishes, including tunas, opahs and lamnid sharks, are endothermic, conserving internally generated, metabolic heat to maintain body or tissue temperatures above that of the environment. Bluefin tunas are commercially important fishes worldwide, and some populations are threatened. They are renowned for their endothermy, maintaining elevated temperatures of the oxidative locomotor muscle, viscera, brain and eyes, and occupying cold, productive high-latitude waters. Less cold-tolerant tunas, such as yellowfin tuna, by contrast, remain in warm-temperate to tropical waters year-round, reproducing more rapidly than most temperate bluefin tuna populations, providing resiliency in the face of large-scale industrial fisheries. Despite the importance of these traits to not only fisheries but also habitat utilization and responses to climate change, little is known of the genetic processes underlying the diversification of tunas. In collecting and analyzing sequence data across 29,556 genes, we found that parallel selection on standing genetic variation is associated with the evolution of endothermy in bluefin tunas. This includes two shared substitutions in genes encoding glycerol-3 phosphate dehydrogenase, an enzyme that contributes to thermogenesis in bumblebees and mammals, as well as four genes involved in the Krebs cycle, oxidative phosphorylation, β-oxidation, and superoxide removal. Using phylogenetic techniques, we further illustrate that the eight Thunnus species are genetically distinct, but found evidence of mitochondrial genome introgression across two species. Phylogeny-based metrics highlight conservation needs for some of these species.
Ciezarek A, Osbourne O, Shipley ON, et al., 2018, Diversification of characteristics related to regional endothermy in Thunnus tunas, Molecular Biology and Evolution, ISSN: 1537-1719
Birds, mammals, and certain fishes, including tunas, opahs and lamnid sharks, are endothermic, conserving internally generated, metabolic heat to maintain body or tissue temperatures above that of the environment. Bluefin tunas, among the most threatened, but commercially important, fishes worldwide are renowned regional endotherms, maintaining elevated temperatures of the oxidative locomotor muscle, viscera, brain and eyes, and occupying cold, productive high-latitude waters. Less cold-tolerant tuna, such as yellowfin, by contrast, remain in warm-temperate to tropical waters year-round, reproducing more rapidly than temperate bluefin tuna. Thereby, they are more resilient to fisheries, whereas bluefins have declined steeply. Despite the importance of these traits to not only fisheries, but response to climate change, little is known of the genetic processes underlying the diversification of tuna. In collecting and analysing sequence data across 29,556 genes, we found that parallel selection on standing genetic variation has driven the evolution of endothermy in bluefin tunas. This includes two shared substitutions in genes encoding glycerol-3 phosphate dehydrogenase, an enzyme which underlies thermogenesis in bumblebees and mammals, as well as four genes involved in the Krebs cycle, oxidative phosphorylation, β-oxidation and superoxide removal. Using phylogenetic techniques, we further illustrate that the eight Thunnus species are genetically distinct, but found evidence of mitochondrial genome introgression across two species. Phylogeny-based metrics highlight conservation needs for some of these species.
David A, Ittisoponpisan S, Sternberg MJE, 2018, PROTEIN STRUCTURE ANALYSIS AIDS IN THE INTERPRETATION OF GENETIC VARIANTS OF UNCERTAIN CLINICAL SIGNIFICANCE IDENTIFIED IN THE LDL RECEPTOR, HEART UK 32nd Annual Medical and Scientific Conference on Hot Topics in Atherosclerosis and Cardiovascular Disease, Publisher: ELSEVIER IRELAND LTD, Pages: E2-E3, ISSN: 1567-5688
Reynolds CR, Islam S, Sternberg MJE, 2018, EzMol: A web server wizard for the rapid visualisation and image production of protein and nucleic acid structures, Journal of Molecular Biology, Vol: 430, Pages: 2244-2248, ISSN: 0022-2836
EzMol is a molecular visualization Web server in the form of a software wizard, located at http://www.sbg.bio.ic.ac.uk/ezmol/. It is designed for easy and rapid image manipulation and display of protein molecules, and is intended for users who need to quickly produce high-resolution images of protein molecules but do not have the time or inclination to use a software molecular visualization system. EzMol allows the upload of molecular structure files in PDB format to generate a Web page including a representation of the structure that the user can manipulate. EzMol provides intuitive options for chain display, adjusting the color/transparency of residues, side chains and protein surfaces, and for adding labels to residues. The final adjusted protein image can then be downloaded as a high-resolution image. There are a range of applications for rapid protein display, including the illustration of specific areas of a protein structure and the rapid prototyping of images.
Sternberg MJE, Yosef N, 2018, Computation Resources for Molecular Biology: Special Issue 2018, Journal of Molecular Biology, Vol: 430, Pages: 2181-2183, ISSN: 0022-2836
Cornish AJ, David A, Sternberg MJE, 2018, PhenoRank: reducing study bias in gene prioritisation through simulation, Bioinformatics, Vol: 34, Pages: 2087-2095, ISSN: 1367-4803
Motivation: Genome-wide association studies have identified thousands of loci associated with human disease, but identifying the causal genes at these loci is often difficult. Several methods prioritise genes most likely to be disease causing through the integration of biological data, including protein-protein interaction and phenotypic data. Data availability is not the same for all genes however, potentially influencing the performance of these methods. Results: We demonstrate that whilst disease genes tend to be associated with greater numbers of data, this may be at least partially a result of them being better studied. With this observation we develop PhenoRank, which prioritises disease genes whilst avoiding being biased towards genes with more available data. Bias is avoided by comparing gene scores generated for the query disease against gene scores generated using simulated sets of phenotype terms, which ensures that differences in data availability do not affect the ranking of genes. We demonstrate that whilst existing prioritisation methods are biased by data availability, PhenoRank is not similarly biased. Avoiding this bias allows PhenoRank to effectively prioritise genes with fewer available data and improves its overall performance. PhenoRank outperforms three available prioritisation methods in cross-validation (PhenoRank area under receiver operating characteristic curve [AUC]=0.89, DADA AUC=0.87, EXOMISER AUC=0.71, PRINCE AUC=0.83, P < 2.2 × 10^−16). Availability: PhenoRank is freely available for download at https://github.com/alexjcornish/PhenoRank. Contact: email@example.com. Supplementary information: Supplementary data are available at Bioinformatics online.
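The idea of comparing a gene's score for the query disease against scores obtained from simulated phenotype-term sets can be sketched as an empirical p-value. This is a minimal illustration of that general technique, not PhenoRank's actual implementation: the function name, the scores and the (k + 1) / (n + 1) correction are assumptions made for the example.

```python
def empirical_p_value(observed_score, simulated_scores):
    """Share of simulated scores that are at least as large as the observed score.

    The (k + 1) / (n + 1) form is a common convention that keeps the
    p-value above zero; it is an assumption here, not taken from the paper.
    """
    if not simulated_scores:
        raise ValueError("at least one simulated score is required")
    n_as_large = sum(1 for s in simulated_scores if s >= observed_score)
    return (n_as_large + 1) / (len(simulated_scores) + 1)


# Hypothetical scores. A well-studied gene scores highly for almost any
# phenotype set, so its high observed score is unremarkable.
well_studied = empirical_p_value(
    0.90, [0.85, 0.92, 0.88, 0.95, 0.91, 0.89, 0.93, 0.87, 0.90]
)

# A poorly studied gene scores low in every simulation, so even a moderate
# observed score stands out against its own background.
poorly_studied = empirical_p_value(
    0.40, [0.05, 0.10, 0.08, 0.12, 0.07, 0.09, 0.11, 0.06, 0.10]
)
```

Ranking genes by this kind of background-adjusted value, rather than by the raw score, is what stops genes with more available data from being favoured automatically.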
Alhuzimi E, Leal LG, Sternberg MJE, et al., 2017, Properties of human genes guided by their enrichment in rare and common variants, Human Mutation, Vol: 39, Pages: 365-370, ISSN: 1059-7794
We analyzed 563,099 common (minor allele frequency, MAF≥0.01) and rare (MAF < 0.01) genetic variants annotated in ExAC and UniProt and 26,884 disease-causing variants from ClinVar and UniProt occurring in the coding region of 17,975 human protein-coding genes. Three novel sets of genes were identified: those enriched in rare variants (n = 32 genes), in common variants (n = 282 genes), and in disease-causing variants (n = 800 genes). Genes enriched in rare variants have far greater similarities in terms of biological and network properties to genes enriched in disease-causing variants, than to genes enriched in common variants. However, in half of the genes enriched in rare variants (AOC2, MAMDC4, ANKHD1, CDC42BPB, SPAG5, TRRAP, TANC2, IQCH, USP54, SRRM2, DOPEY2, and PITPNM1), no disease-causing variants have been identified in major, publicly available databases. Thus, genetic variants in these genes are strong candidates for disease and their identification, as part of sequencing studies, should prompt further in vitro analyses.
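The common/rare split used above is a simple threshold on minor allele frequency (MAF). A minimal sketch of that rule follows; the function name is hypothetical, and the range check reflects the usual definition of MAF as the frequency of the less common allele, which cannot exceed 0.5.

```python
def classify_by_maf(maf, threshold=0.01):
    """Label a variant 'common' (MAF >= threshold) or 'rare' (MAF < threshold)."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("minor allele frequency must lie between 0 and 0.5")
    return "common" if maf >= threshold else "rare"


# Hypothetical frequencies on either side of the 0.01 cut-off.
labels = [classify_by_maf(f) for f in (0.25, 0.01, 0.0099, 0.0)]
```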
Bryant WA, Stentz R, Le Gall G, et al., 2017, In silico analysis of the small molecule content of outer membrane vesicles produced by Bacteroides thetaiotaomicron indicates an extensive metabolic link between microbe and host, Frontiers in Microbiology, Vol: 8, ISSN: 1664-302X
The interactions between the gut microbiota and its host are of central importance to the health of the host. Outer membrane vesicles (OMVs) are produced ubiquitously by Gram-negative bacteria including the gut commensal Bacteroides thetaiotaomicron. These vesicles can interact with the host in various ways but until now their complement of small molecules has not been investigated in this context. Using an untargeted high-coverage metabolomic approach we have measured the small molecule content of these vesicles in contrasting in vitro conditions to establish what role these metabolites could perform when packed into these vesicles. B. thetaiotaomicron packs OMVs with a highly conserved core set of small molecules which are strikingly enriched with mouse-digestible metabolites and with metabolites previously shown to be associated with colonization of the murine GIT. By use of an expanded genome-scale metabolic model of B. thetaiotaomicron and a potential host (the mouse) we have established many possible metabolic pathways between the two organisms that were previously unknown, and have found several putative novel metabolic functions for mouse that are supported by gene annotations, but that do not currently appear in existing mouse metabolic networks. The lipidome of these OMVs bears no relation to the mouse lipidome, so the purpose of this particular composition of lipids remains unclear. We conclude from this analysis that through intimate symbiotic evolution OMVs produced by B. thetaiotaomicron are likely to have been adopted as a conduit for small molecules bound for the mammalian host in vivo.
Greener J, Sternberg MJE, 2017, Structure-based prediction of protein allostery, Current Opinion in Structural Biology, Vol: 50, Pages: 1-8, ISSN: 0959-440X
Allostery is the functional change at one site on a protein caused by a change at a distant site. In order for the benefits of allostery to be taken advantage of, both for basic understanding of proteins and to develop new classes of drugs, the structure-based prediction of allosteric binding sites, modulators and communication pathways is necessary. Here we review the recently emerging field of allosteric prediction, focusing mainly on computational methods. We also describe the search for cryptic binding pockets and attempts to design allostery into proteins. The development and adoption of such methods is essential or the long-preached potential of allostery will remain elusive.
Waese J, Fan J, Pasha A, et al., 2017, ePlant: Visualizing and Exploring Multiple Levels of Data for Hypothesis Generation in Plant Biology, Plant Cell, Vol: 29, Pages: 1806-1821, ISSN: 1532-298X
A big challenge in current systems biology research arises when different types of data must be accessed from separate sources and visualized using separate tools. The high cognitive load required to navigate such a workflow is detrimental to hypothesis generation. Accordingly, there is a need for a robust research platform that incorporates all data and provides integrated search, analysis, and visualization features through a single portal. Here, we present ePlant (http://bar.utoronto.ca/eplant), a visual analytic tool for exploring multiple levels of Arabidopsis thaliana data through a zoomable user interface. ePlant connects to several publicly available web services to download genome, proteome, interactome, transcriptome, and 3D molecular structure data for one or more genes or gene products of interest. Data are displayed with a set of visualization tools that are presented using a conceptual hierarchy from big to small, and many of the tools combine information from more than one data type. We describe the development of ePlant in this article and present several examples illustrating its integrative features for hypothesis generation. We also describe the process of deploying ePlant as an “app” on Araport. Building on readily available web services, the code for ePlant is freely available for any other biological species research.
Scales M, Chubb D, Dobbins SE, et al., 2017, Search for rare protein altering variants influencing susceptibility to multiple myeloma, Oncotarget, Vol: 8, Pages: 36203-36210, ISSN: 1949-2553
The genetic basis underlying the inherited risk of developing multiple myeloma (MM) is largely unknown. To examine the impact of rare protein altering variants on the risk of developing MM we analyzed high-coverage exome sequencing data on 513 MM cases and 1,569 healthy controls, performing both single variant and gene burden tests. We did not identify any recurrent coding low-frequency alleles (1–5%) with moderate effect that were statistically associated with MM. In a gene burden analysis we did however identify a promising relationship between variation in the marrow kinetochore microtubule stromal gene KIF18A, which plays a role in the control of mitotic chromosome positioning dynamics, and risk of MM (P = 3.6 × 10^−6). Further analysis showed KIF18A displays a distinct pattern of expression across molecular subgroups of MM as well as being associated with patient survival. Our results inform future study design and provide a resource for contextualizing the impact of candidate MM susceptibility genes.
Greener JG, Filippis I, Sternberg MJE, 2017, Predicting protein dynamics and allostery using multi-protein atomic distance constraints, Structure, Vol: 25, Pages: 546-558, ISSN: 1878-4186
The related concepts of protein dynamics, conformational ensembles and allostery are often difficult to study with molecular dynamics (MD) due to the timescales involved. We present ExProSE (Exploration of Protein Structural Ensembles), a distance geometry-based method that generates an ensemble of protein structures from two input structures. ExProSE provides a unified framework for the exploration of protein structure and dynamics in a fast and accessible way. Using a dataset of apo/holo pairs it is shown that existing coarse-grained methods can often not span large conformational changes. For T4-lysozyme ExProSE is able to generate ensembles that are more native-like than tCONCOORD and NMSim, and comparable to targeted MD. By adding additional constraints representing potential modulators, ExProSE can predict allosteric sites. ExProSE ranks an allosteric pocket first or second for 27 out of 58 allosteric proteins, which is similar and complementary to existing methods. The ExProSE source code is freely available.
This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.