Lenhard B, Sternberg MJE, 2019, Computation resources for molecular biology: Special issue 2019, Journal of Molecular Biology, Vol: 431, Pages: 2395-2397, ISSN: 0022-2836
Ittisoponpisan S, Islam S, Khanna T, et al., 2019, Can predicted protein 3D-structures provide reliable insights into whether missense variants are disease-associated?, Journal of Molecular Biology, Vol: 431, Pages: 2197-2212, ISSN: 0022-2836
Knowledge of protein structure can be used to predict the phenotypic consequence of a missense variant. Since structural coverage of the human proteome can be roughly tripled to over 50% of the residues if homology-predicted structures are included in addition to experimentally determined coordinates, it is important to assess the reliability of using predicted models when analyzing missense variants. Accordingly, we assess whether a missense variant is structurally damaging by using experimental and predicted structures. We considered 606 experimental structures and show that 40% of the 1965 disease-associated missense variants analyzed have a structurally damaging change in the mutant structure. Only 11% of the 2134 neutral variants are structurally damaging. Importantly, similar results are obtained when 1052 structures predicted using Phyre2 algorithm were used, even when the model shares low (< 40%) sequence identity to the template. Thus, structure-based analysis of the effects of missense variants can be effectively applied to homology models. Our in-house pipeline, Missense3D, for structurally assessing missense variants was made available at http://www.sbg.bio.ic.ac.uk/~missense3d
Leal Ayala LG, David A, Jarvelin MR, et al., 2019, Identification of disease-associated loci using machine learning for genotype and network data integration, Bioinformatics, ISSN: 1367-4803
Motivation: Integration of different omics data could markedly help to identify biological signatures, understand the missing heritability of complex diseases and ultimately achieve personalized medicine. Standard regression models used in Genome-Wide Association Studies (GWAS) identify loci with a strong effect size, whereas GWAS meta-analyses are often needed to capture weak loci contributing to the missing heritability. Development of novel machine learning algorithms for merging genotype data with other omics data is highly needed as it could enhance the prioritization of weak loci. Results: We developed cNMTF (corrected non-negative matrix tri-factorization), an integrative algorithm based on clustering techniques of biological data. This method assesses the inter-relatedness between genotypes, phenotypes, the damaging effect of the variants and gene networks in order to identify loci-trait associations. cNMTF was used to prioritize genes associated with lipid traits in two population cohorts. We replicated 129 genes reported in GWAS world-wide and provided evidence that supports 85% of our findings (226 out of 265 genes), including recent associations in literature (NLGN1), regulators of lipid metabolism (DAB1) and pleiotropic genes for lipid traits (CARM1). Moreover, cNMTF performed efficiently against strong population structures by accounting for the individuals’ ancestry. As the method is flexible in the incorporation of diverse omics data sources, it can be easily adapted to the user’s research needs.
Ofoegbu T, David A, Kelley L, et al., 2019, PhyreRisk: a dynamic web application to bridge genomics, proteomics and 3D structural data to guide interpretation of human genetic variants, Journal of Molecular Biology, ISSN: 0022-2836
PhyreRisk is an open-access, publicly accessible web application for interactively bridging genomic, proteomic and structural data facilitating the mapping of human variants onto protein structures. A major advance over other tools for sequence-structure variant mapping is that PhyreRisk provides information on 20,214 human canonical proteins and an additional 22,271 alternative protein sequences (isoforms). Specifically, PhyreRisk provides structural coverage (partial or complete) for 70% (14,035 of 20,214 canonical proteins) of the human proteome, by storing 18,874 experimental structures and 84,818 pre-built models of canonical proteins and their isoforms generated using our in-house Phyre2. PhyreRisk reports 55,732 experimentally, multi-validated protein interactions from IntAct and 24,260 experimental structures of protein complexes. Another major feature of PhyreRisk is that, rather than presenting a limited set of precomputed variant-structure mapping of known genetic variants, it allows the user to explore novel variants using, as input, genomic coordinates formats (Ensembl, VCF, reference SNP ID and HGVS notations) and Human Build GRCh37 and GRCh38. PhyreRisk also supports mapping variants using amino acid coordinates and searching for genes or proteins of interest. PhyreRisk is designed to empower researchers to translate genetic data into protein structural information, thereby providing a more comprehensive appreciation of the functional impact of variants. PhyreRisk is freely available at http://phyrerisk.bc.ic.ac.uk
Ciezarek AG, Osborne OG, Shipley ON, et al., 2019, Phylotranscriptomic Insights into the Diversification of Endothermic Thunnus Tunas, MOLECULAR BIOLOGY AND EVOLUTION, Vol: 36, Pages: 84-96, ISSN: 0737-4038
Ciezarek A, Osbourne O, Shipley ON, et al., Diversification of characteristics related to regional endothermy in Thunnus tunas, Molecular Biology and Evolution, ISSN: 1537-1719
Birds, mammals, and certain fishes, including tunas, opahs and lamnid sharks, are endothermic, conserving internally generated, metabolic heat to maintain body or tissue temperatures above that of the environment. Bluefin tunas, among the most threatened, but commercially important, fishes worldwide are renowned regional endotherms, maintaining elevated temperatures of the oxidative locomotor muscle, viscera, brain and eyes, and occupying cold, productive high-latitude waters. Less cold-tolerant tuna, such as yellowfin, by contrast, remain in warm-temperate to tropical waters year-round, reproducing more rapidly than temperate bluefin tuna. Thereby, they are more resilient to fisheries, whereas bluefins have declined steeply. Despite the importance of these traits to not only fisheries, but response to climate change, little is known of the genetic processes underlying the diversification of tuna. In collecting and analysing sequence data across 29,556 genes, we found that parallel selection on standing genetic variation has driven the evolution of endothermy in bluefin tunas. This includes two shared substitutions in genes encoding glycerol-3 phosphate dehydrogenase, an enzyme which underlies thermogenesis in bumblebees and mammals, as well as four genes involved in the Krebs cycle, oxidative phosphorylation, β-oxidation and superoxide removal. Using phylogenetic techniques, we further illustrate that the eight Thunnus species are genetically distinct, but found evidence of mitochondrial genome introgression across two species. Phylogeny-based metrics highlight conservation needs for some of these species.
David A, Ittisoponpisan S, Sternberg MJE, 2018, PROTEIN STRUCTURE ANALYSIS AIDS IN THE INTERPRETATION OF GENETIC VARIANTS OF UNCERTAIN CLINICAL SIGNIFICANCE IDENTIFIED IN THE LDL RECEPTOR, HEART UK 32nd Annual Medical and Scientific Conference on Hot Topics in Atherosclerosis and Cardiovascular Disease, Publisher: ELSEVIER IRELAND LTD, Pages: E2-E3, ISSN: 1567-5688
Reynolds CR, Islam S, Sternberg MJE, 2018, EzMol: A web server wizard for the rapid visualisation and image production of protein and nucleic acid structures, Journal of Molecular Biology, Vol: 430, Pages: 2244-2248, ISSN: 0022-2836
EzMol is a molecular visualization Web server in the form of a software wizard, located at http://www.sbg.bio.ic.ac.uk/ezmol/. It is designed for easy and rapid image manipulation and display of protein molecules, and is intended for users who need to quickly produce high-resolution images of protein molecules but do not have the time or inclination to use a software molecular visualization system. EzMol allows the upload of molecular structure files in PDB format to generate a Web page including a representation of the structure that the user can manipulate. EzMol provides intuitive options for chain display, adjusting the color/transparency of residues, side chains and protein surfaces, and for adding labels to residues. The final adjusted protein image can then be downloaded as a high-resolution image. There are a range of applications for rapid protein display, including the illustration of specific areas of a protein structure and the rapid prototyping of images.
Sternberg MJE, Yosef N, 2018, Computation Resources for Molecular Biology: Special Issue 2018, JOURNAL OF MOLECULAR BIOLOGY, Vol: 430, Pages: 2181-2183, ISSN: 0022-2836
Cornish AJ, David A, Sternberg MJE, 2018, PhenoRank: reducing study bias in gene prioritisation through simulation, Bioinformatics, Vol: 34, Pages: 2087-2095, ISSN: 1367-4803
Motivation: Genome-wide association studies have identified thousands of loci associated with human disease, but identifying the causal genes at these loci is often difficult. Several methods prioritise genes most likely to be disease causing through the integration of biological data, including protein-protein interaction and phenotypic data. Data availability is not the same for all genes however, potentially influencing the performance of these methods. Results: We demonstrate that whilst disease genes tend to be associated with greater numbers of data, this may be at least partially a result of them being better studied. With this observation we develop PhenoRank, which prioritises disease genes whilst avoiding being biased towards genes with more available data. Bias is avoided by comparing gene scores generated for the query disease against gene scores generated using simulated sets of phenotype terms, which ensures that differences in data availability do not affect the ranking of genes. We demonstrate that whilst existing prioritisation methods are biased by data availability, PhenoRank is not similarly biased. Avoiding this bias allows PhenoRank to effectively prioritise genes with fewer available data and improves its overall performance. PhenoRank outperforms three available prioritisation methods in cross-validation (PhenoRank area under receiver operating characteristic curve [AUC]=0.89, DADA AUC=0.87, EXOMISER AUC=0.71, PRINCE AUC=0.83, P < 2.2 × 10⁻¹⁶). Availability: PhenoRank is freely available for download at https://github.com/alexjcornish/PhenoRank. Contact: email@example.com. Supplementary information: Supplementary data are available at Bioinformatics online.
Alhuzimi E, Leal LG, Sternberg MJE, et al., 2017, Properties of human genes guided by their enrichment in rare and common variants, Human Mutation, Vol: 39, Pages: 365-370, ISSN: 1059-7794
We analyzed 563,099 common (minor allele frequency, MAF≥0.01) and rare (MAF < 0.01) genetic variants annotated in ExAC and UniProt and 26,884 disease-causing variants from ClinVar and UniProt occurring in the coding region of 17,975 human protein-coding genes. Three novel sets of genes were identified: those enriched in rare variants (n = 32 genes), in common variants (n = 282 genes), and in disease-causing variants (n = 800 genes). Genes enriched in rare variants have far greater similarities in terms of biological and network properties to genes enriched in disease-causing variants, than to genes enriched in common variants. However, in half of the genes enriched in rare variants (AOC2, MAMDC4, ANKHD1, CDC42BPB, SPAG5, TRRAP, TANC2, IQCH, USP54, SRRM2, DOPEY2, and PITPNM1), no disease-causing variants have been identified in major, publicly available databases. Thus, genetic variants in these genes are strong candidates for disease and their identification, as part of sequencing studies, should prompt further in vitro analyses.
Bryant WA, Stentz R, Le Gall G, et al., 2017, In silico analysis of the small molecule content of outer membrane vesicles produced by Bacteroides thetaiotaomicron indicates an extensive metabolic link between microbe and host, Frontiers in Microbiology, Vol: 8, ISSN: 1664-302X
The interactions between the gut microbiota and its host are of central importance to the health of the host. Outer membrane vesicles (OMVs) are produced ubiquitously by Gram-negative bacteria including the gut commensal Bacteroides thetaiotaomicron. These vesicles can interact with the host in various ways but until now their complement of small molecules has not been investigated in this context. Using an untargeted high-coverage metabolomic approach we have measured the small molecule content of these vesicles in contrasting in vitro conditions to establish what role these metabolites could perform when packed into these vesicles. B. thetaiotaomicron packs OMVs with a highly conserved core set of small molecules which are strikingly enriched with mouse-digestible metabolites and with metabolites previously shown to be associated with colonization of the murine GIT. By use of an expanded genome-scale metabolic model of B. thetaiotaomicron and a potential host (the mouse) we have established many possible metabolic pathways between the two organisms that were previously unknown, and have found several putative novel metabolic functions for mouse that are supported by gene annotations, but that do not currently appear in existing mouse metabolic networks. The lipidome of these OMVs bears no relation to the mouse lipidome, so the purpose of this particular composition of lipids remains unclear. We conclude from this analysis that through intimate symbiotic evolution OMVs produced by B. thetaiotaomicron are likely to have been adopted as a conduit for small molecules bound for the mammalian host in vivo.
Greener J, Sternberg MJE, 2017, Structure-based prediction of protein allostery, Current Opinion in Structural Biology, Vol: 50, Pages: 1-8, ISSN: 0959-440X
Allostery is the functional change at one site on a protein caused by a change at a distant site. In order for the benefits of allostery to be taken advantage of, both for basic understanding of proteins and to develop new classes of drugs, the structure-based prediction of allosteric binding sites, modulators and communication pathways is necessary. Here we review the recently emerging field of allosteric prediction, focusing mainly on computational methods. We also describe the search for cryptic binding pockets and attempts to design allostery into proteins. The development and adoption of such methods is essential or the long-preached potential of allostery will remain elusive.
Waese J, Fan J, Pasha A, et al., 2017, ePlant: Visualizing and Exploring Multiple Levels of Data for Hypothesis Generation in Plant Biology, Plant Cell, Vol: 29, Pages: 1806-1821, ISSN: 1532-298X
A big challenge in current systems biology research arises when different types of data must be accessed from separate sources and visualized using separate tools. The high cognitive load required to navigate such a workflow is detrimental to hypothesis generation. Accordingly, there is a need for a robust research platform that incorporates all data and provides integrated search, analysis, and visualization features through a single portal. Here, we present ePlant (http://bar.utoronto.ca/eplant), a visual analytic tool for exploring multiple levels of Arabidopsis thaliana data through a zoomable user interface. ePlant connects to several publicly available web services to download genome, proteome, interactome, transcriptome, and 3D molecular structure data for one or more genes or gene products of interest. Data are displayed with a set of visualization tools that are presented using a conceptual hierarchy from big to small, and many of the tools combine information from more than one data type. We describe the development of ePlant in this article and present several examples illustrating its integrative features for hypothesis generation. We also describe the process of deploying ePlant as an “app” on Araport. Building on readily available web services, the code for ePlant is freely available for any other biological species research.
Scales M, Chubb D, Dobbins SE, et al., 2017, Search for rare protein altering variants influencing susceptibility to multiple myeloma, Oncotarget, Vol: 8, Pages: 36203-36210, ISSN: 1949-2553
The genetic basis underlying the inherited risk of developing multiple myeloma (MM) is largely unknown. To examine the impact of rare protein altering variants on the risk of developing MM we analyzed high-coverage exome sequencing data on 513 MM cases and 1,569 healthy controls, performing both single variant and gene burden tests. We did not identify any recurrent coding low-frequency alleles (1–5%) with moderate effect that were statistically associated with MM. In a gene burden analysis we did however identify a promising relationship between variation in the marrow kinetochore microtubule stromal gene KIF18A, which plays a role in controlling mitotic chromosome positioning dynamics, and risk of MM (P = 3.6 × 10⁻⁶). Further analysis showed KIF18A displays a distinct pattern of expression across molecular subgroups of MM as well as being associated with patient survival. Our results inform future study design and provide a resource for contextualizing the impact of candidate MM susceptibility genes.
Greener JG, Filippis I, Sternberg MJE, 2017, Predicting protein dynamics and allostery using multi-protein atomic distance constraints, Structure, Vol: 25, Pages: 546-558, ISSN: 1878-4186
The related concepts of protein dynamics, conformational ensembles and allostery are often difficult to study with molecular dynamics (MD) due to the timescales involved. We present ExProSE (Exploration of Protein Structural Ensembles), a distance geometry-based method that generates an ensemble of protein structures from two input structures. ExProSE provides a unified framework for the exploration of protein structure and dynamics in a fast and accessible way. Using a dataset of apo/holo pairs it is shown that existing coarse-grained methods can often not span large conformational changes. For T4-lysozyme ExProSE is able to generate ensembles that are more native-like than tCONCOORD and NMSim, and comparable to targeted MD. By adding additional constraints representing potential modulators, ExProSE can predict allosteric sites. ExProSE ranks an allosteric pocket first or second for 27 out of 58 allosteric proteins, which is similar and complementary to existing methods. The ExProSE source code is freely available.
Sundriyal S, Moniot S, Mahmud Z, et al., 2017, Thienopyrimidinone Based Sirtuin-2 (SIRT2)-Selective Inhibitors Bind in the Ligand Induced Selectivity Pocket, JOURNAL OF MEDICINAL CHEMISTRY, Vol: 60, Pages: 1928-1945, ISSN: 0022-2623
Sirtuins (SIRTs) are NAD-dependent deacylases, known to be involved in a variety of pathophysiological processes and thus remain promising therapeutic targets for further validation. Previously, we reported a novel thienopyrimidinone SIRT2 inhibitor with good potency and excellent selectivity for SIRT2. Herein, we report an extensive SAR study of this chemical series and identify the key pharmacophoric elements and physiochemical properties that underpin the excellent activity observed. New analogues have been identified with submicromolar SIRT2 inhibitory activity and good to excellent SIRT2 subtype-selectivity. Importantly, we report a cocrystal structure of one of our compounds (29c) bound to SIRT2. This reveals our series to induce the formation of a previously reported selectivity pocket but to bind in an inverted fashion to what might be intuitively expected. We believe these findings will contribute significantly to an understanding of the mechanism of action of SIRT2 inhibitors and to the identification of refined, second generation inhibitors.
Ittisoponpisan S, Sternberg MJE, Alhuzimi E, et al., 2017, Landscape of pleiotropic proteins causing human disease: structural and system biology insights, Human Mutation, Vol: 38, Pages: 289-296, ISSN: 1098-1004
Pleiotropy is the phenomenon by which the same gene can result in multiple phenotypes. Pleiotropic proteins are emerging as important contributors to rare and common disorders. Nevertheless, little is known on the mechanisms underlying pleiotropy and the characteristic of pleiotropic proteins. We analysed disease-causing proteins reported in UniProt and observed that 12% are pleiotropic (variants in the same protein cause more than one disease). Pleiotropic proteins were enriched in deleterious and rare variants, but not in common variants. Pleiotropic proteins were more likely to be involved in the pathogenesis of neoplasms, neurological and circulatory diseases, and congenital malformations, whereas non-pleiotropic proteins in endocrine and metabolic disorders. Pleiotropic proteins were more essential and had a higher number of interacting partners compared to non-pleiotropic proteins. Significantly more pleiotropic than non-pleiotropic proteins contained at least one intrinsically long disordered region (p<0.001). Deleterious variants occurring in structurally disordered regions were more commonly found in pleiotropic, rather than non-pleiotropic proteins. In conclusion, pleiotropic proteins are an important contributor to human disease. They represent a biologically different class of proteins compared to non-pleiotropic proteins and a better understanding of their characteristics and genetic variants, can greatly aid in the interpretation of genetic studies and drug design.
Ostankovitch MI, Sternberg MJ, 2016, Computation resources for molecular biology: special issue 2017, Journal of Molecular Biology, Vol: 429, Pages: 345-347, ISSN: 1089-8638
Ainsworth D, Sternberg MJE, Raczy C, et al., 2016, k-SLAM: Accurate and ultra-fast taxonomic classification and gene identification for large metagenomic datasets, Nucleic Acids Research, Vol: 45, Pages: 1649-1656, ISSN: 1362-4962
k-SLAM is a highly efficient algorithm for the characterisation of metagenomic data. Unlike other ultra-fast metagenomic classifiers, full sequence alignment is performed allowing for gene identification and variant calling in addition to accurate taxonomic classification. A k-mer based method provides greater taxonomic accuracy than other classifiers and a three orders of magnitude speed increase over alignment based approaches. The use of alignments to find variants and genes along with their taxonomic origins enables novel strains to be characterised. k-SLAM's speed allows a full taxonomic classification and gene identification to be tractable on modern large datasets. A pseudo-assembly method is used to increase classification accuracy by up to 40% for species which have high sequence homology within their genus.
Jiang Y, Oron TR, Clark WT, et al., 2016, An expanded evaluation of protein function prediction methods shows an improvement in accuracy, Genome Biology, Vol: 17, ISSN: 1474-760X
Background: A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging. Results: We conducted the second critical assessment of functional annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. We evaluated 126 methods from 56 research groups for their ability to predict biological functions using Gene Ontology and gene-disease associations using Human Phenotype Ontology on a set of 3681 proteins from 18 species. CAFA2 featured expanded analysis compared with CAFA1, with regards to data set size, variety, and assessment metrics. To review progress in the field, the analysis compared the best methods from CAFA1 to those of CAFA2. Conclusions: The top-performing methods in CAFA2 outperformed those from CAFA1. This increased accuracy can be attributed to a combination of the growing number of experimental annotations and improved methods for function prediction. The assessment also revealed that the definition of top-performing algorithms is ontology specific, that different performance metrics can be used to probe the nature of accurate predictions, and the relative diversity of predictions in the biological process and human phenotype ontologies. While there was methodological improvement between CAFA1 and CAFA2, the interpretation of results and usefulness of individual methods remain context-dependent.
Metherell LA, Guerra-Assunção JA, Sternberg M, et al., 2016, Three-dimensional model of human Nicotinamide Nucleotide Transhydrogenase (NNT) and sequence-structure analysis of its disease-causing variations, Human Mutation, Vol: 37, Pages: 1074-1084, ISSN: 1098-1004
Defective mitochondrial proteins are emerging as major contributors to human disease. Nicotinamide nucleotide transhydrogenase (NNT), a widely expressed mitochondrial protein, has a crucial role in the defence against oxidative stress. NNT variations have recently been reported in patients with familial glucocorticoid deficiency (FGD) and in patients with heart failure. Moreover, knockout animal models suggest that NNT has a major role in diabetes mellitus and obesity. In this study, we used experimental structures of bacterial transhydrogenases to generate a structural model of human NNT (H-NNT). Structure-based analysis allowed the identification of H-NNT residues forming the NAD binding site, the proton canal and the large interaction site on the H-NNT dimer. In addition, we were able to identify key motifs that allow conformational changes adopted by domain III in relation to its functional status, such as the flexible linker between domains II and III and the salt bridge formed by H-NNT Arg882 and Asp830. Moreover, integration of sequence and structure data allowed us to study the structural and functional effect of deleterious amino acid substitutions causing FGD and left ventricular non-compaction cardiomyopathy. In conclusion, interpretation of the function–structure relationship of H-NNT contributes to our understanding of mitochondrial disorders.
Howard SR, Guasti L, Ruiz-Babot G, et al., 2016, IGSF10 mutations dysregulate gonadotropin-releasing hormone neuronal migration resulting in delayed puberty., EMBO Molecular Medicine, Vol: 8, Pages: 626-642, ISSN: 1757-4676
Early or late pubertal onset affects up to 5% of adolescents and is associated with adverse health and psychosocial outcomes. Self-limited delayed puberty (DP) segregates predominantly in an autosomal dominant pattern, but the underlying genetic background is unknown. Using exome and candidate gene sequencing, we have identified rare mutations in IGSF10 in 6 unrelated families, which resulted in intracellular retention with failure in the secretion of mutant proteins. IGSF10 mRNA was strongly expressed in embryonic nasal mesenchyme, during gonadotropin-releasing hormone (GnRH) neuronal migration to the hypothalamus. IGSF10 knockdown caused a reduced migration of immature GnRH neurons in vitro, and perturbed migration and extension of GnRH neurons in a gnrh3:EGFP zebrafish model. Additionally, loss-of-function mutations in IGSF10 were identified in hypothalamic amenorrhea patients. Our evidence strongly suggests that mutations in IGSF10 cause DP in humans, and points to a common genetic basis for conditions of functional hypogonadotropic hypogonadism (HH). While dysregulation of GnRH neuronal migration is known to cause permanent HH, this is the first time that this has been demonstrated as a causal mechanism in DP.
Sternberg MJE, Ostankovitch MI, 2016, Computation Resources for Molecular Biology: A Special Issue, Journal of Molecular Biology, Vol: 428, Pages: 669-670, ISSN: 1089-8638
Mezulis S, Sternberg MJ, Kelley LA, 2015, PhyreStorm: A web server for fast structural searches against the PDB., Journal of Molecular Biology, Vol: 428, Pages: 702-708, ISSN: 1089-8638
The identification of structurally similar proteins can provide a range of biological insights and accordingly the alignment of a query protein to a database of experimentally-determined protein structures is a technique commonly used in the fields of structural and evolutionary biology. The PhyreStorm web server has been designed to provide comprehensive, up-to-date and rapid structural comparisons against the Protein Data Bank (PDB) combined with a rich and intuitive user interface. It is intended that this facility will enable biologists inexpert in bioinformatics access to a powerful tool for exploring protein structure relationships beyond what can be achieved by sequence analysis alone. By partitioning the PDB into similar structures, PhyreStorm is able to quickly discard the majority of structures that cannot possibly align well to a query protein, reducing the number of alignments required by an order of magnitude. PhyreStorm is capable of finding 93±2% of all highly similar (TM-score >0.7) structures in the PDB for each query structure, usually in under 60 seconds. PhyreStorm is available at http://www.sbg.bio.ic.ac.uk/phyrestorm/.
Greener J, Sternberg MJE, 2015, AlloPred: prediction of allosteric pockets on proteins using normal mode perturbation analysis., BMC Bioinformatics, Vol: 16, ISSN: 1471-2105
Background: Despite being hugely important in biological processes, allostery is poorly understood and no universal mechanism has been discovered. Allosteric drugs are a largely unexplored prospect with many potential advantages over orthosteric drugs. Computational methods to predict allosteric sites on proteins are needed to aid the discovery of allosteric drugs, as well as to advance our fundamental understanding of allostery. Results: AlloPred, a novel method to predict allosteric pockets on proteins, was developed. AlloPred uses perturbation of normal modes alongside pocket descriptors in a machine learning approach that ranks the pockets on a protein. AlloPred ranked an allosteric pocket top for 23 out of 40 known allosteric proteins, showing comparable and complementary performance to two existing methods. In 28 of 40 cases an allosteric pocket was ranked first or second. The AlloPred web server, freely available at http://www.sbg.bio.ic.ac.uk/allopred/home, allows visualisation and analysis of predictions. The source code and dataset information are also available from this site. Conclusions: Perturbation of normal modes can enhance our ability to predict allosteric sites on proteins. Computational methods such as AlloPred assist drug discovery efforts by suggesting sites on proteins for further experimental study.
Cornish AJ, Filippis I, David A, et al., 2015, Exploring the cellular basis of human disease through a large-scale mapping of deleterious genes to cell types, Genome Medicine, Vol: 7, ISSN: 1756-994X
David A, Sternberg MJ, 2015, The contribution of missense mutations in core and rim residues of protein-protein interfaces to human disease., Journal of Molecular Biology, Vol: 427, Pages: 2886-2898, ISSN: 1089-8638
Missense mutations at protein-protein interaction (PPIs) sites, called interfaces, are important contributors to human disease. Interfaces are non-uniform surface areas characterized by two main regions, 'core' and 'rim', which differ in terms of evolutionary conservation and physico-chemical properties. Moreover, within interfaces, only a small subset of residues ('hot spots') is crucial for the binding free energy of the protein-protein complex. We performed a large-scale structural analysis of human single amino acid variations (SAVs) and demonstrated that disease-causing mutations are preferentially located within the interface core, as opposed to the rim (p< 0.01). In contrast, the interface rim is significantly enriched in polymorphisms, similar to the remaining non-interacting surface. Energetic hot spots tend to be enriched in disease-causing mutations compared to non-hot spots (p=0.05), regardless of their occurrence in core or rim residues. For individual amino acids, the frequency of substitution into a polymorphism or disease-causing mutation differed to other amino acids and was related to its structural location, as was the type of physico-chemical change introduced by the SAV. In conclusion, this study demonstrated the different distribution and properties of disease-causing SAVs and polymorphisms within different structural regions and in relation to the energetic contribution of amino acid in protein-protein interfaces, thus highlighting the importance of a structural system biology approach for predicting the effect of SAVs.
Kelley LA, Sternberg MJ, 2015, Partial protein domains: evolutionary insights and bioinformatics challenges., Genome Biology, Vol: 16, Pages: 100-100, ISSN: 1474-760X
Protein domains are generally thought to correspond to units of evolution. New research raises questions about how such domains are defined with bioinformatics tools and sheds light on how evolution has enabled partial domains to be viable.
Kelley LA, Mezulis S, Yates CM, et al., 2015, The Phyre2 web portal for protein modeling, prediction and analysis., Nature Protocols, Vol: 10, Pages: 845-858, ISSN: 1754-2189
Phyre2 is a suite of tools available on the web to predict and analyze protein structure, function and mutations. The focus of Phyre2 is to provide biologists with a simple and intuitive interface to state-of-the-art protein bioinformatics tools. Phyre2 replaces Phyre, the original version of the server for which we previously published a paper in Nature Protocols. In this updated protocol, we describe Phyre2, which uses advanced remote homology detection methods to build 3D models, predict ligand binding sites and analyze the effect of amino acid variants (e.g., nonsynonymous SNPs (nsSNPs)) for a user's protein sequence. Users are guided through results by a simple interface at a level of detail they determine. This protocol will guide users from submitting a protein sequence to interpreting the secondary and tertiary structure of their models, their domain composition and model quality. A range of additional available tools is described to find a protein structure in a genome, to submit large number of sequences at once and to automatically run weekly searches for proteins that are difficult to model. The server is available at http://www.sbg.bio.ic.ac.uk/phyre2. A typical structure prediction will be returned between 30 min and 2 h after submission.