Ebbels T, Segura-Lepe M, Keun H, Predictive modelling using pathway scores: robustness and significance of pathway collections, BMC Bioinformatics, ISSN: 1471-2105
Background: Transcriptomic data is often used to build statistical models which are predictive of a given phenotype, such as disease status. Genes work together in pathways and it is widely thought that pathway representations will be more robust to noise in the gene expression levels. We aimed to test this hypothesis by constructing models based on either genes alone, or based on sample specific scores for each pathway, thus transforming the data to a ‘pathway space’. We progressively degraded the raw data by addition of noise and examined the ability of the models to maintain predictivity. Results: Models in the pathway space indeed had higher predictive robustness than models in the gene space. This result was independent of the workflow, parameters, classifier and data set used. Surprisingly, randomised pathway mappings produced models of similar accuracy and robustness to true mappings, suggesting that the success of pathway space models is not conferred by the specific definitions of the pathway. Instead, predictive models built on the true pathway mappings led to prediction rules with fewer influential pathways than those built on randomised pathways. The extent of this effect was used to differentiate pathway collections coming from a variety of widely used pathway databases. Conclusions: Prediction models based on pathway scores are more robust to degradation of gene expression information than the equivalent models based on ungrouped genes. While models based on true pathway scores are not more robust or accurate than those based on randomised pathways, true pathways produced simpler prediction rules, emphasizing a smaller number of pathways.
Afonso C, Barrow MP, Davies AN, et al., 2019, Data mining and visualisation: general discussion, Faraday Discuss, Vol: 218, Pages: 354-371
Viant MR, Ebbels TMD, Beger RD, et al., 2019, Use cases, best practice and reporting standards for metabolomics in regulatory toxicology, Nature Communications, Vol: 10, ISSN: 2041-1723
Metabolomics is a widely used technology in academic research, yet its application to regulatory science has been limited. The most commonly cited barrier to its translation is lack of performance and reporting standards. The MEtabolomics standaRds Initiative in Toxicology (MERIT) project brings together international experts from multiple sectors to address this need. Here, we identify the most relevant applications for metabolomics in regulatory toxicology and develop best practice guidelines, performance and reporting standards for acquiring and analysing untargeted metabolomics and targeted metabolite data. We recommend that these guidelines are evaluated and implemented for several regulatory use cases.
Tzoulaki I, Karaman I, Dehghan A, et al., Serum metabolic signatures of coronary and carotid atherosclerosis and subsequent cardiovascular disease, European Heart Journal, ISSN: 1522-9645
Aims: To characterise serum metabolic signatures associated with atherosclerosis in the coronary or carotid arteries and subsequently their association with incident cardiovascular disease (CVD). Methods and Results: We used untargeted one-dimensional (1D) serum metabolic profiling by proton (1H) nuclear magnetic resonance (NMR) spectroscopy among 3,867 participants from the Multi-Ethnic Study of Atherosclerosis (MESA), with replication among 3,569 participants from the Rotterdam and LOLIPOP Studies. Atherosclerosis was assessed by coronary artery calcium (CAC) and carotid intima-media thickness (IMT). We used multivariable linear regression to evaluate associations between NMR features and atherosclerosis accounting for multiplicity of comparisons. We then examined associations between metabolites associated with atherosclerosis and incident CVD available in MESA and Rotterdam and explored molecular networks through bioinformatics analyses. Overall, 30 NMR measured metabolites were associated with CAC and/or IMT, P = 1.3×10^-14 to 6.5×10^-6 (discovery), P = 4.2×10^-14 to 4.4×10^-2 (replication). These associations were substantially attenuated after adjustment for conventional cardiovascular risk factors. Metabolites associated with atherosclerosis revealed disturbances in lipid and carbohydrate metabolism, branched-chain and aromatic amino acid metabolism, as well as oxidative stress and inflammatory pathways. Analyses of incident CVD events showed inverse associations with creatine, creatinine and phenylalanine, and direct associations with mannose, acetaminophen-glucuronide and lactate as well as apolipoprotein B (P < 0.05). Conclusion: Metabolites associated with atherosclerosis were largely consistent between the two vascular beds (coronary and carotid arteries) and predominantly tag pathways that overlap with the known cardiovascular risk factors. We present an integrated systems network that highlights a series of inter-connected pathways underlying atherosclero
Gao Q, Dragsted LO, Ebbels T, 2019, Comparison of Bi- and Tri-Linear PLS models for variable selection in metabolomic time-series experiments, Metabolites, Vol: 9, ISSN: 2218-1989
Metabolomic studies with a time-series design are widely used for discovery and validation of biomarkers. In such studies, changes of metabolic profiles over time under different conditions (e.g., control and intervention) are compared, and metabolites responding differently between the conditions are identified as putative biomarkers. To incorporate time-series information into the variable (biomarker) selection in partial least squares regression (PLS) models, we created PLS models with different combinations of bilinear/trilinear X and group/time response dummy Y. In total, five PLS models were evaluated on two real datasets, and also on simulated datasets with varying characteristics (number of subjects, number of variables, inter-individual variability, intra-individual variability and number of time points). Variables showing specific temporal patterns observed visually and determined statistically were labelled as discriminating variables. Bootstrapped-VIP scores were calculated for variable selection and the variable selection performance of the five PLS models was assessed based on their capacity to correctly select the discriminating variables. The results showed that the bilinear PLS model with group × time response as dummy Y provided the highest recall (true positive rate) of 83–95% with high precision, independent of most characteristics of the datasets. Trilinear PLS models tend to select a small number of variables with high precision but relatively high false negative rate (lower power). They are also less affected by noise than bilinear PLS models. In datasets with high inter-individual variability, bilinear PLS models tend to provide higher recall while trilinear models tend to provide higher precision. Overall, we recommend bilinear PLS with group × time response Y for variable selection applications in metabolomics intervention time series studies.
Peters K, Bradbury J, Bergmann S, et al., 2019, PhenoMeNal: Processing and analysis of metabolomics data in the cloud, GigaScience, Vol: 8, ISSN: 2047-217X
Background: Metabolomics is the comprehensive study of a multitude of small molecules to gain insight into an organism's metabolism. The research field is dynamic and expanding with applications across biomedical, biotechnological and many other applied biological domains. Its computationally-intensive nature has driven requirements for open data formats, data repositories and data analysis tools. However, the rapid progress has resulted in a mosaic of independent, and sometimes incompatible, analysis methods that are difficult to connect into a useful and complete data analysis solution. Findings: PhenoMeNal (Phenome and Metabolome aNalysis) is an advanced and complete solution to set up Infrastructure-as-a-Service (IaaS) that brings workflow-oriented, interoperable metabolomics data analysis platforms into the cloud. PhenoMeNal seamlessly integrates a wide array of existing open source tools which are tested and packaged as Docker containers through the project's continuous integration process and deployed based on a Kubernetes orchestration framework. It also provides a number of standardized, automated and published analysis workflows in the user interfaces Galaxy, Jupyter, Luigi and Pachyderm. Conclusions: PhenoMeNal constitutes a keystone solution in cloud e-infrastructures available for metabolomics. PhenoMeNal is a unique and complete solution for setting up cloud e-infrastructures through easy-to-use web interfaces that can be scaled to any custom public and private cloud environment. By harmonizing and automating software installation and configuration and through ready-to-use scientific workflow user interfaces, PhenoMeNal has succeeded in providing scientists with workflow-driven, reproducible and shareable metabolomics data analysis platforms which are interfaced through standard data formats, representative datasets, versioned, and have been tested for reproducibility and interoperability. The elastic implementation of PhenoMeNal further allows easy adap
Chan Q, Lau C-HE, Gibson R, et al., 2019, Relationships of Dietary and Supplement Magnesium Intake and Its Urinary Metabolomic Biomarkers With Blood Pressure: The INTERMAP Study, Scientific Sessions of the American-Heart-Association on Epidemiology and Prevention/Lifestyle and Cardiometabolic Health, Publisher: LIPPINCOTT WILLIAMS & WILKINS, ISSN: 0009-7322
Gibson R, Lau C-HE, Chan Q, et al., 2019, Cross-Sectional Investigation of the Relationship Between Fish Consumption and Its Urinary Biomarkers With Blood Pressure Across Asian and Western Populations: Results From the INTERMAP Study, Scientific Sessions of the American-Heart-Association on Epidemiology and Prevention/Lifestyle and Cardiometabolic Health, Publisher: LIPPINCOTT WILLIAMS & WILKINS, ISSN: 0009-7322
Gibson R, Lau C, Loo RL, et al., The association of fish consumption and its urinary metabolites with cardiovascular risk factors: The International Study of Macro-/Micronutrients and Blood Pressure (INTERMAP), American Journal of Clinical Nutrition, ISSN: 0002-9165
Ebbels TMD, Karaman I, Graça G, 2019, Processing and Analysis of Untargeted Multicohort NMR Data, Pages: 453-470
NMR data from large studies combining multiple cohorts is becoming common in large-scale metabolomics. The data size and combination of cohorts with diverse properties leads to special problems for data processing and analysis. These include alignment, normalization, detection and removal of outliers, presence of strong correlations, and the identification of unknowns. Nonetheless, these challenges can be addressed with suitable algorithms and techniques, leading to enhanced data sets ripe for further data mining.
Peluso A, Ebbels T, Glen R, 2018, Empirical estimation of permutation-based metabolome-wide significance thresholds, Publisher: bioRxiv
A key issue in the omics literature is the search for statistically significant relationships between molecular markers and phenotype. The aim is to detect disease-related discriminatory features while controlling for false positive associations at adequate power. Metabolome-wide association studies have revealed significant relationships of metabolic phenotypes with disease risk by analysing hundreds to tens of thousands of molecular variables leading to multivariate data which are highly noisy and collinear. In this context, Bonferroni or Sidak corrections are rather useful as these are valid for independent tests, while permutation procedures allow for the estimation of p-values from the null distribution without assuming independence among features. Nevertheless, under the permutation approach the distribution of p-values may present systematic deviations from the theoretical null distribution, which leads to a biased adjusted threshold estimate, e.g. smaller than a Bonferroni or Sidak correction. We make use of parametric approximation methods based on a multivariate Normal distribution to derive stable estimates of the metabolome-wide significance level within a univariate approach based on a permutation procedure which effectively controls the maximum overall type I error rate at the α level. We illustrate the results for different model parametrizations and distributional features of the outcome measure, as well as for diverse correlation levels within the features and between the features and the phenotype in real data and simulated studies. MWSL is the open-source R software package for the empirical estimation of the metabolomic-wide significance level available at https://github.com/AlinaPeluso/MWSL.
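As an illustration of the plain permutation idea mentioned in this abstract, the following is a minimal Python sketch, not the MWSL package's implementation and without the multivariate Normal approximation the authors describe. It assumes a simple Pearson-correlation test per feature with a Fisher z normal approximation for the p-value; the function names and parameters are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value via the Fisher z-transform (a normal approximation)."""
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def permutation_mwsl(features, outcome, alpha=0.05, n_perm=200, seed=0):
    """Estimate a metabolome-wide significance level by permutation.

    features: list of feature columns, each a list with one value per sample.
    For each permutation the outcome is shuffled (breaking any association
    while keeping the correlation among features), and the smallest p-value
    across all features is recorded. The alpha-quantile of these minima is
    the per-test threshold that controls the family-wise error rate at alpha.
    """
    rng = random.Random(seed)
    n = len(outcome)
    min_ps = []
    for _ in range(n_perm):
        shuffled = outcome[:]
        rng.shuffle(shuffled)
        min_ps.append(min(approx_p_value(pearson_r(f, shuffled), n)
                          for f in features))
    min_ps.sort()
    k = max(int(math.floor(alpha * n_perm)) - 1, 0)
    return min_ps[k]
```

Dividing alpha by the returned threshold gives an "effective number of independent tests", which for collinear features is typically smaller than the actual number of features.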
Kamp H, Beger R, Dorne J-LCM, et al., 2018, MEtabolomics standaRds Initiative in Toxicology (MERIT), 54th Congress of the European-Societies-of-Toxicology (EUROTOX) - Toxicology Out of the Box, Publisher: ELSEVIER IRELAND LTD, Pages: S214-S214, ISSN: 0378-4274
Ye L, De Iorio M, Ebbels TMD, 2018, Bayesian estimation of the number of protonation sites for urinary metabolites from NMR spectroscopic data, Metabolomics, Vol: 14, ISSN: 1573-3882
Introduction: To aid the development of better algorithms for 1H NMR data analysis, such as alignment or peak-fitting, it is important to characterise and model chemical shift changes caused by variation in pH. The number of protonation sites, a key parameter in the theoretical relationship between pH and chemical shift, is traditionally estimated from the molecular structure, which is often unknown in untargeted metabolomics applications. Objective: We aim to use observed NMR chemical shift titration data to estimate the number of protonation sites for a range of urinary metabolites. Methods: A pool of urine from healthy subjects was titrated in the range pH 2–12, standard 1H NMR spectra were acquired and positions of 51 peaks (corresponding to 32 identified metabolites) were recorded. A theoretical model of chemical shift was fit to the data using a Bayesian statistical framework, using model selection procedures in a Markov Chain Monte Carlo algorithm to estimate the number of protonation sites for each molecule. Results: The estimated number of protonation sites was found to be correct for 41 out of 51 peaks. In some cases, the number of sites was incorrectly estimated, due to very close pKa values or a limited amount of data in the required pH range. Conclusions: Given appropriate data, it is possible to estimate the number of protonation sites for many metabolites typically observed in 1H NMR metabolomics without knowledge of the molecular structure. This approach may be a valuable resource for the development of future automated metabolite alignment, annotation and peak fitting algorithms.
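For orientation, the relationship between pH and chemical shift for the simplest case, a single protonation site in fast exchange, can be sketched in Python as below. This is the textbook Henderson-Hasselbalch form and is only assumed to resemble the theoretical model referred to in the abstract; the function name and the numerical values in the usage lines are hypothetical.

```python
def observed_shift(ph, pka, shift_protonated, shift_deprotonated):
    """Observed chemical shift (ppm) for one protonation site in fast exchange.

    The observed shift is the population-weighted average of the shifts of
    the protonated and deprotonated forms, with the protonated fraction
    given by the Henderson-Hasselbalch relation.
    """
    fraction_protonated = 1.0 / (1.0 + 10.0 ** (ph - pka))
    return (shift_deprotonated
            + (shift_protonated - shift_deprotonated) * fraction_protonated)


# Hypothetical titration curve over pH 2 to 12 for a peak with pKa 4.8.
curve = [(ph, observed_shift(ph, 4.8, 2.08, 1.92)) for ph in range(2, 13)]
```

With more than one protonation site the curve shows several such transitions, which is why the number of sites can in principle be inferred from titration data alone.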
Kaluarachchi M, Boulangé C, Karaman I, et al., 2018, A comparison of human serum and plasma metabolites using untargeted 1H NMR spectroscopy and UPLC-MS, Metabolomics, Vol: 14, ISSN: 1573-3882
Introduction: Differences in the metabolite profiles between serum and plasma are incompletely understood. Objectives: To evaluate metabolic profile differences between serum and plasma and among plasma sample subtypes. Methods: We analyzed serum, platelet rich plasma (PRP), platelet poor plasma (PPP), and platelet free plasma (PFP), collected from 8 non-fasting apparently healthy women, using untargeted standard 1D and CPMG 1H NMR and reverse phase and hydrophilic (HILIC) UPLC-MS. Differences between metabolic profiles were evaluated using validated principal component and orthogonal partial least squares discriminant analysis. Results: Explorative analysis showed the main source of variation among samples was due to inter-individual differences with no grouping by sample type. After correcting for inter-individual differences, lipoproteins, lipids in VLDL/LDL, lactate, glutamine, and glucose were found to discriminate serum from plasma in NMR analyses. In UPLC-MS analyses, lysophosphatidylethanolamine (lysoPE)(18:0) and lysophosphatidic acid(20:0) were higher in serum, and phosphatidylcholines (PC)(16:1/18:2, 20:3/18:0, O-20:0/22:4), lysoPC(16:0), PE(O-18:2/20:4), sphingomyelin(18:0/22:0), and linoleic acid were lower. In plasma subtype analyses, isoleucine, leucine, valine, phenylalanine, glutamate, and pyruvate were higher among PRP samples compared with PPP and PFP by NMR while lipids in VLDL/LDL, citrate, and glutamine were lower. By UPLC-MS, PE(18:0/18:2) and PC(P-16:0/20:4) were higher in PRP compared with PFP samples. Conclusions: Correction for inter-individual variation was required to detect metabolite differences between serum and plasma. Our results suggest the potential importance of inter-individual effects and sample type on the results from serum and plasma metabolic phenotyping studies.
Posma JM, Garcia Perez I, Ebbels TMD, et al., 2018, Optimized phenotypic biomarker discovery and confounder elimination via covariate-adjusted projection to latent structures from metabolic spectroscopy data, Journal of Proteome Research, Vol: 17, Pages: 1586-1595, ISSN: 1535-3893
Metabolism is altered by genetics, diet, disease status, environment and many other factors. Modelling either one of these is often done without considering the effects of the other covariates. Attributing differences in metabolic profile to one of these factors needs to be done while controlling for the metabolic influence of the rest. We describe here a data analysis framework and novel confounder-adjustment algorithm for multivariate analysis of metabolic profiling data. Using simulated data we show that similar numbers of true associations and significantly fewer false positives are found compared to other commonly used methods. Covariate-Adjusted Projections to Latent Structures (CA-PLS) is exemplified here using a large-scale metabolic phenotyping study of two Chinese populations at different risks for cardiovascular disease. Using CA-PLS we find that some previously reported differences are actually associated with external factors and discover a number of previously unreported biomarkers linked to different metabolic pathways. CA-PLS can be applied to any multivariate data where confounding may be an issue and the confounder-adjustment procedure is translatable to other multivariate regression techniques.
Harada S, Hirayama A, Chan Q, et al., 2018, Reliability of plasma polar metabolite concentrations in a large-scale cohort study using capillary electrophoresis-mass spectrometry, PLoS ONE, Vol: 13, ISSN: 1932-6203
BACKGROUND: Cohort studies with metabolomics data are becoming more widespread; however, large-scale studies involving 10,000s of participants are still limited, especially in Asian populations. Therefore, we started the Tsuruoka Metabolomics Cohort Study enrolling 11,002 community-dwelling adults in Japan, and using capillary electrophoresis-mass spectrometry (CE-MS) and liquid chromatography-mass spectrometry. The CE-MS method is highly amenable to absolute quantification of polar metabolites; however, its reliability for large-scale measurement is unclear. The aim of this study is to examine reproducibility and validity of large-scale CE-MS measurements. In addition, the study presents absolute concentrations of polar metabolites in human plasma, which can be used in future as reference ranges in a Japanese population. METHODS: Metabolomic profiling of 8,413 fasting plasma samples was completed using CE-MS, and 94 polar metabolites were structurally identified and quantified. Quality control (QC) samples were injected every ten samples and assessed throughout the analysis. Inter- and intra-batch coefficients of variation of QC and participant samples, and technical intraclass correlation coefficients were estimated. Passing-Bablok regression of plasma concentrations by CE-MS on serum concentrations by standard clinical chemistry assays was conducted for creatinine and uric acid. RESULTS AND CONCLUSIONS: In QC samples, coefficient of variation was less than 20% for 64 metabolites, and less than 30% for 80 metabolites out of the 94 metabolites. Inter-batch coefficient of variation was less than 20% for 81 metabolites. Estimated technical intraclass correlation coefficient was above 0.75 for 67 metabolites. The slope of Passing-Bablok regression was estimated as 0.97 (95% confidence interval: 0.95, 0.98) for creatinine and 0.95 (0.92, 0.96) for uric acid. Compared to published data from other large cohort measurement platforms, reproducibility of metabolites common
Ebbels TMD, Rodriguez-Martinez A, Dumas M-E, et al., 2018, Advances in Computational Analysis of Metabolomic NMR Data, NMR-based Metabolomics
Tan LSL, Jasra A, De Iorio M, et al., 2017, Bayesian inference for multiple Gaussian graphical models with application to metabolic association networks, Annals of Applied Statistics, Vol: 11, Pages: 2222-2251, ISSN: 1932-6157
We investigate the effect of cadmium (a toxic environmental pollutant) on the correlation structure of a number of urinary metabolites using Gaussian graphical models (GGMs). The inferred metabolic associations can provide important information on the physiological state of a metabolic system and insights on complex metabolic relationships. Using the fitted GGMs, we construct differential networks, which highlight significant changes in metabolite interactions under different experimental conditions. The analysis of such metabolic association networks can reveal differences in the underlying biological reactions caused by cadmium exposure. We consider Bayesian inference and propose using the multiplicative (or Chung–Lu random graph) model as a prior on the graphical space. In the multiplicative model, each edge is chosen independently with probability equal to the product of the connectivities of the end nodes. This class of prior is parsimonious yet highly flexible; it can be used to encourage sparsity or graphs with a pre-specified degree distribution when such prior knowledge is available. We extend the multiplicative model to multiple GGMs linking the probability of edge inclusion through logistic regression and demonstrate how this leads to joint inference for multiple GGMs. A sequential Monte Carlo (SMC) algorithm is developed for estimating the posterior distribution of the graphs.
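The multiplicative (Chung–Lu) prior described above, in which each edge is included independently with probability equal to the product of the connectivities of its end nodes, can be sketched in Python as follows. This is only an illustrative sketch of that one sentence, not the authors' SMC algorithm or their extension to multiple GGMs; the function names are hypothetical, and connectivities are assumed to lie strictly between 0 and 1.

```python
import math
import random
from itertools import combinations


def edge_probabilities(connectivity):
    """Map each node pair (i, j) with i < j to its edge inclusion probability."""
    return {(i, j): connectivity[i] * connectivity[j]
            for i, j in combinations(range(len(connectivity)), 2)}


def sample_graph(connectivity, rng):
    """Draw one graph from the prior: each edge is included independently."""
    return {edge for edge, p in edge_probabilities(connectivity).items()
            if rng.random() < p}


def log_prior(edges, connectivity):
    """Log prior probability of a given edge set under the multiplicative model."""
    total = 0.0
    for edge, p in edge_probabilities(connectivity).items():
        total += math.log(p) if edge in edges else math.log(1.0 - p)
    return total


# Small connectivities make every edge unlikely, which encourages sparse graphs.
sparse_graph = sample_graph([0.1, 0.2, 0.1, 0.3], random.Random(0))
```

Because the prior needs only one parameter per node rather than one per edge, it stays parsimonious while still allowing hub nodes (high connectivity) and near-isolated nodes (low connectivity).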
Metabolomics, the youngest of the major omics technologies, is supported by an active community of researchers and infrastructure developers across Europe. To coordinate and focus efforts around infrastructure building for metabolomics within Europe, a workshop on the "Future of metabolomics in ELIXIR" was organised at Frankfurt Airport in Germany. This one-day strategic workshop involved representatives of ELIXIR Nodes, members of the PhenoMeNal consortium developing an e-infrastructure that supports workflow-based metabolomics analysis pipelines, and experts from the international metabolomics community. The workshop established metabolite identification as the critical area, where a maximal impact of computational metabolomics and data management on other fields could be achieved. In particular, the existing four ELIXIR Use Cases, where the metabolomics community - both industry and academia - would benefit most, and which could be exhaustively mapped onto the current five ELIXIR Platforms were discussed. This opinion article is a call for support for a new ELIXIR metabolomics Use Case, which aligns with and complements the existing and planned ELIXIR Platforms and Use Cases.
Schober D, Jacob D, Wilson M, et al., 2017, nmrML: A Community Supported Open Data Standard for the Description, Storage, and Exchange of NMR Data, Analytical Chemistry, Vol: 90, Pages: 649-656, ISSN: 0003-2700
NMR is a widely used analytical technique with a growing number of repositories available. As a result, demands for a vendor-agnostic, open data format for long-term archiving of NMR data have emerged with the aim to ease and encourage sharing, comparison, and reuse of NMR data. Here we present nmrML, an open XML-based exchange and storage format for NMR spectral data. The nmrML format is intended to be fully compatible with existing NMR data for chemical, biochemical, and metabolomics experiments. nmrML can capture raw NMR data, spectral data acquisition parameters, and where available spectral metadata, such as chemical structures associated with spectral assignments. The nmrML format is compatible with pure-compound NMR data for reference spectral libraries as well as NMR data from complex biomixtures, i.e., metabolomics experiments. To facilitate format conversions, we provide nmrML converters for Bruker, JEOL and Agilent/Varian vendor formats. In addition, easy-to-use Web-based spectral viewing, processing, and spectral assignment tools that read and write nmrML have been developed. Software libraries and Web services for data validation are available for tool developers and end-users. The nmrML format has already been adopted for capturing and disseminating NMR data for small molecules by several open source data processing tools and metabolomics reference spectral libraries, e.g., serving as storage format for the MetaboLights data repository. The nmrML open access data standard has been endorsed by the Metabolomics Standards Initiative (MSI), and we here encourage user participation and feedback to increase usability and make it a successful standard.
Kauffmann H-M, Kamp H, Fuchs R, et al., 2017, Framework for the quality assurance of 'omics technologies considering GLP requirements, Regulatory Toxicology and Pharmacology, Vol: 91, Pages: S27-S35, ISSN: 0273-2300
‘Omics technologies are gaining importance to support regulatory toxicity studies. Prerequisites for performing ‘omics studies considering GLP principles were discussed at the European Centre for Ecotoxicology and Toxicology of Chemicals (ECETOC) Workshop Applying ‘omics technologies in Chemical Risk Assessment. A GLP environment comprises a standard operating procedure system, proper pre-planning and documentation, and inspections of independent quality assurance staff. To prevent uncontrolled data changes, the raw data obtained in the respective ‘omics data recording systems have to be specifically defined. Further requirements include transparent and reproducible data processing steps, and safe data storage and archiving procedures. The software for data recording and processing should be validated, and data changes should be traceable or disabled. GLP-compliant quality assurance of ‘omics technologies appears feasible for many GLP requirements. However, challenges include (i) defining, storing, and archiving the raw data; (ii) transparent descriptions of data processing steps; (iii) software validation; and (iv) ensuring complete reproducibility of final results with respect to raw data. Nevertheless, ‘omics studies can be supported by quality measures (e.g., GLP principles) to ensure quality control, reproducibility and traceability of experiments. This enables regulators to use ‘omics data in a fit-for-purpose context, which enhances their applicability for risk assessment.
Buesen R, Chorley BN, Lima BDS, et al., 2017, Applying 'omics technologies in chemicals risk assessment: Report of an ECETOC workshop, Regulatory Toxicology and Pharmacology, Vol: 91, Pages: S3-S13, ISSN: 0273-2300
Prevailing knowledge gaps in linking specific molecular changes to apical outcomes and methodological uncertainties in the generation, storage, processing, and interpretation of 'omics data limit the application of 'omics technologies in regulatory toxicology. Against this background, the European Centre for Ecotoxicology and Toxicology of Chemicals (ECETOC) convened a workshop Applying 'omics technologies in chemicals risk assessment that is reported herein. Ahead of the workshop, multi-expert teams drafted frameworks on best practices for (i) a Good-Laboratory Practice-like context for collecting, storing and curating 'omics data; (ii) the processing of 'omics data; and (iii) weight-of-evidence approaches for integrating 'omics data. The workshop participants confirmed the relevance of these Frameworks to facilitate the regulatory applicability and use of 'omics data, and the workshop discussions provided input for their further elaboration. Additionally, the key objective (iv) to establish approaches to connect 'omics perturbations to phenotypic alterations was addressed. Generally, it was considered promising to strive to link gene expression changes and pathway perturbations to the phenotype by mapping them to specific adverse outcome pathways. While further work is necessary before gene expression changes can be used to establish safe levels of substance exposure, the ECETOC workshop provided important incentives towards achieving this goal.
Castagne R, Boulange CL, Karaman I, et al., 2017, Improving visualisation and interpretation of metabolome-wide association studies (MWAS): an application in a population-based cohort using untargeted 1H NMR metabolic profiling, Journal of Proteome Research, Vol: 16, Pages: 3623-3633, ISSN: 1535-3893
1H NMR spectroscopy of biofluids generates reproducible data allowing detection and quantification of small molecules in large population cohorts. Statistical models to analyze such data are now well-established, and the use of univariate metabolome-wide association studies (MWAS) investigating the spectral features separately has emerged as a computationally efficient and interpretable alternative to multivariate models. The MWAS rely on the accurate estimation of a metabolome-wide significance level (MWSL) to be applied to control the family-wise error rate. Subsequent interpretation requires efficient visualization and formal feature annotation, which, in turn, call for efficient prioritization of spectral variables of interest. Using human serum 1H NMR spectroscopic profiles from 3948 participants from the Multi-Ethnic Study of Atherosclerosis (MESA), we have performed a series of MWAS for serum levels of glucose. We first propose an extension of the conventional MWSL that yields stable estimates of the MWSL across the different model parameterizations and distributional features of the outcome. We propose both efficient visualization methods and a strategy based on subsampling and internal validation to prioritize the associations. Our work proposes and illustrates practical and scalable solutions to facilitate the implementation of the MWAS approach and improve interpretation in large cohort studies.
Weber RJM, Lawson TN, Salek RM, et al., 2016, Computational tools and workflows in metabolomics: An international survey highlights the opportunity for harmonisation through Galaxy, Metabolomics, Vol: 13, ISSN: 1573-3890
Chan Q, Loo RL, Ebbels TMD, et al., 2016, Metabolic phenotyping for discovery of urinary biomarkers of diet, xenobiotics and blood pressure in the INTERMAP Study: An overview, Hypertension Research, Vol: 40, Pages: 336-345, ISSN: 1348-4214
The aetiopathogenesis of cardiovascular diseases (CVD) is multifactorial. Adverse blood pressure (BP) is a major independent risk factor for epidemic CVD affecting about 40% of the adult population worldwide and resulting in significant morbidity and mortality. Metabolic phenotyping of biological fluids has proven its application in characterising low molecular weight metabolites providing novel insights into gene-environmental-gut microbiome interaction in relation to a disease state. In this review, we synthesise key results from the International Study of Macro/Micronutrients and Blood Pressure (INTERMAP) Study, a cross-sectional epidemiological study of 4,680 men and women aged 40-59 years from Japan, the People’s Republic of China, the United Kingdom, and the United States. We describe the advancements we have made on: 1) analytical techniques for high throughput metabolic phenotyping; 2) statistical analyses for biomarker identification; 3) discovery of unique food-specific biomarkers; and 4) application of metabolome-wide association (MWA) studies to gain a better understanding into the molecular mechanisms of cross cultural and regional BP differences.
Oude Griep LM, Chekmeneva E, Stamler J, et al., 2016, Urinary hippurate and proline betaine relative to fruit intake, blood pressure, and body mass index, Summer meeting 2016: New technology in nutrition research and practice, Publisher: Cambridge University Press (CUP), Pages: E178-E178, ISSN: 0029-6651
Karaman I, Ferreira DL, Boulange CL, et al., 2016, A workflow for integrated processing of multi-cohort untargeted 1H NMR metabolomics data in large scale metabolic epidemiology, Journal of Proteome Research, Vol: 15, Pages: 4188-4194, ISSN: 1535-3907
Large-scale metabolomics studies involving thousands of samples present multiple challenges in data analysis, particularly when an untargeted platform is used. Studies with multiple cohorts and analysis platforms exacerbate existing problems such as peak alignment and normalization. Therefore, there is a need for robust processing pipelines which can ensure reliable data for statistical analysis. The COMBI-BIO project incorporates serum from approximately 8000 individuals, in 3 cohorts, profiled by 6 assays in 2 phases using both 1H-NMR and UPLC-MS. Here we present the COMBI-BIO NMR analysis pipeline and demonstrate its fitness for purpose using representative quality control (QC) samples. NMR spectra were first aligned and normalized. After eliminating interfering signals, outliers identified using Hotelling’s T2 were removed and a cohort/phase adjustment was applied, resulting in two NMR datasets (CPMG and NOESY). Alignment of the NMR data was shown to increase the correlation-based alignment quality measure from 0.319 to 0.391 for CPMG and from 0.536 to 0.586 for NOESY, showing that the improvement was present across both large and small peaks. End-to-end quality assessment of the pipeline was achieved using Hotelling’s T2 distributions. For CPMG spectra, the interquartile range decreased from 1.425 in raw QC data to 0.679 in processed spectra, while the corresponding change for NOESY spectra was from 0.795 to 0.636 indicating an improvement in precision following processing. PCA indicated that gross phase and cohort differences were no longer present. These results illustrate that the pipeline produces robust and reproducible data, successfully addressing the methodological challenges of this large multi-faceted study.
Tredwell GD, Bundy JG, De Iorio M, et al., 2016, Modelling the acid/base 1H NMR chemical shift limits of metabolites in human urine, Metabolomics, Vol: 12, ISSN: 1573-3890
INTRODUCTION: Despite the use of buffering agents the 1H NMR spectra of biofluid samples in metabolic profiling investigations typically suffer from extensive peak frequency shifting between spectra. These chemical shift changes are mainly due to differences in pH and divalent metal ion concentrations between the samples. This frequency shifting results in a correspondence problem: it can be hard to register the same peak as belonging to the same molecule across multiple samples. The problem is especially acute for urine, which can have a wide range of ionic concentrations between different samples. OBJECTIVES: To investigate the acid, base and metal ion dependent 1H NMR chemical shift variations and limits of the main metabolites in a complex biological mixture. METHODS: Urine samples from five different individuals were collected and pooled, and pre-treated with Chelex-100 ion exchange resin. Urine samples were treated with either HCl or NaOH, or were supplemented with various concentrations of CaCl2, MgCl2, NaCl or KCl, and their 1H NMR spectra were acquired. RESULTS: Nonlinear fitting was used to derive acid dissociation constants and acid and base chemical shift limits for peaks from 33 identified metabolites. Peak pH titration curves for a further 65 unidentified peaks were also obtained for future reference. Furthermore, the peak variations induced by the main metal ions present in urine, Na+, K+, Ca2+ and Mg2+, were also measured. CONCLUSION: These data will be a valuable resource for 1H NMR metabolite profiling experiments and for the development of automated metabolite alignment and identification algorithms for 1H NMR spectra.
Blaise B, Correia G, Tin A, et al., 2016, A novel method for power analysis and sample size determination in metabolic phenotyping, Analytical Chemistry, Vol: 88, Pages: 5179-5188, ISSN: 1520-6882
Estimation of statistical power and sample size is a key aspect of experimental design. However, in metabolic phenotyping, there is currently no accepted approach for these tasks, in large part due to the unknown nature of the expected effect. In such hypothesis free science, neither the number or class of important analytes nor the effect size are known a priori. We introduce a new approach, based on multivariate simulation, which deals effectively with the highly correlated structure and high-dimensionality of metabolic phenotyping data. First, a large data set is simulated based on the characteristics of a pilot study investigating a given biomedical issue. An effect of a given size, corresponding either to a discrete (classification) or continuous (regression) outcome is then added. Different sample sizes are modeled by randomly selecting data sets of various sizes from the simulated data. We investigate different methods for effect detection, including univariate and multivariate techniques. Our framework allows us to investigate the complex relationship between sample size, power, and effect size for real multivariate data sets. For instance, we demonstrate for an example pilot data set that certain features achieve a power of 0.8 for a sample size of 20 samples or that a cross-validated predictivity Q2Y of 0.8 is reached with an effect size of 0.2 and 200 samples. We exemplify the approach for both nuclear magnetic resonance and liquid chromatography–mass spectrometry data from humans and the model organism C. elegans.
David R, Ebbels T, Gooderham N, 2016, Synergistic and Antagonistic Mutation Responses of Human MCL-5 Cells to Mixtures of Benzo[a]pyrene and 2-Amino-1-Methyl-6-Phenylimidazo[4,5-b]pyridine: Dose-Related Variation in the Joint Effects of Common Dietary Carcinogens., Environmental Health Perspectives, Vol: 124, Pages: 88-96, ISSN: 1552-9924
BACKGROUND: Chemical carcinogens such as benzo[a]pyrene (BaP) and 2-amino-1-methyl-6-phenylimidazo[4,5-b]pyridine (PhIP) may contribute to the etiology of human diet-associated cancer. Individually, these are genotoxic, but the consequences of exposure to mixtures of these chemicals have not been systematically examined. OBJECTIVES: To determine the mutagenic response to mixtures of BaP and PhIP at concentrations relevant to human exposure (mM to sub-nM). METHODS: Human MCL-5 cells (metabolically competent) were exposed to BaP or PhIP individually or in mixtures. Mutagenicity was assessed at the thymidine kinase (TK) locus, CYP1A activity and message were determined by ethoxyresorufin-O-deethylase (EROD) activity and Q-PCR respectively, and cell cycle was measured by flow cytometry. RESULTS: Mixtures gave modified dose-responses compared to the individual chemicals: a remarkably increased mutant frequency (MF) at low concentration combinations (not mutagenic individually), and decreased MF at higher concentration combinations, compared to the calculated predicted additive MF of the individual chemicals. EROD activity and CYP1A1 mRNA levels correlated with TK MF, supporting involvement of the CYP1A family in mutation. Moreover, a cell cycle G2/M phase block was observed at high dose combinations, consistent with DNA damage sensing and repair. CONCLUSIONS: Mixtures of these genotoxic chemicals produced mutation responses that differed from expectations for additive effects of the individual chemicals. The increase in MF for some combinations of chemicals at low concentrations that were not genotoxic for the individual chemicals, and the non-monotonic dose response, may be important for understanding the mutagenic potential of food and the etiology of diet-associated cancers.