Iacovacci J, Peluso A, Ebbels T, et al., 2020, Extraction and Integration of Genetic Networks from Short-Profile Omic Data Sets, Metabolites, Vol: 10, ISSN: 2218-1989
Mass spectrometry technologies are widely used in the fields of ionomics and metabolomics to simultaneously profile the intracellular concentrations of, e.g., amino acids or elements in genome-wide mutant libraries. These molecular or sub-molecular features are generally non-Gaussian and their covariance reveals patterns of correlations that reflect the system nature of the cell biochemistry and biology. Here, we introduce two similarity measures, the Mahalanobis cosine and the hybrid Mahalanobis cosine, that incorporate information from the empirical covariance matrix of omics data from high-throughput screening and that can be used to quantify similarities between the profiled features of different mutants. We evaluate the performance of these similarity measures in the task of inferring and integrating genetic networks from short-profile ionomics/metabolomics data through an analysis of experimental data sets related to the ionome and the metabolome of the model organism S. cerevisiae. The study of the resulting ionome-metabolome Saccharomyces cerevisiae multilayer genetic network, which encodes multiple omic-specific levels of correlations between genes, shows that the proposed measures can provide an alternative description of relations between biological processes when compared to the commonly used Pearson's correlation coefficient and have the potential to guide the construction of novel hypotheses on the function of uncharacterised genes.
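A minimal sketch of a covariance-aware cosine similarity between mutant profiles, assuming the Mahalanobis cosine takes the usual form x^T S^+ y / sqrt(x^T S^+ x * y^T S^+ y), where S is the empirical feature covariance and S^+ its pseudo-inverse. The function names are hypothetical, the hybrid variant is not reproduced (the abstract does not define it), and this is not the authors' code.

```python
import numpy as np


def mahalanobis_cosine(x: np.ndarray, y: np.ndarray, cov: np.ndarray) -> float:
    """Cosine similarity in the metric defined by the inverse feature covariance.

    Assumed form: x^T S^+ y / sqrt(x^T S^+ x * y^T S^+ y), with S^+ the
    Moore-Penrose pseudo-inverse of the empirical covariance S.
    """
    s_inv = np.linalg.pinv(cov)  # pseudo-inverse as a guard in case S is singular
    num = x @ s_inv @ y
    den = np.sqrt((x @ s_inv @ x) * (y @ s_inv @ y))
    return float(num / den)


def similarity_matrix(profiles: np.ndarray) -> np.ndarray:
    """Pairwise Mahalanobis cosine between rows (mutants) of a profile matrix.

    profiles: shape (n_mutants, n_features), e.g. standardised ion concentrations.
    """
    cov = np.cov(profiles, rowvar=False)  # feature-by-feature empirical covariance
    s_inv = np.linalg.pinv(cov)
    gram = profiles @ s_inv @ profiles.T  # covariance-weighted inner products
    norms = np.sqrt(np.diag(gram))
    return gram / np.outer(norms, norms)
```

With S equal to the identity matrix the measure reduces to the ordinary cosine similarity, which is a convenient sanity check.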
Iacovacci J, Peluso A, Ebbels T, et al., 2020, Extraction and Integration of Genetic Networks from Short-profile Omic Datasets, Metabolites, ISSN: 2218-1989
Wu C-T, Wang Y, Wang Y, et al., 2020, Targeted realignment of LC-MS profiles by neighbor-wise compound-specific graphical time warping with misalignment detection, Bioinformatics, Vol: 36, Pages: 2862-2871, ISSN: 1367-4803
Jendoubi Bedhiafi T, Ebbels T, 2020, Integrative analysis of time course metabolic data and biomarker discovery, BMC Bioinformatics, Vol: 21, ISSN: 1471-2105
Background: Metabolomics time-course experiments provide the opportunity to understand the changes to an organism by observing the evolution of metabolic profiles in response to internal or external stimuli. Along with other omic longitudinal profiling technologies, these techniques have great potential to uncover complex relations between variations across diverse omic variables and provide unique insights into the underlying biology of the system. However, many statistical methods currently used to analyse short time-series omic data i) are prone to overfitting, ii) do not fully take into account the experimental design, iii) do not make full use of the multivariate information intrinsic to the data, or iv) are unable to uncover multiple associations between different omic data. The model we propose is an attempt to i) overcome overfitting by using a weakly informative Bayesian model, ii) capture experimental design conditions through a mixed-effects model, iii) model interdependencies between variables by augmenting the mixed-effects model with a conditional auto-regressive (CAR) component and iv) identify potential associations between heterogeneous omic variables by using a horseshoe prior. Results: We assess the performance of our model on synthetic and real datasets and show that it can outperform comparable models for metabolomic longitudinal data analysis. In addition, our proposed method provides the analyst with new insights into the data, as it is able to identify metabolic biomarkers related to treatment, infer perturbed pathways as a result of treatment and find significant associations with additional omic variables. We also show through simulation that our model is fairly robust against inaccuracies in metabolite assignments. On real data, we demonstrate that the number of profiled metabolites slightly affects the predictive ability of the model. Conclusions: Our single model approach to longitudinal analysis of metabolomics data provides an approach simultane
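The abstract does not give the exact parametrisation of the CAR component. As an illustration only, a common proper CAR specification defines the joint precision of the variables as Q = tau * (D - rho * W), where W is a symmetric 0/1 neighbourhood matrix (hypothetically, linking metabolites that share a pathway), D is the diagonal matrix of neighbour counts and |rho| < 1. The sketch builds Q and draws samples from N(0, Q^-1); it is not the authors' model, which also includes mixed effects and a horseshoe prior, and the function names are hypothetical.

```python
import numpy as np


def car_precision(adjacency: np.ndarray, rho: float, tau: float = 1.0) -> np.ndarray:
    """Precision matrix of a proper CAR model: Q = tau * (D - rho * W).

    adjacency: symmetric 0/1 matrix W; every variable needs at least one neighbour.
    |rho| < 1 keeps Q positive definite.
    """
    w = np.asarray(adjacency, dtype=float)
    d = np.diag(w.sum(axis=1))  # neighbour counts on the diagonal
    return tau * (d - rho * w)


def sample_car(adjacency, rho, tau=1.0, n_samples=1, seed=None):
    """Draw samples x ~ N(0, Q^-1) via a Cholesky factor of Q."""
    q = car_precision(adjacency, rho, tau)
    chol = np.linalg.cholesky(q)  # Q = L L^T
    z = np.random.default_rng(seed).standard_normal((q.shape[0], n_samples))
    # Solving L^T x = z gives Cov(x) = (L L^T)^-1 = Q^-1
    return np.linalg.solve(chol.T, z).T
```

Under this form each variable, conditional on the others, has mean rho times the average of its neighbours, which is what lets the model borrow strength across related metabolites.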
Gibson R, Lau C, Loo RL, et al., 2019, The association of fish consumption and its urinary metabolites with cardiovascular risk factors: The International Study of Macro-/Micronutrients and Blood Pressure (INTERMAP), American Journal of Clinical Nutrition, Vol: 111, Pages: 280-290, ISSN: 0002-9165
Background: Results from observational studies regarding associations between fish (including shellfish) intake and cardiovascular disease risk factors, including blood pressure (BP) and BMI, are inconsistent. Objective: To investigate associations of fish consumption and associated urinary metabolites with BP and BMI in free-living populations. Methods: We used cross-sectional data from the International Study of Macro-/Micronutrients and Blood Pressure (INTERMAP), including 4680 men and women (40–59 y) from Japan, China, the United Kingdom, and United States. Dietary intakes were assessed by four 24-h dietary recalls and BP from 8 measurements. Urinary metabolites (2 timed 24-h urinary samples) associated with fish intake acquired from NMR spectroscopy were identified. Linear models were used to estimate BP and BMI differences across categories of intake and per 2 SD higher intake of fish and its biomarkers. Results: No significant associations were observed between fish intake and BP. There was a direct association between fish intake and BMI in the Japanese population sample (P trend = 0.03; fully adjusted model). In Japan, trimethylamine-N-oxide (TMAO) and taurine, respectively, demonstrated area under the receiver operating characteristic curve (AUC) values of 0.81 and 0.78 in discriminating high against low fish intake, whereas homarine (a metabolite found in shellfish muscle) demonstrated an AUC of 0.80 for high/nonshellfish intake. Direct associations were observed between urinary TMAO and BMI for all regions except Japan (P < 0.0001) and in Western populations between TMAO and BP (diastolic blood pressure: mean difference 1.28; 95% CI: 0.55, 2.02 mmHg; P = 0.0006, systolic blood pressure: mean difference 1.67; 95% CI: 0.60, 2.73 mmHg; P = 0.002). Conclusions: Urinary TMAO showed a stronger association with fish intake in the Japanese compared with the Western population sample. Urinary TMAO was directly associated with BP in the Western but not the Japanese popula
Segura-Lepe M, Keun H, Ebbels T, 2019, Predictive modelling using pathway scores: robustness and significance of pathway collections, BMC Bioinformatics, Vol: 20, ISSN: 1471-2105
Background: Transcriptomic data is often used to build statistical models which are predictive of a given phenotype, such as disease status. Genes work together in pathways and it is widely thought that pathway representations will be more robust to noise in the gene expression levels. We aimed to test this hypothesis by constructing models based on either genes alone, or based on sample specific scores for each pathway, thus transforming the data to a ‘pathway space’. We progressively degraded the raw data by addition of noise and examined the ability of the models to maintain predictivity. Results: Models in the pathway space indeed had higher predictive robustness than models in the gene space. This result was independent of the workflow, parameters, classifier and data set used. Surprisingly, randomised pathway mappings produced models of similar accuracy and robustness to true mappings, suggesting that the success of pathway space models is not conferred by the specific definitions of the pathway. Instead, predictive models built on the true pathway mappings led to prediction rules with fewer influential pathways than those built on randomised pathways. The extent of this effect was used to differentiate pathway collections coming from a variety of widely used pathway databases. Conclusions: Prediction models based on pathway scores are more robust to degradation of gene expression information than the equivalent models based on ungrouped genes. While models based on true pathway scores are not more robust or accurate than those based on randomised pathways, true pathways produced simpler prediction rules, emphasizing a smaller number of pathways.
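One simple way to map samples into a ‘pathway space’ is to average the z-scored expression of each pathway's member genes; the study may use a different scoring function, so treat this as an assumed example. The sketch also builds a randomised collection that keeps pathway sizes but permutes gene labels, in the spirit of the randomised mappings described above. All function names are hypothetical.

```python
import numpy as np


def pathway_scores(expr: np.ndarray, genes: list[str],
                   pathways: dict[str, list[str]]) -> tuple[np.ndarray, list[str]]:
    """Sample-specific pathway scores as the mean z-score of member genes.

    expr: (n_samples, n_genes) matrix whose columns follow the order of `genes`.
    Returns a (n_samples, n_pathways) score matrix and the pathway names kept.
    """
    mu = expr.mean(axis=0)
    sd = expr.std(axis=0, ddof=1)
    z = (expr - mu) / np.where(sd > 0, sd, 1.0)  # flat genes get z = 0
    index = {g: i for i, g in enumerate(genes)}
    names, cols = [], []
    for name, members in pathways.items():
        idx = [index[g] for g in members if g in index]
        if idx:  # skip pathways with no measured genes
            names.append(name)
            cols.append(z[:, idx].mean(axis=1))
    return np.column_stack(cols), names


def randomise_mapping(pathways: dict[str, list[str]], genes: list[str], seed=None):
    """Random pathway collection with the same sizes (gene labels permuted)."""
    rng = np.random.default_rng(seed)
    shuffled = dict(zip(genes, rng.permutation(genes)))  # one-to-one relabelling
    return {name: [str(shuffled[g]) for g in members if g in shuffled]
            for name, members in pathways.items()}
```

Because the relabelling is one-to-one, each randomised pathway has the same size as its true counterpart, so any difference between true and random models reflects membership rather than pathway size.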
Tzoulaki I, Castagné R, Boulangé CL, et al., 2019, Serum metabolic signatures of coronary and carotid atherosclerosis and subsequent cardiovascular disease, European Heart Journal, Vol: 40, Pages: 2883-2896, ISSN: 1522-9645
Aims: To characterise serum metabolic signatures associated with atherosclerosis in the coronary or carotid arteries and subsequently their association with incident cardiovascular disease (CVD). Methods and Results: We used untargeted one-dimensional (1D) serum metabolic profiling by proton (¹H) nuclear magnetic resonance (NMR) spectroscopy among 3,867 participants from the Multi-Ethnic Study of Atherosclerosis (MESA), with replication among 3,569 participants from the Rotterdam and LOLIPOP Studies. Atherosclerosis was assessed by coronary artery calcium (CAC) and carotid intima-media thickness (IMT). We used multivariable linear regression to evaluate associations between NMR features and atherosclerosis accounting for multiplicity of comparisons. We then examined associations between metabolites associated with atherosclerosis and incident CVD available in MESA and Rotterdam and explored molecular networks through bioinformatics analyses. Overall, 30 NMR-measured metabolites were associated with CAC and/or IMT, P = 1.3 × 10⁻¹⁴ to 6.5 × 10⁻⁶ (discovery), P = 4.2 × 10⁻¹⁴ to 4.4 × 10⁻² (replication). These associations were substantially attenuated after adjustment for conventional cardiovascular risk factors. Metabolites associated with atherosclerosis revealed disturbances in lipid and carbohydrate metabolism, branched-chain and aromatic amino acid metabolism, as well as oxidative stress and inflammatory pathways. Analyses of incident CVD events showed inverse associations with creatine, creatinine and phenylalanine, and direct associations with mannose, acetaminophen-glucuronide and lactate as well as apolipoprotein B (P < 0.05). Conclusion: Metabolites associated with atherosclerosis were largely consistent between the two vascular beds (coronary and carotid arteries) and predominantly tag pathways that overlap with the known cardiovascular risk factors. We present an integrated systems network that highlights a series of inter-connected pathways underlying atherosclero
Afonso C, Barrow MP, Davies AN, et al., 2019, Data mining and visualisation: general discussion, Faraday Discuss, Vol: 218, Pages: 354-371
Viant MR, Ebbels TMD, Beger RD, et al., 2019, Use cases, best practice and reporting standards for metabolomics in regulatory toxicology, Nature Communications, Vol: 10, ISSN: 2041-1723
Metabolomics is a widely used technology in academic research, yet its application to regulatory science has been limited. The most commonly cited barrier to its translation is lack of performance and reporting standards. The MEtabolomics standaRds Initiative in Toxicology (MERIT) project brings together international experts from multiple sectors to address this need. Here, we identify the most relevant applications for metabolomics in regulatory toxicology and develop best practice guidelines, performance and reporting standards for acquiring and analysing untargeted metabolomics and targeted metabolite data. We recommend that these guidelines are evaluated and implemented for several regulatory use cases.
Gao Q, Dragsted LO, Ebbels T, 2019, Comparison of Bi- and Tri-Linear PLS models for variable selection in metabolomic time-series experiments, Metabolites, Vol: 9, ISSN: 2218-1989
Metabolomic studies with a time-series design are widely used for discovery and validation of biomarkers. In such studies, changes of metabolic profiles over time under different conditions (e.g., control and intervention) are compared, and metabolites responding differently between the conditions are identified as putative biomarkers. To incorporate time-series information into the variable (biomarker) selection in partial least squares regression (PLS) models, we created PLS models with different combinations of bilinear/trilinear X and group/time response dummy Y. In total, five PLS models were evaluated on two real datasets, and also on simulated datasets with varying characteristics (number of subjects, number of variables, inter-individual variability, intra-individual variability and number of time points). Variables showing specific temporal patterns observed visually and determined statistically were labelled as discriminating variables. Bootstrapped-VIP scores were calculated for variable selection, and the variable selection performance of the five PLS models was assessed based on their capacity to correctly select the discriminating variables. The results showed that the bilinear PLS model with group × time response as dummy Y provided the highest recall (true positive rate) of 83–95% with high precision, independent of most characteristics of the datasets. Trilinear PLS models tend to select a small number of variables with high precision but relatively high false negative rate (lower power). They are also less affected by noise compared to bilinear PLS models. In datasets with high inter-individual variability, bilinear PLS models tend to provide higher recall while trilinear models tend to provide higher precision. Overall, we recommend bilinear PLS with group × time response Y for variable selection applications in metabolomics intervention time series studies.
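The abstract does not spell out how the bilinear X and the group × time dummy Y are built. A plausible reading, assumed here, is that the subjects × variables × time array is unfolded so that each row is one subject at one time point, and Y carries one indicator column per (group, time) cell. The resulting X and Y could be passed to any PLS implementation for bootstrapped VIP scoring, which is not shown. Function names are hypothetical.

```python
import numpy as np


def unfold_bilinear(x3: np.ndarray) -> np.ndarray:
    """Unfold a (subjects, variables, time) array to (subjects * time, variables).

    Row s * n_time + t holds subject s at time point t.
    """
    n_sub, n_var, n_time = x3.shape
    return x3.transpose(0, 2, 1).reshape(n_sub * n_time, n_var)


def group_time_dummy(groups: list[str], n_time: int) -> np.ndarray:
    """Dummy Y with one column per (group, time) cell, rows ordered as in unfold_bilinear."""
    levels = sorted(set(groups))
    y = np.zeros((len(groups) * n_time, len(levels) * n_time))
    for s, g in enumerate(groups):
        for t in range(n_time):
            y[s * n_time + t, levels.index(g) * n_time + t] = 1.0
    return y
```

Encoding group and time jointly lets a single bilinear model separate, for example, "treated at time 2" from "control at time 2", which is the interaction the abstract reports as most useful for variable selection.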
Peters K, Bradbury J, Bergmann S, et al., 2019, PhenoMeNal: Processing and analysis of metabolomics data in the cloud, GigaScience, Vol: 8, ISSN: 2047-217X
Background: Metabolomics is the comprehensive study of a multitude of small molecules to gain insight into an organism's metabolism. The research field is dynamic and expanding with applications across biomedical, biotechnological and many other applied biological domains. Its computationally intensive nature has driven requirements for open data formats, data repositories and data analysis tools. However, the rapid progress has resulted in a mosaic of independent (and sometimes incompatible) analysis methods that are difficult to connect into a useful and complete data analysis solution. Findings: PhenoMeNal (Phenome and Metabolome aNalysis) is an advanced and complete solution to set up Infrastructure-as-a-Service (IaaS) that brings workflow-oriented, interoperable metabolomics data analysis platforms into the cloud. PhenoMeNal seamlessly integrates a wide array of existing open source tools which are tested and packaged as Docker containers through the project's continuous integration process and deployed based on a Kubernetes orchestration framework. It also provides a number of standardized, automated and published analysis workflows in the user interfaces Galaxy, Jupyter, Luigi and Pachyderm. Conclusions: PhenoMeNal constitutes a keystone solution in cloud e-infrastructures available for metabolomics. PhenoMeNal is a unique and complete solution for setting up cloud e-infrastructures through easy-to-use web interfaces that can be scaled to any custom public and private cloud environment. By harmonizing and automating software installation and configuration and through ready-to-use scientific workflow user interfaces, PhenoMeNal has succeeded in providing scientists with workflow-driven, reproducible and shareable metabolomics data analysis platforms that are interfaced through standard data formats, come with representative datasets, are versioned, and have been tested for reproducibility and interoperability. The elastic implementation of PhenoMeNal further allows easy adap
Chan Q, Lau C-HE, Gibson R, et al., 2019, Relationships of Dietary and Supplement Magnesium Intake and Its Urinary Metabolomic Biomarkers With Blood Pressure: The INTERMAP Study, Scientific Sessions of the American-Heart-Association on Epidemiology and Prevention/Lifestyle and Cardiometabolic Health, Publisher: LIPPINCOTT WILLIAMS & WILKINS, ISSN: 0009-7322
Gibson R, Lau C-HE, Chan Q, et al., 2019, Cross-Sectional Investigation of the Relationship Between Fish Consumption and Its Urinary Biomarkers With Blood Pressure Across Asian and Western Populations: Results From the INTERMAP Study, Scientific Sessions of the American-Heart-Association on Epidemiology and Prevention/Lifestyle and Cardiometabolic Health, Publisher: LIPPINCOTT WILLIAMS & WILKINS, ISSN: 0009-7322
Ebbels TMD, Karaman I, Graça G, 2019, Processing and Analysis of Untargeted Multicohort NMR Data, Pages: 453-470
NMR data from large studies combining multiple cohorts is becoming common in large-scale metabolomics. The data size and combination of cohorts with diverse properties leads to special problems for data processing and analysis. These include alignment, normalization, detection and removal of outliers, presence of strong correlations, and the identification of unknowns. Nonetheless, these challenges can be addressed with suitable algorithms and techniques, leading to enhanced data sets ripe for further data mining.
Peluso A, Ebbels T, Glen R, 2018, Empirical estimation of permutation-based metabolome-wide significance thresholds, Publisher: bioRxiv
A key issue in the omics literature is the search for statistically significant relationships between molecular markers and phenotype. The aim is to detect disease-related discriminatory features while controlling for false positive associations at adequate power. Metabolome-wide association studies have revealed significant relationships of metabolic phenotypes with disease risk by analysing hundreds to tens of thousands of molecular variables, leading to multivariate data which are highly noisy and collinear. In this context, Bonferroni or Sidak corrections are rather useful, as these are valid for independent tests, while permutation procedures allow for the estimation of p-values from the null distribution without assuming independence among features. Nevertheless, under the permutation approach the distribution of p-values may present systematic deviations from the theoretical null distribution, which leads to biased estimates of the adjusted threshold, e.g. smaller than a Bonferroni or Sidak correction. We make use of parametric approximation methods based on a multivariate Normal distribution to derive stable estimates of the metabolome-wide significance level within a univariate approach based on a permutation procedure, which effectively controls the maximum overall type I error rate at the α level. We illustrate the results for different model parametrizations and distributional features of the outcome measure, as well as for diverse correlation levels within the features and between the features and the phenotype in real data and simulated studies. MWSL is the open-source R software package for the empirical estimation of the metabolome-wide significance level, available at https://github.com/AlinaPeluso/MWSL.
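The core permutation step can be sketched as a max-statistic (minimum p-value) procedure: permute the phenotype to break any feature-phenotype link, record the smallest p-value across all features in each permutation, and take the α-quantile of those minima as the metabolome-wide significance level. This sketch uses Pearson correlation with a Fisher z normal approximation for the p-values (an assumption, made to avoid extra dependencies), omits the multivariate Normal approximation the authors use to stabilise the estimate, and does not reproduce the API of the MWSL package. Function names are hypothetical.

```python
import math

import numpy as np


def corr_pvalues(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Two-sided p-values for the Pearson correlation of each column of x with y.

    Uses the Fisher z normal approximation rather than the exact t distribution.
    """
    n = x.shape[0]
    xc = (x - x.mean(axis=0)) / x.std(axis=0)
    yc = (y - y.mean()) / y.std()
    r = np.clip(xc.T @ yc / n, -0.999999, 0.999999)  # keep arctanh finite
    z = np.abs(np.arctanh(r)) * math.sqrt(n - 3)
    return np.array([math.erfc(v / math.sqrt(2.0)) for v in z])


def permutation_mwsl(x, y, alpha=0.05, n_perm=1000, seed=None):
    """Alpha-quantile of the per-permutation minimum p-value."""
    rng = np.random.default_rng(seed)
    min_p = np.empty(n_perm)
    for b in range(n_perm):
        min_p[b] = corr_pvalues(x, rng.permutation(y)).min()
    threshold = float(np.quantile(min_p, alpha))
    return threshold, alpha / threshold  # threshold and implied effective number of tests
```

With M perfectly correlated features the threshold stays near α, whereas M independent features push it towards the Sidak value 1 - (1 - α)^(1/M); the ratio α / threshold can be read as an effective number of independent tests.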
Kamp H, Beger R, Dorne J-LCM, et al., 2018, MEtabolomics standaRds Initiative in Toxicology (MERIT), 54th Congress of the European-Societies-of-Toxicology (EUROTOX) - Toxicology Out of the Box, Publisher: ELSEVIER IRELAND LTD, Pages: S214-S214, ISSN: 0378-4274
Ye L, De Iorio M, Ebbels TMD, 2018, Bayesian estimation of the number of protonation sites for urinary metabolites from NMR spectroscopic data, Metabolomics, Vol: 14, ISSN: 1573-3882
Introduction: To aid the development of better algorithms for ¹H NMR data analysis, such as alignment or peak-fitting, it is important to characterise and model chemical shift changes caused by variation in pH. The number of protonation sites, a key parameter in the theoretical relationship between pH and chemical shift, is traditionally estimated from the molecular structure, which is often unknown in untargeted metabolomics applications. Objective: We aim to use observed NMR chemical shift titration data to estimate the number of protonation sites for a range of urinary metabolites. Methods: A pool of urine from healthy subjects was titrated in the range pH 2–12, standard ¹H NMR spectra were acquired and positions of 51 peaks (corresponding to 32 identified metabolites) were recorded. A theoretical model of chemical shift was fit to the data using a Bayesian statistical framework, using model selection procedures in a Markov Chain Monte Carlo algorithm to estimate the number of protonation sites for each molecule. Results: The estimated number of protonation sites was found to be correct for 41 out of 51 peaks. In some cases, the number of sites was incorrectly estimated, due to very close pKa values or a limited amount of data in the required pH range. Conclusions: Given appropriate data, it is possible to estimate the number of protonation sites for many metabolites typically observed in ¹H NMR metabolomics without knowledge of the molecular structure. This approach may be a valuable resource for the development of future automated metabolite alignment, annotation and peak fitting algorithms.
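A standard fast-exchange model, assumed here to be close to the "theoretical model" mentioned above, writes the observed shift as the population-weighted mean of the limiting shifts of each protonation state. For one site it reduces to the Henderson-Hasselbalch form delta = (delta_HA + delta_A * 10^(pH - pKa)) / (1 + 10^(pH - pKa)). The sketch evaluates the curve for n sites; the Bayesian MCMC model selection over n is not shown, and the function and parameter names are illustrative.

```python
import numpy as np


def chemical_shift(ph, delta, pka):
    """Fast-exchange chemical shift for a molecule with n = len(pka) protonation sites.

    delta: n + 1 limiting shifts; delta[k] belongs to the species that has lost k protons.
    pka:   n macroscopic pKa values in increasing order.
    Species populations follow [A_k] / [A_0] = 10 ** (k * pH - sum(pka[:k])).
    """
    ph = np.atleast_1d(np.asarray(ph, dtype=float))
    delta = np.asarray(delta, dtype=float)
    if len(delta) != len(pka) + 1:
        raise ValueError("need exactly one more limiting shift than pKa values")
    cum = np.concatenate(([0.0], np.cumsum(pka)))  # sum of the first k pKa values
    k = np.arange(len(delta))
    log_w = k[None, :] * ph[:, None] - cum[None, :]
    log_w -= log_w.max(axis=1, keepdims=True)  # rescale before exponentiating
    w = 10.0 ** log_w
    return (w @ delta) / w.sum(axis=1)


# One site: at pH = pKa the shift sits halfway between the limiting values.
print(chemical_shift(5.0, [1.0, 2.0], [5.0]))  # [1.5]
```

Fitting this curve for n = 1, 2, 3 and comparing the fits is one way to see why very close pKa values make n hard to identify: two nearly coincident steps look like one.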
Kaluarachchi M, Boulangé C, Karaman I, et al., 2018, A comparison of human serum and plasma metabolites using untargeted ¹H NMR spectroscopy and UPLC-MS, Metabolomics, Vol: 14, ISSN: 1573-3882
Introduction: Differences in the metabolite profiles between serum and plasma are incompletely understood. Objectives: To evaluate metabolic profile differences between serum and plasma and among plasma sample subtypes. Methods: We analyzed serum, platelet rich plasma (PRP), platelet poor plasma (PPP), and platelet free plasma (PFP), collected from 8 non-fasting apparently healthy women, using untargeted standard 1D and CPMG ¹H NMR and reverse phase and hydrophilic (HILIC) UPLC-MS. Differences between metabolic profiles were evaluated using validated principal component and orthogonal partial least squares discriminant analysis. Results: Explorative analysis showed the main source of variation among samples was due to inter-individual differences with no grouping by sample type. After correcting for inter-individual differences, lipoproteins, lipids in VLDL/LDL, lactate, glutamine, and glucose were found to discriminate serum from plasma in NMR analyses. In UPLC-MS analyses, lysophosphatidylethanolamine (lysoPE)(18:0) and lysophosphatidic acid(20:0) were higher in serum, and phosphatidylcholines (PC)(16:1/18:2, 20:3/18:0, O-20:0/22:4), lysoPC(16:0), PE(O-18:2/20:4), sphingomyelin(18:0/22:0), and linoleic acid were lower. In plasma subtype analyses, isoleucine, leucine, valine, phenylalanine, glutamate, and pyruvate were higher among PRP samples compared with PPP and PFP by NMR while lipids in VLDL/LDL, citrate, and glutamine were lower. By UPLC-MS, PE(18:0/18:2) and PC(P-16:0/20:4) were higher in PRP compared with PFP samples. Conclusions: Correction for inter-individual variation was required to detect metabolite differences between serum and plasma. Our results suggest the potential importance of inter-individual effects and sample type on the results from serum and plasma metabolic phenotyping studies.
Posma JM, Garcia Perez I, Ebbels TMD, et al., 2018, Optimized phenotypic biomarker discovery and confounder elimination via covariate-adjusted projection to latent structures from metabolic spectroscopy data, Journal of Proteome Research, Vol: 17, Pages: 1586-1595, ISSN: 1535-3893
Metabolism is altered by genetics, diet, disease status, environment and many other factors. Any one of these is often modelled without considering the effects of the other covariates. Attributing differences in metabolic profile to one of these factors needs to be done while controlling for the metabolic influence of the rest. We describe here a data analysis framework and novel confounder-adjustment algorithm for multivariate analysis of metabolic profiling data. Using simulated data we show that similar numbers of true associations and significantly fewer false positives are found compared to other commonly used methods. Covariate-Adjusted Projections to Latent Structures (CA-PLS) is exemplified here using a large-scale metabolic phenotyping study of two Chinese populations at different risks for cardiovascular disease. Using CA-PLS we find that some previously reported differences are actually associated with external factors and discover a number of previously unreported biomarkers linked to different metabolic pathways. CA-PLS can be applied to any multivariate data where confounding may be an issue and the confounder-adjustment procedure is translatable to other multivariate regression techniques.
Harada S, Hirayama A, Chan Q, et al., 2018, Reliability of plasma polar metabolite concentrations in a large-scale cohort study using capillary electrophoresis-mass spectrometry, PLoS ONE, Vol: 13, ISSN: 1932-6203
BACKGROUND: Cohort studies with metabolomics data are becoming more widespread; however, large-scale studies involving 10,000s of participants are still limited, especially in Asian populations. Therefore, we started the Tsuruoka Metabolomics Cohort Study, enrolling 11,002 community-dwelling adults in Japan and using capillary electrophoresis-mass spectrometry (CE-MS) and liquid chromatography-mass spectrometry. The CE-MS method is highly amenable to absolute quantification of polar metabolites; however, its reliability for large-scale measurement is unclear. The aim of this study is to examine reproducibility and validity of large-scale CE-MS measurements. In addition, the study presents absolute concentrations of polar metabolites in human plasma, which can be used in future as reference ranges in a Japanese population. METHODS: Metabolomic profiling of 8,413 fasting plasma samples was completed using CE-MS, and 94 polar metabolites were structurally identified and quantified. Quality control (QC) samples were injected every ten samples and assessed throughout the analysis. Inter- and intra-batch coefficients of variation of QC and participant samples, and technical intraclass correlation coefficients were estimated. Passing-Bablok regression of plasma concentrations by CE-MS on serum concentrations by standard clinical chemistry assays was conducted for creatinine and uric acid. RESULTS AND CONCLUSIONS: In QC samples, coefficient of variation was less than 20% for 64 metabolites, and less than 30% for 80 metabolites out of the 94 metabolites. Inter-batch coefficient of variation was less than 20% for 81 metabolites. Estimated technical intraclass correlation coefficient was above 0.75 for 67 metabolites. The slope of Passing-Bablok regression was estimated as 0.97 (95% confidence interval: 0.95, 0.98) for creatinine and 0.95 (0.92, 0.96) for uric acid. Compared to published data from other large cohort measurement platforms, reproducibility of metabolites common
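The QC summaries above rest on the coefficient of variation. Assuming the usual definition (100 × SD / mean per metabolite, across repeated QC injections), a minimal sketch with hypothetical function names and made-up QC values is shown below; the study's batch-wise and intraclass calculations are not reproduced.

```python
import numpy as np


def coefficient_of_variation(qc: np.ndarray) -> np.ndarray:
    """Percent CV per metabolite (column) across repeated QC injections (rows)."""
    return 100.0 * qc.std(axis=0, ddof=1) / qc.mean(axis=0)


def count_below(cv: np.ndarray, limits=(20.0, 30.0)) -> dict:
    """Number of metabolites whose CV is strictly below each limit."""
    return {lim: int(np.sum(cv < lim)) for lim in limits}


# Hypothetical QC data: three injections, two metabolites.
qc = np.array([[10.0, 100.0], [12.0, 130.0], [8.0, 70.0]])
print(coefficient_of_variation(qc))  # [20. 30.]
```

Counting metabolites under the 20% and 30% limits in this way yields summaries of the same kind as the 64 and 80 (out of 94) reported above.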
Ebbels TMD, Rodriguez-Martinez A, Dumas M-E, et al., 2018, Advances in Computational Analysis of Metabolomic NMR Data, NMR-based Metabolomics
© 2019 Elsevier Inc. All rights reserved. Metabolic phenotyping is entering the era of Big Data, leading to new opportunities and challenges. Cloud computing has been proposed as a novel paradigm, but as yet is not widely understood or used. In this chapter we introduce the concepts of Big Data and cloud computing, and discuss how they might change the landscape of metabolic phenotyping and analysis. We highlight some of the reasons for the increase in data size and explain advantages and disadvantages of large-scale computing in this context. We illustrate the area with a survey of software tools and databases currently available, and describe the newly developed cloud infrastructure “PhenoMeNal,” which will enable widespread use of these approaches. We conclude the chapter with a discussion of the important ethical, legal, and social implications (ELSI) of large-scale computing in this rapidly developing field.
Tan LSL, Jasra A, De Iorio M, et al., 2017, Bayesian inference for multiple Gaussian graphical models with application to metabolic association networks, Annals of Applied Statistics, Vol: 11, Pages: 2222-2251, ISSN: 1932-6157
We investigate the effect of cadmium (a toxic environmental pollutant) on the correlation structure of a number of urinary metabolites using Gaussian graphical models (GGMs). The inferred metabolic associations can provide important information on the physiological state of a metabolic system and insights on complex metabolic relationships. Using the fitted GGMs, we construct differential networks, which highlight significant changes in metabolite interactions under different experimental conditions. The analysis of such metabolic association networks can reveal differences in the underlying biological reactions caused by cadmium exposure. We consider Bayesian inference and propose using the multiplicative (or Chung–Lu random graph) model as a prior on the graphical space. In the multiplicative model, each edge is chosen independently with probability equal to the product of the connectivities of the end nodes. This class of prior is parsimonious yet highly flexible; it can be used to encourage sparsity or graphs with a pre-specified degree distribution when such prior knowledge is available. We extend the multiplicative model to multiple GGMs linking the probability of edge inclusion through logistic regression and demonstrate how this leads to joint inference for multiple GGMs. A sequential Monte Carlo (SMC) algorithm is developed for estimating the posterior distribution of the graphs.
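The multiplicative prior can be sketched directly from its description: each pair (i, j) is an edge independently with probability theta_i * theta_j, where theta_i in [0, 1] is the connectivity of node i. The sketch samples a graph and evaluates its log prior probability. It covers only the prior, not the GGM likelihood, the logistic-regression link across multiple GGMs, or the SMC sampler, and the function names are hypothetical.

```python
import numpy as np


def multiplicative_edge_probs(theta: np.ndarray) -> np.ndarray:
    """Edge inclusion probabilities p_ij = theta_i * theta_j, with no self-loops."""
    p = np.outer(theta, theta)
    np.fill_diagonal(p, 0.0)
    return p


def sample_graph(theta, seed=None) -> np.ndarray:
    """Draw a symmetric 0/1 adjacency matrix with independent edges."""
    theta = np.asarray(theta, dtype=float)
    if np.any((theta < 0) | (theta > 1)):
        raise ValueError("connectivities must lie in [0, 1]")
    p = multiplicative_edge_probs(theta)
    u = np.random.default_rng(seed).random(p.shape)
    upper = np.triu(u < p, k=1)  # decide each unordered pair exactly once
    return (upper | upper.T).astype(int)


def log_prior(adj, theta) -> float:
    """Log probability of a graph under the multiplicative model."""
    p = multiplicative_edge_probs(np.asarray(theta, dtype=float))
    iu = np.triu_indices_from(p, k=1)
    edges, probs = np.asarray(adj)[iu], p[iu]
    with np.errstate(divide="ignore"):  # log(0) only arises in the unselected branch
        terms = np.where(edges == 1, np.log(probs), np.log1p(-probs))
    return float(terms.sum())
```

Small connectivities for all nodes encourage sparse graphs, while a few large values produce hub-like degree distributions, which is the flexibility the abstract describes.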
Jendoubi T, Ebbels TMD, 2017, Integrative analysis of time course metabolic data and biomarker discovery, AMLICD workshop NIPS 2017
Metabonomics time-course experiments provide the opportunity to understand the changes to an organism by observing the evolution of metabolic profiles in response to internal or external stimuli. Along with other omic longitudinal profiling technologies, these techniques have great potential to complement the analysis of complex relations between variations across diverse omic variables and provide unique insights into the underlying biology of the system. However, many statistical methods currently used to analyse short time-series omic data i) are prone to overfitting, ii) do not take into account the experimental design, iii) do not make full use of the multivariate information intrinsic to the data, or iv) are unable to uncover multiple associations between different omic data. The model we propose is an attempt to i) overcome overfitting by using a weakly informative Bayesian model, ii) capture experimental design conditions through a mixed-effects model, iii) model interdependencies between variables by augmenting the mixed-effects model with a conditional auto-regressive (CAR) component and iv) identify potential associations between heterogeneous omic variables.
Metabolomics, the youngest of the major omics technologies, is supported by an active community of researchers and infrastructure developers across Europe. To coordinate and focus efforts around infrastructure building for metabolomics within Europe, a workshop on the "Future of metabolomics in ELIXIR" was organised at Frankfurt Airport in Germany. This one-day strategic workshop involved representatives of ELIXIR Nodes, members of the PhenoMeNal consortium developing an e-infrastructure that supports workflow-based metabolomics analysis pipelines, and experts from the international metabolomics community. The workshop established metabolite identification as the critical area, where a maximal impact of computational metabolomics and data management on other fields could be achieved. In particular, the workshop discussed the four existing ELIXIR Use Cases from which the metabolomics community (both industry and academia) would benefit most, and which could be exhaustively mapped onto the current five ELIXIR Platforms. This opinion article is a call for support for a new ELIXIR metabolomics Use Case, which aligns with and complements the existing and planned ELIXIR Platforms and Use Cases.
Schober D, Jacob D, Wilson M, et al., 2017, nmrML: A Community Supported Open Data Standard for the Description, Storage, and Exchange of NMR Data., Analytical Chemistry, Vol: 90, Pages: 649-656, ISSN: 0003-2700
NMR is a widely used analytical technique with a growing number of repositories available. As a result, demands for a vendor-agnostic, open data format for long-term archiving of NMR data have emerged with the aim to ease and encourage sharing, comparison, and reuse of NMR data. Here we present nmrML, an open XML-based exchange and storage format for NMR spectral data. The nmrML format is intended to be fully compatible with existing NMR data for chemical, biochemical, and metabolomics experiments. nmrML can capture raw NMR data, spectral data acquisition parameters, and where available spectral metadata, such as chemical structures associated with spectral assignments. The nmrML format is compatible with pure-compound NMR data for reference spectral libraries as well as NMR data from complex biomixtures, i.e., metabolomics experiments. To facilitate format conversions, we provide nmrML converters for Bruker, JEOL and Agilent/Varian vendor formats. In addition, easy-to-use Web-based spectral viewing, processing, and spectral assignment tools that read and write nmrML have been developed. Software libraries and Web services for data validation are available for tool developers and end-users. The nmrML format has already been adopted for capturing and disseminating NMR data for small molecules by several open source data processing tools and metabolomics reference spectral libraries, e.g., serving as storage format for the MetaboLights data repository. The nmrML open access data standard has been endorsed by the Metabolomics Standards Initiative (MSI), and we here encourage user participation and feedback to increase usability and make it a successful standard.
Kauffmann H-M, Kamp H, Fuchs R, et al., 2017, Framework for the quality assurance of 'omics technologies considering GLP requirements, Regulatory Toxicology and Pharmacology, Vol: 91, Pages: S27-S35, ISSN: 0273-2300
'Omics technologies are gaining importance to support regulatory toxicity studies. Prerequisites for performing 'omics studies considering GLP principles were discussed at the European Centre for Ecotoxicology and Toxicology of Chemicals (ECETOC) Workshop "Applying 'omics technologies in Chemical Risk Assessment". A GLP environment comprises a standard operating procedure system, proper pre-planning and documentation, and inspections by independent quality assurance staff. To prevent uncontrolled data changes, the raw data obtained in the respective 'omics data recording systems have to be specifically defined. Further requirements include transparent and reproducible data processing steps and safe data storage and archiving procedures. The software for data recording and processing should be validated, and data changes should be traceable or disabled. GLP-compliant quality assurance of 'omics technologies appears feasible for many GLP requirements. However, challenges include (i) defining, storing, and archiving the raw data; (ii) transparent descriptions of data processing steps; (iii) software validation; and (iv) ensuring complete reproducibility of final results with respect to raw data. Nevertheless, 'omics studies can be supported by quality measures (e.g., GLP principles) to ensure quality control, reproducibility and traceability of experiments. This enables regulators to use 'omics data in a fit-for-purpose context, which enhances their applicability for risk assessment.
Buesen R, Chorley BN, Lima BDS, et al., 2017, Applying 'omics technologies in chemicals risk assessment: Report of an ECETOC workshop, Regulatory Toxicology and Pharmacology, Vol: 91, Pages: S3-S13, ISSN: 0273-2300
Prevailing knowledge gaps in linking specific molecular changes to apical outcomes and methodological uncertainties in the generation, storage, processing, and interpretation of 'omics data limit the application of 'omics technologies in regulatory toxicology. Against this background, the European Centre for Ecotoxicology and Toxicology of Chemicals (ECETOC) convened a workshop, "Applying 'omics technologies in chemicals risk assessment", which is reported herein. Ahead of the workshop, multi-expert teams drafted frameworks on best practices for (i) a Good-Laboratory Practice-like context for collecting, storing and curating 'omics data; (ii) the processing of 'omics data; and (iii) weight-of-evidence approaches for integrating 'omics data. The workshop participants confirmed the relevance of these Frameworks to facilitate the regulatory applicability and use of 'omics data, and the workshop discussions provided input for their further elaboration. Additionally, the key objective (iv) to establish approaches to connect 'omics perturbations to phenotypic alterations was addressed. Generally, it was considered promising to strive to link gene expression changes and pathway perturbations to the phenotype by mapping them to specific adverse outcome pathways. While further work is necessary before gene expression changes can be used to establish safe levels of substance exposure, the ECETOC workshop provided important incentives towards achieving this goal.
Castagne R, Boulange CL, Karaman I, et al., 2017, Improving visualisation and interpretation of metabolome-wide association studies (MWAS): an application in a population-based cohort using untargeted 1H NMR metabolic profiling., Journal of Proteome Research, Vol: 16, Pages: 3623-3633, ISSN: 1535-3893
1H NMR spectroscopy of biofluids generates reproducible data allowing detection and quantification of small molecules in large population cohorts. Statistical models to analyze such data are now well established, and the use of univariate metabolome-wide association studies (MWAS) investigating the spectral features separately has emerged as a computationally efficient and interpretable alternative to multivariate models. MWAS rely on accurate estimation of a metabolome-wide significance level (MWSL), which is applied to control the family-wise error rate. Subsequent interpretation requires efficient visualization and formal feature annotation, which, in turn, call for efficient prioritization of spectral variables of interest. Using human serum 1H NMR spectroscopic profiles from 3948 participants from the Multi-Ethnic Study of Atherosclerosis (MESA), we have performed a series of MWAS for serum levels of glucose. We first propose an extension of the conventional MWSL that yields stable estimates of the MWSL across the different model parameterizations and distributional features of the outcome. We propose both efficient visualization methods and a strategy based on subsampling and internal validation to prioritize the associations. Our work proposes and illustrates practical and scalable solutions to facilitate the implementation of the MWAS approach and improve interpretation in large cohort studies.
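The role of a family-wise threshold such as the MWSL can be illustrated with a small permutation sketch. This is a hedged, minimal illustration of a conventional permutation approach, not the extension proposed in the paper. The outcome is shuffled to break any association with the features, the largest absolute Pearson correlation across all features is recorded for each permutation, and the (1 - alpha) quantile of these maxima serves as the threshold. Because every feature is tested on the same participants, thresholding the maximum |r| is equivalent to thresholding the minimum p-value. The function names `pearson_r` and `permutation_threshold` and the defaults `alpha=0.05` and `n_perm=200` are illustrative assumptions. In a real MWAS, a regression with covariates would typically replace the plain correlation.

```python
import random


def pearson_r(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_threshold(features, outcome, alpha=0.05, n_perm=200, seed=0):
    """Hypothetical helper: family-wise |r| threshold from the permutation
    null distribution of the maximum statistic over all features.

    features: list of p feature vectors, each with one value per participant
    outcome:  list with one outcome value per participant
    """
    rng = random.Random(seed)
    y = list(outcome)  # copy so the caller's outcome is not shuffled
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(y)  # break any outcome-feature association
        maxima.append(max(abs(pearson_r(f, y)) for f in features))
    maxima.sort()
    # empirical (1 - alpha) quantile of the null maxima
    k = min(len(maxima) - 1, int((1 - alpha) * len(maxima)))
    return maxima[k]


if __name__ == "__main__":
    rng = random.Random(1)
    n, p = 50, 20
    outcome = [rng.gauss(0, 1) for _ in range(n)]
    features = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
    # make feature 0 strongly associated with the outcome
    features[0] = [v + 0.1 * rng.gauss(0, 1) for v in outcome]
    thr = permutation_threshold(features, outcome)
    hits = [j for j, f in enumerate(features) if abs(pearson_r(f, outcome)) > thr]
    print("threshold:", round(thr, 3), "significant features:", hits)
```

Taking the maximum over features in each permutation is what makes the threshold family-wise: under the null, the chance that any feature exceeds it is approximately alpha.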
Weber RJM, Lawson TN, Salek RM, et al., 2016, Computational tools and workflows in metabolomics: An international survey highlights the opportunity for harmonisation through Galaxy, Metabolomics, Vol: 13, ISSN: 1573-3890
This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.