I am a Reader within the Computational and Systems Medicine Section of the Department of Surgery and Cancer. My overall research interests lie at the interface between two broad areas:
multivariate data analysis,
More specifically, on the computational side these include machine learning, bioinformatics, chemometrics and multivariate statistics, and on the experimental side the fields of genomics, proteomics and metabolomics. I am interested in applying diverse computational and mathematical methods to disentangle the mass of information generated at multiple biological levels by the -omics technologies. The ultimate aim is to synthesise the different information provided by each of these techniques, thus facilitating the modular approach to understanding biological function known as systems biology. This broad aim leads to several themes in my current research:
- Improving information extraction from Nuclear Magnetic Resonance (NMR) spectroscopy & Liquid Chromatography–Mass Spectrometry (LC-MS) metabolic profiles
- Novel methods for predictive modelling of post-genomic data
- Statistical association networks as complex phenotypes in post-genomics
- Statistical integration and visualisation of metabolic profiles with other post-genomic data
- Time series analysis of post-genomic data
The PhenoMeNal project will develop an integrated, secure, permanent, on-demand service-driven, privacy-compliant and sustainable e-infrastructure for the processing, analysis and information-mining of metabolomics and associated biomedical data. The project focusses on developing computational workflows and engines for dealing with this highly complex data, arising from facilities such as the National Phenome Centre.
Bayesian analysis of metabolic NMR spectra
In this collaboration with Dr Maria De Iorio, UCL, we have developed a Bayesian model of NMR spectra which aims to automatically assign and quantify resonances of metabolites in complex 1-dimensional spectra of biofluids and tissues. We have released a publicly available R package called the Bayesian AuTomated Metabolite Analyser for NMR spectra (BATMAN).
BATMAN has an R-Forge site where you can download the package. Alternatively, you can install it directly from R by typing:
Simulation of NMR metabolic profiles
With Dr Maria De Iorio, now at UCL, we have developed a MATLAB program that simulates realistic 1-dimensional 1H NMR spectra of complex mixtures of metabolites. The program, 'MetAssimulo', allows the user to specify the levels and variability of the metabolites to be incorporated, and by default simulates two groups corresponding to 'cases' and 'controls'. MetAssimulo can also simulate correlations within and between metabolites, as well as the shifts in peak positions often seen in metabolic spectra. The picture below shows two NMR spectra of human urine. One is real and one is simulated. Can you tell which is which?
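The idea of building a spectrum as a concentration-weighted sum of metabolite peak templates can be sketched in a few lines of Python. This is a minimal illustration and not MetAssimulo itself, which is a MATLAB program. The Lorentzian line shape, the function names `lorentzian` and `simulate_group`, and the metabolites `met_A` and `met_B` with their peak positions are all assumptions made for the example. Correlations between metabolites are left out.

```python
import numpy as np

def lorentzian(ppm, centre, width):
    """Lorentzian line with unit height at `centre` and full width `width` at half height."""
    half = width / 2.0
    return half**2 / ((ppm - centre) ** 2 + half**2)

def simulate_group(ppm, templates, mean_conc, sd_conc, n_samples,
                   width=0.01, shift_sd=0.0, noise_sd=0.0, seed=0):
    """Simulate `n_samples` spectra for one group (e.g. cases or controls).

    templates: {metabolite: [(peak_centre_ppm, relative_height), ...]}
    mean_conc, sd_conc: {metabolite: value}, the group level and variability.
    """
    rng = np.random.default_rng(seed)
    spectra = np.zeros((n_samples, ppm.size))
    for s in range(n_samples):
        for name, peaks in templates.items():
            # Draw a concentration for this sample, truncated at zero.
            conc = max(rng.normal(mean_conc[name], sd_conc[name]), 0.0)
            # One random positional shift per metabolite per sample.
            shift = rng.normal(0.0, shift_sd)
            for centre, rel_height in peaks:
                spectra[s] += conc * rel_height * lorentzian(ppm, centre + shift, width)
        # Additive Gaussian noise on every spectral point.
        spectra[s] += rng.normal(0.0, noise_sd, ppm.size)
    return spectra

# Hypothetical two-metabolite mixture; peak positions are illustrative only.
ppm = np.linspace(0.0, 10.0, 2001)
templates = {"met_A": [(3.0, 1.0)], "met_B": [(7.0, 1.0), (7.2, 0.5)]}
sd = {"met_A": 0.1, "met_B": 0.1}
controls = simulate_group(ppm, templates, {"met_A": 1.0, "met_B": 1.0}, sd, 20, seed=1)
cases = simulate_group(ppm, templates, {"met_A": 2.0, "met_B": 1.0}, sd, 20, seed=2)
```

In this sketch the two groups differ only in the mean level of `met_A`, so any difference between `cases` and `controls` is concentrated around its single peak.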
Time series analysis in metabolomics
With Giovanni Montana and Maurice Berk we have developed new techniques for time series analysis in metabolomics. Traditional time series methods require hundreds of time points and typically monitor just a few variables. In metabolic profiling, we monitor hundreds to thousands of variables over just a few (typically 3-10) time points. The plot below shows an example from our smoothing splines mixed effects (SME) model, in which the time courses of two different groups of animals are compared for a single metabolite. See publication.
Pathway tools for integrating multi-omics data
With Hector Keun, Rachel Cavill (Imperial) and Ralf Herwig and Atanas Kamburov (MPIMG, Germany) we have developed new tools which allow researchers to interpret data from multiple omics experiments in a pathway context. The pathway approach allows a higher sensitivity and a more global overview of the effects in a biological study, and is particularly suited to integrating data from different levels of biomolecular organisation (e.g. metabolomics, transcriptomics) obtained in heterogeneous experiments. The figure below shows a clustering of 118 drugs (rows) with hundreds of pathways (columns), indicating the overall pathway response of different cancer cell lines to drug treatment, as measured by metabolomics and transcriptomics. See the paper.
Differential Correlation Networks
A characteristic of omics data is that the individual measurements are highly correlated. That is, the level of a gene or metabolite tends to vary in a similar way to that of other genes or metabolites. These correlations can be visualised as a network in which correlated molecules are connected. The correlation network can be thought of as a new type of 'fingerprint' of the biological status of an organism and is therefore of interest in its own right. With Maria De Iorio (UCL) we have developed methods for analysing how correlation networks change in response to different biological conditions. We term these 'differential correlation networks'. The picture below shows which links in the blood lipoprotein correlation network differ between normal people and those with pre-diabetic symptoms. See the paper.
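The notion of a link that differs between two conditions can be made concrete with a small sketch. The code below uses a standard textbook approach, comparing Fisher z-transformed Pearson correlations between two groups. It is not necessarily the method of the paper. The function name `differential_correlation`, the threshold of 3 and the synthetic data are assumptions made for illustration, and the test statistic assumes roughly normally distributed measurements.

```python
import numpy as np

def differential_correlation(x_a, x_b, z_threshold=3.0):
    """Flag variable pairs whose correlation differs between two groups.

    x_a, x_b: arrays of shape (samples, variables), one per condition.
    Returns (z, edges): the matrix of z-statistics and the list of
    (i, j) index pairs with |z| above the threshold.
    """
    n_a, n_b = x_a.shape[0], x_b.shape[0]
    r_a = np.corrcoef(x_a, rowvar=False)
    r_b = np.corrcoef(x_b, rowvar=False)
    # Fisher z-transform; clip so that r = 1 on the diagonal stays finite.
    clip = 1.0 - 1e-12
    z_a = np.arctanh(np.clip(r_a, -clip, clip))
    z_b = np.arctanh(np.clip(r_b, -clip, clip))
    # Standard error of the difference of two Fisher-transformed correlations.
    se = np.sqrt(1.0 / (n_a - 3) + 1.0 / (n_b - 3))
    z = (z_a - z_b) / se
    np.fill_diagonal(z, 0.0)
    # Report each pair once by scanning the upper triangle only.
    i, j = np.where(np.triu(np.abs(z) > z_threshold, k=1))
    return z, list(zip(i.tolist(), j.tolist()))

# Synthetic data: variables 0 and 1 are coupled in group A but not in group B.
rng = np.random.default_rng(0)
x_b = rng.normal(size=(200, 4))
x_a = rng.normal(size=(200, 4))
x_a[:, 1] = x_a[:, 0] + 0.3 * rng.normal(size=200)
z, edges = differential_correlation(x_a, x_b)
```

With thousands of variables the number of pairs grows quadratically, so a real analysis would also need a correction for multiple testing, which this sketch omits.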
Classification tools for metabolomics
COMET Expert System: This figure shows a schematic of an 'expert system' constructed to help predict the toxicity of novel drugs. On the left, NMR spectra of rat urine are shown (a, b) and a model of normality is developed (c) over a time course (d). The right panel shows a similarity matrix comparing metabolic profiles from 62 treatments affecting the liver or kidney, reflecting a high degree of organ specificity. See Ebbels et al. J. Prot. Res. 2007.
Rigorous statistical models of Liquid Chromatography Mass Spectrometry (LC-MS) data
LC-MS is one of the most widely used technologies in metabolomics and other bioanalytical sciences today. The measurement process is very complex and thus most methods for processing the data rely on heuristic approaches. With Andreas Ipsen we have developed a new model of LC-Time of Flight-MS, based on the physical ion counting process at the detector, which is able to model the data more accurately than ever before. The model has so far led to two applications: detection of co-eluting compounds and construction of rigorous confidence intervals on isotope ratios. Both of these applications help in the difficult task of identifying unknown molecules, one of the key bottlenecks in metabolomics today. The picture below shows the ion counts from two partially co-eluting species. The colours display the p-value from the new test, indicating clearly that these two peaks do not co-elute. More details in the papers 1, 2 & 3.
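To illustrate why a counting view of the detector leads naturally to confidence intervals on isotope ratios, the following sketch treats the two isotope peaks as independent Poisson counts and builds an approximate interval on the log scale (the delta method). This is a deliberately idealised textbook calculation and not the model described above: the function name `isotope_ratio_ci` is hypothetical, and detector effects are ignored.

```python
import math

def isotope_ratio_ci(n_heavy, n_light, z=1.96):
    """Approximate confidence interval for the ratio of two ion counts.

    Assumes independent Poisson counts, so that
    var(log(n_heavy / n_light)) is approximately 1/n_heavy + 1/n_light.
    Returns (ratio, lower, upper); z=1.96 gives a nominal 95% interval.
    """
    if n_heavy <= 0 or n_light <= 0:
        raise ValueError("both counts must be positive")
    ratio = n_heavy / n_light
    se_log = math.sqrt(1.0 / n_heavy + 1.0 / n_light)
    return ratio, ratio * math.exp(-z * se_log), ratio * math.exp(z * se_log)

# Same underlying ratio, ten times as many ions counted in the second case.
low_counts = isotope_ratio_ci(110, 10_000)
high_counts = isotope_ratio_ci(1_100, 100_000)
```

Under these assumptions the interval narrows as more ions are counted, which is the sense in which counting statistics limit how precisely an isotope ratio can be used to constrain a molecular formula.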
David R, Ebbels T, Gooderham N, 2016, Synergistic and Antagonistic Mutation Responses of Human MCL-5 Cells to Mixtures of Benzo[a]pyrene and 2-Amino-1-Methyl-6-Phenylimidazo[4,5-b]pyridine: Dose-Related Variation in the Joint Effects of Common Dietary Carcinogens, Environmental Health Perspectives, Vol:124, ISSN:0091-6765, Pages:88-96
et al., 2016, Data standards can boost metabolomics research, and if there is a will, there is a way, Metabolomics, Vol:12, ISSN:1573-3882
et al., 2016, Workflow for Integrated Processing of Multicohort Untargeted 1H NMR Metabolomics Data in Large-Scale Metabolic Epidemiology, J Proteome Res
et al., 2016, Power Analysis and Sample Size Determination in Metabolic Phenotyping, Analytical Chemistry, Vol:88, ISSN:0003-2700, Pages:5179-5188
et al., 2016, Modelling the acid/base 1H NMR chemical shift limits of metabolites in human urine, Metabolomics, Vol:12, ISSN:1573-3882