Imperial College London

Emeritus Professor Jeremy Nicholson

Faculty of Medicine, Department of Metabolism, Digestion and Reproduction

Emeritus Professor of Biological Chemistry
 
 
 

Contact

 

+44 (0)20 7594 3195 | j.nicholson | Website

 
 

Assistant

 

Ms Wendy Torto, +44 (0)20 7594 3225

 

Location

 

Office no. 665, Sir Alexander Fleming Building, South Kensington Campus



 

Publications

Citation

BibTeX format

@article{Zou:2016:10.1021/acs.analchem.5b04020,
author = {Zou, X and Holmes, E and Nicholson, JK and Loo, RL},
doi = {10.1021/acs.analchem.5b04020},
journal = {Analytical Chemistry},
pages = {5670--5679},
title = {Automatic Spectroscopic Data Categorization by Clustering Analysis ({ASCLAN}): A Data-Driven Approach for Distinguishing Discriminatory Metabolites for Phenotypic Subclasses},
url = {http://dx.doi.org/10.1021/acs.analchem.5b04020},
volume = {88},
year = {2016}
}

RIS format (EndNote, RefMan)

TY  - JOUR
AB - We propose a novel data-driven approach aiming to reliably distinguish discriminatory metabolites from nondiscriminatory metabolites for a given spectroscopic data set containing two biological phenotypic subclasses. The automatic spectroscopic data categorization by clustering analysis (ASCLAN) algorithm aims to categorize spectral variables within a data set into three clusters corresponding to noise, nondiscriminatory and discriminatory metabolite regions. This is achieved by clustering each spectral variable based on the r2 value representing the loading weight of each spectral variable as extracted from an orthogonal partial least-squares discriminant (OPLS-DA) model of the data set. The variables are ranked according to r2 values and a series of principal component analysis (PCA) models are then built for subsets of these spectral data corresponding to ranges of r2 values. The Q2X value for each PCA model is extracted. K-means clustering is then applied to the Q2X values to generate two clusters based on a minimum Euclidean distance criterion. The cluster consisting of lower Q2X values is deemed devoid of metabolic information (noise), while the cluster consisting of higher Q2X values is then further subclustered into two groups based on the r2 values. We considered the cluster with high Q2X but low r2 values as nondiscriminatory, and the cluster with high Q2X and r2 values as discriminatory variables. The boundaries between these three clusters of spectral variables, on the basis of the r2 values, were considered as the cutoff values for defining the noise, nondiscriminatory and discriminatory variables. We evaluated the ASCLAN algorithm using six simulated 1H NMR spectroscopic data sets representing small, medium and large data sets (N = 50, 500, and 1000 samples per group, respectively), each with a reduced and full resolution set of variables (0.005 and 0.0005 ppm, respectively).
ASCLAN correctly identified all discriminatory metabolites and showed zero fals
AU - Zou,X
AU - Holmes,E
AU - Nicholson,JK
AU - Loo,RL
DO - 10.1021/acs.analchem.5b04020
EP - 5679
PY - 2016///
SN - 1086-4377
SP - 5670
TI - Automatic Spectroscopic Data Categorization by Clustering Analysis (ASCLAN): A Data-Driven Approach for Distinguishing Discriminatory Metabolites for Phenotypic Subclasses
T2 - Analytical Chemistry
UR - http://dx.doi.org/10.1021/acs.analchem.5b04020
UR - http://hdl.handle.net/10044/1/34557
VL - 88
ER -