Imperial College London

Professor J Simon Kroll

Faculty of Medicine, Department of Infectious Disease

Emeritus Professor, Paediatrics & Molecular Infectious Diseases

Contact

 

+44 (0)20 7594 3695
s.kroll

Assistant

 

Dr Robert Boyle +44 (0)20 7594 3990

Location

 

245, Wright Fleming Wing, St Mary's Campus

Summary

 

Publications

Citation

BibTeX format

@article{Rowe:2019:10.1186/s40168-019-0653-2,
author = {Rowe, WPM and Carrieri, AP and Alcon-Giner, C and Caim, S and Shaw, A and Sim, K and Kroll, JS and Hall, LJ and Pyzer-Knapp, EO and Winn, MD},
doi = {10.1186/s40168-019-0653-2},
journal = {Microbiome},
title = {Streaming histogram sketching for rapid microbiome analytics},
url = {http://dx.doi.org/10.1186/s40168-019-0653-2},
volume = {7},
year = {2019}
}

RIS format (EndNote, RefMan)

TY  - JOUR
AB  - Background: The growth in publicly available microbiome data in recent years has yielded an invaluable resource for genomic research, allowing for the design of new studies, augmentation of novel datasets and reanalysis of published works. This vast amount of microbiome data, as well as the widespread proliferation of microbiome research and the looming era of clinical metagenomics, means there is an urgent need to develop analytics that can process huge amounts of data in a short amount of time. To address this need, we propose a new method for the compact representation of microbiome sequencing data using similarity-preserving sketches of streaming k-mer spectra. These sketches allow for dissimilarity estimation, rapid microbiome catalogue searching and classification of microbiome samples in near real time. Results: We apply streaming histogram sketching to microbiome samples as a form of dimensionality reduction, creating a compressed ‘histosketch’ that can efficiently represent microbiome k-mer spectra. Using public microbiome datasets, we show that histosketches can be clustered by sample type using pairwise Jaccard similarity estimation, consequently allowing for rapid microbiome similarity searches via a locality-sensitive hashing indexing scheme. Furthermore, we use a ‘real life’ example to show that histosketches can train machine learning classifiers to accurately label microbiome samples. Specifically, using a collection of 108 novel microbiome samples from a cohort of premature neonates, we trained and tested a random forest classifier that could accurately predict whether the neonate had received antibiotic treatment (97% accuracy, 96% precision) and could subsequently be used to classify microbiome data streams in less than 3 s. Conclusions: Our method offers a new approach to rapidly process microbiome data streams, allowing samples to be rapidly clustered, indexed and classified. We also provide our implementation, Histosk
AU  - Rowe, WPM
AU  - Carrieri, AP
AU  - Alcon-Giner, C
AU  - Caim, S
AU  - Shaw, A
AU  - Sim, K
AU  - Kroll, JS
AU  - Hall, LJ
AU  - Pyzer-Knapp, EO
AU  - Winn, MD
DO  - 10.1186/s40168-019-0653-2
PY  - 2019///
SN  - 2049-2618
TI  - Streaming histogram sketching for rapid microbiome analytics
T2  - Microbiome
UR  - http://dx.doi.org/10.1186/s40168-019-0653-2
UR  - http://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcApp=PARTNER_APP&SrcAuth=LinksAMR&KeyUT=WOS:000461393200001&DestLinkType=FullRecord&DestApp=ALL_WOS&UsrCustomerID=1ba7043ffcc86c417c072aa74d649202
UR  - http://hdl.handle.net/10044/1/69826
VL  - 7
ER  - 
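The abstract above compares microbiome samples by the similarity of their k-mer spectra. Below is a minimal Python sketch of that underlying idea, under stated assumptions. It builds exact k-mer count spectra from a stream of reads and computes a weighted (min/max) Jaccard similarity between them. We assume this weighted variant because it suits count histograms; the abstract says only "Jaccard similarity estimation". This is not the authors' histosketch method, which compresses each spectrum into a compact sketch so that similarity can be estimated without keeping full counts. The function names, the value of k and the toy reads are all hypothetical and chosen for illustration.

```python
from collections import Counter


def kmer_spectrum(reads, k):
    """Count every length-k substring (k-mer) across an iterable of reads.

    Reads are consumed one at a time, so a generator over a large
    sequence stream also works. Assumption: no reverse-complement
    canonicalisation and no filtering of ambiguous bases (e.g. 'N').
    """
    counts = Counter()
    for read in reads:
        # Reads shorter than k contribute no k-mers (empty range).
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def weighted_jaccard(a, b):
    """Weighted (min/max) Jaccard similarity between two count histograms.

    Returns 1.0 for identical spectra and 0.0 when no k-mer is shared.
    Two empty spectra are treated as identical (1.0).
    """
    keys = a.keys() | b.keys()
    # Counter returns 0 for absent keys, so k-mers missing from one
    # sample simply contribute 0 to the minimum.
    shared = sum(min(a[x], b[x]) for x in keys)
    total = sum(max(a[x], b[x]) for x in keys)
    return shared / total if total else 1.0


# Toy reads (made up for illustration, not data from the paper).
sample_a = kmer_spectrum(["ACGTACGT", "ACGTTT"], k=3)
sample_b = kmer_spectrum(["ACGTACGA"], k=3)
print(round(weighted_jaccard(sample_a, sample_b), 3))  # → 0.455
```

Memory for exact spectra grows with the number of distinct k-mers in each sample. The abstract's motivation for sketching is to replace each spectrum with a compact, similarity-preserving summary that supports fast pairwise comparison and indexing.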