Imperial College London

Dr John Lees

Faculty of Medicine, School of Public Health

Visiting Researcher

Contact

+44 (0)20 7594 2939
j.lees

Location

UG4, Sir Alexander Fleming Building, South Kensington Campus

Citation

BibTeX format

@article{Lees:2018:10.12688/wellcomeopenres.14265.1,
author = {Lees, JA and Kendall, M and Parkhill, J and Colijn, C and Bentley, SD and Harris, SR},
doi = {10.12688/wellcomeopenres.14265.1},
journal = {Wellcome Open Research},
pages = {33},
title = {Evaluation of phylogenetic reconstruction methods using bacterial whole genomes: a simulation based study},
url = {http://dx.doi.org/10.12688/wellcomeopenres.14265.1},
volume = {3},
year = {2018}
}

RIS format (EndNote, RefMan)

TY  - JOUR
AB  - Background: Phylogenetic reconstruction is a necessary first step in many analyses which use whole genome sequence data from bacterial populations. There are many available methods to infer phylogenies, and these have various advantages and disadvantages, but few unbiased comparisons of the range of approaches have been made. Methods: We simulated data from a defined "true tree" using a realistic evolutionary model. We built phylogenies from this data using a range of methods, and compared reconstructed trees to the true tree using two measures, noting the computational time needed for different phylogenetic reconstructions. We also used real data from Streptococcus pneumoniae alignments to compare individual core gene trees to a core genome tree. Results: We found that, as expected, maximum likelihood trees from good quality alignments were the most accurate, but also the most computationally intensive. Using less accurate phylogenetic reconstruction methods, we were able to obtain results of comparable accuracy; we found that approximate results can rapidly be obtained using genetic distance based methods. In real data we found that highly conserved core genes, such as those involved in translation, gave an inaccurate tree topology, whereas genes involved in recombination events gave inaccurate branch lengths. We also show a tree-of-trees, relating the results of different phylogenetic reconstructions to each other. Conclusions: We recommend three approaches, depending on requirements for accuracy and computational time. Quicker approaches that do not perform full maximum likelihood optimisation may be useful for many analyses requiring a phylogeny, as generating a high quality input alignment is likely to be the major limiting factor of accurate tree topology. We have publicly released our simulated data and code to enable further comparisons.
AU  - Lees,JA
AU  - Kendall,M
AU  - Parkhill,J
AU  - Colijn,C
AU  - Bentley,SD
AU  - Harris,SR
DO  - 10.12688/wellcomeopenres.14265.1
EP  - 33
PY  - 2018///
SN  - 2398-502X
SP  - 33
TI  - Evaluation of phylogenetic reconstruction methods using bacterial whole genomes: a simulation based study
T2  - Wellcome Open Research
UR  - http://dx.doi.org/10.12688/wellcomeopenres.14265.1
UR  - https://www.ncbi.nlm.nih.gov/pubmed/29774245
UR  - http://hdl.handle.net/10044/1/59675
VL  - 3
ER  - 