Imperial College London

Professor Sarah Fidler BSc. MBBS. FRCP. PhD

Faculty of Medicine, Department of Infectious Disease

Professor of HIV and Communicable Diseases

Contact

+44 (0)20 7594 6230
s.fidler

Location

Clinical Trial Centre, Winston Churchill Wing
Medical School
St Mary's Campus

Publications

Citation

BibTeX format

@article{Yebra:2016:10.1038/srep39489,
author = {Yebra, G and Hodcroft, EB and Ragonnet-Cronin, ML and Pillay, D and Brown, AJL and {PANGEA\_HIV Consortium} and {ICONIC Project}},
doi = {10.1038/srep39489},
journal = {Scientific Reports},
title = {Using nearly full-genome {HIV} sequence data improves phylogeny reconstruction in a simulated epidemic},
url = {http://dx.doi.org/10.1038/srep39489},
volume = {6},
year = {2016}
}

RIS format (EndNote, RefMan)

TY  - JOUR
AB  - HIV molecular epidemiology studies analyse viral pol gene sequences because of their availability, but whole-genome sequencing allows other genes to be used. We aimed to determine which gene(s) provide(s) the best approximation to the real phylogeny by analysing a simulated epidemic (created as part of the PANGEA_HIV project) with a known transmission tree. We sub-sampled a simulated dataset of 4662 sequences into different combinations of genes (gag-pol-env, gag-pol, gag, pol, env and partial pol) and sampling depths (100%, 60%, 20% and 5%), generating 100 replicates for each case. We built maximum-likelihood trees for each combination using RAxML (GTR + Γ) and compared their topologies with that of the corresponding true tree using CompareTree. The accuracy of the trees was significantly proportional to the length of the sequences used, with the gag-pol-env datasets showing the best performance and the gag and partial pol datasets the worst. The lowest sampling depths (20% and 5%) greatly reduced the accuracy of tree reconstruction and showed high variability among replicates, especially for the shortest gene datasets. In conclusion, using longer sequences derived from nearly whole genomes will improve the reliability of phylogenetic reconstruction. With low sample coverage, results can be highly variable, particularly when based on short sequences.
AU  - Yebra, G
AU  - Hodcroft, EB
AU  - Ragonnet-Cronin, ML
AU  - Pillay, D
AU  - Brown, AJL
AU  - PANGEA_HIV Consortium
AU  - ICONIC Project
DO  - 10.1038/srep39489
PY  - 2016///
SN  - 2045-2322
TI  - Using nearly full-genome HIV sequence data improves phylogeny reconstruction in a simulated epidemic.
T2  - Scientific Reports
UR  - http://dx.doi.org/10.1038/srep39489
UR  - https://www.ncbi.nlm.nih.gov/pubmed/28008945
UR  - http://hdl.handle.net/10044/1/50246
VL  - 6
ER  - 