Publications
Walter KS, Colijn C, Cohen T, et al., 2020, Genomic variant-identification methods may alter Mycobacterium tuberculosis transmission inferences, MICROBIAL GENOMICS, Vol: 6, ISSN: 2057-5858
- Citations: 8
Tindale LC, Stockdale JE, Coombe M, et al., 2020, Evidence for transmission of COVID-19 prior to symptom onset, ELIFE, Vol: 9, ISSN: 2050-084X
- Citations: 150
Harrow GL, Lees JA, Hanage WP, et al., 2020, Negative frequency-dependent selection and asymmetrical transformation stabilise multi-strain bacterial population structures, Publisher: Cold Spring Harbor Laboratory
Streptococcus pneumoniae can be split into multiple strains, each with a characteristic combination of core and accessory genome variation, able to co-circulate and compete within the same hosts. Previous analyses of epidemiological datasets suggested the short-term vaccine-associated dynamics of S. pneumoniae strains may be mediated through multi-locus negative frequency-dependent selection (NFDS), acting to maintain accessory loci at equilibrium frequencies. To test whether this model could explain how such multi-strain populations were generated, it was modified to incorporate recombination. The outputs of simulations featuring symmetrical recombination were compared with genomic data on locus frequencies and distributions between genotypes, pairwise genetic distances and tree shape. These demonstrated NFDS prevented the loss of variation through neutral drift, but generated unstructured populations of diverse isolates. Making recombination asymmetrical, favouring deletion of accessory loci over insertion, alongside multi-locus NFDS significantly improved the fit to genomic data. In a population at equilibrium, structuring into multiple strains was stable due to outbreeding depression, resulting from recombinants with reduced accessory genomes having lower fitness than their parental genotypes. As many bacteria inhibit the integration of insertions into their chromosomes, this combination of asymmetrical recombination and multi-locus NFDS may underlie the co-existence of strains within a single ecological niche.
Mulberry N, Rutherford A, Colijn C, 2020, Systematic comparison of coexistence in models of drug-sensitive and drug-resistant pathogen strains, THEORETICAL POPULATION BIOLOGY, Vol: 133, Pages: 150-158, ISSN: 0040-5809
- Citations: 5
Metzig C, Gould M, Noronha R, et al., 2020, Classification of origin with feature selection and network construction for folk tunes, PATTERN RECOGNITION LETTERS, Vol: 133, Pages: 356-364, ISSN: 0167-8655
- Citations: 1
Hayati M, Biller P, Colijn C, 2020, Predicting the short-term success of human influenza virus variants with machine learning, PROCEEDINGS OF THE ROYAL SOCIETY B-BIOLOGICAL SCIENCES, Vol: 287, ISSN: 0962-8452
- Citations: 4
Metzig C, Colijn C, 2020, A maximum entropy method for the prediction of size distributions, Entropy, Vol: 22, Pages: 1-15, ISSN: 1099-4300
We propose a method to derive the stationary size distributions of a system, and the degree distributions of networks, using maximisation of the Gibbs-Shannon entropy. We apply this to a preferential attachment-type algorithm for systems of constant size, which contains exit of balls and urns (or nodes and edges for the network case). Knowing mean size (degree) and turnover rate, the power law exponent and exponential cutoff can be derived. Our results are confirmed by simulations and by computation of exact probabilities. We also apply this entropy method to reproduce existing results like the Maxwell-Boltzmann distribution for the velocity of gas particles, the Barabasi-Albert model and multiplicative noise systems.
Colijn C, Corander J, Croucher NJ, 2020, Designing ecologically optimized pneumococcal vaccines using population genomics, Nature Microbiology, Vol: 5, Pages: 473-485, ISSN: 2058-5276
Streptococcus pneumoniae (the pneumococcus) is a common nasopharyngeal commensal that can cause invasive pneumococcal disease (IPD). Each component of current protein–polysaccharide conjugate vaccines (PCVs) generally induces immunity specific to one of the approximately 100 pneumococcal serotypes, and typically eliminates it from carriage and IPD through herd immunity. Overall carriage rates remain stable owing to replacement by non-PCV serotypes. Consequently, the net change in IPD incidence is determined by the relative invasiveness of the pre- and post-PCV-carried pneumococcal populations. In the present study, we identified PCVs expected to minimize the post-vaccine IPD burden by applying Bayesian optimization to an ecological model of serotype replacement that integrated epidemiological and genomic data. We compared optimal formulations for reducing infant-only or population-wide IPD, and identified potential benefits to including non-conserved pneumococcal carrier proteins. Vaccines were also devised to minimize IPD resistant to antibiotic treatment, despite the ecological model assuming that resistance levels in the carried population would be preserved. We found that expanding infant-administered PCV valency is likely to result in diminishing returns, and that complementary pairs of infant- and adult-administered vaccines could be a superior strategy. PCV performance was highly dependent on the circulating pneumococcal population, further highlighting the advantages of a diversity of anti-pneumococcal vaccination strategies.
Knight GM, Davies NG, Colijn C, et al., 2019, Mathematical modelling for antibiotic resistance control policy: do we know enough?, BMC INFECTIOUS DISEASES, Vol: 19
- Citations: 24
Xu Y, Cancino-Munoz I, Torres-Puente M, et al., 2019, High-resolution mapping of tuberculosis transmission: Whole genome sequencing and phylogenetic modelling of a cohort from Valencia Region, Spain, PLOS MEDICINE, Vol: 16, ISSN: 1549-1277
- Citations: 35
Hall MD, Colijn C, 2019, Transmission Trees on a Known Pathogen Phylogeny: Enumeration and Sampling, MOLECULAR BIOLOGY AND EVOLUTION, Vol: 36, Pages: 1333-1343, ISSN: 0737-4038
- Citations: 8
Stimson J, Gardy J, Mathema B, et al., 2019, Beyond the SNP Threshold: Identifying Outbreak Clusters Using Inferred Transmissions, MOLECULAR BIOLOGY AND EVOLUTION, Vol: 36, Pages: 587-603, ISSN: 0737-4038
- Citations: 56
Mabud TS, Delgado Alves MDL, Ko AI, et al., 2019, Evaluating strategies for control of tuberculosis in prisons and prevention of spillover into communities: An observational and modeling study from Brazil (vol 16, e1002737, 2019), PLOS MEDICINE, Vol: 16, ISSN: 1549-1277
Metzig C, Ratmann O, Bezemer D, et al., 2019, Phylogenies from dynamic networks, PLoS Computational Biology, Vol: 15, ISSN: 1553-734X
The relationship between the underlying contact network over which a pathogen spreads and the pathogen phylogenetic trees that are obtained presents an opportunity to use sequence data to learn about contact networks that are difficult to study empirically. However, this relationship is not explicitly known and is usually studied in simulations, often with the simplifying assumption that the contact network is static in time, though human contact networks are dynamic. We simulate pathogen phylogenetic trees on dynamic Erdős-Rényi random networks and on two dynamic networks with skewed degree distribution, of which one is additionally clustered. We use tree shape features to explore how adding dynamics changes the relationships between the overall network structure and phylogenies. Our tree features include the number of small substructures (cherries, pitchforks) in the trees, measures of tree imbalance (Sackin index, Colless index), features derived from network science (diameter, closeness), as well as features using the internal branch lengths from the tip to the root. Using principal component analysis we find that the network dynamics influence the shapes of phylogenies, as does the network type. We also compare dynamic and time-integrated static networks. We find, in particular, that static network models like the widely used Barabasi-Albert model can be poor approximations for dynamic networks. We explore the effects of mis-specifying the network on the performance of classifiers trained to identify the transmission rate (using supervised learning methods). We find that both mis-specification of the underlying network and its parameters (mean degree, turnover rate) have a strong adverse effect on the ability to estimate the transmission parameter. We illustrate these results by classifying HIV trees with a classifier that we trained on simulated trees from different networks, infection rates and turnover rates. Our results point to the importance of correctly est
Mabud TS, Delgado Alves MDL, Ko AI, et al., 2019, Evaluating strategies for control of tuberculosis in prisons and prevention of spillover into communities: An observational and modeling study from Brazil, PLOS MEDICINE, Vol: 16, ISSN: 1549-1277
- Citations: 30
Ayabina D, Ronning JO, Alfsnes K, et al., 2018, Genome-based transmission modelling separates imported tuberculosis from recent transmission within an immigrant population, MICROBIAL GENOMICS, Vol: 4, ISSN: 2057-5858
- Citations: 5
Yang C, Lu L, Warren JL, et al., 2018, Internal migration and transmission dynamics of tuberculosis in Shanghai, China: an epidemiological, spatial, genomic analysis, LANCET INFECTIOUS DISEASES, Vol: 18, Pages: 788-795, ISSN: 1473-3099
- Citations: 47
Lees JA, Kendall M, Parkhill J, et al., 2018, Evaluation of phylogenetic reconstruction methods using bacterial whole genomes: a simulation based study, Wellcome Open Research, Vol: 3, Pages: 33-33, ISSN: 2398-502X
Background: Phylogenetic reconstruction is a necessary first step in many analyses which use whole genome sequence data from bacterial populations. There are many available methods to infer phylogenies, and these have various advantages and disadvantages, but few unbiased comparisons of the range of approaches have been made. Methods: We simulated data from a defined "true tree" using a realistic evolutionary model. We built phylogenies from this data using a range of methods, and compared reconstructed trees to the true tree using two measures, noting the computational time needed for different phylogenetic reconstructions. We also used real data from Streptococcus pneumoniae alignments to compare individual core gene trees to a core genome tree. Results: We found that, as expected, maximum likelihood trees from good quality alignments were the most accurate, but also the most computationally intensive. Using less accurate phylogenetic reconstruction methods, we were able to obtain results of comparable accuracy; we found that approximate results can rapidly be obtained using genetic distance based methods. In real data we found that highly conserved core genes, such as those involved in translation, gave an inaccurate tree topology, whereas genes involved in recombination events gave inaccurate branch lengths. We also show a tree-of-trees, relating the results of different phylogenetic reconstructions to each other. Conclusions: We recommend three approaches, depending on requirements for accuracy and computational time. Quicker approaches that do not perform full maximum likelihood optimisation may be useful for many analyses requiring a phylogeny, as generating a high quality input alignment is likely to be the major limiting factor of accurate tree topology. We have publicly released our simulated data and code to enable further comparisons.
Kendall ML, Ayabina P, Xu Y, et al., 2018, Estimating Transmission from Genetic and Epidemiological Data: A Metric to Compare Transmission Trees, Statistical Science, Vol: 33, Pages: 70-85, ISSN: 0883-4237
Reconstructing who infected whom is a central challenge in analysing epidemiological data. Recently, advances in sequencing technology have led to increasing interest in Bayesian approaches to inferring who infected whom using genetic data from pathogens. The logic behind such approaches is that isolates that are nearly genetically identical are more likely to have been recently transmitted than those that are very different. A number of methods have been developed to perform this inference. However, testing their convergence, examining posterior sets of transmission trees and comparing methods’ performance are challenged by the fact that the object of inference—the transmission tree—is a complicated discrete structure. We introduce a metric on transmission trees to quantify distances between them. The metric can accommodate trees with unsampled individuals, and highlights differences in the source case and in the number of infections per infector. We illustrate its performance on simple simulated scenarios and on posterior transmission trees from a TB outbreak. We find that the metric reveals where the posterior is sensitive to the priors, and where collections of trees are composed of distinct clusters. We use the metric to define median trees summarising these clusters. Quantitative tools to compare transmission trees to each other will be required for assessing MCMC convergence, exploring posterior trees and benchmarking diverse methods as this field continues to mature.
Yaesoubi R, Trotter C, Colijn C, et al., 2018, The cost-effectiveness of alternative vaccination strategies for polyvalent meningococcal vaccines in Burkina Faso: A transmission dynamic modeling study, PLOS MEDICINE, Vol: 15, ISSN: 1549-1676
- Citations: 10
Grandjean L, Gilman RH, Iwamoto T, et al., 2017, Convergent evolution and topologically disruptive polymorphisms among multidrug-resistant tuberculosis in Peru, PLoS ONE, Vol: 12, Pages: e0189838-e0189838, ISSN: 1932-6203
BACKGROUND: Multidrug-resistant tuberculosis poses a major threat to the success of tuberculosis control programs worldwide. Understanding how drug-resistant tuberculosis evolves can inform the development of new therapeutic and preventive strategies. METHODS: Here, we use novel genome-wide analysis techniques to identify polymorphisms that are associated with drug resistance, adaptive evolution and the structure of the phylogenetic tree. A total of 471 samples from different patients collected between 2009 and 2013 in the Lima suburbs of Callao and Lima South were sequenced on the Illumina MiSeq platform with 150bp paired-end reads. After alignment to the reference H37Rv genome, variants were called using standardized methodology. Genome-wide analysis was undertaken using custom-written scripts implemented in R software. RESULTS: High quality homoplastic single nucleotide polymorphisms were observed in genes known to confer drug resistance as well as genes in the Mycobacterium tuberculosis ESX secreted protein pathway, pks12, and close to toxin/anti-toxin pairs. Correlation of homoplastic variant sites identified that many were significantly correlated, suggestive of epistasis. Variation in genes coding for ESX secreted proteins also significantly disrupted phylogenetic structure. Mutations in ESX genes in key antigenic epitope positions were also found to disrupt tree topology. CONCLUSION: Variation in these genes has a biologically plausible effect on immunogenicity and virulence. This makes functional characterization warranted to determine the effects of these polymorphisms on bacterial fitness and transmission.
Sartelli M, Weber DG, Ruppe E, et al., 2017, Erratum to: Antimicrobials: a global alliance for optimizing their rational use in intra-abdominal infections (AGORA), World Journal of Emergency Surgery, Vol: 12, ISSN: 1749-7922
Cobey S, Baskerville EB, Colijn C, et al., 2017, Host population structure and treatment frequency maintain balancing selection on drug resistance, JOURNAL OF THE ROYAL SOCIETY INTERFACE, Vol: 14, ISSN: 1742-5689
- Citations: 22
Fyson N, King J, Belcher T, et al., 2017, A curated genome-scale metabolic model of Bordetella pertussis metabolism, PLOS COMPUTATIONAL BIOLOGY, Vol: 13
- Citations: 3
Ratmann O, Wymant C, Colijn C, et al., 2017, HIV-1 full-genome phylogenetics of generalized epidemics in sub-Saharan Africa: impact of missing nucleotide characters in next-generation sequences, Aids Research and Human Retroviruses, Vol: 33, Pages: 1083-1098, ISSN: 1931-8405
To characterize HIV-1 transmission dynamics in regions where the burden of HIV-1 is greatest, the “Phylogenetics and Networks for Generalised HIV Epidemics in Africa” consortium (PANGEA-HIV) is sequencing full-genome viral isolates from across sub-Saharan Africa. We report the first 3,985 PANGEA-HIV consensus sequences from four cohort sites (Rakai Community Cohort Study, n = 2,833; MRC/UVRI Uganda, n = 701; Mochudi Prevention Project, n = 359; Africa Health Research Institute Resistance Cohort, n = 92). Next-generation sequencing success rates varied: more than 80% of the viral genome from the gag to the nef genes could be determined for all sequences from South Africa, 75% of sequences from Mochudi, 60% of sequences from MRC/UVRI Uganda, and 22% of sequences from Rakai. Partial sequencing failure was primarily associated with low viral load, increased for amplicons closer to the 3′ end of the genome, was not associated with subtype diversity except HIV-1 subtype D, and remained significantly associated with sampling location after controlling for other factors. We assessed the impact of the missing data patterns in PANGEA-HIV sequences on phylogeny reconstruction in simulations. We found a threshold in terms of taxon sampling below which the patchy distribution of missing characters in next-generation sequences (NGS) has an excess negative impact on the accuracy of HIV-1 phylogeny reconstruction, which is attributable to tree reconstruction artifacts that accumulate when branches in viral trees are long. The large number of PANGEA-HIV sequences provides unprecedented opportunities for evaluating HIV-1 transmission dynamics across sub-Saharan Africa and identifying prevention opportunities. Molecular epidemiological analyses of these data must proceed cautiously because sequence sampling remains below the identified threshold and a considerable negative impact of missing characters on phyloge
Klinkenberg D, Backer JA, Didelot X, et al., 2017, Simultaneous inference of phylogenetic and transmission trees in infectious disease outbreaks, Plos Computational Biology, Vol: 13, ISSN: 1553-7358
Whole-genome sequencing of pathogens from host samples becomes more and more routine during infectious disease outbreaks. These data provide information on possible transmission events which can be used for further epidemiologic analyses, such as identification of risk factors for infectivity and transmission. However, the relationship between transmission events and sequence data is obscured by uncertainty arising from four largely unobserved processes: transmission, case observation, within-host pathogen dynamics and mutation. To properly resolve transmission events, these processes need to be taken into account. Recent years have seen much progress in theory and method development, but existing applications make simplifying assumptions that often break up the dependency between the four processes, or are tailored to specific datasets with matching model assumptions and code. To obtain a method with wider applicability, we have developed a novel approach to reconstruct transmission trees with sequence data. Our approach combines elementary models for transmission, case observation, within-host pathogen dynamics, and mutation, under the assumption that the outbreak is over and all cases have been observed. We use Bayesian inference with MCMC for which we have designed novel proposal steps to efficiently traverse the posterior distribution, taking account of all unobserved processes at once. This allows for efficient sampling of transmission trees from the posterior distribution, and robust estimation of consensus transmission trees. We implemented the proposed method in a new R package phybreak. The method performs well in tests of both new and published simulated data. We apply the model to five datasets on densely sampled infectious disease outbreaks, covering a wide range of epidemiological settings. Using only sampling times and sequences as data, our analyses confirmed the original results or improved on them: the more realistic infection times place more conf
Jombart T, Kendall M, Almagro-Garcia J, et al., 2017, Treespace: statistical exploration of landscapes of phylogenetic trees, Molecular Ecology Resources, Vol: 17, Pages: 1385-1392, ISSN: 1755-0998
The increasing availability of large genomic data sets as well as the advent of Bayesian phylogenetics facilitates the investigation of phylogenetic incongruence, which can result in the impossibility of representing phylogenetic relationships using a single tree. While sometimes considered as a nuisance, phylogenetic incongruence can also reflect meaningful biological processes as well as relevant statistical uncertainty, both of which can yield valuable insights in evolutionary studies. We introduce a new tool for investigating phylogenetic incongruence through the exploration of phylogenetic tree landscapes. Our approach, implemented in the R package treespace, combines tree metrics and multivariate analysis to provide low-dimensional representations of the topological variability in a set of trees, which can be used for identifying clusters of similar trees and group-specific consensus phylogenies. treespace also provides a user-friendly web interface for interactive data analysis and is integrated alongside existing standards for phylogenetics. It fills a gap in the current phylogenetics toolbox in R and will facilitate the investigation of phylogenetic results.
Colijn C, Plazzotta G, 2017, A metric on phylogenetic tree shapes, Systematic Biology, Vol: 67, Pages: 113-126, ISSN: 1076-836X
The shapes of evolutionary trees are influenced by the nature of the evolutionary process but comparisons of trees from different processes are hindered by the challenge of completely describing tree shape. We present a full characterization of the shapes of rooted branching trees in a form that lends itself to natural tree comparisons. We use this characterization to define a metric, in the sense of a true distance function, on tree shapes. The metric distinguishes trees from random models known to produce different tree shapes. It separates trees derived from tropical versus USA influenza A sequences, which reflect the differing epidemiology of tropical and seasonal flu. We describe several metrics based on the same core characterization, and illustrate how to extend the metric to incorporate trees’ branch lengths or other features such as overall imbalance. Our approach allows us to construct addition and multiplication on trees, and to create a convex metric on tree shapes which formally allows computation of average tree shapes.
Colijn C, Jones N, Johnston I, et al., 2017, Towards precision healthcare: context and mathematical challenges, Frontiers in Physiology, Vol: 8, ISSN: 1664-042X
Precision medicine refers to the idea of delivering the right treatment to the right patient at the right time, usually with a focus on a data-centred approach to this task. In this perspective piece, we use the term "precision healthcare" to describe the development of precision approaches that bridge from the individual to the population, taking advantage of individual-level data, but also taking the social context into account. These problems give rise to a broad spectrum of technical, scientific, policy, ethical and social challenges, and new mathematical techniques will be required to meet them. To ensure that the science underpinning "precision" is robust, interpretable and well-suited to meet the policy, ethical and social questions that such approaches raise, the mathematical methods for data analysis should be transparent, robust and able to adapt to errors and uncertainties. In particular, precision methodologies should capture the complexity of data, yet produce tractable descriptions at the relevant resolution while preserving intelligibility and traceability, so that they can be used by practitioners to aid decision-making. Through several case studies in this domain of precision healthcare, we argue that this vision requires the development of new mathematical frameworks, both in modelling and in data analysis and interpretation.
Didelot X, Fraser C, Gardy J, et al., 2017, Genomic infectious disease epidemiology in partially sampled and ongoing outbreaks, Molecular Biology and Evolution, Vol: 34, Pages: 997-1007, ISSN: 1537-1719
Genomic data is increasingly being used to understand infectious disease epidemiology. Isolates from a given outbreak are sequenced, and the patterns of shared variation are used to infer which isolates within the outbreak are most closely related to each other. Unfortunately, the phylogenetic trees typically used to represent this variation are not directly informative about who infected whom: a phylogenetic tree is not a transmission tree. However, a transmission tree can be inferred from a phylogeny while accounting for within-host genetic diversity by colouring the branches of a phylogeny according to which host those branches were in. Here we extend this approach and show that it can be applied to partially sampled and ongoing outbreaks. This requires computing the correct probability of an observed transmission tree and we herein demonstrate how to do this for a large class of epidemiological models. We also demonstrate how the branch colouring approach can incorporate a variable number of unique colours to represent unsampled intermediates in transmission chains. The resulting algorithm is a reversible jump Monte-Carlo Markov Chain, which we apply to both simulated data and real data from an outbreak of tuberculosis. By accounting for unsampled cases and an outbreak which may not have reached its end, our method is uniquely suited to use in a public health environment during real-time outbreak investigations. We implemented this transmission tree inference methodology in an R package called TransPhylo, which is freely available from https://github.com/xavierdidelot/TransPhylo
This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.