Imperial College London


Faculty of Medicine, School of Public Health

Senior Lecturer in Bacterial Genomics



+44 (0)20 7594 3820 n.croucher




UG5, Norfolk Place, St Mary's Campus






76 results found

De Ste Croix M, Chen Y, Vacca I, Manso AS, Johnston C, Polard P, Kwun MJ, Bentley SD, Croucher NJ, Bayliss CD, Haigh RD, Oggioni MR et al., 2019, Recombination of the phase variable spnIII locus is independent of all known pneumococcal site-specific recombinases, Journal of Bacteriology, Vol: 201, ISSN: 0021-9193

Streptococcus pneumoniae is one of the world's leading bacterial pathogens, causing pneumonia, septicaemia and meningitis. In recent years it has been shown that genetic rearrangements in a type I restriction-modification system (SpnIII) can impact colony morphology and gene expression. By generating a large panel of mutant strains, we have confirmed a previously reported result that the CreX (also known as IvrR and PsrA) recombinase found within the locus is not essential for hsdS inversions. In addition, mutants of homologous recombination pathways also undergo hsdS inversions. In this work we have shown that these genetic rearrangements, which result in different patterns of genome methylation, occur across a wide variety of serotypes and sequence types including two strains (a 19F and a 6B strain) naturally lacking CreX. Our gene expression analysis, by RNAseq, confirms that the level of creX expression is impacted by these genomic rearrangements. In addition, we have shown that the frequency of hsdS recombination is temperature dependent. Most importantly we have demonstrated that the other known pneumococcal site-specific recombinases XerD, XerS and SPD_0921 are not involved in spnIII recombination, suggesting a currently unknown mechanism is responsible for the recombination of these phase variable type I systems. Importance: Streptococcus pneumoniae is a leading cause of pneumonia, septicaemia and meningitis. The discovery that genetic rearrangements in a type I restriction modification locus can impact gene regulation and colony morphology has led to a new understanding of how this pathogen switches from harmless coloniser to invasive pathogen. These rearrangements, which alter the DNA specificity of the type I restriction modification enzyme, occur across many different pneumococcal serotypes and sequence types, and in the absence of all known pneumococcal site-specific recombinases. This finding suggests that this is a truly global mechanism of pneumococcal

Journal article

Lehtinen S, Chewapreecha C, Lees J, Hanage WP, Lipsitch M, Croucher NJ, Bentley SD, Turner P, Fraser C, Mostowy RJ et al., 2019, Horizontal gene transfer rate is not the primary determinant of observed antibiotic resistance frequencies in Streptococcus pneumoniae

The extent to which evolution is constrained by the rate at which horizontal gene transfer (HGT) allows DNA to move between genetic lineages is an open question, which we address in the context of antibiotic resistance in Streptococcus pneumoniae. We analyze microbiological, genomic and epidemiological data from the largest-to-date sequenced pneumococcal carriage study in 955 infants from a refugee camp on the Thailand-Myanmar border. Using a unified framework, we simultaneously test prior hypotheses on rates of HGT and a key evolutionary covariate (duration of carriage) as determinants of resistance frequencies. We conclude that in this setting, there is only weak evidence for the rate of HGT playing a role in the evolutionary dynamics of resistance. Instead, observed resistance frequencies are best explained as the outcome of selection acting on a pool of variants, irrespective of the rate at which resistance determinants move between genetic lineages.

Journal article

Lees JA, Ferwerda B, Kremer PHC, Wheeler NE, Seron MV, Croucher NJ, Gladstone RA, Bootsma HJ, Rots NY, Wijmega-Monsuur AJ, Sanders EAM, Trzcinski K, Wyllie AL, Zwinderman AH, van den Berg LH, van Rheenen W, Veldink JH, Harboe ZB, Lundbo LF, de Groot LCPGM, van Schoor NM, van der Velde N, Angquist LH, Sorensen TIA, Nohr EA, Mentzer AJ, Mills TC, Knight JC, du Plessis M, Nzenze S, Weiser JN, Parkhill J, Madhi S, Benfield T, von Gottberg A, van der Ende A, Brouwer MC, Barrett JC, Bentley SD, van de Beek D et al., 2019, Joint sequencing of human and pathogen genomes reveals the genetics of pneumococcal meningitis, Nature Communications, Vol: 10, ISSN: 2041-1723

Journal article

Lehtinen S, Blanquart F, Lipsitch M, Fraser C, Bentley SD, Croucher NJ, Lees JA, Turner P et al., 2019, On the evolutionary ecology of multidrug resistance in bacteria, PLoS Pathogens, Vol: 15, ISSN: 1553-7366

Resistance against different antibiotics appears on the same bacterial strains more often than expected by chance, leading to high frequencies of multidrug resistance. There are multiple explanations for this observation, but these tend to be specific to subsets of antibiotics and/or bacterial species, whereas the trend is pervasive. Here, we consider the question in terms of strain ecology: explaining why resistance to different antibiotics is often seen on the same strain requires an understanding of the competition between strains with different resistance profiles. This work builds on models originally proposed to explain another aspect of strain competition: the stable coexistence of antibiotic sensitivity and resistance observed in a number of bacterial species. We first identify a partial structural similarity in these models: either strain or host population structure stratifies the pathogen population into evolutionarily independent sub-populations and introduces variation in the fitness effect of resistance between these sub-populations, thus creating niches for sensitivity and resistance. We then generalise this unified underlying model to multidrug resistance and show that models with this structure predict high levels of association between resistance to different drugs and high multidrug resistance frequencies. We test predictions from this model in six bacterial datasets and find them to be qualitatively consistent with observed trends. The higher than expected frequencies of multidrug resistance are often interpreted as evidence that these strains are out-competing strains with lower resistance multiplicity. Our work provides an alternative explanation that is compatible with long-term stability in resistance frequencies.

Journal article

Gladstone RA, Lo SW, Lees JA, Croucher NJ, van Tonder AJ, Corander J, Page AJ, Marttinen P, Bentley LJ, Ochoa TJ, Ho PL, du Plessis M, Cornick JE, Kwambana-Adams B, Benisty R, Nzenze SA, Madhi SA, Hawkins PA, Everett DB, Antonio M, Dagan R, Klugman KP, von Gottberg A, McGee L, Breiman RF, Bentley SD et al., 2019, International genomic definition of pneumococcal lineages, to contextualise disease, antibiotic resistance and vaccine impact, EBioMedicine, Vol: 43, Pages: 338-346, ISSN: 2352-3964

Background: Pneumococcal conjugate vaccines have reduced the incidence of invasive pneumococcal disease, caused by vaccine serotypes, but non-vaccine-serotypes remain a concern. We used whole genome sequencing to study pneumococcal serotype, antibiotic resistance and invasiveness, in the context of genetic background. Methods: Our dataset of 13,454 genomes, combined with four published genomic datasets, represented Africa (40%), Asia (25%), Europe (19%), North America (12%), and South America (5%). These 20,027 pneumococcal genomes were clustered into lineages using PopPUNK, and named Global Pneumococcal Sequence Clusters (GPSCs). From our dataset, we additionally derived serotype and sequence type, and predicted antibiotic sensitivity. We then measured invasiveness using odds ratios relating prevalence in invasive pneumococcal disease to carriage. Findings: The combined collections (n = 20,027) were clustered into 621 GPSCs. Thirty-five GPSCs observed in our dataset were represented by >100 isolates, and subsequently classed as dominant-GPSCs. In 22/35 (63%) of dominant-GPSCs both non-vaccine serotypes and vaccine serotypes were observed in the years up until, and including, the first year of pneumococcal conjugate vaccine introduction. Penicillin and multidrug resistance were higher (p < .05) in a subset of dominant-GPSCs (14/35, 9/35 respectively), and resistance to an increasing number of antibiotic classes was associated with increased recombination (R² = 0.27, p < .0001). In 28/35 dominant-GPSCs, the country of isolation was a significant predictor (p < .05) of its antibiogram (mean misclassification error 0.28, SD ± 0.13). We detected increased invasiveness of six genetic backgrounds, when compared to other genetic backgrounds expressing the same serotype. Up to 1.6-fold changes in invasiveness odds ratio were observed. Interpretation: We define GPSCs that can be assigned to any pneumococcal genomic dataset, to aid international comparisons. Existing n

Journal article
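
The invasiveness measure mentioned in the abstract above (odds ratios relating prevalence in invasive disease to prevalence in carriage) can be illustrated with a minimal sketch. This is not the authors' code: the function name, the 2x2 layout, the continuity correction of 0.5 and the example counts are all assumptions made for illustration.

```python
def invasiveness_odds_ratio(disease_focal, carriage_focal,
                            disease_other, carriage_other,
                            correction=0.5):
    """Odds ratio comparing a focal lineage with all other lineages.

    The 2x2 table is (hypothetical layout):
                    disease   carriage
        focal          a         b
        other          c         d
    and the odds ratio is (a * d) / (b * c).
    """
    cells = [disease_focal, carriage_focal, disease_other, carriage_other]
    if any(count < 0 for count in cells):
        raise ValueError("counts must be non-negative")
    # Assumption: add a continuity correction to every cell when any
    # cell is zero, so that the ratio stays finite.
    if any(count == 0 for count in cells):
        cells = [count + correction for count in cells]
    a, b, c, d = cells
    return (a * d) / (b * c)


# Made-up counts: 30 disease and 10 carriage isolates of the focal
# lineage, against 100 disease and 200 carriage isolates of all others.
example_or = invasiveness_odds_ratio(30, 10, 100, 200)
```

An odds ratio above 1 in this layout would indicate that the focal lineage is over-represented among disease isolates relative to carriage isolates.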

McNally A, Kallonen T, Connor C, Abudahab K, Aanensen DM, Horner C, Peacock SJ, Parkhill J, Croucher NJ, Corander J et al., 2019, Diversification of colonization factors in a multidrug-resistant Escherichia coli lineage evolving under negative frequency-dependent selection, mBio, Vol: 10, ISSN: 2150-7511

Escherichia coli is a major cause of bloodstream and urinary tract infections globally. The wide dissemination of multidrug-resistant (MDR) strains of extraintestinal pathogenic E. coli (ExPEC) poses a rapidly increasing public health burden due to narrowed treatment options and increased risk of failure to clear an infection. Here, we present a detailed population genomic analysis of the ExPEC ST131 clone, in which we seek explanations for its success as an emerging pathogenic strain beyond the acquisition of antimicrobial resistance (AMR) genes. We show evidence for evolution toward separate ecological niches for the main clades of ST131 and differential evolution of anaerobic metabolism, key colonization, and virulence factors. We further demonstrate that negative frequency-dependent selection acting across accessory loci is a major mechanism that has shaped the population evolution of this pathogen. IMPORTANCE: Infections with multidrug-resistant (MDR) strains of Escherichia coli are a significant global public health concern. To combat these pathogens, we need a deeper understanding of how they evolved from their background populations. By understanding the processes that underpin their emergence, we can design new strategies to limit evolution of new clones and combat existing clones. By combining population genomics with modelling approaches, we show that dominant MDR clones of E. coli are under the influence of negative frequency-dependent selection, preventing them from rising to fixation in a population. Furthermore, we show that this selection acts on genes involved in anaerobic metabolism, suggesting that this key trait, and the ability to colonize human intestinal tracts, is a key step in the evolution of MDR clones of E. coli.

Journal article

Mitchell PK, Azarian T, Croucher NJ, Callendrello A, Thompson CM, Pelton S, Lipsitch M, Hanage WP et al., 2019, Population genomics of pneumococcal carriage in Massachusetts children following introduction of PCV-13, Microbial Genomics, Vol: 5, ISSN: 2057-5858

The 13-valent pneumococcal conjugate vaccine (PCV-13) was introduced in the United States in 2010. Using a large paediatric carriage sample collected from shortly after the introduction of PCV-7 to several years after the introduction of PCV-13, we investigate alterations in the composition of the pneumococcal population following the introduction of PCV-13, evaluating the extent to which the post-vaccination non-vaccine type (NVT) population mirrors that from prior to vaccine introduction and the effect of PCV-13 on vaccine type lineages. Draft genome assemblies from 736 newly sequenced and 616 previously published pneumococcal carriage isolates from children in Massachusetts between 2001 and 2014 were analysed. Isolates were classified into one of 22 sequence clusters (SCs) on the basis of their core genome sequence. We calculated the SC diversity for each sampling period as the probability that any two randomly drawn isolates from that period belong to different SCs. The sampling period immediately after the introduction of PCV-13 (2011) was found to have higher diversity than preceding (2007) or subsequent (2014) sampling periods {Simpson’s D 2007: 0.915 [95 % confidence interval (CI) 0.901, 0.929]; 2011: 0.935 [0.927, 0.942]; 2014: 0.912 [0.901, 0.923]}. Amongst NVT isolates, we found the distribution of SCs in 2011 to be significantly different from that in 2007 or 2014 (Fisher’s exact test P=0.018, 0.0078), but did not find a difference comparing 2007 to 2014 (Fisher’s exact test P=0.24), indicating greater similarity between samples separated by a longer time period than between samples from closer time periods. We also found changes in the accessory gene content of the NVT population between 2007 and 2011 to have been reduced by 2014. Amongst the new serotypes targeted by PCV-13, four were present in our sample. The proportion of our sample composed of PCV-13-only vaccine serotypes 19A, 6C and 7F decreased between 2007 and 2014, but no

Journal article
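
The diversity measure defined in the abstract above (the probability that two randomly drawn isolates from a sampling period belong to different SCs) can be sketched as follows. This is an illustration rather than the authors' implementation: it assumes the two isolates are drawn without replacement, and the function name and example labels are hypothetical.

```python
from collections import Counter


def simpsons_diversity(cluster_labels):
    """Probability that two isolates drawn without replacement
    belong to different sequence clusters.

    D = 1 - sum_i n_i (n_i - 1) / (N (N - 1)),
    where n_i is the number of isolates in cluster i and N is the total.
    """
    counts = Counter(cluster_labels)
    total = sum(counts.values())
    if total < 2:
        raise ValueError("need at least two isolates")
    same_cluster_pairs = sum(n * (n - 1) for n in counts.values())
    return 1.0 - same_cluster_pairs / (total * (total - 1))


# Hypothetical sequence-cluster assignments for one sampling period.
example_labels = ["SC1", "SC1", "SC2", "SC3"]
example_diversity = simpsons_diversity(example_labels)
```

Values close to 1 correspond to a sample spread evenly over many clusters, and a value of 0 to a sample drawn entirely from one cluster. The confidence intervals reported in the abstract would need an additional step, such as resampling, which is not shown here.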

Lees JA, Harris SR, Tonkin-Hill G, Gladstone RA, Lo SW, Weiser JN, Corander J, Bentley SD, Croucher NJ et al., 2019, Fast and flexible bacterial genomic epidemiology with PopPUNK, Genome Research, Vol: 29, Pages: 304-316, ISSN: 1088-9051

The routine use of genomics for disease surveillance provides the opportunity for high-resolution bacterial epidemiology. Current whole-genome clustering and multilocus typing approaches do not fully exploit core and accessory genomic variation, and they cannot both automatically identify, and subsequently expand, clusters of significantly similar isolates in large data sets spanning entire species. Here, we describe PopPUNK (Population Partitioning Using Nucleotide K-mers), a software implementing scalable and expandable annotation- and alignment-free methods for population analysis and clustering. Variable-length k-mer comparisons are used to distinguish isolates' divergence in shared sequence and gene content, which we demonstrate to be accurate over multiple orders of magnitude using data from both simulations and genomic collections representing 10 taxonomically widespread species. Connections between closely related isolates of the same strain are robustly identified, despite interspecies variation in the pairwise distance distributions that reflects species' diverse evolutionary patterns. PopPUNK can process 10³-10⁴ genomes in a single batch, with minimal memory use and runtimes up to 200-fold faster than existing model-based methods. Clusters of strains remain consistent as new batches of genomes are added, which is achieved without needing to reanalyze all genomes de novo. This facilitates real-time surveillance with consistent cluster naming between studies and allows for outbreak detection using hundreds of genomes in minutes. Interactive visualization and online publication is streamlined through the automatic output of results to multiple platforms. PopPUNK has been designed as a flexible platform that addresses important issues with currently used whole-genome clustering and typing methods, and has potential uses across bacterial genetics and public health research.

Journal article
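
The idea of comparing sequences at several k-mer lengths, as described in the abstract above, can be illustrated with a toy sketch. This is not PopPUNK's implementation or API: it computes exact Jaccard similarities on full k-mer sets, ignores reverse complements and any sketching of large genomes, and all function names and sequences are hypothetical.

```python
def kmer_set(sequence, k):
    """All distinct substrings of length k in the sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def jaccard_similarity(seq_a, seq_b, k):
    """Shared k-mers divided by all k-mers seen in either sequence."""
    kmers_a, kmers_b = kmer_set(seq_a, k), kmer_set(seq_b, k)
    union = kmers_a | kmers_b
    if not union:
        raise ValueError("sequences are shorter than k")
    return len(kmers_a & kmers_b) / len(union)


def jaccard_profile(seq_a, seq_b, k_values):
    """Jaccard similarity at each k-mer length in k_values."""
    return {k: jaccard_similarity(seq_a, seq_b, k) for k in k_values}


# Two toy sequences differing by a single substitution.
toy_a = "ACGTTGCA"
toy_b = "ACGTAGCA"
toy_profile = jaccard_profile(toy_a, toy_b, [3, 5])
```

In this toy case the similarity is lower at the longer k-mer length, because a single substitution disrupts every k-mer that overlaps it. How similarity changes with k is the kind of signal that a variable-length comparison can use to separate divergence in shared sequence from differences in gene content.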

Cremers AJH, Mobegi FM, van der Gaast-de Jongh C, van Weert M, van Opzeeland FJ, Vehkala M, Knol MJ, Bootsma HJ, Valimaki N, Croucher NJ, Meis JF, Bentley S, van Hijum SAFT, Corander J, Zomer AL, Ferwerda G, de Jonge MI et al., 2019, The Contribution of Genetic Variation of Streptococcus pneumoniae to the Clinical Manifestation of Invasive Pneumococcal Disease, Clinical Infectious Diseases, Vol: 68, Pages: 61-69, ISSN: 1058-4838

Journal article

Campo JJ, Le TQ, Pablo JV, Hung C, Teng AA, Tettelin H, Tate A, Hanage WP, Alderson MR, Liang X, Malley R, Lipsitch M, Croucher NJ et al., 2018, Panproteome-wide analysis of antibody responses to whole cell pneumococcal vaccination, eLife, Vol: 7, ISSN: 2050-084X

Pneumococcal whole cell vaccines (WCVs) could cost-effectively protect against a greater strain diversity than current capsule-based vaccines. Immunoglobulin G (IgG) responses to a WCV were characterised by applying longitudinally-sampled sera, available from 35 adult placebo-controlled phase I trial participants, to a panproteome microarray. Despite individuals maintaining distinctive antibody 'fingerprints', responses were consistent across vaccinated cohorts. Seventy-two functionally distinct proteins were associated with WCV-induced increases in IgG binding. These shared characteristics with naturally immunogenic proteins, being enriched for transporters and cell wall metabolism enzymes, likely unusually exposed on the unencapsulated WCV's surface. Vaccine-induced responses were specific to variants of the diverse PclA, PspC and ZmpB proteins, whereas PspA- and ZmpA-induced antibodies recognised a broader set of alleles. Temporal variation in IgG levels suggested a mixture of anamnestic and novel responses. These reproducible increases in IgG binding a limited, but functionally diverse, set of conserved proteins indicate WCV could provide species-wide immunity.

Journal article

Kwun MJ, Oggioni MR, De Ste Croix M, Bentley SD, Croucher NJ et al., 2018, Excision-reintegration at a pneumococcal phase-variable restriction-modification locus drives within- and between-strain epigenetic differentiation and inhibits gene acquisition, Nucleic Acids Research, Vol: 46, Pages: 11438-11453, ISSN: 0305-1048

Phase-variation of Type I restriction-modification systems can rapidly alter the sequence motifs they target, diversifying both the epigenetic patterns and endonuclease activity within clonally descended populations. Here, we characterize the Streptococcus pneumoniae SpnIV phase-variable Type I RMS, encoded by the translocating variable restriction (tvr) locus, to identify its target motifs, mechanism and regulation of phase variation, and effects on exchange of sequence through transformation. The specificity-determining hsdS genes were shuffled through a recombinase-mediated excision-reintegration mechanism involving circular intermediate molecules, guided by two types of direct repeat. The rate of rearrangements was limited by an attenuator and toxin-antitoxin system homologs that inhibited recombinase gene transcription. Target motifs for both the SpnIV, and multiple Type II, MTases were identified through methylation-sensitive sequencing of a panel of recombinase-null mutants. This demonstrated the species-wide diversity observed at the tvr locus can likely specify nine different methylation patterns. This will reduce sequence exchange in this diverse species, as the native form of the SpnIV RMS was demonstrated to inhibit the acquisition of genomic islands by transformation. Hence the tvr locus can drive variation in genome methylation both within and between strains, and limits the genomic plasticity of S. pneumoniae.

Journal article

Abudahab K, Prada JM, Yang Z, Bentley SD, Croucher NJ, Corander J, Aanensen DM et al., 2018, PANINI: Pangenome Neighbour Identification for Bacterial Populations, Microbial Genomics, Vol: 4, ISSN: 2057-5858

The standard workhorse for genomic analysis of the evolution of bacterial populations is phylogenetic modelling of mutations in the core genome. However, a notable amount of information about evolutionary and transmission processes in diverse populations can be lost unless the accessory genome is also taken into consideration. Here, we introduce panini (Pangenome Neighbour Identification for Bacterial Populations), a computationally scalable method for identifying the neighbours for each isolate in a data set using unsupervised machine learning with stochastic neighbour embedding based on the t-SNE (t-distributed stochastic neighbour embedding) algorithm. panini is browser-based and integrates with the Microreact platform for rapid online visualization and exploration of both core and accessory genome evolutionary signals, together with relevant epidemiological, geographical, temporal and other metadata. Several case studies with single- and multi-clone pneumococcal populations are presented to demonstrate the ability to identify biologically important signals from gene content data. panini is available at and code at

Journal article

Azarian T, Martinez PP, Arnold BJ, Grant LR, Corander J, Fraser C, Croucher NJ, Hammitt LL, Reid R, Santosham M, Weatherholtz RC, Bentley SD, OBrien KL, Lipsitch M, Hanage WP et al., 2018, Predicting evolution using frequency-dependent selection in bacterial populations

Predicting how pathogen populations will change over time is challenging. Such has been the case with Streptococcus pneumoniae, an important human pathogen, and the pneumococcal conjugate vaccines (PCVs), which target only a fraction of the strains in the population. Here, we use the frequencies of accessory genes to accurately predict changes in the pneumococcal population after vaccination, hypothesizing that these frequencies reflect negative frequency-dependent selection (NFDS) on the gene products. We find that the standardized fitness of a strain estimated by an NFDS-based model at the time the vaccine is introduced accurately predicts the direction of the strain’s prevalence change observed after vaccine introduction. Further, we are able to accurately predict the equilibrium post-vaccine population composition and assess the migration and invasion capacity of emerging lineages. In general, we provide a method for predicting the impact of an intervention on pneumococcal populations and other bacterial pathogens for which NFDS is a driving force. One Sentence Summary: We develop estimates of pneumococcal strain fitness based on the frequencies of accessory genes in a population, and test them using our ability to predict the impact of vaccination.

Thesis dissertation

Didelot X, Croucher NJ, Bentley SD, Harris SR, Wilson DJ et al., 2018, Bayesian inference of ancestral dates on bacterial phylogenetic trees, Nucleic Acids Research, ISSN: 0305-1048

The sequencing and comparative analysis of a collection of bacterial genomes from a single species or lineage of interest can lead to key insights into its evolution, ecology or epidemiology. The tool of choice for such a study is often to build a phylogenetic tree, and more specifically when possible a dated phylogeny, in which the dates of all common ancestors are estimated. Here, we propose a new Bayesian methodology to construct dated phylogenies which is specifically designed for bacterial genomics. Unlike previous Bayesian methods aimed at building dated phylogenies, we consider that the phylogenetic relationships between the genomes have been previously evaluated using a standard phylogenetic method, which makes our methodology much faster and scalable. This two-step approach also allows us to directly exploit existing phylogenetic methods that detect bacterial recombination, and therefore to account for the effect of recombination in the construction of a dated phylogeny. We analysed many simulated datasets in order to benchmark the performance of our approach in a wide range of situations. Furthermore, we present applications to three different real datasets from recent bacterial genomic studies. Our methodology is implemented in an R package called BactDating which is freely available for download at

Journal article

Croucher NJ, Lochen A, Bentley SD, 2018, Pneumococcal Vaccines: Host Interactions, Population Dynamics, and Design Principles, Annual Review of Microbiology, Vol: 72, Pages: 521-549, ISSN: 0066-4227

Journal article

Puranen S, Pesonen M, Pensar J, Xu YY, Lees JA, Bentley SD, Croucher NJ, Corander J et al., 2018, SuperDCA for genome-wide epistasis analysis, Microbial Genomics, Vol: 4, ISSN: 2057-5858

Journal article

Croucher NJ, Apagyi KJ, Fraser C, 2018, Transformation asymmetry and the evolution of the bacterial accessory genome, Molecular Biology and Evolution, Vol: 35, Pages: 575-581, ISSN: 1537-1719

Bacterial transformation can insert or delete genomic islands (GIs), depending on the donor and recipient genotypes, if a homologous recombination spans the GI’s integration site and includes sufficiently long flanking homologous arms. Combining mathematical models of recombination with experiments using pneumococci found GI insertion rates declined geometrically with the GI’s size. The decrease in acquisition frequency with length (1.08 × 10⁻³ bp⁻¹) was higher than a previous estimate of the analogous rate at which core genome recombinations terminated. Although most efficient for shorter GIs, transformation-mediated deletion frequencies did not vary consistently with GI length, with removal of 10 kb GIs approximately 50% as efficient as acquisition of base substitutions. Fragments of two kilobases, typical of transformation event sizes, could drive all these deletions independent of island length. The strong asymmetry of transformation, and its capacity to efficiently remove GIs, suggests non-mobile accessory loci will decline in frequency without preservation by selection.

Journal article
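
The geometric decline in insertion rate with island size reported in the abstract above can be made concrete with a short sketch. The functional form used here, a relative efficiency of (1 - rate)^length, is an assumed reading of "declined geometrically" combined with the reported per-base-pair rate; it is not taken from the paper's model, and the function name is hypothetical.

```python
import math

# Per-base-pair decrease in acquisition frequency quoted in the abstract.
DECLINE_PER_BP = 1.08e-3


def relative_insertion_efficiency(island_length_bp, rate=DECLINE_PER_BP):
    """Insertion efficiency relative to a zero-length insertion,
    assuming each additional base pair multiplies it by (1 - rate)."""
    if island_length_bp < 0:
        raise ValueError("island length must be non-negative")
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must lie in [0, 1)")
    # log1p keeps the calculation accurate for small rates.
    return math.exp(island_length_bp * math.log1p(-rate))


# Relative efficiencies for a few illustrative island sizes (in bp).
example_efficiencies = {
    length: relative_insertion_efficiency(length)
    for length in (0, 1000, 5000, 10000)
}
```

Under this assumed form, a 1 kb island would be acquired at roughly one third of the efficiency of a negligible-length insertion, and efficiency falls steeply for islands of several kilobases.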

Wymant C, Blanquart F, Golubchik T, Gall A, Bakker M, Bezemer D, Croucher NJ, Hall M, Hillebregt M, Ong SH, Ratmann O, Albert J, Bannert N, Fellay J, Fransen K, Gourlay A, Grabowski MK, Gunsenheimer-Bartmeyer B, Günthard HF, Kivelä P, Kouyos R, Laeyendecker O, Liitsola K, Meyer L, Porter K, Ristola M, van Sighem A, Berkhout B, Cornelissen M, Kellam P, Reiss P, Fraser C, BEEHIVE Collaboration et al., 2018, Easy and accurate reconstruction of whole HIV genomes from short-read sequence data with shiver, Virus Evolution, Vol: 4, ISSN: 2057-1577

Studying the evolution of viruses and their molecular epidemiology relies on accurate viral sequence data, so that small differences between similar viruses can be meaningfully interpreted. Despite its higher throughput and more detailed minority variant data, next-generation sequencing has yet to be widely adopted for HIV. The difficulty of accurately reconstructing the consensus sequence of a quasispecies from reads (short fragments of DNA) in the presence of large between- and within-host diversity, including frequent indels, may have presented a barrier. In particular, mapping (aligning) reads to a reference sequence leads to biased loss of information; this bias can distort epidemiological and evolutionary conclusions. De novo assembly avoids this bias by aligning the reads to themselves, producing a set of sequences called contigs. However contigs provide only a partial summary of the reads, misassembly may result in their having an incorrect structure, and no information is available at parts of the genome where contigs could not be assembled. To address these problems we developed the tool shiver to pre-process reads for quality and contamination, then map them to a reference tailored to the sample using corrected contigs supplemented with the user's choice of existing reference sequences. Run with two commands per sample, it can easily be used for large heterogeneous data sets. We used shiver to reconstruct the consensus sequence and minority variant information from paired-end short-read whole-genome data produced with the Illumina platform, for sixty-five existing publicly available samples and fifty new samples. We show the systematic superiority of mapping to shiver's constructed reference compared with mapping the same reads to the closest of 3,249 real references: median values of 13 bases called differently and more accurately, 0 bases called differently and less accurately, and 205 bases of missing sequence recovered. We also successfully applied sh

Journal article

Campo JJ, Croucher NJ, Turner P, Tate A, Le T, Pablo J, Hung C, Teng A, Hanage WP, Lipsitch M, Goldblatt D, Alderson M, Liang X et al., 2017, UNDERSTANDING THE IMMUNE RESPONSE TO STREPTOCOCCUS PNEUMONIAE FROM VACCINATION AND CARRIAGE ON A PROTEOME SCALE, 65th Annual Meeting of the American-Society-of-Tropical-Medicine-and-Hygiene (ASTMH), Publisher: AMER SOC TROP MED & HYGIENE, Pages: 367-367, ISSN: 0002-9637

Conference paper

Corander J, Fraser C, Gutmann MU, Arnold B, Hanage WP, Bentley SD, Lipsitch M, Croucher NJ et al., 2017, Frequency-dependent selection in vaccine-associated pneumococcal population dynamics, Nature Ecology and Evolution, Vol: 1, Pages: 1950-1960, ISSN: 2397-334X

Many bacterial species are composed of multiple lineages distinguished by extensive variation in gene content. These often cocirculate in the same habitat, but the evolutionary and ecological processes that shape these complex populations are poorly understood. Addressing these questions is particularly important for Streptococcus pneumoniae, a nasopharyngeal commensal and respiratory pathogen, because the changes in population structure associated with the recent introduction of partial-coverage vaccines have substantially reduced pneumococcal disease. Here we show that pneumococcal lineages from multiple populations each have a distinct combination of intermediate-frequency genes. Functional analysis suggested that these loci may be subject to negative frequency-dependent selection (NFDS) through interactions with other bacteria, hosts or mobile elements. Correspondingly, these genes had similar frequencies in four populations with dissimilar lineage compositions. These frequencies were maintained following substantial alterations in lineage prevalences once vaccination programmes began. Fitting a multilocus NFDS model of post-vaccine population dynamics to three genomic datasets using Approximate Bayesian Computation generated reproducible estimates of the influence of NFDS on pneumococcal evolution, the strength of which varied between loci. Simulations replicated the stable frequency of lineages unperturbed by vaccination, patterns of serotype switching and clonal replacement. This framework highlights how bacterial ecology affects the impact of clinical interventions. Accessory loci are shown to have similar frequencies in diverse Streptococcus pneumoniae populations, suggesting negative frequency-dependent selection drives post-vaccination population restructuring.

Journal article

Hadfield J, Croucher NJ, Goater RJ, Abudahab K, Aanensen DM, Harris SR et al., 2017, Phandango: an interactive viewer for bacterial population genomics, Bioinformatics, Vol: 34, Pages: 292-293, ISSN: 1367-4803

Summary: Fully exploiting the wealth of data in current bacterial population genomics datasets requires synthesising and integrating different types of analysis across millions of base pairs in hundreds or thousands of isolates. Current approaches often use static representations of phylogenetic, epidemiological, statistical and evolutionary analysis results that are difficult to relate to one another. Phandango is an interactive application running in a web browser allowing fast exploration of large-scale population genomics datasets combining the output from multiple genomic analysis methods in an intuitive and interactive manner. Availability: Phandango is a web application freely available for use at and includes a diverse collection of datasets as examples. Source code together with a detailed wiki page is available on GitHub at

Journal article

De Ste Croix M, Vacca I, Kwun MJ, Ralph JD, Bentley SD, Haigh R, Croucher NJ, Oggioni MRet al., 2017, Phase-variable methylation and epigenetic regulation by type I restriction-modification systems, FEMS Microbiology Reviews, Vol: 41, Pages: S3-S15, ISSN: 0168-6445

Epigenetic modifications in bacteria, such as DNA methylation, have been shown to affect gene regulation, thereby generating cells that are isogenic but with distinctly different phenotypes. Restriction–modification (RM) systems contain prototypic methylases that are responsible for much of bacterial DNA methylation. This review focuses on a distinctive group of type I RM loci that, through phase variation, can modify their methylation target specificity and can thereby switch bacteria between alternative patterns of DNA methylation. Phase variation occurs at the level of the target recognition domains of the hsdS (specificity) gene via reversible recombination processes acting upon multiple hsdS alleles. We describe the global distribution of such loci throughout the prokaryotic kingdom and highlight the differences in loci structure across the various bacterial species. Although RM systems are often considered simply as an evolutionary response to bacteriophages, these multi-hsdS type I systems have also shown the capacity to change bacterial phenotypes. The ability of these RM systems to allow bacteria to reversibly switch between different physiological states, combined with the existence of such loci across many species of medical and industrial importance, highlights the potential of phase-variable DNA methylation to act as a global regulatory mechanism in bacteria.

Journal article

Lees JA, Croucher NJ, Goldblatt D, Nosten F, Parkhill J, Turner C, Turner P, Bentley SD et al., 2017, Genome-wide identification of lineage and locus specific variation associated with pneumococcal carriage duration, eLife, Vol: 6, ISSN: 2050-084X

Streptococcus pneumoniae is a leading cause of invasive disease in infants, especially in low-income settings. Asymptomatic carriage in the nasopharynx is a prerequisite for disease, but variability in its duration is currently only understood at the serotype level. Here we developed a model to calculate the duration of carriage episodes from longitudinal swab data, and combined these results with whole genome sequence data. We estimated that pneumococcal genomic variation accounted for 63% of the phenotype variation, whereas the host traits considered here (age and previous carriage) accounted for less than 5%. We further partitioned this heritability into both lineage and locus effects, and quantified the amount attributable to the largest sources of variation in carriage duration: serotype (17%), drug-resistance (9%) and other significant locus effects (7%). A pan-genome-wide association study identified prophage sequences as being associated with decreased carriage duration independent of serotype, potentially by disruption of the competence mechanism. These findings support theoretical models of pneumococcal competition and antibiotic resistance.

Journal article

Mostowy RJ, Croucher NJ, De Maio N, Chewapreecha C, Salter SJ, Turner P, Aanensen DM, Bentley SD, Didelot X, Fraser C et al., 2017, Pneumococcal capsule synthesis locus cps as evolutionary hotspot with potential to generate novel serotypes by recombination, Molecular Biology and Evolution, Vol: 34, Pages: 2537-2554, ISSN: 1537-1719

Diversity of the polysaccharide capsule in Streptococcus pneumoniae -- main surface antigen and the target of the currently used pneumococcal vaccines -- constitutes a major obstacle in eliminating pneumococcal disease. Such diversity is genetically encoded by almost 100 variants of the capsule biosynthesis locus, cps. However, the evolutionary dynamics of the capsule remains not fully understood. Here, using genetic data from 4,519 bacterial isolates, we found cps to be an evolutionary hotspot with elevated substitution and recombination rates. These rates were a consequence of relaxed purifying selection and positive, diversifying selection acting at this locus, supporting the hypothesis that the capsule has an increased potential to generate novel diversity compared to the rest of the genome. Diversifying selection was particularly evident in the region of wzd/wze genes, which are known to regulate capsule expression and hence the bacterium's ability to cause disease. Using a novel, capsule-centred approach, we analysed the evolutionary history of twelve major serogroups. Such analysis revealed their complex diversification scenarios, which were principally driven by recombination with other serogroups and other streptococci. Patterns of recombinational exchanges between serogroups could not be explained by serotype frequency alone, thus pointing to non-random associations between co-colonising serotypes. Finally, we discovered a previously unobserved mosaic serotype 39X, which was confirmed to carry a viable and structurally novel capsule. Adding to previous discoveries of other mosaic capsules in densely sampled collections, these results emphasise the strong adaptive potential of the bacterium by its ability to generate novel antigenic diversity by recombination.

Journal article

Skwark MJ, Croucher NJ, Puranen S, Chewapreecha C, Pesonen M, Xu YY, Turner P, Harris SR, Beres SB, Musser JM, Parkhill J, Bentley SD, Aurell E, Corander J et al., 2017, Interacting networks of resistance, virulence and core machinery genes identified by genome-wide epistasis analysis, PLoS Genetics, Vol: 13, ISSN: 1553-7390

Recent advances in the scale and diversity of population genomic datasets for bacteria now provide the potential for genome-wide patterns of co-evolution to be studied at the resolution of individual bases. Here we describe a new statistical method, genomeDCA, which uses recent advances in computational structural biology to identify the polymorphic loci under the strongest co-evolutionary pressures. We apply genomeDCA to two large population data sets representing the major human pathogens Streptococcus pneumoniae (pneumococcus) and Streptococcus pyogenes (group A Streptococcus). For pneumococcus we identified 5,199 putative epistatic interactions between 1,936 sites. Over three-quarters of the links were between sites within the pbp2x, pbp1a and pbp2b genes, the sequences of which are critical in determining non-susceptibility to beta-lactam antibiotics. A network-based analysis found these genes were also coupled to that encoding dihydrofolate reductase, changes to which underlie trimethoprim resistance. Distinct from these antibiotic resistance genes, a large network component of 384 protein coding sequences encompassed many genes critical in basic cellular functions, while another distinct component included genes associated with virulence. The group A Streptococcus (GAS) data set population represents a clonal population with relatively little genetic variation and a high level of linkage disequilibrium across the genome. Despite this, we were able to pinpoint two RNA pseudouridine synthases, which were each strongly linked to a separate set of loci across the chromosome, representing biologically plausible targets of co-selection. The population genomic analysis method applied here identifies statistically significantly co-evolving locus pairs, potentially arising from fitness selection interdependence reflecting underlying protein-protein interactions, or genes whose product activities contribute to the same phenotype. This discovery approach greatly enhance

Journal article

Mostowy R, Croucher NJ, Andam CP, Corander J, Hanage WP, Marttinen P et al., 2017, Efficient Inference of Recent and Ancestral Recombination within Bacterial Populations, Molecular Biology and Evolution, Vol: 34, Pages: 1167-1182, ISSN: 0737-4038

Prokaryotic evolution is affected by horizontal transfer of genetic material through recombination. Inference of an evolutionary tree of bacteria thus relies on accurate identification of the population genetic structure and recombination-derived mosaicism. Rapidly growing databases represent a challenge for computational methods to detect recombinations in bacterial genomes. We introduce a novel algorithm called fastGEAR which identifies lineages in diverse microbial alignments, and recombinations between them and from external origins. The algorithm detects both recent recombinations (affecting a few isolates) and ancestral recombinations between detected lineages (affecting entire lineages), thus providing insight into recombinations affecting deep branches of the phylogenetic tree. In simulations, fastGEAR had comparable power to detect recent recombinations and outstanding power to detect the ancestral ones, compared with state-of-the-art methods, often with a fraction of computational cost. We demonstrate the utility of the method by analyzing a collection of 616 whole-genomes of a recombinogenic pathogen Streptococcus pneumoniae, for which the method provided a high-resolution view of recombination across the genome. We examined in detail the penicillin-binding genes across the Streptococcus genus, demonstrating previously undetected genetic exchanges between different species at these three loci. Hence, fastGEAR can be readily applied to investigate mosaicism in bacterial genes across multiple species. Finally, fastGEAR correctly identified many known recombination hotspots and pointed to potential new ones. Matlab code and Linux/Windows executables are available at (last accessed February 6, 2017).

Journal article

Lees JA, Kremer PHC, Manso AS, Croucher NJ, Ferwerda B, Serón MV, Oggioni MR, Parkhill J, Brouwer MC, van der Ende A, van de Beek D, Bentley SD et al., 2017, Large scale genomic analysis shows no evidence for pathogen adaptation between the blood and cerebrospinal fluid niches during bacterial meningitis, Microbial Genomics, Vol: 3, ISSN: 2057-5858

Recent studies have provided evidence for rapid pathogen genome diversification, some of which could potentially affect the course of disease. We have previously described such variation seen between isolates infecting the blood and cerebrospinal fluid (CSF) of a single patient during a case of bacterial meningitis. Here, we performed whole-genome sequencing of paired isolates from the blood and CSF of 869 meningitis patients to determine whether such variation frequently occurs between these two niches in cases of bacterial meningitis. Using a combination of reference-free variant calling approaches, we show that no genetic adaptation occurs in either invaded niche during bacterial meningitis for two major pathogen species, Streptococcus pneumoniae and Neisseria meningitidis. This study therefore shows that the bacteria capable of causing meningitis are already able to do this upon entering the blood, and no further sequence change is necessary to cross the blood-brain barrier. Our findings place the focus back on bacterial evolution between nasopharyngeal carriage and invasion, or diversity of the host, as likely mechanisms for determining invasiveness.

Journal article

Lehtinen S, Blanquart F, Croucher NJ, Turner P, Lipsitch M, Fraser C et al., 2017, Evolution of antibiotic resistance is linked to any genetic mechanism affecting bacterial duration of carriage, Proceedings of the National Academy of Sciences of the United States of America, Vol: 114, Pages: 1075-1080, ISSN: 0027-8424

Journal article

Mostowy RJ, Croucher NJ, De Maio N, Chewapreecha C, Salter SJ, Turner P, Aanensen DM, Bentley SD, Didelot X, Fraser C et al., 2017, Frequent recombination of pneumococcal capsule highlights future risks of emergence of novel serotypes, Publisher: Cold Spring Harbor Laboratory

Capsular diversity of Streptococcus pneumoniae constitutes a major obstacle in eliminating the pneumococcal disease. Such diversity is genetically encoded by almost 100 variants of the capsule polysaccharide locus (cps). However, the evolutionary dynamics of the capsule – the target of the currently used vaccines – remains not fully understood. Here, using genetic data from 4,469 bacterial isolates, we found cps to be an evolutionary hotspot with elevated substitution and recombination rates. These rates were a consequence of altered selection at this locus, supporting the hypothesis that the capsule has an increased potential to generate novel diversity compared to the rest of the genome. Analysis of twelve serogroups revealed their complex evolutionary history, which was principally driven by recombination with other serogroups and other streptococci. We observed significant variation in recombination rates between different serogroups. This variation could only be partially explained by the lineage-specific recombination rate, the remaining factors being likely driven by serogroup-specific ecology and epidemiology. Finally, we discovered two previously unobserved mosaic serotypes in the densely sampled collection from Mae La, Thailand, here termed 10X and 21X. Our results thus emphasise the strong adaptive potential of the bacterium by its ability to generate novel serotypes by recombination.

Working paper

Croucher NJ, Campo JJ, Le TQ, Liang X, Bentley SD, Hanage WP, Lipsitch M et al., 2017, Diverse evolutionary patterns of pneumococcal antigens identified by pangenome-wide immunological screening, Proceedings of the National Academy of Sciences, Vol: 114, Pages: E357-E366, ISSN: 0027-8424

Characterizing the immune response to pneumococcal proteins is critical in understanding this bacterium’s epidemiology and vaccinology. Probing a custom-designed proteome microarray with sera from 35 healthy US adults revealed a continuous distribution of IgG affinities for 2,190 potential antigens from the species-wide pangenome. Reproducibly elevated IgG binding was elicited by 208 “antibody binding targets” (ABTs), which included 109 variants of the diverse pneumococcal surface proteins A and C (PspA and PspC) and zinc metalloprotease A and B (ZmpA and ZmpB) proteins. Functional analysis found ABTs were enriched in motifs for secretion and cell surface association, with extensive representation of cell wall synthesis machinery, adhesins, transporter solute-binding proteins, and degradative enzymes. ABTs were associated with stronger evidence for evolving under positive selection, although this varied between functional categories, as did rates of diversification through recombination. Particularly rapid variation was observed at some immunogenic accessory loci, including a phage protein and a phase-variable glycosyltransferase ubiquitous among the diverse set of genomic islands encoding the serine-rich PsrP glycoprotein. Nevertheless, many antigens were conserved in the core genome, and strains’ antigenic profiles were generally stable. No strong evidence was found for any epistasis between antigens driving population dynamics, or redundancy between functionally similar accessory ABTs, or age stratification of antigen profiles. These results highlight the paradox of why substantial variation is observed in only a subset of epitopes. This result may indicate only some interactions between immunoglobulins and ABTs clear pneumococcal colonization or that acquired immunity to pneumococci is an accumulation of individually weak responses to ABTs evolving under different levels of functional constraint.

Journal article

This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.
