Publications

Jombart T, Jarvis CI, Mesfin S, Tabal N, Mossoko M, Mpia LM, Abedi AA, Chene S, Forbin EE, Belizaire MRD, de Radigues X, Ngombo R, Tutu Y, Finger F, Crowe M, Supsup WJE, Nsio J, Yam A, Diallo B, Gueye AS, Ahuka-Mundeke S, Yao M, Fall IS et al., 2020, The cost of insecurity: from flare-up to control of a major Ebola virus disease hotspot during the outbreak in the Democratic Republic of the Congo, 2019, Eurosurveillance, Vol: 25, Pages: 19-22, ISSN: 1560-7917

Citations: 10

Journal article

Endo A, Abbott S, Kucharski AJ, Funk S, Eggo RM, Quilty BJ, Bosse NI, van Zandvoort K, Munday JD, Flasche S, Rosello A, Jit M, John Edmunds W, Gimma A, Liu Y, Prem K, Gibbs H, Diamond C, Jarvis CI, Davies N, Sun F, Hellewell J, Russell TW, Jombart T, Clifford S, Klepac P, Medley G, Pearson CAB et al., 2020, Estimating the overdispersion in COVID-19 transmission using outbreak sizes outside China, Wellcome Open Research, Vol: 5

Background: A novel coronavirus disease (COVID-19) outbreak has now spread to a number of countries worldwide. While sustained chains of human-to-human transmission suggest a high basic reproduction number R0, variation in the number of secondary transmissions (often characterised by so-called superspreading events) may be large, as some countries have observed fewer local transmissions than others. Methods: We quantified individual-level variation in COVID-19 transmission by applying a mathematical model to observed outbreak sizes in affected countries. We extracted the number of imported and local cases in the affected countries from the World Health Organization situation report and applied a branching process model where the number of secondary transmissions was assumed to follow a negative-binomial distribution. Results: Our model suggested a high degree of individual-level variation in the transmission of COVID-19. Within the current consensus range of R0 (2-3), the overdispersion parameter k of a negative-binomial distribution was estimated to be around 0.1 (median estimate 0.1; 95% CrI: 0.05-0.2 for R0 = 2.5), suggesting that 80% of secondary transmissions may have been caused by a small fraction of infectious individuals (~10%). A joint estimation yielded likely ranges for R0 and k (95% CrIs: R0 1.4-12; k 0.04-0.2); however, the upper bound of R0 was not well informed by the model and data, which did not notably differ from that of the prior distribution. Conclusions: Our finding of a highly-overdispersed offspring distribution highlights a potential benefit to focusing intervention efforts on superspreading. As most infected individuals do not contribute to the expansion of an epidemic, the effective reproduction number could be drastically reduced by preventing relatively rare superspreading events.
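The link between the overdispersion parameter k and the "80% of transmission from ~10% of cases" statement can be illustrated with a short calculation. The sketch below is not the authors' code: it assumes a negative-binomial offspring distribution with mean R0 and dispersion k, ranks cases from least to most infectious, and interpolates within a single offspring count to find the smallest fraction of cases expected to account for a given share of transmission. The function name and the truncation point `x_max` are hypothetical choices made for illustration.

```python
def top_fraction_for_share(r0, k, share=0.8, x_max=5000):
    """Fraction of cases expected to cause `share` of all secondary cases.

    Assumes offspring counts follow a negative-binomial distribution with
    mean r0 and dispersion k (illustrative sketch, not the paper's code).
    """
    q = r0 / (r0 + k)                    # success-probability term of the pmf
    p = (k / (r0 + k)) ** k              # P(X = 0)
    bottom_target = (1.0 - share) * r0   # transmission left to the least infectious cases
    cum_cases = 0.0                      # probability mass of cases counted so far
    cum_trans = 0.0                      # expected transmissions they account for
    for x in range(x_max + 1):
        contrib = x * p
        if contrib > 0 and cum_trans + contrib >= bottom_target:
            # Only part of the cases with exactly x offspring belong to the bottom group.
            needed = (bottom_target - cum_trans) / contrib
            return 1.0 - (cum_cases + needed * p)
        cum_cases += p
        cum_trans += contrib
        p *= (x + k) / (x + 1) * q       # pmf recursion: P(x + 1) from P(x)
    raise ValueError("x_max too small for these parameters")


if __name__ == "__main__":
    for k in (0.05, 0.1, 0.2, 1.0):
        print(k, round(top_fraction_for_share(2.5, k), 3))
```

Under these assumptions, smaller values of k concentrate transmission in fewer cases, which is the pattern the abstract describes.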

Citations: 267

Journal article

Jombart T, van Zandvoort K, Russell TW, Jarvis CI, Gimma A, Abbott S, Clifford S, Funk S, Gibbs H, Liu Y, Pearson CAB, Bosse NI, Centre for the Mathematical Modelling of Infectious Diseases COVID-19 Working Group, Eggo RM, Kucharski AJ, Edmunds WJ et al., 2020, Inferring the number of COVID-19 cases from recently reported deaths, Wellcome Open Research, Vol: 5, ISSN: 2398-502X

We estimate the number of COVID-19 cases from newly reported deaths in a population without previous reports. Our results suggest that by the time a single death occurs, hundreds to thousands of cases are likely to be present in that population. This suggests containment via contact tracing will be challenging at this point, and other response strategies should be considered. Our approach is implemented in a publicly available, user-friendly, online tool.

Journal article

Parisi A, Tu LTP, Mather AE, Jombart T, Ha TT, Nguyen PHL, Nguyen HTT, Carrique-Mas J, Campbell J, Glass K, Kirk MD, Baker S et al., 2020, Differential antimicrobial susceptibility profiles between symptomatic and asymptomatic non-typhoidal Salmonella infections in Vietnamese children, Epidemiology and Infection, Vol: 148, ISSN: 0950-2688

Citations: 3

Journal article

Dighe A, Jombart T, Van Kerkhove MD, Ferguson N et al., 2019, A systematic review of MERS-CoV seroprevalence and RNA prevalence in dromedary camels: implications for animal vaccination, Epidemics, Vol: 29, ISSN: 1755-4365

Human infection with Middle East Respiratory Syndrome Coronavirus (MERS-CoV) is driven by recurring dromedary-to-human spill-over events, leading decision-makers to consider dromedary vaccination. Dromedary vaccine candidates in the development pipeline are showing hopeful results, but gaps in our understanding of the epidemiology of MERS-CoV in dromedaries must be addressed to design and evaluate potential vaccination strategies. We aim to bring together existing measures of MERS-CoV infection in dromedary camels to assess the distribution of infection, highlighting knowledge gaps and implications for animal vaccination. We systematically reviewed the published literature on MEDLINE, EMBASE and Web of Science that reported seroprevalence and/or prevalence of active MERS-CoV infection in dromedary camels from both cross-sectional and longitudinal studies. 60 studies met our eligibility criteria. Qualitative syntheses determined that MERS-CoV seroprevalence increased with age up to 80–100% in adult dromedaries supporting geographically widespread endemicity of MERS-CoV in dromedaries in both the Arabian Peninsula and countries exporting dromedaries from Africa. The high prevalence of active infection measured in juveniles and at sites where dromedary populations mix should guide further investigation – particularly of dromedary movement – and inform vaccination strategy design and evaluation through mathematical modelling.

Journal article

Thompson R, Stockwin J, van Gaalen R, Polonsky J, Kamvar Z, Demarsh A, Dahlqwist E, Miguel E, Jombart T, Lessler J, Cauchemez S, Cori A et al., 2019, Improved inference of time-varying reproduction numbers during infectious disease outbreaks, Epidemics, Vol: 29, Pages: 1-11, ISSN: 1755-4365

Accurate estimation of the parameters characterising infectious disease transmission is vital for optimising control interventions during epidemics. A valuable metric for assessing the current threat posed by an outbreak is the time-dependent reproduction number, i.e. the expected number of secondary cases caused by each infected individual. This quantity can be estimated using data on the numbers of observed new cases at successive times during an epidemic and the distribution of the serial interval (the time between symptomatic cases in a transmission chain). Some methods for estimating the reproduction number rely on pre-existing estimates of the serial interval distribution and assume that the entire outbreak is driven by local transmission. Here we show that accurate inference of current transmissibility, and the uncertainty associated with this estimate, requires: (i) up-to-date observations of the serial interval to be included; and (ii) cases arising from local transmission to be distinguished from those imported from elsewhere. We demonstrate how pathogen transmissibility can be inferred appropriately using datasets from outbreaks of H1N1 influenza, Ebola virus disease and Middle-East Respiratory Syndrome. We present a tool for accurately estimating the reproduction number in real time during infectious disease outbreaks, which is available as an R software package (EpiEstim 2.2). It is also accessible as an interactive, user-friendly online interface (EpiEstim App), permitting its use by non-specialists. Our tool is easy to apply for assessing the transmission potential, and hence informing control, during future outbreaks of a wide range of invading pathogens.
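The role of the serial interval and of the local/imported distinction can be illustrated with a simplified renewal-equation point estimate. This is a sketch under stated assumptions, not the EpiEstim implementation (which is an R package using Bayesian inference over time windows): here the reproduction number on day t is taken as the number of local cases divided by the total infectiousness of all earlier cases, weighted by a discrete serial interval distribution. The function name and argument layout are hypothetical.

```python
def reproduction_numbers(local, imported, serial_interval):
    """Crude daily estimate R_t = local_t / sum_s (total_{t-s} * w_s).

    local, imported: daily case counts of equal length.
    serial_interval: weights w_1, w_2, ... (probability that the serial
    interval is 1, 2, ... days); assumed to be known and to sum to 1.
    Returns None on days where no earlier infectiousness is available.
    """
    # Imported cases add to infectiousness but are not counted as
    # locally generated secondary cases.
    total = [l + i for l, i in zip(local, imported)]
    estimates = []
    for t in range(len(total)):
        infectiousness = sum(
            total[t - s] * w
            for s, w in enumerate(serial_interval, start=1)
            if t - s >= 0
        )
        estimates.append(local[t] / infectiousness if infectiousness > 0 else None)
    return estimates


if __name__ == "__main__":
    # Two imported cases on day 0, then purely local transmission.
    print(reproduction_numbers([0, 3, 5, 8], [2, 0, 0, 0], [0.6, 0.4]))
```

Counting imported cases as locally acquired would inflate the numerator, which is one way to see why the abstract stresses separating the two.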

Journal article

Weinert LA, Chaudhuri RR, Wang J, Peters SE, Corander J, Jombart T, Baig A, Howell KJ, Vehkala M, Valimaki N, Harris D, Tran TBC, Nguyen VVC, Campbell J, Schultsz C, Parkhill J, Bentley SD, Langford PR, Rycroft AN, Wren BW, Farrar J, Baker S, Hoa NT, Holden MTG, Tucker AW, Maskell DJ, Bosse JT, Li Y, Maglennon GA, Matthews D, Cuccui J, Terra V et al., 2019, Publisher Correction: Genomic signatures of human and animal disease in the zoonotic pathogen Streptococcus suis (vol 6, 6740, 2015), Nature Communications, Vol: 10, ISSN: 2041-1723

Journal article

Moraga P, Dorigatti I, Kamvar ZN, Piatkowski P, Toikkanen SE, Nagraj VP, Donnelly CA, Jombart T et al., 2019, epiflows: an R package for risk assessment of travel-related spread of disease, F1000Research, Vol: 7, Pages: 1374-1374

As international travel increases worldwide, new surveillance tools are needed to help identify locations where diseases are most likely to be spread and prevention measures need to be implemented. In this paper we present epiflows, an R package for risk assessment of travel-related spread of disease. epiflows produces estimates of the expected number of symptomatic and/or asymptomatic infections that could be introduced to other locations from the source of infection. Estimates (average and confidence intervals) of the number of infections introduced elsewhere are obtained by integrating data on the cumulative number of cases reported, population movement, length of stay and information on the distributions of the incubation and infectious periods of the disease. The package also provides tools for geocoding and visualization. We illustrate the use of epiflows by assessing the risk of travel-related spread of yellow fever cases in Southeast Brazil in December 2016 to May 2017.

Journal article

Cori A, Kamvar ZN, Stockwin J, Jombart T, Thompson R, Dahlqwist E et al., 2019, annecori/EpiEstim: EpiEstim Cran 2.2-1

New CRAN version of EpiEstim, including all new features described in Thompson et al. (currently in review in the journal Epidemics).

Software

Stockwin J, Thompson R, Cori A, Jombart T, Kamvar ZN, Fitzjohn R et al., 2019, jstockwin/EpiEstimApp: v1.0.0

Source code for the EpiEstim app.

Software

Polonsky JA, Baidjoe A, Kamvar ZN, Cori A, Durski K, Edmunds WJ, Eggo RM, Funk S, Kaiser L, Keating P, de Waroux OLP, Marks M, Moraga P, Morgan O, Nouvellet P, Ratnayake R, Roberts CH, Whitworth J, Jombart T et al., 2019, Outbreak analytics: a developing data science for informing the response to emerging pathogens, Philosophical Transactions B: Biological Sciences, Vol: 374, ISSN: 0962-8436

Despite continued efforts to improve health systems worldwide, emerging pathogen epidemics remain a major public health concern. Effective response to such outbreaks relies on timely intervention, ideally informed by all available sources of data. The collection, visualization and analysis of outbreak data are becoming increasingly complex, owing to the diversity in types of data, questions and available methods to address them. Recent advances have led to the rise of outbreak analytics, an emerging data science focused on the technological and methodological aspects of the outbreak data pipeline, from collection to analysis, modelling and reporting to inform outbreak response. In this article, we assess the current state of the field. After laying out the context of outbreak response, we critically review the most common analytics components, their inter-dependencies, data requirements and the type of information they can provide to inform operations in real time. We discuss some challenges and opportunities and conclude on the potential role of outbreak analytics for improving our understanding of, and response to, outbreaks of emerging pathogens. This article is part of the theme issue ‘Modelling infectious disease outbreaks in humans, animals and plants: epidemic forecasting and control’. This theme issue is linked with the earlier issue ‘Modelling infectious disease outbreaks in humans, animals and plants: approaches and important themes’.

Journal article

Leiva C, Taboada S, Kenny NJ, Combosch D, Giribet G, Jombart T, Riesgo A et al., 2019, Population substructure and signals of divergent adaptive selection despite admixture in the sponge Dendrilla antarctica from shallow waters surrounding the Antarctic Peninsula, Molecular Ecology, Vol: 28, Pages: 3151-3170, ISSN: 0962-1083

Citations: 19

Journal article

Sewell T, Zhu J, Rhodes J, Hagen F, Mels JF, Fisher M, Jombart T et al., 2019, Non-random distribution of azole resistance across the global population of Aspergillus fumigatus, mBio, Vol: 10, ISSN: 2150-7511

The emergence of azole resistance in the pathogenic fungus Aspergillus fumigatus has continued to increase, with the dominant resistance mechanisms, consisting of a 34-nucleotide tandem repeat (TR34)/L98H and TR46/Y121F/T289A, now showing a structured global distribution. Using hierarchical clustering and multivariate analysis of 4,049 A. fumigatus isolates collected worldwide and genotyped at nine microsatellite loci using analysis of short tandem repeats of A. fumigatus (STRAf), we show that A. fumigatus can be subdivided into two broad clades and that cyp51A alleles TR34/L98H and TR46/Y121F/T289A are unevenly distributed across these two populations. Diversity indices show that azole-resistant isolates are genetically depauperate compared to their wild-type counterparts, compatible with selective sweeps accompanying the selection of beneficial mutations. Strikingly, we found that azole-resistant clones with identical microsatellite profiles were globally distributed and sourced from both clinical and environmental locations, confirming that azole resistance is an international public health concern. Our work provides a framework for the analysis of A. fumigatus isolates based on their microsatellite profile, which we have incorporated into a freely available, user-friendly R Shiny application (AfumID) that provides clinicians and researchers with a method for the fast, automated characterization of A. fumigatus genetic relatedness. Our study highlights the effect that azole drug resistance is having on the genetic diversity of A. fumigatus and emphasizes its global importance upon this medically important pathogenic fungus. IMPORTANCE: Azole drug resistance in the human-pathogenic fungus Aspergillus fumigatus continues to emerge, potentially leading to untreatable aspergillosis in immunosuppressed hosts. Two dominant, environmentally associated resistance mechanisms, which are thought to have evolved through selection by the agricultural application of azole fungic

Journal article

Campbell F, Cori A, Ferguson N, Jombart T et al., 2019, Bayesian inference of transmission chains using timing of symptoms, pathogen genomes and contact data, PLoS Computational Biology, Vol: 15, ISSN: 1553-734X

There exists significant interest in developing statistical and computational tools for inferring ‘who infected whom’ in an infectious disease outbreak from densely sampled case data, with most recent studies focusing on the analysis of whole genome sequence data. However, genomic data can be poorly informative of transmission events if mutations accumulate too slowly to resolve individual transmission pairs or if there exist multiple pathogen lineages within-host, and there has been little focus on incorporating other types of outbreak data. We present here a methodology that uses contact data for the inference of transmission trees in a statistically rigorous manner, alongside genomic data and temporal data. Contact data is frequently collected in outbreaks of pathogens spread by close contact, including Ebola virus (EBOV), severe acute respiratory syndrome coronavirus (SARS-CoV) and Mycobacterium tuberculosis (TB), and routinely used to reconstruct transmission chains. As an improvement over previous, ad-hoc approaches, we developed a probabilistic model that relates a set of contact data to an underlying transmission tree and integrated this in the outbreaker2 inference framework. By analyzing simulated outbreaks under various contact tracing scenarios, we demonstrate that contact data significantly improves our ability to reconstruct transmission trees, even under realistic limitations on the coverage of the contact tracing effort and the amount of non-infectious mixing between cases. Indeed, contact data is equally or more informative than fully sampled whole genome sequence data in certain scenarios. We then use our method to analyze the early stages of the 2003 SARS outbreak in Singapore and describe the range of transmission scenarios consistent with contact data and genetic sequence in a probabilistic manner for the first time. This simple yet flexible model can easily be incorporated into existing tools for outbreak reconstruction and should
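One simple way to relate contact data to a candidate transmission tree is sketched below. It is an illustration only and should not be read as the model implemented in outbreaker2: it assumes that a contact between a true transmission pair is reported with probability `eps` (tracing coverage), and that a contact between two cases who are not a transmission pair is reported with probability `lam * eps` (non-infectious mixing times coverage). The function name, the parameter names and the data layout are all hypothetical.

```python
import math


def contact_log_likelihood(infectors, contacts, eps, lam):
    """Log-likelihood of reported contacts given a transmission tree.

    infectors: dict mapping each case to its infector (None for the index case).
    contacts: set of frozenset({a, b}) pairs reported by contact tracing.
    eps: assumed probability that a transmission pair is reported as a contact.
    lam: assumed probability of non-infectious contact between other pairs.
    Requires 0 < eps < 1 and 0 < lam * eps < 1.
    """
    cases = sorted(infectors)
    log_lik = 0.0
    for i, a in enumerate(cases):
        for b in cases[i + 1:]:
            is_pair = infectors[a] == b or infectors[b] == a
            p_report = eps if is_pair else lam * eps
            reported = frozenset((a, b)) in contacts
            log_lik += math.log(p_report if reported else 1.0 - p_report)
    return log_lik


if __name__ == "__main__":
    contacts = {frozenset(("A", "B")), frozenset(("B", "C"))}
    chain = {"A": None, "B": "A", "C": "B"}   # A -> B -> C
    star = {"A": None, "B": "A", "C": "A"}    # A -> B and A -> C
    print(contact_log_likelihood(chain, contacts, eps=0.8, lam=0.05))
    print(contact_log_likelihood(star, contacts, eps=0.8, lam=0.05))
```

In this toy setting the chain A -> B -> C scores higher than the star-shaped tree because every reported contact coincides with a transmission pair; in a full analysis such a term would be combined with the temporal and genetic likelihoods.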

Journal article

Jombart T, Kamvar ZN, Cai J, Pulliam J, Chisholm S, Fitzjohn R, Schumacher J, Bhatia S et al., 2019, reconhub/incidence: Incidence version 1.7.0

Incidence can now handle standardised weeks starting on any day thanks to the aweek package :tada:

library(incidence)
library(ggplot2)
library(cowplot)
d <- as.Date("2019-03-11") + -7:6
setNames(d, weekdays(d))
#> Monday Tuesday Wednesday Thursday Friday
#> "2019-03-04" "2019-03-05" "2019-03-06" "2019-03-07" "2019-03-08"
#> Saturday Sunday Monday Tuesday Wednesday
#> "2019-03-09" "2019-03-10" "2019-03-11" "2019-03-12" "2019-03-13"
#> Thursday Friday Saturday Sunday
#> "2019-03-14" "2019-03-15" "2019-03-16" "2019-03-17"
imon <- incidence(d, "mon week") # also ISO week
itue <- incidence(d, "tue week")
iwed <- incidence(d, "wed week")
ithu <- incidence(d, "thu week")
ifri <- incidence(d, "fri week")
isat <- incidence(d, "sat week")
isun <- incidence(d, "sun week") # also MMWR week and EPI week
pmon <- plot(imon, show_cases = TRUE, labels_week = FALSE)
ptue <- plot(itue, show_cases = TRUE, labels_week = FALSE)
pwed <- plot(iwed, show_cases = TRUE, labels_week = FALSE)
pthu <- plot(ithu, show_cases = TRUE, labels_week = FALSE)
pfri <- plot(ifri, show_cases = TRUE, labels_week = FALSE)
psat <- plot(isat, show_cases = TRUE, labels_week = FALSE)
psun <- plot(isun, show_cases = TRUE, labels_week = FALSE)
s <- scale_x_date(limits = c(as.Date("2019-02-26"), max(d) + 7L))
plot_grid(pmon + s, ptue + s, pwed + s, pthu + s, pfri + s, psat + s, psun + s)

multi-weeks/months/years can now be handled

library(incidence)
library(outbreaks)
d <- ebola_sim_clean$linelist$date_of_onset
h <- ebola_sim_clean$linelist$hospital
plot(incidence(d, interval = "1 epiweek", group = h))
plot(incidence(d, interval = "2 epiweeks", group = h))
plot(incide
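The idea of weeks that start on an arbitrary day can be shown independently of the R packages named above. The following Python sketch is not the incidence or aweek API: it simply maps each date to the most recent occurrence of a chosen first weekday and counts cases per week. The function names are hypothetical, and weekdays follow Python's convention (0 = Monday, 6 = Sunday).

```python
from collections import Counter
from datetime import date, timedelta


def week_start(day, first_weekday=0):
    """Return the first day of the week containing `day`.

    first_weekday uses datetime's convention: 0 = Monday ... 6 = Sunday.
    """
    offset = (day.weekday() - first_weekday) % 7
    return day - timedelta(days=offset)


def weekly_incidence(days, first_weekday=0):
    """Count cases per week, keyed by the date on which each week starts."""
    counts = Counter(week_start(d, first_weekday) for d in days)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Same 14 dates as in the release notes: 2019-03-04 to 2019-03-17.
    onsets = [date(2019, 3, 4) + timedelta(days=i) for i in range(14)]
    print(weekly_incidence(onsets, first_weekday=0))  # Monday-based weeks
    print(weekly_incidence(onsets, first_weekday=6))  # Sunday-based weeks
```

Shifting the first weekday changes how the same 14 dates are split into bins, which is why the choice of week definition affects the shape of an epidemic curve.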

Software

Dighe A, Jombart T, van Kerkhove M, Ferguson N et al., 2019, A mathematical model of the transmission of Middle East respiratory syndrome coronavirus in dromedary camels (Camelus dromedarius), Publisher: ELSEVIER SCI LTD, Pages: 1-1, ISSN: 1201-9712

Citations: 7

Conference paper

Kamvar Z, Cai J, Pulliam JRC, Schumacher J, Jombart T et al., 2019, Epidemic curves made easy using the R package incidence [version 1; referees: awaiting peer review], F1000Research, Vol: 8, ISSN: 2046-1402

The epidemiological curve (epicurve) is one of the simplest yet most useful tools used by field epidemiologists, modellers, and decision makers for assessing the dynamics of infectious disease epidemics. Here, we present the free, open-source package incidence for the R programming language, which allows users to easily compute, handle, and visualise epicurves from unaggregated linelist data. This package was built in accordance with the development guidelines of the R Epidemics Consortium (RECON), which aim to ensure robustness and reliability through extensive automated testing, documentation, and good coding practices. As such, it fills an important gap in the toolbox for outbreak analytics using the R software, and provides a solid building block for further developments in infectious disease modelling. incidence is available from https://www.repidemicsconsortium.org/incidence.

Journal article

Jombart T, Kamvar ZN, Cai J, Pulliam J, Chisholm S, Fitzjohn R, Schumacher J, Bhatia S et al., 2019, reconhub/incidence 1.5

☣:chart_with_upwards_trend::chart_with_downwards_trend:☣ Compute and visualise incidence

Software

Kamvar ZN, Cai J, Pulliam JRC, Schumacher J, Jombart T et al., 2019, Epidemic curves made easy using the R package incidence, F1000Research, Vol: 8, ISSN: 2046-1402

The epidemiological curve (epicurve) is one of the simplest yet most useful tools used by field epidemiologists, modellers, and decision makers for assessing the dynamics of infectious disease epidemics. Here, we present the free, open-source package incidence for the R programming language, which allows users to easily compute, handle, and visualise epicurves from unaggregated linelist data. This package was built in accordance with the development guidelines of the R Epidemics Consortium (RECON), which aim to ensure robustness and reliability through extensive automated testing, documentation, and good coding practices. As such, it fills an important gap in the toolbox for outbreak analytics using the R software, and provides a solid building block for further developments in infectious disease modelling. incidence is available from https://www.repidemicsconsortium.org/incidence.

Citations: 19

Journal article

Cori A, Nouvellet P, Garske T, Bourhy H, Nakouné E, Jombart T et al., 2018, A graph-based evidence synthesis approach to detecting outbreak clusters: An application to dog rabies, PLoS Computational Biology, Vol: 14, ISSN: 1553-734X

Early assessment of infectious disease outbreaks is key to implementing timely and effective control measures. In particular, rapidly recognising whether infected individuals stem from a single outbreak sustained by local transmission, or from repeated introductions, is crucial to adopt effective interventions. In this study, we introduce a new framework for combining several data streams, e.g. temporal, spatial and genetic data, to identify clusters of related cases of an infectious disease. Our method explicitly accounts for underreporting, and allows incorporating preexisting information about the disease, such as its serial interval, spatial kernel, and mutation rate. We define, for each data stream, a graph connecting all cases, with edges weighted by the corresponding pairwise distance between cases. Each graph is then pruned by removing distances greater than a given cutoff, defined based on preexisting information on the disease and assumptions on the reporting rate. The pruned graphs corresponding to different data streams are then merged by intersection to combine all data types; connected components define clusters of cases related for all types of data. Estimates of the reproduction number (the average number of secondary cases infected by an infectious individual in a large population), and the rate of importation of the disease into the population, are also derived. We test our approach on simulated data and illustrate it using data on dog rabies in Central African Republic. We show that the outbreak clusters identified using our method are consistent with structures previously identified by more complex, computationally intensive approaches.
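The graph construction described in this abstract (one graph per data stream, pruning by a cutoff, intersection, then connected components) can be sketched in a few lines. The code below is an illustrative reading of those steps, not the authors' implementation: the cutoffs are passed in directly rather than derived from the serial interval, spatial kernel, mutation rate or reporting rate, and all function names and example values are hypothetical.

```python
import math
from itertools import combinations


def pruned_edges(values, distance, cutoff):
    """Edges (i, j), i < j, whose pairwise distance is at most `cutoff`."""
    return {
        (i, j)
        for i, j in combinations(range(len(values)), 2)
        if distance(values[i], values[j]) <= cutoff
    }


def connected_components(n, edges):
    """Connected components of an undirected graph, via union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)
    groups = {}
    for node in range(n):
        groups.setdefault(find(node), []).append(node)
    return sorted(groups.values())


def outbreak_clusters(streams):
    """streams: list of (values, distance_function, cutoff), one per data type.

    Cases are linked only if they are within the cutoff for every stream.
    """
    n = len(streams[0][0])
    edge_sets = [pruned_edges(v, d, c) for v, d, c in streams]
    shared = set.intersection(*edge_sets)
    return connected_components(n, shared)


if __name__ == "__main__":
    onset_days = [0, 5, 8, 40, 43]
    locations = [(0, 0), (1, 0), (50, 50), (2, 1), (2, 2)]
    streams = [
        (onset_days, lambda a, b: abs(a - b), 10),  # temporal cutoff (assumed)
        (locations, math.dist, 5),                  # spatial cutoff (assumed)
    ]
    print(outbreak_clusters(streams))
```

Because the graphs are merged by intersection, a pair of cases that is close in time but far apart in space (or the reverse) is not linked, so each additional data stream can only split clusters, never merge them.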

Journal article

Jombart T, Kamvar Z, Cai J, Chisholm S, Fitzjohn R, Schumacher J, Bhatia S et al., 2018, reconhub/incidence: Incidence version 1.5.3

This is a patch release that fixes an issue with handling single-group incidence curves. You can install this version like so:
remotes::install_github("reconhub/incidence@1.5.3")

Software

Thioulouse J, Dufour AB, Jombart T, Dray S, Siberchicot A, Pavoine S et al., 2018, Multivariate analysis of ecological data with ade4, ISBN: 9781493988488

This book introduces the ade4 package for R which provides multivariate methods for the analysis of ecological data. It is implemented around the mathematical concept of the duality diagram, and provides a unified framework for multivariate analysis. The authors offer a detailed presentation of the theoretical framework of the duality diagram and also of its application to real-world ecological problems. These two goals may seem contradictory, as they concern two separate groups of scientists, namely statisticians and ecologists. However, statistical ecology has become a scientific discipline of its own, and the good use of multivariate data analysis methods by ecologists implies a fair knowledge of the mathematical properties of these methods. The organization of the book is based on ecological questions, but these questions correspond to particular classes of data analysis methods. The first chapters present both usual and multiway data analysis methods. Further chapters are dedicated for example to the analysis of spatial data, of phylogenetic structures, and of biodiversity patterns. One chapter deals with multivariate data analysis graphs. In each chapter, the basic mathematical definitions of the methods and the outputs of the R functions available in ade4 are detailed in two different boxes. The text of the book itself can be read independently from these boxes. Thus the book offers the opportunity to find information about the ecological situation from which a question raises alongside the mathematical properties of methods that can be applied to answer this question, as well as the details of software outputs. Each example and all the graphs in this book come with executable R code.

Citations: 233

Book

Campbell F, Didelot X, Fitzjohn R, Ferguson N, Cori A, Jombart T et al., 2018, outbreaker2: a modular platform for outbreak reconstruction, BMC Bioinformatics, Vol: 19, ISSN: 1471-2105

Background: Reconstructing individual transmission events in an infectious disease outbreak can provide valuable information and help inform infection control policy. Recent years have seen considerable progress in the development of methodologies for reconstructing transmission chains using both epidemiological and genetic data. However, only a few of these methods have been implemented in software packages, and with little consideration for customisability and interoperability. Users are therefore limited to a small number of alternative, incompatible tools with fixed functionality, or forced to develop their own algorithms at considerable personal effort. Results: Here we present outbreaker2, a flexible framework for outbreak reconstruction. This R package re-implements and extends the original model introduced with outbreaker, but most importantly also provides a modular platform allowing users to specify custom models within an optimised inferential framework. As a proof of concept, we implement the within-host evolutionary model introduced with TransPhylo, which is very distinct from the original genetic model in outbreaker, and demonstrate how even complex model results can be successfully included with minimal effort. Conclusions: outbreaker2 provides a valuable starting point for future outbreak reconstruction tools, and represents a unifying platform that promotes customisability and interoperability. Implemented in the R software, outbreaker2 joins a growing body of tools for outbreak analysis

Journal article

Nagraj VP, Randhawa N, Campbell F, Crellen T, Sudre B, Jombart T et al., 2018, epicontacts: Handling, visualisation and analysis of epidemiological contacts, f1000research Open for Science

Epidemiological outbreak data is often captured in line list and contact format to facilitate contact tracing for outbreak control. epicontacts is an R package that provides a unique data structure for combining these data into a single object in order to facilitate more efficient visualisation and analysis. The package incorporates interactive visualisation functionality as well as network analysis techniques. Originally developed as part of the Hackout3 event, it is now developed, maintained and featured as part of the R Epidemics Consortium (RECON). The package is available for download from the Comprehensive R Archive Network (CRAN) and GitHub.

Journal article

Nagraj VP, Randhawa N, Campbell F, Crellen T, Sudre B, Jombart T et al., 2018, epicontacts: Handling, visualisation and analysis of epidemiological contacts, F1000Research, ISSN: 2046-1402

Epidemiological outbreak data is often captured in line list and contact format to facilitate contact tracing for outbreak control. epicontacts is an R package that provides a unique data structure for combining these data into a single object in order to facilitate more efficient visualisation and analysis. The package incorporates interactive visualisation functionality as well as network analysis techniques. Originally developed as part of the Hackout3 event, it is now developed, maintained and featured as part of the R Epidemics Consortium (RECON). The package is available for download from the Comprehensive R Archive Network (CRAN) and GitHub.

Journal article

Moraga P, Dorigatti I, Kamvar Z, Piatkowski P, Toikkanen S, Nagraj VP, Donnelly C, Jombart T et al., 2018, epiflows: an R package for risk assessment of travel-related spread of disease [version 1; referees: 2 approved with reservations], F1000Research, Vol: 7, ISSN: 2046-1402

As international travel increases worldwide, new surveillance tools are needed to help identify locations where diseases are most likely to be spread and prevention measures need to be implemented. In this paper we present epiflows, an R package for risk assessment of travel-related spread of disease. epiflows produces estimates of the expected number of symptomatic and/or asymptomatic infections that could be introduced to other locations from the source of infection. Estimates (average and confidence intervals) of the number of infections introduced elsewhere are obtained by integrating data on the cumulative number of cases reported, population movement, length of stay and information on the distributions of the incubation and infectious periods of the disease. The package also provides tools for geocoding and visualization. We illustrate the use of epiflows by assessing the risk of travel-related spread of yellow fever cases in Southeast Brazil in December 2016 to May 2017.

Journal article

Beugin M-P, Gayet T, Pontier D, Devillard S, Jombart T et al., 2018, A fast likelihood solution to the genetic clustering problem, Methods in Ecology and Evolution, Vol: 9, Pages: 1006-1016, ISSN: 2041-210X

The investigation of genetic clusters in natural populations is a ubiquitous problem in a range of fields relying on the analysis of genetic data, such as molecular ecology, conservation biology and microbiology. Typically, genetic clusters are defined as distinct panmictic populations, or parental groups in the context of hybridisation. Two types of methods have been developed for identifying such clusters: model-based methods, which are usually computer-intensive but yield results which can be interpreted in the light of an explicit population genetic model, and geometric approaches, which are less interpretable but remarkably faster. Here, we introduce snapclust, a fast maximum-likelihood solution to the genetic clustering problem, which allies the advantages of both model-based and geometric approaches. Our method relies on maximising the likelihood of a fixed number of panmictic populations, using a combination of geometric approach and fast likelihood optimisation, using the Expectation-Maximisation (EM) algorithm. It can be used for assigning genotypes to populations and optionally identify various types of hybrids between two parental populations. Several goodness-of-fit statistics can also be used to guide the choice of the retained number of clusters. Using extensive simulations, we show that snapclust performs comparably to current gold standards for genetic clustering as well as hybrid detection, with some advantages for identifying hybrids after several backcrosses, while being orders of magnitude faster than other model-based methods. We also illustrate how snapclust can be used for identifying the optimal number of clusters, and subsequently assign individuals to various hybrid classes simulated from an empirical microsatellite dataset. snapclust is implemented in the package adegenet for the free software R, and is therefore easily integrated into existing pipelines for genetic data analysis. It can be applied to any kind of co-dominant markers, and ca

Journal article

Dupuis JR, Bremer FT, Jombart T, Sim SB, Geib SM et al., 2018, mvmapper: Interactive spatial mapping of genetic structures, Molecular Ecology Resources, Vol: 18, Pages: 362-367, ISSN: 1755-098X

Characterizing genetic structure across geographic space is a fundamental challenge in population genetics. Multivariate statistical analyses are powerful tools for summarizing genetic variability, but geographic information and accompanying metadata are not always easily integrated into these methods in a user-friendly fashion. Here, we present a deployable Python-based web-tool, mvmapper, for visualizing and exploring results of multivariate analyses in geographic space. This tool can be used to map results of virtually any multivariate analysis of georeferenced data, and routines for exporting results from a number of standard methods have been integrated in the R package adegenet, including principal components analysis (PCA), spatial PCA, discriminant analysis of principal components, principal coordinates analysis, nonmetric dimensional scaling and correspondence analysis. mvmapper's greatest strength is facilitating dynamic and interactive exploration of the statistical and geographic frameworks side by side, a task that is difficult and time-consuming with currently available tools. Source code and deployment instructions, as well as a link to a hosted instance of mvmapper, can be found at https://popphylotools.github.io/mvMapper/.

Journal article

Campbell F, Strang C, Ferguson N, Cori A, Jombart T et al., 2018, When are pathogen genome sequences informative of transmission events?, PLoS Pathogens, Vol: 14, ISSN: 1553-7366

Recent years have seen the development of numerous methodologies for reconstructing transmission trees in infectious disease outbreaks from densely sampled whole genome sequence data. However, a fundamental and as of yet poorly addressed limitation of such approaches is the requirement for genetic diversity to arise on epidemiological timescales. Specifically, the position of infected individuals in a transmission tree can only be resolved by genetic data if mutations have accumulated between the sampled pathogen genomes. To quantify and compare the useful genetic diversity expected from genetic data in different pathogen outbreaks, we introduce here the concept of ‘transmission divergence’, defined as the number of mutations separating whole genome sequences sampled from transmission pairs. Using parameter values obtained by literature review, we simulate outbreak scenarios alongside sequence evolution using two models described in the literature to describe transmission divergence of ten major outbreak-causing pathogens. We find that while mean values vary significantly between the pathogens considered, their transmission divergence is generally very low, with many outbreaks characterised by large numbers of genetically identical transmission pairs. We describe the impact of transmission divergence on our ability to reconstruct outbreaks using two outbreak reconstruction tools, the R packages outbreaker and phybreak, and demonstrate that, in agreement with previous observations, genetic sequence data of rapidly evolving pathogens such as RNA viruses can provide valuable information on individual transmission events. Conversely, sequence data of pathogens with lower mean transmission divergence, including Streptococcus pneumoniae, Shigella sonnei and Clostridium difficile, provide little to no information about individual transmission events. Our results highlight the informational limitations of genetic sequence data in certain outbreak scenarios, and

Journal article

Dr Thibaut Jombart
