Walker TM, Miotto P, Köser CU, et al., 2022, The 2021 WHO catalogue of Mycobacterium tuberculosis complex mutations associated with drug resistance: a genotypic analysis, The Lancet Microbe, Vol: 3, Pages: e265-e273, ISSN: 2666-5247
Background: Molecular diagnostics are considered the most promising route to achieving rapid, universal drug susceptibility testing for Mycobacterium tuberculosis complex (MTBC). We aimed to generate a WHO endorsed catalogue of mutations to serve as a global standard for interpreting molecular information for drug resistance prediction. Methods: A candidate gene approach was used to identify mutations as associated with resistance, or consistent with susceptibility, for 13 WHO endorsed anti-tuberculosis drugs. 38,215 MTBC isolates with paired whole-genome sequencing and phenotypic drug susceptibility testing data were amassed from 45 countries. For each mutation, a contingency table of binary phenotypes and presence or absence of the mutation computed positive predictive value, and Fisher's exact tests generated odds ratios and Benjamini-Hochberg corrected p-values. Mutations were graded as Associated with Resistance if present in at least 5 isolates, if the odds ratio was >1 with a statistically significant corrected p-value, and if the lower bound of the 95% confidence interval on the positive predictive value for phenotypic resistance was >25%. A series of expert rules were applied for final confidence grading of each mutation. Findings: 15,667 associations were computed for 13,211 unique mutations linked to one or more drugs. 1,149/15,667 (7·3%) mutations were classified as associated with phenotypic resistance and 107/15,667 (0·7%) were deemed consistent with susceptibility. For rifampicin, isoniazid, ethambutol, fluoroquinolones, and streptomycin, the mutations' pooled sensitivity was >80%. Specificity was over 95% for all drugs except ethionamide (91·4%), moxifloxacin (91·6%) and ethambutol (93·3%). Only two resistance mutations were classified for bedaquiline, delamanid, clofazimine, and linezolid as prevalence of phenotypic resistance was low for these drugs. Interpretation: This first WHO endorsed catalogue of mol
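The per-mutation statistics named in the Methods above can be illustrated with a minimal sketch. It is not the catalogue's actual pipeline: the function names, the toy counts, and the choice of a one-sided Fisher test are assumptions made for illustration, and the confidence interval on the positive predictive value is omitted.

```python
from math import comb

# 2x2 table for one mutation and one drug (all counts below are hypothetical):
#   a = resistant isolates carrying the mutation
#   b = susceptible isolates carrying the mutation
#   c = resistant isolates without the mutation
#   d = susceptible isolates without the mutation

def ppv(a, b):
    """Positive predictive value: share of mutation carriers that are resistant."""
    return a / (a + b)

def odds_ratio(a, b, c, d):
    """Sample odds ratio; infinite when b*c is zero, undefined (nan) if a*d is zero too."""
    if b * c == 0:
        return float("inf") if a * d > 0 else float("nan")
    return (a * d) / (b * c)

def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact p-value: P(at least `a` resistant carriers)."""
    n = a + b + c + d
    carriers, resistant = a + b, a + c
    upper = min(carriers, resistant)
    tail = sum(comb(resistant, k) * comb(n - resistant, carriers - k)
               for k in range(a, upper + 1))
    return tail / comb(n, carriers)

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

if __name__ == "__main__":
    a, b, c, d = 40, 5, 60, 895  # hypothetical counts, not from the study
    print(ppv(a, b), odds_ratio(a, b, c, d), fisher_greater(a, b, c, d))
```

In a real analysis the p-values for all mutation-drug pairs would be collected first and passed to `benjamini_hochberg` together, since the correction depends on the full set of tests.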
Mansouri M, Khakabimamaghani S, Chindelevitch L, et al., 2022, Aristotle: stratified causal discovery for omics data, BMC BIOINFORMATICS, Vol: 23, ISSN: 1471-2105
Chindelevitch L, Hayati M, Poon AFY, et al., 2021, Network science inspires novel tree shape statistics, PLOS ONE, Vol: 16, ISSN: 1932-6203
Bhatt S, 2021, Understanding the effectiveness of government interventions against the resurgence of COVID-19 in Europe, Nature Communications, Vol: 12, Pages: 1-12, ISSN: 2041-1723
As European governments face resurging waves of COVID-19, non-pharmaceutical interventions (NPIs) continue to be the primary tool for infection control. However, updated estimates of their relative effectiveness have been absent for Europe’s second wave, largely due to a lack of collated data that considers the increased subnational variation and diversity of NPIs. We collect the largest dataset of NPI implementation dates in Europe, spanning 114 subnational areas in 7 countries, with a systematic categorisation of interventions tailored to the second wave. Using a hierarchical Bayesian transmission model, we estimate the effectiveness of 17 NPIs from local case and death data. We manually validate the data, address limitations in modelling from previous studies, and extensively test the robustness of our estimates. The combined effect of all NPIs was smaller relative to estimates from the first half of 2020, indicating the strong influence of safety measures and individual protective behaviours--such as distancing--that persisted after the first wave. Closing specific businesses was highly effective. Gathering restrictions were highly effective but only for the strictest limits. We find smaller effects for closing educational institutions compared to the first wave, suggesting that safer operation of schools was possible with a set of stringent safety measures including testing and tracing, preventing mixing, and smaller classes. These results underscore that effectiveness estimates from the early stage of an epidemic are measured relative to pre-pandemic behaviour. Updated estimates are required to inform policy in an ongoing pandemic.
Zabeti H, Dexter N, Safari AH, et al., 2021, INGOT-DR: an interpretable classifier for predicting drug resistance in M. tuberculosis, ALGORITHMS FOR MOLECULAR BIOLOGY, Vol: 16
Gabbassov E, Moreno-Molina M, Comas I, et al., 2021, SplitStrains, a tool to identify and separate mixed Mycobacterium tuberculosis infections from WGS data, Microbial Genomics, Vol: 7, Pages: 1-16, ISSN: 2057-5858
The occurrence of multiple strains of a bacterial pathogen such as M. tuberculosis or C. difficile within a single human host, referred to as a mixed infection, has important implications for both healthcare and public health. However, methods for detecting it, and especially determining the proportion and identities of the underlying strains, from WGS (whole-genome sequencing) data, have been limited. In this paper we introduce SplitStrains, a novel method for addressing these challenges. Grounded in a rigorous statistical model, SplitStrains not only demonstrates superior performance in proportion estimation to other existing methods on both simulated as well as real M. tuberculosis data, but also successfully determines the identity of the underlying strains. We conclude that SplitStrains is a powerful addition to the existing toolkit of analytical methods for data coming from bacterial pathogens and holds the promise of enabling previously inaccessible conclusions to be drawn in the realm of public health microbiology.
Meidanis J, Chindelevitch L, 2021, Fast median computation for symmetric, orthogonal matrices under the rank distance, Linear Algebra and Its Applications, Vol: 614, Pages: 394-414, ISSN: 0024-3795
Biological genomes can be represented as square, symmetric, orthogonal, 0-1 matrices. It turns out that the rank distance applied to two genome matrices has a biological significance: it is related to the smallest number of basic rearrangement mutations, such as reversals, translocations, transpositions (taken with weight 2), etc. that explain the differences between the two genomes. Therefore, closer genomes will produce smaller rank distances. An important tool in this context is the median problem: given three genomes A, B, and C, find a fourth genome M that minimizes d(A, M) + d(B, M) + d(C, M). For genome matrices, the computational complexity of this problem is currently unknown. However, for orthogonal matrices, there are fast algorithms that solve it exactly. One such algorithm uses a “walk towards the median” paradigm. Starting from any of the input matrices, say, B, the algorithm produces rank-1 “steps”, which are rank-1 matrices that, added to B, decrease its rank distance to both A and C simultaneously. It can be shown that such steps always exist for orthogonal matrices, and can be found in polynomial time. The algorithm stops when no more improvement can be made, which is equivalent to saying that B lies between A and C in terms of the rank distance (the triangle inequality becomes an equality). There is an O(n^(ω+1)) algorithm implementing this idea, where ω is the matrix multiplication exponent. Here we propose a novel scheme that works for symmetric orthogonal matrices, and produces a median, also guaranteed to be symmetric, in O(n^ω) time. There is another O(n^ω) time algorithm that produces the so-called MI median, which agrees with the majority in the subspaces where A = B, B = C, or C = A, and is equal to the identity in the orthogonal complement of the sum of these subspaces. However, this algorithm produces a different median, and has only been proved to be correct for genomic matrices. The algorithm we present here is more general.
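To make the median objective concrete, the sketch below scores candidate medians on a tiny example. It assumes the rank distance is d(X, Y) = rank(X - Y), which the abstract does not spell out, and it only evaluates the objective; it does not implement the paper's O(n^ω) algorithm. All function names and the 3x3 matrices are hypothetical.

```python
from fractions import Fraction

def rank(matrix):
    """Rank of a matrix over the rationals, by exact Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            if rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

def rank_distance(x, y):
    """Assumed definition: d(X, Y) = rank(X - Y)."""
    diff = [[a - b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]
    return rank(diff)

def median_score(a, b, c, m):
    """Objective from the abstract: d(A, M) + d(B, M) + d(C, M)."""
    return rank_distance(a, m) + rank_distance(b, m) + rank_distance(c, m)

# Three symmetric, orthogonal, 0-1 matrices (toy inputs).
IDENT = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
SWAP01 = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]
SWAP12 = [[1, 0, 0], [0, 0, 1], [0, 1, 0]]

if __name__ == "__main__":
    for name, cand in [("IDENT", IDENT), ("SWAP01", SWAP01), ("SWAP12", SWAP12)]:
        print(name, median_score(SWAP01, IDENT, SWAP12, cand))
```

Because the rank distance satisfies the triangle inequality, any candidate M scores at least half the sum of the three pairwise distances, which gives a quick optimality check for small cases like this one.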
Brauner JM, Mindermann S, Sharma M, et al., 2021, Inferring the effectiveness of government interventions against COVID-19, Science, Vol: 371, ISSN: 0036-8075
Governments are attempting to control the COVID-19 pandemic with nonpharmaceutical interventions (NPIs). However, the effectiveness of different NPIs at reducing transmission is poorly understood. We gathered chronological data on the implementation of NPIs for several European and non-European countries between January and the end of May 2020. We estimated the effectiveness of these NPIs, which range from limiting gathering sizes and closing businesses or educational institutions to stay-at-home orders. To do so, we used a Bayesian hierarchical model that links NPI implementation dates to national case and death counts and supported the results with extensive empirical validation. Closing all educational institutions, limiting gatherings to 10 people or less, and closing face-to-face businesses each reduced transmission considerably. The additional effect of stay-at-home orders was comparatively small.
Safari AH, Sedaghat N, Zabeti H, et al., 2021, Predicting drug resistance in M. tuberculosis using a Long-term Recurrent Convolutional Network, 12th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (ACM-BCB), Publisher: ASSOC COMPUTING MACHINERY
Sharma M, Mindermann S, Brauner J, et al., 2020, How Robust are the Estimated Effects of Nonpharmaceutical Interventions against COVID-19?, Neural Information Processing Systems (NeurIPS 2020), Publisher: NeurIPS, ISSN: 1049-5258
To what extent are effectiveness estimates of nonpharmaceutical interventions (NPIs) against COVID-19 influenced by the assumptions our models make? To answer this question, we investigate 2 state-of-the-art NPI effectiveness models and propose 6 variants that make different structural assumptions. In particular, we investigate how well NPI effectiveness estimates generalise to unseen countries, and their sensitivity to unobserved factors. Models which account for noise in disease transmission compare favourably. We further evaluate how robust estimates are to different choices of epidemiological parameters and data. Focusing on models that assume transmission noise, we find that previously published results are robust across these choices and across different models. Finally, we mathematically ground the interpretation of NPI effectiveness estimates when certain common assumptions do not hold.
Ezewudo M, Borens A, Chiner-Oms Á, et al., 2020, Author Correction: Integrating standardized whole genome sequence analysis with a global Mycobacterium tuberculosis antibiotic resistance knowledgebase, Scientific Reports, Vol: 10, Pages: 1-1, ISSN: 2045-2322
Brauner JM, Mindermann S, Sharma M, et al., 2020, epidemics/COVIDNPIs: Inferring the effectiveness of government interventions against COVID-19
epidemics/COVIDNPIs: Inferring the effectiveness of government interventions against COVID-19: Pre-release
Gan GL, Nguyen MH, Willie E, et al., 2020, Geographic heterogeneity impacts drug resistance predictions in Mycobacterium tuberculosis
The efficacy of antibiotic drug treatments in tuberculosis (TB) is significantly threatened by the development of drug resistance. There is a need for a robust diagnostic system that can accurately predict drug resistance in patients. In recent years, researchers have been taking advantage of whole-genome sequencing (WGS) data to infer antibiotic resistance. In this work we investigate the power of machine learning tools in inferring drug resistance from WGS data on three distinct datasets differing in their geographical diversity. We analyzed data from the Relational Sequencing TB Data Platform, which comprises global isolates from 32 different countries, the PATRIC database, containing isolates contributed by researchers around the world, and isolates collected by the British Columbia Centre for Disease Control in Canada. We predicted drug resistance to the first-line drugs: isoniazid, rifampicin, ethambutol, pyrazinamide, and streptomycin. We focused on the genes which previous evidence suggests are involved in drug resistance in TB. We called single-nucleotide polymorphisms using the Snippy pipeline, then applied different machine learning models. Following best practices, we chose the best parameters for each model via cross-validation on the training set and evaluated the performance via the sensitivity-specificity tradeoffs on the testing set. To the best of our knowledge, our study is the first to predict antibiotic resistance in TB across multiple datasets. We obtained a performance comparable to that seen in previous studies, but observed that performance may be negatively affected when training on one dataset and testing on another, suggesting the importance of geographical heterogeneity in drug resistance predictions. In addition, we investigated the importance of each gene within each model, and recapitulated som
Chindelevitch L, Zabeti H, Dexter N, et al., 2020, An interpretable classification method for predicting drug resistance in M. tuberculosis, International Workshop on Algorithms in Bioinformatics, Publisher: Schloss Dagstuhl -- Leibniz-Zentrum fuer Informatik, Pages: 2:1-2:18, ISSN: 1868-8969
Motivation: The prediction of drug resistance and the identification of its mechanisms in bacteria such as Mycobacterium tuberculosis, the etiological agent of tuberculosis, is a challenging problem. Modern methods based on testing against a catalogue of previously identified mutations often yield poor predictive performance. On the other hand, machine learning techniques have demonstrated high predictive accuracy, but many of them lack interpretability to aid in identifying specific mutations which lead to resistance. We propose a novel technique, inspired by the group testing problem and Boolean compressed sensing, which yields highly accurate predictions and interpretable results at the same time. Results: We develop a modified version of the Boolean compressed sensing problem for identifying drug resistance, and implement its formulation as an integer linear program. This allows us to characterize the predictive accuracy of the technique and select an appropriate metric to optimize. A simple adaptation of the problem also allows us to quantify the sensitivity-specificity trade-off of our model under different regimes. We test the predictive accuracy of our approach on a variety of commonly used antibiotics in treating tuberculosis and find that it has accuracy comparable to that of standard machine learning models and points to several genes with previously identified association to drug resistance.
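The group-testing idea behind this abstract can be sketched as a disjunctive rule: an isolate is predicted resistant if it carries any mutation from a small selected set. The sketch below is an illustration only. It replaces the paper's integer linear program with an exhaustive search that is feasible only for a handful of features, and the `penalty` parameter, function names, and toy data are all hypothetical.

```python
from itertools import combinations

def predict(sample, rule):
    """Predict resistance if any selected mutation is present (Boolean OR)."""
    return any(sample[j] for j in rule)

def fit_rule(features, labels, penalty=0.1):
    """Brute-force stand-in for the ILP: minimise errors + penalty * rule size."""
    n_features = len(features[0])
    best_rule, best_cost = (), float("inf")
    for size in range(n_features + 1):
        for rule in combinations(range(n_features), size):
            errors = sum(predict(row, rule) != bool(label)
                         for row, label in zip(features, labels))
            cost = errors + penalty * size
            if cost < best_cost:
                best_rule, best_cost = rule, cost
    return best_rule

# Toy data: rows are isolates, columns are presence/absence of three mutations.
X = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1], [0, 0, 0]]
y = [1, 1, 1, 0, 0]  # 1 = phenotypically resistant

if __name__ == "__main__":
    print(fit_rule(X, y))
```

The selected indices are directly readable as the mutations the rule relies on, which is the sense in which such a classifier is interpretable.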
Hayati M, Chindelevitch L, 2020, Computing the distribution of the Robinson-Foulds distance, Computational Biology and Chemistry, Vol: 87, ISSN: 1476-9271
With the exponential growth of genome databases, the importance of phylogenetics has increased dramatically over the past years. Studying phylogenetic trees enables us not only to understand how genes, genomes, and species evolve, but also helps us predict how they might change in future. One of the crucial aspects of phylogenetics is the comparison of two or more phylogenetic trees. There are different metrics for computing the dissimilarity between a pair of trees. The Robinson-Foulds (RF) distance is one of the widely used metrics on the space of labeled trees. The distribution of the RF distance from a given tree has been studied before, but the fastest known algorithm for computing this distribution is a slow, albeit polynomial-time, O(l^5) algorithm. In this paper, we modify the dynamic programming algorithm for computing the distribution of this distance for a given tree by leveraging the number-theoretic transform (NTT), and improve the running time from O(l^5) to O(l^3 log l), where l is the number of tips of the tree. In addition to its practical usefulness, our method represents a theoretical novelty, as it is, to our knowledge, one of the rare applications of the number-theoretic transform for solving a computational biology problem.
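For readers unfamiliar with the number-theoretic transform, the sketch below shows the primitive it provides: exact polynomial multiplication in O(n log n) modular operations. It does not reproduce the paper's dynamic program. The modulus 998244353 with primitive root 3 is a common choice assumed here, not one taken from the paper, and the function names are hypothetical.

```python
MOD = 998244353  # prime of the form c * 2^23 + 1 (assumed choice)
ROOT = 3         # a primitive root modulo MOD

def ntt(values, invert=False):
    """Iterative radix-2 NTT; len(values) must be a power of two."""
    a = list(values)
    n = len(a)
    # Bit-reversal permutation.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # Butterfly passes over blocks of doubling length.
    length = 2
    while length <= n:
        w_len = pow(ROOT, (MOD - 1) // length, MOD)
        if invert:
            w_len = pow(w_len, MOD - 2, MOD)
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + half):
                u = a[k]
                v = a[k + half] * w % MOD
                a[k] = (u + v) % MOD
                a[k + half] = (u - v) % MOD
                w = w * w_len % MOD
        length <<= 1
    if invert:
        n_inv = pow(n, MOD - 2, MOD)
        a = [x * n_inv % MOD for x in a]
    return a

def multiply(p, q):
    """Coefficients of the product polynomial p * q, reduced modulo MOD."""
    out_len = len(p) + len(q) - 1
    size = 1
    while size < out_len:
        size <<= 1
    fp = ntt(list(p) + [0] * (size - len(p)))
    fq = ntt(list(q) + [0] * (size - len(q)))
    prod = ntt([x * y % MOD for x, y in zip(fp, fq)], invert=True)
    return prod[:out_len]

if __name__ == "__main__":
    print(multiply([1, 2, 3], [4, 5]))  # (1 + 2x + 3x^2) * (4 + 5x)
```

Results are exact only while the true coefficients stay below the modulus; counting problems with larger values typically combine several moduli or use a larger prime.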
Hemez C, Clarelli F, Palmer AC, et al., 2020, Mechanisms of antibiotic action shape the fitness landscapes of resistance mutations
Antibiotic-resistant pathogens are a major public health threat. Understanding how an antibiotic’s mechanism of action influences the emergence of resistance could help to improve the design of new drugs and to preserve the effectiveness of existing ones. To this end, we developed a model that links bacterial population dynamics with antibiotic-target binding kinetics. Our approach allows us to derive mechanistic insights on drug activity from population-scale experimental data and to quantify the interplay between drug mechanism and resistance selection. We find that whether a drug acts as a bacteriostatic or bactericidal agent has little influence on resistance selection. We also show that heterogeneous drug-target binding within a population enables antibiotic-resistant bacteria to evolve secondary mutations, even when drug doses remain above the resistant strain’s minimum inhibitory concentration. Our work suggests that antibiotic doses beyond this “secondary mutation selection window” could safeguard against the emergence of high-fitness resistant strains during treatment.
Zabeti H, Dexter N, Safari AH, et al., 2020, INGOT-DR: an interpretable classifier for predicting drug resistance in M. tuberculosis
Motivation: Prediction of drug resistance and identification of its mechanisms in bacteria such as Mycobacterium tuberculosis, the etiological agent of tuberculosis, is a challenging problem. Solving this problem requires a transparent, accurate, and flexible predictive model. The methods currently used for this purpose rarely satisfy all of these criteria. On the one hand, approaches based on testing strains against a catalogue of previously identified mutations often yield poor predictive performance; on the other hand, machine learning techniques typically have higher predictive accuracy, but often lack interpretability and may learn patterns that produce accurate predictions for the wrong reasons. Current interpretable methods may either exhibit a lower accuracy or lack the flexibility needed to generalize them to previously unseen data. Contribution: In this paper we propose a novel technique, inspired by the group testing and Boolean compressed sensing, which yields highly accurate predictions, interpretable results, and is flexible enough to be optimized for various evaluation metrics at the same time. Results: We test the predictive accuracy of our approach on five first-line and seven second-line antibiotics used for treating tuberculosis. We find that it has a higher or comparable accuracy to that of commonly used machine learning models, and is able to identify variants in genes with previously reported association to drug resistance. Our method is intrinsically interpretable, and can be customized for different evaluation metrics. Our implementation is available at
Brauner JM, Mindermann S, Sharma M, et al., 2020, The effectiveness of eight nonpharmaceutical interventions against COVID-19 in 41 countries
Background: Governments are attempting to control the COVID-19 pandemic with nonpharmaceutical interventions (NPIs). However, it is still largely unknown how effective different NPIs are at reducing transmission. Data-driven studies can estimate the effectiveness of NPIs while minimizing assumptions, but existing analyses lack sufficient data and validation to robustly distinguish the effects of individual NPIs. Methods: We collect chronological data on NPIs in 41 countries between January and May 2020, using independent double entry by researchers to ensure high data quality. We estimate NPI effectiveness with a Bayesian hierarchical model, by linking NPI implementation dates to national case and death counts. To our knowledge, this is the largest and most thoroughly validated data-driven study of NPI effectiveness to date. Results: We model each NPI’s effect as a multiplicative (percentage) reduction in the reproduction number R. We estimate the mean reduction in R across the countries in our data for eight NPIs: mandating mask-wearing in (some) public spaces (2%; 95% CI: −14%–16%), limiting gatherings to 1000 people or less (2%; −20%–22%), to 100 people or less (21%; 1%–39%), to 10 people or less (36%; 16%–53%), closing some high-risk businesses (31%; 13%–46%), closing most nonessential businesses (40%; 22%–55%), closing schools and universities (39%; 21%–55%), and issuing stay-at-home orders (18%; 4%–31%). These results are supported by extensive empirical validation, including 15 sensitivity analyses.
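The multiplicative effect model described in the Results can be illustrated with simple arithmetic: each active NPI scales R by one minus its fractional reduction. The sketch below is only a reading of that sentence, not the study's Bayesian model. Plugging in mean estimates ignores the reported uncertainty, and the baseline R of 3.0 and the function names are hypothetical.

```python
from math import prod

def combined_reduction(reductions):
    """Overall fractional reduction in R when the individual effects multiply."""
    return 1.0 - prod(1.0 - r for r in reductions)

def reproduction_number(r_baseline, reductions):
    """R after applying every listed NPI to a baseline value."""
    return r_baseline * prod(1.0 - r for r in reductions)

if __name__ == "__main__":
    # Mean estimates quoted in the abstract: closing schools and
    # universities (39%) and issuing stay-at-home orders (18%).
    effects = [0.39, 0.18]
    print(combined_reduction(effects))
    print(reproduction_number(3.0, effects))  # 3.0 is a hypothetical baseline
```

Under this model the two reductions combine to about 50% rather than the 57% that simple addition would suggest, because the second NPI acts on an R that is already reduced.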
Chindelevitch L, Katebi M, Feijao P, et al., 2020, PathOGiST: A Novel Method for Clustering Pathogen Isolates by Combining Multiple Genotyping Signals, International Conference on Algorithms for Computational Biology
Gan GL, Willie E, Chauve C, et al., 2019, Deconvoluting the diversity of within-host pathogen strains in a multi-locus sequence typing framework, BMC Bioinformatics, Vol: 20, Pages: 1-10, ISSN: 1471-2105
Background: Bacterial pathogens exhibit an impressive amount of genomic diversity. This diversity can be informative of evolutionary adaptations, host-pathogen interactions, and disease transmission patterns. However, capturing this diversity directly from biological samples is challenging. Results: We introduce a framework for understanding the within-host diversity of a pathogen using multi-locus sequence types (MLST) from whole-genome sequencing (WGS) data. Our approach consists of two stages. First we process each sample individually by assigning it, for each locus in the MLST scheme, a set of alleles and a proportion for each allele. Next, we associate to each sample a set of strain types using the alleles and the strain proportions obtained in the first step. We achieve this by using the smallest possible number of previously unobserved strains across all samples, while using those unobserved strains which are as close to the observed ones as possible, at the same time respecting the allele proportions as closely as possible. We solve both problems using mixed integer linear programming (MILP). Our method performs accurately on simulated data and generates results on a real data set of Borrelia burgdorferi genomes suggesting a high level of diversity for this pathogen. Conclusions: Our approach can apply to any bacterial pathogen with an MLST scheme, even though we developed it with Borrelia burgdorferi, the etiological agent of Lyme disease, in mind. Our work paves the way for robust strain typing in the presence of within-host heterogeneity, overcoming an essential challenge currently not addressed by any existing methodology for pathogen genomics.
Hayati M, Shadgar B, Chindelevitch L, 2019, A new resolution function to evaluate tree shape statistics, PLoS One, Vol: 14, Pages: 1-16, ISSN: 1932-6203
Phylogenetic trees are frequently used in biology to study the relationships between a number of species or organisms. The shape of a phylogenetic tree contains useful information about patterns of speciation and extinction, so powerful tools are needed to investigate the shape of a phylogenetic tree. Tree shape statistics are a common approach to quantifying the shape of a phylogenetic tree by encoding it with a single number. In this article, we propose a new resolution function to evaluate the power of different tree shape statistics to distinguish between dissimilar trees. We show that the new resolution function requires less time and space in comparison with the previously proposed resolution function for tree shape statistics. We also introduce a new class of tree shape statistics, which are linear combinations of two existing statistics that are optimal with respect to a resolution function, and show evidence that the statistics in this class converge to a limiting linear combination as the size of the tree increases. Our implementation is freely available at https://github.com/WGS-TB/TreeShapeStats.
Thain N, Le C, Crossa A, et al., 2019, Towards better prediction of Mycobacterium tuberculosis lineages from MIRU-VNTR data, Infection, Genetics and Evolution, Vol: 72, Pages: 59-66, ISSN: 1567-1348
The determination of lineages from strain-based molecular genotyping information is an important problem in tuberculosis. Mycobacterial interspersed repetitive unit-variable number tandem repeat (MIRU-VNTR) typing is a commonly used molecular genotyping approach that uses counts of the number of times pre-specified loci repeat in a strain. There are three main approaches for determining lineage based on MIRU-VNTR data - one based on a direct comparison to the strains in a curated database, and two others, on machine learning algorithms trained on a large collection of labeled data. All existing methods have limitations. The direct approach imposes an arbitrary threshold on how much a database strain can differ from a given one to be informative. On the other hand, the machine learning-based approaches require a substantial amount of labeled data. Notably, all three methods exhibit suboptimal classification accuracy without additional data. We explore several computational approaches to address these limitations. First, we show that eliminating the arbitrary threshold improves the performance of the direct approach. Second, we introduce RuleTB, an alternative direct method that proposes a concise set of rules for determining lineages. Lastly, we propose StackTB, a machine learning approach that requires only a fraction of the training data to outperform the accuracy of both existing machine learning methods. Our approaches demonstrate superior performance on a training dataset collected in New York City over 10 years, and the improvement in performance translates to a held-out testing set. We conclude that our methods provide opportunities for improving the determination of pathogenic lineages based on MIRU-VNTR data.
Meidanis J, Chindelevitch L, 2019, A cubic algorithm for the generalized rank median of three genomes, RECOMB International conference on Comparative Genomics
Chindelevitch L, Meidanis J, 2019, Rank distance sheds light on genome evolution, Conference of the International Linear Algebra Society (ILAS 2019)
Miraskarshahi R, Zabeti H, Stephen T, et al., 2019, MCS2: minimal coordinated supports for fast enumeration of minimal cut sets in metabolic networks, Bioinformatics, Vol: 35, Pages: i615-i623, ISSN: 1367-4803
Motivation: Constraint-based modeling of metabolic networks helps researchers gain insight into the metabolic processes of many organisms, both prokaryotic and eukaryotic. Minimal cut sets (MCSs) are minimal sets of reactions whose inhibition blocks a target reaction in a metabolic network. Most approaches for finding the MCSs in constraint-based models require, either as an intermediate step or as a byproduct of the calculation, the computation of the set of elementary flux modes (EFMs), a convex basis for the valid flux vectors in the network. Recently, Ballerstein et al. proposed a method for computing the MCSs of a network without first computing its EFMs, by creating a dual network whose EFMs are a superset of the MCSs of the original network. However, their dual network is always larger than the original network and depends on the target reaction. Here we propose the construction of a different dual network, which is typically smaller than the original network and is independent of the target reaction, for the same purpose. We prove the correctness of our approach, minimal coordinated support (MCS2), and describe how it can be modified to compute the few smallest MCSs for a given target reaction. Results: We compare MCS2 to the method of Ballerstein et al. and two other existing methods. We show that MCS2 succeeds in calculating the full set of MCSs in many models where other approaches cannot finish within a reasonable amount of time. Thus, in addition to its theoretical novelty, our approach provides a practical advantage over existing methods. Availability and implementation: MCS2 is freely available at https://github.com/RezaMash/MCS under the GNU 3.0 license.
Motivation: Despite the remarkable advances in sequencing and computational techniques, noise in the data and complexity of the underlying biological mechanisms render deconvolution of the phylogenetic relationships between cancer mutations difficult. Besides that, the majority of the existing datasets consist of bulk sequencing data of a single tumor sample of an individual. Accurate inference of the phylogenetic order of mutations is particularly challenging in these cases and the existing methods are faced with several theoretical limitations. To overcome these limitations, new methods are required for integrating and harnessing the full potential of the existing data. Results: We introduce a method called Hintra for intra-tumor heterogeneity detection. Hintra integrates sequencing data for a cohort of tumors and infers tumor phylogeny for each individual based on the evolutionary information shared between different tumors. Through an iterative process, Hintra learns the repeating evolutionary patterns and uses this information for resolving the phylogenetic ambiguities of individual tumors. The results of synthetic experiments show an improved performance compared to two state-of-the-art methods. The experimental results with a recent Breast Cancer dataset are consistent with the existing knowledge and provide potentially interesting findings. Availability and implementation: The source code for Hintra is available at https://github.com/sahandk/HINTRA.
Chindelevitch L, Pereira Zanetti JP, Meidanis J, 2019, Counting sorting scenarios and intermediate genomes for the rank distance, International Conference on Algorithms for Computational Biology
Chindelevitch L, Hayati M, Poon AFY, et al., 2019, Network science inspires novel tree shape statistics
The shape of phylogenetic trees can be used to gain evolutionary insights. A tree’s shape specifies the connectivity of a tree, while its branch lengths reflect either the time or genetic distance between branching events; well-known measures of tree shape include the Colless and Sackin imbalance, which describe the asymmetry of a tree. In other contexts, network science has become an important paradigm for describing structural features of networks and using them to understand complex systems, ranging from protein interactions to social systems. Network science is thus a potential source of many novel ways to characterize tree shape, as trees are also networks. Here, we tailor tools from network science, including diameter, average path length, and betweenness, closeness, and eigenvector centrality, to summarize phylogenetic tree shapes. We thereby propose tree shape summaries that are complementary to both asymmetry and the frequencies of small configurations. These new statistics can be computed in linear time and scale well to describe the shapes of large trees. We apply these statistics, alongside some conventional tree statistics, to phylogenetic trees from three very different viruses (HIV, dengue fever and measles), from the same virus in different epidemiological scenarios (influenza A and HIV) and from simulation models known to produce trees with different shapes. Using mutual information and supervised learning algorithms, we find that the statistics adapted from network science perform as well as or better than conventional statistics. We describe their distributions and prove some basic results about their extreme values in a tree. We conclude that network science-based tree shape summaries are a promising addition to the toolkit of tree shape features. All our shape summaries, as well as functions to select the most discriminating ones for two sets o
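The two conventional imbalance measures mentioned above, Sackin and Colless, can be computed on a small rooted binary tree as follows. This is a minimal sketch of the standard definitions, not the authors' implementation: the nested-tuple tree encoding and function names are assumptions, and the repeated leaf counting makes `colless` quadratic in the worst case rather than linear.

```python
def leaf_count(tree):
    """Number of tips below a node; a leaf is any non-tuple value."""
    if not isinstance(tree, tuple):
        return 1
    return sum(leaf_count(child) for child in tree)

def sackin(tree, depth=0):
    """Sackin index: sum of the depths of all tips."""
    if not isinstance(tree, tuple):
        return depth
    return sum(sackin(child, depth + 1) for child in tree)

def colless(tree):
    """Colless index: sum over internal nodes of |left tips - right tips|."""
    if not isinstance(tree, tuple):
        return 0
    left, right = tree  # assumes a strictly binary tree
    return (abs(leaf_count(left) - leaf_count(right))
            + colless(left) + colless(right))

# Two four-tip shapes: fully unbalanced (caterpillar) and fully balanced.
CATERPILLAR = ((("a", "b"), "c"), "d")
BALANCED = (("a", "b"), ("c", "d"))

if __name__ == "__main__":
    for name, tree in [("caterpillar", CATERPILLAR), ("balanced", BALANCED)]:
        print(name, sackin(tree), colless(tree))
```

Both indices are larger for the caterpillar than for the balanced tree, which is the asymmetry signal these statistics are designed to capture.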
Zanetti JPP, Chindelevitch L, Meidanis J, 2019, Rank distance generalizations for genomes with indels, International Conference on Algorithms for Computational Biology, Publisher: Springer International Publishing, Pages: 152-164, ISSN: 0302-9743
Chindelevitch L, Pereira Zanetti JP, Meidanis J, 2019, Rank distance generalizations for genomes with indels, International Conference on Algorithms for Computational Biology