Publications
Aglago EK, Kim AE, Lin Y, et al., 2023, A genetic locus within the FMN1/GREM1 gene region interacts with body mass index in colorectal cancer risk., Cancer Res
Colorectal cancer (CRC) risk can be impacted by genetic, environmental, and lifestyle factors, including diet and obesity. Gene-environment (G×E) interactions can provide biological insights into the effects of obesity on CRC risk. Here, we assessed potential genome-wide G×E interactions between body mass index (BMI) and common single nucleotide polymorphisms (SNPs) for CRC risk using data from 36,415 CRC cases and 48,451 controls from three international CRC consortia (CCFR, CORECT, and GECCO). The G×E tests included the conventional logistic regression using multiplicative terms (one-degree of freedom, 1DF test), the two-step EDGE method, and the joint 3DF test, each of which is powerful for detecting G×E interactions under specific conditions. BMI was associated with higher CRC risk. The two-step approach revealed a statistically significant G×BMI interaction located within the Formin 1/Gremlin 1 (FMN1/GREM1) gene region (rs58349661). This SNP was also identified by the 3DF test, with a suggestive statistical significance in the 1DF test. Among participants with the CC genotype of rs58349661, overweight and obesity categories were associated with higher CRC risk, whereas null associations were observed across BMI categories in those with the TT genotype. Using data from three large international consortia, this study discovered a locus in the FMN1/GREM1 gene region that interacts with BMI on the association with CRC risk. Further studies should examine the potential mechanisms through which this locus modifies the etiologic link between obesity and CRC.
Sanna Passino F, Adams N, Cohen E, et al., 2023, Statistical cybersecurity: a brief discussion of challenges, data structures, and future directions, Harvard Data Science Review, Vol: 5, Pages: 1-10, ISSN: 2644-2353
Komodromos M, Aboagye EO, Evangelou M, et al., 2022, Variational Bayes for high-dimensional proportional hazards models with applications within gene expression, BIOINFORMATICS, Vol: 38, Pages: 3918-3926, ISSN: 1367-4803
Motivation: Few Bayesian methods for analyzing high-dimensional sparse survival data provide scalable variable selection, effect estimation and uncertainty quantification. Such methods often either sacrifice uncertainty quantification by computing maximum a posteriori estimates, or quantify the uncertainty at high (unscalable) computational expense. Results: We bridge this gap and develop an interpretable and scalable Bayesian proportional hazards model for prediction and variable selection, referred to as SVB. Our method, based on a mean-field variational approximation, overcomes the high computational cost of MCMC whilst retaining useful features, providing a posterior distribution for the parameters and offering a natural mechanism for variable selection via posterior inclusion probabilities. The performance of our proposed method is assessed via extensive simulations and compared against other state-of-the-art Bayesian variable selection methods, demonstrating comparable or better performance. Finally, we demonstrate how the proposed method can be used for variable selection on two transcriptomic datasets with censored survival outcomes, and how the uncertainty quantification offered by our method can be used to provide an interpretable assessment of patient risk. Availability and implementation: Our method has been implemented as a freely available R package survival.svb (https://github.com/mkomod/survival.svb).
Evangelou M, Rodosthenous T, Shahrezaei V, 2021, Semi-Supervised Classification and Visualization of Multi-View Data, JSM 2021 - Section on Statistical Learning and Data Science
Rodosthenous T, Shahrezaei V, Evangelou M, 2021, S-multi-SNE: Semi-supervised classification and visualisation of multi-view data, Publisher: arXiv
An increasing number of multi-view data are being published by studies in several fields. This type of data corresponds to multiple data-views, each representing a different aspect of the same set of samples. We have recently proposed multi-SNE, an extension of t-SNE, that produces a single visualisation of multi-view data. The multi-SNE approach provides low-dimensional embeddings of the samples, produced by being updated iteratively through the different data-views. Here, we further extend multi-SNE to a semi-supervised approach, that classifies unlabelled samples by regarding the labelling information as an extra data-view. We look deeper into the performance, limitations and strengths of multi-SNE and its extension, S-multi-SNE, by applying the two methods on various multi-view datasets with different challenges. We show that by including the labelling information, the projection of the samples improves drastically and it is accompanied by a strong classification performance.
Rodosthenous T, Shahrezaei V, Evangelou M, 2021, S-multi-SNE: Semi-Supervised Classification and Visualisation of Multi-View Data
An increasing number of multi-view data are being published by studies in several fields. This type of data corresponds to multiple data-views, each representing a different aspect of the same set of samples. We have recently proposed multi-SNE, an extension of t-SNE, that produces a single visualisation of multi-view data. The multi-SNE approach provides low-dimensional embeddings of the samples, produced by being updated iteratively through the different data-views. Here, we further extend multi-SNE to a semi-supervised approach, that classifies unlabelled samples by regarding the labelling information as an extra data-view. We look deeper into the performance, limitations and strengths of multi-SNE and its extension, S-multi-SNE, by applying the two methods on various multi-view datasets with different challenges. We show that by including the labelling information, the projection of the samples improves drastically and it is accompanied by a strong classification performance.
van Vliet NA, Bos MM, Thesing CS, et al., 2021, Higher thyrotropin leads to unfavorable lipid profile and somewhat higher cardiovascular disease risk: evidence from multi-cohort Mendelian randomization and metabolomic profiling, BMC Medicine, Vol: 19, Pages: 1-13, ISSN: 1741-7015
Background: Observational studies suggest interconnections between thyroid status, metabolism, and risk of coronary artery disease (CAD), but causality remains to be proven. The present study aimed to investigate the potential causal relationship between thyroid status and cardiovascular disease and to characterize the metabolomic profile associated with thyroid status. Methods: Multi-cohort two-sample Mendelian randomization (MR) was performed utilizing genome-wide significant variants as instruments for standardized thyrotropin (TSH) and free thyroxine (fT4) within the reference range. Associations between TSH and fT4 and metabolic profile were investigated in a two-stage manner: associations between TSH and fT4 and the full panel of 161 metabolomic markers were first assessed hypothesis-free, then directional consistency was assessed through Mendelian randomization, another metabolic profile platform, and in individuals with biochemically defined thyroid dysfunction. Results: Circulating TSH was associated with 52/161 metabolomic markers, and fT4 levels were associated with 21/161 metabolomic markers among 9432 euthyroid individuals (median age varied from 23.0 to 75.4 years, 54.5% women). Positive associations between circulating TSH levels and concentrations of very low-density lipoprotein subclasses and components, triglycerides, and triglyceride content of lipoproteins were directionally consistent across the multivariable regression, MR, metabolomic platforms, and for individuals with hypo- and hyperthyroidism. Associations with fT4 levels inversely reflected those observed with TSH. Among 91,810 CAD cases and 656,091 controls of European ancestry, per 1-SD increase of genetically determined TSH concentration risk of CAD increased slightly, but not significantly, with an OR of 1.03 (95% CI 0.99–1.07; p value 0.16), whereas higher genetically determined fT4 levels were not associated with CAD risk (OR 1.00 per SD increase of fT4; 95% CI 0.96–1.04;
Mustafa R, Mens MMJ, Huang J, et al., 2021, Associations of Circulatory MicroRNAs and Clinical Traits: A Phenome-wide Mendelian Randomization Analysis, Publisher: WILEY, Pages: 777-778, ISSN: 0741-0395
Rodriguez A, 2021, The link between Attention Deficit Hyperactivity Disorder (ADHD) symptoms and obesity-related traits: Genetic and prenatal explanations, Translational Psychiatry, Vol: 11, Pages: 1-8, ISSN: 2158-3188
Attention-deficit/hyperactivity disorder (ADHD) often co-occurs with obesity, however the potential causality between the traits remains unclear. We examined both genetic and prenatal evidence for causality using Mendelian Randomisation (MR) and polygenic risk scores (PRS). We conducted bi-directional MR on ADHD liability and six obesity-related traits using summary statistics from the largest available meta-analyses of genome-wide association studies. We also examined the shared genetic aetiology between ADHD symptoms (inattention and hyperactivity) and body mass index (BMI) by PRS association analysis using longitudinal data from Northern Finland Birth Cohort 1986 (NFBC1986, n = 2984). Lastly, we examined the impact of prenatal environment by association analysis of maternal pre-pregnancy BMI and offspring ADHD symptoms, adjusted for PRS of both traits, in NFBC1986 dataset. Through MR analyses, we found evidence for bidirectional causality between ADHD liability and obesity-related traits. PRS association analyses showed evidence for genetic overlap between ADHD symptoms and BMI. We found no evidence for a difference between inattention and hyperactivity symptoms, suggesting that neither symptom subtype is driving the association. We found evidence for association between maternal pre-pregnancy BMI and offspring ADHD symptoms after adjusting for both BMI and ADHD PRS (association p-value = 0.027 for inattention, p = 0.008 for hyperactivity). These results are consistent with the hypothesis that the co-occurrence between ADHD and obesity has both genetic and prenatal environmental origins.
Adams N, Riddle-Workman E, Evangelou M, 2021, Multi-Type relational clustering for enterprise cyber-security networks, Pattern Recognition Letters, Vol: 149, Pages: 172-178, ISSN: 0167-8655
Several cyber-security data sources are collected in enterprise networks providing relational information between different types of nodes in the network, namely computers, users and ports. This relational data can be expressed as adjacency matrices detailing inter-type relationships corresponding to relations between nodes of different types and intra-type relationships showing relationships between nodes of the same type. In this paper, we propose an extension of Non-Negative Matrix Tri-Factorisation (NMTF) to simultaneously cluster nodes based on their intra and inter-type relationships. Existing NMTF based clustering methods suffer from long computational times due to large matrix multiplications. In our approach, we enforce stricter cluster indicator constraints on the factor matrices to circumvent these issues. Additionally, to make our proposed approach less susceptible to variation in results due to random initialisation, we propose a novel initialisation procedure based on Non-Negative Double Singular Value Decomposition for multi-type relational clustering. Finally, a new performance measure suitable for assessing clustering performance on unlabelled multi-type relational data sets is presented. Our algorithm is assessed on both a simulated and real computer network against standard approaches showing its strong performance.
Rodosthenous T, Shahrezaei V, Evangelou M, 2021, Semi-supervised classification and visualisation of multi-view data, Joint Statistics Meeting (JSM) 2021, Publisher: American Statistical Association
An increasing number of multi-view data are being published by studies in several fields. This type of data corresponds to multiple data-views, each representing a different aspect of the same set of samples. We have recently proposed multi-SNE, an extension of t-SNE, that produces a single visualisation of multi-view data. The multi-SNE approach provides low-dimensional embeddings of the samples, produced by being updated iteratively through the different data-views. Here, we further extend multi-SNE to a semi-supervised approach, that classifies unlabelled samples by regarding the labelling information as an extra data-view. We look deeper into the performance, limitations and strengths of multi-SNE and its extension, S-multi-SNE, by applying the two methods on various multi-view datasets with different challenges. We show that by including the labelling information, the projection of the samples improves drastically and it is accompanied by a strong classification performance.
Van Vliet NA, Bos MM, Thesing CS, et al., 2021, HIGHER THYROID STIMULATING HORMONE LEADS TO CARDIOVASCULAR DISEASE AND AN UNFAVORABLE LIPID PROFILE: EVIDENCE FROM MULTI-COHORT MENDELIAN RANDOMIZATION AND METABOLOMIC PROFILING, Publisher: ELSEVIER IRELAND LTD, Pages: E40-E40, ISSN: 0021-9150
Frainay C, Pitarch Y, Filippi S, et al., 2021, Atopic dermatitis or eczema? Consequences of ambiguity in disease name for biomedical literature mining, Clinical and Experimental Allergy, Vol: 51, Pages: 1185-1194, ISSN: 0954-7894
Background: Biomedical research increasingly relies on computational approaches to extract relevant information from large corpora of publications. Objective: To investigate the consequence of the ambiguity between the use of terms “Eczema” and “Atopic Dermatitis” (AD) from the Information Retrieval perspective, and its impact on meta-analyses, systematic reviews and text mining. Methods: Articles were retrieved by querying PubMed using terms “eczema” (D003876) and “dermatitis, atopic” (D004485). We used machine learning to investigate the differences between the contexts in which each term is used. We used a decision tree approach and trained a model to predict if an article would be indexed with eczema or AD tags. We used text-mining tools to extract biological entities associated with eczema and AD, and investigated the discrepancy regarding the retrieval of key findings according to the terminology used. Results: Atopic dermatitis query yielded more articles related to veterinary science, biochemistry, cellular and molecular biology; the eczema query linked to public health, infectious disease and respiratory system. Medical Subject Headings terms associated with “AD” or “Eczema” differed, with an agreement between the top 40 lists of 52%. The presence of terms related to cellular mechanisms, especially allergies and inflammation, characterized AD literature. The metabolites mentioned more frequently than expected in articles with AD tag differed from those indexed with eczema. Fewer enriched genes were retrieved when using eczema compared to AD query. Conclusions and Clinical Relevance: There is a considerable discrepancy when using text mining to extract bio-entities related to eczema or AD. Our results suggest that any systematic approach (particularly when looking for metabolites or genes related to the condition) should be performed using both terms jointly. We propose to use decision tree learning
Rodosthenous T, Shahrezaei V, Evangelou M, 2021, Multi-view Data Visualisation via Manifold Learning, Publisher: arXiv
Rodosthenous T, Shahrezaei V, Evangelou M, 2021, Multi-view Data Visualisation via Manifold Learning
Non-linear dimensionality reduction can be performed by manifold learning approaches, such as Stochastic Neighbour Embedding (SNE), Locally Linear Embedding (LLE) and Isometric Feature Mapping (ISOMAP). These methods aim to produce two or three latent embeddings, primarily to visualise the data in intelligible representations. This manuscript proposes extensions of Student's t-distributed SNE (t-SNE), LLE and ISOMAP, for dimensionality reduction and visualisation of multi-view data. Multi-view data refers to multiple types of data generated from the same samples. The proposed multi-view approaches provide more comprehensible projections of the samples compared to the ones obtained by visualising each data-view separately. Commonly, visualisation is used for identifying underlying patterns within the samples. By incorporating the obtained low-dimensional embeddings from the multi-view manifold approaches into the K-means clustering algorithm, it is shown that clusters of the samples are accurately identified. Through the analysis of real and synthetic data the proposed multi-SNE approach is found to have the best performance. We further illustrate the applicability of the multi-SNE approach for the analysis of multi-omics single-cell data, where the aim is to visualise and identify cell heterogeneity and cell types in biological tissues relevant to health and disease.
Mustafa R, Mens M, Pinto R, et al., 2020, Identifying metabolomic fingerprints of microRNAs in cardiovascular disorders, Publisher: SPRINGERNATURE, Pages: 277-277, ISSN: 1018-4813
Evangelou M, Adams N, 2020, An anomaly detection framework for cyber-security data, Computers and Security, Vol: 97, Pages: 1-10, ISSN: 0167-4048
Data-driven anomaly detection systems have unrivalled potential as complementary defence systems to existing signature-based tools as the number of cyber attacks increases. In this manuscript an anomaly detection system is presented that detects any abnormal deviations from the normal behaviour of an individual device. Device behaviour is defined as the number of network traffic events involving the device of interest observed within a pre-specified time period. The behaviour of each device at normal state is modelled to depend on its observed historic behaviour. A number of statistical and machine learning approaches are explored for modelling this relationship and through a comparative study, the Quantile Regression Forests approach is found to have the best predictive power. Based on the prediction intervals of the Quantile Regression Forests an anomaly detection system is proposed that characterises as abnormal, any observed behaviour outside of these intervals. A series of experiments for contaminating normal device behaviour are presented for examining the performance of the anomaly detection system. Through the conducted analysis the proposed anomaly detection system is found to outperform two other detection systems. The presented work has been conducted on two enterprise networks.
Mustafa R, Mens M, Pinto RJ, et al., 2020, Metabolomic signatures of microRNAs in cardiovascular traits: A Mendelian randomization analysis, Annual Meeting of the International-Genetic-Epidemiology-Society, Publisher: WILEY, Pages: 506-506, ISSN: 0741-0395
Lucotte EA, Sugier P-E, Deleuze J-F, et al., 2020, Analysis of the pleiotropy between breast cancer and thyroid cancer, Annual Meeting of the International-Genetic-Epidemiology-Society, Publisher: WILEY, Pages: 504-504, ISSN: 0741-0395
Rodosthenous T, Shahrezaei V, Evangelou M, 2020, Integrating multi-OMICS data through sparse Canonical Correlation Analysis for the prediction of complex traits: A comparison study, Bioinformatics, Vol: 36, Pages: 4616-4625, ISSN: 1367-4803
Motivation: Recent developments in technology have enabled researchers to collect multiple OMICS datasets for the same individuals. The conventional approach for understanding the relationships between the collected datasets and the complex trait of interest would be through the analysis of each OMIC dataset separately from the rest, or to test for associations between the OMICS datasets. In this work we show that integrating multiple OMICS datasets together, instead of analysing them separately, improves our understanding of their in-between relationships as well as the predictive accuracy for the tested trait. Several approaches have been proposed for the integration of heterogeneous and high-dimensional (p ≫ n) data, such as OMICS. The sparse variant of Canonical Correlation Analysis (CCA) approach is a promising one that seeks to penalise the canonical variables for producing sparse latent variables while achieving maximal correlation between the datasets. Over the last years, a number of approaches for implementing sparse CCA (sCCA) have been proposed, where they differ on their objective functions, iterative algorithm for obtaining the sparse latent variables and make different assumptions about the original datasets. Results: Through a comparative study we have explored the performance of the conventional CCA proposed by Parkhomenko et al. (2009), penalised matrix decomposition CCA proposed by Witten and Tibshirani (2009) and its extension proposed by Suo et al. (2017). The aforementioned methods were modified to allow for different penalty functions. Although sCCA is an unsupervised learning approach for understanding of the in-between relationships, we have twisted the problem as a supervised learning one and investigated how the computed latent variables can be used for predicting complex traits. The approaches were extended to allow for multiple (more than two) datasets where the trait was included as one of the input datasets. Both ways have shown improvement
Karhunen V, Jarvelin M-R, Evangelou M, et al., 2019, A MENDELIAN RANDOMISATION STUDY ON CAUSALITY BETWEEN ATTENTION-DEFICIT/HYPERACTIVITY DISORDER AND MULTIPLE OBESITY-RELATED TRAITS, 27th World Congress of Psychiatric Genetics (WCPG), Publisher: ELSEVIER, Pages: S114-S115, ISSN: 0924-977X
Riddle-Workman E, Evangelou M, Adams N, 2018, Adaptive anomaly detection on network data streams, IEEE Conference on Intelligence and Security Informatics (ISI) 2018, Publisher: IEEE
As the number of cyber-attacks increases, there has been increasing emphasis on developing complementary methods of detection to the existing signature-based approaches. This work builds upon a previously discovered persistent structure within the Los Alamos National Laboratory network data sources, to develop a regression based streaming anomaly detection mechanism that can adapt to the network behaviour over time. The methodology has also been applied to a new data set of the same network to assess the extent of its pertinence in time.
Evangelou E, Warren HR, Mosen-Ansorena D, et al., 2018, Genetic analysis of over 1 million people identifies 535 new loci associated with blood pressure traits (vol 50, pg 1412, 2018), NATURE GENETICS, Vol: 50, Pages: 1755-1755, ISSN: 1061-4036
Mustafa R, Ghanbari M, Evangelou M, et al., 2018, An enrichment analysis for cardiometabolic traits suggests non-random assignment of genes to microRNAs, International Journal of Molecular Sciences, Vol: 19, ISSN: 1422-0067
MicroRNAs (miRNAs) regulate the expression of the majority of genes. However, it is not known whether they regulate genes at random or are organized according to their function. To this end, we chose cardiometabolic disorders as an example and investigated whether genes associated with cardiometabolic disorders are regulated by a random set of miRNAs or a limited number of them. Single-nucleotide polymorphisms (SNPs) reaching genome-wide level significance were retrieved from the most recent genome-wide association studies on cardiometabolic traits, which were cross-referenced with Ensembl to identify related genes and combined with miRNA target prediction databases (TargetScan, miRTarBase, or miRecords) to identify miRNAs that regulate them. We retrieved 520 SNPs, of which 355 were intragenic, corresponding to 304 genes. While we found a higher proportion of genes reported from all GWAS that were predicted targets for miRNAs in comparison to all protein coding genes (75.1%), the proportion was even higher for cardiometabolic genes (80.6%). Enrichment analysis was performed within each database. We found that cardiometabolic genes were over-represented in target genes for 29 miRNAs (based on TargetScan) and 3 miRNAs (miR-181a, miR-302d, and miR-372) (based on miRecords) after Benjamini-Hochberg correction for multiple testing. Our work provides evidence for non-random assignment of genes to miRNAs and supports the idea that miRNAs regulate sets of genes that are functionally related.
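For readers unfamiliar with the Benjamini-Hochberg correction mentioned in this abstract, the following is a minimal generic sketch of the standard step-up adjustment. It is not the authors' code: the function name `benjamini_hochberg` and the example p-values are illustrative assumptions, and the paper's actual enrichment pipeline is not reproduced here.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order.

    Generic textbook procedure; the name and interface are hypothetical,
    not taken from the paper.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Made-up p-values, one per hypothetical miRNA enrichment test
    raw = [0.01, 0.04, 0.03, 0.005]
    print(benjamini_hochberg(raw))
```

A test is then called significant at false discovery rate q when its adjusted value is at most q.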
Evangelou E, Warren HR, Mosen-Ansorena D, et al., 2018, Publisher correction: Genetic analysis of over 1 million people identifies 535 new loci associated with blood pressure traits, Nature Genetics, Vol: 50, Pages: 1755-1755, ISSN: 1061-4036
Correction to: Nature Genetics https://doi.org/10.1038/s41588-018-0205-x, published online 17 September 2018.
Evangelou E, Warren HR, Mosen-Ansorena D, et al., 2018, Genetic analysis of over one million people identifies 535 new loci associated with blood pressure traits, Nature Genetics, Vol: 50, Pages: 1412-1425, ISSN: 1061-4036
High blood pressure is a highly heritable and modifiable risk factor for cardiovascular disease. We report the largest genetic association study of blood pressure traits (systolic, diastolic and pulse pressure) to date in over 1 million people of European ancestry. We identify 535 novel blood pressure loci that not only offer new biological insights into blood pressure regulation but also highlight shared genetic architecture between blood pressure and lifestyle exposures. Our findings identify new biological pathways for blood pressure regulation with potential for improved cardiovascular disease prevention in the future.
Warren HR, Evangelou E, Mosen D, et al., 2018, GENETIC ANALYSIS OF OVER ONE MILLION PEOPLE IDENTIFIES 535 NOVEL LOCI ASSOCIATED WITH BLOOD PRESSURE AND RISK OF CARDIOVASCULAR DISEASE, 28th European Meeting of Hypertension and Cardiovascular Protection of the European-Society-of-Hypertension (ESH), Publisher: LIPPINCOTT WILLIAMS & WILKINS, Pages: E229-E229, ISSN: 0263-6352
Broc C, Evangelou M, Truong T, et al., 2018, Investigating gene- and pathway-environment Interaction analysis approaches, Journal of the French Statistical Society, ISSN: 1962-5197
Pathway analysis can increase power to detect associations with a gene or a pathway by combining several signals at the single nucleotide polymorphism (SNP)-level into a single test. In this work, we propose to extend two well-known self-contained methods, the Fisher’s method (FM) and the Adaptive Rank Truncated Product (ARTP) method to the analysis of gene-environment (GxE) interaction at the gene and pathway-level. It has been previously suggested that the permutation procedures that are usually used to derive the significance of these tests are not appropriate for the analysis of GxE interaction and should be replaced by a bootstrap approach. We analyse and compare the performance of the extension of FM and ARTP using the permutation and the parametric bootstrap procedure in simulation studies. We illustrate its application by analysing the interaction between night work and circadian gene polymorphisms in the risk of breast cancer in a case-control study. The ARTP method, adapted for both gene- and pathway-environment interactions, gives promising results and has been wrapped to the R package PIGE available on the CRAN.
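As background to the abstract above, here is a minimal sketch of the classical Fisher's method for combining p-values. It assumes the p-values are independent, which generally does not hold for SNPs within a gene; that is the reason the abstract relies on permutation or bootstrap procedures instead of the chi-square reference distribution used here. The function name `fisher_combined_pvalue` is a hypothetical label and this is not the PIGE implementation.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method.

    Assumption: the tests are independent, so the statistic
    X = -2 * sum(log p_i) follows a chi-square law with 2k degrees of freedom.
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # this is X / 2
    # For even degrees of freedom 2k the chi-square survival function is
    # exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half_stat / j
        total += term
    return math.exp(-half_stat) * total


if __name__ == "__main__":
    # Made-up SNP-level interaction p-values for one hypothetical gene
    print(fisher_combined_pvalue([0.04, 0.20, 0.01]))
```

With a single p-value the function returns that p-value unchanged, which is a quick sanity check on the formula.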
Schon C, Adams NM, Evangelou M, 2017, Clustering and monitoring edge behaviour in enterprise network traffic, IEEE International Conference on Intelligence and Security Informatics, Publisher: IEEE, Pages: 31-36
This paper takes an unsupervised learning approach for monitoring edge activity within an enterprise computer network. Using NetFlow records, features are gathered across the active connections (edges) in 15-minute time windows. Then, edges are grouped into clusters using the k-means algorithm. This process is repeated over contiguous windows. A series of informative indicators are derived by examining the relationship of edges with the observed cluster structure. This leads to an intuitive method for monitoring network behaviour and a temporal description of edge behaviour at global and local levels.