Rodosthenous T, Shahrezaei V, Evangelou M, 2021, Multi-view Data Visualisation via Manifold Learning, Publisher: arXiv
Mustafa R, Mens M, Pinto R, et al., 2020, Identifying metabolomic fingerprints of microRNAs in cardiovascular disorders, Publisher: SPRINGERNATURE, Pages: 277-277, ISSN: 1018-4813
Evangelou M, Adams N, 2020, An anomaly detection framework for cyber-security data, Computers and Security, Vol: 97, Pages: 1-10, ISSN: 0167-4048
Data-driven anomaly detection systems have unrivalled potential as complementary defence systems to existing signature-based tools as the number of cyber attacks increases. In this manuscript an anomaly detection system is presented that detects any abnormal deviations from the normal behaviour of an individual device. Device behaviour is defined as the number of network traffic events involving the device of interest observed within a pre-specified time period. The behaviour of each device at normal state is modelled to depend on its observed historic behaviour. A number of statistical and machine learning approaches are explored for modelling this relationship and, through a comparative study, the Quantile Regression Forests approach is found to have the best predictive power. Based on the prediction intervals of the Quantile Regression Forests, an anomaly detection system is proposed that characterises as abnormal any observed behaviour outside of these intervals. A series of experiments for contaminating normal device behaviour are presented for examining the performance of the anomaly detection system. Through the conducted analysis the proposed anomaly detection system is found to outperform two other detection systems. The presented work has been conducted on two enterprise networks.
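The interval-based rule described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Quantile Regression Forests model is swapped for plain empirical quantiles of a device's historic event counts, and the function names, the 5%/95% bounds and the example counts are all assumptions made for illustration.

```python
from statistics import quantiles


def prediction_interval(history, lower_pct=5, upper_pct=95):
    """Empirical stand-in for a model-based prediction interval.

    Assumption: the paper derives intervals from Quantile Regression
    Forests; here the historic counts' own percentiles are used instead.
    """
    cuts = quantiles(history, n=100, method="inclusive")  # 99 percentile cut points
    return cuts[lower_pct - 1], cuts[upper_pct - 1]


def is_anomalous(observed, history, lower_pct=5, upper_pct=95):
    """Flag an observed event count that falls outside the interval."""
    low, high = prediction_interval(history, lower_pct, upper_pct)
    return observed < low or observed > high


# Hypothetical event counts for one device over ten past time periods.
history = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]
print(is_anomalous(14, history))  # a typical count
print(is_anomalous(90, history))  # a burst of activity
```

A model-based interval would condition on the device's recent history rather than pooling all past periods, which is the main thing this simplification gives up.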
Lucotte EA, Sugier P-E, Deleuze J-F, et al., 2020, Analysis of the pleiotropy between breast cancer and thyroid cancer, Annual Meeting of the International-Genetic-Epidemiology-Society, Publisher: WILEY, Pages: 504-504, ISSN: 0741-0395
Mustafa R, Mens M, Pinto RJ, et al., 2020, Metabolomic signatures of microRNAs in cardiovascular traits: A Mendelian randomization analysis, Annual Meeting of the International-Genetic-Epidemiology-Society, Publisher: WILEY, Pages: 506-506, ISSN: 0741-0395
Rodosthenous T, Shahrezaei V, Evangelou M, 2020, Integrating multi-OMICS data through sparse Canonical Correlation Analysis for the prediction of complex traits: A comparison study, Bioinformatics, Vol: 36, Pages: 4616-4625, ISSN: 1367-4803
Motivation: Recent developments in technology have enabled researchers to collect multiple OMICS datasets for the same individuals. The conventional approach for understanding the relationships between the collected datasets and the complex trait of interest would be through the analysis of each OMIC dataset separately from the rest, or to test for associations between the OMICS datasets. In this work we show that integrating multiple OMICS datasets together, instead of analysing them separately, improves our understanding of their in-between relationships as well as the predictive accuracy for the tested trait. Several approaches have been proposed for the integration of heterogeneous and high-dimensional (p ≫ n) data, such as OMICS. The sparse variant of Canonical Correlation Analysis (CCA) is a promising approach that seeks to penalise the canonical variables for producing sparse latent variables while achieving maximal correlation between the datasets. Over the last years, a number of approaches for implementing sparse CCA (sCCA) have been proposed; they differ in their objective functions and in the iterative algorithm for obtaining the sparse latent variables, and they make different assumptions about the original datasets. Results: Through a comparative study we have explored the performance of the conventional CCA proposed by Parkhomenko et al. (2009), penalised matrix decomposition CCA proposed by Witten and Tibshirani (2009) and its extension proposed by Suo et al. (2017). The aforementioned methods were modified to allow for different penalty functions. Although sCCA is an unsupervised learning approach for understanding the in-between relationships, we have recast the problem as a supervised learning one and investigated how the computed latent variables can be used for predicting complex traits. The approaches were extended to allow for multiple (more than two) datasets where the trait was included as one of the input datasets. Both ways have shown improvement
Karhunen V, Jarvelin M-R, Evangelou M, et al., 2019, A MENDELIAN RANDOMISATION STUDY ON CAUSALITY BETWEEN ATTENTION-DEFICIT/HYPERACTIVITY DISORDER AND MULTIPLE OBESITY-RELATED TRAITS, 27th World Congress of Psychiatric Genetics (WCPG), Publisher: ELSEVIER, Pages: S114-S115, ISSN: 0924-977X
Riddle-Workman E, Evangelou M, Adams N, 2018, Adaptive anomaly detection on network data streams, IEEE Conference on Intelligence and Security Informatics (ISI) 2018, Publisher: IEEE
As the number of cyber-attacks increases, there has been increasing emphasis on developing complementary methods of detection to the existing signature-based approaches. This work builds upon a previously discovered persistent structure within the Los Alamos National Laboratory network data sources, to develop a regression based streaming anomaly detection mechanism that can adapt to the network behaviour over time. The methodology has also been applied to a new data set of the same network to assess the extent of its pertinence in time.
Evangelou E, Warren HR, Mosen-Ansorena D, et al., 2018, Genetic analysis of over 1 million people identifies 535 new loci associated with blood pressure traits (vol 50, pg 1412, 2018), NATURE GENETICS, Vol: 50, Pages: 1755-1755, ISSN: 1061-4036
Mustafa R, Ghanbari M, Evangelou M, et al., 2018, An enrichment analysis for cardiometabolic traits suggests non-random assignment of genes to microRNAs, International Journal of Molecular Sciences, Vol: 19, ISSN: 1422-0067
MicroRNAs (miRNAs) regulate the expression of the majority of genes. However, it is not known whether they regulate genes at random or are organized according to their function. To this end, we chose cardiometabolic disorders as an example and investigated whether genes associated with cardiometabolic disorders are regulated by a random set of miRNAs or a limited number of them. Single-nucleotide polymorphisms (SNPs) reaching genome-wide level significance were retrieved from the most recent genome-wide association studies on cardiometabolic traits, which were cross-referenced with Ensembl to identify related genes and combined with miRNA target prediction databases (TargetScan, miRTarBase, or miRecords) to identify miRNAs that regulate them. We retrieved 520 SNPs, of which 355 were intragenic, corresponding to 304 genes. While we found a higher proportion of genes reported from all GWAS that were predicted targets for miRNAs in comparison to all protein coding genes (75.1%), the proportion was even higher for cardiometabolic genes (80.6%). Enrichment analysis was performed within each database. We found that cardiometabolic genes were over-represented in target genes for 29 miRNAs (based on TargetScan) and 3 miRNAs (miR-181a, miR-302d, and miR-372) (based on miRecords) after Benjamini-Hochberg correction for multiple testing. Our work provides evidence for non-random assignment of genes to miRNAs and supports the idea that miRNAs regulate sets of genes that are functionally related.
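The Benjamini-Hochberg correction mentioned in this abstract can be sketched as below. This is a generic illustration of the standard step-up procedure, not the authors' analysis code: the function name, the 0.05 level and the example p-values are assumptions.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected.

    Step-up rule: find the largest rank r (1-based, in ascending p-value
    order) with p_(r) <= alpha * r / m, then reject the r smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha * rank / m:
            max_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical enrichment p-values, one per miRNA.
print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.6]))
```

Note that a p-value can be rejected even if it misses its own threshold, as long as a larger p-value further down the ordering meets its threshold; that is what distinguishes the step-up rule from a simple per-rank cut-off.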
Evangelou E, Warren HR, Mosen-Ansorena D, et al., 2018, Publisher correction: Genetic analysis of over 1 million people identifies 535 new loci associated with blood pressure traits, Nature Genetics, Vol: 50, Pages: 1755-1755, ISSN: 1061-4036
Correction to: Nature Genetics https://doi.org/10.1038/s41588-018-0205-x, published online 17 September 2018.
Evangelou E, Warren HR, Mosen-Ansorena D, et al., 2018, Genetic analysis of over one million people identifies 535 new loci associated with blood pressure traits, Nature Genetics, Vol: 50, Pages: 1412-1425, ISSN: 1061-4036
High blood pressure is a highly heritable and modifiable risk factor for cardiovascular disease. We report the largest genetic association study of blood pressure traits (systolic, diastolic and pulse pressure) to date in over 1 million people of European ancestry. We identify 535 novel blood pressure loci that not only offer new biological insights into blood pressure regulation but also highlight shared genetic architecture between blood pressure and lifestyle exposures. Our findings identify new biological pathways for blood pressure regulation with potential for improved cardiovascular disease prevention in the future.
Warren HR, Evangelou E, Mosen D, et al., 2018, GENETIC ANALYSIS OF OVER ONE MILLION PEOPLE IDENTIFIES 535 NOVEL LOCI ASSOCIATED WITH BLOOD PRESSURE AND RISK OF CARDIOVASCULAR DISEASE, 28th European Meeting of Hypertension and Cardiovascular Protection of the European-Society-of-Hypertension (ESH), Publisher: LIPPINCOTT WILLIAMS & WILKINS, Pages: E229-E229, ISSN: 0263-6352
Broc C, Evangelou M, Truong T, et al., 2018, Investigating gene- and pathway-environment Interaction analysis approaches, Journal of the French Statistical Society, ISSN: 1962-5197
Pathway analysis can increase power to detect associations with a gene or a pathway by combining several signals at the single nucleotide polymorphism (SNP)-level into a single test. In this work, we propose to extend two well-known self-contained methods, the Fisher’s method (FM) and the Adaptive Rank Truncated Product (ARTP) method to the analysis of gene-environment (GxE) interaction at the gene and pathway-level. It has been previously suggested that the permutation procedures that are usually used to derive the significance of these tests are not appropriate for the analysis of GxE interaction and should be replaced by a bootstrap approach. We analyse and compare the performance of the extension of FM and ARTP using the permutation and the parametric bootstrap procedure in simulation studies. We illustrate its application by analysing the interaction between night work and circadian gene polymorphisms in the risk of breast cancer in a case-control study. The ARTP method, adapted for both gene- and pathway-environment interactions, gives promising results and has been wrapped to the R package PIGE available on the CRAN.
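As background to the Fisher's method (FM) named in this abstract, the sketch below computes the classical combined statistic and its p-value. It assumes independent p-values, in which case the statistic follows a chi-square distribution with 2k degrees of freedom; the abstract itself relies on permutation or bootstrap procedures instead, so this is only the textbook form, with hypothetical function names and inputs.

```python
import math


def fisher_statistic(p_values):
    """Fisher's combined statistic: X = -2 * sum(ln p_i)."""
    return -2.0 * sum(math.log(p) for p in p_values)


def fisher_combined_p(p_values):
    """P(chi-square with 2k df > X), valid only for independent p-values.

    For even degrees of freedom 2k the survival function has the closed
    form exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!.
    """
    k = len(p_values)
    half = fisher_statistic(p_values) / 2.0
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j  # builds (x/2)^j / j! incrementally
        total += term
    return math.exp(-half) * total


# Hypothetical SNP-level p-values within one gene.
print(fisher_combined_p([0.05, 0.05]))
```

With a single p-value the combined p-value reduces to the input, which is a quick sanity check on the formula.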
Schon C, Adams NM, Evangelou M, 2017, Clustering and monitoring edge behaviour in enterprise network traffic, IEEE International Conference on Intelligence and Security Informatics, Publisher: IEEE, Pages: 31-36
This paper takes an unsupervised learning approach for monitoring edge activity within an enterprise computer network. Using NetFlow records, features are gathered across the active connections (edges) in 15-minute time windows. Then, edges are grouped into clusters using the k-means algorithm. This process is repeated over contiguous windows. A series of informative indicators are derived by examining the relationship of edges with the observed cluster structure. This leads to an intuitive method for monitoring network behaviour and a temporal description of edge behaviour at global and local levels.
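The windowed clustering described in this abstract can be sketched roughly as follows. This is an illustrative reconstruction, not the paper's code: the two edge features, the example values, and the choice to seed centroids with the first k points are assumptions, and the k-means routine is a plain Lloyd's algorithm written out to keep the example self-contained.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids (a simplification)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: recompute each centroid as the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical per-edge features (flow count, mean kilobytes per flow)
# for the same four edges in two contiguous 15-minute windows.
windows = [
    [(1.0, 1.0), (10.0, 10.0), (1.2, 0.8), (9.5, 10.5)],
    [(1.1, 0.9), (9.8, 10.1), (9.9, 9.7), (10.3, 10.2)],
]
for index, edge_features in enumerate(windows):
    labels, _ = kmeans(edge_features, k=2)
    print(index, labels)
```

Cluster labels are only comparable across windows here because the same two edges seed the centroids each time; in general, tracking an edge's membership over time needs some way of matching clusters between windows.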
Gibberd AJ, Evangelou M, Nelson JDB, 2017, The time-varying dependency patterns of NetFlow statistics, IEEE International Conference on Data Mining Workshop Proceedings, Publisher: IEEE
We investigate where and how key dependency structures between measures of network activity change throughout the course of daily activity. Our approach to data-mining is probabilistic in nature; we formulate the identification of dependency patterns as a regularised statistical estimation problem. The resulting model can be interpreted as a set of time-varying graphs and provides a useful visual interpretation of network activity. We believe this is the first application of dynamic graphical modelling to network traffic of this kind. Investigations are performed on 9 days of real-world network traffic across a subset of IPs. We demonstrate that dependency between features may change across time and discuss how these change at an intra- and inter-day level. Such variation in feature dependency may have important consequences for the design and implementation of probabilistic intrusion detection systems.
Evangelou M, Adams N, 2016, Predictability of NetFlow data, IEEE International Conference on Intelligence and Security Informatics, Publisher: IEEE
The behaviour of individual devices connected to an enterprise network can vary dramatically, as a device’s activity depends on the user operating the device as well as on all behind the scenes operations between the device and the network. Being able to understand and predict a device’s behaviour in a network can work as the foundation of an anomaly detection framework, as devices may show abnormal activity as part of a cyber attack. The aim of this work is the construction of a predictive regression model for a device’s behaviour at normal state. The behaviour of a device is presented by a quantitative response and modelled to depend on historic data recorded by NetFlow.
Whitehouse M, Evangelou M, Adams N, 2016, Activity-based temporal anomaly detection in enterprise-cyber security, IEEE International Big Data Analytics for Cybersecurity computing (BDAC'16) Workshop, IEEE International Conference on Intelligence and Security Informatics, Publisher: IEEE
Statistical anomaly detection is emerging as an important complement to signature-based methods for enterprise network defence. In this paper, we isolate a persistent structure in two different enterprise network data sources. This structure provides the basis of a regression-based anomaly detection method. The procedure is demonstrated on a large public domain data set.
Todd J, Evangelou M, Cutler AJ, et al., 2016, Regulatory T Cell Responses in Participants with Type 1 Diabetes after a Single Dose of Interleukin-2: A Non-Randomised, Open Label, Adaptive Dose-Finding Trial, PLOS Medicine, Vol: 13, ISSN: 1549-1277
Background: Interleukin-2 (IL-2) has an essential role in the expansion and function of CD4+ regulatory T cells (Tregs). Tregs reduce tissue damage by limiting the immune response following infection and regulate autoreactive CD4+ effector T cells (Teffs) to prevent autoimmune diseases, such as type 1 diabetes (T1D). Genetic susceptibility to T1D causes alterations in the IL-2 pathway, a finding that supports Tregs as a cellular therapeutic target. Aldesleukin (Proleukin; recombinant human IL-2), which is administered at high doses to activate the immune system in cancer immunotherapy, is now being repositioned to treat inflammatory and autoimmune disorders at lower doses by targeting Tregs. Methods and Findings: To define the aldesleukin dose response for Tregs and to find doses that increase Tregs physiologically for treatment of T1D, a statistical and systematic approach was taken by analysing the pharmacokinetics and pharmacodynamics of single doses of subcutaneous aldesleukin in the Adaptive Study of IL-2 Dose on Regulatory T Cells in Type 1 Diabetes (DILT1D), a single centre, non-randomised, open label, adaptive dose-finding trial with 40 adult participants with recently diagnosed T1D. The primary endpoint was the maximum percentage increase in Tregs (defined as CD3+CD4+CD25highCD127low) from the baseline frequency in each participant measured over the 7 d following treatment. There was an initial learning phase with five pairs of participants, each pair receiving one of five preassigned single doses from 0.04 × 10^6 to 1.5 × 10^6 IU/m^2, in order to model the dose-response curve. Results from each participant were then incorporated into interim statistical modelling to target the two doses most likely to induce 10% and 20% increases in Treg frequencies. Primary analysis of the evaluable population (n = 39) found that the optimal doses of aldesleukin to induce 10% and 20% increases in Tregs were 0.101 × 10^6 IU/m^2 (standard error [SE] = 0.078, 95% CI = −0.052, 0.254
Larsen E, Truong T, Evangelou M, 2016, Exploring GenexEnvironment interactions through pathway analysis, Annual Meeting of the International-Genetic-Epidemiology-Society, Publisher: Wiley, Pages: 648-649, ISSN: 1098-2272
Nasser S, Lazaridis A, Evangelou M, et al., 2016, Correlation of pre-operative CT findings with surgical & histological tumor dissemination patterns at cytoreduction for primary advanced and relapsed epithelial ovarian cancer: A retrospective evaluation, Gynecologic Oncology, Vol: 143, Pages: 264-269, ISSN: 1095-6859
Objectives: Computed tomography (CT) is an essential part of preoperative planning prior to cytoreductive surgery for primary and relapsed epithelial ovarian cancer (EOC). Our aim is to correlate pre-operative CT results with intraoperative surgical and histopathological findings at debulking surgery. Methods: We performed a systematic comparison of intraoperative tumor dissemination patterns and surgical resections with preoperative CT assessments of infiltrative disease at key resection sites, in women who underwent multivisceral debulking surgery due to EOC between January 2013 and December 2014 at a tertiary referral center. The key sites were defined as follows: diaphragmatic involvement (DI), splenic disease (SI), large (LBI) and small (SBI) bowel involvement, rectal involvement (RI), porta hepatis involvement (PHI), mesenteric disease (MI) and lymph node involvement (LNI). Results: A total of 155 patients, mostly with FIGO stage IIIC disease (65%), were evaluated (primary = 105, relapsed = 50). The total macroscopic cytoreduction rate was 89%. Pre-operative CT findings displayed high specificity across all tumor sites apart from the retroperitoneal lymph node status, with a specificity of 65%. However, the ability of CT to accurately identify sites affected by invasive disease was relatively low, with the following sensitivities relative to final histology: 32% (DI), 26% (SI), 46% (LBI), 44% (SBI), 39% (RI), 57% (PHI), 31% (MI), 63% (LNI). Conclusion: Pre-operative CT imaging shows high specificity but low sensitivity in detecting tumor involvement at key sites in ovarian cancer surgery. CT findings alone should not be used for surgical decision making.
Dopico XC, Evangelou M, Ferreira RC, et al., 2015, Widespread seasonal gene expression reveals annual differences in human immunity and physiology, Nature Communications, Vol: 6, ISSN: 2041-1723
Seasonal variations are rarely considered a contributing component to human tissue function or health, although many diseases and physiological processes display annual periodicities. Here we find more than 4,000 protein-coding mRNAs in white blood cells and adipose tissue to have seasonal expression profiles, with inverted patterns observed between Europe and Oceania. We also find the cellular composition of blood to vary by season, and these changes, which differ between the United Kingdom and The Gambia, could explain the gene expression periodicity. With regards to tissue function, the immune system has a profound pro-inflammatory transcriptomic profile during European winter, with increased levels of soluble IL-6 receptor and C-reactive protein, risk biomarkers for cardiovascular, psychiatric and autoimmune diseases that have peak incidences in winter. Circannual rhythms thus require further exploration as contributors to various aspects of human physiology and disease.
Heywood J, Evangelou M, Goymer D, et al., 2015, Effective recruitment of participants to a phase I study using the internet and publicity releases through charities and patient organisations: analysis of the adaptive study of IL-2 dose on regulatory T cells in type 1 diabetes (DILT1D), TRIALS, Vol: 16, ISSN: 1745-6215
Truman LA, Pekalski ML, Kareclas P, et al., 2015, Protocol of the adaptive study of IL-2 dose frequency on regulatory T cells in type 1 diabetes (DILfrequency): a mechanistic, non-randomised, repeat dose, open-label, response-adaptive study, BMJ OPEN, Vol: 5, ISSN: 2044-6055
Evangelou M, Smyth DJ, Fortune MD, et al., 2014, A Method for Gene-Based Pathway Analysis Using Genomewide Association Study Summary Statistics Reveals Nine New Type 1 Diabetes Associations, GENETIC EPIDEMIOLOGY, Vol: 38, Pages: 661-670, ISSN: 0741-0395
Evangelou M, Dudbridge F, Wernisch L, 2014, Two novel pathway analysis methods based on a hierarchical model, Bioinformatics, Vol: 30, Pages: 690-697, ISSN: 1367-4803
Motivation: Over the past few years several pathway analysis methods have been proposed for exploring and enhancing the analysis of genome-wide association data. Hierarchical models have been advocated as a way to integrate SNP and pathway effects in the same model, but their computational complexity has prevented them being applied on a genome-wide scale to date. Methods: We present two novel methods for identifying associated pathways. In the proposed hierarchical model, the SNP effects are analytically integrated out of the analysis, allowing computationally tractable model fitting to genome-wide data. The first method uses Bayes factors for calculating the effect of the pathways, whereas the second method uses a machine learning algorithm and adaptive lasso for finding a sparse solution of associated pathways. Results: The performance of the proposed methods was explored on both simulated and real data. The results of the simulation study showed that the methods outperformed some well-established association methods: the commonly used Fisher’s method for combining P-values and also the recently published BGSA. The methods were applied to two genome-wide association study datasets that aimed to find the genetic structure of platelet function and body mass index, respectively. The results of the analyses replicated the results of previously published pathway analysis of these phenotypes but also identified novel pathways that are potentially involved. Availability: An R package is under preparation. In the meantime, the scripts of the methods are available on request from the authors.
Evangelou M, Rendon A, Ouwehand WH, et al., 2012, Comparison of Methods for Competitive Tests of Pathway Analysis, PLOS ONE, Vol: 7, ISSN: 1932-6203
Evangelou M, Dudbridge F, Wernisch L, 2012, Bayesian Hierarchical Modelling of SNPs and Pathways for Identifying Associated Pathways, 20th Annual Meeting of the International-Genetic-Epidemiology-Society (IGES), Publisher: WILEY-BLACKWELL, Pages: 150-150, ISSN: 0741-0395
Evangelou M, Wernisch L, Dudbridge F, 2012, Comparison of Methods for Enrichment Tests in Pathway Analysis, 20th Annual Meeting of the International-Genetic-Epidemiology-Society (IGES), Publisher: WILEY-BLACKWELL, Pages: 150-151, ISSN: 0741-0395
Rodosthenous T, Shahrezaei V, Evangelou M, Multi-view Data Visualisation via Manifold Learning
Manifold learning approaches, such as Stochastic Neighbour Embedding (SNE), Locally Linear Embedding (LLE) and Isometric Feature Mapping (ISOMAP) have been proposed for performing non-linear dimensionality reduction. These methods aim to produce two or three latent embeddings, in order to visualise the data in intelligible representations. This manuscript proposes extensions of Student's t-distributed SNE (t-SNE), LLE and ISOMAP, to allow for dimensionality reduction and subsequent visualisation of multi-view data. Nowadays, it is very common to have multiple data-views on the same samples. Each data-view contains a set of features describing different aspects of the samples. For example, in biomedical studies it is possible to generate multiple OMICS data sets for the same individuals, such as transcriptomics, genomics, epigenomics, enabling better understanding of the relationships between the different biological processes. Through the analysis of real and simulated data sets, the visualisation performance of the proposed methods is illustrated. Data visualisations have been often utilised for identifying any potential clusters in the data sets. We show that by incorporating the low-dimensional embeddings obtained via the multi-view manifold learning approaches into the K-means algorithm, clusters of the samples are accurately identified. Our proposed multi-SNE method outperforms the corresponding multi-ISOMAP and multi-LLE proposed methods. Interestingly, multi-SNE is found to have comparable performance with methods proposed in the literature for performing multi-view clustering.
This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.