Publications
Mustafa R, Ghanbari M, Karhunen V, et al., 2023, Phenome-wide association study on miRNA-related sequence variants: the UK Biobank, Human Genomics, Vol: 17, ISSN: 1479-7364
Background: Genetic variants in the coding region could directly affect the structure and expression levels of genes and proteins. However, the importance of variants in the non-coding region, such as microRNAs (miRNAs), remains to be elucidated. Genetic variants in miRNA-related sequences could affect their biogenesis or functionality and ultimately affect disease risk. Yet, their implications and pleiotropic effects on many clinical conditions remain unknown. Methods: Here, we utilised genotyping and hospital records data in the UK Biobank (N = 423,419) to investigate associations between 346 genetic variants in miRNA-related sequences and a wide range of clinical diagnoses through phenome-wide association studies. Further, we tested whether changes in blood miRNA expression levels could affect disease risk through colocalisation and Mendelian randomisation analysis. Results: We identified 122 associations for six variants in the seed region of miRNAs, nine variants in the mature region of miRNAs, and 27 variants in the precursor miRNAs. These included associations with hypertension, dyslipidaemia, immune-related disorders, and others. Nineteen miRNAs were associated with multiple diagnoses, with six of them associated with multiple disease categories. The strongest association was reported between rs4285314 in the precursor of miR-3135b and celiac disease risk (odds ratio (OR) per effect allele increase = 0.37, P = 1.8 × 10⁻¹⁶²). Colocalisation and Mendelian randomisation analysis highlighted a potential causal role of miR-6891-3p in dyslipidaemia. Conclusions: Our study demonstrates the pleiotropic effect of miRNAs and offers insights into their possible clinical importance.
Aglago EK, Kim A, Lin Y, et al., 2023, Data from A Genetic Locus within the FMN1/GREM1 Gene Region Interacts with Body Mass Index in Colorectal Cancer Risk
Abstract: Colorectal cancer risk can be impacted by genetic, environmental, and lifestyle factors, including diet and obesity. Gene-environment interactions (G × E) can provide biological insights into the effects of obesity on colorectal cancer risk. Here, we assessed potential genome-wide G × E interactions between body mass index (BMI) and common SNPs for colorectal cancer risk using data from 36,415 colorectal cancer cases and 48,451 controls from three international colorectal cancer consortia (CCFR, CORECT, and GECCO). The G × E tests included the conventional logistic regression using multiplicative terms (one degree of freedom, 1DF test), the two-step EDGE method, and the joint 3DF test, each of which is powerful for detecting G × E interactions under specific conditions. BMI was associated with higher colorectal cancer risk. The two-step approach revealed a statistically significant G × BMI interaction located within the Formin 1/Gremlin 1 (FMN1/GREM1) gene region (rs58349661). This SNP was also identified by the 3DF test, with a suggestive statistical significance in the 1DF test. Among participants with the CC genotype of rs58349661, overweight and obesity categories were associated with higher colorectal cancer risk, whereas null associations were observed across BMI categories in those with the TT genotype. Using data from three large international consortia, this study discovered a locus in the FMN1/GREM1 gene region that interacts with BMI on the association with colorectal cancer risk. Further studies should examine the potential mechanisms through which this locus modifies the etiologic link between obesity and colorectal cancer. Significance: This gene-environment interaction analysis revealed a genetic locus in FMN1/GREM1 that interacts with body mass index in colorectal
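The conventional 1DF test described above can be illustrated as a likelihood-ratio test for a multiplicative G × BMI term in a logistic regression. What follows is a minimal sketch that rests on several assumptions. It uses a single SNP coded as effect-allele dosage `g` and a continuous exposure `e` (for example standardised BMI), and it applies no adjustment covariates, whereas a consortium analysis would adjust for covariates and study. The two-step EDGE and joint 3DF tests are not shown, and the helper names `fit_logistic` and `gxe_1df_test` are hypothetical.

```python
import math

import numpy as np


def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Logistic regression by Newton-Raphson; returns (coefficients, log-likelihood)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p)                          # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])   # information matrix
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    eta = X @ beta
    loglik = float(np.sum(y * eta - np.logaddexp(0.0, eta)))
    return beta, loglik


def gxe_1df_test(g, e, y):
    """1DF likelihood-ratio test of the G x E product term.

    Compares logit P(y=1) = b0 + b1*g + b2*e + b3*g*e against the model without b3.
    """
    g = np.asarray(g, dtype=float)
    e = np.asarray(e, dtype=float)
    ones = np.ones_like(g)
    _, ll_null = fit_logistic(np.column_stack([ones, g, e]), y)
    beta, ll_full = fit_logistic(np.column_stack([ones, g, e, g * e]), y)
    lrt = max(0.0, 2.0 * (ll_full - ll_null))
    p_value = math.erfc(math.sqrt(lrt / 2.0))         # upper tail of chi-square(1)
    return float(beta[3]), lrt, p_value
```

Calling `gxe_1df_test(g, e, y)` with a 0/1 case indicator `y` returns three values: the interaction log-odds ratio, the likelihood-ratio statistic, and a chi-square(1) p-value. The p-value uses the identity P(χ²₁ > x) = erfc(√(x/2)), which avoids a SciPy dependency.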
Aglago EK, Kim A, Lin Y, et al., 2023, Supplementary Data from A Genetic Locus within the FMN1/GREM1 Gene Region Interacts with Body Mass Index in Colorectal Cancer Risk
Supplementary materials.
Aglago EK, Kim A, Lin Y, et al., 2023, Table 1 from A Genetic Locus within the FMN1/GREM1 Gene Region Interacts with Body Mass Index in Colorectal Cancer Risk
Selected characteristics of the participants.
Aglago EK, Kim A, Lin Y, et al., 2023, Table 2 from A Genetic Locus within the FMN1/GREM1 Gene Region Interacts with Body Mass Index in Colorectal Cancer Risk
Summary of G × BMI analyses using 1DF, two-step, and 3DF analyses.
Aglago EK, Kim A, Lin Y, et al., 2023, A Genetic Locus within the FMN1/GREM1 Gene Region Interacts with Body Mass Index in Colorectal Cancer Risk, Cancer Research, Vol: 83, Pages: 2572-2583, ISSN: 0008-5472
Sanna Passino F, Adams N, Cohen E, et al., 2023, Statistical cybersecurity: a brief discussion of challenges, data structures, and future directions, Harvard Data Science Review, Vol: 5, Pages: 1-10, ISSN: 2644-2353
Komodromos M, Aboagye EO, Evangelou M, et al., 2022, Variational Bayes for high-dimensional proportional hazards models with applications within gene expression, Bioinformatics, Vol: 38, Pages: 3918-3926, ISSN: 1367-4803
Motivation: Few Bayesian methods for analyzing high-dimensional sparse survival data provide scalable variable selection, effect estimation and uncertainty quantification. Such methods often either sacrifice uncertainty quantification by computing maximum a posteriori estimates, or quantify the uncertainty at high (unscalable) computational expense. Results: We bridge this gap and develop an interpretable and scalable Bayesian proportional hazards model for prediction and variable selection, referred to as SVB. Our method, based on a mean-field variational approximation, overcomes the high computational cost of MCMC whilst retaining useful features, providing a posterior distribution for the parameters and offering a natural mechanism for variable selection via posterior inclusion probabilities. The performance of our proposed method is assessed via extensive simulations and compared against other state-of-the-art Bayesian variable selection methods, demonstrating comparable or better performance. Finally, we demonstrate how the proposed method can be used for variable selection on two transcriptomic datasets with censored survival outcomes, and how the uncertainty quantification offered by our method can be used to provide an interpretable assessment of patient risk. Availability and implementation: Our method has been implemented as a freely available R package survival.svb (https://github.com/mkomod/survival.svb).
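SVB builds on the proportional hazards model, and a Bayesian proportional hazards model conditions on the Cox partial likelihood. The following is a minimal sketch of that partial log-likelihood. It assumes there are no tied event times. The prior and the mean-field variational updates used by SVB are not reproduced, and this is not code from the survival.svb package. The function name `cox_partial_loglik` is hypothetical.

```python
import numpy as np


def cox_partial_loglik(beta, X, time, event):
    """Cox partial log-likelihood, assuming no tied event times.

    beta  : (p,) regression coefficients
    X     : (n, p) covariates, e.g. gene expression
    time  : (n,) follow-up times
    event : (n,) 1 if the event was observed, 0 if censored
    """
    eta = X @ beta                               # linear predictors
    order = np.argsort(-time)                    # latest times first
    eta_sorted = eta[order]
    # Running log-sum-exp: log sum over the risk set {j : t_j >= t_i} of exp(eta_j)
    log_risk = np.logaddexp.accumulate(eta_sorted)
    # Only observed events contribute eta_i - log(risk-set total)
    return float(np.sum(event[order] * (eta_sorted - log_risk)))
```

With `beta` set to zero, each observed event contributes minus the log of its risk-set size. This makes a convenient sanity check.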
Evangelou M, Rodosthenous T, Shahrezaei V, 2021, Semi-Supervised Classification and Visualization of Multi-View Data, JSM 2021 - Section on Statistical Learning and Data Science
Rodosthenous T, Shahrezaei V, Evangelou M, 2021, S-multi-SNE: Semi-supervised classification and visualisation of multi-view data, Publisher: arXiv
An increasing number of multi-view datasets are being published by studies in several fields. This type of data corresponds to multiple data-views, each representing a different aspect of the same set of samples. We have recently proposed multi-SNE, an extension of t-SNE, that produces a single visualisation of multi-view data. The multi-SNE approach provides low-dimensional embeddings of the samples, produced by being updated iteratively through the different data-views. Here, we further extend multi-SNE to a semi-supervised approach that classifies unlabelled samples by regarding the labelling information as an extra data-view. We look deeper into the performance, limitations and strengths of multi-SNE and its extension, S-multi-SNE, by applying the two methods on various multi-view datasets with different challenges. We show that by including the labelling information, the projection of the samples improves drastically and is accompanied by strong classification performance.
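The idea of a single embedding updated iteratively through several data-views can be illustrated with a minimal numpy sketch. The sketch rests on our own assumptions. Standard t-SNE affinities are computed separately for each view, using a fixed perplexity and a binary search on the Gaussian precision. One shared 2-D embedding then takes a plain gradient step on each view's t-SNE objective in turn. Early exaggeration, momentum and the tuning used in multi-SNE are omitted. `view_affinities` and `multi_sne` are hypothetical names, not the authors' implementation.

```python
import numpy as np


def view_affinities(X, perplexity=5.0, n_steps=50):
    """Symmetric t-SNE input affinities for one data-view."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    D = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)  # squared distances
    P = np.zeros((n, n))
    target = np.log(perplexity)                 # target entropy in nats
    for i in range(n):
        d = np.delete(D[i], i)
        beta, lo, hi = 1.0, 0.0, np.inf         # beta = Gaussian precision
        for _ in range(n_steps):
            p = np.exp(-(d - d.min()) * beta)
            p /= p.sum()
            entropy = -np.sum(p * np.log(p + 1e-12))
            if entropy > target:                # too flat: sharpen
                lo = beta
                beta = beta * 2.0 if np.isinf(hi) else (beta + hi) / 2.0
            else:                               # too peaked: widen
                hi = beta
                beta = (beta + lo) / 2.0
        P[i, np.arange(n) != i] = p
    return (P + P.T) / (2.0 * n)


def multi_sne(views, n_iter=300, lr=50.0, seed=0):
    """One shared 2-D embedding, updated by each view's t-SNE gradient in turn."""
    Ps = [view_affinities(np.asarray(X, dtype=float)) for X in views]
    n = Ps[0].shape[0]
    Y = np.random.default_rng(seed).normal(scale=1e-2, size=(n, 2))
    for _ in range(n_iter):
        for P in Ps:                                        # cycle through the data-views
            diff = Y[:, None, :] - Y[None, :, :]            # y_i - y_j
            num = 1.0 / (1.0 + np.sum(diff ** 2, axis=2))   # Student-t kernel
            np.fill_diagonal(num, 0.0)
            Q = num / num.sum()
            grad = 4.0 * np.sum(((P - Q) * num)[:, :, None] * diff, axis=1)
            Y -= lr * grad
    return Y
```

In the spirit of S-multi-SNE, labelling information could be passed as one more entry in `views`. How unlabelled samples are encoded in that extra view is a design choice not specified here.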
van Vliet NA, Bos MM, Thesing CS, et al., 2021, Higher thyrotropin leads to unfavorable lipid profile and somewhat higher cardiovascular disease risk: evidence from multi-cohort Mendelian randomization and metabolomic profiling, BMC Medicine, Vol: 19, Pages: 1-13, ISSN: 1741-7015
Background: Observational studies suggest interconnections between thyroid status, metabolism, and risk of coronary artery disease (CAD), but causality remains to be proven. The present study aimed to investigate the potential causal relationship between thyroid status and cardiovascular disease and to characterize the metabolomic profile associated with thyroid status. Methods: Multi-cohort two-sample Mendelian randomization (MR) was performed utilizing genome-wide significant variants as instruments for standardized thyrotropin (TSH) and free thyroxine (fT4) within the reference range. Associations between TSH and fT4 and metabolic profile were investigated in a two-stage manner: associations between TSH and fT4 and the full panel of 161 metabolomic markers were first assessed hypothesis-free, then directional consistency was assessed through Mendelian randomization, another metabolic profile platform, and in individuals with biochemically defined thyroid dysfunction. Results: Circulating TSH was associated with 52/161 metabolomic markers, and fT4 levels were associated with 21/161 metabolomic markers among 9432 euthyroid individuals (median age varied from 23.0 to 75.4 years, 54.5% women). Positive associations between circulating TSH levels and concentrations of very low-density lipoprotein subclasses and components, triglycerides, and triglyceride content of lipoproteins were directionally consistent across the multivariable regression, MR, metabolomic platforms, and for individuals with hypo- and hyperthyroidism. Associations with fT4 levels inversely reflected those observed with TSH. Among 91,810 CAD cases and 656,091 controls of European ancestry, per 1-SD increase in genetically determined TSH concentration, the risk of CAD increased slightly, but not significantly, with an OR of 1.03 (95% CI 0.99–1.07; p value 0.16), whereas higher genetically determined fT4 levels were not associated with CAD risk (OR 1.00 per SD increase of fT4; 95% CI 0.96–1.04;
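An odds ratio per 1-SD increase in a genetically determined exposure is the kind of quantity a two-sample MR estimator produces. A common choice is the fixed-effect inverse-variance weighted (IVW) estimator, sketched below. This listing does not state whether the study used IVW or other MR estimators. The sketch makes three assumptions: the summary statistics are already harmonised to the same effect allele, variant-exposure effects are in SD units, and variant-outcome effects are on the log-odds scale. The function names are hypothetical.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate and standard error.

    Each variant's Wald ratio beta_outcome / beta_exposure is weighted by
    beta_exposure**2 / se_outcome**2 (first-order weights).
    """
    pairs = list(zip(beta_exposure, beta_outcome, se_outcome))
    num = sum(bx * by / se ** 2 for bx, by, se in pairs)
    den = sum(bx ** 2 / se ** 2 for bx, _, se in pairs)
    return num / den, 1.0 / math.sqrt(den)


def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio with an approximate 95% confidence interval from a log-odds estimate."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)
```

For example, two variants whose Wald ratios are both 0.1 give `ivw_estimate([0.1, 0.2], [0.01, 0.02], [0.01, 0.01])` a log-odds estimate of 0.1. This corresponds to an odds ratio of about 1.105 per SD.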
Mustafa R, Mens MMJ, Huang J, et al., 2021, Associations of Circulatory MicroRNAs and Clinical Traits: A Phenome-wide Mendelian Randomization Analysis, Publisher: WILEY, Pages: 777-778, ISSN: 0741-0395
Rodriguez A, 2021, The link between Attention Deficit Hyperactivity Disorder (ADHD) symptoms and obesity-related traits: Genetic and prenatal explanations, Translational Psychiatry, Vol: 11, Pages: 1-8, ISSN: 2158-3188
Attention-deficit/hyperactivity disorder (ADHD) often co-occurs with obesity; however, the potential causality between the traits remains unclear. We examined both genetic and prenatal evidence for causality using Mendelian Randomisation (MR) and polygenic risk scores (PRS). We conducted bi-directional MR on ADHD liability and six obesity-related traits using summary statistics from the largest available meta-analyses of genome-wide association studies. We also examined the shared genetic aetiology between ADHD symptoms (inattention and hyperactivity) and body mass index (BMI) by PRS association analysis using longitudinal data from the Northern Finland Birth Cohort 1986 (NFBC1986, n = 2984). Lastly, we examined the impact of prenatal environment by association analysis of maternal pre-pregnancy BMI and offspring ADHD symptoms, adjusted for PRS of both traits, in the NFBC1986 dataset. Through MR analyses, we found evidence for bidirectional causality between ADHD liability and obesity-related traits. PRS association analyses showed evidence for genetic overlap between ADHD symptoms and BMI. We found no evidence for a difference between inattention and hyperactivity symptoms, suggesting that neither symptom subtype is driving the association. We found evidence for association between maternal pre-pregnancy BMI and offspring ADHD symptoms after adjusting for both BMI and ADHD PRS (association p-value = 0.027 for inattention, p = 0.008 for hyperactivity). These results are consistent with the hypothesis that the co-occurrence between ADHD and obesity has both genetic and prenatal environmental origins.
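The PRS association analysis mentioned above relies on a polygenic risk score, which is conventionally a weighted count of effect alleles. The following minimal sketch assumes dosages coded 0 to 2 and per-allele weights taken from GWAS summary statistics. Variant selection, such as p-value thresholds and clumping, is not shown, and the function name is hypothetical.

```python
import numpy as np


def polygenic_risk_score(dosages, weights, standardise=True):
    """Weighted allele-count score: PRS_i = sum_j w_j * dosage_ij.

    dosages : (n_individuals, n_variants) effect-allele counts in [0, 2]
    weights : (n_variants,) per-allele effect sizes from GWAS summary statistics
    """
    prs = np.asarray(dosages, dtype=float) @ np.asarray(weights, dtype=float)
    if standardise:
        # Express the score in SD units so associations are "per SD of PRS"
        prs = (prs - prs.mean()) / prs.std()
    return prs
```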
Adams N, Riddle-Workman E, Evangelou M, 2021, Multi-Type relational clustering for enterprise cyber-security networks, Pattern Recognition Letters, Vol: 149, Pages: 172-178, ISSN: 0167-8655
Several cyber-security data sources are collected in enterprise networks, providing relational information between different types of nodes in the network, namely computers, users and ports. This relational data can be expressed as adjacency matrices detailing inter-type relationships corresponding to relations between nodes of different types and intra-type relationships showing relationships between nodes of the same type. In this paper, we propose an extension of Non-Negative Matrix Tri-Factorisation (NMTF) to simultaneously cluster nodes based on their intra- and inter-type relationships. Existing NMTF-based clustering methods suffer from long computational times due to large matrix multiplications. In our approach, we enforce stricter cluster indicator constraints on the factor matrices to circumvent these issues. Additionally, to make our proposed approach less susceptible to variation in results due to random initialisation, we propose a novel initialisation procedure based on Non-Negative Double Singular Value Decomposition for multi-type relational clustering. Finally, a new performance measure suitable for assessing clustering performance on unlabelled multi-type relational data sets is presented. Our algorithm is assessed on both a simulated and a real computer network against standard approaches, showing its strong performance.
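The idea of enforcing cluster-indicator constraints on the factor matrices can be illustrated on a single inter-type relation matrix X, such as users × computers. The matrix is factorised as X ≈ F S Gᵀ, with F and G restricted to 0/1 indicators so that S reduces to block means. This is a generic sketch based on our own assumptions, not the paper's multi-type algorithm. It omits intra-type relations, the Non-Negative Double SVD initialisation and the proposed performance measure, and it uses random initialisation instead. All names are hypothetical.

```python
import numpy as np


def block_means(X, row_labels, col_labels, k, l):
    """Least-squares S for X ~ F S G^T when F and G are hard cluster indicators."""
    S = np.zeros((k, l))
    for a in range(k):
        rows = row_labels == a
        for b in range(l):
            cols = col_labels == b
            if rows.any() and cols.any():          # empty blocks contribute nothing
                S[a, b] = X[np.ix_(rows, cols)].mean()
    return S


def indicator_tri_factorisation(X, k, l, n_iter=20, seed=0):
    """Co-cluster rows (k clusters) and columns (l clusters) of a non-negative X."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    r = rng.integers(k, size=n)                    # row-cluster labels (encode F)
    c = rng.integers(l, size=m)                    # column-cluster labels (encode G)
    errors = []
    for _ in range(n_iter):
        S = block_means(X, r, c, k, l)
        # Move each row to the row cluster whose prototype S[a, c] fits it best
        proto_rows = S[:, c]                                          # (k, m)
        r = np.argmin(((X[:, None, :] - proto_rows[None]) ** 2).sum(axis=2), axis=1)
        S = block_means(X, r, c, k, l)
        # Move each column to the column cluster whose prototype S[r, b] fits it best
        proto_cols = S[r, :]                                          # (n, l)
        c = np.argmin(((X[:, :, None] - proto_cols[:, None, :]) ** 2).sum(axis=0), axis=1)
        S = block_means(X, r, c, k, l)
        errors.append(float(((X - S[np.ix_(r, c)]) ** 2).sum()))
    return r, c, S, errors
```

Each step either solves the least-squares problem for S exactly or moves rows or columns to their best-fitting cluster. As a result, the reconstruction error recorded in `errors` should not increase from one iteration to the next.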
Rodosthenous T, Shahrezaei V, Evangelou M, 2021, Semi-supervised classification and visualisation of multi-view data, Joint Statistics Meeting (JSM) 2021, Publisher: American Statistical Association
An increasing number of multi-view datasets are being published by studies in several fields. This type of data corresponds to multiple data-views, each representing a different aspect of the same set of samples. We have recently proposed multi-SNE, an extension of t-SNE, that produces a single visualisation of multi-view data. The multi-SNE approach provides low-dimensional embeddings of the samples, produced by being updated iteratively through the different data-views. Here, we further extend multi-SNE to a semi-supervised approach that classifies unlabelled samples by regarding the labelling information as an extra data-view. We look deeper into the performance, limitations and strengths of multi-SNE and its extension, S-multi-SNE, by applying the two methods on various multi-view datasets with different challenges. We show that by including the labelling information, the projection of the samples improves drastically and is accompanied by strong classification performance.
Van Vliet NA, Bos MM, Thesing CS, et al., 2021, Higher thyroid stimulating hormone leads to cardiovascular disease and an unfavorable lipid profile: evidence from multi-cohort Mendelian randomization and metabolomic profiling, Publisher: ELSEVIER IRELAND LTD, Pages: E40-E40, ISSN: 0021-9150
Frainay C, Pitarch Y, Filippi S, et al., 2021, Atopic dermatitis or eczema? Consequences of ambiguity in disease name for biomedical literature mining, Clinical and Experimental Allergy, Vol: 51, Pages: 1185-1194, ISSN: 0954-7894
Background: Biomedical research increasingly relies on computational approaches to extract relevant information from large corpora of publications. Objective: To investigate the consequence of the ambiguity between the use of the terms “Eczema” and “Atopic Dermatitis” (AD) from the Information Retrieval perspective, and its impact on meta-analyses, systematic reviews and text mining. Methods: Articles were retrieved by querying PubMed using the terms “eczema” (D003876) and “dermatitis, atopic” (D004485). We used machine learning to investigate the differences between the contexts in which each term is used. We used a decision tree approach and trained a model to predict whether an article would be indexed with eczema or AD tags. We used text-mining tools to extract biological entities associated with eczema and AD, and investigated the discrepancy regarding the retrieval of key findings according to the terminology used. Results: The atopic dermatitis query yielded more articles related to veterinary science, biochemistry, and cellular and molecular biology; the eczema query yielded more articles linked to public health, infectious disease and the respiratory system. Medical Subject Headings terms associated with “AD” or “Eczema” differed, with 52% agreement between the top-40 lists. The presence of terms related to cellular mechanisms, especially allergies and inflammation, characterized AD literature. The metabolites mentioned more frequently than expected in articles with the AD tag differed from those indexed with eczema. Fewer enriched genes were retrieved when using the eczema query compared to the AD query. Conclusions and Clinical Relevance: There is a considerable discrepancy when using text mining to extract bio-entities related to eczema or AD. Our results suggest that any systematic approach (particularly when looking for metabolites or genes related to the condition) should be performed using both terms jointly. We propose to use decision tree learning
Rodosthenous T, Shahrezaei V, Evangelou M, 2021, Multi-view Data Visualisation via Manifold Learning, Publisher: arXiv
Mustafa R, Mens M, Pinto R, et al., 2020, Identifying metabolomic fingerprints of microRNAs in cardiovascular disorders, Publisher: SPRINGERNATURE, Pages: 277-277, ISSN: 1018-4813
Evangelou M, Adams N, 2020, An anomaly detection framework for cyber-security data, Computers and Security, Vol: 97, Pages: 1-10, ISSN: 0167-4048
Data-driven anomaly detection systems have unrivalled potential as complementary defence systems to existing signature-based tools as the number of cyber attacks increases. In this manuscript, an anomaly detection system is presented that detects any abnormal deviations from the normal behaviour of an individual device. Device behaviour is defined as the number of network traffic events involving the device of interest observed within a pre-specified time period. The behaviour of each device at normal state is modelled to depend on its observed historic behaviour. A number of statistical and machine learning approaches are explored for modelling this relationship, and through a comparative study the Quantile Regression Forests approach is found to have the best predictive power. Based on the prediction intervals of the Quantile Regression Forests, an anomaly detection system is proposed that characterises as abnormal any observed behaviour outside of these intervals. A series of experiments for contaminating normal device behaviour are presented for examining the performance of the anomaly detection system. Through the conducted analysis, the proposed anomaly detection system is found to outperform two other detection systems. The presented work has been conducted on two enterprise networks.
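The detection rule itself, which flags any observed count outside a prediction interval, is simple to express. In the sketch below, the interval comes from a naive stand-in rather than from Quantile Regression Forests: it uses empirical quantiles of a device's historic counts. Quantile Regression Forests are not implemented here. The choice of `alpha` and both function names are assumptions.

```python
import numpy as np


def prediction_interval(history, alpha=0.05):
    """Naive stand-in for a QRF interval: empirical alpha/2 and 1 - alpha/2
    quantiles of a device's historic event counts."""
    history = np.asarray(history, dtype=float)
    return np.quantile(history, alpha / 2), np.quantile(history, 1 - alpha / 2)


def flag_anomalies(observed, lower, upper):
    """Characterise as abnormal any observed count outside [lower, upper]."""
    observed = np.asarray(observed, dtype=float)
    return (observed < lower) | (observed > upper)
```

Replacing `prediction_interval` with quantiles predicted from a model conditioned on the device's historic behaviour would leave `flag_anomalies` unchanged.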
Mustafa R, Mens M, Pinto RJ, et al., 2020, Metabolomic signatures of microRNAs in cardiovascular traits: A Mendelian randomization analysis, Annual Meeting of the International-Genetic-Epidemiology-Society, Publisher: WILEY, Pages: 506-506, ISSN: 0741-0395
Lucotte EA, Sugier P-E, Deleuze J-F, et al., 2020, Analysis of the pleiotropy between breast cancer and thyroid cancer, Annual Meeting of the International-Genetic-Epidemiology-Society, Publisher: WILEY, Pages: 504-504, ISSN: 0741-0395
Rodosthenous T, Shahrezaei V, Evangelou M, 2020, Integrating multi-OMICS data through sparse Canonical Correlation Analysis for the prediction of complex traits: A comparison study, Bioinformatics, Vol: 36, Pages: 4616-4625, ISSN: 1367-4803
Motivation: Recent developments in technology have enabled researchers to collect multiple OMICS datasets for the same individuals. The conventional approach for understanding the relationships between the collected datasets and the complex trait of interest would be to analyse each OMIC dataset separately from the rest, or to test for associations between the OMICS datasets. In this work we show that integrating multiple OMICS datasets together, instead of analysing them separately, improves our understanding of their in-between relationships as well as the predictive accuracy for the tested trait. Several approaches have been proposed for the integration of heterogeneous and high-dimensional (p ≫ n) data, such as OMICS. The sparse variant of Canonical Correlation Analysis (CCA) is a promising approach that penalises the canonical variables to produce sparse latent variables while achieving maximal correlation between the datasets. In recent years, a number of approaches for implementing sparse CCA (sCCA) have been proposed; they differ in their objective functions, in the iterative algorithm for obtaining the sparse latent variables, and in the assumptions they make about the original datasets. Results: Through a comparative study we have explored the performance of the conventional CCA proposed by Parkhomenko et al. (2009), penalised matrix decomposition CCA proposed by Witten and Tibshirani (2009) and its extension proposed by Suo et al. (2017). The aforementioned methods were modified to allow for different penalty functions. Although sCCA is an unsupervised learning approach for understanding the in-between relationships, we have recast the problem as a supervised learning one and investigated how the computed latent variables can be used for predicting complex traits. The approaches were extended to allow for multiple (more than two) datasets where the trait was included as one of the input datasets. Both ways have shown improvement
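Penalised matrix decomposition CCA (Witten and Tibshirani, 2009) can be sketched as alternating soft-thresholded power iterations on the cross-correlation matrix of two column-standardised datasets. This minimal version uses fixed soft-thresholds `lam_u` and `lam_v` (hypothetical parameters) instead of the L1-bound search of the original method. It returns only the first pair of canonical vectors and does not cover the multi-dataset or supervised extensions described above.

```python
import numpy as np


def soft_threshold(a, lam):
    """Shrink entries towards zero by lam, setting small entries exactly to zero."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)


def unit(a):
    norm = np.linalg.norm(a)
    return a / norm if norm > 0 else a


def sparse_cca(X, Z, lam_u=0.1, lam_v=0.1, n_iter=100, tol=1e-8):
    """First pair of sparse canonical vectors (u, v) and their canonical correlation."""
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)
    K = X.T @ Z / X.shape[0]                         # p x q cross-correlation matrix
    v = np.linalg.svd(K, full_matrices=False)[2][0]  # warm start: leading right singular vector
    u = unit(soft_threshold(K @ v, lam_u))
    for _ in range(n_iter):
        u_new = unit(soft_threshold(K @ v, lam_u))
        v_new = unit(soft_threshold(K.T @ u_new, lam_v))
        converged = np.linalg.norm(u_new - u) < tol and np.linalg.norm(v_new - v) < tol
        u, v = u_new, v_new
        if converged:
            break
    return u, v, float(np.corrcoef(X @ u, Z @ v)[0, 1])
```

Larger thresholds zero out more entries of `u` and `v`. The resulting latent variables `X @ u` and `Z @ v` could then be used as predictors of a trait.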
Karhunen V, Jarvelin M-R, Evangelou M, et al., 2019, A Mendelian randomisation study on causality between attention-deficit/hyperactivity disorder and multiple obesity-related traits, 27th World Congress of Psychiatric Genetics (WCPG), Publisher: ELSEVIER, Pages: S114-S115, ISSN: 0924-977X
Riddle-Workman E, Evangelou M, Adams N, 2018, Adaptive anomaly detection on network data streams, IEEE Conference on Intelligence and Security Informatics (ISI) 2018, Publisher: IEEE
As the number of cyber-attacks increases, there has been increasing emphasis on developing complementary methods of detection to the existing signature-based approaches. This work builds upon a previously discovered persistent structure within the Los Alamos National Laboratory network data sources, to develop a regression based streaming anomaly detection mechanism that can adapt to the network behaviour over time. The methodology has also been applied to a new data set of the same network to assess the extent of its pertinence in time.
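A regression that adapts to network behaviour over time can be sketched with recursive least squares (RLS) and an exponential forgetting factor. This particular estimator, the residual-scale rule and all parameter values (`lam`, `rho`, `threshold`, `burn_in`) are our assumptions. The regressors built from the persistent LANL structure are not reproduced. Each new observation is first scored by its one-step-ahead prediction error and then used to update the coefficients.

```python
import numpy as np


class ForgettingRLS:
    """Recursive least squares with exponential forgetting factor `lam`."""

    def __init__(self, n_features, lam=0.99, delta=100.0):
        self.lam = lam
        self.w = np.zeros(n_features)
        self.P = np.eye(n_features) * delta       # inverse of the weighted Gram matrix

    def update(self, x, y):
        """Return the one-step-ahead residual for (x, y), then adapt the coefficients."""
        x = np.asarray(x, dtype=float)
        residual = float(y - self.w @ x)
        Px = self.P @ x
        gain = Px / (self.lam + x @ Px)
        self.w = self.w + gain * residual
        self.P = (self.P - np.outer(gain, Px)) / self.lam
        return residual


def stream_anomalies(X, y, threshold=4.0, lam=0.99, rho=0.95, burn_in=20):
    """Flag observations whose prediction error exceeds `threshold` residual scales."""
    model = ForgettingRLS(X.shape[1], lam=lam)
    sq_sum, weight = 0.0, 0.0                     # weighted running mean of squared residuals
    flags = []
    for t in range(len(y)):
        residual = model.update(X[t], y[t])
        scale = np.sqrt(sq_sum / weight) if weight > 0 else np.inf
        is_anomaly = bool(t >= burn_in and abs(residual) > threshold * scale)
        flags.append(is_anomaly)
        if not is_anomaly:                        # keep flagged points out of the scale
            sq_sum = rho * sq_sum + residual ** 2
            weight = rho * weight + 1.0
    return np.array(flags)
```

In this sketch, flagged observations are excluded from the residual scale but still update the regression coefficients. Skipping those updates is an alternative design choice.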
This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.