Publications

Mustafa R, Ghanbari M, Karhunen V, Evangelou M, Dehghan A et al., 2023, Phenome-wide association study on miRNA-related sequence variants: the UK Biobank, Human Genomics, Vol: 17, ISSN: 1479-7364

Background: Genetic variants in the coding region could directly affect the structure and expression levels of genes and proteins. However, the importance of variants in the non-coding region, such as microRNAs (miRNAs), remains to be elucidated. Genetic variants in miRNA-related sequences could affect their biogenesis or functionality and ultimately affect disease risk. Yet, their implications and pleiotropic effects on many clinical conditions remain unknown. Methods: Here, we utilised genotyping and hospital records data in the UK Biobank (N = 423,419) to investigate associations between 346 genetic variants in miRNA-related sequences and a wide range of clinical diagnoses through phenome-wide association studies. Further, we tested whether changes in blood miRNA expression levels could affect disease risk through colocalisation and Mendelian randomisation analysis. Results: We identified 122 associations for six variants in the seed region of miRNAs, nine variants in the mature region of miRNAs, and 27 variants in the precursor miRNAs. These included associations with hypertension, dyslipidaemia, immune-related disorders, and others. Nineteen miRNAs were associated with multiple diagnoses, with six of them associated with multiple disease categories. The strongest association was reported between rs4285314 in the precursor of miR-3135b and celiac disease risk (odds ratio (OR) per effect allele increase = 0.37, P = 1.8 × 10^-162). Colocalisation and Mendelian randomisation analysis highlighted a potential causal role of miR-6891-3p in dyslipidaemia. Conclusions: Our study demonstrates the pleiotropic effect of miRNAs and offers insights into their possible clinical importance.

Journal article

Aglago EK, Kim A, Lin Y, Qu C, Evangelou M, Ren Y, Morrison J, Albanes D, Arndt V, Barry EL, Baurley JW, Berndt SI, Bien SA, Bishop DT, Bouras E, Brenner H, Buchanan DD, Budiarto A, Carreras-Torres R, Casey G, Cenggoro TW, Chan AT, Chang-Claude J, Chen X, Conti DV, Devall M, Diez-Obrero V, Dimou N, Drew D, Figueiredo JC, Gallinger S, Giles GG, Gruber SB, Gsur A, Gunter MJ, Hampel H, Harlid S, Hidaka A, Harrison TA, Hoffmeister M, Huyghe JR, Jenkins MA, Jordahl K, Joshi AD, Kawaguchi ES, Keku TO, Kundaje A, Larsson SC, Marchand LL, Lewinger JP, Li L, Lynch BM, Mahesworo B, Mandic M, Obón-Santacana M, Moreno V, Murphy N, Nan H, Nassir R, Newcomb PA, Ogino S, Ose J, Pai RK, Palmer JR, Papadimitriou N, Pardamean B, Peoples AR, Platz EA, Potter JD, Prentice RL, Rennert G, Ruiz-Narvaez E, Sakoda LC, Scacheri PC, Schmit SL, Schoen RE, Shcherbina A, Slattery ML, Stern MC, Su Y-R, Tangen CM, Thibodeau SN, Thomas DC, Tian Y, Ulrich CM, van Duijnhoven FJB, Van Guelpen B, Visvanathan K, Vodicka P, Wang J, White E, Wolk A, Woods MO, Wu AH, Zemlianskaia N, Hsu L, Gauderman WJ, Peters U, Tsilidis KK, Campbell PT et al., 2023, Data from A Genetic Locus within the FMN1/GREM1 Gene Region Interacts with Body Mass Index in Colorectal Cancer Risk

Colorectal cancer risk can be impacted by genetic, environmental, and lifestyle factors, including diet and obesity. Gene-environment interactions (G × E) can provide biological insights into the effects of obesity on colorectal cancer risk. Here, we assessed potential genome-wide G × E interactions between body mass index (BMI) and common SNPs for colorectal cancer risk using data from 36,415 colorectal cancer cases and 48,451 controls from three international colorectal cancer consortia (CCFR, CORECT, and GECCO). The G × E tests included the conventional logistic regression using multiplicative terms (one degree of freedom, 1DF test), the two-step EDGE method, and the joint 3DF test, each of which is powerful for detecting G × E interactions under specific conditions. BMI was associated with higher colorectal cancer risk. The two-step approach revealed a statistically significant G × BMI interaction located within the Formin 1/Gremlin 1 (FMN1/GREM1) gene region (rs58349661). This SNP was also identified by the 3DF test, with a suggestive statistical significance in the 1DF test. Among participants with the CC genotype of rs58349661, overweight and obesity categories were associated with higher colorectal cancer risk, whereas null associations were observed across BMI categories in those with the TT genotype. Using data from three large international consortia, this study discovered a locus in the FMN1/GREM1 gene region that interacts with BMI on the association with colorectal cancer risk. Further studies should examine the potential mechanisms through which this locus modifies the etiologic link between obesity and colorectal cancer. Significance: This gene-environment interaction analysis revealed a genetic locus in FMN1/GREM1 that interacts with body mass index in colorectal

Other

Aglago EK, Kim A, Lin Y, Qu C, Evangelou M, Ren Y, Morrison J, Albanes D, Arndt V, Barry EL, Baurley JW, Berndt SI, Bien SA, Bishop DT, Bouras E, Brenner H, Buchanan DD, Budiarto A, Carreras-Torres R, Casey G, Cenggoro TW, Chan AT, Chang-Claude J, Chen X, Conti DV, Devall M, Diez-Obrero V, Dimou N, Drew D, Figueiredo JC, Gallinger S, Giles GG, Gruber SB, Gsur A, Gunter MJ, Hampel H, Harlid S, Hidaka A, Harrison TA, Hoffmeister M, Huyghe JR, Jenkins MA, Jordahl K, Joshi AD, Kawaguchi ES, Keku TO, Kundaje A, Larsson SC, Marchand LL, Lewinger JP, Li L, Lynch BM, Mahesworo B, Mandic M, Obón-Santacana M, Moreno V, Murphy N, Nan H, Nassir R, Newcomb PA, Ogino S, Ose J, Pai RK, Palmer JR, Papadimitriou N, Pardamean B, Peoples AR, Platz EA, Potter JD, Prentice RL, Rennert G, Ruiz-Narvaez E, Sakoda LC, Scacheri PC, Schmit SL, Schoen RE, Shcherbina A, Slattery ML, Stern MC, Su Y-R, Tangen CM, Thibodeau SN, Thomas DC, Tian Y, Ulrich CM, van Duijnhoven FJB, Van Guelpen B, Visvanathan K, Vodicka P, Wang J, White E, Wolk A, Woods MO, Wu AH, Zemlianskaia N, Hsu L, Gauderman WJ, Peters U, Tsilidis KK, Campbell PT et al., 2023, Supplementary Data from A Genetic Locus within the FMN1/GREM1 Gene Region Interacts with Body Mass Index in Colorectal Cancer Risk

Supplementary materials.

Other

Aglago EK, Kim A, Lin Y, Qu C, Evangelou M, Ren Y, Morrison J, Albanes D, Arndt V, Barry EL, Baurley JW, Berndt SI, Bien SA, Bishop DT, Bouras E, Brenner H, Buchanan DD, Budiarto A, Carreras-Torres R, Casey G, Cenggoro TW, Chan AT, Chang-Claude J, Chen X, Conti DV, Devall M, Diez-Obrero V, Dimou N, Drew D, Figueiredo JC, Gallinger S, Giles GG, Gruber SB, Gsur A, Gunter MJ, Hampel H, Harlid S, Hidaka A, Harrison TA, Hoffmeister M, Huyghe JR, Jenkins MA, Jordahl K, Joshi AD, Kawaguchi ES, Keku TO, Kundaje A, Larsson SC, Marchand LL, Lewinger JP, Li L, Lynch BM, Mahesworo B, Mandic M, Obón-Santacana M, Moreno V, Murphy N, Nan H, Nassir R, Newcomb PA, Ogino S, Ose J, Pai RK, Palmer JR, Papadimitriou N, Pardamean B, Peoples AR, Platz EA, Potter JD, Prentice RL, Rennert G, Ruiz-Narvaez E, Sakoda LC, Scacheri PC, Schmit SL, Schoen RE, Shcherbina A, Slattery ML, Stern MC, Su Y-R, Tangen CM, Thibodeau SN, Thomas DC, Tian Y, Ulrich CM, van Duijnhoven FJB, Van Guelpen B, Visvanathan K, Vodicka P, Wang J, White E, Wolk A, Woods MO, Wu AH, Zemlianskaia N, Hsu L, Gauderman WJ, Peters U, Tsilidis KK, Campbell PT et al., 2023, Table 1 from A Genetic Locus within the FMN1/GREM1 Gene Region Interacts with Body Mass Index in Colorectal Cancer Risk

Selected characteristics of the participants.

Other

Aglago EK, Kim A, Lin Y, Qu C, Evangelou M, Ren Y, Morrison J, Albanes D, Arndt V, Barry EL, Baurley JW, Berndt SI, Bien SA, Bishop DT, Bouras E, Brenner H, Buchanan DD, Budiarto A, Carreras-Torres R, Casey G, Cenggoro TW, Chan AT, Chang-Claude J, Chen X, Conti DV, Devall M, Diez-Obrero V, Dimou N, Drew D, Figueiredo JC, Gallinger S, Giles GG, Gruber SB, Gsur A, Gunter MJ, Hampel H, Harlid S, Hidaka A, Harrison TA, Hoffmeister M, Huyghe JR, Jenkins MA, Jordahl K, Joshi AD, Kawaguchi ES, Keku TO, Kundaje A, Larsson SC, Marchand LL, Lewinger JP, Li L, Lynch BM, Mahesworo B, Mandic M, Obón-Santacana M, Moreno V, Murphy N, Nan H, Nassir R, Newcomb PA, Ogino S, Ose J, Pai RK, Palmer JR, Papadimitriou N, Pardamean B, Peoples AR, Platz EA, Potter JD, Prentice RL, Rennert G, Ruiz-Narvaez E, Sakoda LC, Scacheri PC, Schmit SL, Schoen RE, Shcherbina A, Slattery ML, Stern MC, Su Y-R, Tangen CM, Thibodeau SN, Thomas DC, Tian Y, Ulrich CM, van Duijnhoven FJB, Van Guelpen B, Visvanathan K, Vodicka P, Wang J, White E, Wolk A, Woods MO, Wu AH, Zemlianskaia N, Hsu L, Gauderman WJ, Peters U, Tsilidis KK, Campbell PT et al., 2023, Table 2 from A Genetic Locus within the FMN1/GREM1 Gene Region Interacts with Body Mass Index in Colorectal Cancer Risk

Summary of G × BMI analyses using 1DF, two-step, and 3DF analyses.

Other

Aglago EK, Kim A, Lin Y, Qu C, Evangelou M, Ren Y, Morrison J, Albanes D, Arndt V, Barry EL, Baurley JW, Berndt S, Bien SA, Bishop DT, Bouras E, Brenner H, Buchanan DD, Budiarto A, Carreras-Torres R, Casey G, Cenggoro TW, Chan AT, Chang-Claude J, Chen X, Conti D, Devall M, Diez-Obrero V, Dimou N, Drew D, Figueiredo JC, Gallinger S, Giles GG, Gruber SB, Gsur A, Gunter MJ, Hampel H, Harlid S, Hidaka A, Harrison TA, Hoffmeister M, Huyghe JR, Jenkins MA, Jordahl K, Joshi AD, Kawaguchi ES, Keku TO, Kundaje A, Larsson SC, Le Marchand L, Lewinger JP, Li L, Lynch BM, Mahesworo B, Mandic M, Obon-Santacana M, Moreno V, Murphy N, Nan H, Nassir R, Newcomb PA, Ogino S, Ose J, Pai RK, Palmer JR, Papadimitriou N, Pardamean B, Peoples AR, Platz EA, Potter JD, Prentice RL, Rennert G, Ruiz-Narvaez E, Sakoda LC, Scacheri PC, Schmit SL, Schoen RE, Shcherbina A, Slattery ML, Stern MC, Su Y-R, Tangen CM, Thibodeau SN, Thomas DC, Tian Y, Ulrich CM, van Duijnhoven FJB, Van Guelpen B, Visvanathan K, Vodicka P, Wang J, White E, Wolk A, Woods MO, Wu AH, Zemlianskaia N, Hsu L, Gauderman WJ, Peters U, Tsilidis KK, Campbell PT et al., 2023, A Genetic Locus within the FMN1/GREM1 Gene Region Interacts with Body Mass Index in Colorectal Cancer Risk, Cancer Research, Vol: 83, Pages: 2572-2583, ISSN: 0008-5472

Journal article

Sanna Passino F, Adams N, Cohen E, Evangelou M, Heard NA et al., 2023, Statistical cybersecurity: a brief discussion of challenges, data structures, and future directions, Harvard Data Science Review, Vol: 5, Pages: 1-10, ISSN: 2644-2353

Journal article

Komodromos M, Aboagye EO, Evangelou M, Filippi S, Ray K et al., 2022, Variational Bayes for high-dimensional proportional hazards models with applications within gene expression, Bioinformatics, Vol: 38, Pages: 3918-3926, ISSN: 1367-4803

Motivation: Few Bayesian methods for analyzing high-dimensional sparse survival data provide scalable variable selection, effect estimation and uncertainty quantification. Such methods often either sacrifice uncertainty quantification by computing maximum a posteriori estimates, or quantify the uncertainty at high (unscalable) computational expense. Results: We bridge this gap and develop an interpretable and scalable Bayesian proportional hazards model for prediction and variable selection, referred to as SVB. Our method, based on a mean-field variational approximation, overcomes the high computational cost of MCMC whilst retaining useful features, providing a posterior distribution for the parameters and offering a natural mechanism for variable selection via posterior inclusion probabilities. The performance of our proposed method is assessed via extensive simulations and compared against other state-of-the-art Bayesian variable selection methods, demonstrating comparable or better performance. Finally, we demonstrate how the proposed method can be used for variable selection on two transcriptomic datasets with censored survival outcomes, and how the uncertainty quantification offered by our method can be used to provide an interpretable assessment of patient risk. Availability and implementation: Our method has been implemented as a freely available R package survival.svb (https://github.com/mkomod/survival.svb).

Journal article

Evangelou M, Rodosthenous T, Shahrezaei V, 2021, Semi-Supervised Classification and Visualization of Multi-View Data, JSM 2021 - Section on Statistical Learning and Data Science

Journal article

Rodosthenous T, Shahrezaei V, Evangelou M, 2021, S-multi-SNE: Semi-supervised classification and visualisation of multi-view data, Publisher: arXiv

An increasing number of multi-view data are being published by studies in several fields. This type of data corresponds to multiple data-views, each representing a different aspect of the same set of samples. We have recently proposed multi-SNE, an extension of t-SNE, that produces a single visualisation of multi-view data. The multi-SNE approach provides low-dimensional embeddings of the samples, produced by being updated iteratively through the different data-views. Here, we further extend multi-SNE to a semi-supervised approach, that classifies unlabelled samples by regarding the labelling information as an extra data-view. We look deeper into the performance, limitations and strengths of multi-SNE and its extension, S-multi-SNE, by applying the two methods on various multi-view datasets with different challenges. We show that by including the labelling information, the projection of the samples improves drastically and it is accompanied by a strong classification performance.

Working paper

van Vliet NA, Bos MM, Thesing CS, Chaker L, Pietzner M, Houtman E, Neville MJ, Li-Gao R, Trompet S, Mustafa R, Ahmadizar F, Beekman M, Bot M, Budde K, Christodoulides C, Dehghan A, Delles C, Elliott P, Evangelou M, Gao H, Ghanbari M, van Herwaarden AE, Ikram MA, Jaeger M, Jukema JW, Karaman I, Karpe F, Kloppenburg M, Meessen JMTA, Meulenbelt I, Milaneschi Y, Mooijaart SP, Mook-Kanamori DO, Netea MG, Netea-Maier RT, Peeters RP, Penninx BWJH, Sattar N, Slagboom PE, Suchiman HED, Volzke H, Willems van Dijk K, Noordam R, van Heemst D et al., 2021, Higher thyrotropin leads to unfavorable lipid profile and somewhat higher cardiovascular disease risk: evidence from multi-cohort Mendelian randomization and metabolomic profiling, BMC Medicine, Vol: 19, Pages: 1-13, ISSN: 1741-7015

Background: Observational studies suggest interconnections between thyroid status, metabolism, and risk of coronary artery disease (CAD), but causality remains to be proven. The present study aimed to investigate the potential causal relationship between thyroid status and cardiovascular disease and to characterize the metabolomic profile associated with thyroid status. Methods: Multi-cohort two-sample Mendelian randomization (MR) was performed utilizing genome-wide significant variants as instruments for standardized thyrotropin (TSH) and free thyroxine (fT4) within the reference range. Associations between TSH and fT4 and metabolic profile were investigated in a two-stage manner: associations between TSH and fT4 and the full panel of 161 metabolomic markers were first assessed hypothesis-free, then directional consistency was assessed through Mendelian randomization, another metabolic profile platform, and in individuals with biochemically defined thyroid dysfunction. Results: Circulating TSH was associated with 52/161 metabolomic markers, and fT4 levels were associated with 21/161 metabolomic markers among 9432 euthyroid individuals (median age varied from 23.0 to 75.4 years, 54.5% women). Positive associations between circulating TSH levels and concentrations of very low-density lipoprotein subclasses and components, triglycerides, and triglyceride content of lipoproteins were directionally consistent across the multivariable regression, MR, metabolomic platforms, and for individuals with hypo- and hyperthyroidism. Associations with fT4 levels inversely reflected those observed with TSH. Among 91,810 CAD cases and 656,091 controls of European ancestry, per 1-SD increase of genetically determined TSH concentration risk of CAD increased slightly, but not significantly, with an OR of 1.03 (95% CI 0.99–1.07; p value 0.16), whereas higher genetically determined fT4 levels were not associated with CAD risk (OR 1.00 per SD increase of fT4; 95% CI 0.96–1.04;

Journal article

Mustafa R, Mens MMJ, Huang J, Roshchupkin G, Uitterlinden AG, Ikram MA, Evangelou M, Ghanbari M, Dehghan A et al., 2021, Associations of Circulatory MicroRNAs and Clinical Traits: A Phenome-wide Mendelian Randomization Analysis, Publisher: Wiley, Pages: 777-778, ISSN: 0741-0395

Conference paper

Rodriguez A, 2021, The link between Attention Deficit Hyperactivity Disorder (ADHD) symptoms and obesity-related traits: Genetic and prenatal explanations, Translational Psychiatry, Vol: 11, Pages: 1-8, ISSN: 2158-3188

Attention-deficit/hyperactivity disorder (ADHD) often co-occurs with obesity; however, the potential causality between the traits remains unclear. We examined both genetic and prenatal evidence for causality using Mendelian Randomisation (MR) and polygenic risk scores (PRS). We conducted bi-directional MR on ADHD liability and six obesity-related traits using summary statistics from the largest available meta-analyses of genome-wide association studies. We also examined the shared genetic aetiology between ADHD symptoms (inattention and hyperactivity) and body mass index (BMI) by PRS association analysis using longitudinal data from Northern Finland Birth Cohort 1986 (NFBC1986, n = 2984). Lastly, we examined the impact of prenatal environment by association analysis of maternal pre-pregnancy BMI and offspring ADHD symptoms, adjusted for PRS of both traits, in the NFBC1986 dataset. Through MR analyses, we found evidence for bidirectional causality between ADHD liability and obesity-related traits. PRS association analyses showed evidence for genetic overlap between ADHD symptoms and BMI. We found no evidence for a difference between inattention and hyperactivity symptoms, suggesting that neither symptom subtype is driving the association. We found evidence for association between maternal pre-pregnancy BMI and offspring ADHD symptoms after adjusting for both BMI and ADHD PRS (association p-value = 0.027 for inattention, p = 0.008 for hyperactivity). These results are consistent with the hypothesis that the co-occurrence between ADHD and obesity has both genetic and prenatal environmental origins.

Journal article

Adams N, Riddle-Workman E, Evangelou M, 2021, Multi-Type relational clustering for enterprise cyber-security networks, Pattern Recognition Letters, Vol: 149, Pages: 172-178, ISSN: 0167-8655

Several cyber-security data sources are collected in enterprise networks providing relational information between different types of nodes in the network, namely computers, users and ports. This relational data can be expressed as adjacency matrices detailing inter-type relationships corresponding to relations between nodes of different types and intra-type relationships showing relationships between nodes of the same type. In this paper, we propose an extension of Non-Negative Matrix Tri-Factorisation (NMTF) to simultaneously cluster nodes based on their intra and inter-type relationships. Existing NMTF based clustering methods suffer from long computational times due to large matrix multiplications. In our approach, we enforce stricter cluster indicator constraints on the factor matrices to circumvent these issues. Additionally, to make our proposed approach less susceptible to variation in results due to random initialisation, we propose a novel initialisation procedure based on Non-Negative Double Singular Value Decomposition for multi-type relational clustering. Finally, a new performance measure suitable for assessing clustering performance on unlabelled multi-type relational data sets is presented. Our algorithm is assessed on both a simulated and real computer network against standard approaches showing its strong performance.

Journal article

Rodosthenous T, Shahrezaei V, Evangelou M, 2021, Semi-supervised classification and visualisation of multi-view data, Joint Statistics Meeting (JSM) 2021, Publisher: American Statistical Association

An increasing number of multi-view data are being published by studies in several fields. This type of data corresponds to multiple data-views, each representing a different aspect of the same set of samples. We have recently proposed multi-SNE, an extension of t-SNE, that produces a single visualisation of multi-view data. The multi-SNE approach provides low-dimensional embeddings of the samples, produced by being updated iteratively through the different data-views. Here, we further extend multi-SNE to a semi-supervised approach, that classifies unlabelled samples by regarding the labelling information as an extra data-view. We look deeper into the performance, limitations and strengths of multi-SNE and its extension, S-multi-SNE, by applying the two methods on various multi-view datasets with different challenges. We show that by including the labelling information, the projection of the samples improves drastically and it is accompanied by a strong classification performance.

Conference paper

Van Vliet NA, Bos MM, Thesing CS, Chaker L, Pietzner M, Houtman E, Neville MJ, Li-Gao R, Trompet S, Mustafa R, Ahmadizar F, Beekman M, Bot M, Budde K, Christodoulides C, Dehghan A, Delles C, Elliott P, Evangelou M, Gao H, Ghanbari M, Van Herwaarden AE, Ikram MA, Jaeger M, Jukema JW, Karaman I, Karpe F, Kloppenburg M, Meessen JMTA, Meulenbelt I, Milaneschi Y, Mooijaart SP, Mook-Kanamori DO, Netea MG, Netea-Maier RT, Peeters RP, Penninx BWJH, Sattar N, Slagboom PE, Suchiman HED, Volzke H, Van Dijk KW, Noordam R et al., 2021, Higher thyroid stimulating hormone leads to cardiovascular disease and an unfavorable lipid profile: evidence from multi-cohort Mendelian randomization and metabolomic profiling, Publisher: Elsevier Ireland Ltd, Pages: E40-E40, ISSN: 0021-9150

Conference paper

Frainay C, Pitarch Y, Filippi S, Evangelou M, Custovic A et al., 2021, Atopic dermatitis or eczema? Consequences of ambiguity in disease name for biomedical literature mining, Clinical and Experimental Allergy, Vol: 51, Pages: 1185-1194, ISSN: 0954-7894

Background: Biomedical research increasingly relies on computational approaches to extract relevant information from large corpora of publications. Objective: To investigate the consequence of the ambiguity between the use of terms “Eczema” and “Atopic Dermatitis” (AD) from the Information Retrieval perspective, and its impact on meta-analyses, systematic reviews and text mining. Methods: Articles were retrieved by querying PubMed using the terms “eczema” (D003876) and “dermatitis, atopic” (D004485). We used machine learning to investigate the differences between the contexts in which each term is used. We used a decision tree approach and trained a model to predict if an article would be indexed with eczema or AD tags. We used text-mining tools to extract biological entities associated with eczema and AD, and investigated the discrepancy regarding the retrieval of key findings according to the terminology used. Results: The atopic dermatitis query yielded more articles related to veterinary science, biochemistry, cellular and molecular biology; the eczema query was linked to public health, infectious disease and respiratory system. Medical Subject Headings terms associated with “AD” or “Eczema” differed, with an agreement between the top 40 lists of 52%. The presence of terms related to cellular mechanisms, especially allergies and inflammation, characterized AD literature. The metabolites mentioned more frequently than expected in articles with AD tag differed from those indexed with eczema. Fewer enriched genes were retrieved when using the eczema query compared to the AD query. Conclusions and Clinical Relevance: There is a considerable discrepancy when using text mining to extract bio-entities related to eczema or AD. Our results suggest that any systematic approach (particularly when looking for metabolites or genes related to the condition) should be performed using both terms jointly. We propose to use decision tree learning

Journal article

Rodosthenous T, Shahrezaei V, Evangelou M, 2021, Multi-view Data Visualisation via Manifold Learning, Publisher: arXiv

Working paper

Mustafa R, Mens M, Pinto R, Karaman I, Roshchupkin G, Huang J, Elliott P, Evangelou M, Dehghan A, Ghanbari M et al., 2020, Identifying metabolomic fingerprints of microRNAs in cardiovascular disorders, Publisher: Springer Nature, Pages: 277-277, ISSN: 1018-4813

Conference paper

Evangelou M, Adams N, 2020, An anomaly detection framework for cyber-security data, Computers and Security, Vol: 97, Pages: 1-10, ISSN: 0167-4048

Data-driven anomaly detection systems offer unrivalled potential as complementary defence systems to existing signature-based tools as the number of cyber attacks increases. In this manuscript, an anomaly detection system is presented that detects any abnormal deviations from the normal behaviour of an individual device. Device behaviour is defined as the number of network traffic events involving the device of interest observed within a pre-specified time period. The behaviour of each device in its normal state is modelled as depending on its observed historic behaviour. A number of statistical and machine learning approaches are explored for modelling this relationship and, through a comparative study, the Quantile Regression Forests approach is found to have the best predictive power. Based on the prediction intervals of the Quantile Regression Forests, an anomaly detection system is proposed that characterises as abnormal any observed behaviour outside these intervals. A series of experiments that contaminate normal device behaviour are presented to examine the performance of the anomaly detection system. Through the conducted analysis, the proposed anomaly detection system is found to outperform two other detection systems. The presented work has been conducted on two enterprise networks.

Journal article

Mustafa R, Mens M, Pinto RJ, Karaman I, Roshchupkin G, Huang J, Elliott P, Evangelou M, Dehghan A, Ghanbari M et al., 2020, Metabolomic signatures of microRNAs in cardiovascular traits: A Mendelian randomization analysis, Annual Meeting of the International Genetic Epidemiology Society, Publisher: Wiley, Pages: 506-506, ISSN: 0741-0395

Conference paper

Lucotte EA, Sugier P-E, Deleuze J-F, Ostroumova E, Boutron M-C, de Vathaire F, Guenel P, Liquet B, Evangelou M, Truong T et al., 2020, Analysis of the pleiotropy between breast cancer and thyroid cancer, Annual Meeting of the International Genetic Epidemiology Society, Publisher: Wiley, Pages: 504-504, ISSN: 0741-0395

Conference paper

Rodosthenous T, Shahrezaei V, Evangelou M, 2020, Integrating multi-OMICS data through sparse Canonical Correlation Analysis for the prediction of complex traits: A comparison study, Bioinformatics, Vol: 36, Pages: 4616-4625, ISSN: 1367-4803

Motivation: Recent developments in technology have enabled researchers to collect multiple OMICS datasets for the same individuals. The conventional approach for understanding the relationships between the collected datasets and the complex trait of interest would be to analyse each OMICS dataset separately from the rest, or to test for associations between the OMICS datasets. In this work we show that integrating multiple OMICS datasets together, instead of analysing them separately, improves our understanding of their in-between relationships as well as the predictive accuracy for the tested trait. Several approaches have been proposed for the integration of heterogeneous and high-dimensional (p ≫ n) data, such as OMICS. The sparse variant of Canonical Correlation Analysis (CCA) is a promising approach that penalises the canonical variables to produce sparse latent variables while achieving maximal correlation between the datasets. In recent years, a number of approaches for implementing sparse CCA (sCCA) have been proposed; they differ in their objective functions, in the iterative algorithms used to obtain the sparse latent variables, and in the assumptions they make about the original datasets. Results: Through a comparative study we have explored the performance of the conventional CCA proposed by Parkhomenko et al. (2009), penalised matrix decomposition CCA proposed by Witten and Tibshirani (2009) and its extension proposed by Suo et al. (2017). The aforementioned methods were modified to allow for different penalty functions. Although sCCA is an unsupervised learning approach for understanding the in-between relationships, we have recast the problem as a supervised learning one and investigated how the computed latent variables can be used for predicting complex traits. The approaches were extended to allow for multiple (more than two) datasets where the trait was included as one of the input datasets. Both ways have shown improvement

Journal article

Karhunen V, Jarvelin M-R, Evangelou M, Rodriguez A et al., 2019, A Mendelian randomisation study on causality between attention-deficit/hyperactivity disorder and multiple obesity-related traits, 27th World Congress of Psychiatric Genetics (WCPG), Publisher: Elsevier, Pages: S114-S115, ISSN: 0924-977X

Conference paper

Riddle-Workman E, Evangelou M, Adams N, 2018, Adaptive anomaly detection on network data streams, IEEE Conference on Intelligence and Security Informatics (ISI) 2018, Publisher: IEEE

As the number of cyber-attacks increases, there has been increasing emphasis on developing complementary methods of detection to the existing signature-based approaches. This work builds upon a previously discovered persistent structure within the Los Alamos National Laboratory network data sources, to develop a regression based streaming anomaly detection mechanism that can adapt to the network behaviour over time. The methodology has also been applied to a new data set of the same network to assess the extent of its pertinence in time.

Conference paper

Dr Marina Evangelou
