Imperial College London


Faculty of Natural Sciences, Department of Life Sciences

Director, Centre for Bioinformatics



+44 (0)20 7594 5212
m.sternberg




306, Sir Ernst Chain Building, South Kensington Campus






BibTeX format

author = {Leal, Ayala LG and David, A and Jarvelin, MR and Sebert, S and Ruddock, M and Karhunen, V and Seaby, E and Hoggart, C and Sternberg, MJE},
doi = {bioinformatics/btz310},
journal = {Bioinformatics},
pages = {5182--5190},
title = {Identification of disease-associated loci using machine learning for genotype and network data integration},
url = {},
volume = {35},
year = {2019}

RIS format (EndNote, RefMan)

TY - JOUR
AB - Motivation: Integration of different omics data could markedly help to identify biological signatures, understand the missing heritability of complex diseases and ultimately achieve personalized medicine. Standard regression models used in Genome-Wide Association Studies (GWAS) identify loci with a strong effect size, whereas GWAS meta-analyses are often needed to capture weak loci contributing to the missing heritability. Development of novel machine learning algorithms for merging genotype data with other omics data is highly needed as it could enhance the prioritization of weak loci. Results: We developed cNMTF (corrected non-negative matrix tri-factorization), an integrative algorithm based on clustering techniques of biological data. This method assesses the inter-relatedness between genotypes, phenotypes, the damaging effect of the variants and gene networks in order to identify loci-trait associations. cNMTF was used to prioritize genes associated with lipid traits in two population cohorts. We replicated 129 genes reported in GWAS world-wide and provided evidence that supports 85% of our findings (226 out of 265 genes), including recent associations in literature (NLGN1), regulators of lipid metabolism (DAB1) and pleiotropic genes for lipid traits (CARM1). Moreover, cNMTF performed efficiently against strong population structures by accounting for the individuals’ ancestry. As the method is flexible in the incorporation of diverse omics data sources, it can be easily adapted to the user’s research needs.
AU - Leal,Ayala LG
AU - David,A
AU - Jarvelin,MR
AU - Sebert,S
AU - Ruddock,M
AU - Karhunen,V
AU - Seaby,E
AU - Hoggart,C
AU - Sternberg,MJE
DO - bioinformatics/btz310
EP - 5190
PY - 2019///
SN - 1367-4803
SP - 5182
TI - Identification of disease-associated loci using machine learning for genotype and network data integration
T2 - Bioinformatics
UR -
VL - 35
ER -