This is the perfect time to exploit the power of large datasets in healthcare. In the future, large studies of environmental effects, genetic predispositions and epigenetic modifications will lead to precise disease modeling and risk estimation. I want to contribute to redefining treatment and preventative measures in order to improve quality of life. In particular, I am interested in exploring the following topics.
1. Polygenic risk scores (PRS) and their predictive power.
From recent work by myself and others, we know that PRSs can capture a level of disease risk equivalent (but complementary) to that captured by traditional risk factors and by rare monogenic mutations. The potential of PRSs remains for the most part untapped.
(a) Network-based PRS. Can we define different components of the PRS that represent different molecular networks and lead to different manifestations of disease?
(b) Exposure-response PRSs. Can we build reliable PRSs for intermediate traits, such as metabolomic or proteomic measurements, that summarize the physiological response to exposures? These could then be used as risk factors in people for whom only genetic information is available and no exposure data exist.
2. The interplay of somatic and germline mutations with polygenic scores and environmental exposures in ageing
The characterization of somatic variants in healthy people (both in blood and in other tissues) holds the promise of explaining causal mechanisms, not only for cancer but also for inflammatory disease (Nanki et al. 2019) and possibly for other biologically important processes, such as ageing. Somatic mutations in blood are strongly correlated with advanced age and can be cancer precursors, but in a non-deterministic way. In this light, the acquisition of somatic variants can be treated as an exposure and evaluated as a risk factor for disease, in combination with germline variation. Whether, and to what degree, somatic variation in turn correlates with environmental exposure is still uncertain, and investigating this could lead to interesting findings.
3. A knowledge graph of OMIC interactions
Statistical methods to combine OMIC datasets (e.g. the metabolome, proteome and exposome), including machine learning, are rapidly improving. I would like to explore how a knowledge graph can help represent multi-omic variation in the context of disease. Furthermore, I believe that complex biological questions involving different types of datasets can benefit from the application of Knowledge Representation Learning methods to infer causal relationships. Finally, I am interested in applying clustering algorithms to define disease clusters and patient sub-types in complex disease.
My postdoctoral research focused on (i) understanding the genetic architecture of complex traits, (ii) determining the causative variants and (iii) building genetic scores that capture disease risk. To this end, I used blood cell phenotypes as model traits in large cohorts of European ancestry. Haematopoiesis serves as a valuable paradigm for studying the genetic architecture of complex traits for two reasons: blood cell indices are routinely measured and are therefore available from large population-based studies, and the production of peripheral blood cells (red blood cells, white blood cells, platelets) is a highly regulated, hierarchical process that can be measured by isolating intermediate cell types. Furthermore, blood phenotypes contribute causally to common diseases such as autoimmune diseases, cardiovascular disease and stroke. Understanding the genetic determinants of blood phenotypes will therefore help in understanding these common diseases, which are a great healthcare burden.
First, I explored the so-called omnigenic model, which was recently hypothesized as follows: a set of core genes directly regulates the phenotype of interest, and a larger set of peripheral genes modifies the phenotype only by regulating the core genes. By using a co-expression network in whole blood, with Mendelian genes involved in blood disorders as a putative core set, I showed that this hierarchical structure can be observed in the experimental network and that Mendelian genes do have centrality characteristics, such as a larger number of connections and stronger correlations, when compared to other GWAS-associated genes. This is one of the first empirical examples in which a data-driven approach was used to study the hypothesized theoretical model. In the future, it will be interesting to explore other complex traits and to integrate environmental exposures into the picture.
Second, I fine-mapped thousands of blood trait associations to putative causative variant sets. I then focused on thoroughly exploring the problem of gene-to-variant assignment. Using colocalizing expression quantitative trait loci (eQTL) as a gold standard, I found that a gene assignment exploiting the network structure performed better than the standard gene annotation (with the VEP tool). Furthermore, fine-mapping results helped identify five examples of allelic series, with multiple variants in the same locus associated with blood traits and with common diseases such as type 1 diabetes and inflammatory bowel disease. Three of these loci contained known drug targets. Overall, this large set of conditionally independent variants informs future efforts to define allelic series for studying genes of pharmacological importance.
Finally, I built polygenic scores for the analyzed traits that explain up to 28% of the phenotypic variance and are portable across populations of European ancestry. I then showed, for the first time, that these polygenic scores for blood counts contribute to the risk of rare blood disorders, independently of rare causative mutations. This implies that the baseline polygenic predisposition to healthy blood phenotypes impacts disease and might lead to personalized diagnostic cut-offs, especially for those disorders that are defined as extremes of normal ranges, such as anaemia, polycythemia and thrombocytopenia.
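As a minimal sketch of the two quantities mentioned above, assuming the common additive form of a polygenic score (a weighted sum of effect-allele dosages) and using the squared Pearson correlation as a simple measure of phenotypic variance explained: the function names, weights, genotypes and phenotype values below are hypothetical toy values for illustration only, not the data or the pipeline used in the work described.

```python
def polygenic_score(dosages, weights):
    """Additive score for one individual: sum of dosage * per-variant weight.

    dosages: effect-allele counts (0, 1 or 2) per variant.
    weights: per-variant effect sizes (hypothetical values in this sketch).
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def variance_explained(scores, phenotypes):
    """Squared Pearson correlation between scores and measured phenotypes."""
    n = len(scores)
    mean_s = sum(scores) / n
    mean_p = sum(phenotypes) / n
    cov = sum((s - mean_s) * (p - mean_p) for s, p in zip(scores, phenotypes))
    var_s = sum((s - mean_s) ** 2 for s in scores)
    var_p = sum((p - mean_p) ** 2 for p in phenotypes)
    return cov ** 2 / (var_s * var_p)


# Toy example: three variants, four individuals (all values are made up).
weights = [0.5, -0.25, 1.0]
genotypes = [
    [0, 1, 2],
    [1, 0, 1],
    [2, 2, 0],
    [1, 1, 1],
]
phenotypes = [10.0, 9.0, 7.0, 8.5]

scores = [polygenic_score(g, weights) for g in genotypes]
r2 = variance_explained(scores, phenotypes)
print(scores)
print(round(r2, 3))
```

In practice the weights would come from fine-mapped or GWAS effect estimates, and the variance explained would be assessed in an independent cohort rather than in the data used to derive the weights.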
I started my research by working on hereditary hearing loss and normal hearing function in isolated populations from Italian communities, exploring the environmental and genetic factors involved. My research led to the identification of the first two replicated candidate genes modulating normal hearing function (Vuckovic et al. 2015, HMG). I also identified a list of genes involved in age-related hearing loss, based on a small sequencing study. Some of these genes were confirmed by a subsequent GWAS in a larger cohort. After literature curation, a selection of these genes was included in a diagnostic and research re-sequencing panel still used at the IRCCS Burlo Garofolo Hospital in Trieste, Italy (Vuckovic et al. 2018, HMG). Furthermore, I studied healthy carriers of deleterious mutations in GJB2, the gene that most commonly causes congenital hearing loss, with a recessive inheritance pattern. The high frequency of deleterious variants in this gene (up to 2% in the population) indicates a possible heterozygote advantage. I showed that healthy carriers of these mutations indeed have better gastrointestinal health, which might have provided grounds for positive selection (Vuckovic et al. 2015, EJHG).
In 2014, I spent a year working as a postdoctoral fellow at Sidra Hospital, Qatar, where I contributed to characterizing the local population structure and the epidemiology of congenital hearing loss in the region (Girotto et al. 2014, Hum Hered).