Several of our current PhD candidates and fellow researchers at the Data Science Institute have published, or are in the process of publishing, papers presenting their research.

  • Journal article
    Moriconi R, Deisenroth M, Karri S, 2020,

    High-dimensional Bayesian optimization using low-dimensional feature spaces

    , Machine Learning, ISSN: 0885-6125

    Bayesian optimization (BO) is a powerful approach for seeking the global optimum of expensive black-box functions and has proven successful for fine tuning hyper-parameters of machine learning models. However, BO is practically limited to optimizing 10–20 parameters. To scale BO to high dimensions, we usually make structural assumptions on the decomposition of the objective and/or exploit the intrinsic lower dimensionality of the problem, e.g. by using linear projections. We could achieve a higher compression rate with nonlinear projections, but learning these nonlinear embeddings typically requires much data. This contradicts the BO objective of a relatively small evaluation budget. To address this challenge, we propose to learn a low-dimensional feature space jointly with (a) the response surface and (b) a reconstruction mapping. Our approach allows for optimization of BO’s acquisition function in the lower-dimensional subspace, which significantly simplifies the optimization problem. We reconstruct the original parameter space from the lower-dimensional subspace for evaluating the black-box function. For meaningful exploration, we solve a constrained optimization problem.

  • Journal article
    Fernando S, Amador Díaz López J, Şerban O, Gómez-Romero J, Molina-Solana M, Guo Y et al., 2020,

    Towards a large-scale twitter observatory for political events

    , Future Generation Computer Systems, Vol: 110, Pages: 976-983, ISSN: 0167-739X

    Explosion in usage of social media has made its analysis a relevant topic of interest, and particularly so in the political science area. Within Data Science, no other techniques are more widely accepted and appealing than visualisation. However, with datasets growing in size, visualisation tools also require a paradigm shift to remain useful in big data contexts. This work presents our proposal for a Large-Scale Twitter Observatory that enables researchers to efficiently retrieve, analyse and visualise data from this social network to gain actionable insights and knowledge related with political events. In addition to describing the supporting technologies, we put forward a working pipeline and validate the setup with different examples.

  • Journal article
    Meyer H, Dawes T, Serrani M, Bai W, Tokarczuk P, Cai J, Simoes Monteiro de Marvao A, Henry A, Lumbers T, Gierten J, Thumberger T, Wittbrodt J, Ware J, Rueckert D, Matthews P, Prasad S, Costantino M, Cook S, Birney E, O'Regan D et al., 2020,

    Genetic and functional insights into the fractal structure of the heart

    , Nature, Vol: 584, Pages: 589-594, ISSN: 0028-0836

    The inner surfaces of the human heart are covered by a complex network of muscular strands that is thought to be a vestige of embryonic development [1,2]. The function of these trabeculae in adults and their genetic architecture are unknown. To investigate this we performed a genome-wide association study using fractal analysis of trabecular morphology as an image-derived phenotype in 18,096 UK Biobank participants. We identified 16 significant loci containing genes associated with haemodynamic phenotypes and regulation of cytoskeletal arborisation [3,4]. Using biomechanical simulations and human observational data, we demonstrate that trabecular morphology is an important determinant of cardiac performance. Through genetic association studies with cardiac disease phenotypes and Mendelian randomisation, we find a causal relationship between trabecular morphology and cardiovascular disease risk. These findings suggest an unexpected role for myocardial trabeculae in the function of the adult heart, identify conserved pathways that regulate structural complexity, and reveal their influence on susceptibility to disease.

  • Journal article
    Balaban G, Halliday B, Bradley P, Bai W, Nygaard S, Owen R, Hatipoglu S, Ferreira ND, Izgi C, Tayal U, Corden B, Ware J, Pennell D, Rueckert D, Plank G, Rinaldi CA, Prasad SK, Bishop M et al.,

    Late-gadolinium enhancement interface area and electrophysiological simulations predict arrhythmic events in non-ischemic dilated cardiomyopathy patients

    , JACC: Clinical Electrophysiology, ISSN: 2405-5018

    BACKGROUND: The presence of late-gadolinium enhancement (LGE) predicts life threatening ventricular arrhythmias in non-ischemic dilated cardiomyopathy (NIDCM); however, risk stratification remains imprecise. LGE shape and simulations of electrical activity may be able to provide additional prognostic information. OBJECTIVE: This study sought to investigate whether shape-based LGE metrics and simulations of reentrant electrical activity are associated with arrhythmic events in NIDCM patients. METHODS: CMR-LGE shape metrics were computed for a cohort of 156 NIDCM patients with visible LGE and tested retrospectively for an association with an arrhythmic composite end-point of sudden cardiac death and ventricular tachycardia. Computational models were created from images and used in conjunction with simulated stimulation protocols to assess the potential for reentry induction in each patient’s scar morphology. A mechanistic analysis of the simulations was carried out to explain the associations. RESULTS: During a median follow-up of 1611 [IQR 881-2341] days, 16 patients (10.3%) met the primary endpoint. In an inverse probability weighted Cox regression, the LGE-myocardial interface area (HR: 1.75; 95% CI: 1.24-2.47; p=0.001), number of simulated reentries (HR: 1.4; 95% CI: 1.23-1.59; p<0.01) and LGE volume (HR: 1.44; 95% CI: 1.07-1.94; p=0.02) were associated with arrhythmic events. Computational modeling revealed repolarisation heterogeneity and rate-dependent block of electrical wavefronts at the LGE-myocardial interface as putative arrhythmogenic mechanisms directly related to LGE interface area. CONCLUSION: The area of interface between scar and surviving myocardium, as well as simulated reentrant activity, are associated with an elevated risk of major arrhythmic events in NIDCM patients with LGE and represent novel risk predictors.

  • Conference paper
    Chen C, Qin C, Qiu H, Ouyang C, Wang S, Chen L, Tarroni G, Bai W, Rueckert D et al., 2020,

    Realistic adversarial data augmentation for MR image segmentation

    , International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI)

    Neural network-based approaches can achieve high accuracy in various medical image segmentation tasks. However, they generally require large labelled datasets for supervised learning. Acquiring and manually labelling a large medical dataset is expensive and sometimes impractical due to data sharing and privacy issues. In this work, we propose an adversarial data augmentation method for training neural networks for medical image segmentation. Instead of generating pixel-wise adversarial attacks, our model generates plausible and realistic signal corruptions, which models the intensity inhomogeneities caused by a common type of artefacts in MR imaging: bias field. The proposed method does not rely on generative networks, and can be used as a plug-in module for general segmentation networks in both supervised and semi-supervised learning. Using cardiac MR imaging we show that such an approach can improve the generalization ability and robustness of models as well as provide significant improvements in low-data scenarios.

  • Journal article
    Fernando S, Scott-Brown J, Şerban O, Birch D, Akroyd D, Molina-Solana M, Heinis T, Guo Y et al., 2020,

    Open Visualization Environment (OVE): A web framework for scalable rendering of data visualizations

    , Future Generation Computer Systems, Vol: 112, Pages: 785-799, ISSN: 0167-739X

    Scalable resolution display environments, including immersive data observatories, are emerging as equitable and socially engaging platforms for collaborative data exploration and decision making. These environments require specialized middleware to drive them, but, due to various limitations, there is still a gap in frameworks capable of scalable rendering of data visualizations. To overcome these limitations, we introduce a new modular open-source middleware, the Open Visualization Environment (OVE). This framework uses web technologies to provide an ecosystem for visualizing data using web browsers that span hundreds of displays. In this paper, we discuss the key design features and architecture of our framework as well as its limitations. This is followed by an extensive study on performance and scalability, which validates its design and compares it to the popular SAGE2 middleware. We show how our framework solves three key limitations in SAGE2. Thereafter, we present two of our projects that used OVE and show how it can extend SAGE2 to overcome limitations and simplify the user experience for common data visualization use-cases.

  • Journal article
    Biffi C, Cerrolaza Martinez JJ, Tarroni G, Bai W, Simoes Monteiro de Marvao A, Oktay O, Ledig C, Le Folgoc L, Kamnitsas K, Doumou G, Duan J, Prasad S, Cook S, O'Regan D, Rueckert D et al., 2020,

    Explainable anatomical shape analysis through deep hierarchical generative models

    , IEEE Transactions on Medical Imaging, Vol: 39, Pages: 2088-2099, ISSN: 0278-0062

    Quantification of anatomical shape changes currently relies on scalar global indexes which are largely insensitive to regional or asymmetric modifications. Accurate assessment of pathology-driven anatomical remodeling is a crucial step for the diagnosis and treatment of many conditions. Deep learning approaches have recently achieved wide success in the analysis of medical images, but they lack interpretability in the feature extraction and decision processes. In this work, we propose a new interpretable deep learning model for shape analysis. In particular, we exploit deep generative networks to model a population of anatomical segmentations through a hierarchy of conditional latent variables. At the highest level of this hierarchy, a two-dimensional latent space is simultaneously optimised to discriminate distinct clinical conditions, enabling the direct visualisation of the classification space. Moreover, the anatomical variability encoded by this discriminative latent space can be visualised in the segmentation space thanks to the generative properties of the model, making the classification task transparent. This approach yielded high accuracy in the categorisation of healthy and remodelled left ventricles when tested on unseen segmentations from our own multi-centre dataset as well as in an external validation set, and on hippocampi from healthy controls and patients with Alzheimer’s disease when tested on ADNI data. More importantly, it enabled the visualisation in three dimensions of both global and regional anatomical features which better discriminate between the conditions under exam. The proposed approach scales effectively to large populations, facilitating high-throughput analysis of normal anatomy and pathology in large-scale studies of volumetric imaging.

  • Conference paper
    Wang S, Tarroni G, Qin C, Mo Y, Dai C, Chen C, Glocker B, Guo Y, Rueckert D, Bai W et al., 2020,

    Deep generative model-based quality control for cardiac MRI segmentation

    , International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI)

    In recent years, convolutional neural networks have demonstrated promising performance in a variety of medical image segmentation tasks. However, when a trained segmentation model is deployed into the real clinical world, the model may not perform optimally. A major challenge is the potential poor-quality segmentations generated due to degraded image quality or domain shift issues. There is a timely need to develop an automated quality control method that can detect poor segmentations and feedback to clinicians. Here we propose a novel deep generative model-based framework for quality control of cardiac MRI segmentation. It first learns a manifold of good-quality image-segmentation pairs using a generative model. The quality of a given test segmentation is then assessed by evaluating the difference from its projection onto the good-quality manifold. In particular, the projection is refined through iterative search in the latent space. The proposed method achieves high prediction accuracy on two publicly available cardiac MRI datasets. Moreover, it shows better generalisation ability than traditional regression-based methods. Our approach provides a real-time and model-agnostic quality control for cardiac MRI segmentation, which has the potential to be integrated into clinical image analysis workflows.

  • Journal article
    Chen C, Bai W, Davies R, Bhuva A, Manisty C, Moon J, Aung N, Lee A, Sanghvi M, Fung K, Paiva J, Petersen S, Lukaschuk E, Piechnik S, Neubauer S, Rueckert D et al.,

    Improving the generalizability of convolutional neural network-based segmentation on CMR images

    , Frontiers in Cardiovascular Medicine, ISSN: 2297-055X

  • Journal article
    Martínez V, Fernando S, Molina-Solana M, Guo Y et al., 2020,

    Tuoris: A middleware for visualizing dynamic graphics in scalable resolution display environments

    , Future Generation Computer Systems, Vol: 106, Pages: 559-571, ISSN: 0167-739X

    In the era of big data, large-scale information visualization has become an important challenge. Scalable resolution display environments (SRDEs) have emerged as a technological solution for building high-resolution display systems by tiling lower resolution screens. These systems bring serious advantages, including lower construction cost and better maintainability compared to other alternatives. However, they require specialized software but also purpose-built content to suit the inherently complex underlying systems. This creates several challenges when designing visualizations for big data, such that can be reused across several SRDEs of varying dimensions. This is not yet a common practice but is becoming increasingly popular among those who engage in collaborative visual analytics in data observatories. In this paper, we define three key requirements for systems suitable for such environments, point out limitations of existing frameworks, and introduce Tuoris, a novel open-source middleware for visualizing dynamic graphics in SRDEs. Tuoris manages the complexity of distributing and synchronizing the information among different components of the system, eliminating the need for purpose-built content. This makes it possible for users to seamlessly port existing graphical content developed using standard web technologies, and simplifies the process of developing advanced, dynamic and interactive web applications for large-scale information visualization. Tuoris is designed to work with Scalable Vector Graphics (SVG), reducing bandwidth consumption and achieving high frame rates in visualizations with dynamic animations. It scales independent of the display wall resolution and contrasts with other frameworks that transmit visual information as blocks of images.

  • Journal article
    Bhuva AN, Treibel TA, De Marvao A, Biffi C, Dawes TJW, Doumou G, Bai W, Patel K, Boubertakh R, Rueckert D, O'Regan DP, Hughes AD, Moon JC, Manisty CH et al., 2020,

    Sex and regional differences in myocardial plasticity in aortic stenosis are revealed by 3D model machine learning

  • Journal article
    Jolliffe DA, Stefanidis C, Wang Z, Kermani NZ, Dimitrov V, White JH, McDonough JE, Janssens W, Pfeffer P, Griffiths CJ, Bush A, Guo Y, Christenson S, Adcock IM, Chung KF, Thummel KE, Martineau AR et al., 2020,

    Vitamin D Metabolism is Dysregulated in Asthma and Chronic Obstructive Pulmonary Disease.

    , Am J Respir Crit Care Med

    RATIONALE: Vitamin D deficiency is common in patients with asthma and COPD. Low 25-hydroxyvitamin D (25[OH]D) levels may represent a cause or a consequence of these conditions. OBJECTIVE: To determine whether vitamin D metabolism is altered in asthma or COPD. METHODS: We conducted a longitudinal study in 186 adults to determine whether the 25(OH)D response to six oral doses of 3 mg vitamin D3, administered over one year, differed between those with asthma or COPD vs. controls. Serum concentrations of vitamin D3, 25(OH)D3 and 1α,25-dihydroxyvitamin D3 (1α,25[OH]2D3) were determined pre- and post-supplementation in 93 adults with asthma, COPD or neither condition, and metabolite-to-parent compound molar ratios were compared between groups to estimate hydroxylase activity. Additionally, we analyzed fourteen datasets to compare expression of 1α,25[OH]2D3-inducible gene expression signatures in clinical samples taken from adults with asthma or COPD vs. controls. MEASUREMENTS AND MAIN RESULTS: The mean post-supplementation 25(OH)D increase in participants with asthma (20.9 nmol/L) and COPD (21.5 nmol/L) was lower than in controls (39.8 nmol/L; P=0.001). Compared with controls, patients with asthma and COPD had lower molar ratios of 25(OH)D3-to-vitamin D3 and higher molar ratios of 1α,25(OH)2D3-to-25(OH)D3 both pre- and post-supplementation (P≤0.005). Inter-group differences in 1α,25[OH]2D3-inducible gene expression signatures were modest and variable where statistically significant. CONCLUSIONS: Attenuation of the 25(OH)D response to vitamin D supplementation in asthma and COPD associated with reduced molar ratios of 25(OH)D3-to-vitamin D3 and increased molar ratios of 1α,25(OH)2D3-to-25(OH)D3 in serum, suggesting that vitamin D metabolism is dysregulated in these conditions.

  • Journal article
    Ali MK, Kim RY, Brown AC, Mayall JR, Karim R, Pinkerton JW, Liu G, Martin KL, Starkey MR, Pillar A, Donovan C, Pathinayake PS, Carroll OR, Trinder D, Tay HL, Badi YE, Kermani NZ, Guo Y-K, Aryal R, Mumby S, Pavlidis S, Adcock IM, Weaver J, Xenaki D, Oliver BG, Holliday EG, Foster PS, Wark PA, Johnstone DM, Milward EA, Hansbro PM, Horvat JC et al., 2020,

    Crucial role for lung iron level and regulation in the pathogenesis and severity of asthma.

    , European Respiratory Journal, Vol: 55, ISSN: 0903-1936

    Accumulating evidence highlights links between iron regulation and respiratory disease. Here, we assessed the relationship between iron levels and regulatory responses in clinical and experimental asthma. We show that cell-free iron levels are reduced in the bronchoalveolar lavage (BAL) supernatant of severe or mild-moderate asthma patients and correlate with lower forced expiratory volume in 1 s (FEV1). Conversely, iron-loaded cell numbers were increased in BAL in these patients and with lower FEV1/forced vital capacity (FEV1/FVC). The airway tissue expression of the iron sequestration molecules divalent metal transporter 1 (DMT1) and transferrin receptor 1 (TFR1) are increased in asthma with TFR1 expression correlating with reduced lung function and increased type 2 (T2) inflammatory responses in the airways. Furthermore, pulmonary iron levels are increased in a house dust mite (HDM)-induced model of experimental asthma in association with augmented Tfr1 expression in airway tissue, similar to human disease. We show that macrophages are the predominant source of increased Tfr1 and Tfr1+ macrophages have increased Il13 expression. We also show that increased iron levels induce increased pro-inflammatory cytokine and/or extracellular matrix (ECM) responses in human airway smooth muscle (ASM) cells and fibroblasts ex vivo and induce key features of asthma, including airway hyper-responsiveness and fibrosis and T2 inflammatory responses, in vivo. Together these complementary clinical and experimental data highlight the importance of altered pulmonary iron levels and regulation in asthma, and the need for a greater focus on the role and potential therapeutic targeting of iron in the pathogenesis and severity of disease.

  • Journal article
    Chen C, Qin C, Qiu H, Tarroni G, Duan J, Bai W, Rueckert D et al., 2020,

    Deep learning for cardiac image segmentation: A review

    , Frontiers in Cardiovascular Medicine, Vol: 7, Pages: 1-33, ISSN: 2297-055X

    Deep learning has become the most widely used approach for cardiac image segmentation in recent years. In this paper, we provide a review of over 100 cardiac image segmentation papers using deep learning, which covers common imaging modalities including magnetic resonance imaging (MRI), computed tomography (CT), and ultrasound (US) and major anatomical structures of interest (ventricles, atria and vessels). In addition, a summary of publicly available cardiac image datasets and code repositories are included to provide a base for encouraging reproducible research. Finally, we discuss the challenges and limitations with current deep learning-based approaches (scarcity of labels, model generalizability across different domains, interpretability) and suggest potential directions for future research.

  • Journal article
    Ruijsink B, Puyol-Antón E, Oksuz I, Sinclair M, Bai W, Schnabel JA, Razavi R, King AP et al., 2020,

    Fully automated, quality-controlled cardiac analysis from CMR: Validation and large-scale application to characterize cardiac function

    , JACC: Cardiovascular Imaging, Vol: 13, Pages: 684-695, ISSN: 1876-7591

    OBJECTIVES: This study sought to develop a fully automated framework for cardiac function analysis from cardiac magnetic resonance (CMR), including comprehensive quality control (QC) algorithms to detect erroneous output. BACKGROUND: Analysis of cine CMR imaging using deep learning (DL) algorithms could automate ventricular function assessment. However, variable image quality, variability in phenotypes of disease, and unavoidable weaknesses in training of DL algorithms currently prevent their use in clinical practice. METHODS: The framework consists of a pre-analysis DL image QC, followed by a DL algorithm for biventricular segmentation in long-axis and short-axis views, myocardial feature-tracking (FT), and a post-analysis QC to detect erroneous results. The study validated the framework in healthy subjects and cardiac patients by comparison against manual analysis (n = 100) and evaluation of the QC steps' ability to detect erroneous results (n = 700). Next, this method was used to obtain reference values for cardiac function metrics from the UK Biobank. RESULTS: Automated analysis correlated highly with manual analysis for left and right ventricular volumes (all r > 0.95), strain (circumferential r = 0.89, longitudinal r > 0.89), and filling and ejection rates (all r ≥ 0.93). There was no significant bias for cardiac volumes and filling and ejection rates, except for right ventricular end-systolic volume (bias +1.80 ml; p = 0.01). The bias for FT strain was <1.3%. The sensitivity of detection of erroneous output was 95% for volume-derived parameters and 93% for FT strain. Finally, reference values were automatically derived from 2,029 CMR exams in healthy subjects. CONCLUSIONS: The study demonstrates a DL-based framework for automated, quality-controlled characterization of cardiac function from cine CMR, without the need for direct clinician oversight.

