Publications from our Researchers

Several of our current PhD candidates and fellow researchers at the Data Science Institute have published, or are in the process of publishing, papers presenting their research.


  • Conference paper
    Duan J, Schlemper J, Qin C, Ouyang C, Bai W, Biffi C, Bello G, Statton B, O’Regan DP, Rueckert D et al., 2019,

    VS-Net: variable splitting network for accelerated parallel MRI reconstruction

    , International Conference on Medical Image Computing and Computer-Assisted Intervention, Publisher: Springer International Publishing, Pages: 713-722, ISSN: 0302-9743

    In this work, we propose a deep learning approach for parallel magnetic resonance imaging (MRI) reconstruction, termed a variable splitting network (VS-Net), for efficient, high-quality reconstruction of undersampled multi-coil MR data. We formulate the generalized parallel compressed sensing reconstruction as an energy minimization problem, for which a variable splitting optimization method is derived. Based on this formulation, we propose a novel, end-to-end trainable deep neural network architecture by unrolling the resulting iterative process of this variable splitting scheme. VS-Net is evaluated on complex-valued multi-coil knee images for 4-fold and 6-fold acceleration factors. We show that VS-Net outperforms state-of-the-art deep learning reconstruction algorithms in terms of reconstruction accuracy and perceptual quality. Our code is publicly available.
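    The variable-splitting idea that VS-Net unrolls can be sketched on a much simpler problem. The toy below is an illustrative assumption, not the paper's multi-coil MRI formulation: it applies the same alternating scheme to 1-D total-variation denoising, where an auxiliary variable z = Dx splits the energy into a data-consistency step and a proximal denoising step.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of the l1 norm (element-wise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def variable_splitting_denoise(y, lam=2.0, rho=5.0, n_iter=100):
    """Minimise 0.5*||x - y||^2 + lam*||D x||_1 by variable splitting:
    introduce z = D x, then alternate an x-update (data consistency),
    a z-update (shrinkage) and a dual update -- the kind of iteration
    VS-Net would unroll into network layers."""
    n = len(y)
    # Forward-difference operator: (D x)[i] = x[i+1] - x[i]
    D = np.eye(n, k=1)[: n - 1] - np.eye(n)[: n - 1]
    x = y.copy()
    z = D @ x
    u = np.zeros_like(z)  # scaled dual variable
    A = np.eye(n) + rho * D.T @ D
    for _ in range(n_iter):
        x = np.linalg.solve(A, y + rho * D.T @ (z - u))  # data-consistency step
        z = soft_threshold(D @ x + u, lam / rho)         # denoising (prox) step
        u = u + D @ x - z                                # dual update
    return x

# Piecewise-constant signal corrupted by Gaussian noise
rng = np.random.default_rng(0)
clean = np.concatenate([np.zeros(50), np.ones(50)])
noisy = clean + 0.2 * rng.standard_normal(100)
recon = variable_splitting_denoise(noisy)
```

    In VS-Net the hand-tuned penalty and regulariser are replaced by learned network blocks, but the alternating structure is the same.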

  • Conference paper
    Wang S, Dai C, Mo Y, Angelini E, Guo Y, Bai W et al., 2019,

    Automatic Brain Tumour Segmentation and Biophysics-Guided Survival Prediction

    , MICCAI BraTS 2019 Challenge

    Gliomas are the most common malignant brain tumours, with intrinsic heterogeneity. Accurate segmentation of gliomas and their sub-regions on multi-parametric magnetic resonance images (mpMRI) is of great clinical importance, as it defines tumour size, shape and appearance and provides abundant information for preoperative diagnosis, treatment planning and survival prediction. Recent developments in deep learning have significantly improved the performance of automated medical image segmentation. In this paper, we compare several state-of-the-art convolutional neural network models for brain tumour image segmentation. Based on the ensembled segmentation, we present a biophysics-guided prognostic model for patient overall survival prediction which outperforms a data-driven radiomics approach. Our method won second place in the MICCAI 2019 BraTS Challenge for overall survival prediction.

  • Journal article
    Duan J, Bello G, Schlemper J, Bai W, Dawes TJW, Biffi C, Marvao AD, Doumou G, O'Regan DP, Rueckert D et al., 2019,

    Automatic 3D bi-ventricular segmentation of cardiac images by a shape-refined multi-task deep learning approach

    , IEEE Transactions on Medical Imaging, Vol: 38, Pages: 2151-2164, ISSN: 0278-0062

    Deep learning approaches have achieved state-of-the-art performance in cardiac magnetic resonance (CMR) image segmentation. However, most approaches have focused on learning image intensity features for segmentation, whereas the incorporation of anatomical shape priors has received less attention. In this paper, we combine a multi-task deep learning approach with atlas propagation to develop a shape-constrained bi-ventricular segmentation pipeline for short-axis CMR volumetric images. The pipeline first employs a fully convolutional network (FCN) that learns segmentation and landmark localisation tasks simultaneously. The architecture of the proposed FCN uses a 2.5D representation, thus combining the computational advantage of 2D FCNs with the capability of addressing 3D spatial consistency without compromising segmentation accuracy. Moreover, a refinement step is designed to explicitly enforce a shape constraint and improve segmentation quality. This step is effective for overcoming image artefacts (e.g. due to different breath-hold positions and large slice thickness), which preclude the creation of anatomically meaningful 3D cardiac shapes. The proposed pipeline is fully automated, due to the network's ability to infer landmarks, which are then used downstream in the pipeline to initialise atlas propagation. We validate the pipeline on 1831 healthy subjects and 649 subjects with pulmonary hypertension. Extensive numerical experiments on the two datasets demonstrate that our proposed method is robust and capable of producing accurate, high-resolution and anatomically smooth bi-ventricular 3D models, despite the artefacts in input CMR volumes.

  • Conference paper
    Dai C, Mo Y, Angelini E, Guo Y, Bai W et al., 2019,

    Transfer learning from partial annotations for whole brain segmentation

    , International Workshop on Medical Image Learning with Less Labels and Imperfect Data

    Brain MR image segmentation is a key task in neuroimaging studies. It is commonly conducted using standard computational tools, such as FSL, SPM and multi-atlas segmentation, which are often registration-based and computationally expensive. Recently, there has been increased interest in using deep neural networks for brain image segmentation, which have demonstrated advantages in both speed and performance. However, neural network-based approaches normally require a large amount of manual annotations for optimising the massive number of network parameters. For the 3D networks used in volumetric image segmentation, this has become a particular challenge, as a 3D network has many more parameters than its 2D counterpart. Manual annotation of 3D brain images is extremely time-consuming and requires extensive involvement of trained experts. To address the challenge of limited manual annotations, here we propose a novel multi-task learning framework for brain image segmentation, which utilises a large amount of automatically generated partial annotations together with a small set of manually created full annotations for network training. Our method yields a performance comparable to state-of-the-art methods for whole brain segmentation.

  • Conference paper
    Truong N, Sun K, Guo Y, 2019,

    Blockchain-based personal data management: from fiction to solution

    , The 18th IEEE International Symposium on Network Computing and Applications (NCA 2019), Publisher: IEEE

    The emerging blockchain technology has enabled various decentralised applications in a trustless environment without relying on a trusted intermediary. It is expected to be a promising solution to tackle sophisticated challenges in personal data management, thanks to its advanced features such as immutability, decentralisation and transparency. Although certain approaches have been proposed to address technical difficulties in personal data management, most of them provide only preliminary methodological exploration. Alarmingly, when utilising blockchain for developing a personal data management system, fictions have occurred in existing approaches and been promulgated in the literature. Such fictions are theoretically doable; however, by thoroughly breaking down consensus protocols and transaction validation processes, we clarify that such existing approaches are either impractical or highly inefficient due to the natural limitations of blockchain and smart contract technologies. This encourages us to propose a feasible solution in which such fictions are reduced by designing a novel system architecture with a blockchain-based “proof of permission” protocol. We demonstrate the feasibility and efficiency of the proposed models by implementing a clinical data sharing service built on top of a public blockchain platform. We believe that our research resolves existing ambiguity and takes a step further towards providing a practically feasible solution for decentralised personal data management.

  • Conference paper
    Gadotti A, Houssiau F, Rocher L, Livshits B, de Montjoye Y-A et al., 2019,

    When the Signal is in the Noise: Exploiting Diffix's Sticky Noise

    , 28th USENIX Security Symposium (USENIX Security '19), Publisher: USENIX

    Anonymized data is highly valuable to both businesses and researchers. A large body of research has however shown the strong limits of the de-identification release-and-forget model, where data is anonymized and shared. This has led to the development of privacy-preserving query-based systems. Based on the idea of “sticky noise”, Diffix has recently been proposed as a novel query-based mechanism satisfying alone the EU Article 29 Working Party’s definition of anonymization. According to its authors, Diffix adds less noise to answers than solutions based on differential privacy while allowing for an unlimited number of queries. This paper presents a new class of noise-exploitation attacks, exploiting the noise added by the system to infer private information about individuals in the dataset. Our first differential attack uses samples extracted from Diffix in a likelihood ratio test to discriminate between two probability distributions. We show that using this attack against a synthetic best-case dataset allows us to infer private information with 89.4% accuracy using only 5 attributes. Our second cloning attack uses dummy conditions that conditionally strongly affect the output of the query depending on the value of the private attribute. Using this attack on four real-world datasets, we show that we can infer private attributes of at least 93% of the users in the dataset with accuracy between 93.3% and 97.1%, issuing a median of 304 queries per user. We show how to optimize this attack, targeting 55.4% of the users and achieving 91.7% accuracy, using a maximum of only 32 queries per user. Our attacks demonstrate that adding data-dependent noise, as done by Diffix, is not sufficient to prevent inference of private attributes. We furthermore argue that Diffix alone fails to satisfy Art. 29 WP’s definition of anonymization. We conclude by discussing how non-provable privacy-preserving systems can be combined with fundamental security principles such as defense-in-depth.
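    The core tool of the differential attack described above is a standard likelihood ratio test. A minimal sketch, assuming illustrative Gaussian noise rather than Diffix's actual sticky-noise distributions (the means mu0 and mu1 and all numeric values here are made up for illustration):

```python
import numpy as np

def log_likelihood(samples, mu, sigma):
    """Gaussian log-likelihood of i.i.d. samples."""
    return np.sum(-0.5 * np.log(2 * np.pi * sigma ** 2)
                  - (samples - mu) ** 2 / (2 * sigma ** 2))

def lr_test(samples, mu0, mu1, sigma):
    """Likelihood ratio test: return 1 if the samples are more likely
    under H1 (mean mu1) than under H0 (mean mu0), else 0."""
    return int(log_likelihood(samples, mu1, sigma)
               > log_likelihood(samples, mu0, sigma))

# Noisy query answers generated under H1 (true mean 5.0)
rng = np.random.default_rng(1)
sigma = 1.0
samples = rng.normal(loc=5.0, scale=sigma, size=50)
decision = lr_test(samples, mu0=4.0, mu1=5.0, sigma=sigma)
```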

  • Journal article
    Rocher L, Hendrickx J, de Montjoye Y-A, 2019,

    Estimating the success of re-identifications in incomplete datasets using generative models

    , Nature Communications, Vol: 10, ISSN: 2041-1723

    While rich medical, behavioral, and socio-demographic data are key to modern data-driven research, their collection and use raise legitimate privacy concerns. Anonymizing datasets through de-identification and sampling before sharing them has been the main tool used to address those concerns. We here propose a generative copula-based method that can accurately estimate the likelihood of a specific person to be correctly re-identified, even in a heavily incomplete dataset. On 210 populations, our method obtains AUC scores for predicting individual uniqueness ranging from 0.84 to 0.97, with low false-discovery rate. Using our model, we find that 99.98% of Americans would be correctly re-identified in any dataset using 15 demographic attributes. Our results suggest that even heavily sampled anonymized datasets are unlikely to satisfy the modern standards for anonymization set forth by GDPR and seriously challenge the technical and legal adequacy of the de-identification release-and-forget model.
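    The quantity the copula model estimates has a simple empirical counterpart: the fraction of records that are unique on a given set of attributes. A sketch on a hypothetical toy population (the paper's contribution is estimating this reliably from a small, incomplete sample, which naive counting like this cannot do):

```python
from collections import Counter

def empirical_uniqueness(records, attributes):
    """Fraction of records whose combination of the given attribute
    values occurs exactly once in the dataset."""
    key = lambda r: tuple(r[a] for a in attributes)
    counts = Counter(key(r) for r in records)
    return sum(1 for r in records if counts[key(r)] == 1) / len(records)

# Hypothetical toy population
people = [
    {"zip": "02139", "birth_year": 1985, "gender": "F"},
    {"zip": "02139", "birth_year": 1985, "gender": "F"},
    {"zip": "02139", "birth_year": 1990, "gender": "M"},
    {"zip": "10001", "birth_year": 1985, "gender": "F"},
]
print(empirical_uniqueness(people, ["zip"]))                          # → 0.25
print(empirical_uniqueness(people, ["zip", "birth_year", "gender"]))  # → 0.5
```

    Adding attributes can only increase uniqueness, which is why a handful of demographic attributes suffices to re-identify almost everyone.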

  • Conference paper
    Bai W, Chen C, Tarroni G, Duan J, Guitton F, Petersen SE, Guo Y, Matthews PM, Rueckert D et al., 2019,

    Self-supervised learning for cardiac MR image segmentation by anatomical position prediction

    , International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI)

    In recent years, convolutional neural networks have transformed the field of medical image analysis due to their capacity to learn discriminative image features for a variety of classification and regression tasks. However, successfully learning these features requires a large amount of manually annotated data, which is expensive to acquire and limited by the available resources of expert image analysts. Therefore, unsupervised, weakly-supervised and self-supervised feature learning techniques, which aim to utilise the vast amount of available data while avoiding or substantially reducing the effort of manual annotation, have received a lot of attention. In this paper, we propose a novel way of training a cardiac MR image segmentation network, in which features are learnt in a self-supervised manner by predicting anatomical positions. The anatomical positions serve as a supervisory signal and do not require extra manual annotation. We demonstrate that this seemingly simple task provides a strong signal for feature learning and that, with self-supervised learning, we achieve a segmentation accuracy that is better than or comparable to a U-net trained from scratch, especially in a small-data setting. When only five annotated subjects are available, the proposed method improves the mean Dice metric from 0.811 to 0.852 for short-axis image segmentation, compared to the baseline U-net.
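    The pretext-task idea can be illustrated with a generic position-prediction setup. The sketch below is an assumption for illustration only (the paper predicts anatomical positions in cardiac MR, not grid indices): it cuts an image into patches and labels each with its position, yielding supervision at zero annotation cost.

```python
import numpy as np

def position_pretext_pairs(image, grid=(3, 3)):
    """Cut a 2-D image into a grid of patches and label each patch with
    its grid position -- a supervisory signal that needs no manual
    annotation, in the spirit of self-supervised pre-training."""
    h, w = image.shape
    gh, gw = grid
    ph, pw = h // gh, w // gw
    patches, labels = [], []
    for i in range(gh):
        for j in range(gw):
            patches.append(image[i * ph:(i + 1) * ph, j * pw:(j + 1) * pw])
            labels.append(i * gw + j)  # position class in [0, gh*gw)
    return np.stack(patches), np.array(labels)

img = np.arange(36, dtype=float).reshape(6, 6)
patches, labels = position_pretext_pairs(img)  # 9 patches of shape (2, 2)
```

    A network pre-trained to predict such labels can then be fine-tuned for segmentation with only a few annotated subjects.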

  • Report
    Crémer J, de Montjoye Y-A, Schweitzer H, 2019,

    Competition policy for the digital era

    , Competition policy for the digital era, Brussels, Publisher: EU Publications
  • Conference paper
    Jain S, Bensaid E, de Montjoye Y-A, 2019,

    UNVEIL: capture and visualise WiFi data leakages

    , The Web Conference 2019, Publisher: ACM, Pages: 3550-3554

    In the past few years, numerous privacy vulnerabilities have been discovered in the WiFi standards and their implementations for mobile devices. These vulnerabilities allow an attacker to collect large amounts of data on the device user, which could be used to infer sensitive information such as religion, gender, and sexual orientation. Solutions for these vulnerabilities are often hard to design and typically require many years to be widely adopted, leaving many devices at risk. In this paper, we present UNVEIL - an interactive and extendable platform to demonstrate the consequences of these attacks. The platform performs passive and active attacks on smartphones to collect and analyze data leaked through WiFi and communicates the analysis results to users through simple and interactive visualizations. The platform currently performs two attacks. First, it captures probe requests sent by nearby devices and combines them with public WiFi location databases to generate a map of locations previously visited by the device users. Second, it creates rogue access points with SSIDs of popular public WiFis (e.g. _Heathrow WiFi, Railways WiFi) and records the resulting internet traffic. This data is then analyzed and presented in a format that highlights the privacy leakage. The platform has been designed to be easily extendable to include more attacks and to be easily deployable in public spaces. We hope that UNVEIL will help raise public awareness of the privacy risks of WiFi networks.

  • Journal article
    Brinkman P, Wagener AH, Hekking P-P, Bansal AT, Maitland-van der Zee A-H, Wang Y, Weda H, Knobel HH, Vink TJ, Rattray NJ, D'Amico A, Pennazza G, Santonico M, Lefaudeux D, De Meulder B, Auffray C, Bakke PS, Caruso M, Chanez P, Chung KF, Corfield J, Dahlen S-E, Djukanovic R, Geiser T, Horvath I, Krug N, Musial J, Sun K, Riley JH, Shaw DE, Sandstrom T, Sousa AR, Montuschi P, Fowler SJ, Sterk PJ et al., 2019,

    Identification and prospective stability of electronic nose (eNose)-derived inflammatory phenotypes in patients with severe asthma

    , Journal of Allergy and Clinical Immunology, Vol: 143, Pages: 1811-+, ISSN: 0091-6749
  • Journal article
    Tarroni G, Oktay O, Bai W, Schuh A, Suzuki H, Passerat-Palmbach J, de Marvao A, O'Regan D, Cook S, Glocker B, Matthews P, Rueckert D et al., 2019,

    Learning-based quality control for cardiac MR images

    , IEEE Transactions on Medical Imaging, Vol: 38, Pages: 1127-1138, ISSN: 0278-0062

    The effectiveness of a cardiovascular magnetic resonance (CMR) scan depends on the ability of the operator to correctly tune the acquisition parameters to the subject being scanned and on the potential occurrence of imaging artefacts such as cardiac and respiratory motion. In clinical practice, a quality control step is performed by visual assessment of the acquired images; however, this procedure is strongly operator-dependent, cumbersome and sometimes incompatible with the time constraints in clinical settings and large-scale studies. We propose a fast, fully-automated, learning-based quality control pipeline for CMR images, specifically for short-axis image stacks. Our pipeline performs three important quality checks: 1) heart coverage estimation, 2) inter-slice motion detection, 3) image contrast estimation in the cardiac region. The pipeline uses a hybrid decision forest method, integrating both regression and structured classification models, to extract landmarks as well as probabilistic segmentation maps from both long- and short-axis images as a basis for the quality checks. The technique was tested on up to 3000 cases from the UK Biobank as well as on 100 cases from the UK Digital Heart Project, and validated against manual annotations and visual inspections performed by expert interpreters. The results show the capability of the proposed pipeline to correctly detect incomplete or corrupted scans (e.g. on UK Biobank, sensitivity and specificity of 88% and 99% respectively for heart coverage estimation, and 85% and 95% for motion detection), allowing their exclusion from the analysed dataset or the triggering of a new acquisition.

  • Journal article
    Cox DJ, Bai W, Price AN, Edwards AD, Rueckert D, Groves AM et al., 2019,

    Ventricular remodeling in preterm infants: computational cardiac magnetic resonance atlasing shows significant early remodeling of the left ventricle

    , Pediatric Research, Vol: 85, Pages: 807-815, ISSN: 0031-3998
  • Journal article
    Gomez-Romero J, Fernandez-Basso CJ, Cambronero MV, Molina-Solana M, Campana JR, Ruiz MD, Martin-Bautista MJ et al., 2019,

    A probabilistic algorithm for predictive control with full-complexity models in non-residential buildings

    , IEEE Access, Vol: 7, Pages: 38748-38765, ISSN: 2169-3536

    Despite the increasing capabilities of information technologies for data acquisition and processing, building energy management systems still require manual configuration and supervision to achieve optimal performance. Model predictive control (MPC) aims to improve equipment control, particularly of heating, ventilation and air conditioning (HVAC), by using a model of the building to capture its dynamic characteristics and to predict its response to alternative control scenarios. Usually, MPC approaches are based on simplified linear models, which support faster computation but also present some limitations regarding interpretability, solution diversification and longer-term optimization. In this work, we propose a novel MPC algorithm that uses a full-complexity grey-box simulation model to optimize HVAC operation in non-residential buildings. Our system generates hundreds of candidate operation plans, typically for the next day, and evaluates them in terms of consumption and comfort by means of a parallel simulator configured according to the expected building conditions (weather, occupancy, etc.). The system has been implemented and tested in an office building in Helsinki, both in a simulated environment and in the real building, yielding energy savings of around 35% during the intermediate winter season and 20% in the whole winter season with respect to the current operation of the heating equipment.
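    The simulate-and-rank loop described above can be sketched as follows. Here `simulate` is a crude stand-in for the paper's full-complexity grey-box simulator, and every constant (comfort target, cost weights, setpoint bounds) is an illustrative assumption:

```python
import random

def simulate(plan, outside_temp=5.0):
    """Stand-in for the building simulator: return (energy, discomfort)
    for a 24-hour heating setpoint plan, in arbitrary units."""
    energy = sum(max(sp - outside_temp, 0.0) * 0.8 for sp in plan)
    discomfort = sum(abs(sp - 21.0) for sp in plan)  # 21 C comfort target
    return energy, discomfort

def best_plan(n_candidates=200, comfort_weight=2.0, seed=42):
    """Generate random candidate plans, evaluate each with the simulator
    and return the one with the lowest weighted cost."""
    rng = random.Random(seed)
    candidates = [[rng.uniform(17.0, 24.0) for _ in range(24)]
                  for _ in range(n_candidates)]

    def score(plan):
        energy, discomfort = simulate(plan)
        return energy + comfort_weight * discomfort

    return min(candidates, key=score)

plan = best_plan()  # 24 hourly setpoints for the next day
```

    The real system parallelises the simulations and conditions them on forecast weather and occupancy, but the generate-evaluate-select structure is the same.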

  • Journal article
    Creswell A, Bharath AA, 2019,

    Denoising adversarial autoencoders

    , IEEE Transactions on Neural Networks and Learning Systems, Vol: 30, Pages: 968-984, ISSN: 2162-2388

    Unsupervised learning is of growing interest because it unlocks the potential held in vast amounts of unlabelled data to learn useful representations for inference. Autoencoders, a form of generative model, may be trained by learning to reconstruct unlabelled input data from a latent representation space. More robust representations may be produced by an autoencoder if it learns to recover clean input samples from corrupted ones. Representations may be further improved by introducing regularisation during training to shape the distribution of the encoded data in the latent space. We suggest denoising adversarial autoencoders, which combine denoising and regularisation, shaping the distribution of the latent space using adversarial training. We introduce a novel analysis that shows how denoising may be incorporated into the training and sampling of adversarial autoencoders. Experiments are performed to assess the contributions that denoising makes to the learning of representations for classification and sample synthesis. Our results suggest that autoencoders trained using a denoising criterion achieve higher classification performance, and can synthesise samples that are more consistent with the input data than those trained without a corruption process.

This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.
