Imperial College London

Dr Ben Glocker

Faculty of Engineering, Department of Computing

Reader in Machine Learning for Imaging
 
 
 

Contact

 

+44 (0)20 7594 8334, b.glocker

 
 

Location

 

377 Huxley Building, South Kensington Campus


Publications


225 results found

Bernhardt M, Castro DC, Tanno R, Schwaighofer A, Tezcan KC, Monteiro M, Bannur S, Lungren M, Nori A, Glocker B, Alvarez-Valle J, Oktay O et al., 2021, Active label cleaning: Improving dataset quality under resource constraints

Imperfections in data annotation, known as label noise, are detrimental to the training of machine learning models and have an often-overlooked confounding effect on the assessment of model performance. Nevertheless, employing experts to remove label noise by fully re-annotating large datasets is infeasible in resource-constrained settings, such as healthcare. This work advocates for a data-driven approach to prioritising samples for re-annotation, which we term "active label cleaning". We propose to rank instances according to estimated label correctness and labelling difficulty of each sample, and introduce a simulation framework to evaluate relabelling efficacy. Our experiments on natural images and on a new medical imaging benchmark show that cleaning noisy labels mitigates their negative impact on model training, evaluation, and selection. Crucially, the proposed active label cleaning enables correcting labels up to 4 times more effectively than typical random selection in realistic conditions, making better use of experts' valuable time for improving dataset quality.

Working paper

Li J, Pimentel P, Szengel A, Ehlke M, Lamecker H, Zachow S, Estacio L, Doenitz C, Ramm H, Shi H, Chen X, Matzkin F, Newcombe V, Ferrante E, Jin Y, Ellis DG, Aizenberg MR, Kodym O, Spanel M, Herout A, Mainprize JG, Fishman Z, Hardisty MR, Bayat A, Shit S, Wang B, Liu Z, Eder M, Pepe A, Gsaxner C, Alves V, Zefferer U, von Campe G, Pistracher K, Schaefer U, Schmalstieg D, Menze BH, Glocker B, Egger J et al., 2021, AutoImplant 2020-First MICCAI Challenge on Automatic Cranial Implant Design, IEEE Transactions on Medical Imaging, Vol: 40, Pages: 2329-2342, ISSN: 0278-0062

Journal article

Usynin D, Ziller A, Makowski M, Braren R, Rueckert D, Glocker B, Kaissis G, Passerat-Palmbach J et al., 2021, Adversarial interference and its mitigations in privacy-preserving collaborative machine learning, Nature Machine Intelligence, Vol: 3, Pages: 749-758

Journal article

Chen X, Pawlowski N, Glocker B, Konukoglu E et al., 2021, Normative ascent with local gaussians for unsupervised lesion detection, Med Image Anal, Vol: 74

Unsupervised abnormality detection is an appealing approach to identify patterns that are not present in training data without specific annotations for such patterns. In the medical imaging field, methods taking this approach have been proposed to detect lesions. The appeal of this approach stems from the fact that it does not require lesion-specific supervision and can potentially generalize to any sort of abnormal patterns. The principle is to train a generative model on images from healthy individuals to estimate the distribution of images of the normal anatomy, i.e., a normative distribution, and detect lesions as out-of-distribution regions. Restoration-based techniques that modify a given image by taking gradient ascent steps with respect to a posterior distribution composed of a normative distribution and a likelihood term recently yielded state-of-the-art results. However, these methods do not explicitly model ascent directions with respect to the normative distribution, i.e. normative ascent direction, which is essential for successful restoration. In this work, we introduce a novel approach for unsupervised lesion detection by modeling normative ascent directions. We present different modelling options based on the defined ascent directions with local Gaussians. We further extend the proposed method to efficiently utilize 3D information, which has not been explored in most existing works. We experimentally show that the proposed method provides higher accuracy in detection and produces more realistic restored images. The performance of the proposed method is evaluated against baselines on publicly available BRATS and ATLAS stroke lesion datasets; the detection accuracy of the proposed method surpasses the current state-of-the-art results.

Journal article

Baltatzis V, Bintsi K-M, Folgoc LL, Manzanera OEM, Ellis S, Nair A, Desai S, Glocker B, Schnabel JA et al., 2021, The pitfalls of sample selection: a case study on lung nodule classification, Predictive Intelligence in Medicine at MICCAI

Using publicly available data to determine the performance of methodological contributions is important as it facilitates reproducibility and allows scrutiny of the published results. In lung nodule classification, for example, many works report results on the publicly available LIDC dataset. In theory, this should allow a direct comparison of the performance of proposed methods and assess the impact of individual contributions. When analyzing seven recent works, however, we find that each employs a different data selection process, leading to largely varying total number of samples and ratios between benign and malignant cases. As each subset will have different characteristics with varying difficulty for classification, a direct comparison between the proposed methods is thus not always possible, nor fair. We study the particular effect of truthing when aggregating labels from multiple experts. We show that specific choices can have severe impact on the data distribution where it may be possible to achieve superior performance on one sample distribution but not on another. While we show that we can further improve on the state-of-the-art on one sample selection, we also find that on a more challenging sample selection, on the same database, the more advanced models underperform with respect to very simple baseline methods, highlighting that the selected data distribution may play an even more important role than the model architecture. This raises concerns about the validity of claimed methodological contributions. We believe the community should be aware of these pitfalls and make recommendations on how these can be avoided in future work.

Conference paper

Baltatzis V, Folgoc LL, Ellis S, Manzanera OEM, Bintsi K-M, Nair A, Desai S, Glocker B, Schnabel JA et al., 2021, The effect of the loss on generalization: empirical study on synthetic lung nodule data, Interpretability of Machine Intelligence in Medical Image Computing at MICCAI 2021

Convolutional Neural Networks (CNNs) are widely used for image classification in a variety of fields, including medical imaging. While most studies deploy cross-entropy as the loss function in such tasks, a growing number of approaches have turned to a family of contrastive learning-based losses. Even though performance metrics such as accuracy, sensitivity and specificity are regularly used for the evaluation of CNN classifiers, the features that these classifiers actually learn are rarely identified and their effect on the classification performance on out-of-distribution test samples is insufficiently explored. In this paper, motivated by the real-world task of lung nodule classification, we investigate the features that a CNN learns when trained and tested on different distributions of a synthetic dataset with controlled modes of variation. We show that different loss functions lead to different features being learned and consequently affect the generalization ability of the classifier on unseen data. This study provides some important insights into the design of deep learning solutions for medical imaging tasks.

Conference paper

Osuala R, Kushibar K, Garrucho L, Linardos A, Szafranowska Z, Klein S, Glocker B, Diaz O, Lekadir K et al., 2021, A review of generative adversarial networks in cancer imaging: new applications, new solutions, Publisher: arXiv

Despite technological and medical advances, the detection, interpretation, and treatment of cancer based on imaging data continue to pose significant challenges. These include high inter-observer variability, difficulty of small-sized lesion detection, nodule interpretation and malignancy determination, inter- and intra-tumour heterogeneity, class imbalance, segmentation inaccuracies, and treatment effect uncertainty. The recent advancements in Generative Adversarial Networks (GANs) in computer vision as well as in medical imaging may provide a basis for enhanced capabilities in cancer detection and analysis. In this review, we assess the potential of GANs to address a number of key challenges of cancer imaging, including data scarcity and imbalance, domain and dataset shifts, data access and privacy, data annotation and quantification, as well as cancer detection, tumour profiling and treatment planning. We provide a critical appraisal of the existing literature of GANs applied to cancer imagery, together with suggestions on future research directions to address these challenges. We analyse and discuss 163 papers that apply adversarial training techniques in the context of cancer imaging and elaborate their methodologies, advantages and limitations. With this work, we strive to bridge the gap between the needs of the clinical cancer imaging community and the current and prospective research on GANs in the artificial intelligence community.

Working paper

Filbrandt G, Kamnitsas K, Bernstein D, Taylor A, Glocker B et al., 2021, Learning from Partially Overlapping Labels: Image Segmentation under Annotation Shift, MICCAI Workshop on Domain Adaptation and Representation Transfer

Scarcity of high quality annotated images remains a limiting factor for training accurate image segmentation models. While more and more annotated datasets become publicly available, the number of samples in each individual database is often small. Combining different databases to create larger amounts of training data is appealing yet challenging due to the heterogeneity as a result of differences in data acquisition and annotation processes, often yielding incompatible or even conflicting information. In this paper, we investigate and propose several strategies for learning from partially overlapping labels in the context of abdominal organ segmentation. We find that combining a semi-supervised approach with an adaptive cross entropy loss can successfully exploit heterogeneously annotated data and substantially improve segmentation accuracy compared to baseline and alternative approaches.

Conference paper

Kamnitsas K, Winzeck S, Kornaropoulos EN, Whitehouse D, Englman C, Phyu P, Pao N, Menon DK, Rueckert D, Das T, Newcombe VFJ, Glocker B et al., 2021, Transductive image segmentation: Self-training and effect of uncertainty estimation, MICCAI Workshop on Domain Adaptation and Representation Transfer

Semi-supervised learning (SSL) uses unlabeled data during training to learn better models. Previous studies on SSL for medical image segmentation focused mostly on improving model generalization to unseen data. In some applications, however, our primary interest is not generalization but to obtain optimal predictions on a specific unlabeled database that is fully available during model development. Examples include population studies for extracting imaging phenotypes. This work investigates an often overlooked aspect of SSL, transduction. It focuses on the quality of predictions made on the unlabeled data of interest when they are included for optimization during training, rather than improving generalization. We focus on the self-training framework and explore its potential for transduction. We analyze it through the lens of Information Gain and reveal that learning benefits from the use of calibrated or under-confident models. Our extensive experiments on a large MRI database for multi-class segmentation of traumatic brain lesions show promising results when comparing transductive with inductive predictions. We believe this study will inspire further research on transductive learning, a well-suited paradigm for medical image analysis.

Conference paper

Sekuboyina A, Husseini ME, Bayat A, Löffler M, Liebl H, Li H, Tetteh G, Kukačka J, Payer C, Štern D, Urschler M, Chen M, Cheng D, Lessmann N, Hu Y, Wang T, Yang D, Xu D, Ambellan F, Amiranashvili T, Ehlke M, Lamecker H, Lehnert S, Lirio M, Olaguer NPD, Ramm H, Sahu M, Tack A, Zachow S, Jiang T, Ma X, Angerman C, Wang X, Brown K, Wolf M, Kirszenberg A, Puybareau É, Chen D, Bai Y, Rapazzo BH, Yeah T, Zhang A, Xu S, Hou F, He Z, Zeng C, Xiangshang Z, Liming X, Netherton TJ, Mumme RP, Court LE, Huang Z, He C, Wang L-W, Ling SH, Huynh LD, Boutry N, Jakubicek R, Chmelik J, Mulay S, Sivaprakasam M, Paetzold JC, Shit S, Ezhov I, Wiestler B, Glocker B, Valentinitsch A, Rempfler M, Menze BH, Kirschke JS et al., 2021, VerSe: a vertebrae labelling and segmentation benchmark for multi-detector CT images, Medical Image Analysis, ISSN: 1361-8415

Vertebral labelling and segmentation are two fundamental tasks in an automated spine processing pipeline. Reliable and accurate processing of spine images is expected to benefit clinical decision-support systems for diagnosis, surgery planning, and population-based analysis on spine and bone health. However, designing automated algorithms for spine processing is challenging predominantly due to considerable variations in anatomy and acquisition protocols and due to a severe shortage of publicly available data. Addressing these limitations, the Large Scale Vertebrae Segmentation Challenge (VerSe) was organised in conjunction with the International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI) in 2019 and 2020, with a call for algorithms towards labelling and segmentation of vertebrae. Two datasets containing a total of 374 multi-detector CT scans from 355 patients were prepared and 4505 vertebrae have individually been annotated at voxel-level by a human-machine hybrid algorithm (https://osf.io/nqjyw/, https://osf.io/t98fz/). A total of 25 algorithms were benchmarked on these datasets. In this work, we present the results of this evaluation and further investigate the performance-variation at vertebra-level, scan-level, and at different fields-of-view. We also evaluate the generalisability of the approaches to an implicit domain shift in data by evaluating the top performing algorithms of one challenge iteration on data from the other iteration. The principal takeaway from VerSe: the performance of an algorithm in labelling and segmenting a spine scan hinges on its ability to correctly identify vertebrae in cases of rare anatomical variations. The content and code concerning VerSe can be accessed at: https://github.com/anjany/verse.

Journal article

Islam M, Glocker B, 2021, Spatially varying label smoothing: capturing uncertainty from expert annotations, Information Processing in Medical Imaging (IPMI) 2021, Publisher: Springer Verlag, ISSN: 0302-9743

The task of image segmentation is inherently noisy due to ambiguities regarding the exact location of boundaries between anatomical structures. We argue that this information can be extracted from the expert annotations at no extra cost, and when integrated into state-of-the-art neural networks, it can lead to improved calibration between soft probabilistic predictions and the underlying uncertainty. We built upon label smoothing (LS) where a network is trained on 'blurred' versions of the ground truth labels which has been shown to be effective for calibrating output predictions. However, LS is not taking the local structure into account and results in overly smoothed predictions with low confidence even for non-ambiguous regions. Here, we propose Spatially Varying Label Smoothing (SVLS), a soft labeling technique that captures the structural uncertainty in semantic segmentation. SVLS also naturally lends itself to incorporate inter-rater uncertainty when multiple labelmaps are available. The proposed approach is extensively validated on four clinical segmentation tasks with different imaging modalities, number of classes and single and multi-rater expert annotations. The results demonstrate that SVLS, despite its simplicity, obtains superior boundary prediction with improved uncertainty and model calibration.

Conference paper

Qaiser T, Winzeck S, Barfoot T, Barwick T, Doran SJ, Kaiser MF, Wedlake L, Tunariu N, Koh D-M, Messiou C, Rockall A, Glocker B et al., 2021, Multiple instance learning with auxiliary task weighting for multiple myeloma classification, International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI)

Whole body magnetic resonance imaging (WB-MRI) is the recommended modality for diagnosis of multiple myeloma (MM). WB-MRI is used to detect sites of disease across the entire skeletal system, but it requires significant expertise and is time-consuming to report due to the great number of images. To aid radiological reading, we propose an auxiliary task-based multiple instance learning approach (ATMIL) for MM classification with the ability to localize sites of disease. This approach is appealing as it only requires patient-level annotations where an attention mechanism is used to identify local regions with active disease. We borrow ideas from multi-task learning and define an auxiliary task with adaptive reweighting to support and improve learning efficiency in the presence of data scarcity. We validate our approach on both synthetic and real multi-center clinical data. We show that the MIL attention module provides a mechanism to localize bone regions while the adaptive reweighting of the auxiliary task considerably improves the performance.

Conference paper

Kart T, Fischer M, Kuestner T, Hepp T, Bamberg F, Winzeck S, Glocker B, Rueckert D, Gatidis S et al., 2021, Deep Learning-Based Automated Abdominal Organ Segmentation in the UK Biobank and German National Cohort Magnetic Resonance Imaging Studies, Investigative Radiology, Vol: 56, Pages: 401-408, ISSN: 0020-9996

Journal article

Co KT, Muñoz-González L, Kanthan L, Glocker B, Lupu EC et al., 2021, Universal Adversarial Robustness of Texture and Shape-Biased Models, IEEE International Conference on Image Processing (ICIP)

Increasing shape-bias in deep neural networks has been shown to improve robustness to common corruptions and noise. In this paper we analyze the adversarial robustness of texture and shape-biased models to Universal Adversarial Perturbations (UAPs). We use UAPs to evaluate the robustness of DNN models with varying degrees of shape-based training. We find that shape-biased models do not markedly improve adversarial robustness, and we show that ensembles of texture and shape-biased models can improve universal adversarial robustness while maintaining strong performance.

Conference paper

Popescu SG, Sharp DJ, Cole JH, Kamnitsas K, Glocker B et al., 2021, Distributional gaussian process layers for outlier detection in image segmentation, Information Processing in Medical Imaging (IPMI) 2021, Publisher: arXiv

We propose a parameter efficient Bayesian layer for hierarchical convolutional Gaussian Processes that incorporates Gaussian Processes operating in Wasserstein-2 space to reliably propagate uncertainty. This directly replaces convolving Gaussian Processes with a distance-preserving affine operator on distributions. Our experiments on brain tissue-segmentation show that the resulting architecture approaches the performance of well-established deterministic segmentation algorithms (U-Net), which has never been achieved with previous hierarchical Gaussian Processes. Moreover, by applying the same segmentation model to out-of-distribution data (i.e., images with pathology such as brain tumors), we show that our uncertainty estimates result in out-of-distribution detection that outperforms the capabilities of previous Bayesian networks and reconstruction-based approaches that learn normative distributions.

Conference paper

Reinke A, Eisenmann M, Tizabi MD, Sudre CH, Rädsch T, Antonelli M, Arbel T, Bakas S, Cardoso MJ, Cheplygina V, Farahani K, Glocker B, Heckmann-Nötzel D, Isensee F, Jannin P, Kahn CE, Kleesiek J, Kurc T, Kozubek M, Landman BA, Litjens G, Maier-Hein K, Menze B, Müller H, Petersen J, Reyes M, Rieke N, Stieltjes B, Summers RM, Tsaftaris SA, Ginneken BV, Kopp-Schneider A, Jäger P, Maier-Hein L et al., 2021, Common limitations of image processing metrics: a picture story, Publisher: arXiv

While the importance of automatic image analysis is increasing at an enormous pace, recent meta-research revealed major flaws with respect to algorithm validation. Specifically, performance metrics are key for objective, transparent and comparative performance assessment, but relatively little attention has been given to the practical pitfalls when using specific metrics for a given image analysis task. A common mission of several international initiatives is therefore to provide researchers with guidelines and tools to choose the performance metrics in a problem-aware manner. This dynamically updated document has the purpose to illustrate important limitations of performance metrics commonly applied in the field of image analysis. The current version is based on a Delphi process on metrics conducted by an international consortium of image analysis experts.

Working paper

Manzanera OEM, Ellis S, Baltatzis V, Nair A, Le Folgoc L, Desai S, Glocker B, Schnabel JA et al., 2021, Patient-specific 3d cellular automata nodule growth synthesis in lung cancer without the need of external data, Pages: 925-928, ISSN: 1945-7928

We propose a novel patient-specific generative approach to simulate the emergence and growth of lung nodules using 3D cellular automata (CA) in computer tomography (CT). Our proposed method can be applied to individual images thus eliminating the need of external images that can contaminate and influence the generative process, a valuable characteristic in the medical domain. Firstly, we employ inpainting to generate pseudo-healthy representations of lung CT scans prior to the visible appearance of each lung nodule. Then, for each nodule, we train a 3D CA to simulate nodule growth and progression using the image of that same nodule as a target. After each CA is trained, we generate early versions of each nodule from a single voxel until the growing nodule closely matches the appearance of the original nodule. These synthesized nodules are inserted where the original nodule was located in the pseudo-healthy inpainted versions of the CTs, which provide realistic context to the generated nodule. We utilize the simulated images for data augmentation yielding false positive reduction in a nodule detector. We found statistically significant improvements (p < 0.001) in the detection of lung nodules.

Conference paper

Dou Q, So TY, Jiang M, Liu Q, Vardhanabhuti V, Kaissis G, Li Z, Si W, Lee HHC, Yu K, Feng Z, Dong L, Burian E, Jungmann F, Braren R, Makowski M, Kainz B, Rueckert D, Glocker B, Yu SCH, Heng PA et al., 2021, Federated deep learning for detecting COVID-19 lung abnormalities in CT: a privacy-preserving multinational validation study, npj Digital Medicine, Vol: 4, ISSN: 2398-6352

Journal article

Li Z, Kamnitsas K, Glocker B, 2021, Analyzing overfitting under class imbalance in neural networks for image segmentation, IEEE Transactions on Medical Imaging, Vol: 40, Pages: 1065-1077, ISSN: 0278-0062

Class imbalance poses a challenge for developing unbiased, accurate predictive models. In particular, in image segmentation neural networks may overfit to the foreground samples from small structures, which are often heavily underrepresented in the training set, leading to poor generalization. In this study, we provide new insights on the problem of overfitting under class imbalance by inspecting the network behavior. We find empirically that when training with limited data and strong class imbalance, at test time the distribution of logit activations may shift across the decision boundary, while samples of the well-represented class seem unaffected. This bias leads to a systematic under-segmentation of small structures. This phenomenon is consistently observed for different databases, tasks and network architectures. To tackle this problem, we introduce new asymmetric variants of popular loss functions and regularization techniques including a large margin loss, focal loss, adversarial training, mixup and data augmentation, which are explicitly designed to counter logit shift of the underrepresented classes. Extensive experiments are conducted on several challenging segmentation tasks. Our results demonstrate that the proposed modifications to the objective function can lead to significantly improved segmentation accuracy compared to baselines and alternative approaches.

Journal article

Korkinof D, Harvey H, Heindl A, Karpati E, Williams G, Rijken T, Kecskemethy P, Glocker B et al., 2021, Perceived Realism of High-Resolution Generative Adversarial Network-derived Synthetic Mammograms, Radiol Artif Intell, Vol: 3

Purpose: To explore whether generative adversarial networks (GANs) can enable synthesis of realistic medical images that are indiscernible from real images, even by domain experts. Materials and Methods: In this retrospective study, progressive growing GANs were used to synthesize mammograms at a resolution of 1280 × 1024 pixels by using images from 90 000 patients (average age, 56 years ± 9) collected between 2009 and 2019. To evaluate the results, a method to assess distributional alignment for ultra-high-dimensional pixel distributions was used, which was based on moment plots. This method was able to reveal potential sources of misalignment. A total of 117 volunteer participants (55 radiologists and 62 nonradiologists) took part in a study to assess the realism of synthetic images from GANs. Results: A quantitative evaluation of distributional alignment shows 60%-78% mutual-information score between the real and synthetic image distributions, and 80%-91% overlap in their support, which are strong indications against mode collapse. It also reveals shape misalignment as the main difference between the two distributions. Obvious artifacts were found by an untrained observer in 13.6% and 6.4% of the synthetic mediolateral oblique and craniocaudal images, respectively. A reader study demonstrated that real and synthetic images are perceptually inseparable by the majority of participants, even by trained breast radiologists. Only one out of the 117 participants was able to reliably distinguish real from synthetic images, and this study discusses the cues they used to do so. Conclusion: On the basis of these findings, it appears possible to generate realistic synthetic full-field digital mammograms by using a progressive GAN architecture up to a resolution of 1280 × 1024 pixels. Supplemental material is available for this article. © RSNA, 2020.

Journal article

Folgoc LL, Baltatzis V, Alansary A, Desai S, Devaraj A, Ellis S, Manzanera OEM, Kanavati F, Nair A, Schnabel J, Glocker B et al., 2021, Bayesian analysis of the prevalence bias: learning and predicting from imbalanced data

Datasets are rarely a realistic approximation of the target population. Say, prevalence is misrepresented, image quality is above clinical standards, etc. This mismatch is known as sampling bias. Sampling biases are a major hindrance for machine learning models. They cause significant gaps between model performance in the lab and in the real world. Our work is a solution to prevalence bias. Prevalence bias is the discrepancy between the prevalence of a pathology and its sampling rate in the training dataset, introduced upon collecting data or due to the practitioner rebalancing the training batches. This paper lays the theoretical and computational framework for training models, and for prediction, in the presence of prevalence bias. Concretely a bias-corrected loss function, as well as bias-corrected predictive rules, are derived under the principles of Bayesian risk minimization. The loss exhibits a direct connection to the information gain. It offers a principled alternative to heuristic training losses and complements test-time procedures based on selecting an operating point from summary curves. It integrates seamlessly in the current paradigm of (deep) learning using stochastic backpropagation and naturally with Bayesian models.

Working paper

Berger C, Paschali M, Glocker B, Kamnitsas K et al., 2021, Confidence-based Out-of-Distribution Detection: A Comparative Study and Analysis

Image classification models deployed in the real world may receive inputs outside the intended data distribution. For critical applications such as clinical decision making, it is important that a model can detect such out-of-distribution (OOD) inputs and express its uncertainty. In this work, we assess the capability of various state-of-the-art approaches for confidence-based OOD detection through a comparative study and in-depth analysis. First, we leverage a computer vision benchmark to reproduce and compare multiple OOD detection methods. We then evaluate their capabilities on the challenging task of disease classification using chest X-rays. Our study shows that high performance in a computer vision task does not directly translate to accuracy in a medical imaging task. We analyse factors that affect performance of the methods between the two tasks. Our results provide useful insights for developing the next generation of OOD detection methods.

Conference paper

Budd S, Sinclair M, Day T, Vlontzos A, Tan J, Liu T, Matthew J, Skelton E, Simpson J, Razavi R, Glocker B, Rueckert D, Robinson EC, Kainz B et al., 2021, Detecting Hypo-plastic Left Heart Syndrome in Fetal Ultrasound via Disease-specific Atlas Maps

Fetal ultrasound screening during pregnancy plays a vital role in the early detection of fetal malformations which have potential long-term health impacts. The level of skill required to diagnose such malformations from live ultrasound during examination is high and resources for screening are often limited. We present an interpretable, atlas-learning segmentation method for automatic diagnosis of Hypo-plastic Left Heart Syndrome (HLHS) from a single '4 Chamber Heart' view image. We propose to extend the recently introduced Image-and-Spatial Transformer Networks (Atlas-ISTN) into a framework that enables sensitising atlas generation to disease. In this framework we can jointly learn image segmentation, registration, atlas construction and disease prediction while providing a maximum level of clinical interpretability compared to direct image classification methods. As a result our segmentation allows diagnoses competitive with expert-derived manual diagnosis and yields an AUC-ROC of 0.978 (1043 cases for training, 260 for validation and 325 for testing).

Conference paper

Popescu S, Sharp D, Cole J, Glocker B et al., 2020, Decoupled Sparse Gaussian Processes Components: Separating Decision Making from Data Manifold Fitting, Third Symposium on Advances in Approximate Bayesian Inference

Conference paper

Sinclair M, Schuh A, Hahn K, Petersen K, Bai Y, Batten J, Schaap M, Glocker B et al., 2020, Atlas-ISTN: joint segmentation, registration and Atlas construction with image-and-spatial transformer networks

Deep learning models for semantic segmentation are able to learn powerful representations for pixel-wise predictions, but are sensitive to noise at test time and do not guarantee a plausible topology. Image registration models on the other hand are able to warp known topologies to target images as a means of segmentation, but typically require large amounts of training data, and have not widely been benchmarked against pixel-wise segmentation models. We propose Atlas-ISTN, a framework that jointly learns segmentation and registration on 2D and 3D image data, and constructs a population-derived atlas in the process. Atlas-ISTN learns to segment multiple structures of interest and to register the constructed, topologically consistent atlas labelmap to an intermediate pixel-wise segmentation. Additionally, Atlas-ISTN allows for test time refinement of the model's parameters to optimize the alignment of the atlas labelmap to an intermediate pixel-wise segmentation. This process both mitigates for noise in the target image that can result in spurious pixel-wise predictions, as well as improves upon the one-pass prediction of the model. Benefits of the Atlas-ISTN framework are demonstrated qualitatively and quantitatively on 2D synthetic data and 3D cardiac computed tomography and brain magnetic resonance image data, out-performing both segmentation and registration baseline models. Atlas-ISTN also provides inter-subject correspondence of the structures of interest, enabling population-level shape and motion analysis.

Working paper

Zeiler FA, Mathieu F, Monteiro M, Glocker B, Ercole A, Cabeleira M, Stocchetti N, Smielewski P, Czosnyka M, Newcombe V, Menon DK et al., 2020, Systemic Markers of Injury and Injury Response Are Not Associated with Impaired Cerebrovascular Reactivity in Adult Traumatic Brain Injury: A Collaborative European Neurotrauma Effectiveness Research in Traumatic Brain Injury (CENTER-TBI) Study, JOURNAL OF NEUROTRAUMA, Vol: 38, Pages: 870-878, ISSN: 0897-7151

Journal article

Matzkin F, Newcombe V, Glocker B, Ferrante E et al., 2020, Cranial implant design via virtual craniectomy with shape priors, AutoImplant 2020, Publisher: Springer, Cham, Pages: 37-46

Cranial implant design is a challenging task, whose accuracy is crucial in the context of cranioplasty procedures. This task is usually performed manually by experts using computer-assisted design software. In this work, we propose and evaluate alternative automatic deep learning models for cranial implant reconstruction from CT images. The models are trained and evaluated using the database released by the AutoImplant challenge, and compared to a baseline implemented by the organizers. We employ a simulated virtual craniectomy to train our models using complete skulls, and compare two different approaches trained with this procedure. The first one is a direct estimation method based on the UNet architecture. The second method incorporates shape priors to increase the robustness when dealing with out-of-distribution implant shapes. Our direct estimation method outperforms the baselines provided by the organizers, while the model with shape priors shows superior performance when dealing with out-of-distribution cases. Overall, our methods show promising results in the difficult task of cranial implant design.

Conference paper

Larrazabal AJ, Martínez C, Glocker B, Ferrante E et al., 2020, Post-DAE: anatomically plausible segmentation via post-processing with denoising autoencoders, IEEE Transactions on Medical Imaging, Vol: 39, Pages: 3813-3820, ISSN: 0278-0062

We introduce Post-DAE, a post-processing method based on denoising autoencoders (DAE) to improve the anatomical plausibility of arbitrary biomedical image segmentation algorithms. Some of the most popular segmentation methods (e.g. based on convolutional neural networks or random forest classifiers) incorporate additional post-processing steps to ensure that the resulting masks fulfill expected connectivity constraints. These methods operate under the hypothesis that contiguous pixels with similar aspect should belong to the same class. Even if valid in general, this assumption does not consider more complex priors like topological restrictions or convexity, which cannot be easily incorporated into these methods. Post-DAE leverages the latest developments in manifold learning via denoising autoencoders. First, we learn a compact and non-linear embedding that represents the space of anatomically plausible segmentations. Then, given a segmentation mask obtained with an arbitrary method, we reconstruct its anatomically plausible version by projecting it onto the learnt manifold. The proposed method is trained using unpaired segmentation masks, which makes it independent of intensity information and image modality. We performed experiments in binary and multi-label segmentation of chest X-ray and cardiac magnetic resonance images. We show how erroneous and noisy segmentation masks can be improved using Post-DAE. With almost no additional computation cost, our method brings erroneous segmentations back to a feasible space.

Journal article

Oktay O, Nanavati J, Schwaighofer A, Carter D, Bristow M, Tanno R, Jena R, Barnett G, Noble D, Rimmer Y, Glocker B, O'Hara K, Bishop C, Alvarez-Valle J, Nori A et al., 2020, Evaluation of deep learning to augment image-guided radiotherapy for head and neck and prostate cancers, JAMA Network Open, Vol: 3, Pages: 1-11, ISSN: 2574-3805

Importance: Personalized radiotherapy planning depends on high-quality delineation of target tumors and surrounding organs at risk (OARs). This process puts additional time burdens on oncologists and introduces variability among both experts and institutions. Objective: To explore clinically acceptable autocontouring solutions that can be integrated into existing workflows and used in different domains of radiotherapy. Design, Setting, and Participants: This quality improvement study used a multicenter imaging data set comprising 519 pelvic and 242 head and neck computed tomography (CT) scans from 8 distinct clinical sites and patients diagnosed either with prostate or head and neck cancer. The scans were acquired as part of treatment dose planning from patients who received intensity-modulated radiation therapy between October 2013 and February 2020. Fifteen different OARs were manually annotated by expert readers and radiation oncologists. The models were trained on a subset of the data set to automatically delineate OARs and evaluated on both internal and external data sets. Data analysis was conducted October 2019 to September 2020. Main Outcomes and Measures: The autocontouring solution was evaluated on external data sets, and its accuracy was quantified with volumetric agreement and surface distance measures. Models were benchmarked against expert annotations in an interobserver variability (IOV) study. Clinical utility was evaluated by measuring time spent on manual corrections and annotations from scratch. Results: A total of 519 participants' (519 [100%] men; 390 [75%] aged 62-75 years) pelvic CT images and 242 participants' (184 [76%] men; 194 [80%] aged 50-73 years) head and neck CT images were included. The models achieved levels of clinical accuracy within the bounds of expert IOV for 13 of 15 structures (eg, left femur, κ = 0.982; brainstem, κ = 0.806) and performed consistently well across both external and inte

Journal article

Popescu S, Sharp D, Cole J, Glocker B et al., 2020, Hierarchical Gaussian processes with Wasserstein-2 kernels, Publisher: arXiv

We investigate the usefulness of Wasserstein-2 kernels in the context of hierarchical Gaussian Processes. Stemming from an observation that stacking Gaussian Processes severely diminishes the model's ability to detect outliers, which when combined with non-zero mean functions, further extrapolates low variance to regions with low training data density, we posit that directly taking into account the variance in the computation of Wasserstein-2 kernels is of key importance towards maintaining outlier status as we progress through the hierarchy. We propose two new models operating in Wasserstein space which can be seen as equivalents to Deep Kernel Learning and Deep GPs. Through extensive experiments, we show improved performance on large scale datasets and improved out-of-distribution detection on both toy and real data.

Working paper

