Imperial College London

DR BERNHARD KAINZ

Faculty of Engineering, Department of Computing

Reader in Medical Image Computing
 
 
 

Contact

 

+44 (0)20 7594 8349 | b.kainz

 
 

Location

 

372 Huxley Building, South Kensington Campus



Publications


202 results found

Chotzoglou E, Day T, Tan J, Matthew J, Lloyd D, Razavi R, Simpson J, Kainz B et al., 2021, Learning normal appearance for fetal anomaly screening: application to the unsupervised detection of Hypoplastic Left Heart Syndrome, Journal of Machine Learning for Biomedical Imaging, Vol: 2021, Pages: 1-25

Congenital heart disease is considered one of the most common groups of congenital malformations, affecting 6-11 per 1000 newborns. In this work, an automated framework for detection of cardiac anomalies during ultrasound screening is proposed and evaluated on the example of Hypoplastic Left Heart Syndrome (HLHS), a sub-category of congenital heart disease. We propose an unsupervised approach that learns healthy anatomy exclusively from clinically confirmed normal control patients. We evaluate a number of known anomaly detection frameworks together with a new model architecture based on the α-GAN network and find evidence that the proposed model performs significantly better than the state-of-the-art in image-based anomaly detection, yielding an average AUC of 0.81 and better robustness to initialisation than previous works.
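The reported AUC can be illustrated with a short numpy sketch (all numbers below are synthetic, not from the study): a model trained only on normal anatomy assigns each image an anomaly score, for example a reconstruction error, and the AUC measures how well those scores rank anomalies above normals.

```python
import numpy as np

def rank_auc(scores, labels):
    """AUC via the rank-sum (Mann-Whitney U) formulation: the
    probability that a random anomaly scores higher than a random
    normal sample. Assumes untied scores, for brevity."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    order = scores.argsort()
    ranks = np.empty_like(order, dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    # Rank-sum of the positive (anomalous) samples, shifted to a U statistic.
    u = ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)

# Hypothetical reconstruction errors: a model that has only seen normal
# anatomy tends to reconstruct anomalies poorly (higher error).
recon_error = np.array([0.10, 0.12, 0.09, 0.45, 0.40, 0.11])
is_anomaly  = np.array([0,    0,    0,    1,    1,    0])
print(rank_auc(recon_error, is_anomaly))  # 1.0: perfect separation here
```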

Journal article

Budd S, Robinson E, Kainz B, 2021, A survey on active learning and human-in-the-loop deep learning for medical image analysis, Medical Image Analysis, Vol: 71, ISSN: 1361-8415

Fully automatic deep learning has become the state-of-the-art technique for many tasks including image acquisition, analysis and interpretation, and for the extraction of clinically useful information for computer-aided detection, diagnosis, treatment planning, intervention and therapy. However, the unique challenges posed by medical image analysis suggest that retaining a human end-user in any deep learning enabled system will be beneficial. In this review we investigate the role that humans might play in the development and deployment of deep learning enabled diagnostic applications and focus on techniques that will retain a significant input from a human end user. Human-in-the-Loop computing is an area that we see as increasingly important in future research due to the safety-critical nature of working in the medical domain. We evaluate four key areas that we consider vital for deep learning in clinical practice: (1) Active Learning to choose the best data to annotate for optimal model performance; (2) Interaction with model outputs: using iterative feedback to steer models to optima for a given prediction and offering meaningful ways to interpret and respond to predictions; (3) Practical considerations: developing full scale applications and the key considerations that need to be made before deployment; (4) Future Prospective and Unanswered Questions: knowledge gaps and related research fields that will benefit human-in-the-loop computing as they evolve. We offer our opinions on the most promising directions of research and how various aspects of each area might be unified towards common goals.
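A minimal sketch of the Active Learning idea in area (1), choosing which data to annotate: one common heuristic (uncertainty sampling, used here purely as an illustration) requests labels for the unlabelled samples whose predicted class distribution has the highest entropy.

```python
import numpy as np

def select_for_annotation(probs, k):
    """Pick the k unlabelled samples whose predicted class
    distribution has the highest entropy (i.e. is most uncertain)."""
    probs = np.asarray(probs, dtype=float)
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    return np.argsort(entropy)[::-1][:k]

# Hypothetical softmax outputs for four unlabelled images.
p = np.array([[0.98, 0.02],   # confident
              [0.55, 0.45],   # uncertain -> annotate first
              [0.90, 0.10],
              [0.60, 0.40]])
print(select_for_annotation(p, 2))  # [1 3]: the two least confident samples
```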

Journal article

Day TG, Kainz B, Hajnal J, Razavi R, Simpson JM et al., 2021, Artificial intelligence, fetal echocardiography, and congenital heart disease, Prenatal Diagnosis, Vol: 41, ISSN: 0197-3851

There has been a recent explosion in the use of artificial intelligence (AI), which is now part of our everyday lives. Uptake in medicine has been more limited, although in several fields there have been encouraging results showing excellent performance when AI is used to assist in a well-defined medical task. Most of this work has been performed using retrospective data, and there have been few clinical trials published using prospective data. This review focuses on the potential uses of AI in the field of fetal cardiology. Ultrasound of the fetal heart is highly specific and sensitive in experienced hands, but despite this there is significant room for improvement in the rates of prenatal diagnosis of congenital heart disease in most countries. AI may be one way of improving this. Other potential applications in fetal cardiology include the provision of more accurate prognoses for individuals, and automatic quantification of various metrics including cardiac function. However, there are also ethical and governance concerns. These will need to be overcome before AI can be widely accepted in mainstream use. It is likely that a familiarity with the uses, and pitfalls, of AI will soon be mandatory for many healthcare professionals working in fetal cardiology.

Journal article

Skelton E, Matthew J, Li Y, Khanal B, Martinez JJC, Toussaint N, Gupta C, Knight C, Kainz B, Hajnal JV, Rutherford M et al., 2021, Towards automated extraction of 2D standard fetal head planes from 3D ultrasound acquisitions: A clinical evaluation and quality assessment comparison, Radiography, Vol: 27, Pages: 519-526, ISSN: 1078-8174

Introduction: Clinical evaluation of deep learning (DL) tools is essential to complement technical accuracy metrics. This study assessed the image quality of standard fetal head planes automatically extracted from three-dimensional (3D) ultrasound fetal head volumes using a customised DL-algorithm.

Methods: Two observers retrospectively reviewed standard fetal head planes against pre-defined image quality criteria. Forty-eight images (29 transventricular, 19 transcerebellar) were selected from 91 transabdominal fetal scans (mean gestational age = 26 completed weeks, range = 20+5-32+3 weeks). Each had two-dimensional (2D) manually-acquired (2D-MA), 3D operator-selected (3D-OS) and 3D-DL automatically-acquired (3D-DL) images. The proportion of adequate images from each plane and modality, and the number of inadequate images per plane, was compared for each method. Inter- and intra-observer agreement of overall image quality was calculated.

Results: Sixty-seven percent of 3D-OS and 3D-DL transventricular planes were adequate quality. Forty-five percent of 3D-OS and 55% of 3D-DL transcerebellar planes were adequate. Seventy-one percent of 3D-OS and 86% of 3D-DL transventricular planes failed with poor visualisation of intra-cranial structures. Eighty-six percent of 3D-OS and 80% of 3D-DL transcerebellar planes failed due to inadequate visualisation of cerebellar hemispheres. Image quality was significantly different between 2D and 3D; however, no significant difference between 3D-modalities was demonstrated (p < 0.005). Inter-observer agreement of transventricular plane adequacy was moderate for both 3D-modalities, and weak for transcerebellar planes.

Conclusion: The 3D-DL algorithm can automatically extract standard fetal head planes from 3D-head volumes of comparable quality to operator-selected planes. Image quality in 3D is inferior to corresponding 2D planes, likely due to limitations with 3D-technology and acquisition technique.

Implications for practice: Automated image
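Inter-observer agreement of the kind reported here is commonly quantified with Cohen's kappa, which corrects raw agreement for agreement expected by chance. The ratings below are hypothetical and only illustrate the computation:

```python
import numpy as np

def cohens_kappa(a, b):
    """Chance-corrected agreement between two observers'
    adequate/inadequate ratings (0/1 labels)."""
    a, b = np.asarray(a), np.asarray(b)
    p_obs = np.mean(a == b)
    # Expected agreement if both observers rated independently
    # at their own base rates.
    p_exp = a.mean() * b.mean() + (1 - a.mean()) * (1 - b.mean())
    return (p_obs - p_exp) / (1 - p_exp)

# Hypothetical ratings for ten planes (1 = adequate, 0 = inadequate).
obs1 = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]
obs2 = [1, 1, 0, 1, 1, 1, 0, 0, 1, 0]
print(round(cohens_kappa(obs1, obs2), 3))  # 0.583: moderate agreement
```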

Journal article

Reinke A, Eisenmann M, Tizabi MD, Sudre CH, Rädsch T, Antonelli M, Arbel T, Bakas S, Cardoso MJ, Cheplygina V, Farahani K, Glocker B, Heckmann-Nötzel D, Isensee F, Jannin P, Kahn CE, Kleesiek J, Kurc T, Kozubek M, Landman BA, Litjens G, Maier-Hein K, Menze B, Müller H, Petersen J, Reyes M, Rieke N, Stieltjes B, Summers RM, Tsaftaris SA, Ginneken BV, Kopp-Schneider A, Jäger P, Maier-Hein L et al., 2021, Common limitations of image processing metrics: a picture story, Publisher: arXiv

While the importance of automatic image analysis is increasing at an enormous pace, recent meta-research revealed major flaws with respect to algorithm validation. Specifically, performance metrics are key for objective, transparent and comparative performance assessment, but relatively little attention has been given to the practical pitfalls when using specific metrics for a given image analysis task. A common mission of several international initiatives is therefore to provide researchers with guidelines and tools to choose the performance metrics in a problem-aware manner. This dynamically updated document has the purpose to illustrate important limitations of performance metrics commonly applied in the field of image analysis. The current version is based on a Delphi process on metrics conducted by an international consortium of image analysis experts.
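One concrete pitfall of the kind this document catalogues: overlap metrics such as the Dice coefficient penalise the same absolute boundary error far more heavily on small structures than on large ones. A numpy illustration (masks are synthetic):

```python
import numpy as np

def dice(pred, gt):
    """Dice coefficient between two binary masks."""
    pred, gt = np.asarray(pred, bool), np.asarray(gt, bool)
    inter = np.logical_and(pred, gt).sum()
    denom = pred.sum() + gt.sum()
    return 2 * inter / denom if denom else 1.0

# Pitfall: a single missed pixel costs almost nothing on a large
# structure but a large fraction of Dice on a small one.
large_gt = np.zeros((32, 32), bool); large_gt[4:28, 4:28] = True    # 576 px
small_gt = np.zeros((32, 32), bool); small_gt[15:17, 15:17] = True  # 4 px
large_pred = large_gt.copy(); large_pred[4, 4] = False   # miss 1 px
small_pred = small_gt.copy(); small_pred[15, 15] = False # miss 1 px
print(dice(large_pred, large_gt))  # ~0.999
print(dice(small_pred, small_gt))  # ~0.857
```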

Working paper

Dou Q, So TY, Jiang M, Liu Q, Vardhanabhuti V, Kaissis G, Li Z, Si W, Lee HHC, Yu K, Feng Z, Dong L, Burian E, Jungmann F, Braren R, Makowski M, Kainz B, Rueckert D, Glocker B, Yu SCH, Heng PA et al., 2021, Federated deep learning for detecting COVID-19 lung abnormalities in CT: a privacy-preserving multinational validation study, npj Digital Medicine, Vol: 4, Pages: 1-11, ISSN: 2398-6352

Data privacy mechanisms are essential for rapidly scaling medical training databases to capture the heterogeneity of patient data distributions toward robust and generalizable machine learning systems. In the current COVID-19 pandemic, a major focus of artificial intelligence (AI) is interpreting chest CT, which can be readily used in the assessment and management of the disease. This paper demonstrates the feasibility of a federated learning method for detecting COVID-19 related CT abnormalities with external validation on patients from a multinational study. We recruited 132 patients from seven centers in different countries, with three internal hospitals from Hong Kong for training and testing, and four external, independent datasets from Mainland China and Germany for validating model generalizability. We also conducted case studies on longitudinal scans for automated estimation of lesion burden for hospitalized COVID-19 patients. We explore federated learning algorithms to develop a privacy-preserving AI model for COVID-19 medical image diagnosis with good generalization capability on unseen multinational datasets. Federated learning could provide an effective mechanism during pandemics to rapidly develop clinically useful AI across institutions and countries, overcoming the burden of central aggregation of large amounts of sensitive data.
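The core aggregation step of federated learning can be sketched in a few lines. This is a generic FedAvg-style weighted average, not necessarily the exact algorithm used in the study, and all arrays are illustrative:

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """FedAvg-style aggregation: average each client's model
    parameters, weighted by local dataset size, without any raw
    patient data ever leaving the clients."""
    total = sum(client_sizes)
    return [
        sum(n / total * w[i] for w, n in zip(client_weights, client_sizes))
        for i in range(len(client_weights[0]))
    ]

# Hypothetical parameter vectors from three hospitals.
w_a = [np.array([1.0, 2.0])]
w_b = [np.array([3.0, 4.0])]
w_c = [np.array([5.0, 6.0])]
global_w = federated_average([w_a, w_b, w_c], client_sizes=[100, 100, 200])
print(global_w[0])  # [3.5 4.5]: the larger site contributes more
```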

Journal article

Giulio J, Kainz B, 2021, Deep radiance caching: Convolutional autoencoders deeper in ray tracing, Computers and Graphics (UK), Vol: 94, Pages: 22-31, ISSN: 0097-8493

Rendering realistic images with global illumination is a computationally demanding task and often requires dedicated hardware for feasible runtime. Recent research uses Deep Neural Networks to predict indirect lighting on image level, but such methods are commonly limited to diffuse materials and require training on each scene. We present Deep Radiance Caching (DRC), an efficient variant of Radiance Caching utilizing Convolutional Autoencoders for rendering global illumination. DRC employs a denoising neural network with Radiance Caching to support a wide range of material types, without the requirement of offline pre-computation or training for each scene. This offers high performance CPU rendering for maximum accessibility. Our method has been evaluated on interior scenes, and is able to produce high-quality images within 180 s on a single CPU.

Journal article

Meng Q, Matthew J, Zimmer VA, Gomez A, Lloyd DFA, Rueckert D, Kainz B et al., 2021, Mutual information-based disentangled neural networks for classifying unseen categories in different domains: application to fetal ultrasound imaging, IEEE Transactions on Medical Imaging, Vol: 40, Pages: 722-734, ISSN: 0278-0062

Deep neural networks exhibit limited generalizability across images with different entangled domain features and categorical features. Learning generalizable features that can form universal categorical decision boundaries across domains is an interesting and difficult challenge. This problem occurs frequently in medical imaging applications when attempts are made to deploy and improve deep learning models across different image acquisition devices, across acquisition parameters or if some classes are unavailable in new training databases. To address this problem, we propose Mutual Information-based Disentangled Neural Networks (MIDNet), which extract generalizable categorical features to transfer knowledge to unseen categories in a target domain. The proposed MIDNet adopts a semi-supervised learning paradigm to alleviate the dependency on labeled data. This is important for real-world applications where data annotation is time-consuming, costly and requires training and expertise. We extensively evaluate the proposed method on fetal ultrasound datasets for two different image classification tasks where domain features are respectively defined by shadow artifacts and image acquisition devices. Experimental results show that the proposed method outperforms the state-of-the-art on the classification of unseen categories in a target domain with sparsely labeled training data.

Journal article

Kainz B, Makropoulos A, Oppenheimer J, Deane C, Mischkewitz S, Al-Noor F, Rawdin AC, Stevenson MD, Mandegaran R, Heinrich MP, Curry N et al., 2021, Non-invasive Diagnosis of Deep Vein Thrombosis from Ultrasound with Machine Learning, Publisher: Cold Spring Harbor Laboratory

Deep Vein Thrombosis (DVT) is a blood clot most often found in the leg, which can lead to fatal pulmonary embolism (PE). Compression ultrasound of the legs is the diagnostic gold standard, leading to a definitive diagnosis. However, many patients with possible symptoms are not found to have a DVT, resulting in long referral waiting times for patients and a large clinical burden for specialists. Thus, diagnosis at the point of care by non-specialists is desired.

We collect images in a pre-clinical study and investigate a deep learning approach for the automatic interpretation of compression ultrasound images. Our method provides guidance for free-hand ultrasound and aids non-specialists in detecting DVT.

We train a deep learning algorithm on ultrasound videos from 246 healthy volunteers and evaluate on a sample size of 51 prospectively enrolled patients from an NHS DVT diagnostic clinic. 32 DVT-positive patients and 19 DVT-negative patients were included. Algorithmic DVT diagnosis results in a sensitivity of 93.8% and a specificity of 84.2%, a positive predictive value of 90.9%, and a negative predictive value of 88.9% compared to the clinical gold standard.

To assess the potential benefits of this technology in healthcare we evaluate the entire clinical DVT decision algorithm and provide cost analysis when integrating our approach into a diagnostic pathway for DVT. Our approach is estimated to be cost effective at up to $150 per software examination, assuming a willingness to pay $26,000/QALY.
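The reported metrics are mutually consistent with a single confusion matrix over the 51 patients; the cell counts below are inferred from the published percentages and cohort sizes, not stated explicitly in the abstract:

```python
# Inferred confusion matrix consistent with the reported cohort
# (32 DVT-positive, 19 DVT-negative patients) and metrics; the
# individual cell counts are an assumption for illustration.
tp, fn = 30, 2   # of the 32 DVT-positive patients
tn, fp = 16, 3   # of the 19 DVT-negative patients

sensitivity = tp / (tp + fn)   # 30/32 = 0.938
specificity = tn / (tn + fp)   # 16/19 = 0.842
ppv = tp / (tp + fp)           # 30/33 = 0.909
npv = tn / (tn + fn)           # 16/18 = 0.889
print(f"{sensitivity:.1%} {specificity:.1%} {ppv:.1%} {npv:.1%}")
```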

Working paper

Budd S, Day T, Simpson J, Lloyd K, Matthew J, Skelton E, Razavi R, Kainz B et al., 2021, Can Non-specialists Provide High Quality Gold Standard Labels in Challenging Modalities?, Editors: Albarqouni, Cardoso, Dou, Kamnitsas, Khanal, Rekik, Rieke, Sheet, Tsaftaris, Xu, Xu, Publisher: SPRINGER INTERNATIONAL PUBLISHING AG, Pages: 251-262, ISBN: 978-3-030-87721-7

Book chapter

Miolane N, Guigui N, Le Brigant A, Mathe J, Hou B, Thanwerdas Y, Heyder S, Peltre O, Koep N, Zaatiti H, Hajri H, Cabanes Y, Gerald T, Chauchat P, Shewmake C, Brooks D, Kainz B, Donnat C, Holmes S, Pennec X et al., 2020, Geomstats: a Python package for Riemannian geometry in machine learning, Journal of Machine Learning Research, Vol: 21, Pages: 1-9, ISSN: 1532-4435

We introduce Geomstats, an open-source Python package for computations and statistics on nonlinear manifolds such as hyperbolic spaces, spaces of symmetric positive definite matrices, Lie groups of transformations, and many more. We provide object-oriented and extensively unit-tested implementations. Manifolds come equipped with families of Riemannian metrics with associated exponential and logarithmic maps, geodesics, and parallel transport. Statistics and learning algorithms provide methods for estimation, clustering, and dimension reduction on manifolds. All associated operations are vectorized for batch computation and provide support for different execution backends, namely NumPy, PyTorch, and TensorFlow. This paper presents the package, compares it with related libraries, and provides relevant code examples. We show that Geomstats provides reliable building blocks to both foster research in differential geometry and statistics and democratize the use of Riemannian geometry in machine learning applications. The source code is freely available under the MIT license at geomstats.ai.
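As a flavour of the operations such a package provides, here is a hand-rolled sketch of the Riemannian exponential map on the unit sphere, one of the supported manifolds. This is illustrative numpy, not the Geomstats API itself:

```python
import numpy as np

def sphere_exp(p, v):
    """Riemannian exponential map on the unit sphere: follow the
    geodesic (great circle) from point p in tangent direction v
    for arc length ||v||."""
    norm = np.linalg.norm(v)
    if norm < 1e-12:
        return p.copy()
    return np.cos(norm) * p + np.sin(norm) * v / norm

p = np.array([1.0, 0.0, 0.0])          # point on S^2
v = np.array([0.0, np.pi / 2, 0.0])    # tangent vector at p
q = sphere_exp(p, v)
print(np.round(q, 6))  # [0. 1. 0.]: a quarter of a great circle
```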

Journal article

Hou B, Vlontzos A, Alansary A, Rueckert D, Kainz B et al., 2020, Flexible conditional image generation of missing data with learned mental maps, Machine Learning for Medical Image Reconstruction: Second International Workshop, Publisher: Springer International Publishing, Pages: 139-150, ISSN: 0302-9743

Real-world settings often do not allow acquisition of high-resolution volumetric images for accurate morphological assessment and diagnosis. In clinical practice it is common to acquire only sparse data (e.g. individual slices) for initial diagnostic decision making. Thereby, physicians rely on their prior knowledge (or mental maps) of the human anatomy to extrapolate the underlying 3D information. Accurate mental maps require years of anatomy training, which in the first instance relies on normative learning, i.e. excluding pathology. In this paper, we leverage Bayesian Deep Learning and environment mapping to generate full volumetric anatomy representations from none to a small, sparse set of slices. We evaluate proof-of-concept implementations based on Generative Query Networks (GQN) and Conditional BRUNO using abdominal CT and brain MRI, as well as in a clinical application involving sparse, motion-corrupted MR acquisition for fetal imaging. Our approach allows reconstruction of 3D volumes from 1 to 4 tomographic slices, with an SSIM of 0.7+ and cross-correlation of 0.8+ compared to the 3D ground truth.
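The cross-correlation figure quoted above is typically the normalized cross-correlation between reconstruction and ground truth; a minimal numpy sketch (inputs are synthetic):

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation: +1 for perfectly linearly
    related images, 0 for uncorrelated ones, -1 for inverted ones."""
    a = np.asarray(a, float)
    b = np.asarray(b, float)
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    return float(np.mean(a * b))

a = np.arange(10.0)
print(ncc(a, 2 * a + 1))  # ~1.0: invariant to affine intensity changes
```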

Conference paper

Chotzoglou E, Kainz B, 2020, Exploring the Relationship Between Segmentation Uncertainty, Segmentation Performance and Inter-observer Variability with Probabilistic Networks, Large-Scale Annotation of Biomedical Data and Expert Label Synthesis and Hardware Aware Learning for Medical Imaging and Computer Assisted Intervention, Publisher: Springer International Publishing, Pages: 51-60, ISSN: 0302-9743

Conference paper

Holland R, Patel U, Lung P, Chotzoglou E, Kainz B et al., 2020, Automatic detection of bowel disease with residual networks, International Workshop on PRedictive Intelligence In MEdicine, Publisher: Springer International Publishing, Pages: 151-159, ISSN: 0302-9743

Crohn’s disease, one of two inflammatory bowel diseases (IBD), affects 200,000 people in the UK alone, or roughly one in every 500. We explore the feasibility of deep learning algorithms for identification of terminal ileal Crohn’s disease in Magnetic Resonance Enterography images on a small dataset. We show that they provide comparable performance to the current clinical standard, the MaRIA score, while requiring only a fraction of the preparation and inference time. Moreover, bowels are subject to high variation between individuals due to the complex and free-moving anatomy. Thus we also explore how the difficulty of the classification task affects performance. Finally, we employ soft attention mechanisms to amplify salient local features and add interpretability.

Conference paper

Grzech D, Kainz B, Glocker B, le Folgoc L et al., 2020, Image registration via stochastic gradient Markov chain Monte Carlo, Second International Workshop, UNSURE 2020, and Third International Workshop, GRAIL 2020, Held in Conjunction with MICCAI 2020, Publisher: Springer International Publishing, Pages: 3-12, ISSN: 0302-9743

We develop a fully Bayesian framework for non-rigid registration of three-dimensional medical images, with a focus on uncertainty quantification. Probabilistic registration of large images along with calibrated uncertainty estimates is difficult for both computational and modelling reasons. To address the computational issues, we explore connections between the Markov chain Monte Carlo by backprop and the variational inference by backprop frameworks in order to efficiently draw thousands of samples from the posterior distribution. Regarding the modelling issues, we carefully design a Bayesian model for registration to overcome the existing barriers when using a dense, high-dimensional, and diffeomorphic parameterisation of the transformation. This results in improved calibration of uncertainty estimates.
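The sampling machinery ("Markov chain Monte Carlo by backprop") can be illustrated with plain stochastic gradient Langevin dynamics on a toy 1-D Gaussian posterior. This is a generic sketch of the sampler family, not the registration model itself:

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_log_post(theta, mu=2.0, sigma=1.0):
    """Gradient of a toy Gaussian log-posterior N(mu, sigma^2)."""
    return -(theta - mu) / sigma**2

# SGLD step: half a gradient-ascent step on the log-posterior plus
# injected Gaussian noise; the iterates become posterior samples.
eps = 0.01
theta = 0.0
samples = []
for t in range(20000):
    theta += 0.5 * eps * grad_log_post(theta) + np.sqrt(eps) * rng.normal()
    if t >= 2000:                     # discard burn-in
        samples.append(theta)

print(np.mean(samples))  # close to the true posterior mean 2.0
```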

Conference paper

Hinterreiter A, Streit M, Kainz B, 2020, Projective latent interventions for understanding and fine-tuning classifiers, 5th International Workshop, LABELS 2020, Publisher: Springer International Publishing, Pages: 13-22, ISSN: 0302-9743

High-dimensional latent representations learned by neural network classifiers are notoriously hard to interpret. Especially in medical applications, model developers and domain experts desire a better understanding of how these latent representations relate to the resulting classification performance. We present Projective Latent Interventions (PLIs), a technique for retraining classifiers by back-propagating manual changes made to low-dimensional embeddings of the latent space. The back-propagation is based on parametric approximations of t -distributed stochastic neighbourhood embeddings. PLIs allow domain experts to control the latent decision space in an intuitive way in order to better match their expectations. For instance, the performance for specific pairs of classes can be enhanced by manually separating the class clusters in the embedding. We evaluate our technique on a real-world scenario in fetal ultrasound imaging.

Conference paper

Meng Q, Rueckert D, Kainz B, 2020, Unsupervised cross-domain image classification by distance metric guided feature alignment, ASMUS 2020, PIPPI 2020: Medical Ultrasound, and Preterm, Perinatal and Paediatric Image Analysis, Publisher: Springer International Publishing, Pages: 146-157, ISSN: 0302-9743

Learning deep neural networks that are generalizable across different domains remains a challenge due to the problem of domain shift. Unsupervised domain adaptation is a promising avenue which transfers knowledge from a source domain to a target domain without using any labels in the target domain. Contemporary techniques focus on extracting domain-invariant features using domain adversarial training. However, these techniques neglect to learn discriminative class boundaries in the latent representation space on a target domain and yield limited adaptation performance. To address this problem, we propose distance metric guided feature alignment (MetFA) to extract discriminative as well as domain-invariant features on both source and target domains. The proposed MetFA method explicitly and directly learns the latent representation without using domain adversarial training. Our model integrates class distribution alignment to transfer semantic knowledge from a source domain to a target domain. We evaluate the proposed method on fetal ultrasound datasets for cross-device image classification. Experimental results demonstrate that the proposed method outperforms the state-of-the-art and enables model generalization.

Conference paper

Liu T, Meng Q, Vlontzos A, Tan J, Rueckert D, Kainz B et al., 2020, Ultrasound video summarization using deep reinforcement learning, 23rd International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), Publisher: Springer International Publishing, Pages: 483-492, ISSN: 0302-9743

Video is an essential imaging modality for diagnostics, e.g. in ultrasound imaging, for endoscopy, or movement assessment. However, video has not received much attention in the medical image analysis community. In clinical practice, it is challenging to utilise raw diagnostic video data efficiently, as video data takes a long time to process, annotate or audit. In this paper we introduce a novel, fully automatic video summarization method that is tailored to the needs of medical video data. Our approach is framed as a reinforcement learning problem and produces agents focusing on the preservation of important diagnostic information. We evaluate our method on videos from fetal ultrasound screening, where commonly only a small amount of the recorded data is used diagnostically. We show that our method is superior to alternative video summarization methods and that it preserves essential information required by clinical diagnostic standards.

Conference paper

Tan J, Au A, Meng Q, FinesilverSmith S, Simpson J, Rueckert D, Razavi R, Day T, Lloyd D, Kainz B et al., 2020, Automated detection of congenital heart disease in fetal ultrasound screening, ASMUS 2020, PIPPI 2020: Medical Ultrasound, and Preterm, Perinatal and Paediatric Image Analysis, Publisher: Springer, Pages: 243-252, ISSN: 0302-9743

Prenatal screening with ultrasound can lower neonatal mortality significantly for selected cardiac abnormalities. However, the need for human expertise, coupled with the high volume of screening cases, limits the practically achievable detection rates. In this paper we discuss the potential for deep learning techniques to aid in the detection of congenital heart disease (CHD) in fetal ultrasound. We propose a pipeline for automated data curation and classification. During both training and inference, we exploit an auxiliary view classification task to bias features toward relevant cardiac structures. This bias helps to improve F1-scores from 0.72 and 0.77 to 0.87 and 0.85 for the healthy and CHD classes, respectively.

Conference paper

Vlontzos A, Budd S, Hou B, Rueckert D, Kainz B et al., 2020, 3D probabilistic segmentation and volumetry from 2D projection images, Thoracic Image Analysis, Publisher: Springer, Pages: 48-57, ISSN: 0302-9743

X-Ray imaging is quick, cheap and useful for front-line care assessment and intra-operative real-time imaging (e.g., C-Arm Fluoroscopy). However, it suffers from projective information loss and lacks vital volumetric information on which many essential diagnostic biomarkers are based. In this paper we explore probabilistic methods to reconstruct 3D volumetric images from 2D imaging modalities and measure the models’ performance and confidence. We show our models’ performance on large connected structures and we test for limitations regarding fine structures and image domain sensitivity. We utilize fast end-to-end training of 2D-3D convolutional networks and evaluate our method on 117 CT scans, segmenting 3D structures from digitally reconstructed radiographs (DRRs) with a Dice score of 0.91±0.0013. Source code will be made available by the time of the conference.

Conference paper

Tan J, Kainz B, 2020, Divergent search for image classification behaviors, Pages: 91-92

When data is unlabelled and the target task is not known a priori, divergent search offers a strategy for learning a wide range of skills. Having such a repertoire allows a system to adapt to new, unforeseen tasks. Unlabelled image data is plentiful, but it is not always known which features will be required for downstream tasks. We propose a method for divergent search in the few-shot image classification setting and evaluate with Omniglot and Mini-ImageNet. This high-dimensional behavior space includes all possible ways of partitioning the data. To manage divergent search in this space, we rely on a meta-learning framework to integrate useful features from diverse tasks into a single model. The final layer of this model is used as an index into the 'archive' of all past behaviors. We search for regions in the behavior space that the current archive cannot reach. As expected, divergent search is outperformed by models with a strong bias toward the evaluation tasks. But it is able to match and sometimes exceed the performance of models that have a weak bias toward the target task or none at all. This demonstrates that divergent search is a viable approach, even in high-dimensional behavior spaces.

Conference paper

Meng Q, Zimmer V, Hou B, Rajchl M, Toussaint N, Oktay O, Schlemper J, Gomez A, Housden J, Matthew J, Rueckert D, Schnabel JA, Kainz B et al., 2019, Weakly supervised estimation of shadow confidence maps in fetal ultrasound imaging, IEEE Transactions on Medical Imaging, Vol: 38, Pages: 2755-2767, ISSN: 0278-0062

Detecting acoustic shadows in ultrasound images is important in many clinical and engineering applications. Real-time feedback of acoustic shadows can guide sonographers to a standardized diagnostic viewing plane with minimal artifacts and can provide additional information for other automatic image analysis algorithms. However, automatically detecting shadow regions using learning-based algorithms is challenging because pixel-wise ground truth annotation of acoustic shadows is subjective and time consuming. In this paper we propose a weakly supervised method for automatic confidence estimation of acoustic shadow regions. Our method is able to generate a dense shadow-focused confidence map. In our method, a shadow-seg module is built to learn general shadow features for shadow segmentation, based on global image-level annotations as well as a small number of coarse pixel-wise shadow annotations. A transfer function is introduced to extend the obtained binary shadow segmentation to a reference confidence map. Additionally, a confidence estimation network is proposed to learn the mapping between input images and the reference confidence maps. This network is able to predict shadow confidence maps directly from input images during inference. We use evaluation metrics such as DICE and inter-class correlation to verify the effectiveness of our method. Our method is more consistent than human annotation, and outperforms the state-of-the-art quantitatively in shadow segmentation and qualitatively in confidence estimation of shadow regions. We further demonstrate the applicability of our method by integrating shadow confidence maps into tasks such as ultrasound image classification, multi-view image fusion and automated biometric measurements.

Journal article

Budd S, Sinclair M, Khanal B, Matthew J, Lloyd D, Gomez A, Toussaint N, Robinson EC, Kainz B et al., 2019, Confident head circumference measurement from ultrasound with real-time feedback for sonographers, 10th International Workshop on Machine Learning in Medical Imaging (MLMI) / 22nd International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), Publisher: Springer International Publishing AG, Pages: 683-691, ISSN: 0302-9743

Fetal Head Circumference (HC), manually estimated from Ultrasound (US), is a key biometric for monitoring the healthy development of fetuses. Unfortunately, such measurements are subject to large inter-observer variability, resulting in low early-detection rates of fetal abnormalities. To address this issue, we propose a novel probabilistic Deep Learning approach for real-time automated estimation of fetal HC. This system feeds back statistics on measurement robustness to inform users how confident a deep neural network is in evaluating suitable views acquired during free-hand ultrasound examination. In real-time scenarios, this approach may be exploited to guide operators to scan planes that are as close as possible to the underlying distribution of training images, for the purpose of improving inter-operator consistency. We train on freehand ultrasound data from over 2000 subjects (2848 training/540 test) and show that our method is able to predict HC measurements within 1.81±1.65 mm deviation from the ground truth, with 50% of the test images fully contained within the predicted confidence margins, and an average of 1.82±1.78 mm deviation from the margin for the remaining cases that are not fully contained.

Conference paper

Wright R, Toussaint N, Gomez A, Zimmer V, Khanal B, Matthew J, Skelton E, Kainz B, Rueckert D, Hajnal JV, Schnabel JA et al., 2019, Complete Fetal Head Compounding from Multi-view 3D Ultrasound, 10th International Workshop on Machine Learning in Medical Imaging (MLMI) / 22nd International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), Publisher: Springer International Publishing AG, Pages: 384-392, ISSN: 0302-9743

Conference paper

Vlontzos A, Alansary A, Kamnitsas K, Rueckert D, Kainz B et al., 2019, Multiple landmark detection using multi-agent reinforcement learning, 10th International Workshop on Machine Learning in Medical Imaging (MLMI) / 22nd International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), Publisher: Springer International Publishing AG, Pages: 262-270, ISSN: 0302-9743

The detection of anatomical landmarks is a vital step for medical image analysis and applications for diagnosis, interpretation and guidance. Manual annotation of landmarks is a tedious process that requires domain-specific expertise and introduces inter-observer variability. This paper proposes a new detection approach for multiple landmarks based on multi-agent reinforcement learning. Our hypothesis is that the positions of anatomical landmarks are interdependent and non-random within the human anatomy, so finding one landmark can help to deduce the locations of others. Using a Deep Q-Network (DQN) architecture we construct an environment and agent with implicit inter-communication such that we can accommodate K agents acting and learning simultaneously, while they attempt to detect K different landmarks. During training the agents collaborate by sharing their accumulated knowledge for a collective gain. We compare our approach with state-of-the-art architectures and achieve significantly better accuracy by reducing the detection error by 50%, while requiring fewer computational resources and less time to train compared to the naïve approach of training K agents separately. Code and visualizations available: https://github.com/thanosvlo/MARL-for-Anatomical-Landmark-Detection
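The shared-knowledge idea can be illustrated with a toy stand-in: K agents search a 1-D "image" for K landmarks using tabular Q-learning with a shared Q-table, echoing the weight sharing between agents. The actual method uses a Deep Q-Network on 3-D image patches (see the linked repository); everything below, including landmark positions and hyperparameters, is illustrative only:

```python
import random

SIZE, ACTIONS = 10, (-1, +1)      # positions 0..9, actions: move left/right
landmarks = [2, 7]                # one toy target per agent
Q = {}                            # shared table: (agent, position) -> action values

def q(agent, pos):
    return Q.setdefault((agent, pos), [0.0, 0.0])

random.seed(0)
for episode in range(300):
    pos = [random.randrange(SIZE) for _ in landmarks]
    for _ in range(20):
        for k, target in enumerate(landmarks):      # agents act in turn
            if random.random() < 0.2:               # epsilon-greedy exploration
                a = random.randrange(2)
            else:
                a = max(range(2), key=lambda i: q(k, pos[k])[i])
            nxt = min(max(pos[k] + ACTIONS[a], 0), SIZE - 1)
            reward = abs(pos[k] - target) - abs(nxt - target)   # distance decrease
            q(k, pos[k])[a] += 0.5 * (reward + 0.9 * max(q(k, nxt))
                                      - q(k, pos[k])[a])
            pos[k] = nxt

# Greedy rollout after training: each agent walks towards its landmark.
pos = [0, 0]
for _ in range(SIZE):
    for k in range(len(landmarks)):
        a = max(range(2), key=lambda i: q(k, pos[k])[i])
        pos[k] = min(max(pos[k] + ACTIONS[a], 0), SIZE - 1)
```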

Conference paper

Castro DC, Tan J, Kainz B, Konukoglu E, Glocker B et al., 2019, Morpho-MNIST: quantitative assessment and diagnostics for representation learning, Journal of Machine Learning Research, Vol: 20, Pages: 1-29, ISSN: 1532-4435

Revealing latent structure in data is an active field of research, having introduced exciting technologies such as variational autoencoders and adversarial networks, and is essential to push machine learning towards unsupervised knowledge discovery. However, a major challenge is the lack of suitable benchmarks for an objective and quantitative evaluation of learned representations. To address this issue we introduce Morpho-MNIST, a framework that aims to answer: "to what extent has my model learned to represent specific factors of variation in the data?" We extend the popular MNIST dataset by adding a morphometric analysis enabling quantitative comparison of trained models, identification of the roles of latent variables, and characterisation of sample diversity. We further propose a set of quantifiable perturbations to assess the performance of unsupervised and supervised methods on challenging tasks such as outlier detection and domain adaptation. Data and code are available at https://github.com/dccastro/Morpho-MNIST.
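As a flavour of what a morphometric measurement looks like, one such factor of variation, stroke slant, can be estimated from second-order central moments of the foreground coordinates. The framework measures several factors with its own pipeline (linked above); this is only an illustrative moment-based computation on synthetic points:

```python
# Estimate "slant" (horizontal shear w.r.t. the vertical axis) of a stroke
# from second-order central moments of its foreground coordinates.

def slant(points):
    """points: iterable of (x, y) foreground coordinates."""
    pts = list(points)
    n = len(pts)
    cx = sum(x for x, _ in pts) / n                      # centroid
    cy = sum(y for _, y in pts) / n
    mu11 = sum((x - cx) * (y - cy) for x, y in pts)      # mixed central moment
    mu02 = sum((y - cy) ** 2 for _, y in pts)            # vertical variance
    return mu11 / mu02

# A vertical bar sheared by 0.5 px per row should have slant 0.5 exactly.
bar = [(0.5 * y, y) for y in range(10)]
```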

Journal article

Meng Q, Pawlowski N, Rueckert D, Kainz B et al., 2019, Representation disentanglement for multi-task learning with application to fetal ultrasound, Smart Ultrasound Imaging and Perinatal, Preterm and Paediatric Image Analysis, Publisher: Springer International Publishing, Pages: 47-55, ISSN: 0302-9743

One of the biggest challenges for deep learning algorithms in medical image analysis is the indiscriminate mixing of image properties, e.g. artifacts and anatomy. These entangled image properties lead to a semantically redundant feature encoding for the relevant task and thus to poor generalization of deep learning algorithms. In this paper we propose a novel representation disentanglement method to extract semantically meaningful and generalizable features for different tasks within a multi-task learning framework. Deep neural networks are utilized to ensure that the encoded features are maximally informative with respect to relevant tasks, while an adversarial regularization encourages these features to be disentangled and minimally informative about irrelevant tasks. We aim to use the disentangled representations to generalize the applicability of deep neural networks. We demonstrate the advantages of the proposed method on synthetic data as well as fetal ultrasound images. Our experiments illustrate that our method is capable of learning disentangled internal representations. It outperforms baseline methods in multiple tasks, especially on images with new properties, e.g. previously unseen artifacts in fetal ultrasound.
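The shape of such an adversarial objective can be sketched numerically: the encoder is rewarded for features that support the relevant task while an adversarial term, subtracted from the objective, penalises features the adversary can decode into the irrelevant factor. The lambda weight, the binary cross-entropy stand-in and all numbers below are illustrative assumptions, not the paper's formulation:

```python
import math

def bce(p, y):
    """Binary cross-entropy for a single prediction p against label y."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def encoder_objective(task_pred, task_label, adv_pred, adv_label, lam=0.1):
    # Minimising this drives task loss down and adversary loss *up*,
    # i.e. the features become uninformative about the irrelevant factor.
    return bce(task_pred, task_label) - lam * bce(adv_pred, adv_label)

# Features that hold the adversary at chance (0.5) score better than
# features the adversary decodes confidently, at equal task accuracy.
good = encoder_objective(0.9, 1, 0.50, 1)   # adversary at chance
bad  = encoder_objective(0.9, 1, 0.95, 1)   # adversary decodes the nuisance
```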

Conference paper

Tan J, Au A, Meng Q, Kainz B et al., 2019, Semi-supervised learning of fetal anatomy from ultrasound, Domain Adaptation and Representation Transfer and Medical Image Learning with Less Labels and Imperfect Data 2019, Publisher: Springer Verlag, Pages: 157-164, ISSN: 0302-9743

Semi-supervised learning methods have achieved excellent performance on standard benchmark datasets using very few labelled images. Anatomy classification in fetal 2D ultrasound is an ideal problem setting to test whether these results translate to non-ideal data. Our results indicate that inclusion of a challenging background class can be detrimental and that semi-supervised learning mostly benefits classes that are already distinct, sometimes at the expense of more similar classes.
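The paper benchmarks existing semi-supervised methods; as a generic illustration of the family, here is a minimal self-training (pseudo-labelling) loop on 1-D toy data: a threshold classifier is fit on a few labelled points, then confident predictions on unlabelled points are recycled as extra training data. All data, the margin and the classifier are made up for illustration:

```python
def fit_threshold(xs, ys):
    """Pick the decision boundary (midpoint between sorted points)
    minimising errors for the rule: label 1 iff x >= t."""
    pts = sorted(xs)
    candidates = [(a + b) / 2 for a, b in zip(pts, pts[1:])]
    return min(candidates,
               key=lambda t: sum((x >= t) != y for x, y in zip(xs, ys)))

def self_train(labelled, unlabelled, margin=2.0):
    xs = [x for x, _ in labelled]
    ys = [y for _, y in labelled]
    t = fit_threshold(xs, ys)
    for x in unlabelled:                 # adopt only confident pseudo-labels
        if abs(x - t) >= margin:
            xs.append(x)
            ys.append(1 if x >= t else 0)
    return fit_threshold(xs, ys)

labelled = [(2.0, 0), (8.0, 1)]                    # only two labels
unlabelled = [1.0, 2.5, 4.5, 5.5, 7.5, 9.0]
t = self_train(labelled, unlabelled)
```

Here the pseudo-labels simply reinforce the initial boundary, which mirrors the paper's observation that self-training mainly benefits classes that are already well separated.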

Conference paper

Meng Q, Sinclair M, Zimmer V, Hou B, Rajchl M, Toussaint N, Oktay O, Schlemper J, Gomez A, Housden J, Matthew J, Rueckert D, Schnabel JA, Kainz B et al., 2019, Weakly supervised estimation of shadow confidence maps in fetal ultrasound imaging, IEEE Transactions on Medical Imaging, Pages: 1-13, ISSN: 0278-0062

Detecting acoustic shadows in ultrasound images is important in many clinical and engineering applications. Real-time feedback of acoustic shadows can guide sonographers to a standardized diagnostic viewing plane with minimal artifacts and can provide additional information for other automatic image analysis algorithms. However, automatically detecting shadow regions using learning-based algorithms is challenging because pixel-wise ground truth annotation of acoustic shadows is subjective and time-consuming. In this paper we propose a weakly supervised method for automatic confidence estimation of acoustic shadow regions. Our method is able to generate a dense shadow-focused confidence map. In our method, a shadow-seg module is built to learn general shadow features for shadow segmentation, based on global image-level annotations as well as a small number of coarse pixel-wise shadow annotations. A transfer function is introduced to extend the obtained binary shadow segmentation to a reference confidence map. Additionally, a confidence estimation network is proposed to learn the mapping between input images and the reference confidence maps. This network is able to predict shadow confidence maps directly from input images during inference. We use evaluation metrics such as DICE and inter-class correlation to verify the effectiveness of our method. Our method is more consistent than human annotation, and outperforms the state-of-the-art quantitatively in shadow segmentation and qualitatively in confidence estimation of shadow regions. We further demonstrate the applicability of our method by integrating shadow confidence maps into tasks such as ultrasound image classification, multi-view image fusion and automated biometric measurements.
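The general idea of a transfer function that extends a binary segmentation into a soft confidence map can be sketched with a toy rule: confidence decays with (city-block) distance from the segmented shadow region. The paper's actual transfer function is defined differently; the decay rate and mask below are purely illustrative:

```python
from collections import deque

def confidence_map(mask, decay=0.25):
    """mask: 2-D list of 0/1 shadow labels; returns per-pixel confidence
    in [0, 1], decaying linearly with distance from the shadow region."""
    h, w = len(mask), len(mask[0])
    dist = [[None] * w for _ in range(h)]
    q = deque()
    for i in range(h):                     # shadow pixels seed distance 0
        for j in range(w):
            if mask[i][j]:
                dist[i][j] = 0
                q.append((i, j))
    while q:                               # multi-source BFS distance transform
        i, j = q.popleft()
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < h and 0 <= nj < w and dist[ni][nj] is None:
                dist[ni][nj] = dist[i][j] + 1
                q.append((ni, nj))
    return [[max(0.0, 1.0 - decay * d) for d in row] for row in dist]

mask = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
conf = confidence_map(mask)
```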

Journal article

This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.
