Imperial College London


Faculty of Engineering, Department of Computing

Reader in Medical Image Computing



+44 (0)20 7594 8349, b.kainz




372, Huxley Building, South Kensington Campus





161 results found

Grzech D, Azampour MF, Qiu H, Glocker B, Kainz B, Folgoc LL et al., 2022, Uncertainty quantification in non-rigid image registration via stochastic gradient Markov chain Monte Carlo, Publisher: ArXiv

We develop a new Bayesian model for non-rigid registration of three-dimensional medical images, with a focus on uncertainty quantification. Probabilistic registration of large images with calibrated uncertainty estimates is difficult for both computational and modelling reasons. To address the computational issues, we explore connections between the Markov chain Monte Carlo by backpropagation and the variational inference by backpropagation frameworks, in order to efficiently draw samples from the posterior distribution of transformation parameters. To address the modelling issues, we formulate a Bayesian model for image registration that overcomes the existing barriers when using a dense, high-dimensional, and diffeomorphic transformation parametrisation. This results in improved calibration of uncertainty estimates. We compare the model in terms of both image registration accuracy and uncertainty quantification to VoxelMorph, a state-of-the-art image registration model based on deep learning.

Working paper

Zimmerer D, Full PM, Isensee F, Jager P, Adler T, Petersen J, Kohler G, Ross T, Reinke A, Kascenas A, Jensen BS, O'Neil AQ, Tan J, Hou B, Batten J, Qiu H, Kainz B, Shvetsova N, Fedulova I, Dylov DV, Yu B, Zhai J, Hu J, Si R, Zhou S, Wang S, Li X, Chen X, Zhao Y, Marimont SN, Tarroni G, Saase V, Maier-Hein L, Maier-Hein K et al., 2022, MOOD 2020: A public benchmark for out-of-distribution detection and localization on medical images, IEEE Transactions on Medical Imaging, ISSN: 0278-0062

Detecting Out-of-Distribution (OoD) data is one of the greatest challenges in safe and robust deployment of machine learning algorithms in medicine. When the algorithms encounter cases that deviate from the distribution of the training data, they often produce incorrect and over-confident predictions. OoD detection algorithms aim to catch erroneous predictions in advance by analysing the data distribution and detecting potential instances of failure. Moreover, flagging OoD cases may support human readers in identifying incidental findings. Due to the increased interest in OoD algorithms, benchmarks for different domains have recently been established. In the medical imaging domain, for which reliable predictions are often essential, an open benchmark has been missing. We introduce the Medical-Out-Of-Distribution-Analysis-Challenge (MOOD) as an open, fair, and unbiased benchmark for OoD methods in the medical imaging domain. The analysis of the submitted algorithms shows that performance has a strong positive correlation with the perceived difficulty, and that all algorithms show a high variance for different anomalies, making it hard, as yet, to recommend them for clinical practice. We also see a strong correlation between challenge ranking and performance on a simple toy test set, indicating that this might be a valuable addition as a proxy dataset during anomaly detection algorithm development.

Journal article

Dou Q, So TY, Jiang M, Liu Q, Vardhanabhuti V, Kaissis G, Li Z, Si W, Lee HHC, Yu K, Feng Z, Dong L, Burian E, Jungmann F, Braren R, Makowski M, Kainz B, Rueckert D, Glocker B, Yu SCH, Heng PA et al., 2022, Author Correction: Federated deep learning for detecting COVID-19 lung abnormalities in CT: a privacy-preserving multinational validation study, npj Digital Medicine, Vol: 5, ISSN: 2398-6352

Correction to: npj Digital Medicine, published online 29 March 2021

Journal article

Tan J, Hou B, Batten J, Qiu H, Kainz B et al., 2022, Detecting outliers with foreign patch interpolation, Journal of Machine Learning for Biomedical Imaging, Vol: 2022, Pages: 1-27, ISSN: 2766-905X

In medical imaging, outliers can contain hypo/hyper-intensities, minor deformations, or completely altered anatomy. To detect these irregularities it is helpful to learn the features present in both normal and abnormal images. However, this is difficult because of the wide range of possible abnormalities and also the number of ways that normal anatomy can vary naturally. As such, we leverage the natural variations in normal anatomy to create a range of synthetic abnormalities. Specifically, the same patch region is extracted from two independent samples and replaced with an interpolation between both patches. The interpolation factor, patch size, and patch location are randomly sampled from uniform distributions. A wide residual encoder decoder is trained to give a pixel-wise prediction of the patch and its interpolation factor. This encourages the network to learn what features to expect normally and to identify where foreign patterns have been introduced. The estimate of the interpolation factor lends itself nicely to the derivation of an outlier score. Meanwhile the pixel-wise output allows for pixel- and subject-level predictions using the same model. Our code is available at

Journal article
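
The patch interpolation step described in the abstract above can be sketched in a few lines. This is only an illustrative reading of the abstract, not the authors' released code: the function name `foreign_patch_interpolation`, the square patch shape, the list-of-lists image representation, and the choice to store the interpolation factor as the pixel-wise label are all assumptions made here.

```python
import random


def foreign_patch_interpolation(image_a, image_b, rng=None):
    """Blend a random square patch of image_b into image_a (hypothetical helper).

    Both images are equally sized 2D lists of floats.
    Returns (blended image, pixel-wise label map, interpolation factor).
    """
    rng = rng or random.Random()
    height, width = len(image_a), len(image_a[0])

    # Interpolation factor, patch size and patch location are drawn uniformly.
    alpha = rng.uniform(0.0, 1.0)
    size = rng.randint(1, min(height, width))
    top = rng.randint(0, height - size)
    left = rng.randint(0, width - size)

    blended = [row[:] for row in image_a]
    label = [[0.0] * width for _ in range(height)]
    for r in range(top, top + size):
        for c in range(left, left + size):
            # Same region from both samples, replaced by their interpolation.
            blended[r][c] = (1.0 - alpha) * image_a[r][c] + alpha * image_b[r][c]
            # Assumed training target: the factor inside the patch, 0 elsewhere.
            label[r][c] = alpha
    return blended, label, alpha
```

A network trained on such pairs would receive `blended` as input and `label` as the pixel-wise target; how the predicted map is turned into an outlier score is not specified in the abstract and is left out here.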

Tan J, Kart T, Hou B, Batten J, Kainz B et al., 2022, MetaDetector: Detecting outliers by learning to learn from self-supervision, Biomedical Image Registration, Domain Generalisation and Out-of-Distribution Analysis, Publisher: Springer, Pages: 119-126, ISSN: 0302-9743

Using self-supervision in anomaly detection can increase sensitivity to subtle irregularities. However, increasing sensitivity to certain classes of outliers could result in decreased sensitivity to other types. While a single model may have limited coverage, an adaptive method could help detect a broader range of outliers. Our proposed method explores whether meta learning can increase the adaptability of self-supervised methods. Meta learning is often employed in few-shot settings with labelled examples. To use it for anomaly detection, where labelled support data is usually not available, we instead construct a self-supervised task using the test input itself and reference samples from the normal training data. Specifically, patches from the test image are introduced into normal reference images. This forms the basis of the few-shot task. During training, the same few-shot process is used, but the test/query image is substituted with a normal training image that contains a synthetic irregularity. Meta learning is then used to learn how to learn from the few-shot task by computing second order gradients. Given the importance of screening applications, e.g. in healthcare or security, any adaptability in the method must be counterbalanced with robustness. As such, we add strong regularization by i) restricting meta learning to only layers near the bottleneck of our encoder-decoder architecture and ii) computing the loss at multiple points during the few-shot process.

Conference paper

Liu T, Meng Q, Huang J-J, Vlontzos A, Rueckert D, Kainz B et al., 2022, Video summarization through reinforcement learning with a 3D spatio-temporal U-Net, IEEE Transactions on Image Processing, Vol: 31, Pages: 1573-1586, ISSN: 1057-7149

Intelligent video summarization algorithms quickly convey the most relevant information in videos by identifying the most essential and explanatory content while removing redundant video frames. In this paper, we introduce the 3DST-UNet-RL framework for video summarization. A 3D spatio-temporal U-Net is used to efficiently encode spatio-temporal information of the input videos for downstream reinforcement learning (RL). An RL agent learns from spatio-temporal latent scores and predicts actions for keeping or rejecting a video frame in a video summary. We investigate whether real/inflated 3D spatio-temporal CNN features are better suited to learn representations from videos than commonly used 2D image features. Our framework can operate in both a fully unsupervised mode and a supervised training mode. We analyse the impact of prescribed summary lengths and show experimental evidence for the effectiveness of 3DST-UNet-RL on two commonly used general video summarization benchmarks. We also applied our method to a medical video summarization task. The proposed video summarization method has the potential to save storage costs of ultrasound screening videos as well as to increase efficiency when browsing patient video data during retrospective analysis or audit without losing essential information.

Journal article

Gomez A, Zimmer VA, Wheeler G, Toussaint N, Deng S, Wright R, Skelton E, Matthew J, Kainz B, Hajnal J, Schnabel J et al., 2022, PRETUS: A plug-in based platform for real-time ultrasound imaging research, SoftwareX, Vol: 17, ISSN: 2352-7110

We present PRETUS, a Plugin-based Real Time UltraSound software platform for live ultrasound image analysis and operator support. The software is lightweight; functionality is brought in via independent plug-ins that can be arranged in sequence. The software captures the real-time stream of ultrasound images from virtually any ultrasound machine, applies computational methods and visualizes the results on-the-fly. Plug-ins can run concurrently without blocking each other. They can be implemented in C++ and Python. A graphical user interface can be implemented for each plug-in, and presented to the user in a compact way. The software is free and open source, and allows for rapid prototyping and testing of real-time ultrasound imaging methods in a manufacturer-agnostic fashion. The software is provided with input, output and processing plug-ins, as well as with tutorials to illustrate how to develop new plug-ins for PRETUS.

Journal article

Schmidtke L, Vlontzos A, Ellershaw S, Lukens A, Arichi T, Kainz B et al., 2021, Unsupervised human pose estimation through transforming shape templates, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Publisher: IEEE Computer Society, Pages: 2484-2494, ISSN: 1063-6919

Human pose estimation is a major computer vision problem with applications ranging from augmented reality and video capture to surveillance and movement tracking. In the medical context, the latter may be an important biomarker for neurological impairments in infants. Whilst many methods exist, their application has been limited by the need for large, well-annotated datasets and the inability to generalize to humans of different shapes and body compositions, e.g. children and infants. In this paper we present a novel method for learning pose estimators for human adults and infants in an unsupervised fashion. We approach this as a learnable template matching problem facilitated by deep feature extractors. Human-interpretable landmarks are estimated by transforming a template consisting of predefined body parts that are characterized by 2D Gaussian distributions. Enforcing a connectivity prior guides our model to meaningful human shape representations. We demonstrate the effectiveness of our approach on two different datasets including adults and infants. Project page:

Conference paper

Matthew J, Skelton E, Day TG, Zimmer VA, Gomez A, Wheeler G, Toussaint N, Liu T, Budd S, Lloyd K, Wright R, Deng S, Ghavami N, Sinclair M, Meng Q, Kainz B, Schnabel JA, Rueckert D, Razavi R, Simpson J, Hajnal J et al., 2021, Exploring a new paradigm for the fetal anomaly ultrasound scan: Artificial intelligence in real time, Prenatal Diagnosis, Vol: 42, Pages: 49-59, ISSN: 0197-3851

Objective: Advances in artificial intelligence (AI) have demonstrated potential to improve medical diagnosis. We piloted the end-to-end automation of the mid-trimester screening ultrasound scan using AI-enabled tools. Methods: A prospective method comparison study was conducted. Participants had both standard and AI-assisted US scans performed. The AI tools automated image acquisition, biometric measurement, and report production. A feedback survey captured the sonographers' perceptions of scanning. Results: Twenty-three subjects were studied. The average time saving per scan was 7.62 min (34.7%) with the AI-assisted method (p < 0.0001). There was no difference in reporting time. There were no clinically significant differences in biometric measurements between the two methods. The AI tools saved a satisfactory view in 93% of the cases (four core views only), and 73% for the full 13 views, compared to 98% for both using the manual scan. Survey responses suggest that the AI tools helped sonographers to concentrate on image interpretation by removing disruptive tasks. Conclusion: Separating freehand scanning from image capture and measurement resulted in a faster scan and altered workflow. Removing repetitive tasks may allow more attention to be directed to identifying fetal malformation. Further work is required to improve the image plane detection algorithm for use in real time.

Journal article

Budd S, Patkee P, Baburamani A, Rutherford M, Robinson EC, Kainz B et al., 2021, Surface agnostic metrics for cortical volume segmentation and regression, The 3rd Workshop on Machine Learning in Clinical Neuroimaging, Publisher: Springer, Pages: 3-12, ISSN: 0302-9743

The cerebral cortex performs higher-order brain functions and is thus implicated in a range of cognitive disorders. Current analysis of cortical variation is typically performed by fitting surface mesh models to inner and outer cortical boundaries and investigating metrics such as surface area and cortical curvature or thickness. These, however, take a long time to run, and are sensitive to motion and image and surface resolution, which can prohibit their use in clinical settings. In this paper, we instead propose a machine learning solution, training a novel architecture to predict cortical thickness and curvature metrics from T2 MRI images, while additionally returning metrics of prediction uncertainty. Our proposed model is tested on a clinical cohort (Down Syndrome) for which surface-based modelling often fails. Results suggest that deep convolutional neural networks are a viable option to predict cortical metrics across a range of brain development stages and pathologies.

Conference paper

Reynaud H, Vlontzos A, Hou B, Beqiri A, Leeson P, Kainz B et al., 2021, Ultrasound video transformers for cardiac ejection fraction estimation, 24th International Conference on Medical Image Computing and Computer Assisted Intervention, Publisher: Springer, Pages: 495-505, ISSN: 0302-9743

Cardiac ultrasound imaging is used to diagnose various heart diseases. Common analysis pipelines involve manual processing of the video frames by expert clinicians. This suffers from intra- and inter-observer variability. We propose a novel approach to ultrasound video analysis using a transformer architecture based on a Residual Auto-Encoder Network and a BERT model adapted for token classification. This enables videos of any length to be processed. We apply our model to the task of End-Systolic (ES) and End-Diastolic (ED) frame detection and the automated computation of the left ventricular ejection fraction. We achieve an average frame distance of 3.36 frames for the ES and 7.17 frames for the ED on videos of arbitrary length. Our end-to-end learnable approach can estimate the ejection fraction with a MAE of 5.95 and R² of 0.52 in 0.15 s per video, showing that segmentation is not the only way to predict ejection fraction. Code and models are available at

Conference paper
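
For readers unfamiliar with the quantities reported in the abstract above, the sketch below shows the standard clinical definition of left ventricular ejection fraction from end-diastolic and end-systolic volumes, together with the mean absolute error (MAE) used to score predictions. It is not the paper's transformer model, which estimates the value directly from video; the function names and the example numbers are hypothetical.

```python
def ejection_fraction(edv_ml, esv_ml):
    """Ejection fraction in percent from end-diastolic and end-systolic volume."""
    if edv_ml <= 0:
        raise ValueError("end-diastolic volume must be positive")
    return 100.0 * (edv_ml - esv_ml) / edv_ml


def mean_absolute_error(predicted, reference):
    """Average absolute difference between paired predictions and references."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


# Hypothetical values for illustration only (not taken from the paper).
ef = ejection_fraction(edv_ml=120.0, esv_ml=50.0)
mae = mean_absolute_error([55.0, 60.0], [50.0, 62.0])
```

The same `mean_absolute_error` helper applies to frame indices, which is one plausible reading of the "average frame distance" reported for ES and ED detection.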

Hou B, Kaissis G, Summers RM, Kainz B et al., 2021, RATCHET: Medical transformer for chest X-ray diagnosis and reporting, 24th International Conference on Medical Image Computing and Computer Assisted Intervention, Publisher: Springer, Pages: 293-303, ISSN: 0302-9743

Chest radiographs are one of the most common diagnostic modalities in clinical routine. They can be acquired cheaply, require minimal equipment, and the images can be diagnosed by every radiologist. However, the number of chest radiographs obtained on a daily basis can easily overwhelm the available clinical capacities. We propose RATCHET: RAdiological Text Captioning for Human Examined Thoraces. RATCHET is a CNN-RNN-based medical transformer that is trained end-to-end. It is capable of extracting image features from chest radiographs, and generates medically accurate text reports that fit seamlessly into clinical work flows. The model is evaluated for its natural language generation ability using common metrics from NLP literature, as well as its medical accuracy through a surrogate report classification task. The model is available for download at:

Conference paper

Budd S, Sinclair M, Day T, Vlontzos A, Tan J, Liu T, Matthew J, Skelton E, Simpson J, Razavi R, Glocker B, Rueckert D, Robinson EC, Kainz B et al., 2021, Detecting hypo-plastic left heart syndrome in fetal ultrasound via disease-specific atlas maps, 24th International Conference on Medical Image Computing and Computer Assisted Intervention, Publisher: Springer, Pages: 207-217, ISSN: 0302-9743

Fetal ultrasound screening during pregnancy plays a vital role in the early detection of fetal malformations which have potential long-term health impacts. The level of skill required to diagnose such malformations from live ultrasound during examination is high and resources for screening are often limited. We present an interpretable, atlas-learning segmentation method for automatic diagnosis of Hypo-plastic Left Heart Syndrome (HLHS) from a single ‘4 Chamber Heart’ view image. We propose to extend the recently introduced Image-and-Spatial Transformer Networks (Atlas-ISTN) into a framework that enables sensitising atlas generation to disease. In this framework we can jointly learn image segmentation, registration, atlas construction and disease prediction while providing a maximum level of clinical interpretability compared to direct image classification methods. As a result our segmentation allows diagnoses competitive with expert-derived manual diagnosis and yields an AUC-ROC of 0.978 (1043 cases for training, 260 for validation and 325 for testing).

Conference paper

Ma Q, Robinson EC, Kainz B, Rueckert D, Alansary A et al., 2021, PialNN: A fast deep learning framework for cortical pial surface reconstruction, 24th International Conference on Medical Image Computing and Computer Assisted Intervention, Publisher: Springer, Pages: 73-81, ISSN: 0302-9743

Traditional cortical surface reconstruction is time consuming and limited by the resolution of brain Magnetic Resonance Imaging (MRI). In this work, we introduce Pial Neural Network (PialNN), a 3D deep learning framework for pial surface reconstruction. PialNN is trained end-to-end to deform an initial white matter surface to a target pial surface by a sequence of learned deformation blocks. A local convolutional operation is incorporated in each block to capture the multi-scale MRI information of each vertex and its neighborhood. This is fast and memory-efficient, which allows reconstructing a pial surface mesh with 150k vertices in one second. The performance is evaluated on the Human Connectome Project (HCP) dataset including T1-weighted MRI scans of 300 subjects. The experimental results demonstrate that PialNN reduces the geometric error of the predicted pial surface by 30% compared to state-of-the-art deep learning approaches. The codes are publicly available at

Conference paper

Tan J, Hou B, Day T, Simpson J, Rueckert D, Kainz B et al., 2021, Detecting outliers with Poisson image interpolation, 24th International Conference on Medical Image Computing and Computer Assisted Intervention, Publisher: Springer, Pages: 581-591, ISSN: 0302-9743

Supervised learning of every possible pathology is unrealistic for many primary care applications like health screening. Image anomaly detection methods that learn normal appearance from only healthy data have shown promising results recently. As an alternative to image reconstruction-based and image embedding-based methods, we propose a new self-supervised method to tackle pathological anomaly detection. Our approach originates in the foreign patch interpolation (FPI) strategy that has shown superior performance on brain MRI and abdominal CT data. We propose to use a better patch interpolation strategy, Poisson image interpolation (PII), which makes our method suitable for applications in challenging data regimes. PII outperforms state-of-the-art methods by a good margin when tested on surrogate tasks like identifying common lung anomalies in chest X-rays or hypo-plastic left heart syndrome in prenatal, fetal cardiac ultrasound images. Code available at

Conference paper

Li L, Sinclair M, Makropoulos A, Hajnal JV, David Edwards A, Kainz B, Rueckert D, Alansary A et al., 2021, CAS-Net: Conditional atlas generation and brain segmentation for fetal MRI, 24th International Conference on Medical Image Computing and Computer Assisted Intervention, Publisher: Springer, Pages: 221-230, ISSN: 0302-9743

Fetal Magnetic Resonance Imaging (MRI) is used in prenatal diagnosis and to assess early brain development. Accurate segmentation of the different brain tissues is a vital step in several brain analysis tasks, such as cortical surface reconstruction and tissue thickness measurements. Fetal MRI scans, however, are prone to motion artifacts that can affect the correctness of both manual and automatic segmentation techniques. In this paper, we propose a novel network structure that can simultaneously generate conditional atlases and predict brain tissue segmentation, called CAS-Net. The conditional atlases provide anatomical priors that can constrain the segmentation connectivity, despite the heterogeneity of intensity values caused by motion or partial volume effects. The proposed method is trained and evaluated on 253 subjects from the developing Human Connectome Project (dHCP). The results demonstrate that the proposed method can generate conditional age-specific atlases with sharp boundaries and shape variance. It also segments multi-category brain tissues for fetal MRI with a high overall Dice similarity coefficient (DSC) of 85.2% for the selected 9 tissue labels.

Conference paper
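
The Dice similarity coefficient (DSC) quoted in the abstract above is defined, for one label, as twice the overlap between predicted and reference voxels divided by the sum of their sizes. A minimal sketch follows, assuming segmentations are given as flat lists of integer labels; the function names, the toy label lists, and the convention of returning 1.0 when a label is absent from both inputs are assumptions made here, not details from the paper.

```python
def dice_coefficient(predicted, reference, label):
    """Dice similarity coefficient for one label over two flat label lists."""
    if len(predicted) != len(reference):
        raise ValueError("segmentations must have the same number of voxels")
    in_predicted = sum(1 for p in predicted if p == label)
    in_reference = sum(1 for r in reference if r == label)
    overlap = sum(
        1 for p, r in zip(predicted, reference) if p == label and r == label
    )
    if in_predicted + in_reference == 0:
        # Assumed convention: a label absent from both counts as perfect agreement.
        return 1.0
    return 2.0 * overlap / (in_predicted + in_reference)


def mean_dice(predicted, reference, labels):
    """Unweighted mean of the per-label Dice coefficients."""
    return sum(dice_coefficient(predicted, reference, l) for l in labels) / len(labels)


# Toy example with two tissue labels (hypothetical data).
predicted = [1, 1, 0, 2]
reference = [1, 0, 0, 2]
overall = mean_dice(predicted, reference, labels=[1, 2])
```

Whether the paper's "overall" DSC is an unweighted mean over the 9 labels, as in `mean_dice`, is not stated in the abstract.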

Chartsias A, Gao S, Mumith A, Oliveira J, Bhatia K, Kainz B, Beqiri A et al., 2021, Contrastive learning for view classification of echocardiograms, 24th International Conference on Medical Image Computing and Computer Assisted Intervention, Publisher: Springer, Pages: 149-158, ISSN: 0302-9743

Analysis of cardiac ultrasound images is commonly performed in routine clinical practice for quantification of cardiac function. Its increasing automation frequently employs deep learning networks that are trained to predict disease or detect image features. However, such models are extremely data-hungry and training requires labelling of many thousands of images by experienced clinicians. Here we propose the use of contrastive learning to mitigate the labelling bottleneck. We train view classification models for imbalanced cardiac ultrasound datasets and show improved performance for views/classes for which minimal labelled data is available. Compared to a naïve baseline model, we achieve an improvement in F1 score of up to 26% in those views while maintaining state-of-the-art performance for the views with sufficiently many labelled training observations.

Conference paper

Kainz B, Makropoulos A, Oppenheimer J, Deane C, Mischkewitz S, Al-Noor F, Rawdin AC, Stevenson MD, Mandegaran R, Heinrich MP, Curry N, Sankar S, Ruttloff A, Klein-Weigel P et al., 2021, Non-invasive diagnosis of deep vein thrombosis from ultrasound imaging with machine learning, npj Digital Medicine, Vol: 4, Pages: 1-15, ISSN: 2398-6352

Deep vein thrombosis (DVT) is a blood clot most commonly found in the leg, which can lead to fatal pulmonary embolism (PE). Compression ultrasound of the legs is the diagnostic gold standard, leading to a definitive diagnosis. However, many patients with possible symptoms are not found to have a DVT, resulting in long referral waiting times for patients and a large clinical burden for specialists. Thus, diagnosis at the point of care by non-specialists is desired. We collect images in a pre-clinical study and investigate a deep learning approach for the automatic interpretation of compression ultrasound images. Our method provides guidance for free-hand ultrasound and aids non-specialists in detecting DVT. We train a deep learning algorithm on ultrasound videos from 255 volunteers and evaluate on a sample size of 53 prospectively enrolled patients from an NHS DVT diagnostic clinic and 30 prospectively enrolled patients from a German DVT clinic. Algorithmic DVT diagnosis performance results in a sensitivity within a 95% CI range of (0.82, 0.94), specificity of (0.70, 0.82), a positive predictive value of (0.65, 0.89), and a negative predictive value of (0.99, 1.00) when compared to the clinical gold standard. To assess the potential benefits of this technology in healthcare we evaluate the entire clinical DVT decision algorithm and provide cost analysis when integrating our approach into diagnostic pathways for DVT. Our approach is estimated to generate a positive net monetary benefit at costs up to £72 to £175 per software-supported examination, assuming a willingness to pay of £20,000/QALY.

Journal article
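
The four diagnostic measures reported in the abstract above all derive from the counts of a 2x2 confusion matrix against the clinical gold standard. The sketch below shows the standard definitions; the function name and the example counts are hypothetical and are not taken from the study, and the confidence intervals reported in the abstract are not reproduced here.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity, PPV and NPV from confusion-matrix counts."""

    def ratio(numerator, denominator):
        # Undefined ratios (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # true positives among diseased
        "specificity": ratio(tn, tn + fp),  # true negatives among healthy
        "ppv": ratio(tp, tp + fp),          # positive predictive value
        "npv": ratio(tn, tn + fn),          # negative predictive value
    }


# Hypothetical counts for illustration only (not taken from the study).
metrics = diagnostic_metrics(tp=9, fp=3, tn=27, fn=1)
```

A high negative predictive value, as reported in the abstract, means that a negative result from the tool rarely misses a DVT, which is the property that matters most for ruling patients out at the point of care.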

Chotzoglou E, Day T, Tan J, Matthew J, Lloyd D, Razavi R, Simpson J, Kainz B et al., 2021, Learning normal appearance for fetal anomaly screening: application to the unsupervised detection of Hypoplastic Left Heart Syndrome, Journal of Machine Learning for Biomedical Imaging, Vol: 2021, Pages: 1-25

Congenital heart disease is considered one of the most common groups of congenital malformations, affecting 6–11 per 1000 newborns. In this work, an automated framework for detection of cardiac anomalies during ultrasound screening is proposed and evaluated on the example of Hypoplastic Left Heart Syndrome (HLHS), a sub-category of congenital heart disease. We propose an unsupervised approach that learns healthy anatomy exclusively from clinically confirmed normal control patients. We evaluate a number of known anomaly detection frameworks together with a new model architecture based on the α-GAN network and find evidence that the proposed model performs significantly better than the state-of-the-art in image-based anomaly detection, yielding an average AUC of 0.81 and better robustness towards initialisation compared to previous works.

Journal article

Budd S, Robinson E, Kainz B, 2021, A survey on active learning and human-in-the-loop deep learning for medical image analysis, Medical Image Analysis, Vol: 71, ISSN: 1361-8415

Fully automatic deep learning has become the state-of-the-art technique for many tasks including image acquisition, analysis and interpretation, and for the extraction of clinically useful information for computer-aided detection, diagnosis, treatment planning, intervention and therapy. However, the unique challenges posed by medical image analysis suggest that retaining a human end-user in any deep learning enabled system will be beneficial. In this review we investigate the role that humans might play in the development and deployment of deep learning enabled diagnostic applications and focus on techniques that will retain a significant input from a human end user. Human-in-the-Loop computing is an area that we see as increasingly important in future research due to the safety-critical nature of working in the medical domain. We evaluate four key areas that we consider vital for deep learning in the clinical practice: (1) Active Learning: to choose the best data to annotate for optimal model performance; (2) Interaction with model outputs: using iterative feedback to steer models to optima for a given prediction and offering meaningful ways to interpret and respond to predictions; (3) Practical considerations: developing full scale applications and the key considerations that need to be made before deployment; (4) Future Prospective and Unanswered Questions: knowledge gaps and related research fields that will benefit human-in-the-loop computing as they evolve. We offer our opinions on the most promising directions of research and how various aspects of each area might be unified towards common goals.

Journal article

Day TG, Kainz B, Hajnal J, Razavi R, Simpson JM et al., 2021, Artificial intelligence, fetal echocardiography, and congenital heart disease, Prenatal Diagnosis, Vol: 41, ISSN: 0197-3851

There has been a recent explosion in the use of artificial intelligence (AI), which is now part of our everyday lives. Uptake in medicine has been more limited, although in several fields there have been encouraging results showing excellent performance when AI is used to assist in a well-defined medical task. Most of this work has been performed using retrospective data, and there have been few clinical trials published using prospective data. This review focuses on the potential uses of AI in the field of fetal cardiology. Ultrasound of the fetal heart is highly specific and sensitive in experienced hands, but despite this there is significant room for improvement in the rates of prenatal diagnosis of congenital heart disease in most countries. AI may be one way of improving this. Other potential applications in fetal cardiology include the provision of more accurate prognoses for individuals, and automatic quantification of various metrics including cardiac function. However, there are also ethical and governance concerns. These will need to be overcome before AI can be widely accepted in mainstream use. It is likely that familiarity with the uses, and pitfalls, of AI will soon be mandatory for many healthcare professionals working in fetal cardiology.

Journal article

Skelton E, Matthew J, Li Y, Khanal B, Martinez JJC, Toussaint N, Gupta C, Knight C, Kainz B, Hajnal JV, Rutherford M et al., 2021, Towards automated extraction of 2D standard fetal head planes from 3D ultrasound acquisitions: A clinical evaluation and quality assessment comparison, Radiography, Vol: 27, Pages: 519-526, ISSN: 1078-8174

Introduction: Clinical evaluation of deep learning (DL) tools is essential to complement technical accuracy metrics. This study assessed the image quality of standard fetal head planes automatically-extracted from three-dimensional (3D) ultrasound fetal head volumes using a customised DL-algorithm. Methods: Two observers retrospectively reviewed standard fetal head planes against pre-defined image quality criteria. Forty-eight images (29 transventricular, 19 transcerebellar) were selected from 91 transabdominal fetal scans (mean gestational age = 26 completed weeks, range = 20+5–32+3 weeks). Each had two-dimensional (2D) manually-acquired (2D-MA), 3D operator-selected (3D-OS) and 3D-DL automatically-acquired (3D-DL) images. The proportion of adequate images from each plane and modality, and the number of inadequate images per plane were compared for each method. Inter- and intra-observer agreement of overall image quality was calculated. Results: Sixty-seven percent of 3D-OS and 3D-DL transventricular planes were adequate quality. Forty-five percent of 3D-OS and 55% of 3D-DL transcerebellar planes were adequate. Seventy-one percent of 3D-OS and 86% of 3D-DL transventricular planes failed with poor visualisation of intra-cranial structures. Eighty-six percent of 3D-OS and 80% of 3D-DL transcerebellar planes failed due to inadequate visualisation of cerebellar hemispheres. Image quality was significantly different between 2D and 3D, however, no significant difference between 3D-modalities was demonstrated (p < 0.005). Inter-observer agreement of transventricular plane adequacy was moderate for both 3D-modalities, and weak for transcerebellar planes. Conclusion: The 3D-DL algorithm can automatically extract standard fetal head planes from 3D-head volumes of comparable quality to operator-selected planes. Image quality in 3D is inferior to corresponding 2D planes, likely due to limitations with 3D-technology and acquisition technique. Implications for practice: Automated image

Journal article

Dou Q, So TY, Jiang M, Liu Q, Vardhanabhuti V, Kaissis G, Li Z, Si W, Lee HHC, Yu K, Feng Z, Dong L, Burian E, Jungmann F, Braren R, Makowski M, Kainz B, Rueckert D, Glocker B, Yu SCH, Heng PA et al., 2021, Federated deep learning for detecting COVID-19 lung abnormalities in CT: a privacy-preserving multinational validation study, npj Digital Medicine, Vol: 4, Pages: 1-11, ISSN: 2398-6352

Data privacy mechanisms are essential for rapidly scaling medical training databases to capture the heterogeneity of patient data distributions toward robust and generalizable machine learning systems. In the current COVID-19 pandemic, a major focus of artificial intelligence (AI) is interpreting chest CT, which can be readily used in the assessment and management of the disease. This paper demonstrates the feasibility of a federated learning method for detecting COVID-19 related CT abnormalities with external validation on patients from a multinational study. We recruited 132 patients from seven different multinational centers, with three internal hospitals from Hong Kong for training and testing, and four external, independent datasets from Mainland China and Germany, for validating model generalizability. We also conducted case studies on longitudinal scans for automated estimation of lesion burden for hospitalized COVID-19 patients. We explore federated learning algorithms to develop a privacy-preserving AI model for COVID-19 medical image diagnosis with good generalization capability on unseen multinational datasets. Federated learning could provide an effective mechanism during pandemics to rapidly develop clinically useful AI across institutions and countries, overcoming the burden of central aggregation of large amounts of sensitive data.

Journal article

Giulio J, Kainz B, 2021, Deep radiance caching: Convolutional autoencoders deeper in ray tracing, Computers and Graphics (UK), Vol: 94, Pages: 22-31, ISSN: 0097-8493

Rendering realistic images with global illumination is a computationally demanding task and often requires dedicated hardware for feasible runtime. Recent research uses Deep Neural Networks to predict indirect lighting on image level, but such methods are commonly limited to diffuse materials and require training on each scene. We present Deep Radiance Caching (DRC), an efficient variant of Radiance Caching utilizing Convolutional Autoencoders for rendering global illumination. DRC employs a denoising neural network with Radiance Caching to support a wide range of material types, without the requirement of offline pre-computation or training for each scene. This offers high performance CPU rendering for maximum accessibility. Our method has been evaluated on interior scenes, and is able to produce high-quality images within 180 s on a single CPU.

Journal article

Meng Q, Matthew J, Zimmer VA, Gomez A, Lloyd DFA, Rueckert D, Kainz B et al., 2021, Mutual information-based disentangled neural networks for classifying unseen categories in different domains: application to fetal ultrasound imaging, IEEE Transactions on Medical Imaging, Vol: 40, Pages: 722-734, ISSN: 0278-0062

Deep neural networks exhibit limited generalizability across images with different entangled domain features and categorical features. Learning generalizable features that can form universal categorical decision boundaries across domains is an interesting and difficult challenge. This problem occurs frequently in medical imaging applications when attempts are made to deploy and improve deep learning models across different image acquisition devices, across acquisition parameters or if some classes are unavailable in new training databases. To address this problem, we propose Mutual Information-based Disentangled Neural Networks (MIDNet), which extract generalizable categorical features to transfer knowledge to unseen categories in a target domain. The proposed MIDNet adopts a semi-supervised learning paradigm to alleviate the dependency on labeled data. This is important for real-world applications where data annotation is time-consuming, costly and requires training and expertise. We extensively evaluate the proposed method on fetal ultrasound datasets for two different image classification tasks where domain features are respectively defined by shadow artifacts and image acquisition devices. Experimental results show that the proposed method outperforms the state-of-the-art on the classification of unseen categories in a target domain with sparsely labeled training data.

Journal article

Kainz B, Makropoulos A, Oppenheimer J, Deane C, Mischkewitz S, Al-Noor F, Rawdin AC, Stevenson MD, Mandegaran R, Heinrich MP, Curry N et al., 2021, Non-invasive Diagnosis of Deep Vein Thrombosis from Ultrasound with Machine Learning, Publisher: Cold Spring Harbor Laboratory

Deep Vein Thrombosis (DVT) is a blood clot most found in the leg, which can lead to fatal pulmonary embolism (PE). Compression ultrasound of the legs is the diagnostic gold standard, leading to a definitive diagnosis. However, many patients with possible symptoms are not found to have a DVT, resulting in long referral waiting times for patients and a large clinical burden for specialists. Thus, diagnosis at the point of care by non-specialists is desired. We collect images in a pre-clinical study and investigate a deep learning approach for the automatic interpretation of compression ultrasound images. Our method provides guidance for free-hand ultrasound and aids non-specialists in detecting DVT. We train a deep learning algorithm on ultrasound videos from 246 healthy volunteers and evaluate on a sample size of 51 prospectively enrolled patients from an NHS DVT diagnostic clinic. 32 DVT-positive patients and 19 DVT-negative patients were included. Algorithmic DVT diagnosis results in a sensitivity of 93.8% and a specificity of 84.2%, a positive predictive value of 90.9%, and a negative predictive value of 88.9% compared to the clinical gold standard. To assess the potential benefits of this technology in healthcare we evaluate the entire clinical DVT decision algorithm and provide cost analysis when integrating our approach into a diagnostic pathway for DVT. Our approach is estimated to be cost effective at up to $150 per software examination, assuming a willingness to pay $26 000/QALY.

Working paper

Budd S, Day T, Simpson J, Lloyd K, Matthew J, Skelton E, Razavi R, Kainz B et al., 2021, Can Non-specialists Provide High Quality Gold Standard Labels in Challenging Modalities?, Editors: Albarqouni, Cardoso, Dou, Kamnitsas, Khanal, Rekik, Rieke, Sheet, Tsaftaris, Xu, Xu, Publisher: SPRINGER INTERNATIONAL PUBLISHING AG, Pages: 251-262, ISBN: 978-3-030-87721-7

Book chapter

Miolane N, Guigui N, Le Brigant A, Mathe J, Hou B, Thanwerdas Y, Heyder S, Peltre O, Koep N, Zaatiti H, Hajri H, Cabanes Y, Gerald T, Chauchat P, Shewmake C, Brooks D, Kainz B, Donnat C, Holmes S, Pennec X et al., 2020, Geomstats: a Python package for Riemannian geometry in machine learning, Journal of Machine Learning Research, Vol: 21, Pages: 1-9, ISSN: 1532-4435

We introduce Geomstats, an open-source Python package for computations and statistics on nonlinear manifolds such as hyperbolic spaces, spaces of symmetric positive definite matrices, Lie groups of transformations, and many more. We provide object-oriented and extensively unit-tested implementations. Manifolds come equipped with families of Riemannian metrics with associated exponential and logarithmic maps, geodesics, and parallel transport. Statistics and learning algorithms provide methods for estimation, clustering, and dimension reduction on manifolds. All associated operations are vectorized for batch computation and provide support for different execution backends, namely NumPy, PyTorch, and TensorFlow. This paper presents the package, compares it with related libraries, and provides relevant code examples. We show that Geomstats provides reliable building blocks to both foster research in differential geometry and statistics and democratize the use of Riemannian geometry in machine learning applications. The source code is freely available under the MIT license at

Journal article

Hou B, Vlontzos A, Alansary A, Rueckert D, Kainz B et al., 2020, Flexible conditional image generation of missing data with learned mental maps, Machine Learning for Medical Image Reconstruction: Second International Workshop, Publisher: Springer International Publishing, Pages: 139-150, ISSN: 0302-9743

Real-world settings often do not allow acquisition of high-resolution volumetric images for accurate morphological assessment and diagnosis. In clinical practice it is common to acquire only sparse data (e.g. individual slices) for initial diagnostic decision making. Physicians therefore rely on their prior knowledge (or mental maps) of the human anatomy to extrapolate the underlying 3D information. Accurate mental maps require years of anatomy training, which in the first instance relies on normative learning, i.e. excluding pathology. In this paper, we leverage Bayesian Deep Learning and environment mapping to generate full volumetric anatomy representations from none to a small, sparse set of slices. We evaluate proof of concept implementations based on Generative Query Networks (GQN) and Conditional BRUNO using abdominal CT and brain MRI as well as in a clinical application involving sparse, motion-corrupted MR acquisition for fetal imaging. Our approach allows reconstruction of 3D volumes from 1 to 4 tomographic slices, with a SSIM of 0.7+ and cross-correlation of 0.8+ compared to the 3D ground truth.

Conference paper

Chotzoglou E, Kainz B, 2020, Exploring the Relationship Between Segmentation Uncertainty, Segmentation Performance and Inter-observer Variability with Probabilistic Networks, Large-Scale Annotation of Biomedical Data and Expert Label Synthesis and Hardware Aware Learning for Medical Imaging and Computer Assisted Intervention, Publisher: Springer International Publishing, Pages: 51-60, ISSN: 0302-9743

Conference paper

This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.
