287 results found
Islam M, Seenivasan L, Sharan SP, et al., 2023, Paced-curriculum distillation with prediction and label uncertainty for image segmentation, International Journal of Computer Assisted Radiology and Surgery, Vol: 18, Pages: 1875-1883, ISSN: 1861-6410
PURPOSE: In curriculum learning, the idea is to train on easier samples first and gradually increase the difficulty, while in self-paced learning, a pacing function defines the speed to adapt the training progress. While both methods heavily rely on the ability to score the difficulty of data samples, an optimal scoring function is still under exploration. METHODOLOGY: Distillation is a knowledge transfer approach where a teacher network guides a student network by feeding a sequence of random samples. We argue that guiding student networks with an efficient curriculum strategy can improve model generalization and robustness. For this purpose, we design an uncertainty-based paced curriculum learning in self-distillation for medical image segmentation. We fuse the prediction uncertainty and annotation boundary uncertainty to develop a novel paced-curriculum distillation (P-CD). We utilize the teacher model to obtain prediction uncertainty and spatially varying label smoothing with a Gaussian kernel to generate segmentation boundary uncertainty from the annotation. We also investigate the robustness of our method by applying various types and severities of image perturbation and corruption. RESULTS: The proposed technique is validated on two medical datasets of breast ultrasound image segmentation and robot-assisted surgical scene segmentation and achieved significantly better performance in terms of segmentation and robustness. CONCLUSION: P-CD improves the performance and obtains better generalization and robustness over the dataset shift. While curriculum learning requires extensive tuning of hyper-parameters for the pacing function, the level of performance improvement outweighs this limitation.
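The spatially varying label smoothing used above to derive boundary uncertainty can be sketched in a few lines. This is a 1D toy with illustrative names, not the paper's code; the paper operates on 2D segmentation masks, but the principle is the same: convolving the binary annotation with a Gaussian kernel leaves interior pixels near 0/1 and softens only the labels near class boundaries.

```python
import math

def gaussian_kernel(size=3, sigma=1.0):
    # Small 1D Gaussian kernel, normalized to sum to 1 (illustrative).
    center = size // 2
    k = [math.exp(-((i - center) ** 2) / (2 * sigma ** 2)) for i in range(size)]
    s = sum(k)
    return [v / s for v in k]

def svls_1d(labels, sigma=1.0):
    """Spatially varying label smoothing sketch: convolve a binary 1D
    label row with a Gaussian kernel so that soft labels deviate from
    0/1 only near class boundaries (replicate padding at the borders)."""
    k = gaussian_kernel(3, sigma)
    n = len(labels)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(k):
            idx = min(max(i + j - 1, 0), n - 1)  # replicate-pad borders
            acc += w * labels[idx]
        out.append(acc)
    return out

row = [0, 0, 0, 1, 1, 1]        # binary annotation row
soft = svls_1d(row, sigma=1.0)  # only indices 2-3 (the boundary) soften
```

Interior pixels keep (near-)hard labels, so the smoothing penalty is concentrated exactly where annotation uncertainty is highest.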
Siqueira Pinto M, Winzeck S, Kornaropoulos EN, et al., 2023, Use of Support Vector Machines Approach via ComBat Harmonized Diffusion Tensor Imaging for the Diagnosis and Prognosis of Mild Traumatic Brain Injury: A CENTER-TBI Study., J Neurotrauma, Vol: 40, Pages: 1317-1338
The prediction of functional outcome after mild traumatic brain injury (mTBI) is challenging. Conventional magnetic resonance imaging (MRI) explains little of the variance in outcome, as many patients with incomplete recovery will have normal-appearing clinical neuroimaging. More advanced quantitative techniques such as diffusion MRI (dMRI) can detect microstructural changes not otherwise visible, and so may offer a way to improve outcome prediction. In this study, we explore the potential of linear support vector classifiers (linearSVCs) to identify dMRI biomarkers that can predict recovery after mTBI. Simultaneously, the harmonization of fractional anisotropy (FA) and mean diffusivity (MD) via ComBat was evaluated and compared for the classification performance of the linearSVCs. We included dMRI scans of 179 mTBI patients and 85 controls from the Collaborative European NeuroTrauma Effectiveness Research in Traumatic Brain Injury (CENTER-TBI), a multi-center prospective cohort study, up to 21 days post-injury. Patients were dichotomized according to their Extended Glasgow Outcome Scale (GOSE) scores at 6 months into complete (n = 92; GOSE = 8) and incomplete (n = 87; GOSE <8) recovery. FA and MD maps were registered to a common space and harmonized via the ComBat algorithm. LinearSVCs were applied to distinguish: (1) mTBI patients from controls and (2) mTBI patients with complete from those with incomplete recovery. The linearSVCs were trained on (1) age and sex only, (2) non-harmonized, (3) two-category-harmonized ComBat, and (4) three-category-harmonized ComBat FA and MD images combined with age and sex. White matter FA and MD voxels and regions of interest (ROIs) within the Johns Hopkins University (JHU) atlas were examined. Recursive feature elimination was used to identify the 10% most discriminative voxels or the 10 most discriminative ROIs for each implementation. mTBI patients displayed signif
Rockall AG, Li X, Johnson N, et al., 2023, Development and evaluation of machine learning in whole-body magnetic resonance imaging for detecting metastases in patients with lung or colon cancer: a diagnostic test accuracy study., Investigative Radiology, ISSN: 0020-9996
OBJECTIVES: Whole-body magnetic resonance imaging (WB-MRI) has been demonstrated to be efficient and cost-effective for cancer staging. The study aim was to develop a machine learning (ML) algorithm to improve radiologists' sensitivity and specificity for metastasis detection and reduce reading times. MATERIALS AND METHODS: A retrospective analysis of 438 prospectively collected WB-MRI scans from multicenter Streamline studies (February 2013-September 2016) was undertaken. Disease sites were manually labeled using the Streamline reference standard. Whole-body MRI scans were randomly allocated to training and testing sets. A model for malignant lesion detection was developed based on convolutional neural networks and a 2-stage training strategy. The final algorithm generated lesion probability heat maps. Using a concurrent reader paradigm, 25 radiologists (18 experienced, 7 inexperienced in WB-MRI) were randomly allocated WB-MRI scans with or without ML support to detect malignant lesions over 2 or 3 reading rounds. Reads were undertaken in the setting of a diagnostic radiology reading room between November 2019 and March 2020. Reading times were recorded by a scribe. Prespecified analysis included sensitivity, specificity, interobserver agreement, and reading time of radiology readers to detect metastases with or without ML support. Reader performance for detection of the primary tumor was also evaluated. RESULTS: Four hundred thirty-three evaluable WB-MRI scans were allocated to algorithm training (245) or radiology testing (50 patients with metastases, from primary colon [n = 117] or lung [n = 71] cancer). Among a total 562 reads by experienced radiologists over 2 reading rounds, per-patient specificity was 86.2% (ML) and 87.7% (non-ML) (-1.5% difference; 95% confidence interval [CI], -6.4%, 3.5%; P = 0.39). Sensitivity was 66.0% (ML) and 70.0% (non-ML) (-4.0% difference; 95% CI, -13.5%, 5.5%; P = 0.344). Among 161 reads by inexperienced readers, per-patient spec
Mccradden M, Odusi O, Joshi S, et al., 2023, What's fair is ⋯ fair? Presenting JustEFAB, an ethical framework for operationalizing medical ethics and social justice in the integration of clinical machine learning, Pages: 1505-1519
The problem of algorithmic bias represents an ethical threat to the fair treatment of patients when their care involves machine learning (ML) models informing clinical decision-making. The design, development, testing, and integration of ML models therefore require a lifecycle approach to bias identification and mitigation efforts. Presently, most work focuses on the ML tool alone, neglecting the larger sociotechnical context in which these models operate. Moreover, the narrow focus on technical definitions of fairness must be integrated within the larger context of medical ethics in order to facilitate equitable care with ML. Drawing from principles of medical ethics, research ethics, feminist philosophy of science, and justice-based theories, we describe the Justice, Equity, Fairness, and Anti-Bias (JustEFAB) guideline intended to support the design, testing, validation, and clinical evaluation of ML models with respect to algorithmic fairness. This paper describes JustEFAB's development and vetting through multiple advisory groups and the lifecycle approach to addressing fairness in clinical ML tools. We present an ethical decision-making framework to support design and development, adjudication between ethical values as design choices, silent trial evaluation, and prospective clinical evaluation guided by medical ethics and social justice principles. We provide some preliminary considerations for oversight and safety to support ongoing attention to fairness issues. We envision this guideline as useful to many stakeholders, including ML developers, healthcare decision-makers, research ethics committees, regulators, and other parties who have interest in the fair and judicious use of clinical ML tools.
Li Z, Kamnitsas K, Dou Q, et al., 2023, Joint Optimization of Class-Specific Training- and Test-Time Data Augmentation in Segmentation., IEEE Trans Med Imaging, Vol: PP
This paper presents an effective and general data augmentation framework for medical image segmentation. We adopt a computationally efficient and data-efficient gradient-based meta-learning scheme to explicitly align the distribution of training and validation data, which is used as a proxy for unseen test data. We improve the current data augmentation strategies with two core designs. First, we learn class-specific training-time data augmentation (TRA), effectively increasing the heterogeneity within the training subsets and tackling the class imbalance common in segmentation. Second, we jointly optimize TRA and test-time data augmentation (TEA), which are closely connected as both aim to align the training and test data distributions but were so far considered separately in previous works. We demonstrate the effectiveness of our method on four medical image segmentation tasks across different scenarios with two state-of-the-art segmentation models, DeepMedic and nnU-Net. Extensive experimentation shows that the proposed data augmentation framework can significantly and consistently improve the segmentation performance when compared to existing solutions. Code is publicly available.
Li Z, Kamnitsas K, Ouyang C, et al., 2023, Context label learning: improving background class representations in semantic segmentation, IEEE Transactions on Medical Imaging, Vol: 42, Pages: 1885-1896, ISSN: 0278-0062
Background samples provide key contextual information for segmenting regions of interest (ROIs). However, they always cover a diverse set of structures, causing difficulties for the segmentation model to learn good decision boundaries with high sensitivity and precision. The issue concerns the highly heterogeneous nature of the background class, resulting in multi-modal distributions. Empirically, we find that neural networks trained with heterogeneous background struggle to map the corresponding contextual samples to compact clusters in feature space. As a result, the distribution over background logit activations may shift across the decision boundary, leading to systematic over-segmentation across different datasets and tasks. In this study, we propose context label learning (CoLab) to improve the context representations by decomposing the background class into several subclasses. Specifically, we train an auxiliary network as a task generator, along with the primary segmentation model, to automatically generate context labels that positively affect the ROI segmentation accuracy. Extensive experiments are conducted on several challenging segmentation tasks and datasets. The results demonstrate that CoLab can guide the segmentation model to map the logits of background samples away from the decision boundary, resulting in significantly improved segmentation accuracy. Code is available.
Mackay K, Bernstein D, Glocker B, et al., 2023, A Review of the Metrics Used to Assess Auto-Contouring Systems in Radiotherapy., Clin Oncol (R Coll Radiol), Vol: 35, Pages: 354-369
Auto-contouring could revolutionise future planning of radiotherapy treatment. The lack of consensus on how to assess and validate auto-contouring systems currently limits clinical use. This review formally quantifies the assessment metrics used in studies published during one calendar year and assesses the need for standardised practice. A PubMed literature search was undertaken for papers evaluating radiotherapy auto-contouring published during 2021. Papers were assessed for types of metric and the methodology used to generate ground-truth comparators. Our PubMed search identified 212 studies, of which 117 met the criteria for clinical review. Geometric assessment metrics were used in 116 of 117 studies (99.1%). This includes the Dice Similarity Coefficient used in 113 (96.6%) studies. Clinically relevant metrics, such as qualitative, dosimetric and time-saving metrics, were less frequently used in 22 (18.8%), 27 (23.1%) and 18 (15.4%) of 117 studies, respectively. There was heterogeneity within each category of metric. Over 90 different names for geometric measures were used. Methods for qualitative assessment were different in all but two papers. Variation existed in the methods used to generate radiotherapy plans for dosimetric assessment. Consideration of editing time was only given in 11 (9.4%) papers. A single manual contour as a ground-truth comparator was used in 65 (55.6%) studies. Only 31 (26.5%) studies compared auto-contours to usual inter- and/or intra-observer variation. In conclusion, significant variation exists in how research papers currently assess the accuracy of automatically generated contours. Geometric measures are the most popular, however their clinical utility is unknown. There is heterogeneity in the methods used to perform clinical assessment. Considering the different stages of system implementation may provide a framework to decide the most appropriate metrics. This analysis supports the need for a consensus on the clinical implement
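Since the Dice Similarity Coefficient dominates the surveyed literature (113 of 117 studies), a minimal reference implementation may be useful. Binary masks are flattened to 0/1 lists here for brevity; the function name and toy masks are illustrative, not from any surveyed system.

```python
def dice_coefficient(a, b):
    """Dice Similarity Coefficient between two binary masks
    (flattened 0/1 sequences): DSC = 2*|A intersect B| / (|A| + |B|)."""
    inter = sum(x * y for x, y in zip(a, b))
    denom = sum(a) + sum(b)
    # Convention: two empty masks are a perfect match.
    return 1.0 if denom == 0 else 2.0 * inter / denom

auto   = [1, 1, 1, 0, 0]   # auto-contour, flattened
manual = [0, 1, 1, 1, 0]   # single manual ground-truth contour
print(dice_coefficient(auto, manual))  # → 0.6666666666666666
```

Note that a high DSC says nothing about dosimetric impact or editing time, which is precisely the gap the review identifies.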
Li L, Heselgrave A, Soreq E, et al., 2023, Investigating the characteristics and correlates of systemic inflammation after traumatic brain injury: the TBI-BraINFLAMM study, BMJ Open, Vol: 13, ISSN: 2044-6055
Introduction: A significant environmental risk factor for neurodegenerative disease is traumatic brain injury (TBI). However, it is not clear how TBI results in ongoing chronic neurodegeneration. Animal studies show that systemic inflammation is signalled to the brain. This can result in sustained and aggressive microglial activation, which in turn is associated with widespread neurodegeneration. We aim to evaluate systemic inflammation as a mediator of ongoing neurodegeneration after TBI. Methods and analysis: TBI-BraINFLAMM will combine data already collected from two large prospective TBI studies. The CREACTIVE study, a broad consortium which enrolled >8000 patients with TBI to have CT scans and blood samples in the hyperacute period, has data available from 854 patients. The BIO-AX-TBI study recruited 311 patients to have acute CT scans, longitudinal blood samples and longitudinal MRI brain scans. The BIO-AX-TBI study also has data from 102 healthy and 24 non-TBI trauma controls, comprising blood samples (both control groups) and MRI scans (healthy controls only). All blood samples from BIO-AX-TBI and CREACTIVE have already been tested for neuronal injury markers (GFAP, tau and NfL), and CREACTIVE blood samples have been tested for inflammatory cytokines. We will additionally test inflammatory cytokine levels from the already collected longitudinal blood samples in the BIO-AX-TBI study, as well as matched microdialysate and blood samples taken during the acute period from a subgroup of patients with TBI (n=18). We will use this unique dataset to characterise post-TBI systemic inflammation, and its relationships with injury severity and ongoing neurodegeneration. Ethics and dissemination: Ethical approval for this study has been granted by the London - Camberwell St Giles Research Ethics Committee (17/LO/2066). Results will be submitted for publication in peer-review journals, presented at conferences and inform the design of larger observational and experime
Sharma N, Ng AY, James JJ, et al., 2023, Multi-vendor evaluation of artificial intelligence as an independent reader for double reading in breast cancer screening on 275,900 mammograms., BMC Cancer, Vol: 23
BACKGROUND: Double reading (DR) in screening mammography increases cancer detection and lowers recall rates, but has sustainability challenges due to workforce shortages. Artificial intelligence (AI) as an independent reader (IR) in DR may provide a cost-effective solution with the potential to improve screening performance. Evidence for AI to generalise across different patient populations, screening programmes and equipment vendors, however, is still lacking. METHODS: This retrospective study simulated DR with AI as an IR, using data representative of real-world deployments (275,900 cases, 177,882 participants) from four mammography equipment vendors, seven screening sites, and two countries. Non-inferiority and superiority were assessed for relevant screening metrics. RESULTS: DR with AI, compared with human DR, showed at least non-inferior recall rate, cancer detection rate, sensitivity, specificity and positive predictive value (PPV) for each mammography vendor and site, and superior recall rate, specificity, and PPV for some. The simulation indicates that using AI would have increased arbitration rate (3.3% to 12.3%), but could have reduced human workload by 30.0% to 44.8%. CONCLUSIONS: AI has potential as an IR in the DR workflow across different screening programmes, mammography equipment and geographies, substantially reducing human reader workload while maintaining or improving standard of care. TRIAL REGISTRATION: ISRCTN18056078 (20/03/2019; retrospectively registered).
Ng AY, Glocker B, Oberije C, et al., 2023, Artificial Intelligence as Supporting Reader in Breast Screening: A Novel Workflow to Preserve Quality and Reduce Workload, Journal of Breast Imaging, Vol: 5, Pages: 267-276, ISSN: 2631-6110
Objective: To evaluate the effectiveness of a new strategy for using artificial intelligence (AI) as supporting reader for the detection of breast cancer in mammography-based double reading screening practice. Methods: Large-scale multi-site, multi-vendor data were used to retrospectively evaluate a new paradigm of AI-supported reading. Here, the AI served as the second reader only if it agreed with the recall/no-recall decision of the first human reader. Otherwise, a second human reader made an assessment, followed by the standard clinical workflow. The data included 280 594 cases from 180 542 female participants screened for breast cancer at seven screening sites in two countries and using equipment from four hardware vendors. The statistical analysis included non-inferiority and superiority testing of cancer screening performance and evaluation of the reduction in workload, measured as arbitration rate and number of cases requiring second human reading. Results: Artificial intelligence as a supporting reader was found to be superior or non-inferior on all screening metrics compared with human double reading while reducing the number of cases requiring second human reading by up to 87% (245 395/280 594). Compared with AI as an independent reader, the number of cases referred to arbitration was reduced from 13% (35 199/280 594) to 2% (5056/280 594). Conclusion: The simulation indicates that the proposed workflow retains screening performance of human double reading while substantially reducing the workload. Further research should study the impact on the second human reader because they would only assess cases in which the AI prediction and first human reader disagree.
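The recall/no-recall routing described above can be sketched as follows. This is a deliberate simplification with illustrative names: the downstream arbitration step of the standard clinical workflow is not modelled, and the function is not taken from the study's code.

```python
def supporting_reader_workflow(ai_recall, reader1_recall, reader2_recall):
    """Sketch of the AI-as-supporting-reader paradigm.
    Returns (decision, second_human_read_needed)."""
    if ai_recall == reader1_recall:
        # AI agrees with the first human reader: it acts as the second
        # reader, so the case bypasses the second human read entirely.
        return reader1_recall, False
    # Disagreement: a second human reader assesses the case; in practice
    # the standard clinical workflow (arbitration) would then follow.
    return reader2_recall, True

decision, needs_second_read = supporting_reader_workflow(False, False, True)
# AI and first reader agree on no-recall: no second human read needed.
```

The workload reduction reported above comes from the first branch: every agreement case is one fewer second human read.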
Menten MJ, Holland R, Leingang O, et al., 2023, Exploring healthy retinal aging with deep learning, Ophthalmology Science, Vol: 3, Pages: 1-10, ISSN: 2666-9145
Purpose: To study the individual course of retinal changes caused by healthy aging using deep learning. Design: Retrospective analysis of a large data set of retinal OCT images. Participants: A total of 85 709 adults between the ages of 40 and 75 years of whom OCT images were acquired in the scope of the UK Biobank population study. Methods: We created a counterfactual generative adversarial network (GAN), a type of neural network that learns from cross-sectional, retrospective data. It then synthesizes high-resolution counterfactual OCT images and longitudinal time series. These counterfactuals allow visualization and analysis of hypothetical scenarios in which certain characteristics of the imaged subject, such as age or sex, are altered, whereas other attributes, crucially the subject's identity and image acquisition settings, remain fixed. Main Outcome Measures: Using our counterfactual GAN, we investigated subject-specific changes in the retinal layer structure as a function of age and sex. In particular, we measured changes in the retinal nerve fiber layer (RNFL), combined ganglion cell layer plus inner plexiform layer (GCIPL), inner nuclear layer to the inner boundary of the retinal pigment epithelium (INL-RPE), and retinal pigment epithelium (RPE). Results: Our counterfactual GAN is able to smoothly visualize the individual course of retinal aging. Across all counterfactual images, the RNFL, GCIPL, INL-RPE, and RPE changed by −0.1 μm ± 0.1 μm, −0.5 μm ± 0.2 μm, −0.2 μm ± 0.1 μm, and 0.1 μm ± 0.1 μm, respectively, per decade of age. These results agree well with previous studies based on the same cohort from the UK Biobank population study. Beyond population-wide average measures, our counterfactual GAN allows us to explore whether the retinal layers of a given eye will increase in thickness, decrease in thickness, or stagnate as a subject ages. Conclusion: This study demonstrates how counterfactual GANs
Glocker B, Jones C, Bernhardt M, et al., 2023, Algorithmic encoding of protected characteristics in chest X-ray disease detection models, EBioMedicine, Vol: 89, Pages: 1-19, ISSN: 2352-3964
Background: It has been rightfully emphasized that the use of AI for clinical decision making could amplify health disparities. An algorithm may encode protected characteristics, and then use this information for making predictions due to undesirable correlations in the (historical) training data. It remains unclear how we can establish whether such information is actually used. Besides the scarcity of data from underserved populations, very little is known about how dataset biases manifest in predictive models and how this may result in disparate performance. This article aims to shed some light on these issues by exploring methodology for subgroup analysis in image-based disease detection models. Methods: We utilize two publicly available chest X-ray datasets, CheXpert and MIMIC-CXR, to study performance disparities across race and biological sex in deep learning models. We explore test set resampling, transfer learning, multitask learning, and model inspection to assess the relationship between the encoding of protected characteristics and disease detection performance across subgroups. Findings: We confirm subgroup disparities in terms of shifted true and false positive rates which are partially removed after correcting for population and prevalence shifts in the test sets. We find that transfer learning alone is insufficient for establishing whether specific patient information is used for making predictions. The proposed combination of test-set resampling, multitask learning, and model inspection reveals valuable insights about the way protected characteristics are encoded in the feature representations of deep neural networks. Interpretation: Subgroup analysis is key for identifying performance disparities of AI models, but statistical differences across subgroups need to be taken into account when analyzing potential biases in disease detection. The proposed methodology provides a comprehensive framework for subgroup analysis enabling further research into the underlyi
Monteiro M, De Sousa Ribeiro F, Pawlowski N, et al., 2023, Measuring axiomatic soundness of counterfactual image models, International Conference on Learning Representations (ICLR)
We use the axiomatic definition of counterfactual to derive metrics that enable quantifying the correctness of approximate counterfactual inference models. Abstract: We present a general framework for evaluating image counterfactuals. The power and flexibility of deep generative models make them valuable tools for learning mechanisms in structural causal models. However, their flexibility makes counterfactual identifiability impossible in the general case. Motivated by these issues, we revisit Pearl's axiomatic definition of counterfactuals to determine the necessary constraints of any counterfactual inference model: composition, reversibility, and effectiveness. We frame counterfactuals as functions of an input variable, its parents, and counterfactual parents and use the axiomatic constraints to restrict the set of functions that could represent the counterfactual, thus deriving distance metrics between the approximate and ideal functions. We demonstrate how these metrics can be used to compare and choose between different approximate counterfactual inference models and to provide insight into a model's shortcomings and trade-offs.
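The composition and reversibility axioms can be made concrete with a toy additive-noise mechanism. This is purely illustrative: the paper's metrics measure how far learned image models deviate from exactly these identities, whereas a closed-form mechanism satisfies them by construction.

```python
def counterfactual(x, pa, pa_star):
    """Toy additive-noise mechanism x = pa + u.
    Abduct the exogenous noise u from the observation, then re-apply
    the mechanism under counterfactual parents pa_star.
    (Effectiveness, the third axiom, concerns intervening on x itself
    and is not exercised in this sketch.)"""
    u = x - pa          # abduction
    return pa_star + u  # action + prediction

x, pa, pa_star = 5.0, 2.0, 7.0
# Composition: a counterfactual with unchanged parents is the identity.
assert counterfactual(x, pa, pa) == x
# Reversibility: applying the counterfactual and then reversing it
# recovers the original observation.
assert counterfactual(counterfactual(x, pa, pa_star), pa_star, pa) == x
```

For an approximate image model, the corresponding distance metrics quantify how badly each of these equalities is violated.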
Pati S, Baid U, Edwards B, et al., 2023, Author Correction: Federated learning enables big data for rare cancer boundary detection., Nature Communications, Vol: 14, Pages: 436-436, ISSN: 2041-1723
Xu M, Islam M, Glocker B, et al., 2023, Confidence-Aware Paced-Curriculum Learning by Label Smoothing for Surgical Scene Understanding, IEEE Transactions on Automation Science and Engineering, ISSN: 1545-5955
Curriculum learning and self-paced learning are training strategies that gradually feed samples from easy to more complex. They have captivated increasing attention due to their excellent performance in robotic vision. Most recent works focus on designing curricula based on difficulty levels in input samples or smoothing the feature maps. However, smoothing labels to control the learning utility in a curriculum manner is still unexplored. In this work, we design a paced curriculum by label smoothing (P-CBLS) using paced learning with uniform label smoothing (ULS) for classification tasks and fuse uniform and spatially varying label smoothing (SVLS) for semantic segmentation tasks in a curriculum manner. In ULS and SVLS, a bigger smoothing factor enforces a heavier smoothing penalty on the true label and limits the information learned. Therefore, we design the curriculum by label smoothing (CBLS): we set a bigger smoothing value at the beginning of training and gradually decrease it to zero to control the model's learning utility from lower to higher. We also design a confidence-aware pacing function and combine it with our CBLS to investigate the benefits of various curricula. The proposed techniques are validated on four robotic surgery datasets of multi-class, multi-label classification, captioning, and segmentation tasks. We also investigate the robustness of our method by corrupting validation data into different severity levels. Our extensive analysis shows that the proposed method improves prediction accuracy and robustness. The code is publicly available at https://github.com/XuMengyaAmy/P-CBLS. Note to Practitioners: The motivation of this article is to improve the performance and robustness of deep neural networks in safety-critical applications such as robotic surgery by controlling the learning ability of the model in a curriculum learning manner and allowing the model to imitate the cognitive process
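The curriculum-by-label-smoothing idea, annealing a large smoothing factor to zero over training, can be sketched as follows. A linear schedule and the names below are assumptions for illustration; the paper's confidence-aware pacing functions may differ.

```python
def cbls_smoothing(epoch, total_epochs, eps0=0.3):
    """Curriculum-by-label-smoothing schedule (illustrative linear decay):
    start with a large smoothing factor and anneal it to zero."""
    return eps0 * max(0.0, 1.0 - epoch / total_epochs)

def smooth_labels(one_hot, eps):
    """Uniform label smoothing over K classes: redistribute eps of the
    probability mass uniformly across all classes."""
    k = len(one_hot)
    return [(1 - eps) * y + eps / k for y in one_hot]

# Early training: heavily smoothed targets limit the information learned.
early = smooth_labels([1, 0, 0], cbls_smoothing(0, 100))    # eps = 0.3
# Late training: smoothing has decayed to zero, targets are hard again.
late = smooth_labels([1, 0, 0], cbls_smoothing(100, 100))   # eps = 0.0
```

Decreasing eps moves the target distribution from soft toward one-hot, which is the "lower to higher learning utility" progression the abstract describes.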
Dorent R, Kujawa A, Ivory M, et al., 2023, CrossMoDA 2021 challenge: Benchmark of cross-modality domain adaptation techniques for vestibular schwannoma and cochlea segmentation, Publisher: ELSEVIER
Batten J, Sinclair M, Glocker B, et al., 2023, Image To Tree with Recursive Prompting
Extracting complex structures from grid-based data is a common key step in automated medical image analysis. The conventional solution to recovering tree-structured geometries typically involves computing the minimal cost path through intermediate representations derived from segmentation masks. However, this methodology has significant limitations in the context of projective imaging of tree-structured 3D anatomical data such as coronary arteries, since there are often overlapping branches in the 2D projection. In this work, we propose a novel approach to predicting tree connectivity structure which reformulates the task as an optimization problem over individual steps of a recursive process. We design and train a two-stage model which leverages the UNet and Transformer architectures and introduces an image-based prompting technique. Our proposed method achieves compelling results on a pair of synthetic datasets, and outperforms a shortest-path baseline.
Islam M, Glocker B, 2023, Frequency Dropout: Feature-Level Regularization via Randomized Filtering, Pages: 281-295, ISSN: 0302-9743
Deep convolutional neural networks have shown remarkable performance on various computer vision tasks, and yet, they are susceptible to picking up spurious correlations from the training signal. So-called 'shortcuts' can occur during learning, for example, when there are specific frequencies present in the image data that correlate with the output predictions. Both high and low frequencies can be characteristic of the underlying noise distribution caused by the image acquisition rather than of the task-relevant information about the image content. Models that learn features related to this characteristic noise will not generalize well to new data. In this work, we propose a simple yet effective training strategy, Frequency Dropout, to prevent convolutional neural networks from learning frequency-specific imaging features. We employ randomized filtering of feature maps during training, which acts as a feature-level regularization. In this study, we consider common image processing filters such as Gaussian smoothing, Laplacian of Gaussian, and Gabor filtering. Our training strategy is model-agnostic and can be used for any computer vision task. We demonstrate the effectiveness of Frequency Dropout on a range of popular architectures and multiple tasks including image classification, domain adaptation, and semantic segmentation using both computer vision and medical imaging datasets. Our results suggest that the proposed approach not only improves predictive accuracy but also improves robustness against domain shift.
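A minimal sketch of the randomized-filtering idea, assuming a 1D feature map and only the Gaussian case (the paper also draws Laplacian-of-Gaussian and Gabor filters; all names here are illustrative, not from the paper's code):

```python
import math
import random

def gaussian_blur_1d(feat, sigma):
    """3-tap Gaussian smoothing of a 1D feature map, replicate padding."""
    a = math.exp(-0.5 / sigma ** 2)
    k = [a, 1.0, a]
    s = sum(k)
    k = [w / s for w in k]
    n = len(feat)
    return [sum(k[j] * feat[min(max(i + j - 1, 0), n - 1)]
                for j in range(3)) for i in range(n)]

def frequency_dropout(feat, p=0.5, sigma=1.0, rng=random):
    """Feature-level regularization sketch: with probability p, pass the
    feature map through a low-pass (Gaussian) filter during training,
    suppressing frequency-specific shortcuts."""
    if rng.random() < p:
        return gaussian_blur_1d(feat, sigma)
    return feat

feat = [0.0, 1.0, 0.0, 1.0, 0.0]       # high-frequency feature pattern
out = frequency_dropout(feat, p=1.0)   # force filtering for the demo
# The alternating 0/1 pattern is attenuated toward its local mean.
```

Because the filter choice is randomized per step, the network cannot rely on any single frequency band, which is the regularization effect described above.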
Piçarra C, Winzeck S, Monteiro M, et al., 2023, Automatic localisation and per-region quantification of traumatic brain injury on head CT using atlas mapping, European Journal of Radiology Open, Vol: 10, Pages: 1-9, ISSN: 2352-0477
Rationale and objectives: To develop a method for automatic localisation of brain lesions on head CT, suitable for both population-level analysis and lesion management in a clinical setting. Materials and methods: Lesions were located by mapping a bespoke CT brain atlas to the patient's head CT in which lesions had been previously segmented. The atlas mapping was achieved through robust intensity-based registration enabling the calculation of per-region lesion volumes. Quality control (QC) metrics were derived for automatic detection of failure cases. The CT brain template was built using 182 non-lesioned CT scans and an iterative template construction strategy. Individual brain regions in the CT template were defined via non-linear registration of an existing MRI-based brain atlas. Evaluation was performed on a multi-centre traumatic brain injury (TBI) dataset (n = 839 scans), including visual inspection by a trained expert. Two population-level analyses are presented as proof-of-concept: a spatial assessment of lesion prevalence, and an exploration of the distribution of lesion volume per brain region, stratified by clinical outcome. Results: 95.7% of the lesion localisation results were rated by a trained expert as suitable for approximate anatomical correspondence between lesions and brain regions, and 72.5% for more quantitatively accurate estimates of regional lesion load. The classification performance of the automatic QC showed an AUC of 0.84 when compared to binarised visual inspection scores. The localisation method has been integrated into the publicly available Brain Lesion Analysis and Segmentation Tool for CT (BLAST-CT). Conclusion: Automatic lesion localisation with reliable QC metrics is feasible and can be used for patient-level quantitative analysis of TBI, as well as for large-scale population analysis due to its computational efficiency (<2 min/scan on GPU).
Causal reasoning provides a language to ask important interventional and counterfactual questions beyond purely statistical association. In medical imaging, for example, we may want to study the causal effect of genetic, environmental, or lifestyle factors on the normal and pathological variation of anatomical phenotypes. However, while anatomical shape models of 3D surface meshes, extracted from automated image segmentation, can be reliably constructed, there is a lack of computational tooling to enable causal reasoning about morphological variations. To tackle this problem, we propose deep structural causal shape models (CSMs), which utilise high-quality mesh generation techniques, from geometric deep learning, within the expressive framework of deep structural causal models. CSMs enable subject-specific prognoses through counterfactual mesh generation (“How would this patient’s brain structure change if they were ten years older?”), which is in contrast to most current works on purely population-level statistical shape modelling. We demonstrate the capabilities of CSMs at all levels of Pearl’s causal hierarchy through a number of qualitative and quantitative experiments leveraging a large dataset of 3D brain structures.
Xu M, Islam M, Glocker B, et al., 2022, Confidence-Aware Paced-Curriculum Learning by Label Smoothing for Surgical Scene Understanding
Curriculum learning and self-paced learning are training strategies that gradually feed samples from easy to more complex. They have attracted increasing attention due to their excellent performance in robotic vision. Most recent works focus on designing curricula based on difficulty levels in input samples or on smoothing the feature maps. However, smoothing labels to control the learning utility in a curriculum manner is still unexplored. In this work, we design a paced curriculum by label smoothing (P-CBLS), using paced learning with uniform label smoothing (ULS) for classification tasks and fusing uniform and spatially varying label smoothing (SVLS) for semantic segmentation tasks in a curriculum manner. In ULS and SVLS, a bigger smoothing factor enforces a heavier smoothing penalty on the true label, limiting how much information is learned. We therefore design the curriculum by label smoothing (CBLS): we set a bigger smoothing value at the beginning of training and gradually decrease it to zero to control the model's learning utility from lower to higher. We also design a confidence-aware pacing function and combine it with CBLS to investigate the benefits of various curricula. The proposed techniques are validated on four robotic surgery datasets covering multi-class classification, multi-label classification, captioning, and segmentation tasks. We also investigate the robustness of our method by corrupting validation data to different severity levels. Our extensive analysis shows that the proposed method improves prediction accuracy and robustness.
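The CBLS idea above — start training with heavy uniform label smoothing and decay it to zero — can be sketched in a few lines. This is a minimal illustration, not the authors' implementation; the linear pacing schedule and the initial smoothing factor `eps0` are assumptions for the example.

```python
import numpy as np

def smoothing_schedule(epoch: int, total_epochs: int, eps0: float = 0.3) -> float:
    """Decay the smoothing factor linearly from eps0 to 0 (assumed pacing function)."""
    return eps0 * max(0.0, 1.0 - epoch / total_epochs)

def uniform_label_smoothing(label: int, num_classes: int, eps: float) -> np.ndarray:
    """ULS: spread eps probability mass uniformly over all classes,
    keeping 1 - eps (plus the uniform share) on the true label."""
    target = np.full(num_classes, eps / num_classes)
    target[label] += 1.0 - eps
    return target

# Early in training: heavy smoothing limits how much the true label dominates.
early = uniform_label_smoothing(2, 4, smoothing_schedule(0, 100))
# Late in training: smoothing reaches zero, recovering the hard one-hot target.
late = uniform_label_smoothing(2, 4, smoothing_schedule(100, 100))
```

Decaying `eps` over epochs is what turns plain label smoothing into a curriculum: the soft targets early on penalise over-confident predictions, and the gradually hardening targets let the model learn the full label information later.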
Gatidis S, Kart T, Fischer M, et al., 2022, Better together: data harmonization and cross-study analysis of abdominal MRI data from UK biobank and the German national cohort., Investigative Radiology, Vol: 58, Pages: 346-354, ISSN: 0020-9996
OBJECTIVES: The UK Biobank (UKBB) and German National Cohort (NAKO) are among the largest cohort studies, capturing a wide range of health-related data from the general population, including comprehensive magnetic resonance imaging (MRI) examinations. The purpose of this study was to demonstrate how MRI data from these large-scale studies can be jointly analyzed and to derive comprehensive quantitative image-based phenotypes across the general adult population. MATERIALS AND METHODS: Image-derived features of abdominal organs (volumes of liver, spleen, kidneys, and pancreas; volumes of kidney hilum adipose tissue; and fat fractions of liver and pancreas) were extracted from T1-weighted Dixon MRI data of 17,996 participants of UKBB and NAKO based on quality-controlled deep learning generated organ segmentations. To enable valid cross-study analysis, we first analyzed the data generating process using methods of causal discovery. We subsequently harmonized data from UKBB and NAKO using the ComBat approach for batch effect correction. We finally performed quantile regression on harmonized data across studies providing quantitative models for the variation of image-derived features stratified for sex and dependent on age, height, and weight. RESULTS: Data from 8791 UKBB participants (49.9% female; age, 63 ± 7.5 years) and 9205 NAKO participants (49.1% female, age: 51.8 ± 11.4 years) were analyzed. Analysis of the data generating process revealed direct effects of age, sex, height, weight, and the data source (UKBB vs NAKO) on image-derived features. Correction of data source-related effects resulted in markedly improved alignment of image-derived features between UKBB and NAKO. Cross-study analysis on harmonized data revealed comprehensive quantitative models for the phenotypic variation of abdominal organs across the general adult population. CONCLUSIONS: Cross-study analysis of MRI data from UKBB and NAKO as proposed in this work can be helpful for futur
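The harmonization step described above can be illustrated with a stripped-down, location-scale version of batch-effect correction. The actual ComBat method additionally uses empirical Bayes shrinkage and covariate adjustment (for age, sex, height, weight); this sketch only aligns each study's mean and variance for a single feature, and the simulated study offsets are invented for the example.

```python
import numpy as np

def location_scale_harmonize(x: np.ndarray, batch: np.ndarray) -> np.ndarray:
    """Minimal ComBat-style correction (no empirical Bayes, no covariates):
    rescale each batch of a single feature to the pooled mean and variance."""
    out = np.empty_like(x, dtype=float)
    grand_mean, grand_std = x.mean(), x.std()
    for b in np.unique(batch):
        m = batch == b
        out[m] = (x[m] - x[m].mean()) / (x[m].std() + 1e-12) * grand_std + grand_mean
    return out

rng = np.random.default_rng(0)
# Two simulated "studies" measuring the same organ-volume feature with a site offset.
study_a = rng.normal(5.0, 1.0, 500)
study_b = rng.normal(6.0, 1.5, 500)
x = np.concatenate([study_a, study_b])
batch = np.array([0] * 500 + [1] * 500)
harmonized = location_scale_harmonize(x, batch)
```

After correction, the per-study distributions share the same first two moments, which is the precondition for the pooled quantile-regression models described in the abstract.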
Chalkidou A, Shokraneh F, Kijauskaite G, et al., 2022, Recommendations for the development and use of imaging test sets to investigate the test performance of artificial intelligence in health screening, LANCET DIGITAL HEALTH, Vol: 4, Pages: E899-E905
Kart T, Fischer M, Winzeck S, et al., 2022, Automated imaging-based abdominal organ segmentation and quality control in 20,000 participants of the UK Biobank and German National Cohort Studies, SCIENTIFIC REPORTS, Vol: 12, ISSN: 2045-2322
Rosnati M, Ribeiro FDS, Monteiro M, et al., 2022, Analysing the effectiveness of a generative model for semi-supervised medical image segmentation
Image segmentation is important in medical imaging, providing valuable, quantitative information for clinical decision-making in diagnosis, therapy, and intervention. The state-of-the-art in automated segmentation remains supervised learning, employing discriminative models such as U-Net. However, training these models requires access to large amounts of manually labelled data, which is often difficult to obtain in real medical applications. In such settings, semi-supervised learning (SSL) attempts to leverage the abundance of unlabelled data to obtain more robust and reliable models. Recently, generative models have been proposed for semantic segmentation, as they make an attractive choice for SSL. Their ability to capture the joint distribution over input images and output label maps provides a natural way to incorporate information from unlabelled images. This paper analyses whether deep generative models such as the SemanticGAN are truly viable alternatives to tackle challenging medical image segmentation problems. To that end, we thoroughly evaluate the segmentation performance, robustness, and potential subgroup disparities of discriminative and generative segmentation methods when applied to large-scale, publicly available chest X-ray datasets.
Kori A, Glocker B, Toni F, 2022, Explaining Image Classification with Visual Debates
An effective way to obtain different perspectives on any given topic is by conducting a debate, where participants argue for and against the topic. Here, we propose a novel debate framework for understanding and explaining a continuous image classifier's reasoning for making a particular prediction, by modeling it as a multiplayer sequential zero-sum debate game. The contrastive nature of our framework encourages players to learn to put forward diverse arguments during the debates, picking up the reasoning trails missed by their opponents and highlighting any uncertainties in the classifier. Specifically, in our proposed setup, players propose arguments, drawn from the classifier's discretized latent knowledge, to support or oppose the classifier's decision. The resulting Visual Debates collect supporting and opposing features from the discretized latent space of the classifier, serving as explanations for the internal reasoning of the classifier towards its predictions. We demonstrate and evaluate (a practical realization of) our Visual Debates on the geometric SHAPE and MNIST datasets and on the high-resolution animal faces (AFHQ) dataset, along with standard evaluation metrics for explanations (i.e. faithfulness and completeness) and novel, bespoke metrics for visual debates as explanations (consensus and split ratio).
Shehata N, Bain W, Glocker B, 2022, A Comparative Study of Graph Neural Networks for Shape Classification in Neuroimaging, Proceedings of Machine Learning Research, GeoMedIA Workshop
Graph neural networks have emerged as a promising approach for the analysis of non-Euclidean data such as meshes. In medical imaging, mesh-like data plays an important role in modelling anatomical structures, and shape classification can be used in computer-aided diagnosis and disease detection. However, with a plethora of options, the best architectural choices for medical shape analysis using GNNs remain unclear. We conduct a comparative analysis to provide practitioners with an overview of the current state-of-the-art in geometric deep learning for shape classification in neuroimaging. Using biological sex classification as a proof-of-concept task, we find that using FPFH as node features substantially improves GNN performance and generalisation to out-of-distribution data; we compare the performance of three alternative convolutional layers; and we reinforce the importance of data augmentation for graph-based learning. We then confirm these results hold for a clinically relevant task, using the classification of Alzheimer's disease.
Rosnati M, Soreq E, Monteiro M, et al., 2022, Automatic lesion analysis for increased efficiency in outcome prediction of traumatic brain injury, 5th International Workshop, MLCN 2022, Publisher: Springer Nature Switzerland, Pages: 135-146, ISSN: 0302-9743
The accurate prognosis for traumatic brain injury (TBI) patients is difficult yet essential to inform therapy, patient management, and long-term after-care. Patient characteristics such as age, motor and pupil responsiveness, hypoxia and hypotension, and radiological findings on computed tomography (CT), have been identified as important variables for TBI outcome prediction. CT is the acute imaging modality of choice in clinical practice because of its acquisition speed and widespread availability. However, this modality is mainly used for qualitative and semi-quantitative assessment, such as the Marshall scoring system, which is prone to subjectivity and human errors. This work explores the predictive power of imaging biomarkers extracted from routinely-acquired hospital admission CT scans using a state-of-the-art, deep learning TBI lesion segmentation method. We use lesion volumes and corresponding lesion statistics as inputs for an extended TBI outcome prediction model. We compare the predictive power of our proposed features to the Marshall score, independently and when paired with classic TBI biomarkers. We find that automatically extracted quantitative CT features perform similarly or better than the Marshall score in predicting unfavourable TBI outcomes. Leveraging automatic atlas alignment, we also identify frontal extra-axial lesions as important indicators of poor outcome. Our work may contribute to a better understanding of TBI, and provides new insights into how automated neuroimaging analysis can be used to improve prognostication after TBI.
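The per-region feature extraction underlying this work — combining a lesion segmentation with an atlas alignment to get regional lesion volumes — can be sketched as a simple masked voxel count. This is an illustrative toy, not the paper's pipeline: the region labels, array shapes, and voxel volume here are made up for the example.

```python
import numpy as np

def per_region_lesion_volumes(lesion_mask: np.ndarray,
                              atlas_labels: np.ndarray,
                              voxel_volume_ml: float) -> dict:
    """Per-region lesion load: count lesion voxels falling inside each
    atlas region and convert the count to millilitres."""
    volumes = {}
    for region_id in np.unique(atlas_labels):
        if region_id == 0:  # 0 is assumed to be background
            continue
        n = int(np.count_nonzero(lesion_mask & (atlas_labels == region_id)))
        volumes[int(region_id)] = n * voxel_volume_ml
    return volumes

# Toy 4x4x4 volume: region 1 fills one half of the atlas, region 2 the other.
atlas = np.zeros((4, 4, 4), dtype=int)
atlas[:2] = 1
atlas[2:] = 2
lesion = np.zeros_like(atlas, dtype=bool)
lesion[0] = True  # 16 lesion voxels, all inside region 1
vols = per_region_lesion_volumes(lesion, atlas, voxel_volume_ml=0.001)
```

Features of this form (one volume per atlas region) are what the abstract feeds into the outcome prediction model alongside classic TBI biomarkers.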
Satchwell L, Wedlake L, Greenlay E, et al., 2022, Development of machine learning support for reading whole body diffusion-weighted MRI (WB-MRI) in myeloma for the detection and quantification of the extent of disease before and after treatment (MALIMAR): protocol for a cross-sectional diagnostic test accuracy study, BMJ Open, Vol: 12, Pages: 1-9, ISSN: 2044-6055
Introduction: Whole-body MRI (WB-MRI) is recommended by the National Institute of Clinical Excellence as the first-line imaging tool for diagnosis of multiple myeloma. Reporting WB-MRI scans requires expertise to interpret and can be challenging for radiologists who need to meet rapid turn-around requirements. Automated computational tools based on machine learning (ML) could assist the radiologist in terms of sensitivity and reading speed and would facilitate improved accuracy, productivity and cost-effectiveness. The MALIMAR study aims to develop and validate a ML algorithm to increase the diagnostic accuracy and reading speed of radiological interpretation of WB-MRI compared with standard methods. Methods and analysis: This phase II/III imaging trial will perform retrospective analysis of previously obtained clinical radiology MRI scans and scans from healthy volunteers obtained prospectively to implement training and validation of an ML algorithm. The study will comprise three project phases using approximately 633 scans to (1) train the ML algorithm to identify active disease, (2) clinically validate the ML algorithm and (3) determine change in disease status following treatment via a quantification of burden of disease in patients with myeloma. Phase 1 will primarily train the ML algorithm to detect active myeloma against an expert assessment (‘reference standard’). Phase 2 will use the ML output in the setting of a radiology reader study to assess the difference in sensitivity when using ML-assisted reading or human-alone reading. Phase 3 will assess the agreement between experienced readers (with and without ML) and the reference standard in scoring both overall burden of disease before and after treatment, and response. Ethics and dissemination: MALIMAR has ethical approval from South Central—Oxford C Research Ethics Committee (REC Reference: 17/SC/0630). IRAS Project ID: 233501. CPMS Portfolio adoption (CPMS ID: 36766). Participants gave informed consent.
Islam M, Glocker B, 2022, Frequency Dropout: Feature-Level Regularization via Randomized Filtering
Deep convolutional neural networks have shown remarkable performance on various computer vision tasks, and yet they are susceptible to picking up spurious correlations from the training signal. So-called 'shortcuts' can occur during learning, for example, when there are specific frequencies present in the image data that correlate with the output predictions. Both high and low frequencies can be characteristic of the underlying noise distribution caused by the image acquisition rather than related to the task-relevant information about the image content. Models that learn features related to this characteristic noise will not generalize well to new data. In this work, we propose a simple yet effective training strategy, Frequency Dropout, to prevent convolutional neural networks from learning frequency-specific imaging features. We employ randomized filtering of feature maps during training, which acts as a feature-level regularization. In this study, we consider common image processing filters such as Gaussian smoothing, Laplacian of Gaussian, and Gabor filtering. Our training strategy is model-agnostic and can be used for any computer vision task. We demonstrate the effectiveness of Frequency Dropout on a range of popular architectures and multiple tasks, including image classification, domain adaptation, and semantic segmentation, using both computer vision and medical imaging datasets. Our results suggest that the proposed approach not only improves predictive accuracy but also improves robustness against domain shift.
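The core mechanism — randomized filtering of feature maps during training — can be sketched with one of the filters named in the abstract (Gaussian smoothing). This is a minimal NumPy illustration, not the paper's implementation: the per-channel drop probability, the sigma range, and the kernel radius are assumptions chosen for the example.

```python
import numpy as np

def gaussian_kernel1d(sigma: float, radius: int) -> np.ndarray:
    """Normalised 1D Gaussian kernel for separable filtering."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def frequency_dropout(feature_map: np.ndarray, p: float = 0.5,
                      rng: np.random.Generator = None) -> np.ndarray:
    """Randomly Gaussian-smooth channels of a (C, H, W) feature map,
    suppressing high-frequency content as a feature-level regularizer."""
    rng = rng or np.random.default_rng()
    out = feature_map.copy()
    for c in range(feature_map.shape[0]):
        if rng.random() < p:  # this channel gets filtered this step
            sigma = rng.uniform(0.5, 2.0)
            k = gaussian_kernel1d(sigma, radius=3)
            # Separable convolution: filter along rows, then along columns.
            out[c] = np.apply_along_axis(
                lambda r: np.convolve(r, k, mode="same"), 1, out[c])
            out[c] = np.apply_along_axis(
                lambda col: np.convolve(col, k, mode="same"), 0, out[c])
    return out

fmap = np.random.default_rng(0).normal(size=(8, 16, 16))
filtered = frequency_dropout(fmap, p=1.0, rng=np.random.default_rng(1))
```

Because the filter choice and strength are re-sampled each time, the network cannot rely on any single frequency band being present, which is the intended regularizing effect.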
This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.