Imperial College London

Dr Ben Glocker

Faculty of Engineering, Department of Computing

Professor in Machine Learning for Imaging
 
 
 

Contact

 

+44 (0)20 7594 8334

b.glocker

 
 

Location

 

377 Huxley Building, South Kensington Campus


Citation

BibTeX format

@article{Glocker:2023:10.1148/ryai.230060,
author = {Glocker, B and Jones, C and Bernhardt, M and Winzeck, S},
doi = {10.1148/ryai.230060},
journal = {Radiology: Artificial Intelligence},
title = {Risk of bias in chest radiography deep learning foundation models},
url = {http://dx.doi.org/10.1148/ryai.230060},
volume = {5},
year = {2023}
}

RIS format (EndNote, RefMan)

TY  - JOUR
AB  - Purpose: To analyze a recently published chest radiography foundation model for the presence of biases that could lead to subgroup performance disparities across biologic sex and race. Materials and Methods: This Health Insurance Portability and Accountability Act–compliant retrospective study used 127 118 chest radiographs from 42 884 patients (mean age, 63 years ± 17 [SD]; 23 623 male, 19 261 female) from the CheXpert dataset that were collected between October 2002 and July 2017. To determine the presence of bias in features generated by a chest radiography foundation model and baseline deep learning model, dimensionality reduction methods together with two-sample Kolmogorov–Smirnov tests were used to detect distribution shifts across sex and race. A comprehensive disease detection performance analysis was then performed to associate any biases in the features to specific disparities in classification performance across patient subgroups. Results: Ten of 12 pairwise comparisons across biologic sex and race showed statistically significant differences in the studied foundation model, compared with four significant tests in the baseline model. Significant differences were found between male and female (P < .001) and Asian and Black (P < .001) patients in the feature projections that primarily capture disease. Compared with average model performance across all subgroups, classification performance on the “no finding” label decreased between 6.8% and 7.8% for female patients, and performance in detecting “pleural effusion” decreased between 10.7% and 11.6% for Black patients. Conclusion: The studied chest radiography foundation model demonstrated racial and sex-related bias, which led to disparate performance across patient subgroups; thus, this model may be unsafe for clinical applications.
AU  - Glocker, B
AU  - Jones, C
AU  - Bernhardt, M
AU  - Winzeck, S
DO  - 10.1148/ryai.230060
PY  - 2023///
SN  - 2638-6100
TI  - Risk of bias in chest radiography deep learning foundation models
T2  - Radiology: Artificial Intelligence
UR  - http://dx.doi.org/10.1148/ryai.230060
UR  - https://pubs.rsna.org/doi/10.1148/ryai.230060
UR  - http://hdl.handle.net/10044/1/109345
VL  - 5
ER  - 