Citation

BibTeX format

@inproceedings{Zhang:2026:10.1609/aaai.v40i15.38302,
author = {Zhang, Z and Lee, K and Jing, P and Deng, W and Zhou, H and Jin, Z and Huang, J and Gao, Z and Marshall, DC and Fang, Y and Yang, G},
doi = {10.1609/aaai.v40i15.38302},
pages = {13025--13033},
title = {GEMA-Score: Granular Explainable Multi-Agent Scoring Framework for Radiology Report Evaluation},
url = {http://dx.doi.org/10.1609/aaai.v40i15.38302},
year = {2026}
}

RIS format (EndNote, RefMan)

TY  - CPAPER
AB - Automatic medical report generation has the potential to support clinical diagnosis, reduce the workload of radiologists, and enhance diagnostic consistency. However, current evaluation metrics often fail to reflect the clinical reliability of generated reports. Overlap-based methods overlook fine-grained details (e.g., location, severity), while some diagnostic metrics are limited by fixed vocabularies or templates, reducing their ability to capture diverse clinical expressions. LLM-based metrics lack interpretable reasoning, limiting trust in clinical settings. Therefore, we propose a Granular Explainable Multi-Agent Score (GEMA-Score) in this paper, which conducts both objective quantification and subjective evaluation through a large language model-based multi-agent workflow. Our GEMA-Score parses structured reports and employs stable calculations through interactive exchanges of information among agents to assess disease diagnosis, location, severity, and uncertainty. Additionally, an LLM-based scoring agent evaluates completeness, readability, and clinical terminology while providing explanatory feedback. Extensive experiments show that GEMA-Score achieves the highest correlation with human experts on public datasets (Kendall = 0.69 on ReXVal; 0.45 on RadEvalX), demonstrating improved clinical scoring reliability.
AU - Zhang,Z
AU - Lee,K
AU - Jing,P
AU - Deng,W
AU - Zhou,H
AU - Jin,Z
AU - Huang,J
AU - Gao,Z
AU - Marshall,DC
AU - Fang,Y
AU - Yang,G
DO - 10.1609/aaai.v40i15.38302
EP - 13033
PY - 2026///
SN - 2159-5399
SP - 13025
TI - GEMA-Score: Granular Explainable Multi-Agent Scoring Framework for Radiology Report Evaluation
UR - http://dx.doi.org/10.1609/aaai.v40i15.38302
ER -
