Citation

BibTeX format

@article{Hao:2025:10.1109/JBHI.2025.3538324,
author = {Hao, P and Wang, H and Yang, G and Zhu, L},
doi = {10.1109/JBHI.2025.3538324},
journal = {IEEE Journal of Biomedical and Health Informatics},
title = {Enhancing Visual Reasoning with LLM-Powered Knowledge Graphs for Visual Question Localized-Answering in Robotic Surgery},
url = {http://dx.doi.org/10.1109/JBHI.2025.3538324},
year = {2025}
}

RIS format (EndNote, RefMan)

TY  - JOUR
AB  - Expert surgeons often have heavy workloads and cannot promptly respond to queries from medical students and junior doctors about surgical procedures. Thus, research on Visual Question Localized-Answering in Surgery (Surgical-VQLA) is essential to assist medical students and junior doctors in understanding surgical scenarios. Surgical-VQLA aims to generate accurate answers and locate relevant areas in the surgical scene, requiring models to identify and understand surgical instruments, operative organs, and procedures. A key issue is the model's ability to accurately distinguish surgical instruments. Current Surgical-VQLA models rely primarily on sparse textual information, limiting their visual reasoning capabilities. To address this issue, we propose a framework called Enhancing Visual Reasoning with LLM-Powered Knowledge Graphs (EnVR-LPKG) for the Surgical-VQLA task. This framework enhances the model's understanding of the surgical scenario by utilizing knowledge graphs of surgical instruments constructed by a Large Language Model (LLM). Specifically, we design a Fine-grained Knowledge Extractor (FKE) to extract the most relevant information from knowledge graphs and perform contrastive learning between the extracted knowledge graphs and local image features. Furthermore, we design a Multi-attention-based Surgical Instrument Enhancer (MSIE) module, which employs knowledge graphs to obtain an enhanced representation of the corresponding surgical instrument in the global scene. Through the MSIE module, the model can learn how to fuse visual features with knowledge graph text features, thereby strengthening the understanding of surgical instruments and further improving visual reasoning capabilities. Extensive experimental results on the EndoVis-17-VQLA and EndoVis-18-VQLA datasets demonstrate that our proposed method outperforms other state-of-the-art methods. We will release our code for future research.
AU  - Hao, P
AU  - Wang, H
AU  - Yang, G
AU  - Zhu, L
DO  - 10.1109/JBHI.2025.3538324
PY  - 2025///
SN  - 2168-2194
TI  - Enhancing Visual Reasoning with LLM-Powered Knowledge Graphs for Visual Question Localized-Answering in Robotic Surgery
T2  - IEEE Journal of Biomedical and Health Informatics
UR  - http://dx.doi.org/10.1109/JBHI.2025.3538324
ER  -
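
The abstract above describes two mechanisms: contrastive learning between knowledge-graph text features and local image features (the FKE), and attention-based fusion of visual features with knowledge-graph text features (the MSIE). The following is a minimal sketch of those two ideas in PyTorch, not the authors' released implementation; all class names, dimensions, and hyperparameters are illustrative assumptions.

# Illustrative sketch only (assumptions, not the paper's code): a symmetric
# InfoNCE-style contrastive loss aligning knowledge-graph text embeddings with
# local image embeddings, plus a simple cross-attention fusion block.
import torch
import torch.nn as nn
import torch.nn.functional as F

class KGImageContrastive(nn.Module):
    """Projects KG text features and local image features into a shared space
    and computes a symmetric InfoNCE loss over a batch of matched pairs."""
    def __init__(self, text_dim=768, image_dim=512, embed_dim=256, temperature=0.07):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, embed_dim)
        self.image_proj = nn.Linear(image_dim, embed_dim)
        self.temperature = temperature

    def forward(self, kg_text_feats, local_image_feats):
        # kg_text_feats: (B, text_dim); local_image_feats: (B, image_dim)
        t = F.normalize(self.text_proj(kg_text_feats), dim=-1)
        v = F.normalize(self.image_proj(local_image_feats), dim=-1)
        logits = t @ v.t() / self.temperature            # (B, B) similarity matrix
        targets = torch.arange(t.size(0), device=t.device)
        # Symmetric loss: text-to-image plus image-to-text
        return 0.5 * (F.cross_entropy(logits, targets)
                      + F.cross_entropy(logits.t(), targets))

class KGGuidedFusion(nn.Module):
    """Cross-attention in the spirit of the MSIE module: visual tokens attend
    to knowledge-graph text tokens, yielding KG-enhanced visual features."""
    def __init__(self, embed_dim=256, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, visual_tokens, kg_tokens):
        # visual_tokens: (B, N, D); kg_tokens: (B, M, D)
        attended, _ = self.attn(query=visual_tokens, key=kg_tokens, value=kg_tokens)
        return self.norm(visual_tokens + attended)

if __name__ == "__main__":
    B = 4
    loss = KGImageContrastive()(torch.randn(B, 768), torch.randn(B, 512))
    fused = KGGuidedFusion()(torch.randn(B, 16, 256), torch.randn(B, 8, 256))
    print(loss.item(), fused.shape)

The contrastive loss pulls matched knowledge-graph and image embeddings together in a shared space, while the cross-attention block lets visual tokens draw on knowledge-graph text tokens before answering and localization; this illustrates the general pattern the abstract describes, not its exact architecture.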

Contact


For enquiries about the MRI Physics Collective, please contact:

Mary Finnegan
Senior MR Physicist at the Imperial College Healthcare NHS Trust

Pete Lally
Assistant Professor in Magnetic Resonance (MR) Physics at Imperial College

Jan Sedlacik
MR Physicist at the Robert Steiner MR Unit, Hammersmith Hospital Campus