Publications

Citation

BibTeX format

@inproceedings{Liu:2025:10.1109/ICCVW69036.2025.00698,
author = {Liu, C and Ouyang, C and Chen, Y and Quilodrán-Casas, C and Ma, L and Fu, J and Guo, Y and Shah, A and Bai, W and Arcucci, R},
doi = {10.1109/ICCVW69036.2025.00698},
pages = {6763--6773},
title = {T3D: Advancing 3D Medical Vision-Language Pre-Training by Learning Multi-View Visual Consistency},
url = {http://dx.doi.org/10.1109/ICCVW69036.2025.00698},
year = {2025}
}

RIS format (EndNote, RefMan)

TY  - CPAPER
AB  - While 3D visual self-supervised learning (vSSL) shows promising results in capturing visual representations, it overlooks the clinical knowledge from radiology reports. Meanwhile, 3D medical vision-language pre-training (MedVLP) remains underexplored due to the lack of a large-scale, publicly available 3D medical image-report dataset. To bridge this gap, we introduce CT-3DVLP, the first and largest public 3D volume-report dataset, establishing a comprehensive benchmark for 3D MedVLP research. Meanwhile, we propose the T3D framework, which enhances 3D MedVLP beyond naive CLIP-style alignment that directly pairs volumes with reports but neglects local visual representations. Instead, we introduce Text-informed Multi-view Alignment (TMA), a novel approach that clusters volumetric data while enforcing consistency across different views of the same volume-report pair. TMA integrates textual features into fine-grained visual representations, ensuring contextual coherence across views. We evaluate T3D across multiple downstream tasks in both unimodal and cross-modal settings, including zero-shot and fine-tuned classification, cross-modal retrieval, report generation, and semantic segmentation. Our results show that T3D consistently outperforms existing vSSL and multimodal methods, demonstrating superior zero-shot and fine-tuning capabilities and setting a new benchmark for 3D medical image understanding.
AU  - Liu,C
AU  - Ouyang,C
AU  - Chen,Y
AU  - Quilodrán-Casas,C
AU  - Ma,L
AU  - Fu,J
AU  - Guo,Y
AU  - Shah,A
AU  - Bai,W
AU  - Arcucci,R
DO  - 10.1109/ICCVW69036.2025.00698
EP  - 6773
PY  - 2025///
SP  - 6763
TI  - T3D: Advancing 3D Medical Vision-Language Pre-Training by Learning Multi-View Visual Consistency
UR  - http://dx.doi.org/10.1109/ICCVW69036.2025.00698
ER  -
