Publications

BibTeX format

@unpublished{Citamak:2020,
author = {Citamak, B and Caglayan, O and Kuyu, M and Erdem, E and Erdem, A and Madhyastha, P and Specia, L},
publisher = {arXiv},
title = {MSVD-Turkish: A comprehensive multimodal dataset for integrated vision and language research in Turkish},
url = {http://arxiv.org/abs/2012.07098v1},
year = {2020}
}


RIS format (EndNote, RefMan)

TY  - UNPB
AB  - Automatic generation of video descriptions in natural language, also called video captioning, aims to understand the visual content of the video and produce a natural language sentence depicting the objects and actions in the scene. This challenging integrated vision and language problem, however, has been predominantly addressed for English. The lack of data and the linguistic properties of other languages limit the success of existing approaches for such languages. In this paper we target Turkish, a morphologically rich and agglutinative language that has very different properties compared to English. To do so, we create the first large scale video captioning dataset for this language by carefully translating the English descriptions of the videos in the MSVD (Microsoft Research Video Description Corpus) dataset into Turkish. In addition to enabling research in video captioning in Turkish, the parallel English-Turkish descriptions also enables the study of the role of video context in (multimodal) machine translation. In our experiments, we build models for both video captioning and multimodal machine translation and investigate the effect of different word segmentation approaches and different neural architectures to better address the properties of Turkish. We hope that the MSVD-Turkish dataset and the results reported in this work will lead to better video captioning and multimodal machine translation models for Turkish and other morphology rich and agglutinative languages.
AU  - Citamak,B
AU  - Caglayan,O
AU  - Kuyu,M
AU  - Erdem,E
AU  - Erdem,A
AU  - Madhyastha,P
AU  - Specia,L
PB  - arXiv
PY  - 2020///
TI  - MSVD-Turkish: A comprehensive multimodal dataset for integrated vision and language research in Turkish
UR  - http://arxiv.org/abs/2012.07098v1
UR  - http://hdl.handle.net/10044/1/86121
ER  -

