Imperial College London

Professor Lucia Specia

Faculty of Engineering, Department of Computing

Chair in Natural Language Processing
 
 
 

Location

 

572a, Huxley Building, South Kensington Campus


Citation

BibTeX format

@inproceedings{Li:2019:v1/d19-5543,
author = {Li, Z and Specia, L},
doi = {10.18653/v1/d19-5543},
pages = {328--336},
publisher = {Association for Computational Linguistics},
title = {Improving neural machine translation robustness via data augmentation: beyond back-translation},
url = {http://dx.doi.org/10.18653/v1/d19-5543},
year = {2019}
}

RIS format (EndNote, RefMan)

TY  - CPAPER
AB  - Neural Machine Translation (NMT) models have been proved strong when translating clean texts, but they are very sensitive to noise in the input. Improving NMT models robustness can be seen as a form of “domain” adaption to noise. The recently created Machine Translation on Noisy Text task corpus provides noisy-clean parallel data for a few language pairs, but this data is very limited in size and diversity. The state-of-the-art approaches are heavily dependent on large volumes of back-translated data. This paper has two main contributions: Firstly, we propose new data augmentation methods to extend limited noisy data and further improve NMT robustness to noise while keeping the models small. Secondly, we explore the effect of utilizing noise from external data in the form of speech transcripts and show that it could help robustness.
AU  - Li, Z
AU  - Specia, L
DO  - 10.18653/v1/d19-5543
EP  - 336
PB  - Association for Computational Linguistics
PY  - 2019///
SP  - 328
TI  - Improving neural machine translation robustness via data augmentation: beyond back-translation
UR  - http://dx.doi.org/10.18653/v1/d19-5543
UR  - https://www.aclweb.org/anthology/D19-5543/
UR  - http://hdl.handle.net/10044/1/79499
ER  -