Publications
183 results found
Fomicheva M, Sun S, Fonseca E, et al., 2020, MLQE-PE: A multilingual quality estimation and post-editing dataset, Publisher: arXiv
We present MLQE-PE, a new dataset for Machine Translation (MT) Quality Estimation (QE) and Automatic Post-Editing (APE). The dataset contains seven language pairs, with human labels for 9,000 translations per language pair in the following formats: sentence-level direct assessments and post-editing effort, and word-level good/bad labels. It also contains the post-edited sentences, as well as titles of the articles where the sentences were extracted from, and the neural MT models used to translate the text.
Lertvittayakumjorn P, Specia L, Toni F, 2020, FIND: Human-in-the-loop debugging deep text classifiers, 2020 Conference on Empirical Methods in Natural Language Processing, Publisher: ACL
Since obtaining a perfect training dataset (i.e., a dataset which is considerably large, unbiased, and well-representative of unseen cases) is hardly possible, many real-world text classifiers are trained on the available, yet imperfect, datasets. These classifiers are thus likely to have undesirable properties. For instance, they may have biases against some sub-populations or may not work effectively in the wild due to overfitting. In this paper, we propose FIND, a framework which enables humans to debug deep learning text classifiers by disabling irrelevant hidden features. Experiments show that by using FIND, humans can improve CNN text classifiers which were trained under different types of imperfect datasets (including datasets with biases and datasets with dissimilar train-test distributions).
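The core debugging step FIND enables, disabling hidden features a human judged irrelevant, can be sketched as a simple masking operation before the final classification layer. A minimal illustration, not the paper's implementation (the function name, indices, and activation values are invented):

```python
import numpy as np

def apply_feature_mask(hidden_features, disabled):
    """Zero out (disable) the hidden feature dimensions that a human
    judged irrelevant, before the final classification layer.
    Indices and values are purely illustrative."""
    mask = np.ones(len(hidden_features))
    mask[list(disabled)] = 0.0
    return np.asarray(hidden_features, dtype=float) * mask

features = [0.5, 2.0, -1.0, 0.3]                  # hypothetical hidden activations
masked = apply_feature_mask(features, disabled={1, 2})
assert list(masked) == [0.5, 0.0, 0.0, 0.3]       # features 1 and 2 disabled
```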
Caglayan O, Ive J, Haralampieva V, et al., 2020, Simultaneous machine translation with visual context, Conference on Empirical Methods in Natural Language Processing (EMNLP), Publisher: Association for Computational Linguistics, Pages: 2350-2361
Simultaneous machine translation (SiMT) aims to translate a continuous input text stream into another language with the lowest latency and highest quality possible. The translation thus has to start with an incomplete source text, which is read progressively, creating the need for anticipation. In this paper, we seek to understand whether the addition of visual information can compensate for the missing source context. To this end, we analyse the impact of different multimodal approaches and visual features on state-of-the-art SiMT frameworks. Our results show that visual context is helpful and that visually-grounded models based on explicit object region information are much better than commonly used global features, reaching up to 3 BLEU points improvement under low latency scenarios. Our qualitative analysis illustrates cases where only the multimodal systems are able to translate correctly from English into gender-marked languages, as well as deal with differences in word order, such as adjective-noun placement between English and French.
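For context, a common scheduling policy that SiMT frameworks of this kind build on is wait-k: read k source tokens before producing any output, then alternate one write per read. A generic sketch of that policy (not this paper's multimodal contribution):

```python
def wait_k_schedule(src_len, tgt_len, k=2):
    """Generic wait-k simultaneous-translation policy: READ k source
    tokens first, then alternate WRITE/READ until the source is
    exhausted, after which the remaining target tokens are written."""
    actions = []
    read = write = 0
    while write < tgt_len:
        if read < min(write + k, src_len):
            actions.append("READ")
            read += 1
        else:
            actions.append("WRITE")
            write += 1
    return actions

acts = wait_k_schedule(src_len=4, tgt_len=4, k=2)
assert acts[:2] == ["READ", "READ"]   # initial wait of k=2 tokens
assert acts.count("WRITE") == 4       # every target token gets written
```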
Fomicheva M, Sun S, Yankovskaya L, et al., 2020, Unsupervised quality estimation for neural machine translation, Transactions of the Association for Computational Linguistics, Vol: 8, Pages: 539-555, ISSN: 2307-387X
Quality Estimation (QE) is an important component in making Machine Translation (MT) useful in real-world applications, as it aims to inform the user of the quality of the MT output at test time. Existing approaches require large amounts of expert-annotated data, computation and time for training. As an alternative, we devise an unsupervised approach to QE where no training or access to additional resources besides the MT system itself is required. Unlike most current work, which treats the MT system as a black box, we explore useful information that can be extracted from the MT system as a by-product of translation. By employing methods for uncertainty quantification, we achieve very good correlation with human judgments of quality, rivalling state-of-the-art supervised QE models. To evaluate our approach we collect the first dataset that enables work on both black-box and glass-box approaches to QE.
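The glass-box idea, using uncertainty information the MT system already produces as a by-product of translation, can be sketched with a toy Monte Carlo dropout-style aggregation. The helper name and the log-probability values below are illustrative, not the paper's exact features:

```python
import numpy as np

def mc_dropout_qe(token_logprob_samples):
    """Toy glass-box QE signal: aggregate the log-probabilities the
    model assigned to its own output tokens, collected over several
    stochastic (dropout-enabled) forward passes of the same output.

    token_logprob_samples: shape (n_passes, n_tokens).
    Returns (mean_score, uncertainty): a higher mean and a lower
    variance across passes both suggest higher translation quality."""
    samples = np.asarray(token_logprob_samples, dtype=float)
    sent_scores = samples.mean(axis=1)      # per-pass sentence-level score
    return sent_scores.mean(), sent_scores.var()

# Hypothetical numbers: a confident decoding pattern vs. a noisy one.
confident = [[-0.1, -0.2, -0.1], [-0.1, -0.2, -0.2]]
uncertain = [[-0.5, -2.0, -0.3], [-2.5, -0.2, -1.8]]
m1, v1 = mc_dropout_qe(confident)
m2, v2 = mc_dropout_qe(uncertain)
assert m1 > m2 and v1 < v2   # confident output scores higher, varies less
```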
Alva-Manchego F, Scarton C, Specia L, 2020, Data-driven sentence simplification: survey and benchmark, Computational Linguistics, Vol: 46, Pages: 135-187, ISSN: 0891-2017
Sentence Simplification (SS) aims to modify a sentence in order to make it easier to read and understand. In order to do so, several rewriting transformations can be performed such as replacement, reordering, and splitting. Executing these transformations while keeping sentences grammatical, preserving their main idea, and generating simpler output, is a challenging and still far from solved problem. In this article, we survey research on SS, focusing on approaches that attempt to learn how to simplify using corpora of aligned original-simplified sentence pairs in English, which is the dominant paradigm nowadays. We also include a benchmark of different approaches on common data sets so as to compare them and highlight their strengths and limitations. We expect that this survey will serve as a starting point for researchers interested in the task and help spark new ideas for future developments.
Specia L, Barrault L, Caglayan O, et al., 2020, Grounded Sequence to Sequence Transduction, IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, Vol: 14, Pages: 577-591, ISSN: 1932-4553
Caglayan O, Ive J, Haralampieva V, et al., 2020, Simultaneous Machine Translation with Visual Context, Conference on Empirical Methods in Natural Language Processing (EMNLP), Publisher: Association for Computational Linguistics, Pages: 2350-2361
- Citations: 7
Ive J, Specia L, Szoc S, et al., 2020, A Post-Editing Dataset in the Legal Domain: Do we Underestimate Neural Machine Translation Quality?, 12th International Conference on Language Resources and Evaluation (LREC), Publisher: European Language Resources Association (ELRA), Pages: 3692-3697
- Citations: 3
Okabe S, Blain F, Specia L, 2020, Multimodal Quality Estimation for Machine Translation, 58th Annual Meeting of the Association for Computational Linguistics (ACL), Publisher: Association for Computational Linguistics, Pages: 1233-1240
Fomicheva M, Specia L, Guzman F, 2020, Multi-Hypothesis Machine Translation Evaluation, 58th Annual Meeting of the Association for Computational Linguistics (ACL), Publisher: Association for Computational Linguistics, Pages: 1218-1232
- Citations: 6
Scarton C, Madhyastha P, Specia L, 2020, Deciding When, How and for Whom to Simplify, 24th European Conference on Artificial Intelligence (ECAI), Publisher: IOS Press, Pages: 2172-2179, ISSN: 0922-6389
- Citations: 1
Li Z, Fomicheva M, Specia L, 2020, Exploring Model Consensus to Generate Translation Paraphrases, 4th Workshop on Neural Generation and Translation, Publisher: Association for Computational Linguistics, Pages: 161-168
Sun S, Fomicheva M, Blain F, et al., 2020, An Exploratory Study on Multilingual Quality Estimation, 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics / 10th International Joint Conference on Natural Language Processing (AACL-IJCNLP), Publisher: Association for Computational Linguistics, Pages: 366-377
- Citations: 2
Alva-Manchego F, Martin L, Bordes A, et al., 2020, ASSET: A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations, 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020), Pages: 4668-4679
- Citations: 14
Sulubacak U, Caglayan O, Grönroos S-A, et al., 2019, Multimodal machine translation through visuals and speech, Publisher: arXiv
Multimodal machine translation involves drawing information from more than one modality, based on the assumption that the additional modalities will contain useful alternative views of the input data. The most prominent tasks in this area are spoken language translation, image-guided translation, and video-guided translation, which exploit audio and visual modalities, respectively. These tasks are distinguished from their monolingual counterparts of speech recognition, image captioning, and video captioning by the requirement of models to generate outputs in a different language. This survey reviews the major data resources for these tasks, the evaluation campaigns concentrated around them, the state of the art in end-to-end and pipeline approaches, and also the challenges in performance evaluation. The paper concludes with a discussion of directions for future research in these areas: the need for more expansive and challenging datasets, for targeted evaluations of model performance, and for multimodality in both the input and output space.
Ive J, Madhyastha P, Specia L, 2019, Deep copycat networks for text-to-text generation, Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, Hong Kong, China, November 3-7, 2019, Pages: 3225-3234
Most text-to-text generation tasks, for example text summarisation and text simplification, require copying words from the input to the output. We introduce Copycat, a transformer-based pointer network for such tasks which obtains competitive results in abstractive text summarisation and generates more abstractive summaries. We propose a further extension of this architecture for automatic post-editing, where generation is conditioned over two inputs (source language and machine translation), and the model is capable of deciding where to copy information from. This approach achieves competitive performance when compared to state-of-the-art automated post-editing systems. More importantly, we show that it addresses a well-known limitation of automatic post-editing - overcorrecting translations - and that our novel mechanism for copying source language words improves the results.
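The copying behaviour central to pointer networks of this kind can be sketched with the standard pointer-generator mixture: a gate interpolates between the decoder's vocabulary distribution and a copy distribution built from attention over the source. This is the generic formulation, not Copycat's exact architecture; all names and numbers are illustrative:

```python
import numpy as np

def pointer_mixture(p_gen, vocab_dist, attn_weights, src_token_ids, vocab_size):
    """Generic pointer-generator output layer: mix the decoder's
    vocabulary distribution with a copy distribution obtained by
    scattering attention weights onto the source token ids."""
    final = p_gen * np.asarray(vocab_dist, dtype=float)
    copy = np.zeros(vocab_size)
    for w, tok in zip(attn_weights, src_token_ids):
        copy[tok] += w                      # the same source word may repeat
    return final + (1.0 - p_gen) * copy

vocab_dist = [0.7, 0.2, 0.1, 0.0]           # decoder alone prefers token 0
out = pointer_mixture(0.5, vocab_dist, [0.9, 0.1], [3, 2], vocab_size=4)
assert abs(out.sum() - 1.0) < 1e-9          # result is still a distribution
assert out[3] > vocab_dist[3]               # copying boosts source token 3
```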
Li Z, Specia L, 2019, Improving neural machine translation robustness via data augmentation: beyond back-translation, Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019), Publisher: Association for Computational Linguistics, Pages: 328-336
Neural Machine Translation (NMT) models have proved strong when translating clean texts, but they are very sensitive to noise in the input. Improving the robustness of NMT models can be seen as a form of "domain" adaptation to noise. The recently created Machine Translation on Noisy Text task corpus provides noisy-clean parallel data for a few language pairs, but this data is very limited in size and diversity. The state-of-the-art approaches are heavily dependent on large volumes of back-translated data. This paper makes two main contributions: first, we propose new data augmentation methods to extend limited noisy data and further improve NMT robustness to noise while keeping the models small. Second, we explore the effect of utilizing noise from external data in the form of speech transcripts and show that it can help robustness.
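As a rough illustration of noise-oriented data augmentation (not the paper's specific methods), one can synthesise noisy variants of clean source sentences with character-level perturbations such as deletions and adjacent swaps:

```python
import random

def add_char_noise(sentence, p=0.1, seed=0):
    """Illustrative noise injector: with probability p/2 delete a
    non-space character, otherwise with probability p swap it with
    the next character. Deterministic for a fixed seed, so augmented
    corpora are reproducible."""
    rng = random.Random(seed)
    chars = list(sentence)
    out = []
    i = 0
    while i < len(chars):
        r = rng.random()
        if r < p / 2 and chars[i] != " ":          # deletion
            i += 1
            continue
        if r < p and i + 1 < len(chars):           # adjacent swap
            out.extend([chars[i + 1], chars[i]])
            i += 2
            continue
        out.append(chars[i])                        # keep unchanged
        i += 1
    return "".join(out)

clean = "the model translates clean text"
assert add_char_noise(clean, p=0.0) == clean        # p=0 is the identity
assert add_char_noise(clean, p=0.3) != clean        # noise was injected
```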
Wu Z, Caglayan O, Ive J, et al., 2019, Transformer-based Cascaded Multimodal Speech Translation
This paper describes the cascaded multimodal speech translation systems developed by Imperial College London for the IWSLT 2019 evaluation campaign. The architecture consists of an automatic speech recognition (ASR) system followed by a Transformer-based multimodal machine translation (MMT) system. While the ASR component is identical across the experiments, the MMT model varies in terms of the way of integrating the visual context (simple conditioning vs. attention), the type of visual features exploited (pooled, convolutional, action categories) and the underlying architecture. For the latter, we explore both the canonical transformer and its deliberation version with additive and cascade variants which differ in how they integrate the textual attention. Upon conducting extensive experiments, we found that (i) the explored visual integration schemes often harm the translation performance for the transformer and additive deliberation, but considerably improve the cascade deliberation; (ii) the transformer and cascade deliberation integrate the visual modality better than the additive deliberation, as shown by the incongruence analysis.
Fomicheva M, Specia L, 2019, Taking MT evaluation metrics to extremes: beyond correlation with human judgments, Computational Linguistics, Vol: 45, Pages: 515-558, ISSN: 0891-2017
Automatic Machine Translation (MT) evaluation is an active field of research, with a handful of new metrics devised every year. Evaluation metrics are generally benchmarked against manual assessment of translation quality, with performance measured in terms of overall correlation with human scores. Much work has been dedicated to the improvement of evaluation metrics to achieve a higher correlation with human judgments. However, little insight has been provided regarding the weaknesses and strengths of existing approaches and their behavior in different settings. In this work we conduct a broad meta-evaluation study of the performance of a wide range of evaluation metrics focusing on three major aspects. First, we analyze the performance of the metrics when faced with different levels of translation quality, proposing a local dependency measure as an alternative to the standard, global correlation coefficient. We show that metric performance varies significantly across different levels of MT quality: Metrics perform poorly when faced with low-quality translations and are not able to capture nuanced quality distinctions. Interestingly, we show that evaluating low-quality translations is also more challenging for humans. Second, we show that metrics are more reliable when evaluating neural MT than the traditional statistical MT systems. Finally, we show that the difference in the evaluation accuracy for different metrics is maintained even if the gold standard scores are based on different criteria.
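The paper's first finding, that metric reliability varies with translation quality, can be illustrated by contrasting a global correlation with correlations computed inside quality bands. The sketch below is a simplified stand-in for the paper's local dependency measure; the scores are invented:

```python
import numpy as np

def pearson(x, y):
    """Pearson's r between two score vectors."""
    return float(np.corrcoef(np.asarray(x, float), np.asarray(y, float))[0, 1])

def banded_correlation(metric, human, n_bands=2):
    """Split sentences into quality bands by human score and report
    Pearson's r inside each band, exposing metrics that only rank
    across, not within, quality levels."""
    order = np.argsort(human)
    bands = np.array_split(order, n_bands)
    return [pearson(np.take(metric, b), np.take(human, b)) for b in bands]

human  = [1, 2, 3, 4, 5, 6, 7, 8]     # hypothetical human quality scores
metric = [1, 3, 2, 4, 5, 6, 7, 8]     # metric is noisy at the low-quality end
low, high = banded_correlation(metric, human)
assert low < high                      # weaker agreement on poor translations
assert pearson(metric, human) > 0.9    # yet the global correlation looks fine
```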
Madhyastha P, Wang J, Specia L, 2019, End-to-end image captioning exploits multimodal distributional similarity
We hypothesize that end-to-end neural image captioning systems work seemingly well because they exploit and learn 'distributional similarity' in a multimodal feature space by mapping a test image to similar training images in this space and generating a caption from the same space. To validate our hypothesis, we focus on the 'image' side of image captioning, and vary the input image representation but keep the RNN text generation component of a CNN-RNN model constant. Our analysis indicates that image captioning models (i) are capable of separating structure from noisy input representations; (ii) suffer virtually no significant performance loss when a high dimensional representation is compressed to a lower dimensional space; (iii) cluster images with similar visual and linguistic information together. Our findings indicate that our distributional similarity hypothesis holds. We conclude that regardless of the image representation used image captioning systems seem to match images and generate captions in a learned joint image-text semantic subspace.
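The distributional-similarity hypothesis can be caricatured as nearest-neighbour retrieval in a joint embedding space: if captioning reduces to matching a test image to nearby training images, cosine nearest-neighbour lookup already yields a plausible caption. The vectors and captions below are invented for illustration:

```python
import numpy as np

def nearest_training_caption(test_vec, train_vecs, train_captions):
    """Return the caption of the training image whose embedding has
    the highest cosine similarity to the test image embedding."""
    t = np.asarray(test_vec, dtype=float)
    X = np.asarray(train_vecs, dtype=float)
    sims = X @ t / (np.linalg.norm(X, axis=1) * np.linalg.norm(t))
    return train_captions[int(np.argmax(sims))]

train_vecs = [[1.0, 0.0], [0.0, 1.0]]             # toy joint-space embeddings
captions = ["a dog on grass", "a plate of food"]
caption = nearest_training_caption([0.9, 0.1], train_vecs, captions)
assert caption == "a dog on grass"
```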
Wang J, Specia L, 2019, Phrase Localization Without Paired Training Examples, Publisher: IEEE Computer Society
- Citations: 29
Lala C, Madhyastha P, Specia L, 2019, Grounded Word Sense Translation, Proceedings of the Second Workshop on Shortcomings in Vision and Language, Publisher: Association for Computational Linguistics
Chow J, Specia L, Madhyastha P, 2019, WMDO: Fluency-based Word Mover's Distance for Machine Translation Evaluation, Proceedings of the Fourth Conference on Machine Translation, WMT 2019, Florence, Italy, August 1-2, 2019 - Volume 2: Shared Task Papers, Day 1, Pages: 494-500
Caglayan O, Wu Z, Madhyastha P, et al., 2019, Imperial College London Submission to VATEX Video Captioning Task
Li Z, Specia L, 2019, A Comparison on Fine-grained Pre-trained Embeddings for the WMT19 Chinese-English News Translation Task, 4th Conference on Machine Translation (WMT), Publisher: Association for Computational Linguistics, Pages: 249-256
Alva-Manchego F, Martin L, Scarton C, et al., 2019, EASSE: Easier Automatic Sentence Simplification Evaluation, 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP 2019): Proceedings of System Demonstrations, Pages: 49-54
- Citations: 11
Wu Z, Ive J, Wang J, et al., 2019, Predicting Actions to Help Predict Translations
Wang Z, Ive J, Velupillai S, et al., 2019, Is artificial data useful for biomedical Natural Language Processing algorithms?, SIGBIOMED Workshop on Biomedical Natural Language Processing (BioNLP 2019), Pages: 240-249
- Citations: 4
This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.