Imperial College London

Professor Lucia Specia

Faculty of Engineering, Department of Computing

Chair in Natural Language Processing
 
 
 

Contact

 

l.specia

Location

 

572a Huxley Building, South Kensington Campus



Publications

183 results found

Madhyastha P, Wang J, Specia L, 2019, VIFIDEL: Evaluating the Visual Fidelity of Image Descriptions., Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28 - August 2, 2019, Volume 1: Long Papers, Pages: 6539-6550

Conference paper

He J, Madhyastha P, Specia L, 2019, Deep copycat Networks for Text-to-Text Generation, Conference on Empirical Methods in Natural Language Processing / 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Publisher: ASSOC COMPUTATIONAL LINGUISTICS-ACL, Pages: 3227-3236

Conference paper

Caglayan O, Madhyastha P, Specia L, Barrault L et al., 2019, Probing the Need for Visual Context in Multimodal Machine Translation., Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), Pages: 4159-4170

Conference paper

Ive J, Madhyastha P, Specia L, 2019, Distilling Translations with Visual Awareness., Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28 - August 2, 2019, Volume 1: Long Papers, Pages: 6525-6538

Conference paper

Sanabria R, Caglayan O, Palaskar S, Elliott D, Barrault L, Specia L, Metze F et al., 2018, How2: A large-scale dataset for multimodal language understanding

In this paper, we introduce How2, a multimodal collection of instructional videos with English subtitles and crowdsourced Portuguese translations. We also present integrated sequence-to-sequence baselines for machine translation, automatic speech recognition, spoken language translation, and multimodal summarization. By making available data and code for several multimodal natural language tasks, we hope to stimulate more research on these and similar challenges, to obtain a deeper understanding of multimodality in language processing.

Working paper

Smith KS, Specia L, 2018, Assessing crosslingual discourse relations in machine translation

In an attempt to improve overall translation quality, there has been an increasing focus on integrating more linguistic elements into Machine Translation (MT). While significant progress has been achieved, especially recently with neural models, automatically evaluating the output of such systems is still an open problem. Current practice in MT evaluation relies on a single reference translation, even though there are many ways of translating a particular text, and it tends to disregard higher-level information such as discourse. We propose a novel approach that assesses the translated output based on the source text rather than the reference translation, and measures the extent to which the semantics of the discourse elements (discourse relations, in particular) in the source text are preserved in the MT output. The challenge is to detect the discourse relations in the source text and determine whether these relations are correctly transferred crosslingually to the target language -- without a reference translation. This methodology could be used independently for discourse-level evaluation, or as a component in other metrics, at a time when substantial amounts of MT are online and would benefit from evaluation where the source text serves as a benchmark.

Working paper

Frank S, Elliott D, Specia L, 2018, Assessing multilingual multimodal image description: Studies of native speaker preferences and translator choices, NATURAL LANGUAGE ENGINEERING, Vol: 24, Pages: 393-413, ISSN: 1351-3249

Journal article

Specia L, Scarton C, Paetzold GH, 2018, Quality estimation for machine translation, Synthesis Lectures on Human Language Technologies, Pages: i-148

Many applications within natural language processing involve performing text-to-text transformations, i.e., given a text in natural language as input, systems are required to produce a version of this text (e.g., a translation), also in natural language, as output. Automatically evaluating the output of such systems is an important component in developing text-to-text applications. Two approaches have been proposed for this problem: (i) to compare the system outputs against one or more reference outputs using string matching-based evaluation metrics and (ii) to build models based on human feedback to predict the quality of system outputs without reference texts. Despite their popularity, reference-based evaluation metrics are faced with the challenge that multiple good (and bad) quality outputs can be produced by text-to-text approaches for the same input. This variation is very hard to capture, even with multiple reference texts. In addition, reference-based metrics cannot be used in production (e.g., online machine translation systems), when systems are expected to produce outputs for any unseen input. In this book, we focus on the second set of metrics, so-called Quality Estimation (QE) metrics, where the goal is to provide an estimate on how good or reliable the texts produced by an application are without access to gold-standard outputs. QE enables different types of evaluation that can target different types of users and applications. Machine learning techniques are used to build QE models with various types of quality labels and explicit features or learnt representations, which can then predict the quality of unseen system outputs. This book describes the topic of QE for text-to-text applications, covering quality labels, features, algorithms, evaluation, uses, and state-of-the-art approaches. It focuses on machine translation as application, since this represents most of the QE work done to date. It also briefly describes QE for several other applications

Book chapter

Chatterjee R, Negri M, Turchi M, Blain F, Specia L et al., 2018, Combining quality estimation and automatic post-editing to enhance machine translation output, Pages: 26-38

We investigate different strategies for combining quality estimation (QE) and automatic postediting (APE) to improve the output of machine translation (MT) systems. The joint contribution of the two technologies is analyzed in different settings, in which QE serves as either: i) an activator of APE corrections, or ii) a guidance to APE corrections, or iii) a selector of the final output to be returned to the user. In the first case (QE as activator), sentence-level predictions on the raw MT output quality are used to trigger its automatic correction when the estimated (TER) scores are below a certain threshold. In the second case (QE as guidance), word-level binary quality predictions (“good”/“bad”) are used to inform APE about problematic words in the MT output that should be corrected. In the last case (QE as selector), both sentence- and word-level quality predictions are used to identify the most accurate translation between the original MT output and its post-edited version. For the sake of comparison, the underlying APE technologies explored in our evaluation are both phrase-based and neural. Experiments are carried out on the English-German data used for the QE/APE shared tasks organized within the First Conference on Machine Translation (WMT 2016). Our evaluation shows positive but mixed results, with higher performance observed when word-level QE is used as a selector for neural APE applied to the output of a phrase-based MT system. Overall, our findings motivate further investigation on QE technologies. By reducing the gap between the performance of current solutions and “oracle” results, QE could significantly add to competitive APE technologies.

Conference paper
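The "QE as selector" strategy described in the abstract above can be sketched in a few lines. This is an illustrative sketch only, not the authors' implementation: `qe_score` stands in for a trained sentence-level QE model that predicts TER (lower is better), and `toy_qe` is a deliberately crude placeholder.

```python
# Sketch of "QE as selector": given sentence-level quality estimates
# (predicted TER, lower is better) for the raw MT output and its
# automatically post-edited (APE) version, return the candidate
# predicted to be more accurate.

def select_output(mt_sentence, ape_sentence, qe_score):
    """Return whichever candidate has the better (lower) predicted TER."""
    return mt_sentence if qe_score(mt_sentence) <= qe_score(ape_sentence) else ape_sentence

# Illustration only: a toy "QE model" that simply penalises longer outputs.
toy_qe = lambda s: len(s.split())

print(select_output("the cat sat on the mat",
                    "the cat sat down on the mat", toy_qe))
# prints "the cat sat on the mat"
```

In the paper's best-performing setting, the selector compares a phrase-based MT output against its neural APE version; here both the candidates and the scoring function are placeholders.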

Specia L, 2018, Multi-modal context modelling for machine translation

MultiMT is a European Research Council Starting Grant whose aim is to devise data, methods and algorithms to exploit multi-modal information (images, audio, metadata) for context modelling in machine translation and other cross-lingual tasks. The project draws upon different research fields including natural language processing, computer vision, speech processing and machine learning.

Conference paper

Yimam SM, Biemann C, Malmasi S, Paetzold GH, Specia L, Štajner S, Tack A, Zampieri M et al., 2018, A report on the complex word identification shared task 2018, Pages: 66-78

We report the findings of the second Complex Word Identification (CWI) shared task organized as part of the BEA workshop co-located with NAACL-HLT'2018. The second CWI shared task featured multilingual and multi-genre datasets divided into four tracks: English monolingual, German monolingual, Spanish monolingual, and a multilingual track with a French test set, and two tasks: binary classification and probabilistic classification. A total of 12 teams submitted their results in different task/track combinations and 11 of them wrote system description papers that are referred to in this report and appear in the BEA workshop proceedings.

Conference paper

Haddow B, Birch A, Forcada ML, Scarton C, Specia L et al., 2018, Exploring Gap Filling as a Cheaper Alternative to Reading Comprehension Questionnaires when Evaluating Machine Translation for Gisting, Pages: 192-203

A popular application of machine translation (MT) is gisting: MT is consumed as is to make sense of text in a foreign language. Evaluation of the usefulness of MT for gisting is surprisingly uncommon. The classical method uses reading comprehension questionnaires (RCQ), in which informants are asked to answer professionally-written questions in their language about a foreign text that has been machine-translated into their language. Recently, gap-filling (GF), a form of cloze testing, has been proposed as a cheaper alternative to RCQ. In GF, certain words are removed from reference translations and readers are asked to fill the gaps left using the machine-translated text as a hint. This paper reports, for the first time, a comparative evaluation, using both RCQ and GF, of translations from multiple MT systems for the same foreign texts, and a systematic study on the effect of variables such as gap density, gap-selection strategies, and document context in GF. The main findings of the study are: (a) both RCQ and GF clearly identify MT to be useful; (b) global RCQ and GF rankings for the MT systems are mostly in agreement; (c) GF scores vary very widely across informants, making comparisons among MT systems hard, and (d) unlike RCQ, which is framed around documents, GF evaluation can be framed at the sentence level. These findings support the use of GF as a cheaper alternative to RCQ.

Conference paper
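The gap-filling (GF) protocol described above blanks out tokens in a reference translation and asks readers to restore them using the MT output as a hint. The following is a minimal sketch of how such a cloze item could be constructed; the function name, the fixed `____` placeholder, and random gap selection are assumptions for illustration, not the authors' exact protocol (the paper also studies other gap-selection strategies).

```python
import random

def make_gap_filling_item(reference, gap_density=0.2, seed=0):
    """Blank out a fraction of the reference-translation tokens to create
    a cloze (gap-filling) item. Gap density and gap-selection strategy
    are two of the variables the study above examines; here gaps are
    chosen uniformly at random for simplicity."""
    rng = random.Random(seed)
    tokens = reference.split()
    n_gaps = max(1, round(len(tokens) * gap_density))
    positions = set(rng.sample(range(len(tokens)), n_gaps))
    blanked = ["____" if i in positions else tok for i, tok in enumerate(tokens)]
    answers = {i: tokens[i] for i in sorted(positions)}  # gold answers by position
    return " ".join(blanked), answers

item, answers = make_gap_filling_item("the quick brown fox jumps over the lazy dog", 0.2)
print(item)   # reference with two tokens replaced by ____
```

Informants would then see `item` alongside the machine translation and be scored on how many gaps they fill correctly.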

Wang J, Madhyastha PS, Specia L, 2018, Object Counts! Bringing Explicit Detections Back into Image Captioning., Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2018, New Orleans, Louisiana, USA, June 1-6, 2018, Volume 1 (Long Papers), Pages: 2180-2193

Conference paper

Specia L, Shah K, 2018, Machine translation quality estimation: applications and future perspectives, Machine Translation: Technologies and Applications, Editors: Moorkens, Castilho, Gaspari, Doherty, Publisher: Springer International Publishing, Pages: 201-235, ISBN: 9783319912400

Predicting the quality of machine translation (MT) output is a topic that has been attracting significant attention. By automatically distinguishing bad from good quality translations, it has the potential to make MT more useful in a number of applications. In this chapter we review various practical applications where quality estimation (QE) at sentence level has shown positive results: filtering low quality cases from post-editing, selecting the best MT system when multiple options are available, improving MT performance by selecting additional parallel data, and sampling for quality assurance by humans. Finally, we discuss QE at other levels (word and document) and general challenges in the field, as well as perspectives for novel directions and applications.

Book chapter

Lala C, Specia L, 2018, Multimodal Lexical Translation, 11th International Conference on Language Resources and Evaluation (LREC), Publisher: EUROPEAN LANGUAGE RESOURCES ASSOC-ELRA, Pages: 3810-3817

Conference paper

Scarton C, Paetzold GH, Specia L, 2018, SimPA: A Sentence-Level Simplification Corpus for the Public Administration Domain, 11th International Conference on Language Resources and Evaluation (LREC), Publisher: EUROPEAN LANGUAGE RESOURCES ASSOC-ELRA, Pages: 4333-4338

Conference paper

Madhyastha PS, Wang J, Specia L, 2018, Defoiling Foiled Image Captions., Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT, New Orleans, Louisiana, USA, June 1-6, 2018, Volume 2 (Short Papers), Pages: 433-438

Conference paper

Scarton C, Specia L, 2018, Learning Simplifications for Specific Target Audiences, 56th Annual Meeting of the Association-for-Computational-Linguistics (ACL), Publisher: ASSOC COMPUTATIONAL LINGUISTICS-ACL, Pages: 712-718

Conference paper

Scarton C, Paetzold GH, Specia L, 2018, Text Simplification from Professionally Produced Corpora, 11th International Conference on Language Resources and Evaluation (LREC), Publisher: EUROPEAN LANGUAGE RESOURCES ASSOC-ELRA, Pages: 3504-3510

Conference paper

Madhyastha PS, Wang J, Specia L, 2018, End-to-end Image Captioning Exploits Multimodal Distributional Similarity.

Conference paper

Madhyastha PS, Wang J, Specia L, 2018, The role of image representations in vision to language tasks., Natural Language Engineering, Vol: 24, Pages: 415-439

Journal article

Paetzold GH, Specia L, 2017, A Survey on Lexical Simplification, JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, Vol: 60, Pages: 549-593, ISSN: 1076-9757

Journal article

Zampieri M, Malmasi S, Paetzold G, Specia L et al., 2017, Complex Word Identification: Challenges in Data Annotation and System Performance

This paper revisits the problem of complex word identification (CWI) following up the SemEval CWI shared task. We use ensemble classifiers to investigate how well computational methods can discriminate between complex and non-complex words. Furthermore, we analyze the classification performance to understand what makes lexical complexity challenging. Our findings show that most systems performed poorly on the SemEval CWI dataset, and one of the reasons for that is the way in which human annotation was performed.

Conference paper

Cer D, Diab M, Agirre E, Lopez-Gazpio I, Specia L et al., 2017, SemEval-2017 Task 1: semantic textual similarity - multilingual and cross-lingual focused evaluation, 11th International Workshop on Semantic Evaluation (SemEval-2017), Publisher: Association for Computational Linguistics, Pages: 1-14

Semantic Textual Similarity (STS) measures the meaning similarity of sentences. Applications include machine translation (MT), summarization, generation, question answering (QA), short answer grading, semantic search, dialog and conversational systems. The STS shared task is a venue for assessing the current state-of-the-art. The 2017 task focuses on multilingual and cross-lingual pairs with one sub-track exploring MT quality estimation (MTQE) data. The task obtained strong participation from 31 teams, with 17 participating in all language tracks. We summarize performance and review a selection of well performing methods. Analysis highlights common errors, providing insight into the limitations of existing models. To support ongoing work on semantic representations, the STS Benchmark is introduced as a new shared training and evaluation set carefully selected from the corpus of English STS shared task data (2012-2017).

Conference paper

Rabinovich E, Mirkin S, Patel RN, Specia L, Wintner S et al., 2017, Personalized machine translation: preserving original author traits, 15th Conference of the European Chapter of the Association for Computational Linguistics, Publisher: The Association for Computational Linguistics, Pages: 1074-1084

The language that we produce reflects our personality, and various personal and demographic characteristics can be detected in natural language texts. We focus on one particular personal trait of the author, gender, and study how it is manifested in original texts and in translations. We show that the author's gender has a powerful, clear signal in original texts, but this signal is obfuscated in human and machine translation. We then propose simple domain adaptation techniques that help retain the original gender traits in the translation, without harming the quality of the translation, thereby creating more personalized machine translation systems.

Conference paper

Paetzold GH, Specia L, 2017, Lexical simplification with neural ranking, Pages: 34-40

We present a new Lexical Simplification approach that exploits Neural Networks to learn substitutions from the Newsela corpus - a large set of professionally produced simplifications. We extract candidate substitutions by combining the Newsela corpus with a retrofitted context-aware word embeddings model and rank them using a new neural regression model that learns rankings from annotated data. This strategy leads to the highest Accuracy, Precision and F1 scores to date in standard datasets for the task.

Conference paper
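At its core, the lexical simplification pipeline above ranks candidate substitutions for a complex word. A minimal sketch of that final ranking step, with an assumed function name and a toy scorer standing in for the paper's neural regression ranker learned from annotated data:

```python
def rank_substitutions(candidates, score):
    """Rank candidate substitutions from simplest to least simple
    according to a scoring function (higher score = simpler). In the
    paper this scorer is a neural regression model; here it is a
    placeholder."""
    return sorted(candidates, key=score, reverse=True)

# Toy scorer: shorter words as a crude proxy for simpler, more frequent ones.
toy_score = lambda w: -len(w)

print(rank_substitutions(["residence", "home", "dwelling"], toy_score))
# prints ['home', 'dwelling', 'residence']
```

The top-ranked candidate would then replace the complex word, subject to fitting the sentence context.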

Elliott D, Frank S, Barrault L, Bougares F, Specia L et al., 2017, Findings of the second shared task on multimodal machine translation and multilingual image description, Pages: 215-233

We present the results from the second shared task on multimodal machine translation and multilingual image description. Nine teams submitted 19 systems to two tasks. The multimodal translation task, in which the source sentence is supplemented by an image, was extended with a new language (French) and two new test sets. The multilingual image description task was changed such that at test time, only the image is given. Compared to last year, multimodal systems improved, but text-only systems remain competitive.

Conference paper

Deena S, Ng RWM, Madhyastha PS, Specia L, Hain T et al., 2017, Exploring the use of acoustic embeddings in neural machine translation., 2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017, Okinawa, Japan, December 16-20, 2017, Pages: 450-457

Conference paper

Deena S, Ng RWM, Madhyastha PS, Specia L, Hain T et al., 2017, Semi-Supervised Adaptation of RNNLMs by Fine-Tuning with Domain-Specific Auxiliary Features., Interspeech 2017, 18th Annual Conference of the International Speech Communication Association, Stockholm, Sweden, August 20-24, 2017, Pages: 2715-2719

Conference paper

Paetzold GH, Specia L, 2016, Vicinity-driven paragraph and sentence alignment for comparable corpora, Publisher: arXiv

Parallel corpora have driven great progress in the field of Text Simplification. However, most sentence alignment algorithms either support only a limited range of alignment types, or simply ignore valuable clues present in comparable documents. We address this problem by introducing a new set of flexible vicinity-driven paragraph and sentence alignment algorithms that support 1-N, N-1, N-N and long-distance null alignments without the need for hard-to-replicate supervised models.

Working paper

This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.
