Imperial College London

Professor Lucia Specia

Faculty of Engineering, Department of Computing

Chair in Natural Language Processing
 
 
 

Contact

 

l.specia

 
 

Location

 

572a Huxley Building, South Kensington Campus



 

Publications

183 results found

Elliott D, Frank S, Sima'an K, Specia L et al., 2016, Multi30K: multilingual English-German image descriptions, Publisher: arXiv

We introduce the Multi30K dataset to stimulate multilingual multimodal research. Recent advances in image description have been demonstrated on English-language datasets almost exclusively, but image description should not be limited to English. This dataset extends the Flickr30K dataset with i) German translations created by professional translators over a subset of the English descriptions, and ii) descriptions crowdsourced independently of the original English descriptions. We outline how the data can be used for multilingual image description and multimodal machine translation, but we anticipate the data will be useful for a broader range of tasks.

Working paper

Paetzold GH, Specia L, 2016, Understanding the lexical simplification needs of non-native speakers of English, Pages: 717-727

We report three user studies in which the Lexical Simplification needs of non-native English speakers are investigated. Our analyses offer valuable new insights into the relationship between non-native speakers' notion of complexity and various morphological, semantic and lexical word properties. Some of our findings contradict long-standing misconceptions about word simplicity. The data produced in our studies consists of 211,564 annotations made by 1,100 volunteers, which we hope will guide forthcoming research on Text Simplification for non-native speakers of English.

Conference paper

Beck D, Specia L, Cohn T, 2016, Exploring prediction uncertainty in machine translation quality estimation, Pages: 208-218

Machine Translation Quality Estimation is a notoriously difficult task, which lessens its usefulness in real-world translation environments. Such scenarios can be improved if quality predictions are accompanied by a measure of uncertainty. However, models in this task are traditionally evaluated only in terms of point estimate metrics, which do not take prediction uncertainty into account. We investigate probabilistic methods for Quality Estimation that can provide well-calibrated uncertainty estimates and evaluate them in terms of their full posterior predictive distributions. We also show how this posterior information can be useful in an asymmetric risk scenario, which aims to capture typical situations in translation workflows.

Conference paper
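The asymmetric risk scenario described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a pinball-style loss whose penalty weights are hypothetical, and the well-known fact that such a loss is minimised by a quantile of the posterior predictive distribution.

```python
import random
import statistics

def asymmetric_loss(prediction, truth, w_under=3.0, w_over=1.0):
    """Pinball-style asymmetric loss: under-estimating quality is
    penalised w_under times per unit, over-estimating w_over times.
    The weights here are hypothetical, chosen for illustration."""
    diff = truth - prediction
    return w_under * diff if diff > 0 else -w_over * diff

def bayes_point(posterior_samples, w_under=3.0, w_over=1.0):
    """Point estimate minimising expected asymmetric loss under the
    posterior predictive: the w_under / (w_under + w_over) quantile."""
    tau = w_under / (w_under + w_over)
    ordered = sorted(posterior_samples)
    return ordered[int(tau * (len(ordered) - 1))]

# Toy posterior predictive for one sentence's quality score:
rng = random.Random(0)
samples = [rng.gauss(0.7, 0.1) for _ in range(10_000)]
point = bayes_point(samples)  # shifted above the mean when under-estimation is costlier
```

With symmetric weights the estimate reduces to the posterior median; skewing the weights moves it toward whichever error direction is cheaper, which is the practical payoff of having the full predictive distribution rather than a point estimate.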

Paetzold GH, Specia L, 2016, Collecting and exploring everyday language for predicting psycholinguistic properties of words, Pages: 1669-1679

Exploring language usage through frequency analysis in large corpora is a defining feature of most recent work in corpus and computational linguistics. From a psycholinguistic perspective, however, the corpora used in these contributions are often not representative of language usage: they are either domain-specific, limited in size, or extracted from unreliable sources. In an effort to address this limitation, we introduce SubIMDB, a corpus of everyday spoken language that we created, containing over 225 million words. The corpus was extracted from 38,102 subtitles of family, comedy and children's movies and series, and is the first sizeable structured corpus of subtitles made available. Our experiments show that word frequency norms extracted from this corpus are more effective than those from well-known norms such as Kucera-Francis, HAL and SUBTLEXus in predicting various psycholinguistic properties of words, such as lexical decision times, familiarity, age of acquisition and simplicity. We also provide evidence that contradicts the long-standing assumption that the ideal size for a corpus can be determined solely based on how well its word frequencies correlate with lexical decision times.

Conference paper

Scarton C, Paetzold GH, Specia L, 2016, Quality estimation for language output applications, Pages: 14-17

Conference paper

Paetzold GH, Specia L, 2016, Anita: An intelligent text adaptation tool, Pages: 79-83

We introduce Anita: a flexible and intelligent Text Adaptation tool for web content that provides Text Simplification and Text Enhancement modules. Anita's simplification module features a state-of-the-art system that adapts texts according to the needs of individual users, and its enhancement module allows the user to search for a word's definitions, synonyms, translations, and visual cues through related images. These utilities are brought together in an easy-to-use interface of a freely available web browser extension.

Conference paper

Shah K, Specia L, 2016, Large-scale multitask learning for machine translation quality estimation, Pages: 558-567

Multitask learning has proven a useful technique in a number of Natural Language Processing applications where data is scarce and naturally diverse. Examples include learning from data of different domains and learning from labels provided by multiple annotators. Tasks in these scenarios would be the domains or the annotators. When faced with limited data for each task, a framework for learning tasks in parallel while using a shared representation is clearly helpful: what is learned for a given task can be transferred to other tasks while the peculiarities of each task are still modelled. Focusing on machine translation quality estimation as the application, in this paper we show that multitask learning is also useful in cases where data is abundant. Based on two large-scale datasets, we explore models with multiple annotators and multiple languages and show that state-of-the-art multitask learning algorithms lead to improved results in all settings.

Conference paper

Paetzold GH, Specia L, 2016, Inferring psycholinguistic properties of words, Pages: 435-440

We introduce a bootstrapping algorithm for regression that exploits word embedding models. We use it to infer four psycholinguistic properties of words: Familiarity, Age of Acquisition, Concreteness and Imagery, and further populate the MRC Psycholinguistic Database with these properties. The approach achieves a correlation of 0.88 with human-produced values, and the inferred psycholinguistic features lead to state-of-the-art results when used in a Lexical Simplification task.

Conference paper

Paetzold GH, Specia L, 2016, SemEval 2016 task 11: Complex word identification, Pages: 560-569

We report the findings of the Complex Word Identification task of SemEval 2016. To create a dataset, we conduct a user study with 400 non-native English speakers, and find that complex words tend to be rarer, less ambiguous and shorter. A total of 42 systems were submitted from 21 distinct teams, and nine baselines were provided. The results highlight the effectiveness of Decision Trees and Ensemble methods for the task, but ultimately reveal that word frequencies remain the most reliable predictor of word complexity.

Conference paper

Paetzold GH, Specia L, 2016, SV000gg at SemEval-2016 task 11: Heavy gauge complex word identification with system voting, Pages: 969-974

We introduce the SV000gg systems: two Ensemble Methods for the Complex Word Identification task of SemEval 2016. While the SV000gg-Hard system exploits basic Hard Voting, the SV000gg-Soft system employs Performance-Oriented Soft Voting, which weights votes according to the voter's performance rather than its prediction confidence, allowing for completely heterogeneous systems to be combined. Our performance comparison shows that our voting techniques outperform traditional Soft Voting, as well as other systems submitted to the shared task, ranking first and second overall.

Conference paper
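The performance-oriented soft voting described above can be sketched in a few lines. This is a minimal illustration, not the SV000gg code: the system names, labels, and held-out performance scores below are invented, and the weighting simply credits each system's vote with its own score.

```python
from collections import defaultdict

def performance_soft_vote(predictions, system_scores):
    """Combine binary complex-word predictions by weighting each
    system's vote with its held-out performance (e.g. an F-score),
    rather than with its prediction confidence. This lets completely
    heterogeneous systems be combined. All values are illustrative."""
    totals = defaultdict(float)
    for system, label in predictions.items():
        totals[label] += system_scores[system]
    return max(totals, key=totals.get)

# Hypothetical votes and held-out scores for three systems:
preds = {"freq_baseline": "complex", "svm": "simple", "tree": "complex"}
scores = {"freq_baseline": 0.62, "svm": 0.55, "tree": 0.58}
label = performance_soft_vote(preds, scores)  # "complex": 0.62 + 0.58 > 0.55
```

The design choice is that a weak system's confident vote should not outweigh a strong system's vote, which is what happens in confidence-weighted soft voting.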

Tan L, Scarton C, Specia L, van Genabith J et al., 2016, SAARSHEFF at SemEval-2016 task 1: Semantic textual similarity with machine translation evaluation metrics and (eXtreme) boosted tree ensembles, Pages: 628-633

This paper describes the SAARSHEFF systems that participated in the English Semantic Textual Similarity (STS) task in SemEval-2016. We extend the work on using machine translation (MT) metrics in the STS task by automatically annotating the STS datasets with a variety of MT scores for each pair of text snippets. We trained our systems using boosted tree ensembles and achieved competitive results that outperform the median Pearson correlation scores of all participating systems.

Conference paper

Fomicheva M, Specia L, 2016, Reference Bias in Monolingual Machine Translation Evaluation, 54th Annual Meeting of the Association-for-Computational-Linguistics (ACL), Publisher: ASSOC COMPUTATIONAL LINGUISTICS-ACL, Pages: 77-82

Conference paper

Paetzold GH, Specia L, 2016, Benchmarking Lexical Simplification Systems, 10th International Conference on Language Resources and Evaluation (LREC), Publisher: EUROPEAN LANGUAGE RESOURCES ASSOC-ELRA, Pages: 3074-3080

Conference paper

Paetzold GH, Specia L, 2016, Unsupervised Lexical Simplification for Non-Native Speakers, 30th Association-for-the-Advancement-of-Artificial-Intelligence (AAAI) Conference on Artificial Intelligence, Publisher: ASSOC ADVANCEMENT ARTIFICIAL INTELLIGENCE, Pages: 3761-3767, ISSN: 2159-5399

Conference paper

Blain F, Logacheva V, Specia L, 2016, Phrase-Level Segmentation and Labelling of Machine Translation Errors, 10th International Conference on Language Resources and Evaluation (LREC), Publisher: EUROPEAN LANGUAGE RESOURCES ASSOC-ELRA, Pages: 2240-2245

Conference paper

Logacheva V, Lukasik M, Specia L, 2016, Metrics for Evaluation of Word-Level Machine Translation Quality Estimation, 54th Annual Meeting of the Association-for-Computational-Linguistics (ACL), Publisher: ASSOC COMPUTATIONAL LINGUISTICS-ACL, Pages: 585-590

Conference paper

Steele D, Specia L, 2016, Predicting and Using Implicit Discourse Elements in Chinese-English Translation, BALTIC JOURNAL OF MODERN COMPUTING, Vol: 4, Pages: 305-317, ISSN: 2255-8942

Journal article

Logacheva V, Hokamp C, Specia L, 2016, MARMOT: A Toolkit for Translation Quality Estimation at the Word Level, 10th International Conference on Language Resources and Evaluation (LREC), Publisher: EUROPEAN LANGUAGE RESOURCES ASSOC-ELRA, Pages: 3671-3674

Conference paper

Scarton C, Specia L, 2016, A Reading Comprehension Corpus for Machine Translation Evaluation, 10th International Conference on Language Resources and Evaluation (LREC), Publisher: EUROPEAN LANGUAGE RESOURCES ASSOC-ELRA, Pages: 3652-3658

Conference paper

Ng RWM, Shah K, Specia L, Hain T et al., 2016, Groupwise Learning for ASR K-Best List Reranking in Spoken Language Translation, 41st IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Publisher: IEEE, Pages: 6120-6124, ISSN: 1520-6149

Conference paper

Sim Smith K, Aziz W, Specia L, 2016, The Trouble with Machine Translation Coherence, BALTIC JOURNAL OF MODERN COMPUTING, Vol: 4, Pages: 178-189, ISSN: 2255-8942

Journal article

Bechara H, Parra Escartin C, Orasan C, Specia L et al., 2016, Semantic Textual Similarity in Quality Estimation, BALTIC JOURNAL OF MODERN COMPUTING, Vol: 4, Pages: 256-268, ISSN: 2255-8942

Journal article

Smith KS, Aziz W, Specia L, 2016, Cohere: A Toolkit for Local Coherence, 10th International Conference on Language Resources and Evaluation (LREC), Publisher: EUROPEAN LANGUAGE RESOURCES ASSOC-ELRA, Pages: 4111-4114

Conference paper

Ng RWM, Doulaty M, Doddipatla R, Aziz W, Shah K, Saz O, Hasan M, AlHarbi G, Specia L, Hain T et al., 2015, The USFD Spoken Language Translation System for IWSLT 2014

The University of Sheffield (USFD) participated in the International Workshop for Spoken Language Translation (IWSLT) in 2014. In this paper, we will introduce the USFD SLT system for IWSLT. Automatic speech recognition (ASR) is achieved by two multi-pass deep neural network systems with adaptation and rescoring techniques. Machine translation (MT) is achieved by a phrase-based system. The USFD primary system incorporates state-of-the-art ASR and MT techniques and gives a BLEU score of 23.45 and 14.75 on the English-to-French and English-to-German speech-to-text translation tasks with the IWSLT 2014 data. The USFD contrastive systems explore the integration of ASR and MT by using a quality estimation system to rescore the ASR outputs, optimising towards better translation. This gives a further 0.54 and 0.26 BLEU improvement respectively on the IWSLT 2012 and 2014 evaluation data.

Conference paper

Beck D, Cohn T, Hardmeier C, Specia L et al., 2015, Learning Structural Kernels for Natural Language Processing, Transactions of the Association for Computational Linguistics, Vol: 3, Pages: 461-473

Structural kernels are a flexible learning paradigm that has been widely used in Natural Language Processing. However, the problem of model selection in kernel-based methods is usually overlooked. Previous approaches mostly rely on setting default values for kernel hyperparameters or using grid search, which is slow and coarse-grained. In contrast, Bayesian methods allow efficient model selection by maximizing the evidence on the training data through gradient-based methods. In this paper we show how to perform this in the context of structural kernels by using Gaussian Processes. Experimental results on tree kernels show that this procedure results in better prediction performance compared to hyperparameter optimization via grid search. The framework proposed in this paper can be adapted to other structures besides trees, e.g., strings and graphs, thereby extending the utility of kernel-based methods.

Journal article

Shah K, Cohn T, Specia L, 2015, A Bayesian non-linear method for feature selection in machine translation quality estimation, MACHINE TRANSLATION, Vol: 29, Pages: 101-125, ISSN: 0922-6567

Journal article

Qin Y, Specia L, 2015, Truly exploring multiple references for machine translation evaluation, Pages: 113-120

Multiple references in machine translation evaluation are usually under-explored: they are ignored by alignment-based metrics and treated as bags of n-grams in string matching evaluation metrics, none of which take full advantage of the recurring information in these references. By exploring information on the n-gram distribution and on divergences in multiple references, we propose a method of n-gram weighting and implement it to generate new versions of the popular BLEU and NIST metrics. Our metrics are tested on two into-English machine translation datasets. They lead to a significant increase in Pearson's correlation with human fluency judgements at system-level evaluation. The new NIST metric also outperforms the standard NIST for document-level evaluation.

Conference paper
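The n-gram weighting idea can be illustrated with a toy weighted precision, where each n-gram's weight is the fraction of references that contain it. This is a deliberate simplification of the paper's distribution-based weighting, for illustration only:

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def reference_weights(references, n):
    """Weight each n-gram by the fraction of references containing it,
    so material recurring across references counts for more."""
    counts = Counter()
    for ref in references:
        counts.update(set(ngrams(ref, n)))
    return {g: c / len(references) for g, c in counts.items()}

def weighted_precision(hypothesis, references, n=1):
    """Precision of hypothesis n-grams, credited by reference agreement."""
    weights = reference_weights(references, n)
    hyp = ngrams(hypothesis, n)
    if not hyp:
        return 0.0
    return sum(weights.get(g, 0.0) for g in hyp) / len(hyp)

refs = [["the", "cat", "sat"], ["the", "cat", "lay"]]
hyp = ["the", "cat", "sat"]
p = weighted_precision(hyp, refs)  # (1.0 + 1.0 + 0.5) / 3
```

Standard BLEU would give "sat" and "lay" the same credit as "the"; here an n-gram attested in both references counts double one attested in only one, which is the kind of recurrence information the abstract refers to.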

Scarton C, Zampieri M, Vela M, van Genabith J, Specia L et al., 2015, Searching for context: A study on document-level labels for translation quality estimation, Pages: 121-128

In this paper we analyse the use of popular automatic machine translation evaluation metrics to provide labels for quality estimation at document and paragraph levels. We highlight crucial limitations of such metrics for this task, mainly the fact that they disregard the discourse structure of the texts. To better understand these limitations, we designed experiments with human annotators and proposed a way of quantifying differences in translation quality that can only be observed when sentences are judged in the context of entire documents or paragraphs. Our results indicate that the use of context can lead to more informative labels for quality annotation beyond sentence level.

Conference paper

Paetzold GH, Specia L, Savourel Y, 2015, Okapi+QuEst: Translation quality estimation within Okapi

Conference paper

Logacheva V, Specia L, 2015, The role of artificially generated negative data for quality estimation of machine translation, Pages: 51-58

The modelling of natural language tasks using data-driven methods is often hindered by the problem of insufficient naturally occurring examples of certain linguistic constructs. The task we address in this paper, quality estimation (QE) of machine translation, suffers from a lack of negative examples at training time, i.e., examples of low quality translation. We propose various ways to artificially generate examples of translations containing errors and evaluate the influence of these examples on the performance of QE models at both the sentence and word levels.

Conference paper
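One simple way to generate such artificial negative examples is random corruption of a good translation. The sketch below uses deletions and lexical substitutions at illustrative rates; it is a hypothetical scheme, and the paper explores several, more linguistically informed ones.

```python
import random

def corrupt(tokens, vocab, rng, p_del=0.1, p_sub=0.1):
    """Produce an artificial low-quality 'translation' by randomly
    deleting or substituting tokens. The error rates and substitution
    vocabulary are illustrative, not taken from the paper."""
    out = []
    for tok in tokens:
        r = rng.random()
        if r < p_del:
            continue                       # simulated deletion error
        elif r < p_del + p_sub:
            out.append(rng.choice(vocab))  # simulated lexical error
        else:
            out.append(tok)
    return out

rng = random.Random(42)
sentence = "the translation of this sentence is perfect".split()
noisy = corrupt(sentence, vocab=["dog", "blue", "runs"], rng=rng)
```

Pairing each corrupted output with a low quality label (or marking the altered positions as bad words) yields the negative training instances that naturally occurring data lacks.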

This data is extracted from the Web of Science and reproduced under a licence from Thomson Reuters. You may not copy or re-distribute this data in whole or in part without the written consent of the Science business of Thomson Reuters.
