Publications by year


Béchara, H., Orăsan, C., Parra Escartín, C., Zampieri, M. and Lowe, W. (2021) The Role of Machine Translation Quality Estimation in the Post-Editing Workflow, Informatics, 8(3), online, doi:10.3390/informatics8030061, Abstract: As Machine Translation (MT) becomes increasingly ubiquitous, so does its use in professional translation workflows. However, its proliferation in the translation industry has brought about new challenges in the field of Post-Editing (PE). We are now faced with a need to find effective tools to assess the quality of MT systems to avoid underpayments and mistrust by professional translators. In this scenario, one promising field of study is MT Quality Estimation (MTQE), as this aims to determine the quality of an automatic translation and, indirectly, its degree of post-editing difficulty. However, its impact on the translation workflows and the translators’ cognitive load is still to be fully explored. We report on the results of an impact study engaging professional translators in PE tasks using MTQE. To assess the translators’ cognitive load we measure their productivity both in terms of time and effort (keystrokes) in three different scenarios: translating from scratch, post-editing without using MTQE, and post-editing using MTQE. Our results show that good MTQE information can improve post-editing efficiency and decrease the cognitive load on translators. This is especially true for cases with low MT quality.

Kanojia, D., Fomicheva, M., Ranasinghe, T., Blain, F., Orăsan, C. and Specia, L. (2021) Pushing the Right Buttons: Adversarial Evaluation of Quality Estimation, In Proceedings of the Sixth Conference on Machine Translation (WMT), pp. 608-621, online, Abstract: Current Machine Translation (MT) systems achieve very good results on a growing variety of language pairs and datasets. However, they are known to produce fluent translation outputs that can contain important meaning errors, thus undermining their reliability in practice. Quality Estimation (QE) is the task of automatically assessing the performance of MT systems at test time. Thus, in order to be useful, QE systems should be able to detect such errors. However, this ability is yet to be tested in the current evaluation practices, where QE systems are assessed only in terms of their correlation with human judgements. In this work, we bridge this gap by proposing a general methodology for adversarial testing of QE for MT. First, we show that despite a high correlation with human judgements achieved by the recent SOTA, certain types of meaning errors are still problematic for QE to detect. Second, we show that on average, the ability of a given model to discriminate between meaning-preserving and meaning-altering perturbations is predictive of its overall performance, thus potentially allowing for comparing QE systems without relying on manual quality annotation.
Orăsan, C. and Mitkov, R. (2021) Recent Developments in Natural Language Processing, In The Oxford Handbook of Computational Linguistics 2nd edition, Mitkov, R. (ed.), Oxford University Press, pp. 1-68, online, doi:10.1093/oxfordhb/9780199573691.013.005, Abstract: Natural Language Processing (NLP) is a dynamic and rapidly developing field in which new trends, techniques, and applications are constantly emerging. This chapter focuses mainly on recent developments in NLP which could not be covered in other chapters of the Handbook. Topics such as crowdsourcing and processing of large datasets, which are no longer that recent but are widely used and not covered at length in any other chapter, are also presented. The chapter starts by describing how the availability of tools and resources has had a positive impact on the field. The proliferation of user-generated content has led to the emergence of research topics such as sarcasm and irony detection, automatic assessment of user-generated content, and stance detection. All of these topics are discussed in the chapter. The field of NLP is approaching maturity, a fact corroborated by the latest developments in the processing of texts for financial purposes and for helping users with disabilities, two topics that are also discussed here. The chapter presents examples of how researchers have successfully combined research in computer vision and natural language processing to enable the processing of multimodal information, as well as how the latest advances in deep learning have revitalized research on chatbots and conversational agents. The chapter concludes with a comprehensive list of further reading material and additional resources.
Ranasinghe, T., Mitkov, R., Orăsan, C. and Quintana, R. C. (2021) Semantic textual similarity based on deep learning, In Corpora in Translation and Contrastive Research in the Digital Age, Lavid-López, J., Maíz-Arévalo, C., and Zamorano-Mansilla, J. R. (eds.), John Benjamins, pp. 102-124, online, doi:10.1075/btl.158.04ran, Abstract: This study proposes an original methodology to underpin the operation of new generation Translation Memory (TM) systems where the translations to be retrieved from the TM database are matched not on the basis of Levenshtein (edit) distance but by employing innovative Natural Language Processing (NLP) and Deep Learning (DL) techniques. Three DL sentence encoders were experimented with to retrieve TM matches in English-Spanish sentence pairs from the DGT TM dataset. Each sentence encoder was compared with Okapi which uses edit distance to retrieve the best match. The automatic evaluation shows the benefit of the DL technology for TM matching and holds promise for the implementation of the TM tool itself, which is our next project.


Ranasinghe, T., Orăsan, C. and Mitkov, R. (2020) Intelligent Translation Memory Matching and Retrieval with Sentence Encoders, In Proceedings of the 22nd Annual Conference of the European Association for Machine Translation, Lisbon, Portugal, pp. 175 - 184, online, Abstract: Matching and retrieving previously translated segments from a Translation Memory is the key functionality in Translation Memories systems. However this matching and retrieving process is still limited to algorithms based on edit distance which we have identified as a major drawback in Translation Memories systems. In this paper we introduce sentence encoders to improve the matching and retrieving process in Translation Memories systems - an effective and efficient solution to replace edit distance based algorithms.
Ranasinghe, T., Orăsan, C. and Mitkov, R. (2020) TransQuest at WMT2020: Sentence-Level Direct Assessment, In Proceedings of the 5th Conference on Machine Translation (WMT), Online, pp. 1047-1053, online, Abstract: This paper presents the team TransQuest's participation in Sentence-Level Direct Assessment shared task in WMT 2020. We introduce a simple QE framework based on cross-lingual transformers, and we use it to implement and evaluate two different neural architectures. The proposed methods achieve state-of-the-art results surpassing the results obtained by OpenKiwi, the baseline used in the shared task. We further fine tune the QE framework by performing ensemble and data augmentation. Our approach is the winning solution in all of the language pairs according to the WMT 2020 official results.
Ranasinghe, T., Orăsan, C. and Mitkov, R. (2020) TransQuest: Translation quality estimation with cross-lingual transformers, In Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain, pp. 5070-5081, online, doi:10.18653/v1/2020.coling-main.445, Abstract: Recent years have seen big advances in the field of sentence-level quality estimation (QE), largely as a result of using neural-based architectures. However, the majority of these methods work only on the language pair they are trained on and need retraining for new language pairs. This process can prove difficult from a technical point of view and is usually computationally expensive. In this paper we propose a simple QE framework based on cross-lingual transformers, and we use it to implement and evaluate two different neural architectures. Our evaluation shows that the proposed methods achieve state-of-the-art results outperforming current open-source quality estimation frameworks when trained on datasets from WMT. In addition, the framework proves very useful in transfer learning settings, especially when dealing with low-resourced languages, allowing us to obtain very competitive results.
Saadany, H. and Orăsan, C. (2020) Is it great or terrible? Preserving sentiment in neural machine translation of Arabic reviews, In Proceedings of the Fifth Arabic Natural Language Processing Workshop, Barcelona, Spain (Online), pp. 24-37, online, Abstract: Since the advent of Neural Machine Translation (NMT) approaches there has been a tremendous improvement in the quality of automatic translation. However, NMT output still lacks accuracy in some low-resource languages and sometimes makes major errors that need extensive post-editing. This is particularly noticeable with texts that do not follow common lexico-grammatical standards, such as user generated content (UGC). In this paper we investigate the challenges involved in translating book reviews from Arabic into English, with particular focus on the errors that lead to incorrect translation of sentiment polarity. Our study points to the special characteristics of Arabic UGC, examines the sentiment transfer errors made by Google Translate of Arabic UGC to English, analyzes why the problem occurs, and proposes an error typology specific of the translation of Arabic UGC. Our analysis shows that the output of online translation tools of Arabic UGC can either fail to transfer the sentiment at all by producing a neutral target text, or completely flips the sentiment polarity of the target word or phrase and hence delivers a wrong affect message. We address this problem by fine-tuning an NMT model with respect to sentiment polarity showing that this approach can significantly help with correcting sentiment errors detected in the online translation of Arabic UGC.


Evans, R. and Orăsan, C. (2019) Identifying signs of syntactic complexity for rule-based sentence simplification, Natural Language Engineering, 25(1), pp. 69-119, online, doi:10.1017/S1351324918000384, Abstract: This article presents a new method to automatically simplify English sentences. The approach is designed to reduce the number of compound clauses and nominally bound relative clauses in input sentences. The article provides an overview of a corpus annotated with information about various explicit signs of syntactic complexity and describes the two major components of a sentence simplification method that works by exploiting information on the signs occurring in the sentences of a text. The first component is a sign tagger which automatically classifies signs in accordance with the annotation scheme used to annotate the corpus. The second component is an iterative rule-based sentence transformation tool. Exploiting the sign tagger in conjunction with other NLP components, the sentence transformation tool automatically rewrites long sentences containing compound clauses and nominally bound relative clauses as sequences of shorter single-clause sentences. Evaluation of the different components reveals acceptable performance in rewriting sentences containing compound clauses but less accuracy when rewriting sentences containing nominally bound relative clauses. A detailed error analysis revealed that the major sources of error include inaccurate sign tagging, the relatively limited coverage of the rules used to rewrite sentences, and an inability to discriminate between various subtypes of clause coordination. Despite this, the system performed well in comparison with two baselines. This finding was reinforced by automatic estimations of the readability of system output and by surveys of readers’ opinions about the accuracy, accessibility, and meaning of this output.
Evans, R. and Orăsan, C. (2019) Sentence Simplification for Semantic Role Labelling and Information Extraction, In Proceedings of Recent Advances in Natural Language Processing (RANLP2019), Varna, Bulgaria, pp. 285-294, online, doi:10.26615/978-954-452-056-4_033, Abstract: In this paper, we report on the extrinsic evaluation of an automatic sentence simplification method with respect to two NLP tasks: semantic role labelling (SRL) and information extraction (IE). The paper begins with our observation of challenges in the intrinsic evaluation of sentence simplification systems, which motivates the use of extrinsic evaluation of these systems with respect to other NLP tasks. We describe the two NLP systems and the test data used in the extrinsic evaluation, and present arguments and evidence motivating the integration of a sentence simplification step as a means of improving the accuracy of these systems. Our evaluation reveals that their performance is improved by the simplification step: the SRL system is better able to assign semantic roles to the majority of the arguments of verbs and the IE system is better able to identify fillers for all IE template slots.
Orăsan, C., Escartín, C. P., Torres, L. S. and Barbu, E. (2019) Exploiting Data-Driven Hybrid Approaches to Translation in the EXPERT Project, In Advances in Empirical Translation Studies, Cambridge University Press, pp. 198-216, online, doi:10.1017/9781108525695.011
Orăsan, C. (2019) Automatic summarisation: 25 years On, Natural Language Engineering, 25(6), pp. 735-751, online, doi:10.1017/S1351324919000524, Abstract: Automatic text summarisation is a topic that has been receiving attention from the research community from the early days of computational linguistics, but it really took off around 25 years ago. This article presents the main developments from the last 25 years. It starts by defining what a summary is and how its definition changed over time as a result of the interest in processing new types of documents. The article continues with a brief history of the field and highlights the main challenges posed by the evaluation of summaries. The article finishes with some thoughts about the future of the field.
Plum, A., Ranasinghe, T., Calleja, P., Orăsan, C. and Mitkov, R. (2019) RGCL-WLV at SemEval-2019 Task 12: Toponym Detection, In Proceedings of the 13th International Workshop on Semantic Evaluation (SemEval-2019), Minneapolis, Minnesota, USA, pp. 1297-1301, online, doi:10.18653/v1/S19-2228, Abstract: This article describes the system submitted by the RGCL-WLV team to the SemEval 2019 Task 12: Toponym resolution in scientific papers. The system detects toponyms using a bootstrapped machine learning (ML) approach which classifies names identified using gazetteers extracted from the GeoNames geographical database. The paper evaluates the performance of several ML classifiers, as well as how the gazetteers influence the accuracy of the system. Several runs were submitted. The highest precision achieved for one of the submissions was 89%, albeit at a relatively low recall of 49%.
Plum, A., Ranasinghe, T. and Orăsan, C. (2019) Toponym Detection in the Bio-Medical Domain: A Hybrid Approach with Deep Learning, In Proceedings of Recent Advances in Natural Language Processing (RANLP2019), Varna, Bulgaria, pp. 912-921, online, doi:10.26615/978-954-452-056-4_106, Abstract: This paper compares how different machine learning classifiers can be used together with simple string matching and named entity recognition to detect locations in texts. We compare five different state-of-the-art machine learning classifiers in order to predict whether a sentence contains a location or not. Following this classification task, we use a string matching algorithm with a gazetteer to identify the exact index of a toponym within the sentence. We evaluate different approaches in terms of machine learning classifiers, text pre-processing and location extraction on the SemEval-2019 Task 12 dataset, compiled for toponym resolution in the bio-medical domain. Finally, we compare the results with our system that was previously submitted to the SemEval-2019 task evaluation.
Plum, A., Ranasinghe, T., Orăsan, C. and Mitkov, R. (2019) RGCL at GermEval 2019: Offensive Language Detection with Deep Learning, In Proceedings of the 15th Conference on Natural Language Processing (KONVENS 2019), Erlangen, Germany, pp. 423 - 428, online, Abstract: This paper describes the system submitted by the RGCL team to GermEval 2019 Shared Task 2: Identification of Offensive Language. We experimented with five different neural network architectures in order to classify Tweets in terms of offensive language. By means of comparative evaluation, we select the best performing for each of the three subtasks. Overall, we demonstrate that using only minimal preprocessing we are able to obtain competitive results.
Plum, A., Zampieri, M., Orăsan, C., Wandl-Vogt, E. and Mitkov, R. (2019) Large-scale Data Harvesting for Biographical Data, In Proceedings of the International Conference on Biographical Data in a Digital World 2019, Varna, Bulgaria, online, Abstract: This paper explores automatic methods to identify relevant biography candidates in large databases, and extract biographical information from encyclopedia entries and databases. In this work, relevant candidates are defined as people who have made an impact in a certain country or region within a pre-defined time frame. We investigate the case of people who had an impact in the Republic of Austria and died between 1951 and 2019. We use Wikipedia and Wikidata as data sources and compare the performance of our information extraction methods on these two databases. We demonstrate the usefulness of a natural language processing pipeline to identify suitable biography candidates and, in a second stage, extract relevant information about them. Even though they are considered by many as an identical resource, our results show that the data from Wikipedia and Wikidata differs in some cases and they can be used in a complementary way providing more data for the compilation of biographies.
Ranasinghe, T., Orăsan, C. and Mitkov, R. (2019) Semantic Textual Similarity with Siamese Neural Networks, In Proceedings of Recent Advances in Natural Language Processing (RANLP2019), Varna, Bulgaria, pp. 1005-1012, online, doi:10.26615/978-954-452-056-4_116, Abstract: Calculating the Semantic Textual Similarity (STS) is an important research area in natural language processing which plays a significant role in many applications such as question answering, document summarisation, information retrieval and information extraction. This paper evaluates Siamese recurrent architectures, a special type of neural networks, which are used here to measure STS. Several variants of the architecture are compared with existing methods.
Ranasinghe, T., Saadany, H., Plum, A., Mandhari, S., Mohamed, E., Orăsan, C. and Mitkov, R. (2019) RGCL at IDAT: Deep Learning models for Irony Detection in Arabic Language, In Working Notes of the Forum for Information Retrieval Evaluation (FIRE 2019), Kolkata, India, pp. 416-425, online, Abstract: This article describes the system submitted by the RGCL team to the IDAT 2019 Shared Task: Irony Detection in Arabic Tweets. The system detects irony in Arabic tweets using deep learning. The paper evaluates the performance of several deep learning models, as well as how text cleaning and text pre-processing influence the accuracy of the system. Several runs were submitted. The highest F1 score achieved for one of the submissions was 0.818 making the team RGCL rank 4th out of 10 teams in final results. Overall, we present a system that uses minimal pre-processing but capable of achieving competitive results.
Ranasinghe, T., Orăsan, C. and Mitkov, R. (2019) Enhancing Unsupervised Sentence Similarity Methods with Deep Contextualised Word Representations, In Proceedings of Recent Advances in Natural Language Processing (RANLP 2019), Varna, Bulgaria, pp. 994-1003, online, doi:10.26615/978-954-452-056-4_115, Abstract: Calculating Semantic Textual Similarity (STS) plays a significant role in many applications such as question answering, document summarisation, information retrieval and information extraction. All modern state of the art STS methods rely on word embeddings one way or another. The recently introduced contextualised word embeddings have proved more effective than standard word embeddings in many natural language processing tasks. This paper evaluates the impact of several contextualised word embeddings on unsupervised STS methods and compares it with the existing supervised/unsupervised STS methods for different datasets in different languages and different domains.
Yaneva, V., Orăsan, C., Ha, L. A. and Ponomareva, N. (2019) A Survey of the Perceived Text Adaptation Needs of Adults with Autism, In Proceedings of Recent Advances in Natural Language Processing (RANLP 2019), Varna, Bulgaria, pp. 1356-1363, online, doi:10.26615/978-954-452-056-4_155, Abstract: NLP approaches to automatic text adaptation often rely on user-need guidelines which are generic and do not account for the differences between various types of target groups. One such group are adults with high-functioning autism, who are usually able to read long sentences and comprehend difficult words but whose comprehension may be impeded by other linguistic constructions. This is especially challenging for real-world user-generated texts such as product reviews, which cannot be controlled editorially and are thus in a stronger need of automatic adaptation. To address this problem, we present a mixed-methods survey conducted with 24 adult web-users diagnosed with autism and an age-matched control group of 33 neurotypical participants. The aim of the survey is to identify whether the group with autism experiences any barriers when reading online reviews, what these potential barriers are, and what NLP methods would be best suited to improve the accessibility of online reviews for people with autism. The group with autism consistently reported significantly greater difficulties with understanding online product reviews compared to the control group and identified issues related to text length, poor topic organisation, identifying the intention of the author, trustworthiness, and the use of irony, sarcasm and exaggeration.
Temnikova, I., Orăsan, C., Pastor, G. C. and Mitkov, R. (eds.) (2019) Proceedings of the Human-Informed Translation and Interpreting Technology Workshop (HiT-IT 2019), Varna, Bulgaria, online


Gopalakrishna Pillai, R., Thelwall, M. and Orăsan, C. (2018) Detection of Stress and Relaxation Magnitudes for Tweets, In Companion of The Web Conference 2018 (WWW '18), New York, New York, USA, ACM Press, pp. 1677-1684, online, doi:10.1145/3184558.3191627, Abstract: The ability to automatically detect human stress and relaxation is crucial for timely diagnosing stress-related diseases, ensuring customer satisfaction in services and managing human-centric applications such as traffic management. Traditional methods employ stress-measuring scales or physiological monitoring which may be intrusive and inconvenient. Instead, the ubiquitous nature of the social media can be leveraged to identify stress and relaxation, since many people habitually share their recent life experiences through social networking sites. This paper introduces an improved method to detect expressions of stress and relaxation in social media content. It uses word sense disambiguation by word sense vectors to improve the performance of the first and only lexicon-based stress/relaxation detection algorithm TensiStrength. Experimental results show that incorporating word sense disambiguation substantially improves the performance of the original TensiStrength. It performs better than state-of-the-art machine learning methods too in terms of Pearson correlation and percentage of exact matches. We also propose a novel framework for identifying the causal agents of stress and relaxation in tweets as future work.
Orăsan, C., Evans, R. and Mitkov, R. (2018) Intelligent Text Processing to Help Readers with Autism, In Intelligent Natural Language Processing: Trends and Applications, Shaalan, K., Hassanien, A., and Tolba, F. (eds.), Springer, pp. 713-740, online, doi:10.1007/978-3-319-67056-0_33, Abstract: Autistic Spectrum Disorder (ASD) is a neurodevelopmental disorder which has a life-long impact on the lives of people diagnosed with the condition. In many cases, people with ASD are unable to derive the gist or meaning of written documents due to their inability to process complex sentences, understand non-literal text, and understand uncommon and technical terms. This paper presents FIRST, an innovative project which developed language technology (LT) to make documents more accessible to people with ASD. The project has produced a powerful editor which enables carers of people with ASD to prepare texts suitable for this population. Assessment of the texts generated using the editor showed that they are not less readable than those generated more slowly as a result of onerous unaided conversion and were significantly more readable than the originals. Evaluation of the tool shows that it can have a positive impact on the lives of people with ASD.
Orăsan, C. (2018) Aggressive Language Identification Using Word Embeddings and Sentiment Features, In Proceedings of the First Workshop on Trolling, Aggression and Cyberbullying (TRAC-2018), Santa Fe, USA, pp. 113 - 119, online, Abstract: This paper describes our participation in the First Shared Task on Aggression Identification. The method proposed relies on machine learning to identify social media texts which contain aggression. The main features employed by our method are information extracted from word embeddings and the output of a sentiment analyser. Several machine learning methods and different combinations of features were tried. The official submissions used Support Vector Machines and Random Forests. The official evaluation showed that for texts similar to the ones in the training dataset Random Forests work best, whilst for texts which are different SVMs are a better choice. The evaluation also showed that despite its simplicity the method performs well when compared with more elaborated methods.
Pillai, R. G., Thelwall, M. and Orăsan, C. (2018) Trouble on the Road: Finding Reasons for Commuter Stress from Tweets, In Proceedings of the Workshop on Intelligent Interactive Systems and Language Generation (2IS&NLG), Tilburg, The Netherlands, pp. 20-25, online, doi:10.18653/v1/W18-6705, Abstract: Intelligent Transportation Systems could benefit from harnessing social media content to get continuous feedback. In this work, we implement a system to identify reasons for stress in tweets related to traffic using a word vector strategy to select a reason from a predefined list generated by topic modeling and clustering. The proposed system, which performs better than standard machine learning algorithms, could provide inputs to warning systems for commuters in the area and feedback for the authorities.
Pillai, R. G., Thelwall, M. and Orăsan, C. (2018) What Makes You Stressed? Finding Reasons From Tweets, In Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, Brussels, Belgium, pp. 266-272, online, doi:10.18653/v1/W18-6239, Abstract: Detecting stress from social media gives a non-intrusive and inexpensive alternative to traditional tools such as questionnaires or physiological sensors for monitoring mental state of individuals. This paper introduces a novel framework for finding reasons for stress from tweets, analyzing multiple categories for the first time. Three word-vector based methods are evaluated on collections of tweets about politics or airlines and are found to be more accurate than standard machine learning algorithms.


Parra Escartín, C., Béchara, H. and Orăsan, C. (2017) Questing for Quality Estimation: A User Study, The Prague Bulletin of Mathematical Linguistics, 108, pp. 343–354, online, doi:10.1515/pralin-2017-0032, Abstract: Post-Editing of Machine Translation (MT) has become a reality in professional translation workflows. In order to optimize the management of projects that use post-editing and avoid underpayments and mistrust from professional translators, effective tools to assess the quality of Machine Translation (MT) systems need to be put in place. One field of study that could address this problem is Machine Translation Quality Estimation (MTQE), which aims to determine the quality of MT without an existing reference. Accurate and reliable MTQE can help project managers and translators alike, as it would allow estimating more precisely the cost of post-editing projects in terms of time and adequate fares by discarding those segments that are not worth post-editing (PE) and have to be translated from scratch. In this paper, we report on the results of an impact study which engages professional translators in PE tasks using MTQE. We measured translators’ productivity in different scenarios: translating from scratch, post-editing without using MTQE, and post-editing using MTQE. Our results show that QE information, when accurate, improves post-editing efficiency.
Yaneva, V., Orăsan, C., Evans, R. and Rohanian, O. (2017) Combining Multiple Corpora for Readability Assessment for People with Cognitive Disabilities, In Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications, Copenhagen, Denmark, pp. 121 - 132, online, doi:10.18653/v1/W17-5013, Abstract: Given the lack of large user-evaluated corpora in disability-related NLP research (e.g. text simplification or readability assessment for people with cognitive disabilities), the question of choosing suitable training data for NLP models is not straightforward. The use of large generic corpora may be problematic because such data may not reflect the needs of the target population. At the same time, the available user-evaluated corpora are not large enough to be used as training data. In this paper we explore a third approach, in which a large generic corpus is combined with a smaller population-specific corpus to train a classifier which is evaluated using two sets of unseen user-evaluated data. One of these sets, the ASD Comprehension corpus, is developed for the purposes of this study and made freely available. We explore the effects of the size and type of the training data used on the performance of the classifiers, and the effects of the type of the unseen test datasets on the classification performance.


Barbu, E., Escartín, C. P., Bentivogli, L., Negri, M., Turchi, M., Federico, M., Mastrostefano, L. and Orăsan, C. (2016) 1st Shared Task on Automatic Translation Memory Cleaning: Preparation and Lessons Learned, In Proceedings of the 2nd Workshop on Natural Language Processing for Translation Memories (NLP4TM 2016), Portorož, Slovenia, pp. 1-6, Abstract: This paper summarizes the work done to prepare the first shared task on automatic translation memory cleaning. This shared task aims at finding automatic ways of cleaning TMs that, for some reason, have not been properly curated and include wrong translations. Participants in this task are required to take pairs of source and target segments from TMs and decide whether they are right translations. For this first task three language pairs have been prepared: English→Spanish, English→Italian, and English→German. In this paper, we report on how the shared task was prepared and explain the process of data selection and data annotation, the building of the training and test sets and the implemented baselines for automatic classifiers comparison.
Barbu, E., Parra Escartín, C., Bentivogli, L., Negri, M., Turchi, M., Orăsan, C. and Federico, M. (2016) The first Automatic Translation Memory Cleaning Shared Task, Machine Translation, 30(3-4), pp. 145-166, online, doi:10.1007/s10590-016-9183-x, Abstract: This paper reports on the organization and results of the first Automatic Translation Memory Cleaning Shared Task. This shared task is aimed at finding automatic ways of cleaning translation memories (TMs) that have not been properly curated and thus include incorrect translations. As a follow up of the shared task, we also conducted two surveys, one targeting the teams participating in the shared task, and the other one targeting professional translators. While the researchers-oriented survey aimed at gathering information about the opinion of participants on the shared task, the translators-oriented survey aimed to better understand what constitutes a good TM unit and inform decisions that will be taken in future editions of the task. In this paper, we report on the process of data preparation and the evaluation of the automatic systems submitted, as well as on the results of the collected surveys.
Bechara, H., Parra Escartin, C., Orăsan, C. and Specia, L. (2016) Semantic Textual Similarity in Quality Estimation, Baltic Journal of Modern Computing, 4(2), pp. 256 - 268, online, Abstract: Quality Estimation (QE) predicts the quality of machine translation output without the need for a reference translation. This quality can be defined differently based on the task at hand. In an attempt to focus further on the adequacy and informativeness of translations, we integrate features of semantic similarity into QuEst, a framework for QE feature extraction. By using methods previously employed in Semantic Textual Similarity (STS) tasks, we use semantically similar sentences and their quality scores as features to estimate the quality of machine translated sentences. Preliminary experiments show that finding semantically similar sentences for some datasets is difficult and time-consuming. Therefore, we opt to start from the assumption that we already have access to semantically similar sentences. Our results show that this method can improve the prediction of machine translation quality for semantically similar sentences.
Bechara, H., Gupta, R., Tan, L. L., Orăsan, C., Mitkov, R. and van Genabith, J. (2016) WOLVESAAR at SemEval-2016 Task 1: Replicating the Success of Monolingual Word Alignment and Neural Embeddings for Semantic Textual Similarity, In Proceedings of SemEval-2016, San Diego, California, pp. 634-639, online, doi:10.18653/v1/S16-1096, Abstract: This paper describes the WOLVESAAR systems that participated in the English Semantic Textual Similarity (STS) task in SemEval-2016. We replicated the top systems from the last two editions of the STS task and extended the model using GloVe word embeddings and dense vector space LSTM based sentence representations. We compared the difference in performance of the replicated system and the extended variants. Our variants to the replicated system show improved correlation scores and all of our submissions outperform the median scores from all participating systems.
Gupta, R., Orăsan, C., Liu, Q. and Mitkov, R. (2016) A Dynamic Programming Approach to Improving Translation Memory Matching and Retrieval Using Paraphrases, In Text, Speech and Dialogue, Sojka, P., Horák, A., Kopeček, I., and Pala, K. (eds.), Brno, CZ, Springer, pp. 259 - 269, online, doi:10.1007/978-3-319-45510-5_30, Abstract: Translation memory tools lack semantic knowledge like paraphrasing when they perform matching and retrieval. As a result, paraphrased segments are often not retrieved. One of the primary reasons for this is the lack of a simple and efficient algorithm to incorporate paraphrasing in the TM matching process. Gupta and Orăsan [1] proposed an algorithm which incorporates paraphrasing based on greedy approximation and dynamic programming. However, because of greedy approximation, their approach does not make full use of the paraphrases available. In this paper we propose an efficient method for incorporating paraphrasing in matching and retrieval based on dynamic programming only. We tested our approach on English-German, English-Spanish and English-French language pairs and retrieved better results for all three language pairs compared to the earlier approach [1].
Gupta, R., Orăsan, C., Zampieri, M., Vela, M., van Genabith, J. and Mitkov, R. (2016) Improving translation memory matching and retrieval using paraphrases, Machine Translation, 30(1), pp. 19 - 40, online, doi:10.1007/s10590-016-9180-0, Abstract: Most current translation memory (TM) systems work on the string level (character or word level) and lack semantic knowledge while matching. They use simple edit-distance (ED) calculated on the surface form or some variation on it (stem, lemma), which does not take into consideration any semantic aspects in matching. This paper presents a novel and efficient approach to incorporating semantic information in the form of paraphrasing (PP) in the ED metric. The approach computes ED while efficiently considering paraphrases using dynamic programming and greedy approximation. In addition to using automatic evaluation metrics like BLEU and METEOR, we have carried out an extensive human evaluation in which we measured post-editing time, keystrokes, HTER, HMETEOR, and carried out three rounds of subjective evaluations. Our results show that PP substantially improves TM matching and retrieval, resulting in translation performance increases when translators use paraphrase-enhanced TMs.
Orăsan, C. (2016) The EXPERT Project: Training the Future Experts in Translation Technology, In Proceedings of the 19th Annual Conference of the EAMT: Projects/Products, Riga, Latvia, p. 393, online


Béchara, H., Može, S., El-Maarouf, I., Orăsan, C., Hanks, P. and Mitkov, R. (2015) The Role of Corpus Pattern Analysis in Machine Translation Evaluation, In Proceedings of the 7th International Conference of the Iberian Association of Translation and Interpreting Studies (AIETI), Malaga, Spain
Gupta, R., Orăsan, C., Zampieri, M., Vela, M. and van Genabith, J. (2015) Can Translation Memories afford not to use paraphrasing?, In Proceedings of the 18th Annual Conference of the European Association for Machine Translation (EAMT), Antalya, Turkey, pp. 35 - 42, online, Abstract: This paper investigates to what extent the use of paraphrasing in translation memory (TM) matching and retrieval is useful for human translators. Current translation memories lack semantic knowledge like paraphrasing in matching and retrieval. Due to this, paraphrased segments are often not retrieved. Lack of semantic knowledge also results in inappropriate ranking of the retrieved segments. Gupta and Orasan (2014) proposed an improved matching algorithm which incorporates paraphrasing. Its automatic evaluation suggested that it could be beneficial to translators. In this paper we perform an extensive human evaluation of the use of paraphrasing in the TM matching and retrieval process. We measure post-editing time, keystrokes, two subjective evaluations, and HTER and HMETEOR to assess the impact on human performance. Our results show that paraphrasing improves TM matching and retrieval, resulting in translation performance increases when translators use paraphrase enhanced TMs.
Gupta, R., Orăsan, C. and van Genabith, J. (2015) Machine Translation Evaluation using Recurrent Neural Networks, In Proceedings of the Tenth Workshop on Statistical Machine Translation, Lisbon, Portugal, pp. 380-384, online, Abstract: This paper presents our metric (UoWLSTM) submitted in the WMT-15 metrics task. Many state-of-the-art Machine Translation (MT) evaluation metrics are complex, involve extensive external resources (e.g. for paraphrasing) and require tuning to achieve the best results. We use a metric based on dense vector spaces and Long Short Term Memory (LSTM) networks, which are types of Recurrent Neural Networks (RNNs). For WMT-15 our new metric is the best performing metric overall according to Spearman and Pearson (Pre-TrueSkill) and second best according to Pearson (TrueSkill) system level correlation.
Gupta, R., Orăsan, C. and van Genabith, J. (2015) ReVal: A Simple and Effective Machine Translation Evaluation Metric Based on Recurrent Neural Networks, In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, pp. 1066-1072, online, doi:10.18653/v1/D15-1124, Abstract: Many state-of-the-art Machine Translation (MT) evaluation metrics are complex, involve extensive external resources (e.g. for paraphrasing) and require tuning to achieve best results. We present a simple alternative approach based on dense vector spaces and recurrent neural networks (RNNs), in particular Long Short Term Memory (LSTM) networks. For WMT-14, our new metric scores best for two out of five language pairs, and overall best and second best on all language pairs, using Spearman and Pearson correlation, respectively. We also show how training data is computed automatically from WMT ranks data.
Orăsan, C., Cattelan, A., Corpas Pastor, G., van Genabith, J., Herranz, M., Arevalillo, J. J., Liu, Q., Sima’an, K. and Specia, L. (2015) The EXPERT Project: Advancing the State of the Art in Hybrid Translation Technologies, In Proceedings of Translating and the Computer 37, London, UK


Gupta, R. and Orăsan, C. (2014) Incorporating Paraphrasing in Translation Memory Matching and Retrieval, In Proceedings of the Seventeenth Annual Conference of the European Association for Machine Translation (EAMT2014), Dubrovnik, Croatia, pp. 3 - 10, online, Abstract: Current Translation Memory (TM) systems work at the surface level and lack semantic knowledge while matching. This paper presents an approach to incorporating semantic knowledge in the form of paraphrasing in matching and retrieval. Most of the TMs use Levenshtein edit-distance or some variation of it. Generating additional segments based on the paraphrases available in a segment results in exponential time complexity while matching. The reason is that a particular phrase can be paraphrased in several ways and there can be several possible phrases in a segment which can be paraphrased. We propose an efficient approach to incorporating paraphrasing with edit-distance. The approach is based on greedy approximation and dynamic programming. We have obtained significant improvement in both retrieval and translation of retrieved segments for TM thresholds of 100%, 95% and 90%.
Gupta, R., Bechara, H. and Orăsan, C. (2014) Intelligent Translation Memory Matching and Retrieval Metric Exploiting Linguistic Technology, In Proceedings of Translating and the Computer 36, London, UK, pp. 86-89, online, Abstract: Translation Memories (TM) help translators in their task by retrieving previously translated sentences and editing fuzzy matches when no exact match is found by the system. Current TM systems use simple edit-distance or some variation of it, which largely relies on the surface form of the sentences and does not necessarily reflect the semantic similarity of segments as judged by humans. In this paper, we propose an intelligent metric to compute the fuzzy match score, which is inspired by similarity and entailment techniques developed in Natural Language Processing.