Publications

Publications related to translation technology

Ranasinghe, T., Orăsan, C. and Mitkov, R. (2021) An Exploratory Analysis of Multilingual Word-Level Quality Estimation with Cross-Lingual Transformers, In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), Stroudsburg, PA, USA, Association for Computational Linguistics, pp. 434-440, online, doi:10.18653/v1/2021.acl-short.55, Abstract: Most studies on word-level Quality Estimation (QE) of machine translation focus on language-specific models. The obvious disadvantages of these approaches are the need for labelled data for each language pair and the high cost required to maintain several language-specific models. To overcome these problems, we explore different approaches to multilingual, word-level QE. We show that multilingual QE models perform on par with the current language-specific models. In the cases of zero-shot and few-shot QE, we demonstrate that it is possible to accurately predict word-level quality for any given new language pair from models trained on other language pairs. Our findings suggest that the word-level QE models based on powerful pre-trained transformers that we propose in this paper generalise well across languages, making them more useful in real-world scenarios.
Béchara, H., Orăsan, C., Parra Escartín, C., Zampieri, M. and Lowe, W. (2021) The Role of Machine Translation Quality Estimation in the Post-Editing Workflow, Informatics, 8(3), online, doi:10.3390/informatics8030061, Abstract: As Machine Translation (MT) becomes increasingly ubiquitous, so does its use in professional translation workflows. However, its proliferation in the translation industry has brought about new challenges in the field of Post-Editing (PE). We are now faced with a need to find effective tools to assess the quality of MT systems to avoid underpayments and mistrust by professional translators. In this scenario, one promising field of study is MT Quality Estimation (MTQE), as this aims to determine the quality of an automatic translation and, indirectly, its degree of post-editing difficulty. However, its impact on the translation workflows and the translators’ cognitive load is still to be fully explored. We report on the results of an impact study engaging professional translators in PE tasks using MTQE. To assess the translators’ cognitive load we measure their productivity both in terms of time and effort (keystrokes) in three different scenarios: translating from scratch, post-editing without using MTQE, and post-editing using MTQE. Our results show that good MTQE information can improve post-editing efficiency and decrease the cognitive load on translators. This is especially true for cases with low MT quality.
Ranasinghe, T., Orăsan, C. and Mitkov, R. (2020) Intelligent Translation Memory Matching and Retrieval with Sentence Encoders, In Proceedings of the 22nd Annual Conference of the European Association for Machine Translation, Lisbon, Portugal, pp. 175-184, online, Abstract: Matching and retrieving previously translated segments from a Translation Memory is the key functionality in Translation Memories systems. However, this matching and retrieving process is still limited to algorithms based on edit distance, which we have identified as a major drawback in Translation Memories systems. In this paper we introduce sentence encoders to improve the matching and retrieving process in Translation Memories systems - an effective and efficient solution to replace edit distance based algorithms.
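To illustrate the contrast drawn in the abstract above between edit-distance matching and sentence-encoder matching, the following minimal Python sketch scores a query segment against a toy translation memory in both ways. It is an illustration only, not the authors' implementation; it assumes the sentence-transformers package is installed and uses the all-MiniLM-L6-v2 model purely as an example encoder.

import numpy as np
from sentence_transformers import SentenceTransformer

def word_edit_distance(a, b):
    # Word-level Levenshtein distance computed with dynamic programming.
    a, b = a.split(), b.split()
    d = np.zeros((len(a) + 1, len(b) + 1), dtype=int)
    d[:, 0] = np.arange(len(a) + 1)
    d[0, :] = np.arange(len(b) + 1)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i, j] = min(d[i - 1, j] + 1, d[i, j - 1] + 1, d[i - 1, j - 1] + cost)
    return int(d[-1, -1])

def fuzzy_match(query, segment):
    # Conventional TM score: 1 minus the normalised edit distance.
    return 1.0 - word_edit_distance(query, segment) / max(len(query.split()), len(segment.split()))

def semantic_match(query, segments, model_name="all-MiniLM-L6-v2"):
    # Encoder-based alternative: cosine similarity between sentence vectors.
    model = SentenceTransformer(model_name)
    vectors = model.encode([query] + segments)
    q, rest = vectors[0], vectors[1:]
    return rest @ q / (np.linalg.norm(rest, axis=1) * np.linalg.norm(q))

tm = ["The committee approved the annual budget.", "The board rejected the proposal."]
query = "The panel approved the annual budget."
print([round(fuzzy_match(query, s), 2) for s in tm])
print(semantic_match(query, tm).round(2))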
Ranasinghe, T., Orăsan, C. and Mitkov, R. (2020) TransQuest at WMT2020: Sentence-Level Direct Assessment, In Proceedings of the 5th Conference on Machine Translation (WMT), Online, pp. 1047-1053, online, Abstract: This paper presents the team TransQuest's participation in the Sentence-Level Direct Assessment shared task in WMT 2020. We introduce a simple QE framework based on cross-lingual transformers, and we use it to implement and evaluate two different neural architectures. The proposed methods achieve state-of-the-art results surpassing the results obtained by OpenKiwi, the baseline used in the shared task. We further fine-tune the QE framework by performing ensemble and data augmentation. Our approach is the winning solution in all of the language pairs according to the WMT 2020 official results.
Ranasinghe, T., Orăsan, C. and Mitkov, R. (2020) TransQuest: Translation quality estimation with cross-lingual transformers, In Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain, pp. 5070-5081, online, doi:10.18653/v1/2020.coling-main.445, Abstract: Recent years have seen big advances in the field of sentence-level quality estimation (QE), largely as a result of using neural-based architectures. However, the majority of these methods work only on the language pair they are trained on and need retraining for new language pairs. This process can prove difficult from a technical point of view and is usually computationally expensive. In this paper we propose a simple QE framework based on cross-lingual transformers, and we use it to implement and evaluate two different neural architectures. Our evaluation shows that the proposed methods achieve state-of-the-art results outperforming current open-source quality estimation frameworks when trained on datasets from WMT. In addition, the framework proves very useful in transfer learning settings, especially when dealing with low-resourced languages, allowing us to obtain very competitive results.
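The core idea described above, feeding the source sentence and its machine translation into a single cross-lingual transformer topped by a regression head, can be sketched as follows. This is not the TransQuest code: it assumes the Hugging Face transformers and torch packages with an XLM-R checkpoint, and the regression head would of course need to be fine-tuned on QE data before its scores are meaningful.

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# A cross-lingual encoder with a single-output (regression) head.
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("xlm-roberta-base", num_labels=1)

def predict_quality(source, translation):
    # Source and translation are packed into one sentence pair; the single
    # logit plays the role of a direct-assessment-style quality score.
    inputs = tokenizer(source, translation, return_tensors="pt", truncation=True)
    with torch.no_grad():
        return model(**inputs).logits.squeeze().item()

print(predict_quality("Das Haus ist klein.", "The house is small."))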
Saadany, H. and Orăsan, C. (2020) Is it great or terrible? Preserving sentiment in neural machine translation of Arabic reviews, In Proceedings of the Fifth Arabic Natural Language Processing Workshop, Barcelona, Spain (Online), pp. 24-37, online, Abstract: Since the advent of Neural Machine Translation (NMT) approaches there has been a tremendous improvement in the quality of automatic translation. However, NMT output still lacks accuracy in some low-resource languages and sometimes makes major errors that need extensive post-editing. This is particularly noticeable with texts that do not follow common lexico-grammatical standards, such as user generated content (UGC). In this paper we investigate the challenges involved in translating book reviews from Arabic into English, with particular focus on the errors that lead to incorrect translation of sentiment polarity. Our study points to the special characteristics of Arabic UGC, examines the sentiment transfer errors made by Google Translate of Arabic UGC to English, analyzes why the problem occurs, and proposes an error typology specific to the translation of Arabic UGC. Our analysis shows that the output of online translation tools of Arabic UGC can either fail to transfer the sentiment at all by producing a neutral target text, or completely flip the sentiment polarity of the target word or phrase and hence deliver a wrong affect message. We address this problem by fine-tuning an NMT model with respect to sentiment polarity, showing that this approach can significantly help with correcting sentiment errors detected in the online translation of Arabic UGC.
Orăsan, C., Escartín, C. P., Torres, L. S. and Barbu, E. (2019) Exploiting Data-Driven Hybrid Approaches to Translation in the EXPERT Project, In Advances in Empirical Translation Studies, Cambridge University Press, pp. 198-216, online, doi:10.1017/9781108525695.011
Temnikova, I., Orăsan, C., Pastor, G. C. and Mitkov, R. (eds.) (2019) Proceedings of the Human-Informed Translation and Interpreting Technology Workshop (HiT-IT 2019), Varna, Bulgaria, online
Parra Escartín, C., Béchara, H. and Orăsan, C. (2017) Questing for Quality Estimation: A User Study, The Prague Bulletin of Mathematical Linguistics, 108, pp. 343-354, online, doi:10.1515/pralin-2017-0032, Abstract: Post-Editing of Machine Translation (MT) has become a reality in professional translation workflows. In order to optimize the management of projects that use post-editing and avoid underpayments and mistrust from professional translators, effective tools to assess the quality of Machine Translation (MT) systems need to be put in place. One field of study that could address this problem is Machine Translation Quality Estimation (MTQE), which aims to determine the quality of MT without an existing reference. Accurate and reliable MTQE can help project managers and translators alike, as it would allow estimating more precisely the cost of post-editing projects in terms of time and adequate fares by discarding those segments that are not worth post-editing (PE) and have to be translated from scratch. In this paper, we report on the results of an impact study which engages professional translators in PE tasks using MTQE. We measured translators’ productivity in different scenarios: translating from scratch, post-editing without using MTQE, and post-editing using MTQE. Our results show that QE information, when accurate, improves post-editing efficiency.
Bechara, H., Parra Escartin, C., Orăsan, C. and Specia, L. (2016) Semantic Textual Similarity in Quality Estimation, Baltic Journal of Modern Computing, 4(2), pp. 256 - 268, online, Abstract: Quality Estimation (QE) predicts the quality of machine translation output without the need for a reference translation. This quality can be defined differently based on the task at hand. In an attempt to focus further on the adequacy and informativeness of translations, we integrate features of semantic similarity into QuEst, a framework for QE feature extraction. By using methods previously employed in Semantic Textual Similarity (STS) tasks, we use semantically similar sentences and their quality scores as features to estimate the quality of machine translated sentences. Preliminary experiments show that finding semantically similar sentences for some datasets is difficult and time-consuming. Therefore, we opt to start from the assumption that we already have access to semantically similar sentences. Our results show that this method can improve the prediction of machine translation quality for semantically similar sentences.
Gupta, R., Orăsan, C., Liu, Q. and Mitkov, R. (2016) A Dynamic Programming Approach to Improving Translation Memory Matching and Retrieval Using Paraphrases, In Text, Speech and Dialogue, Sojka, P., Horák, A., Kopeček, I., and Pala, K. (eds.), Brno, CZ, Springer, pp. 259-269, online, doi:10.1007/978-3-319-45510-5_30, Abstract: Translation memory tools lack semantic knowledge like paraphrasing when they perform matching and retrieval. As a result, paraphrased segments are often not retrieved. One of the primary reasons for this is the lack of a simple and efficient algorithm to incorporate paraphrasing in the TM matching process. Gupta and Orăsan [1] proposed an algorithm which incorporates paraphrasing based on greedy approximation and dynamic programming. However, because of greedy approximation, their approach does not make full use of the paraphrases available. In this paper we propose an efficient method for incorporating paraphrasing in matching and retrieval based on dynamic programming only. We tested our approach on English-German, English-Spanish and English-French language pairs and retrieved better results for all three language pairs compared to the earlier approach [1].
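A deliberately naive sketch of the idea behind this line of work, letting a TM segment count as a closer match when one of its paraphrased variants is closer to the query, is given below. The published method folds the paraphrases directly into the dynamic-programming table instead of enumerating variants; here, for brevity, each paraphrase rule simply generates alternative segments, and the paraphrase table is invented for the example.

def word_levenshtein(a, b):
    # Word-level edit distance with the usual dynamic-programming recurrence.
    a, b = a.split(), b.split()
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        cur = [i]
        for j, wb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (wa != wb)))
        prev = cur
    return prev[-1]

# Toy paraphrase table; a real one would come from a resource such as PPDB.
PARAPHRASES = {"approved": ["gave the green light to"], "annual budget": ["yearly budget"]}

def paraphrase_aware_distance(query, segment):
    # Generate paraphrased variants of the TM segment and keep the best match.
    variants = {segment}
    for phrase, alternatives in PARAPHRASES.items():
        for alt in alternatives:
            variants |= {v.replace(phrase, alt) for v in variants if phrase in v}
    return min(word_levenshtein(query, v) for v in variants)

print(paraphrase_aware_distance("The board gave the green light to the yearly budget.",
                                "The board approved the annual budget."))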
Barbu, E., Escartín, C. P., Bentivogli, L., Negri, M., Turchi, M., Federico, M., Mastrostefano, L. and Orăsan, C. (2016) 1st Shared Task on Automatic Translation Memory Cleaning: Preparation and Lessons Learned, In Proceedings of the 2nd Workshop on Natural Language Processing for Translation Memories (NLP4TM 2016), Portorož, Slovenia, pp. 1-6, Abstract: This paper summarizes the work done to prepare the first shared task on automatic translation memory cleaning. This shared task aims at finding automatic ways of cleaning TMs that, for some reason, have not been properly curated and include wrong translations. Participants in this task are required to take pairs of source and target segments from TMs and decide whether they are right translations. For this first task three language pairs have been prepared: English→Spanish, English→Italian, and English→German. In this paper, we report on how the shared task was prepared and explain the process of data selection and data annotation, the building of the training and test sets and the implemented baselines for automatic classifiers comparison.
Orăsan, C. (2016) The EXPERT Project: Training the Future Experts in Translation Technology, In Proceedings of the 19th Annual Conference of the EAMT: Projects/Products, Riga, Latvia, p. 393, online
Gupta, R., Orăsan, C., Zampieri, M., Vela, M., van Genabith, J. and Mitkov, R. (2016) Improving translation memory matching and retrieval using paraphrases, Machine Translation, 30(1), pp. 19 - 40, online, doi:10.1007/s10590-016-9180-0, Abstract: Most current translation memory (TM) systems work on the string level (character or word level) and lack semantic knowledge while matching. They use simple edit-distance (ED) calculated on the surface form or some variation on it (stem, lemma), which does not take into consideration any semantic aspects in matching. This paper presents a novel and efficient approach to incorporating semantic information in the form of paraphrasing (PP) in the ED metric. The approach computes ED while efficiently considering paraphrases using dynamic programming and greedy approximation. In addition to using automatic evaluation metrics like BLEU and METEOR, we have carried out an extensive human evaluation in which we measured post-editing time, keystrokes, HTER, HMETEOR, and carried out three rounds of subjective evaluations. Our results show that PP substantially improves TM matching and retrieval, resulting in translation performance increases when translators use paraphrase-enhanced TMs.
Barbu, E., Parra Escartín, C., Bentivogli, L., Negri, M., Turchi, M., Orăsan, C. and Federico, M. (2016) The first Automatic Translation Memory Cleaning Shared Task, Machine Translation, 30(3-4), pp. 145-166, online, doi:10.1007/s10590-016-9183-x, Abstract: This paper reports on the organization and results of the first Automatic Translation Memory Cleaning Shared Task. This shared task is aimed at finding automatic ways of cleaning translation memories (TMs) that have not been properly curated and thus include incorrect translations. As a follow up of the shared task, we also conducted two surveys, one targeting the teams participating in the shared task, and the other one targeting professional translators. While the researchers-oriented survey aimed at gathering information about the opinion of participants on the shared task, the translators-oriented survey aimed to better understand what constitutes a good TM unit and inform decisions that will be taken in future editions of the task. In this paper, we report on the process of data preparation and the evaluation of the automatic systems submitted, as well as on the results of the collected surveys.
Gupta, R., Orăsan, C., Zampieri, M., Vela, M. and Genabith, J. V. (2015) Can Translation Memories afford not to use paraphrasing?, In Proceedings of the 18th Annual Conference of the European Association for Machine Translation (EAMT), Antalya, Turkey, pp. 35 - 42, online, Abstract: This paper investigates to what extent the use of paraphrasing in translation memory (TM) matching and retrieval is useful for human translators. Current translation memories lack semantic knowledge like paraphrasing in matching and retrieval. Due to this, paraphrased segments are often not retrieved. Lack of semantic knowledge also results in inappropriate ranking of the retrieved segments. Gupta and Orasan (2014) proposed an improved matching algorithm which incorporates paraphrasing. Its automatic evaluation suggested that it could be beneficial to translators. In this paper we perform an extensive human evaluation of the use of paraphrasing in the TM matching and retrieval process. We measure post-editing time, keystrokes, two subjective evaluations, and HTER and HMETEOR to assess the impact on human performance. Our results show that paraphrasing improves TM matching and retrieval, resulting in translation performance increases when translators use paraphrase enhanced TMs.
Gupta, R., Orăsan, C. and van Genabith, J. (2015) Machine Translation Evaluation using Recurrent Neural Networks, In Proceedings of the Tenth Workshop on Statistical Machine Translation, Lisbon, Portugal, pp. 380-384, online, Abstract: This paper presents our metric (UoWLSTM) submitted in the WMT-15 metrics task. Many state-of-the-art Machine Translation (MT) evaluation metrics are complex, involve extensive external resources (e.g. for paraphrasing) and require tuning to achieve the best results. We use a metric based on dense vector spaces and Long Short Term Memory (LSTM) networks, which are types of Recurrent Neural Networks (RNNs). For WMT-15 our new metric is the best performing metric overall according to Spearman and Pearson (Pre-TrueSkill) and second best according to Pearson (TrueSkill) system level correlation.
Gupta, R., Orăsan, C. and van Genabith, J. (2015) ReVal: A Simple and Effective Machine Translation Evaluation Metric Based on Recurrent Neural Networks, In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, pp. 1066-1072, online, doi:10.18653/v1/D15-1124, Abstract: Many state-of-the-art Machine Translation (MT) evaluation metrics are complex, involve extensive external resources (e.g. for paraphrasing) and require tuning to achieve best results. We present a simple alternative approach based on dense vector spaces and recurrent neural networks (RNNs), in particular Long Short Term Memory (LSTM) networks. For WMT-14, our new metric scores best for two out of five language pairs, and overall best and second best on all language pairs, using Spearman and Pearson correlation, respectively. We also show how training data is computed automatically from WMT ranks data.
Orăsan, C., Cattelan, A., Corpas Pastor, G., van Genabith, J., Herranz, M., Arevalillo, J. J., Liu, Q., Sima’an, K. and Specia, L. (2015) The EXPERT Project: Advancing the State of the Art in Hybrid Translation Technologies, In Proceedings of Translating and the Computer 37, London, UK
Béchara, H., Može, S., El-Maarouf, I., Orăsan, C., Hanks, P. and Mitkov, R. (2015) The Role of Corpus Pattern Analysis in Machine Translation Evaluation, In Proceedings of the The 7th International Conference of the Iberian Association of Translation and Interpreting Studies (AIETI), Malaga, Spain
Gupta, R. and Orăsan, C. (2014) Incorporating Paraphrasing in Translation Memory Matching and Retrieval, In Proceedings of the Seventeenth Annual Conference of the European Association for Machine Translation (EAMT2014), Dubrovnik, Croatia, pp. 3 - 10, online, Abstract: Current Translation Memory (TM) systems work at the surface level and lack semantic knowledge while matching. This paper presents an approach to incorporating semantic knowledge in the form of paraphrasing in matching and retrieval. Most of the TMs use Levenshtein edit-distance or some variation of it. Generating additional segments based on the paraphrases available in a segment results in exponential time complexity while matching. The reason is that a particular phrase can be paraphrased in several ways and there can be several possible phrases in a segment which can be paraphrased. We propose an efficient approach to incorporating paraphrasing with edit-distance. The approach is based on greedy approximation and dynamic programming. We have obtained significant improvement in both retrieval and translation of retrieved segments for TM thresholds of 100%, 95% and 90%.
Gupta, R., Bechara, H. and Orasan, C. (2014) Intelligent Translation Memory Matching and Retrieval Metric Exploiting Linguistic Technology, In Proceedings of the Translating and Computer 36, London, UK, pp. 86-89, online, Abstract: Translation Memories (TM) help translators in their task by retrieving previously translated sentences and editing fuzzy matches when no exact match is found by the system. Current TM systems use simple edit-distance or some variation of it, which largely relies on the surface form of the sentences and does not necessarily reflect the semantic similarity of segments as judged by humans. In this paper, we propose an intelligent metric to compute the fuzzy match score, which is inspired by similarity and entailment techniques developed in Natural Language Processing.

Publications related to text simplification and text accessibility

Evans, R. and Orăsan, C. (2019) Identifying signs of syntactic complexity for rule-based sentence simplification, Natural Language Engineering, 25(1), pp. 69-119, online, doi:10.1017/S1351324918000384, Abstract: This article presents a new method to automatically simplify English sentences. The approach is designed to reduce the number of compound clauses and nominally bound relative clauses in input sentences. The article provides an overview of a corpus annotated with information about various explicit signs of syntactic complexity and describes the two major components of a sentence simplification method that works by exploiting information on the signs occurring in the sentences of a text. The first component is a sign tagger which automatically classifies signs in accordance with the annotation scheme used to annotate the corpus. The second component is an iterative rule-based sentence transformation tool. Exploiting the sign tagger in conjunction with other NLP components, the sentence transformation tool automatically rewrites long sentences containing compound clauses and nominally bound relative clauses as sequences of shorter single-clause sentences. Evaluation of the different components reveals acceptable performance in rewriting sentences containing compound clauses but less accuracy when rewriting sentences containing nominally bound relative clauses. A detailed error analysis revealed that the major sources of error include inaccurate sign tagging, the relatively limited coverage of the rules used to rewrite sentences, and an inability to discriminate between various subtypes of clause coordination. Despite this, the system performed well in comparison with two baselines. This finding was reinforced by automatic estimations of the readability of system output and by surveys of readers’ opinions about the accuracy, accessibility, and meaning of this output.
Evans, R. and Orăsan, C. (2019) Sentence Simplification for Semantic Role Labelling and Information Extraction, In Proceedings of Recent Advances in Natural Language Processing (RANLP2019), Varna, Bulgaria, pp. 285-294, online, doi:10.26615/978-954-452-056-4_033, Abstract: In this paper, we report on the extrinsic evaluation of an automatic sentence simplification method with respect to two NLP tasks: semantic role labelling (SRL) and information extraction (IE). The paper begins with our observation of challenges in the intrinsic evaluation of sentence simplification systems, which motivates the use of extrinsic evaluation of these systems with respect to other NLP tasks. We describe the two NLP systems and the test data used in the extrinsic evaluation, and present arguments and evidence motivating the integration of a sentence simplification step as a means of improving the accuracy of these systems. Our evaluation reveals that their performance is improved by the simplification step: the SRL system is better able to assign semantic roles to the majority of the arguments of verbs and the IE system is better able to identify fillers for all IE template slots.
Yaneva, V., Orăsan, C., Ha, L. A. and Ponomareva, N. (2019) A Survey of the Perceived Text Adaptation Needs of Adults with Autism, In Proceedings of Recent Advances in Natural Language Processing (RANLP 2019), Varna, Bulgaria, pp. 1356-1363, online, doi:10.26615/978-954-452-056-4_155, Abstract: NLP approaches to automatic text adaptation often rely on user-need guidelines which are generic and do not account for the differences between various types of target groups. One such group are adults with high-functioning autism, who are usually able to read long sentences and comprehend difficult words but whose comprehension may be impeded by other linguistic constructions. This is especially challenging for real-world user-generated texts such as product reviews, which cannot be controlled editorially and are thus in a stronger need of automatic adaptation. To address this problem, we present a mixed-methods survey conducted with 24 adult web-users diagnosed with autism and an age-matched control group of 33 neurotypical participants. The aim of the survey is to identify whether the group with autism experiences any barriers when reading online reviews, what these potential barriers are, and what NLP methods would be best suited to improve the accessibility of online reviews for people with autism. The group with autism consistently reported significantly greater difficulties with understanding online product reviews compared to the control group and identified issues related to text length, poor topic organisation, identifying the intention of the author, trustworthiness, and the use of irony, sarcasm and exaggeration.
Orăsan, C., Evans, R. and Mitkov, R. (2018) Intelligent Text Processing to Help Readers with Autism, In Intelligent Natural Language Processing: Trends and Applications, Shaalan, K., Hassanien, A., and Tolba, F. (eds.), Springer, pp. 713-740, online, doi:10.1007/978-3-319-67056-0_33, Abstract: Autistic Spectrum Disorder (ASD) is a neurodevelopmental disorder which has a life-long impact on the lives of people diagnosed with the condition. In many cases, people with ASD are unable to derive the gist or meaning of written documents due to their inability to process complex sentences, understand non-literal text, and understand uncommon and technical terms. This paper presents FIRST, an innovative project which developed language technology (LT) to make documents more accessible to people with ASD. The project has produced a powerful editor which enables carers of people with ASD to prepare texts suitable for this population. Assessment of the texts generated using the editor showed that they are not less readable than those generated more slowly as a result of onerous unaided conversion and were significantly more readable than the originals. Evaluation of the tool shows that it can have a positive impact on the lives of people with ASD.
Yaneva, V., Orăsan, C., Evans, R. and Rohanian, O. (2017) Combining Multiple Corpora for Readability Assessment for People with Cognitive Disabilities, In Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications, Copenhagen, Denmark, pp. 121 - 132, online, doi:10.18653/v1/W17-5013, Abstract: Given the lack of large user-evaluated corpora in disability-related NLP research (e.g. text simplification or readability assessment for people with cognitive disabilities), the question of choosing suitable training data for NLP models is not straightforward. The use of large generic corpora may be problematic because such data may not reflect the needs of the target population. At the same time, the available user-evaluated corpora are not large enough to be used as training data. In this paper we explore a third approach, in which a large generic corpus is combined with a smaller population-specific corpus to train a classifier which is evaluated using two sets of unseen user-evaluated data. One of these sets, the ASD Comprehension corpus, is developed for the purposes of this study and made freely available. We explore the effects of the size and type of the training data used on the performance of the classifiers, and the effects of the type of the unseen test datasets on the classification performance.
Evans, R., Orasan, C. and Dornescu, I. (2014) An evaluation of syntactic simplification rules for people with autism, In Proceedings of the 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations (PITR), Gothenburg, Sweden, pp. 131 - 140, online, Abstract: Syntactically complex sentences constitute an obstacle for some people with Autistic Spectrum Disorders. This paper evaluates a set of simplification rules specifically designed for tackling complex and compound sentences. In total, 127 different rules were developed for the rewriting of complex sentences and 56 for the rewriting of compound sentences. The evaluation assessed the accuracy of these rules individually and revealed that fully automatic conversion of these sentences into a more accessible form is not very reliable.
Dornescu, I., Evans, R. and Orasan, C. (2014) Relative clause extraction for syntactic simplification, In Proceedings of the Workshop on Automatic Text Simplification: Methods and Applications in the Multilingual Society, Dublin, Ireland, pp. 1-10, online, Abstract: This paper investigates non-destructive simplification, a type of syntactic text simplification which focuses on extracting embedded clauses from structurally complex sentences and rephrasing them without affecting their original meaning. This process reduces the average sentence length and complexity to make text simpler. Although relevant for human readers with low reading skills or language disabilities, the process has direct applications in NLP. In this paper we analyse the extraction of relative clauses through a tagging approach. A dataset covering three genres was manually annotated and used to develop and compare several approaches for automatically detecting appositions and non-restrictive relative clauses. The best results are obtained by an ML model developed using crfsuite, followed by a rule-based method.
Dornescu, I., Evans, R. and Orasan, C. (2013) A Tagging Approach to Identify Complex Constituents for Text Simplification, In Proceedings of Recent Advances in Natural Language Processing, Hissar, Bulgaria, pp. 221-229, online, Abstract: The occurrence of syntactic phenomena such as coordination and subordination is characteristic of long, complex sentences. Text simplification systems need to detect and categorise constituents in order to generate simpler sentences. These constituents are typically bounded or linked by signs of syntactic complexity, which include conjunctions, complementisers, wh-words, and punctuation marks. This paper proposes a supervised tagging approach to classify these signs in accordance with their linking and bounding functions. The performance of the approach is evaluated both intrinsically, using an annotated corpus covering three different genres, and extrinsically, by evaluating the impact of classification errors on an automatic text simplification system. The results are encouraging.
Evans, R. and Orasan, C. (2013) Annotating signs of syntactic complexity to support sentence simplification, In Text, Speech and Dialogue. Proceedings of the 16th International Conference TSD 2013, Habernal, I. and Matousek, V. (eds.), Plzen, Czech Republic, Springer, pp. 92-104, online, doi:10.1007/978-3-642-40585-3_13, Abstract: This article presents a new annotation scheme for syntactic complexity in text which has the advantage over other existing syntactic annotation schemes that it is easy to apply, is reliable and it is able to encode a wide range of phenomena. It is based on the notion that the syntactic complexity of sentences is explicitly indicated by signs such as conjunctions, complementisers and punctuation marks. The article describes the annotation scheme developed to annotate these signs and evaluates three corpora containing texts from three genres that were annotated using it. Inter-annotator agreement calculated on the three corpora shows that there is at least “substantial agreement” and motivates directions for future work.
Orăsan, C., Evans, R. and Dornescu, I. (2013) Text Simplification for People with Autistic Spectrum Disorders, In Towards Multilingual Europe 2020: A Romanian Perspective, pp. 287-312, Abstract: People affected by autism spectrum disorders usually have language deficits which limit their ability to comprehend speech and written text. This is usually caused by the presence in text of linguistic phenomena such as long and syntactically complex sentences, figurative language including metaphor and idioms, semantically ambiguous words and phrases, and technical/specialised words. The FIRST project’s main objective is to implement, deploy and evaluate NLP technologies to support the authoring of accessible content in Bulgarian, English and Spanish. Two experiments aimed at determining the needs of people with ASD confirmed that syntactic simplification transformations are needed to make texts more accessible. In addition to presenting the FIRST project and the experiments which determined the users’ requirements, this paper also presents a syntactic simplification method which combines a machine learning approach with manually created rules.

Publications related to social media

Saadany, H. and Orăsan, C. (2020) Is it great or terrible? Preserving sentiment in neural machine translation of Arabic reviews, In Proceedings of the Fifth Arabic Natural Language Processing Workshop, Barcelona, Spain (Online), pp. 24-37, online, Abstract: Since the advent of Neural Machine Translation (NMT) approaches there has been a tremendous improvement in the quality of automatic translation. However, NMT output still lacks accuracy in some low-resource languages and sometimes makes major errors that need extensive post-editing. This is particularly noticeable with texts that do not follow common lexico-grammatical standards, such as user generated content (UGC). In this paper we investigate the challenges involved in translating book reviews from Arabic into English, with particular focus on the errors that lead to incorrect translation of sentiment polarity. Our study points to the special characteristics of Arabic UGC, examines the sentiment transfer errors made by Google Translate of Arabic UGC to English, analyzes why the problem occurs, and proposes an error typology specific to the translation of Arabic UGC. Our analysis shows that the output of online translation tools of Arabic UGC can either fail to transfer the sentiment at all by producing a neutral target text, or completely flip the sentiment polarity of the target word or phrase and hence deliver a wrong affect message. We address this problem by fine-tuning an NMT model with respect to sentiment polarity, showing that this approach can significantly help with correcting sentiment errors detected in the online translation of Arabic UGC.
Ranasinghe, T., Saadany, H., Plum, A., Mandhari, S., Mohamed, E., Orăsan, C. and Mitkov, R. (2019) RGCL at IDAT: Deep Learning Models for Irony Detection in Arabic Language, In Working Notes of the Forum for Information Retrieval Evaluation (FIRE 2019), Kolkata, India, pp. 416-425, online, Abstract: This article describes the system submitted by the RGCL team to the IDAT 2019 Shared Task: Irony Detection in Arabic Tweets. The system detects irony in Arabic tweets using deep learning. The paper evaluates the performance of several deep learning models, as well as how text cleaning and text pre-processing influence the accuracy of the system. Several runs were submitted. The highest F1 score achieved for one of the submissions was 0.818, making the team RGCL rank 4th out of 10 teams in the final results. Overall, we present a system that uses minimal pre-processing but is capable of achieving competitive results.
Plum, A., Ranasinghe, T., Orăsan, C. and Mitkov, R. (2019) RGCL at GermEval 2019: Offensive Language Detection with Deep Learning, In Proceedings of the 15th Conference on Natural Language Processing (KONVENS 2019), Erlangen, Germany, pp. 423 - 428, online, Abstract: This paper describes the system submitted by the RGCL team to GermEval 2019 Shared Task 2: Identification of Offensive Language. We experimented with five different neural network architectures in order to classify Tweets in terms of offensive language. By means of comparative evaluation, we select the best performing for each of the three subtasks. Overall, we demonstrate that using only minimal preprocessing we are able to obtain competitive results.
Orăsan, C. (2018) Aggressive Language Identification Using Word Embeddings and Sentiment Features, In Proceedings of the First Workshop on Trolling, Aggression and Cyberbullying (TRAC-2018), Santa Fe, USA, pp. 113 - 119, online, Abstract: This paper describes our participation in the First Shared Task on Aggression Identification. The method proposed relies on machine learning to identify social media texts which contain aggression. The main features employed by our method are information extracted from word embeddings and the output of a sentiment analyser. Several machine learning methods and different combinations of features were tried. The official submissions used Support Vector Machines and Random Forests. The official evaluation showed that for texts similar to the ones in the training dataset Random Forests work best, whilst for texts which are different SVMs are a better choice. The evaluation also showed that despite its simplicity the method performs well when compared with more elaborated methods.
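The feature design outlined in the abstract above, an averaged word-embedding representation concatenated with sentiment information and fed to SVM and Random Forest classifiers, can be sketched as below. The snippet is illustrative only: the embedding lookup and the sentiment analyser are toy stand-ins so that it runs without external resources, and the features of the actual system differ in detail.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
VECTORS = {}  # stand-in for a table of pre-trained word embeddings

def embed(word, dim=50):
    # In the real setting this would look up a pre-trained word vector.
    return VECTORS.setdefault(word, rng.normal(size=dim))

def sentiment(text):
    # Stand-in for the output of a sentiment analyser.
    negative = {"hate", "stupid", "idiot"}
    return -1.0 if any(w in negative for w in text.lower().split()) else 0.0

def features(text):
    # Average of the word vectors, concatenated with the sentiment score.
    vectors = np.array([embed(w) for w in text.lower().split()])
    return np.concatenate([vectors.mean(axis=0), [sentiment(text)]])

texts = ["I hate you, you idiot", "Have a lovely day", "You are so stupid", "Great work, thanks"]
labels = [1, 0, 1, 0]  # 1 = aggressive
X = np.array([features(t) for t in texts])
for clf in (SVC(), RandomForestClassifier(random_state=0)):
    clf.fit(X, labels)
    print(type(clf).__name__, clf.predict([features("what an idiot")]))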
Gopalakrishna Pillai, R., Thelwall, M. and Orăsan, C. (2018) Detection of Stress and Relaxation Magnitudes for Tweets, In Companion of the The Web Conference 2018 on The Web Conference 2018 - WWW '18, New York, New York, USA, ACM Press, pp. 1677 - 1684, online, doi:10.1145/3184558.3191627, Abstract: The ability to automatically detect human stress and relaxation is crucial for timely diagnosing stress-related diseases, ensuring customer satisfaction in services and managing human-centric applications such as traffic management. Traditional methods employ stress-measuring scales or physiological monitoring which may be intrusive and inconvenient. Instead, the ubiquitous nature of the social media can be leveraged to identify stress and relaxation, since many people habitually share their recent life experiences through social networking sites. This paper introduces an improved method to detect expressions of stress and relaxation in social media content. It uses word sense disambiguation by word sense vectors to improve the performance of the first and only lexicon-based stress/relaxation detection algorithm TensiStrength. Experimental results show that incorporating word sense disambiguation substantially improves the performance of the original TensiStrength. It performs better than state-of-the-art machine learning methods too in terms of Pearson correlation and percentage of exact matches. We also propose a novel framework for identifying the causal agents of stress and relaxation in tweets as future work.
Pillai, R. G., Thelwall, M. and Orăsan, C. (2018) Trouble on the Road: Finding Reasons for Commuter Stress from Tweets, In Proceedings of the Workshop on Intelligent Interactive Systems and Language Generation (2IS&NLG), Tilburg, The Netherlands, pp. 20-25, online, doi:10.18653/v1/W18-6705, Abstract: Intelligent Transportation Systems could benefit from harnessing social media content to get continuous feedback. In this work, we implement a system to identify reasons for stress in tweets related to traffic using a word vector strategy to select a reason from a predefined list generated by topic modeling and clustering. The proposed system, which performs better than standard machine learning algorithms, could provide inputs to warning systems for commuters in the area and feedback for the authorities.
Pillai, R. G., Thelwall, M. and Orăsan, C. (2018) What Makes You Stressed? Finding Reasons From Tweets, In Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, Brussels, Belgium, pp. 266-272, online, doi:10.18653/v1/W18-6239, Abstract: Detecting stress from social media gives a non-intrusive and inexpensive alternative to traditional tools such as questionnaires or physiological sensors for monitoring the mental state of individuals. This paper introduces a novel framework for finding reasons for stress from tweets, analyzing multiple categories for the first time. Three word-vector based methods are evaluated on collections of tweets about politics or airlines and are found to be more accurate than standard machine learning algorithms.

Publications related to text summarization

Orăsan, C. (2019) Automatic summarisation: 25 years On, Natural Language Engineering, 25(6), pp. 735-751, online, doi:10.1017/S1351324919000524, Abstract: Automatic text summarisation is a topic that has been receiving attention from the research community from the early days of computational linguistics, but it really took off around 25 years ago. This article presents the main developments from the last 25 years. It starts by defining what a summary is and how its definition changed over time as a result of the interest in processing new types of documents. The article continues with a brief history of the field and highlights the main challenges posed by the evaluation of summaries. The article finishes with some thoughts about the future of the field.
Orăsan, C. (2009) Comparative Evaluation of Term-Weighting Methods for Automatic Summarization, Journal of Quantitative Linguistics, Routledge, 16(1), pp. 67-95, online, doi:10.1080/09296170802514187, Abstract: Term-based summarization assumes that it is possible to determine the importance of a sentence on the basis of the words it contains. To achieve this, words are weighted using term-weighting measures which in turn are used to weight the sentences. This article presents a comparative evaluation of summaries produced using different term-weighting measures and different combinations of parameters which are used to calculate these measures. Comparative evaluation of summaries produced reveals that in many cases simple methods such as term frequency can produce informative summaries.
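As a concrete illustration of the term-based approach evaluated in this article, the short sketch below weights words by their frequency in the document, scores each sentence by the average weight of its words, and extracts the top-scoring sentences in document order. It is a minimal example rather than the experimental setup of the paper, which compares several term-weighting measures and parameter combinations.

import re
from collections import Counter

def summarise(text, n_sentences=2):
    # Split into sentences and compute term-frequency weights over the document.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    tf = Counter(re.findall(r"\w+", text.lower()))
    def score(sentence):
        # A sentence is scored by the average weight of the terms it contains.
        tokens = re.findall(r"\w+", sentence.lower())
        return sum(tf[t] for t in tokens) / (len(tokens) or 1)
    top = set(sorted(sentences, key=score, reverse=True)[:n_sentences])
    return " ".join(s for s in sentences if s in top)

print(summarise("Solar power is growing quickly. Solar panels get cheaper every year. Cats sleep a lot.", 1))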
Orăsan, C. and Chiorean, O. A. (2008) Evaluation of a Cross-lingual Romanian-English Multi-document Summariser, In Proceedings of the 6th Language Resources and Evaluation Conference (LREC2008), Marrakech, Morocco, pp. 2114-2119, online, Abstract: The rapid growth of the Internet means that more information is available than ever before. Multilingual multi-document summarisation offers a way to access this information even when it is not in a language spoken by the reader by extracting the gist from related documents and translating it automatically. This paper presents an experiment in which Maximal Marginal Relevance (MMR), a well known multi-document summarisation method, is used to produce summaries from Romanian news articles. A task-based evaluation performed on both the original summaries and on their automatically translated versions reveals that they still contain a significant portion of the important information from the original texts. However, direct evaluation of the automatically translated summaries shows that they are not very legible and this can put off some readers who want to find out more about a topic.
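The Maximal Marginal Relevance selection used in this experiment can be sketched in a few lines: sentences are repeatedly chosen to be similar to the document centroid while being dissimilar to the sentences already selected. The snippet below is a generic illustration over TF-IDF vectors built with scikit-learn, not the project's multilingual pipeline.

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def mmr_summary(sentences, k=3, lam=0.7):
    # Represent sentences as TF-IDF vectors and use their centroid as the query.
    vectors = TfidfVectorizer().fit_transform(sentences).toarray()
    centroid = vectors.mean(axis=0)
    def sim(a, b):
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        return float(a @ b / denom) if denom else 0.0
    selected, candidates = [], list(range(len(sentences)))
    while candidates and len(selected) < k:
        def mmr(i):
            # Relevance to the centroid minus redundancy with already selected sentences.
            redundancy = max((sim(vectors[i], vectors[j]) for j in selected), default=0.0)
            return lam * sim(vectors[i], centroid) - (1 - lam) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in sorted(selected)]

print(mmr_summary(["MMR rewards relevance.", "MMR penalises redundancy.",
                   "MMR rewards relevance twice.", "Unrelated sentence about cats."], k=2))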
Orǎsan, C. (2003) An evolutionary approach for improving the quality of automatic summaries, In Proceedings of the ACL 2003 workshop on Multilingual summarization and question answering, Sapporo, Japan, p. 37, online, Abstract: Automatic text extraction techniques have proved robust, but very often their summaries are not coherent. In this paper, we propose a new extraction method which uses local coherence as a means to improve the overall quality of automatic summaries. Two algorithms for sentence selection are proposed and evaluated on scientific documents. Evaluation showed that the method ameliorates the quality of summaries, noticeable improvements being obtained for longer summaries produced by an algorithm which selects sentences using an evolutionary algorithm.

Publications related to anaphora and coreference resolution

Mitkov, R., Evans, R., Orasan, C., Dornescu, I. and Rios, M. (2012) Coreference Resolution: To What Extent Does It Help NLP Applications, In Text, Speech and Dialogue, Sojka, P., Horák, A., Kopeček, I., and Pala, K. (eds.), Berlin, Heidelberg, Springer Berlin Heidelberg, pp. 16-27, online, doi:10.1007/978-3-642-32790-2_2, Abstract: This paper describes a study of the impact of coreference resolution on NLP applications. Further to our previous study [19], in which we investigated whether anaphora resolution could be beneficial to NLP applications, we now seek to establish whether a different, but related task - that of coreference resolution - could improve the performance of three NLP applications: text summarisation, recognising textual entailment and text classification. The study discusses experiments in which the aforementioned applications were implemented in two versions, one in which the BART coreference resolution system was integrated and one in which it was not, and then tested in processing input text. The paper discusses the results obtained.
Recasens, M., Martí, M. A. and Orasan, C. (2012) Annotating Near-Identity from Coreference Disagreements, In Proceedings of The Eighth International Conference on Language Resources and Evaluation (LREC 2012), Istanbul, Turkey, pp. 165-172, online, Abstract: We present an extension of the coreference annotation in the English NP4E and the Catalan AnCora-CA corpora with near-identity relations, which are borderline cases of coreference. The annotated subcorpora have 50K tokens each. Near-identity relations, as presented by Recasens et al. (2010; 2011), build upon the idea that identity is a continuum rather than an either/or relation, thus introducing a middle ground category to explain currently problematic cases. The first annotation effort that we describe shows that it is not possible to annotate near-identity explicitly because subjects are not fully aware of it. Therefore, our second annotation effort used an indirect method, and arrived at near-identity annotations by inference from the disagreements between five annotators who had only a two-alternative choice between coreference and non-coreference. The results show that whereas as little as 2-6% of the relations were explicitly annotated as near-identity in the former effort, up to 12-16% of the relations turned out to be near-identical following the indirect method of the latter effort.
Camargo de Souza, G. and Orăsan, C. (2011) Can Projected Chains in Parallel Corpora Help Coreference Resolution?, In Anaphora Processing and Applications, pp. 59-69, doi:10.1007/978-3-642-25917-3_6, Abstract: The majority of current coreference resolution systems rely on annotated corpora to train classifiers for this task. However, this is possible only for languages for which annotated corpora are available. This paper presents a system that automatically extracts coreference chains from texts in Portuguese without the need for Portuguese corpora manually annotated with coreferential information. To achieve this, an English coreference resolver is run on the English part of an English-Portuguese parallel corpus. The coreference pairs identified by the resolver are projected to the Portuguese part of the corpus using automatic word alignment. These projected pairs are then used to train the coreference resolver for Portuguese. Evaluation of the system reveals that it does not outperform a head match baseline. This is due to the fact that most of the projected pairs have the same head, which is learnt by the Portuguese classifier. This suggests that a more accurate English coreference resolver is necessary. A better projection algorithm is also likely to improve the performance of the system.
Mitkov, R., Evans, R., Orăsan, C., Ha, L. A. and Pekar, V. (2007) Anaphora resolution: to what extent does it help NLP applications?, In Lecture Notes In Artificial Intelligence, Branco, A. (ed.), Springer-Verlag, pp. 179-190, online, Abstract: Papers discussing anaphora resolution algorithms or systems usually focus on the intrinsic evaluation of the algorithm/system and not on the issue of extrinsic evaluation. In the context of anaphora resolution, extrinsic evaluation concerns the impact of an anaphora resolution module on a larger NLP system of which it is part. In this paper we explore the extent to which the well-known anaphora resolution system MARS [1] can improve the performance of three NLP applications: text summarisation, term extraction and text categorisation. On the basis of the results so far we conclude that the deployment of anaphora resolution has a positive albeit limited impact.
Orasan, C. and Evans, R. (2007) NP animacy identification for anaphora resolution, Journal of Artificial Intelligence Research, 29(1), pp. 79-103, online, doi:10.1613/jair.2088, Abstract: In anaphora resolution for English, animacy identification can play an integral role in the application of agreement restrictions between pronouns and candidates, and as a result, can improve the accuracy of anaphora resolution systems. In this paper, two methods for animacy identification are proposed and evaluated using intrinsic and extrinsic measures. The first method is a rule-based one which uses information about the unique beginners in WordNet to classify NPs on the basis of their animacy. The second method relies on a machine learning algorithm which exploits a WordNet enriched with animacy information for each sense. The effect of word sense disambiguation on the two methods is also assessed. The intrinsic evaluation reveals that the machine learning method reaches human levels of performance. The extrinsic evaluation demonstrates that animacy identification can be beneficial in anaphora resolution, especially in the cases where animate entities are identified with high precision.
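The rule-based method described in the abstract above can be approximated in a few lines over NLTK's WordNet interface: a noun is labelled animate when most of its senses fall under the person or animal lexicographer files. This is a simplification of the published approach, which works with WordNet's unique beginners and also has a machine-learned variant, and it assumes the WordNet data has been downloaded via nltk.download('wordnet').

from nltk.corpus import wordnet as wn

ANIMATE_LEXNAMES = {"noun.person", "noun.animal"}

def is_animate(noun):
    # A noun counts as animate if the majority of its noun senses are filed
    # under person or animal categories in WordNet.
    senses = wn.synsets(noun, pos=wn.NOUN)
    return bool(senses) and sum(s.lexname() in ANIMATE_LEXNAMES for s in senses) > len(senses) / 2

for word in ("teacher", "dog", "table"):
    print(word, is_animate(word))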
Mitkov, R., Evans, R. and Orasan, C. (2002) A New, Fully Automatic Version of Mitkov's Knowledge-Poor Pronoun Resolution Method, In Lecture Notes In Computer Science; Vol. 2276, Alexander F. Gelbukh (ed.), Springer-Verlag, p. 168, online
Orǎsan, C. and Evans, R. (2001) Learning to identify animate references, In Proceedings of the 2001 workshop on Computational Natural Language Learning, online, Abstract: Information about the animacy of nouns is important for a wide range of tasks in NLP. In this paper, we present a method for determining the animacy of English nouns using WordNet and machine learning techniques. Our method firstly categorises the senses from WordNet using an annotated corpus and then uses this information in order to classify nouns for which the sense is not known. Our evaluation results show that the accuracy of the classification of a noun is around 97% and that animate entities are more difficult to identify than inanimate ones.
Orasan, C., Evans, R. and Mitkov, R. (2000) Enhancing Preference-Based Anaphora Resolution with Genetic Algorithms, In Lecture Notes In Computer Science, Dimitris Christodoulakis (ed.), Springer-Verlag, p. 185, online, Abstract: The paper argues that a promising way to improve the success rate of preference-based anaphora resolution algorithms is the use of machine learning. The paper outlines MARS - a program for automatic resolution of pronominal anaphors - and describes an experiment which we have conducted to optimise the success rate of MARS with the help of a genetic algorithm. After the optimisation we noted an improvement of up to 8% for some files. The results obtained after optimisation are discussed.

Publications related to information extraction and question answering

Plum, A., Ranasinghe, T., Calleja, P., Orăsan, C. and Mitkov, R. (2019) RGCL-WLV at SemEval-2019 Task 12: Toponym Detection, In Proceedings of the 13th International Workshop on Semantic Evaluation (SemEval-2019), Minneapolis, Minnesota, USA, pp. 1297-1301, online, doi:10.18653/v1/S19-2228, Abstract: This article describes the system submitted by the RGCL-WLV team to the SemEval 2019 Task 12: Toponym resolution in scientific papers. The system detects toponyms using a bootstrapped machine learning (ML) approach which classifies names identified using gazetteers extracted from the GeoNames geographical database. The paper evaluates the performance of several ML classifiers, as well as how the gazetteers influence the accuracy of the system. Several runs were submitted. The highest precision achieved for one of the submissions was 89%, albeit at a relatively low recall of 49%.
Plum, A., Ranasinghe, T. and Orăsan, C. (2019) Toponym Detection in the Bio-Medical Domain: A Hybrid Approach with Deep Learning, In Proceedings of Recent Advances in Natural Language Processing (RANLP2019), Varna, Bulgaria, pp. 912-921, online, doi:10.26615/978-954-452-056-4_106, Abstract: This paper compares how different machine learning classifiers can be used together with simple string matching and named entity recognition to detect locations in texts. We compare five different state-of-the-art machine learning classifiers in order to predict whether a sentence contains a location or not. Following this classification task, we use a string matching algorithm with a gazetteer to identify the exact index of a toponym within the sentence. We evaluate different approaches in terms of machine learning classifiers, text pre-processing and location extraction on the SemEval-2019 Task 12 dataset, compiled for toponym resolution in the bio-medical domain. Finally, we compare the results with our system that was previously submitted to the SemEval-2019 task evaluation.
El Maarouf, I., Marsic, G. and Orăsan, C. (2015) Barbecued Opakapaka: Using Semantic Preferences for Ontology Population, In Proceedings of Recent Advances in Natural Language Processing, Hissar, Bulgaria, pp. 153-159, online, Abstract: This paper investigates the use of semantic preferences for ontology population. It draws on a new resource, the Pattern Dictionary of English Verbs, which lists semantic categories expected in each syntactic slot of a verb pattern. Knowledge of semantic preferences is used to drive and control bootstrapped pattern extraction techniques on the EnClueWeb09 corpus with the aim of identifying common nouns belonging to twelve semantic types. Evaluation reveals that syntactic patterns perform better than lexical and surface patterns, at the same time raising issues about assessing ontology population candidates out of context.
Dornescu, I. and Orasan, C. (2014) Densification: Semantic document analysis using Wikipedia, Natural Language Engineering, Cambridge University Press, 20(4), pp. 469 - 500, online, doi:10.1017/S1351324913000296, Abstract: This paper proposes a new method for semantic document analysis: densification, which identifies and ranks Wikipedia pages relevant to a given document. Although there are similarities with established tasks such as wikification and entity linking, the method does not aim for strict disambiguation of named entity mentions. Instead, densification uses existing links to rank additional articles that are relevant to the document, a form of explicit semantic indexing that enables higher-level semantic retrieval procedures that can be beneficial for a wide range of NLP applications. Because a gold standard for densification evaluation does not exist, a study is carried out to investigate the level of agreement achievable by humans, which questions the feasibility of creating an annotated data set. As a result, a semi-supervised approach is employed to develop a two-stage densification system: filtering unlikely candidate links and then ranking the remaining links. In a first evaluation experiment, Wikipedia articles are used to automatically estimate the performance in terms of recall. Results show that the proposed densification approach outperforms several wikification systems. A second experiment measures the impact of integrating the links predicted by the densification system into a semantic question answering (QA) system that relies on Wikipedia links to answer complex questions. Densification enables the QA system to find twice as many additional answers than when using a state-of-the-art wikification system.
Konstantinova, N. and Orasan, C. (2013) Interactive Question Answering, In Emerging Applications of Natural Language Processing: Concepts and New Research, Bandyopadhyay, S., Naskar, S. K., and Ekbal, A. (eds.), IGI Global, pp. 149-169, online, doi:10.4018/978-1-4666-2169-5.ch007, Abstract: The increasing amount of information available online has led to the development of technologies that help to deal with it. One of them is Interactive Question Answering (IQA), a research field that has emerged at the intersection of question answering and dialogue systems, and which allows users to find the answers to questions in an interactive way. During the answering process, the automatic system can initiate a dialogue with the user in order to clarify missing or ambiguous information, or suggest further topics for discussion. This chapter presents the state-of-the-art in the field of interactive question answering. Given that IQA inherits a lot of features from dialogue systems and question answering, these fields are also briefly presented. Analysis of the existing systems reveals that in general IQA systems rely on a scaled-down version of a dialogue system, sometimes built on top of question answering systems. Evaluation of IQA is also discussed, showing that it combines evaluation techniques from question answering and dialogue systems.
Konstantinova, N., Orasan, C. and Balage, P. P. (2012) A Corpus-Based Method for Product Feature Ranking for Interactive Question Answering Systems, International Journal of Computational Linguistics and Applications, 3(1), pp. 57 - 70, Abstract: At times choosing a product can be a difficult task due to the fact that customers need to consider many features before they can reach a decision. Interactive question answering (IQA) systems can help customers in this process, by answering questions about products and initiating a dialogue with the customer when their needs are not clearly defined. For this purpose we propose a corpus-based method for weighting the importance of product features depending on how likely they are to be of interest for a user. By using this method, we hope that users can select the desired product in an optimal way. For the experiments a corpus of user reviews is used, the assumption being that the features mentioned in a review are probably more important for a person who is likely to purchase a product. In an attempt to improve the method, a sentiment classification system is also employed in order to distinguish between features mentioned in positive and negative contexts. Evaluation shows that the ranking method which incorporates this information is one of the best performing ones.
Ferrández, Ó., Spurk, C., Kouylekov, M., Dornescu, I., Ferrández, S., Negri, M., Izquierdo, R., Tomás, D., Orasan, C., Neumann, G., Magnini, B. and Vicedo, J. L. (2011) The QALL-ME framework: A specifiable-domain multilingual question answering architecture, Journal of Web Semantics, Elsevier B.V., 9(2), pp. 137-145, online, doi:10.1016/j.websem.2011.01.002, Abstract: This paper presents the QALL-ME Framework, a reusable architecture for building multi- and cross-lingual Question Answering (QA) systems working on structured data modelled by an ontology. It is released as free open source software with a set of demo components and extensive documentation, which makes it easy to use and adapt. The main characteristics of the QALL-ME Framework are: (i) its domain portability, achieved by an ontology modelling the target domain; (ii) the context awareness regarding space and time of the question; (iii) the use of textual entailment engines as the core of the question interpretation; and (iv) an architecture based on Service Oriented Architecture (SOA), which is realized using interchangeable web services for the framework components. Furthermore, we present a running example to clarify how the framework processes questions as well as a case study that shows a QA application built as an instantiation of the QALL-ME Framework for cinema/movie events in the tourism domain.
Varga, A., Puscasu, G. and Orasan, C. (2009) Identification of temporal expressions in the domain of tourism, In KEPT 2009. Knowledge Engineering Principles and Techniques. Selected Papers, Frentiu, M. and Pop, H. (eds.), Cluj-Napoca, Cluj University Press, pp. 61-68, online, Abstract: This paper presents how an existing temporal processor was adapted for use by the English Question Answering system developed as part of the EU-funded project QALL-ME. Experiments applying the existing temporal processor to questions from the domain of tourism revealed that it tackles far too many temporal expressions, which makes it slower than necessary. In light of this, a simplified temporal processor which identifies only temporal expressions present in user questions was implemented. The two temporal annotators are evaluated on 1,118 randomly selected user questions, and an error analysis is presented.
Ou, S., Mekhaldi, D. and Orăsan, C. (2009) An ontology-based question answering method with the use of textual entailment, In 2009 International Conference on Natural Language Processing and Knowledge Engineering, IEEE, pp. 1-8, online, doi:10.1109/NLPKE.2009.5313770, Abstract: This paper presents a new method for ontology-based Question Answering (QA) with the use of textual entailment. In this method, a set of question patterns, called hypothesis questions, was automatically produced from a domain ontology, along with their corresponding SPARQL query templates for answer retrieval. Then the QA task was reduced to the problem of looking for the hypothesis question that was entailed by a user question and taking its corresponding query template to produce a complete query for retrieving the answers from underlying knowledge bases. An entailment engine was used to discover the entailed hypothesis questions with the help of question classification. An evaluation was carried out to assess the accuracy of the QA method, and the results revealed that most of the user questions (65%) can be correctly answered with a semantic entailment engine enhanced by the domain ontology.
Sacaleanu, B., Orasan, C., Spurk, C., Ou, S., Ferrandez, O., Kouylekov, M. and Negri, M. (2008) Entailment-based question answering for structured data, In Coling 2008: Companion volume – Posters and Demonstrations, Manchester, UK, pp. 173-176, online, Abstract: This paper describes a Question Answering system which retrieves answers from structured data regarding cinemas and movies. The system represents the first prototype of a multilingual and multimodal QA system for the domain of tourism. Based on a specially designed domain ontology and using Textual Entailment as a means for semantic inference, the system can be used in both monolingual and cross-language settings with slight adjustments for new input languages.
Orăsan, C., Tatar, D., Şerban, G., Lupsa, D. and Onet, A. (2003) How to build a QA system in your back-garden: application for Romanian, In Proceedings of the Tenth Conference of the European Chapter of the Association for Computational Linguistics, Budapest, Hungary, pp. 139-142, online, Abstract: Even though the question answering (QA) field appeared only in recent years, there are systems for English which obtain good results for open-domain questions. The situation is very different for other languages, mainly due to the lack of NLP resources which are normally used by QA systems. In this paper, we present a project which develops a QA system for Romanian. The challenges we face and decisions we have to make are discussed.

Publications related to semantic text similarity

Ranasinghe, T., Orăsan, C. and Mitkov, R. (2019) Semantic Textual Similarity with Siamese Neural Networks, In Proceedings of Recent Advances in Natural Language Processing (RANLP 2019), Varna, Bulgaria, pp. 1005-1012, online, doi:10.26615/978-954-452-056-4_116, Abstract: Calculating the Semantic Textual Similarity (STS) is an important research area in natural language processing which plays a significant role in many applications such as question answering, document summarisation, information retrieval and information extraction. This paper evaluates Siamese recurrent architectures, a special type of neural network, which are used here to measure STS. Several variants of the architecture are compared with existing methods.
Ranasinghe, T., Orăsan, C. and Mitkov, R. (2019) Enhancing Unsupervised Sentence Similarity Methods with Deep Contextualised Word Representations, In Proceedings of Recent Advances in Natural Language Processing (RANLP 2019), Varna, Bulgaria, pp. 994-1003, online, doi:10.26615/978-954-452-056-4_115, Abstract: Calculating Semantic Textual Similarity (STS) plays a significant role in many applications such as question answering, document summarisation, information retrieval and information extraction. All modern state-of-the-art STS methods rely on word embeddings in one way or another. The recently introduced contextualised word embeddings have proved more effective than standard word embeddings in many natural language processing tasks. This paper evaluates the impact of several contextualised word embeddings on unsupervised STS methods and compares them with existing supervised/unsupervised STS methods for different datasets in different languages and different domains.
Béchara, H., Parra Escartín, C., Orăsan, C. and Specia, L. (2016) Semantic Textual Similarity in Quality Estimation, Baltic Journal of Modern Computing, 4(2), pp. 256-268, online, Abstract: Quality Estimation (QE) predicts the quality of machine translation output without the need for a reference translation. This quality can be defined differently based on the task at hand. In an attempt to focus further on the adequacy and informativeness of translations, we integrate features of semantic similarity into QuEst, a framework for QE feature extraction. By using methods previously employed in Semantic Textual Similarity (STS) tasks, we use semantically similar sentences and their quality scores as features to estimate the quality of machine translated sentences. Preliminary experiments show that finding semantically similar sentences for some datasets is difficult and time-consuming. Therefore, we opt to start from the assumption that we already have access to semantically similar sentences. Our results show that this method can improve the prediction of machine translation quality for semantically similar sentences.
Béchara, H., Gupta, R., Tan, L. L., Orăsan, C., Mitkov, R. and van Genabith, J. (2016) WOLVESAAR at SemEval-2016 Task 1: Replicating the Success of Monolingual Word Alignment and Neural Embeddings for Semantic Textual Similarity, In Proceedings of SemEval-2016, San Diego, California, pp. 634-639, online, doi:10.18653/v1/S16-1096, Abstract: This paper describes the WOLVESAAR systems that participated in the English Semantic Textual Similarity (STS) task in SemEval-2016. We replicated the top systems from the last two editions of the STS task and extended the model using GloVe word embeddings and dense vector space LSTM based sentence representations. We compared the difference in performance of the replicated system and the extended variants. Our variants to the replicated system show improved correlation scores and all of our submissions outperform the median scores from all participating systems.
Béchara, H., Costa, H., Taslimipoor, S., Gupta, R., Orăsan, C., Corpas Pastor, G. and Mitkov, R. (2015) MiniExperts: An SVM Approach for Measuring Semantic Textual Similarity, In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), Denver, Colorado, pp. 96-101, online, Abstract: This paper describes the system submitted by the University of Wolverhampton and the University of Malaga for SemEval-2015 Task 2: Semantic Textual Similarity. The system uses a Support Vector Machine approach based on a number of linguistically motivated features. Our system performed satisfactorily for English and obtained a mean Pearson correlation of 0.7216. However, it performed less adequately for Spanish, obtaining only a mean of 0.5158.
Gupta, R., Béchara, H. and Orasan, C. (2014) Intelligent Translation Memory Matching and Retrieval Metric Exploiting Linguistic Technology, In Proceedings of Translating and the Computer 36, London, UK, pp. 86-89, online, Abstract: Translation Memories (TM) help translators in their task by retrieving previously translated sentences, which can then be edited as fuzzy matches when no exact match is found by the system. Current TM systems use simple edit distance or some variation of it, which largely relies on the surface form of the sentences and does not necessarily reflect the semantic similarity of segments as judged by humans. In this paper, we propose an intelligent metric to compute the fuzzy match score, which is inspired by similarity and entailment techniques developed in Natural Language Processing.

Publications on other topics

Orăsan, C. and Mitkov, R. (2021) Recent Developments in Natural Language Processing, In The Oxford Handbook of Computational Linguistics 2nd edition, Mitkov, R. (ed.), Oxford University Press, pp. 1-68, online, doi:10.1093/oxfordhb/9780199573691.013.005, Abstract: Natural Language Processing (NLP) is a dynamic and rapidly developing field in which new trends, techniques, and applications are constantly emerging. This chapter focuses mainly on recent developments in NLP which could not be covered in other chapters of the Handbook. Topics such as crowdsourcing and processing of large datasets, which are no longer that recent but are widely used and not covered at length in any other chapter, are also presented. The chapter starts by describing how the availability of tools and resources has had a positive impact on the field. The proliferation of user-generated content has led to the emergence of research topics such as sarcasm and irony detection, automatic assessment of user-generated content, and stance detection. All of these topics are discussed in the chapter. The field of NLP is approaching maturity, a fact corroborated by the latest developments in the processing of texts for financial purposes and for helping users with disabilities, two topics that are also discussed here. The chapter presents examples of how researchers have successfully combined research in computer vision and natural language processing to enable the processing of multimodal information, as well as how the latest advances in deep learning have revitalized research on chatbots and conversational agents. The chapter concludes with a comprehensive list of further reading material and additional resources.
Plum, A., Zampieri, M., Orăsan, C., Wandl-Vogt, E. and Mitkov, R. (2019) Large-scale Data Harvesting for Biographical Data, In Proceedings of the International Conference on Biographical Data in a Digital World 2019, Varna, Bulgaria, online, Abstract: This paper explores automatic methods to identify relevant biography candidates in large databases, and extract biographical information from encyclopedia entries and databases. In this work, relevant candidates are defined as people who have made an impact in a certain country or region within a pre-defined time frame. We investigate the case of people who had an impact in the Republic of Austria and died between 1951 and 2019. We use Wikipedia and Wikidata as data sources and compare the performance of our information extraction methods on these two databases. We demonstrate the usefulness of a natural language processing pipeline to identify suitable biography candidates and, in a second stage, extract relevant information about them. Even though they are considered by many to be identical resources, our results show that the data from Wikipedia and Wikidata differ in some cases, and the two can be used in a complementary way, providing more data for the compilation of biographies.
Orăsan, C., Ha, L. A., Evans, R., Hasler, L. and Mitkov, R. (2007) Corpora for computational linguistics, Ilha do Desterro: A Journal of Language and Literature, 52, pp. 65-101, online, Abstract: Since the mid-1990s, corpora have become very important for computational linguistics. This paper offers a survey of how they are currently used in different fields of the discipline, with particular emphasis on anaphora and coreference resolution, automatic summarisation and term extraction. Their influence on other fields is also briefly discussed.
Orăsan, C. (2000) A hybrid method for clause splitting in unrestricted English texts, In Proceedings of ACIDCA '2000, Corpora and Natural Language Processing, Monastir, Tunisia, pp. 129-134, online, Abstract: It is important to know the structure of the sentence for many NLP tasks. In this paper we propose a hybrid method for clause splitting in unrestricted English texts which requires less human work than existing approaches. The results of a machine learning algorithm, trained on an annotated corpus, are processed by a shallow rule-based module in order to improve the accuracy of the method. The evaluation of the results showed that the machine learning algorithm is useful for the identification of clause boundaries and that the rule-based module improves the results. Using some very simple rules, we can report a precision of around 88%.