Older publications

(added in a more ad-hoc manner. Check my CV for a more complete list)


Orăsan, C. (2009) Comparative Evaluation of Term-Weighting Methods for Automatic Summarization, Journal of Quantitative Linguistics, Routledge, 16(1), pp. 67-95, online, doi:10.1080/09296170802514187, Abstract: Term-based summarization assumes that it is possible to determine the importance of a sentence on the basis of the words it contains. To achieve this, words are weighted using term-weighting measures which in turn are used to weight the sentences. This article presents a comparative evaluation of summaries produced using different term-weighting measures and different combinations of parameters which are used to calculate these measures. Comparative evaluation of summaries produced reveals that in many cases simple methods such as term frequency can produce informative summaries.
Ou, S., Mekhaldi, D. and Orăsan, C. (2009) An ontology-based question answering method with the use of textual entailment, In 2009 International Conference on Natural Language Processing and Knowledge Engineering, IEEE, pp. 1-8, online, doi:10.1109/NLPKE.2009.5313770, Abstract: This paper presents a new method for ontology-based Question Answering (QA) with the use of textual entailment. In this method, a set of question patterns, called hypothesis questions, was automatically produced from a domain ontology, along with their corresponding SPARQL query templates for answer retrieval. Then the QA task was reduced to the problem of looking for the hypothesis question that was entailed by a user question and taking its corresponding query template to produce a complete query for retrieving the answers from underlying knowledge bases. An entailment engine was used to discover the entailed hypothesis questions with the help of question classification. An evaluation was carried out to assess the accuracy of the QA method, and the results revealed that most of the user questions (65%) can be correctly answered with a semantic entailment engine enhanced by the domain ontology.


Orăsan, C. and Chiorean, O. A. (2008) Evaluation of a Cross-lingual Romanian-English Multi-document Summariser, In Proceedings of 6th Language Resources and Evaluation Conference (LREC2008), Marrakech, Morocco, pp. 2114-2119, online, Abstract: The rapid growth of the Internet means that more information is available than ever before. Multilingual multi-document summarisation offers a way to access this information even when it is not in a language spoken by the reader by extracting the gist from related documents and translating it automatically. This paper presents an experiment in which Maximal Marginal Relevance (MMR), a well known multi-document summarisation method, is used to produce summaries from Romanian news articles. A task-based evaluation performed on both the original summaries and on their automatically translated versions reveals that they still contain a significant portion of the important information from the original texts. However, direct evaluation of the automatically translated summaries shows that they are not very legible and this can put off some readers who want to find out more about a topic.


Orăsan, C., Ha, L. A., Evans, R., Hasler, L. and Mitkov, R. (2007) Corpora for computational linguistics, Ilha do Desterro: A Journal of Language and Literature, 52, pp. 65-101, online, Abstract: Since the mid 90s corpora have become very important for computational linguistics. This paper offers a survey of how they are currently used in different fields of the discipline, with particular emphasis on anaphora and coreference resolution, automatic summarisation and term extraction. Their influence on other fields is also briefly discussed.


Orăsan, C. (2003) An evolutionary approach for improving the quality of automatic summaries, In Proceedings of the ACL 2003 workshop on Multilingual summarization and question answering, Sapporo, Japan, p. 37, online, Abstract: Automatic text extraction techniques have proved robust, but very often their summaries are not coherent. In this paper, we propose a new extraction method which uses local coherence as a means to improve the overall quality of automatic summaries. Two algorithms for sentence selection are proposed and evaluated on scientific documents. Evaluation showed that the method ameliorates the quality of summaries, noticeable improvements being obtained for longer summaries produced by an algorithm which selects sentences using an evolutionary algorithm.


Orăsan, C. (2000) A hybrid method for clause splitting in unrestricted English texts, In Proceedings of ACIDCA '2000, Corpora and Natural Language Processing, Monastir, Tunisia, pp. 129-134, online, Abstract: It is important to know the structure of the sentence for many NLP tasks. In this paper we propose a hybrid method for clause splitting in unrestricted English texts which requires less human work than existing approaches. The results of a machine learning algorithm, trained on an annotated corpus, are processed by a shallow rule-based module in order to improve the accuracy of the method. The evaluation of the results showed that the machine learning algorithm is useful for identification of clause's boundaries and the rule-based module improves the results. Using some very simple rules we can report precision of around 88%.