Constantin Orasan's page

All kinds of random stuff on NLP, AI, photography and who knows

 

Combining innovations in speech-to-text (STT) technology and Natural Language Processing (NLP), this Innovate UK-funded collaborative project between the University of Surrey and JUST:access developed an automated transcription tool designed specifically for the justice sector. The solution also makes lengthy court hearings easier to navigate by using the final judgement as a guide. The project had two main phases:

  1. Improving the Automatic Speech Recognition (ASR): we used NLP to automatically identify important phrases from the legal domain. This information, together with a large dataset of court hearings, was used to fine-tune a generic off-the-shelf ASR system. Evaluation showed not only a lower word error rate, but also better recognition of legal terminology and legal entities. The results are presented in our AI4AJ 2023 paper.
  2. Method for linking court hearings with the final judgement: we employed Generative AI technology to identify timespans in court hearings which are relevant to the paragraphs from the court judgement. This enables better access to justice by allowing fast navigation of very long court hearings. In order to train our method, I designed an annotation tool specifically for this purpose. The proposed linking method was presented at EMNLP 2023.
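
For readers unfamiliar with the word error rate (WER) mentioned in the first phase, here is a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and the ASR output, divided by the number of words in the reference. This is not the project's evaluation code. The function name and the example sentences are made up for illustration, and the text normalisation (lowercasing and splitting on whitespace) is an assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumed normalisation: lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from the output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one substituted word out of five reference words.
print(word_error_rate("the claimant filed an appeal",
                      "the claimant filed a appeal"))
```

Note that WER treats all words equally, so a single overall figure can hide errors on rare legal terms. That is presumably why the recognition of legal terminology and entities was assessed separately.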

Surrey project team

  • Constantin Orasan, principal investigator
  • Hadeel Saadany, research fellow

Publications

Resources

Project poster

Annotation tool

To train and evaluate our linking method, I developed a tool that lets users easily annotate whether a timespan from a court hearing is linked to a paragraph from a court judgement. To make annotation easier, the tool can play the video or audio of the court hearing. It is written in Python and uses the Django web framework. Below is a screenshot of the annotation tool. If you would like to use it, please get in touch. The tool is open source, but I need to tidy up the code and add documentation to make it useful for others.
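
To give a rough idea of what one annotation captures, here is a hypothetical sketch of a single record: a timespan in a hearing, a paragraph in the judgement, and the annotator's yes/no decision. It is not the tool's actual data model (the real tool uses Django, and its schema is not described here). The class name, field names and example values are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkAnnotation:
    """Hypothetical record of one annotator decision (not the tool's schema)."""

    hearing_id: str           # assumed identifier of the court hearing
    start_seconds: float      # start of the timespan in the recording
    end_seconds: float        # end of the timespan in the recording
    judgement_paragraph: int  # assumed paragraph number in the judgement
    is_linked: bool           # True if the annotator judged them linked

    def __post_init__(self) -> None:
        # Reject timespans that are negative or empty.
        if self.start_seconds < 0 or self.end_seconds <= self.start_seconds:
            raise ValueError("timespan must satisfy 0 <= start < end")

    @property
    def duration(self) -> float:
        """Length of the annotated timespan in seconds."""
        return self.end_seconds - self.start_seconds


# Made-up example values.
example = LinkAnnotation("hearing-001", 120.0, 185.5, 14, True)
print(example.duration)
```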



This page is for archival purposes only. The project concluded in 2023, but the information may still be interesting for some people. This page was last modified on 10th Feb 2024.

More about my research