Postediting dataset

This page describes the post-editing dataset we developed and explains how to obtain it.

Characteristics of the dataset

The dataset contains 260 pairs of sentences extracted from the English-Spanish part of the Autodesk Post-Editing Data corpus. The sentences were chosen so that together they contain approximately 3,000 words. This amount represents a day's work for an average professional translator, which allowed us to emulate a real-world setting by asking the translators to complete the task in one day. The sentences were also selected so that we can investigate four different scenarios:

  • translators are not provided with any automatic translation and have to translate from scratch
  • an automatic translation is provided, but no information about the quality of the translation is given to translators. In this scenario, the translators have to decide whether to post-edit the given sentence or translate from scratch
  • an automatic translation is provided and the translator is informed that its quality is poor. In this scenario, the translator is advised to translate from scratch but can decide to post-edit the given translation instead
  • an automatic translation is provided and the translator is informed that its quality is good. In this scenario, the translator is advised to post-edit the given translation

The selected sentences were distributed equally among these four scenarios, giving 65 sentences per scenario.
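As an illustration of the equal split, the short Python sketch below assigns 260 sentence IDs to the four scenarios in round-robin order and counts how many land in each. The `Scenario` names, the integer sentence IDs and the round-robin rule are all hypothetical: the real selection also depended on criteria such as the quality of the automatic translation, which this sketch does not model.

```python
from collections import Counter
from enum import Enum


class Scenario(Enum):
    """Hypothetical labels for the four experimental conditions."""
    FROM_SCRATCH = "no automatic translation; translate from scratch"
    MT_NO_QUALITY_INFO = "automatic translation, no quality information"
    MT_LABELLED_POOR = "automatic translation labelled poor; translating from scratch advised"
    MT_LABELLED_GOOD = "automatic translation labelled good; post-editing advised"


def assign_scenarios(sentence_ids):
    """Assign sentences to scenarios round-robin (illustrative only).

    Raises ValueError if the sentences cannot be split evenly.
    """
    scenarios = list(Scenario)
    if len(sentence_ids) % len(scenarios) != 0:
        raise ValueError("sentence count must split evenly across scenarios")
    return {sid: scenarios[i % len(scenarios)] for i, sid in enumerate(sentence_ids)}


# 260 hypothetical sentence IDs, numbered 1..260.
assignment = assign_scenarios(list(range(1, 261)))
counts = Counter(assignment.values())
for scenario in Scenario:
    print(scenario.name, counts[scenario])
```

With 260 sentences and four scenarios, each scenario receives 260 / 4 = 65 sentences.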

The experiment

For our experiments we recruited four professional Spanish translators with several years' translation experience. All four had some experience with post-editing tools. They were asked to use PET, a post-editing tool that records every operation performed by the translator.

The dataset is available as four XML files that record the operations performed by the translators. The example below presents the format used to record these operations.

Obtaining the dataset

To obtain the dataset, please fill in the following form.

See a translator in action

We are currently developing an interface for observing the operations carried out by our translators. You can see an early demo at http://dinel.org.uk/demos/postedit/