L10Ntrain Knowledge Bits

CAT Tools and NMT

Quality checks need to become a living system

Recently I was asked to evaluate the output of an NMT system to find out whether sentence-based or paragraph-based translation would give better results.

The small number of sample texts I had suggested that paragraph-based translation could be slightly better because of the larger context. But the terminology was inconsistent even with larger segments, which showed that the client would need to invest time and effort in setting up a good terminology database to check for their specific word usage (they were using a general system, not one trained on their own material).
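A terminology check of this kind can be sketched in a few lines of Python. This is only an illustration, not how any particular CAT tool does it: the glossary entries, the German/English language pair and the function name `find_term_violations` are all made up for the example.

```python
import re

# Hypothetical glossary: source term -> approved target term.
GLOSSARY = {
    "Rechnung": "invoice",
    "Kunde": "customer",
}

def find_term_violations(source, target, glossary=GLOSSARY):
    """Return (source term, approved term) pairs where the source term occurs
    in the source segment but the approved term is missing from the target."""
    violations = []
    for src_term, tgt_term in glossary.items():
        # Whole-word match in the source; prefix match in the target so that
        # simple plural forms such as "invoices" still count as a hit.
        src_hit = re.search(r"\b" + re.escape(src_term) + r"\b", source, re.IGNORECASE)
        tgt_hit = re.search(r"\b" + re.escape(tgt_term), target, re.IGNORECASE)
        if src_hit and not tgt_hit:
            violations.append((src_term, tgt_term))
    return violations

print(find_term_violations("Der Kunde hat die Rechnung erhalten.",
                           "The customer received the bill."))
```

A real terminology database would also have to deal with inflected forms and with terms that have more than one approved translation, which this sketch ignores.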

In addition, there were some situations for which we didn’t have automated checks in the CAT tool yet and would have to create them.

One example: the source segment contained a number. The number was translated correctly, but the currency EURO was suddenly appended to it. The source text did not mention a currency (and as it was a Swiss text, it would probably have had to be CHF, not EURO).

Still, once you know what kinds of mistakes can happen, you can come up with checking routines for them (most probably with regular expressions).
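For the currency case, such a routine could look roughly like the following Python sketch. The list of currency markers, the sample sentences and the function name `added_currencies` are assumptions for the example; a CAT tool would have its own way of defining a regex-based check.

```python
import re

# Currency markers to look for (an assumed, incomplete list).
# Extend it as new types of mistakes show up in the NMT output.
CURRENCY_PATTERN = re.compile(r"\b(?:EUR(?:OS?)?|CHF|USD)\b|[€$£]", re.IGNORECASE)

def added_currencies(source, target):
    """Return currency markers that occur in the target but not in the source."""
    source_marks = {m.upper() for m in CURRENCY_PATTERN.findall(source)}
    target_marks = {m.upper() for m in CURRENCY_PATTERN.findall(target)}
    return sorted(target_marks - source_marks)

# The source has a bare number, the NMT output suddenly has a currency.
print(added_currencies("Der Preis beträgt 1500.", "The price is 1500 EURO."))
```

The sketch compares the markers literally, so "EUR" in the source and "EURO" in the target would still be reported. Whether that counts as a mistake is something the check would have to be told.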

Some time afterwards, however, I attended a session on neural machine translation (thanks to Moni Höge, who did a great job explaining the workings of those systems). One of the things she said made me a bit uneasy: when you train an existing NMT system with new material, the types of mistakes the system makes can change.

That basically means that we will have to check NMT output again and again for new types of mistakes and create new types of checks to catch these mistakes. The QA check will have to become a living system that needs to adapt to the current output of the NMT system.

This could mean that finding out what new mistakes the machine is making, and defining quality checks for them in TM tools, takes up some of the time that we want to save by using machine translation.

Quality checking will then need to become a living system and adapt to the NMT output continuously.

What I find intriguing (and also a bit scary) about NMT is the unpredictability of the outcome, as we don’t know exactly what is happening inside that NMT black box. 🙂

Details matter – Review process outside a translation tool

Even when two tools have the same kind of feature, they do not necessarily work the same way.

A translation tool is used by the translator, but not necessarily by the person who reviews the translation. Reviewers might be subject matter experts who don’t have access to a translation tool of their own.

Because of this, translation tools may provide a format that can be handled outside of a translation tool. This could be a browser-based view of the source and target language segments, but it could also be a “simple” Word-type document.

But beware, these documents don’t always behave the same way even though they are created for the same purpose.

Let us compare the review document from SDL Trados Studio and the bilingual RTF table from memoQ.

They look pretty much the same, but here are some differences:

1. The file format

  • SDL Trados Studio produces a DOCX file, which needs to be handled in Microsoft Office 2007 or later. Anything else could destroy the XML header of the file and prevent you from importing the reviewed file back again. The file name must not be changed and the file format needs to stay DOCX.
  • memoQ produces an RTF file (which can be handled in any text editor that can open and save RTF). The file name can be changed, but the file format needs to be RTF for back import.

2. The structure of the file

  • SDL Trados Studio shows a table with segment number, status, source segment and target segment. Comments use the commenting feature in Word.
  • memoQ shows a table with segment number, source segment, target segment, comment and status.

3. The process

  • In SDL Trados Studio, the file is a REVIEW file, i.e. changed text can only be imported back into the Studio project if the target language column already contains text. If you want to use the file for translation, the source text needs to be copied to the target column and then overwritten with the translation.
  • In memoQ, the file can be used for REVIEW or TRANSLATION. The target column can be filled or empty. Either way, what is entered in the target column will be imported back into the memoQ project.

And these are just the most important differences between these two formats.
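If you handle many such files, the file-format rules from point 1 could be checked before you try the back import. The following Python sketch only encodes what is listed above. The tool labels, the file names and the function name `reimport_problems` are hypothetical, and neither tool offers such a function.

```python
from pathlib import Path

# Rules taken from the comparison above; the keys are just labels for this sketch.
RULES = {
    "studio": {"extension": ".docx", "name_must_match": True},
    "memoq": {"extension": ".rtf", "name_must_match": False},
}

def reimport_problems(tool, exported_name, returned_name):
    """List reasons why a returned review file might fail on back import."""
    rule = RULES[tool]
    exported, returned = Path(exported_name), Path(returned_name)
    problems = []
    if returned.suffix.lower() != rule["extension"]:
        problems.append(f"file format must stay {rule['extension']}")
    if rule["name_must_match"] and returned.name != exported.name:
        problems.append("file name must not be changed")
    return problems

# A reviewer renamed the Studio review file before sending it back.
print(reimport_problems("studio", "manual.docx.review.docx", "manual_reviewed.docx"))
```

This only looks at the file name. It cannot tell whether the file was saved by a program that damaged its contents.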