CAT Tools and NMT

Quality checks need to become a living system

Recently I was asked to evaluate the output of an NMT system to see whether sentence-based or paragraph-based translation would provide better results.

The small number of sample texts I had suggested that paragraph-based translation could be slightly better, because of the larger context. But the inconsistencies in terminology even with larger segments showed that the client would need to invest some time and effort into setting up a good terminology database to check for their specific word usage (they were not using a system trained on their own material, but a general system).
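To illustrate what checking against a terminology database could look like in its simplest form, here is a minimal sketch in Python. The glossary entries, the German-English language pair and the function name `terminology_issues` are assumptions made for this example only; a real CAT tool would use its own termbase format and would also have to deal with inflected forms.

```python
def terminology_issues(source, target, glossary):
    """Return (source_term, approved_target_term) pairs that look violated.

    A pair is reported when the source term occurs in the source segment
    but the approved target term does not occur in the target segment.
    Matching is a plain case-insensitive substring test, so inflected
    forms are not handled.
    """
    issues = []
    source_lc = source.lower()
    target_lc = target.lower()
    for src_term, tgt_term in glossary.items():
        if src_term.lower() in source_lc and tgt_term.lower() not in target_lc:
            issues.append((src_term, tgt_term))
    return issues


# Hypothetical glossary: source term -> approved target term.
glossary = {"Rechnung": "invoice"}

print(terminology_issues("Die Rechnung ist fällig.", "The bill is due.", glossary))
```

A check like this only tells you that the approved term is missing, not what the machine used instead, so the flagged segments still need a human look.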

In addition, there were some situations for which we didn't have automated checks in the CAT tool yet and would have to create them.

One example was that the source segment contained a number. The number was translated correctly, but suddenly the currency EURO was appended to the number. In the source text there was no currency mentioned (and as it was a Swiss text, it probably would have had to be CHF not EURO).

But still, when you know what kind of mistakes can happen, you can come up with checking routines (most probably with regular expressions) for that.
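For the currency example, such a routine could look roughly like the following Python sketch. It flags a currency that shows up in the target segment although the source segment does not mention it. The list of currency codes and symbols, the normalisation table and the function name `added_currencies` are assumptions for illustration, not something a particular CAT tool provides.

```python
import re

# Assumed, deliberately short list of currency markers; extend as needed.
CURRENCY = re.compile(r"\b(?:EUR|EURO|CHF|USD|GBP)\b|[€$£]", re.IGNORECASE)

# Map different spellings and symbols to one code so that "€" and "EURO" compare equal.
NORMALIZE = {"EURO": "EUR", "€": "EUR", "$": "USD", "£": "GBP"}


def currencies(text):
    """Return the set of normalised currency codes found in the text."""
    found = set()
    for match in CURRENCY.finditer(text):
        token = match.group(0).upper()
        found.add(NORMALIZE.get(token, token))
    return found


def added_currencies(source, target):
    """Return currencies present in the target but absent from the source."""
    return currencies(target) - currencies(source)


print(added_currencies("Der Preis beträgt 250.", "The price is 250 EURO."))
```

The same pattern works in most CAT tools that accept regular expressions for QA rules, although the exact syntax they support may differ from Python's.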

But then, sometime afterwards I attended a session on neural machine translation (Thanks to Moni Höge, who did a great job explaining the workings of those systems). And one of the things she said made me a bit uneasy. What she said was that when you train an existing NMT system with new material, the type of mistakes the system makes can change.

That basically means that we will have to check NMT output again and again for new types of mistakes and create new types of checks to catch these mistakes. The QA check will have to become a living system that needs to adapt to the current output of the NMT system.
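One way to picture such a living system is a small registry of checks to which new ones can be added whenever a new type of mistake turns up. The following Python sketch is only an assumption about how this could be organised; the names `register`, `CHECKS` and `run_checks` and the two sample checks are made up for this example.

```python
import re

# Registry of checks: name -> function(source, target) that returns True
# when the segment pair looks suspicious.
CHECKS = {}


def register(name):
    """Decorator that adds a check function to the registry under a name."""
    def decorator(fn):
        CHECKS[name] = fn
        return fn
    return decorator


@register("number_mismatch")
def number_mismatch(source, target):
    # Compare the digit groups found in both segments, ignoring their order.
    return sorted(re.findall(r"\d+", source)) != sorted(re.findall(r"\d+", target))


@register("currency_added")
def currency_added(source, target):
    # Assumed short list of currency markers.
    pattern = r"\b(?:EUR|EURO|CHF)\b|€"
    in_target = re.search(pattern, target, re.IGNORECASE)
    in_source = re.search(pattern, source, re.IGNORECASE)
    return bool(in_target) and not in_source


def run_checks(source, target):
    """Return the names of all registered checks that flag this segment pair."""
    return [name for name, check in CHECKS.items() if check(source, target)]


print(run_checks("Preis: 250", "Price: 250 EURO"))
```

When the retrained system starts making a new kind of mistake, a new function with a `@register(...)` line is all that has to be added; the existing checks stay untouched.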

This could mean that the time spent on finding out what new mistakes the machine is making and defining them for Quality checking in TM tools takes up some of the time that we want to save by using machine translation.

Quality checking will then need to become a living system and adapt to the NMT output continuously.

What I find intriguing (and also a bit scary) about NMT is the unpredictability of the outcome, as we don't know exactly what is happening inside that NMT black box. 🙂