Having covered what can produce different word counts, we should also look at what can cause different match values.
Even with the same file and the same TM, the analysis results can differ, because the settings that influence the match values are usually project-based settings.
Let us take penalties first. A penalty can be applied to matches that come from a specific TM, to matches whose metadata differs from the metadata used in your current project, or even to segments saved with a certain user name or user role. This means that the segment shows up with a lower match value than the “real” one it would otherwise have.
There are many reasons to apply a penalty:
- The TM has been provided by a client and has not been created by yourself, so you cannot vouch for its quality.
- The material in the TM is old or comes from an alignment (most tools will apply a penalty for alignment segments automatically).
- You have decided to start a new, fresh TM and use the existing TM as a reference in the background.
- The content was saved to the TM by a certain person (maybe by an intern who did an alignment and was not very careful when aligning the segments) or with a certain role (you want to trust segments confirmed by a reviewer more than those confirmed by a translator).
- The content was translated for a different subject area, and this information was also saved to the TM as metadata (the TM contains translations from marketing, but you now want to translate a contract).
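To make the effect of penalties concrete, here is a minimal Python sketch of how a raw match value might be reduced before it shows up in the analysis. The `TMHit` class, the penalty names and the penalty amounts are all hypothetical and chosen for illustration; real TM tools define and scope their penalties in their own ways.

```python
from dataclasses import dataclass


@dataclass
class TMHit:
    """A match found in a TM (hypothetical structure for illustration)."""
    raw_score: int                # match value before penalties, in percent
    tm_name: str                  # name of the TM the match comes from
    from_alignment: bool = False  # True if the segment came from an alignment
    subject: str = ""             # subject area stored as metadata


# Assumed penalty amounts in percentage points; not taken from any real tool.
PENALTIES = {
    "penalized_tm": 5,
    "alignment": 3,
    "subject_mismatch": 2,
}


def effective_score(hit, project_subject, penalized_tms):
    """Return the match value after subtracting all applicable penalties."""
    penalty = 0
    if hit.tm_name in penalized_tms:
        penalty += PENALTIES["penalized_tm"]
    if hit.from_alignment:
        penalty += PENALTIES["alignment"]
    if hit.subject and hit.subject != project_subject:
        penalty += PENALTIES["subject_mismatch"]
    return max(hit.raw_score - penalty, 0)


client_hit = TMHit(100, "client.tm", from_alignment=True, subject="marketing")
own_hit = TMHit(100, "own.tm", subject="legal")

print(effective_score(client_hit, "legal", {"client.tm"}))  # penalized match
print(effective_score(own_hit, "legal", {"client.tm"}))     # unpenalized match
```

With these assumed settings, the same 100% segment is reported as a lower fuzzy match when it comes from the penalized client TM, while the match from your own TM keeps its full value. Two people analyzing the same file against the same TM with different penalty settings would therefore get different results.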
Then, there are filter settings. Usually, applying a filter means applying a penalty. But it could also be that certain segments do not appear at all, because the filter does not permit segments with different metadata, from a TM with a specific name or from a specific user.
Still another reason could be that the segmentation rules don’t contain all abbreviations. The period after an abbreviation that is missing from the rules is treated as the end of a sentence. This will result in two segments in the document where there might be just one segment in the TM (maybe the translator joined the segments during translation, creating one segment in the TM but not updating the segmentation rules).
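The following Python sketch illustrates the abbreviation problem with a deliberately simple segmenter. The `segment` function and its splitting rule are assumptions made for this example; real tools use far more elaborate segmentation rules.

```python
import re


def segment(text, abbreviations):
    """Split after '.', '!' or '?' followed by whitespace,
    unless the word before the break is a known abbreviation."""
    segments = []
    start = 0
    for match in re.finditer(r"[.!?]\s+", text):
        candidate = text[start:match.end()].strip()
        last_word = candidate.split()[-1].lower()
        if last_word in abbreviations:
            continue  # known abbreviation: do not break the segment here
        segments.append(candidate)
        start = match.end()
    tail = text[start:].strip()
    if tail:
        segments.append(tail)
    return segments


text = "Please send the form, incl. all attachments, by Friday. Thank you."

# Without the abbreviation, the first sentence is cut in two after "incl."
print(segment(text, set()))

# With the abbreviation in the rules, the first sentence stays in one piece.
print(segment(text, {"incl."}))
```

If the TM holds the first sentence as one joined segment but the rules lack "incl.", the document yields two shorter segments that can only match the TM entry partially, if at all.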
Yet another reason could be the use of different TM tools. Because the way match values are calculated differs from tool to tool, an 82% match in tool 1 can very well be an 80% match in tool 2 and an 85% match in tool 3. The match values can actually differ quite a lot, depending on what the segments contain in the way of tags etc.
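As a rough illustration of why the calculation method matters, the Python sketch below scores the same pair of segments in two ways with the standard library's `difflib.SequenceMatcher`: once character by character and once word by word. Neither is the formula of any real TM tool (those are generally not published), and the sample sentences are made up; the point is only that two plausible methods produce different values for the same segments.

```python
from difflib import SequenceMatcher


def char_score(source, tm_source):
    """Similarity in percent, comparing the segments character by character."""
    return round(SequenceMatcher(None, source, tm_source).ratio() * 100)


def word_score(source, tm_source):
    """Similarity in percent, comparing the segments word by word."""
    ratio = SequenceMatcher(None, source.split(), tm_source.split()).ratio()
    return round(ratio * 100)


new_segment = "This is a short sentence."
tm_segment = "This is a nice sentence."

print("character-based:", char_score(new_segment, tm_segment))
print("word-based:     ", word_score(new_segment, tm_segment))
```

Real tools additionally weigh tags, numbers, formatting and capitalization in their own ways, which widens the gap further.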
Here are some examples of differing match values:
Tool 1 shows 70%, tool 2 shows 89%.
The difference is one full word (short -> nice).
Tool 1 shows 95%, tool 2 shows 92%.
The differences are the number, the formatting, the capitalization and the spacing.
And to make it even more complex, the examples show that it is not necessarily the case that one tool always shows lower match values than the other 🙂