REGEX – the hidden language for our translation tools
Our tools offer a lot of functionality, but in many places a little knowledge of regex (regular expressions) lets you get much more out of them.
- You can create your own filters, i.e. determine what parts of a (text-based) document get imported for translation.
- You can convert text elements into tags, which is especially useful for placeholders like these {1}, ##NAME## or %sd.
- You can use regex to search for a pattern, like a web address, a date, a combination of number and measurement…
- You can use it to run a replace action (changing date formats or the order of elements, like 25% -> % 25).
- You can use regex in the QA checkers to find specific things, like numbers and measurement units that are not separated by a non-breaking space.
- You can use regex when specifying segmentation rules and segmentation exceptions.
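To make a few of these uses concrete, here is a minimal sketch in Python. The sample sentences, the variable names and the short list of units are made up for illustration, and I am assuming printf-style placeholders such as %s or %d. Your translation tool may use a slightly different regex flavor, so treat the patterns as a starting point rather than something to paste in unchanged.

```python
import re

# Hypothetical sample segment with three placeholder styles.
segment = "Hello ##NAME##, you have {1} new messages (%s)."

# One alternative per placeholder style: {1}, ##NAME##, and %s or %d.
placeholder = re.compile(r"\{\d+\}|##[A-Z]+##|%[sd]")
placeholders = placeholder.findall(segment)
print(placeholders)

# Replace action: capture the number and put it after the percent sign.
swapped = re.sub(r"(\d+)%", r"% \1", "A discount of 25% applies.")
print(swapped)

# QA check: a number followed by a unit with a normal space or no space at all.
# A non-breaking space (U+00A0) between the two does not match, so it passes.
# The unit list (kg, cm, mm) is only an example.
unit_check = re.compile(r"\d+ ?(?:kg|cm|mm)\b")
text = "Weight: 5 kg, length 10\u00a0cm, width 3mm."
problems = unit_check.findall(text)
print(problems)
```

The same three ideas (find a pattern, reorder with capture groups, flag what should not be there) carry over to filters and segmentation rules; only the place where you enter the expression changes.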
We all know that good preparation at the beginning of a project can save a lot of repair work (in all the target languages), and regex is definitely worth including in your preparation.
I often get asked where to find material for learning regex. Well, there are a lot of very good tutorials on the internet (just use the search words “regex” and “tutorial”). But none of them focuses on the needs of the translation industry (hence my course on Regex for Translation on L10Ntrain).
From my experience, the regex you need to know starts with these few expressions:
- Brackets and what they do: ( ) for grouping, [ ] for character ranges/lists and { , } for the minimum and maximum number of repetitions.
- Characters that have their own meaning in regex and need the backslash (escape character) before them when you want to search for the actual character:
- Dot (.), plus (+), asterisk (*), dollar ($), circumflex (^), backslash (\)
- Searching for whitespace in general (spaces, tabs, line breaks): \s
- Searching for digits: \d or [0-9]
- Searching for letters: [a-z], [A-Z], \p{Lu}…
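The expressions in this list can be tried out with a short Python sketch like the one below. The sample texts and variable names are invented for illustration. One caveat: Python's built-in re module does not support \p{Lu}, so the sketch uses [A-Z] instead; many translation tools build on regex flavors that do support it.

```python
import re

# ( ) groups parts of the match, [ ] lists allowed separators,
# { , } sets how many digits are allowed.
date = re.compile(r"(\d{1,2})[./-](\d{1,2})[./-](\d{4})")
match = date.search("Delivery on 3.11.2024 at the latest.")
date_parts = match.groups()
print(date_parts)

# Escaping: \$ and \. stand for the literal dollar sign and the literal dot.
prices = re.findall(r"\$\d+\.\d{2}", "Costs $19.99, not $1999.")
print(prices)

# \s matches any whitespace (space, tab, line break); + means "one or more".
cleaned = re.sub(r"\s+", " ", "too   many\tspaces")
print(cleaned)

# Letters: one uppercase letter followed by one or more lowercase letters.
names = re.findall(r"[A-Z][a-z]+", "Anna met Bob in Paris")
print(names)
```

Note that [a-z] and [A-Z] only cover unaccented Latin letters, which is why \p{Lu} and its relatives matter as soon as you work with other languages.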
Of course, there are many more and you can do wonderful things with regex, but these few can get you started quite quickly.
For an overview and more examples, check out the introductory course on Regular Expressions in Translation 🙂