All posts by Gary Lefman

Terminology work (5) – extraction

This series offers some insights from the many workshops and presentations on terminology that I have done over the years.

The terminology you are using appears in any written text, be it website pages, brochures, manuals, guidelines, contracts, reports…

Someone will have to read all that text, decide what is a term and extract (copy/paste) it. Tools can help, but make no mistake, the tools are not intelligent. Most terminology extraction tools work on a statistical basis: the more often a term appears, the more important it is assumed to be. This is not always the case. An important term might come up only twice, once in the heading and once in the first paragraph, and afterwards be referred to by its short form. In this case, most statistical tools would not extract the term, as it appears fewer than 5 times.
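To illustrate why a rarely repeated term slips through, here is a minimal Python sketch of purely frequency-based extraction. It assumes a tool that counts every sequence of up to four words and keeps only those seen at least five times; the function name `statistical_candidates` and both thresholds are illustrative, not the behavior of any particular product.

```python
import re
from collections import Counter

def statistical_candidates(text, min_freq=5, max_words=4):
    """Hypothetical frequency-based extractor: count every sequence of
    1..max_words words and keep those that occur at least min_freq times."""
    # Hyphenated compounds such as "multiple-output" count as one word
    words = re.findall(r"\w+(?:-\w+)*", text.lower())
    counts = Counter()
    for n in range(1, max_words + 1):
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return {cand: c for cand, c in counts.items() if c >= min_freq}

doc = ("Multiple-output generating device. The multiple-output generating device "
       "has a housing. Switch the device on. The device starts. "
       "Clean the device. Store the device.")

cands = statistical_candidates(doc)
print(cands)  # {'device': 6, 'the': 5}
```

The full term appears only twice and is dropped, while the short form and even the word "the" pass the threshold, which is why such a list always needs a human check.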

There are also linguistic extraction tools, but they are limited to the language pair they were built for and are not available for all language pairs. They can, however, be configured, for example, to extract noun phrases of up to 4 words, which are usually good candidates for a term list. Statistical tools will create a huge list of possible terms, which then needs to be checked for the real terms.

From my experience (mostly extractions from English and German technical and medical documents), there is a threshold above which extraction with a tool makes more sense. I found that for texts of up to 20,000 words, it does not really make a difference whether you read through the text sentence by sentence and select the terms manually, or run a statistical extraction tool and then go through the list and mark the terms you want to keep. Above that, extraction with a tool is faster.

Most translation tools have a component for term extraction that can be used both for monolingual material (usually in the source language) and for bilingual material, i.e. translation memories, bilingual files from the translation process or alignments of files.

To estimate how much terminology can be extracted, I usually calculate with about 20% of the terms in a list extracted by a tool, or with between 5 and 15% of the overall word count of the document(s), depending on whether they are more general or more technical in nature.
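The rule of thumb can be written as a quick calculation. This is a minimal sketch, assuming integer results are good enough for planning and reading "more general" as the 5% end and "more technical" as the 15% end of the range; the function names and the example figures (a 2,500-item candidate list, a 20,000-word document) are hypothetical.

```python
def estimate_from_tool_list(candidates, share_percent=20):
    """About 20% of a tool-extracted candidate list turns out to be real terms."""
    return candidates * share_percent // 100

def estimate_from_word_count(words, low_percent=5, high_percent=15):
    """Between 5% (more general texts) and 15% (more technical texts) of the word count."""
    return words * low_percent // 100, words * high_percent // 100

print(estimate_from_tool_list(2500))     # 500
print(estimate_from_word_count(20000))   # (1000, 3000)
```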

When extracting terms, make sure you have defined what kind of terms you are looking for (see part 3 of this series: Terminology work (3) – fundamental decisions).



(Trainer for translation tools since 1997)

Terminology work (4) – fundamental decisions about the user

This series offers some insights from the many workshops and presentations on terminology that I have done over the years.

Most terminology work starts life in Excel, which is a very good way to get started, but not something you would use for professional terminology management.

Usually, when you start to think of terminology work, you already have a goal in mind or a pain point that needs your attention.

  • Recurring questions from translators – you want to provide them with a term list or term base that can be used in the translation tool (for terminology recognition and terminology checking)
  • Support tickets because users misunderstand the product or process description
  • Company-internal effort to check translated documentation for the correct terminology
  • You want to provide the company terminology to all users in the company through the intranet
  • You want to provide terminology lists to the authors

Depending on the intended user group, the information associated with each term can be different. Whereas a translator needs to know the term, the translation, any forbidden alternatives and the product the term belongs to, other users in your company might need something more like a dictionary with information on gender, plural forms or context examples.

If you want to provide terminology for translation, ask your translation vendors what format a list should have. Maybe they already provide online access to their term base system and allow online collaboration on terminology.

If you want to provide terminology as a company dictionary through the intranet, talk to your webmaster about how a list can be brought online and, most importantly, how it can be updated periodically.

If you want to provide term lists for authors, ask them whether they use a term-checking tool in their authoring environment and what a term list would need to look like to be easily importable.

Each of these scenarios differs in how the term lists need to be set up and what kind of information (metadata for the term) needs to be added.
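One way to picture this is a single term entry with different views for different user groups. This is a minimal Python sketch, assuming a German-to-English entry; the class, the field names, the product name "Product X" and the forbidden alternative "Apparat" are all hypothetical and only mirror the information listed above for translators and for dictionary users.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class TermEntry:
    # Information a translator needs
    term: str
    translation: str
    product: str
    forbidden: List[str] = field(default_factory=list)
    # Extra information for dictionary-style use, e.g. on the intranet
    gender: Optional[str] = None
    plural: Optional[str] = None
    context: Optional[str] = None

# Which fields each user group gets to see (hypothetical selection)
TRANSLATOR_VIEW = ("term", "translation", "forbidden", "product")
DICTIONARY_VIEW = ("term", "gender", "plural", "context")

def view(entry, fields):
    """Return only the metadata a given user group needs."""
    return {name: getattr(entry, name) for name in fields}

entry = TermEntry(
    term="Gerät", translation="device", product="Product X",
    forbidden=["Apparat"], gender="neuter", plural="Geräte",
    context="Schalten Sie das Gerät aus.",
)
print(view(entry, TRANSLATOR_VIEW))
print(view(entry, DICTIONARY_VIEW))
```

Keeping one entry and exporting different views avoids maintaining separate lists for each user group.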




Terminology work (3) – fundamental decisions

This series offers some insights from the many workshops and presentations on terminology that I have done over the years.

Now let’s move on to more complex things – you need to decide what a term is.

If you take the view of a dictionary, a term is something that needs to be explained. But make no mistake: it is mostly the general words, the words that everybody knows and seems to understand, that generate most of the problems when creating or translating text. Words like cap, bolt, device etc. seem so general that you would not put them into a company dictionary, and therefore not into a terminology database either. But these are exactly the terms that will produce most of the questions and misunderstandings.

This is mostly because they are used as the short form of a longer term. Instead of talking about a “multiple-output generating device” in every second sentence, you would probably use it once or twice and then shorten it to “device”. Everyone who reads the text will see what you mean – but what if this text comes out of a content management system? A translator might get a small module to translate in which the long form of the term is not to be found – how should the translator know exactly which “device” the text is talking about?

In this case, a good terminology database can help a lot: one that lists “device” as the short form of one or several longer terms, explains what each of them is and states how it should be translated in different circumstances.

When deciding on what a term is in your special case, try these categories:

  • Everything that has to do with your company and differs from other companies.
  • Everything that is special to your products and where a term differentiates between you and your competitors although you are producing the same thing (keep the term of the competitor as a forbidden term).
  • Things that are special to the subject matter area you work in.
  • Things that need an explanation (don’t forget the everyday words here)
  • Abbreviated forms, acronyms, slogans, mission statements

And now we are at a point where terminology work can get messy and starts to grow uncontrollably.

In order to keep things manageable, limit the terminology collection to the source language and to one product (maybe the base product for others or the most used product). Once you have collected the most important terms here, you can move on to other products or other languages.



Terminology work (2) – how to get started (continued)

This series offers some insights from the many workshops and presentations on terminology that I have done over the years.

In addition to product names, company names and abbreviations, you probably have a lot of other lists with things that can be considered terminology.

How about…

  • Lists of products or product categories on your website?
  • Lists of trademarks and trade names, maybe even with a definition or explanation
  • Images in manuals or brochures with associated parts lists (maybe already bilingual or multilingual)
  • Lists of job titles (for e-mail signatures, business cards…) and job descriptions
  • Lists of acronyms (abbreviated forms in capital letters, like OSW or MF’s)
  • Glossaries on your website or in user/training manuals
  • Table of contents and index of larger documents

And once you have collected all the stuff that is used and should be used, don’t forget all the terms and expressions that should NOT be used…

  • Because they are used by a competitor
  • Because the term is outmoded/outdated
  • Because the term should not be used any longer after a merger of companies
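Once the forbidden terms are collected, they can feed a simple check for authors or translators. This is a minimal Python sketch, assuming a plain mapping from each forbidden term to its preferred replacement; the entries and the function `check_forbidden` are hypothetical, and real term-checking tools also handle inflected forms, which this sketch does not.

```python
import re

# Hypothetical list of terms NOT to be used, mapped to the preferred term
FORBIDDEN = {
    "gadget": "device",   # e.g. a competitor's term
    "widget": "unit",     # e.g. an outdated term
}

def check_forbidden(text, forbidden):
    """Return (position, found text, preferred term) for every forbidden term."""
    hits = []
    for bad, preferred in forbidden.items():
        # Whole-word, case-insensitive match
        pattern = r"\b" + re.escape(bad) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((m.start(), m.group(0), preferred))
    return sorted(hits)

result = check_forbidden("Switch the Gadget on. The widget starts.", FORBIDDEN)
print(result)  # [(11, 'Gadget', 'device'), (26, 'widget', 'unit')]
```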

Next, check the feedback on social media and ask people at the support hotline or in the legal department which terms or phrases have drawn comments, complaints or help requests – these are the terms that definitely need to be explained and defined and need to go into your term lists.



Terminology work (1) – how to get started

This series offers some insights from the many workshops and presentations on terminology that I have done over the years.

Don’t be afraid of terminology work!

Yes, there are many things you can do with terminology, but you can also start small and build upon it when time and resources permit.

How would you get started?

The most obvious thing is to collect what is already there.

A list of product and company names


  • Decide on the spelling that you want to use.
  • Decide if and when any of these names needs to be different in one of your target markets (for example in countries with different alphabets, or Asian countries that use characters rather than letters).
  • Make sure everybody in the company knows about this list and that translators have access to it as well.
Now build upon this list
  • Think about how product names are created in your company. Is there a pattern? Should there be a pattern?
  • How do you make sure that everybody uses the product names correctly? Are there checks for the source text authors and translators in place?
  • Do you discuss new product or company names with target language experts who can tell you if the proposed name might have any issues or unintended meanings in that language?
  • Make sure that whoever wants to change one of these names knows that they will have to shoulder the cost of changing it in all documents and all languages.


A list of abbreviations and their meanings

Everyone in the company will have a list or post-it or file that lists some of the company-specific abbreviations and their meaning.

  • Collect these lists
  • Reward the person who comes up with the longest list
Check the list
  • Make sure that each combination of abbreviation and long form is accurate.
  • Make sure everybody in the company knows about this list and that translators have access to it as well – they will be especially thankful, as this list can help the translation tools recognize where a sentence ends (i.e. NOT at the dot of an abbreviation).
Now build upon this list
  • Think about how abbreviations are created in your company. Is there a pattern? Should there be a pattern?
  • How do you make sure that everybody uses the abbreviations correctly? Are there checks for the source text authors and translators in place?
  • Make sure that whoever wants to change one of these abbreviations knows that they will have to shoulder the cost of changing it in all documents and all languages.
  • Talk to your translation vendors and create the list in such a way that it can be easily imported into the term base components of the translation tools.
  • See if the lists can also be used within content management or authoring tools to help the authors.
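Why translators appreciate the abbreviation list can be shown with a toy segmenter. This is a minimal Python sketch, assuming a naive rule "a sentence ends at a dot, exclamation mark or question mark followed by a space" plus an exception list; the function name and the sample abbreviations are hypothetical, and real translation tools use their own, more elaborate segmentation rules.

```python
import re

def split_sentences(text, abbreviations=frozenset()):
    """Split after '.', '!' or '?' followed by whitespace, except when the
    word just before the punctuation is a known abbreviation."""
    sentences, start = [], 0
    for m in re.finditer(r"[.!?](?=\s)", text):
        end = m.end()
        last_word = text[start:end].split()[-1]   # e.g. "approx."
        if last_word in abbreviations:
            continue  # the dot belongs to an abbreviation, not to the sentence end
        sentences.append(text[start:end].strip())
        start = end
    rest = text[start:].strip()
    if rest:
        sentences.append(rest)
    return sentences

text = "The unit weighs approx. 5 kilograms. Contact the QA dept. for details."
print(split_sentences(text))
# ['The unit weighs approx.', '5 kilograms.', 'Contact the QA dept.', 'for details.']
print(split_sentences(text, abbreviations={"approx.", "dept."}))
# ['The unit weighs approx. 5 kilograms.', 'Contact the QA dept. for details.']
```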


These things sound obvious, don’t they? But you would be surprised how often this is one of the last steps when people talk about terminology management.



Video topic setup (3) – testing and script


Here is another step before I actually go into recording a training video – the testing.

It happens quite frequently that I have this nice idea for a video, get started with my samples, play around with settings and features and then realize during testing that I am either missing a step or the sample sentences are not in the right sequence or I find a bug in the software and cannot use the sample I had planned to use or, or, or…

Testing is essential. This quality assurance – the steps you take so that quality can happen – is absolutely necessary. I guess everybody who has tried to “repair” a video by adding boxes over things that should not be visible, or to insert a screenshot into a video, knows what I mean.

Going through the steps that I want to show also helps me to create the text that I want to have in the video (either written or spoken or both). I collect the text in a table format so that I can associate a certain slide number or dialog in the software with the text that belongs to it. If I am working with slides, the text document also shows when I need to move on to the next animation or next slide altogether.



And, as tedious as it is, another QA step is to use the script to go through the feature you are showing again and again. If the topic is a very complex one and I need a lot of specific settings to show something, I make a list of the settings as well, for example a special database to load, a special checkbox to activate, the size of the windows I am showing etc.

But still, every time I actually do a recording, I find a typo in the script or a missing mark for starting an animation or I have to redo everything because this time I did not revert to the default settings I wanted to start out with. But there would probably be a lot more of those things if I had not done all the testing upfront.

Adding voice to a video can be done in several ways. You can record your voice while you click through the software or show the slides. Or you can make a video of the feature and add your voice later. For me, it does not really matter; the time I need to spend on it is about the same either way. I have now tried to work with a slightly different setup: slides and voice to explain things, and then small pieces of clicking through the software without voice-over.

I would be interested to hear what strategies other trainers are using.




Video topic setup (2) – sample files


After selecting your topic and creating a list of sub-topics and a structure for the video, there is one more step that can take quite some time – the creation of samples.

Very often, I know exactly what I want to show, but creating the right sample file for the purpose is not as easy as it looks.

First, it should be a sample in the right language – if I create a German video, I try to use German source text in my samples. For other languages, I tend to create separate sample documents instead of just translating existing ones. The obvious reason is that not everything that works in one language also works in another. A prominent example is the use of numbers followed by dots in German (Das 3. Treffen der 2. Gruppe…), which can be nicely used to show segmentation issues in a translation tool. The same example does not work in English (The third meeting of the second group…).

Next it should be simple, very very simple. The simpler the text the more people concentrate on the actual feature.

During my early training days, some 15 years back, I already used such very simple files. You know: “this is a test”, “this is another test” etc. Some people (translators) complained that these simple segments did not represent the kind of text they had to deal with, and they asked for more sophisticated texts. When I tried that in my next class, it turned out that the participants were now arguing over how to translate a sentence correctly instead of listening to me while I explained the feature. That is why I now use very short, simple sentences (and tell my audience why I do so, to avoid any discussion).

My motto is: One sentence – one feature. I found that it is easier to focus on the features of software if each sample only applies to one specific feature you want to show. So one segment for showing the term check, another for showing a certain match value, another for showing number substitution etc. People tend to get confused if the same sentence is used to show different features.

The hardest thing is the logical sequence of things to show. The sentences in the sample document do not need to make sense as a text, but the things you want to show will have a certain logic to them and therefore the sample sentences should follow that logic.

As our learning bits are quite small, so should the sample files be: short and easy to navigate. Put too much into one sample file and you keep jumping around in the file, losing your participants. The more you show them, the more they will try to find out what the other sentences are there for, which distracts them from the actual goal.




Video topic setup (1) – topic selection and structure


Having said in an earlier post that the smaller the learning unit gets the better the trainer needs to prepare such a topic, I would like to share with you how I approach the creation of a new video.

1. Select the topic

This sounds much easier than it actually is. The topic should be neither too big nor too small. You need to be able to cover every angle of it within a maximum of 15 minutes. Also, if this topic ties in with other topics, there needs to be a logical sequence, or each topic needs to be self-explanatory so that it can stand alone.

Example: Topic = XML in translation

Too big -> split up into: XML basics, XML filter creation, XML in tool A, XML in tool B, multilingual XML, specialized XML like XLIFF or TMX, details in XML (attributes, elements, conditions, entities)…

2. Decide on the sequence.

Decide on the general structure (for learning purposes, it is useful to have the same structural setup for every video). Here is what I came up with:

  • Introduction (what is it good for, when do we use it, what will you learn to do…)
  • Setting the stage (give the basic background information a user needs to understand the following explanations: examples of when such a feature would be used, describe the situation when this feature might come in handy…)
  • Technical groundwork (information on what the feature does, what kind of input it needs and what kind of output is to be expected)
  • The show (video or slides with screenshots of the process)
  • Conclusion (summarize the process, what goes in, what happens, what goes out, when is it useful)
  • But wait, there’s more (additional information on pitfalls, things to consider, mistakes that can be made, things this feature cannot do)

Example: This is roughly what I would do if the topic were “Analysis Statistics”.

Introduction
  • To create word counts and match statistics
  • Used for pricing in translation projects
  • Estimation of workload
Setting the stage
  • What does the statistic do (count words, compare source language sentences/segments)?
  • Where and when can you run the statistics?
  • What do the results look like?
Technical groundwork
  • What is a word and how do different tools count words?
  • What is matching and what different match values are there?
The show
  • Select files in a project and start the statistics feature
  • Go through the settings and explain what they mean
  • Create the statistics
  • Explain how to read the outcome
  • Export the statistics
Conclusion
  • What do you use the analysis for?
  • When do you run the statistics?
  • What can the statistics tell you and what not?
But wait, there’s more
  • What settings can influence the number of words or segments?
  • What other settings can influence the match values (penalties on TM segments or alignments…)
  • What if the statistics tell you there are more words in the file than there could be (pitfalls, mistakes that can happen…)?
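For the "What is a word" part, a tiny example makes the point that word counts depend on the counting rules. This is a minimal Python sketch, assuming two hypothetical rules (neither is taken from a specific tool): one treats every whitespace-separated chunk as a word, the other also splits at hyphens, slashes and punctuation.

```python
import re

def count_by_whitespace(text):
    # Hypothetical rule A: every whitespace-separated chunk is one word
    return len(text.split())

def count_by_alphanumeric_runs(text):
    # Hypothetical rule B: hyphens, slashes and punctuation also separate words
    return len(re.findall(r"[^\W_]+", text))

sentence = "Switch the multiple-output device on/off (approx. 5 s)."
print(count_by_whitespace(sentence))         # 8
print(count_by_alphanumeric_runs(sentence))  # 10
```

The same sentence yields two different word counts, which directly affects pricing and workload estimates.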

What do you think? Looking forward to your feedback.




Micro Learning / Learning with bits and pieces


Can you also feel it?

The attention span getting shorter and shorter?


We are so used to dealing with smaller and smaller bits and pieces of information that it seems to get harder and harder to concentrate on reading a text or watching a video for longer than, say, 5 minutes. I notice that myself when I sign up for one of the webinars that are so frequently offered on different tools and new functionalities. After a few minutes, and especially when the presenter does not come to the point but goes on and on with marketing-heavy slides, I tend to get frustrated, bored or angry, and sometimes I even leave the webinar.

Always with the thought in mind that I can go back and watch the recording any time I like.

I almost never do that. Or if I do, I tend to skip through the recording with the fast forward button.


What does that mean for the training industry? It means that we must cater to the needs of the users. That’s why I have tried to create 5-10 minute videos instead of a 1-hour recording.

But let me tell you, the smaller the information pieces get, the more focused you have to be as a trainer. It takes a lot of preparation. Would you believe that a 15-minute video on a specific topic can take up to 6 hours of preparation?

  • Deciding on the topic / coming up with a concept of what you want to show
  • Creating the sample files and settings (takes longer than you think)
  • Running through the process to see if it works or if you forgot a setting / sample file / topic to mention
  • Create a script of what you want to say
  • Create some slides that can be shown as introduction or an image to be used as cover for the video/course/lesson
  • Run through the whole thing again: showing slides and clicking through the tool while reading the script (I found that it can be easier to do these things one after the other, i.e. adding the voice after producing the video)
  • Do the recording
  • Go through the recording to eliminate any background noises or sections where you made a mistake.
  • Produce the final video
  • Upload, add a description

This was a full day of work for a 15-minute video.


I would love to hear how you create learning videos, how long it takes you, and how long videos can be while still being useful and actually watched all the way through.




How small can it get?



I have noticed that I tend to expect smaller and smaller pieces of information that focus on the exact aspect I am interested in. How about you? How much of a text on a website, in a newspaper or in an e-mail do you really read before skipping to the next paragraph?

The same thing happens in education. Instead of the 3-day training classes for Trados tools that I started with in 1997, I am now asked to cram all the important information on much more complex tools into one day – without exercises. What I also see happening is that companies expect their employees to be able to work efficiently with a tool after a ½-day introduction. No time for learning by doing, no time to make mistakes, no time to try out features and processes and adapt them to the way you work. Mostly, this means that people will not work with the tools at all (I see that because one year after the initial training, the clients ask me to do another training, as they have not had time to implement the tool yet…).

So we move more and more into self-paced online training, which is a good addition, but not the best way to go for everybody.

And make no mistake, the smaller the learning items get, the better the trainer who creates those learning bits has to be. They need to be able to focus on one particular feature without leaving out the background information that you need to understand the feature and to use it effectively. A video that only tells you to click A, then click B, without telling you why and what you need to consider before you click these options is not a training video – it is a video and audio representation of most online help content I have seen so far, which mostly is not as helpful as the name Online Help suggests 🙂

So, how small is still big enough to be useful?


From my own experience, I would say anything up to 5–8 minutes will be watched fully. With anything longer than that, people will start skipping parts or doing something else on the side.

What is your opinion on this? Looking forward to your input and experience.

