Archives: August 4, 2016

Terminology work (5) – extraction

This series offers some insights from the many workshops and presentations on terminology that I have done over the years.


The terminology you are using appears in any written text, be it website pages, brochures, manuals, guidelines, contracts, reports…

Someone will have to read all that text, decide what is a term and extract (copy/paste) it. Tools can help, but make no mistake, the tools are not intelligent. Most terminology extraction tools work on a statistical basis: the more often a term appears, the more important it is assumed to be. That is not always the case. An important term might come up only twice, once in the heading and once in the first paragraph, and afterwards be referred to by its short form. In this case, most statistical tools would not extract the term, as it appears fewer than 5 times.
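The effect of such a frequency cut-off can be illustrated with a minimal sketch. It is not how any particular extraction tool works; the function name `frequent_candidates`, the sample text and the cut-off of 5 are assumptions made for illustration only.

```python
import re
from collections import Counter


def frequent_candidates(text, max_words=3, min_count=5):
    """Count word sequences of 1..max_words words and keep the frequent ones.

    Hypothetical helper: real tools add stop-word filtering and much more.
    """
    # Hyphenated words such as "multiple-output" are kept as one token.
    words = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    counts = Counter()
    for n in range(1, max_words + 1):
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return {phrase: c for phrase, c in counts.items() if c >= min_count}


# Invented sample: the long form appears twice (heading and first sentence),
# afterwards only the short form "device" is used.
sample = (
    "Multiple-output generating device. "
    "The multiple-output generating device is switched on at the back. "
    + "Check the device. " * 5
)

found = frequent_candidates(sample)
print("device" in found)                             # short form passes the cut-off
print("multiple-output generating device" in found)  # long form does not
```

The resulting list also contains noise such as "the", which is why a statistical candidate list always needs a manual check.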

There are linguistic extraction tools, but they are limited to the language pair they were built for and are not available for all language pairs. At least they can be configured, for example, to extract noun phrases of up to 4 words, which are usually good candidates for a term list. Statistical tools will create a huge list of possible terms, but this list then needs to be checked for the real terms.

In my experience (mostly extractions from English and German technical and medical documents), there is a threshold above which extraction with a tool makes more sense. I found that for up to 20,000 words of text, it does not really make a difference whether you read through the text sentence by sentence and select the terms manually, or run a statistical extraction tool and then go through the list and mark the terms you want to keep. Above that, extraction with a tool is faster.

Most translation tools will have a component that allows the extraction of terms and can be used both for monolingual (usually source language) material and also for bilingual material, i.e. translation memories, bilingual files from the translation process or alignments of files.

To estimate how much terminology can be extracted, I usually calculate with about 20% of the terms in a list extracted by a tool, or between 5 and 15% of the overall word count of the document(s), depending on whether they are more general or more technical in nature.
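As a worked example of these rules of thumb, here is a small sketch. The function names and the input figures (3,000 candidates, 20,000 words) are assumptions chosen for illustration; only the percentages come from the text above.

```python
def estimate_from_candidates(candidate_count, keep_ratio=0.20):
    """Roughly 20% of a tool-generated candidate list turn out to be terms."""
    return round(candidate_count * keep_ratio)


def estimate_from_word_count(word_count, low=0.05, high=0.15):
    """Between 5% (general text) and 15% (technical text) of the word count."""
    return round(word_count * low), round(word_count * high)


print(estimate_from_candidates(3000))    # 600
print(estimate_from_word_count(20000))   # (1000, 3000)
```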

When extracting terms, make sure you have defined what kind of terms you are looking for (see part 3 of this series: Terminology work (3) – fundamental decisions).

 

Angelika

(Trainer for translation tools since 1997)


Terminology work (4) – fundamental decisions about the user


Most terminology work starts life in Excel – which is a very good way to get started, but not something you would use for professional terminology management.

Usually, when you start to think of terminology work, you already have a goal in mind or a pain point that needs your attention.

  • Recurring questions from translators – you want to provide them with a term list or term base that can be used in the translation tool (for terminology recognition and terminology checking)
  • Support tickets because users misunderstand the product or process description
  • Company-internal effort to check translated documentation for the correct terminology
  • You want to provide the company terminology to all users in the company through the intranet
  • You want to provide terminology lists to the authors

Depending on the intended user group, the information associated with each term can be different. Whereas a translator needs to know the term, the translation, any forbidden alternatives and the product the term belongs to, other users in your company might need something more like a dictionary with information on gender, plural forms or context examples.
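One way to picture this is a single term entry with different views per user group. The following is only a sketch: the class `TermEntry`, its field names and the sample German entry are assumptions, not the data model of any specific term base system.

```python
from dataclasses import dataclass, field


@dataclass
class TermEntry:
    # All field names are hypothetical; a real term base defines its own.
    term: str
    translation: str = ""
    forbidden: list = field(default_factory=list)
    product: str = ""
    gender: str = ""
    plural: str = ""
    context: str = ""


# Which metadata each user group gets to see (assumed selection).
TRANSLATOR_FIELDS = ("term", "translation", "forbidden", "product")
DICTIONARY_FIELDS = ("term", "gender", "plural", "context")


def view(entry, fields):
    """Return only the fields relevant for one user group."""
    return {name: getattr(entry, name) for name in fields}


# Invented sample entry.
entry = TermEntry(
    term="Gerät",
    translation="device",
    forbidden=["Apparat"],
    product="Example product",
    gender="n",
    plural="Geräte",
    context="Schalten Sie das Gerät ein.",
)

print(view(entry, TRANSLATOR_FIELDS))
print(view(entry, DICTIONARY_FIELDS))
```

Keeping all metadata in one entry and filtering per audience means the list only has to be maintained once.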

If you want to provide terminology for translation, ask your translation vendors what format a list should have. Maybe they already provide online access to their term base system and allow online collaboration on terminology.

If you want to provide terminology as a company dictionary through the intranet, talk to your webmaster about how a list can be brought online and, most importantly, how it can be updated periodically.

If you want to provide term lists for authors, ask them whether they are using a term checking tool in their authoring environment and what a term list would need to look like to be easily importable.

Each of these settings differs in the way the term lists need to be set up and in what kind of information (metadata for the term) needs to be added.

 

Angelika

(Trainer for translation tools since 1997)


Terminology work (3) – fundamental decisions


Now let’s move on to more complex things – you need to decide what a term is.

If you take the view of a dictionary, a term is something that needs to be explained. But make no mistake: it is mostly the general words, the words that everybody knows and seems to understand, that generate most of the problems when creating or translating text. Words like cap, bolt, device etc. seem to be so general that you would not put them into a company dictionary and therefore also not into a terminology database. But these are exactly the terms that will produce most of the questions and misunderstandings.

This is mostly because they are used as the short form of a longer term. Instead of talking about a “multiple-output generating device” in every second sentence, you would probably use it once or twice and then shorten it to “device”. Everyone who reads the text will see what you mean – but what if this text comes out of a content management system? A translator might get a small module to translate where the long form of the word is not to be found. How should the translator know exactly which “device” the text is talking about?

In this case, a good terminology database that lists the word “device” as the short form of one or several longer terms, and explains what it is and how it should be translated in different circumstances, can help a lot.
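A minimal sketch of such a short-form entry could look like this. The structure `SHORT_FORMS`, the helper `long_forms_for` and the second long form are invented for illustration; a real term base would hold this in its own fields.

```python
# Hypothetical mapping from a short form to the long terms it may stand for.
SHORT_FORMS = {
    "device": [
        {
            "long_form": "multiple-output generating device",
            "note": "Use the long form on first mention in each module.",
        },
        {
            # Invented second entry to show that one short form can be ambiguous.
            "long_form": "single-output generating device",
            "note": "Only used in older documentation.",
        },
    ],
}


def long_forms_for(word):
    """Return all long forms a short form can refer to (empty list if none)."""
    return [item["long_form"] for item in SHORT_FORMS.get(word.lower(), [])]


print(long_forms_for("Device"))
print(long_forms_for("bolt"))
```

A translator who only sees “device” in an isolated module can at least see which long forms are possible and ask a targeted question.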

When deciding on what a term is in your special case, try these categories:

  • Everything that has to do with your company and differs from other companies.
  • Everything that is special to your products and where a term differentiates between you and your competitors although you are producing the same thing (keep the term of the competitor as a forbidden term).
  • Things that are special to the subject matter area you work in.
  • Things that need an explanation (don’t forget the everyday words here)
  • Abbreviated forms, acronyms, slogans, mission statements

And now we are at a point where terminology work can get messy and starts to grow uncontrollably.

In order to keep things manageable, limit the terminology collection to the source language and to one product (maybe the base product for others or the most used product). Once you have collected the most important terms here, you can move on to other products or other languages.

Angelika

(Trainer for translation tools since 1997)


Terminology work (2) – how to get started (continued)


In addition to product names, company names and abbreviations, you probably have a lot of other lists with things that can be considered terminology.

How about…

  • Lists of products or product categories on your website?
  • Lists of trademarks, trade names and maybe even some definition or explanation with it
  • Images in manuals or brochures with associated parts lists (maybe already bilingual or multilingual)
  • Lists of job titles (for e-mail signatures, business cards…) and job descriptions
  • Lists of acronyms (abbreviated forms in capital letters, like OSW or MF’s)
  • Glossaries on your website or in user/training manuals
  • Table of contents and index of larger documents

And once you have collected all the stuff that is used and should be used, don’t forget all the terms and expressions that should NOT be used…

  • Because they are used by a competitor
  • Because the term is outmoded/outdated
  • Because the term should not be used any longer after a merger of companies
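A list of terms that should not be used becomes most useful when text can be checked against it. The following is a rough sketch under simple assumptions: the dictionary `FORBIDDEN`, its entries and the function `find_forbidden` are invented, and real term checkers also handle inflected forms.

```python
import re

# Hypothetical forbidden terms mapped to the preferred replacement.
FORBIDDEN = {
    "gadget": "device",
    "old brand": "new brand",
}


def find_forbidden(text, forbidden=FORBIDDEN):
    """Return (position, found term, preferred term) for every forbidden hit."""
    hits = []
    for term, preferred in forbidden.items():
        # Whole-word match, ignoring upper and lower case.
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((match.start(), match.group(0), preferred))
    return sorted(hits)


print(find_forbidden("Switch on the Gadget before use."))
# [(14, 'Gadget', 'device')]
```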

Next, check the feedback on social media, and ask people on the support hotline or in the legal department which terms or phrases have drawn comments, complaints or help requests – these are the terms that definitely need to be explained and defined and need to go into your term lists.

Angelika

(Trainer for translation tools since 1997)


Terminology work (1) – how to get started


Don’t be afraid of terminology work!

Yes, there are many things you can do with terminology, but you can also start small and build upon it when time and resources permit.

How would you get started?

The most obvious thing is to collect what is already there.

A list of product and company names

 

  • Decide on the spelling that you want to use.
  • Decide if and when any of these names needs to be different in one of your target markets (for example in countries with different alphabets or Asian countries that use characters rather than letters).
  • Make sure everybody in the company knows about this list and that translators have access to it as well.

Now build upon this list

  • Think about how product names are created in your company. Is there a pattern? Should there be a pattern?
  • How do you make sure that everybody uses the product names correctly? Are there checks for the source text authors and translators in place?
  • Do you discuss new product or company names with target language experts who can tell you if the proposed name might have any issues or unintended meanings in that language?
  • Make sure that whoever wants to change one of these names knows that they will have to shoulder the cost of changing it in all documents and all languages.

 

A list of abbreviations and their meanings

Everyone in the company will have a list or post-it or file that lists some of the company-specific abbreviations and their meaning.

  • Collect these lists
  • Reward the person who comes up with the longest list

Check the list

  • Make sure that each abbreviation is paired with the correct long form.
  • Make sure everybody in the company knows about this list and that translators have access to it as well – they will be especially thankful as this list can help the translation tools to recognize better where a sentence ends (i.e. NOT at the dot of an abbreviation).

Now build upon this list

  • Think about how abbreviations are created in your company. Is there a pattern? Should there be a pattern?
  • How do you make sure that everybody uses the abbreviations correctly? Are there checks for the source text authors and translators in place?
  • Make sure that whoever wants to change one of these abbreviations knows that they will have to shoulder the cost of changing it in all documents and all languages.
  • Talk to your translation vendors and create the list in such a way that it can be easily imported into the term base components of the translation tools.
  • See if the lists can also be used within content management or authoring tools to help the authors.
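Why an abbreviation list helps with recognizing sentence ends can be shown with a simplified sketch. This is not how any particular translation tool segments text; the set `ABBREVIATIONS`, the function `split_sentences` and the sample sentence are assumptions for illustration.

```python
# Hypothetical abbreviation list, stored in lower case.
ABBREVIATIONS = {"e.g.", "i.e.", "approx.", "no."}


def split_sentences(text, abbreviations=ABBREVIATIONS):
    """Split at '.', '!' or '?' unless the word is a known abbreviation."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        ends_sentence = token.endswith((".", "!", "?"))
        if ends_sentence and token.lower() not in abbreviations:
            sentences.append(" ".join(current))
            current = []
    if current:
        # Keep trailing text that has no closing punctuation.
        sentences.append(" ".join(current))
    return sentences


text = "Tighten the bolt to approx. 5 Nm. Then close the cap."

print(split_sentences(text))
# ['Tighten the bolt to approx. 5 Nm.', 'Then close the cap.']

print(split_sentences(text, abbreviations=set()))
# ['Tighten the bolt to approx.', '5 Nm.', 'Then close the cap.']
```

Without the list, the first sentence is cut in two at the dot of "approx.", which is exactly the kind of wrong segment that ends up in a translation memory.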

 

These things sound obvious, don’t they? But you would be surprised how often this is one of the last steps when people talk about terminology management.

Angelika

(Trainer for translation tools since 1997)