Tool Box Logo

 A computer journal for translation professionals

Issue 16-6-262
(the two hundred sixty-second edition)  


1. Neural Machine Translation (Premium Edition)

2. Little Bit of Lilt

3. Useful Numbers?

4. Hieronymic Oath

5. New Password for the Tool Box Archive

The Last Word on the Tool Box

Biggest Pet Peeve (Cringe Alert)

There it was: the first of many language "mistakes" in this introduction (a "pet peeve" is by definition the biggest). Now that the figurative use of "literally" finally made it's way into the Oxford English Dictionary, many used in here will surely follow. And I could care less! One of the many rather unique properties of English -- or any language for that matter -- is it's constant development, irregardless of whether we like it.  Between you and I: its better then if the modern world had no affect on language; standing in developments way is a mute point. After all, that's the opinion of most of my translator friends, with who I share my passion for language. And they should know because its they're job.

I did almost loose my mind the other day, though, when someone said, "You did good!" If I was not as good-mannered as I were, I would have lied down the law for that guy. What was he thinking? Because, if anything, I did very good.

Enjoy your summer!


Introducing Translation Technology Insights Research Survey 2016

SDL has conducted a groundbreaking study revealing a variety of statistics on the use of technology within the translation industry.

The study gathered the opinions of individuals and organizations on current trends and provides suggestions as to what the future holds.

Download the free eBook to discover some interesting and often surprising results. 


1. Neural Machine Translation  (Premium Edition)

There has been much in the news lately about the next wave of MT technology driven by something called deep learning neural nets (DNN). With the help of some folks who understand more about it than I do (see below), I will attempt to provide a brief overview about what this is. I need to confess first that I will be trying to explain something here that I don't fully understand myself. Still, I hope that my research has helped me comprehend and communicate some basic underlying principles.

Most lectures you've listened to about machine translation in the past few years have likely included a statement like this: "There are basically two different kinds of machine translation -- rules-based and statistical MT -- and a third that combines the two -- hybrid machine translation." You then probably heard that rules-based machine translation was the earliest form of machine translation in the computer age, going back all the way to the 1950s and consisting of a set of rules about source and target language as well as a dictionary. The transfer between source and target language in rules-based machine translation happens either via an "interlingua," a computerized representation of the source text, or directly between source and target language.

Statistical machine translation, on the other hand, became all the rage in the early 2000s. (The first commercial offering, LanguageWeaver [now owned by SDL], was launched in 2002; the widely used open-source engine Moses emerged in 2005; Google and Microsoft switched to statistical MT in 2007; and Yandex and Baidu started using SMT as recently as 2011.) Statistical machine translation, or more accurately for all of these implementations, "phrase-based statistical machine translation," is trained on bilingual data and monolingual data. It parses the data into "n-grams," phrases consisting of an "n" number of words. The same thing happens to the source segment in the translation process. The source n-grams are then matched with target n-grams, which are then combined to form whole segments again -- and that's often where things go awry. (This is why SMT can prove to be a much richer resource when using an approach that just looks for fragments rather than whole segments.) Another potential flaw with SMT is the faulty selection of which of the many possible target n-grams should be used. One way to guard against bad choices is by validating on the basis of the monolingual target data that the system was trained with, but that only goes so far. (And, by the way, that's why an approach that offers access to more than just one of those n-gram fragments at a time within a translation environment tool has to be one of the up-and-coming developments.)
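To make the n-gram idea concrete, here is a toy sketch (my own illustration, not any real SMT system): text is sliced into contiguous n-word phrases, and each source phrase is looked up in a "phrase table" of candidate target phrases with probabilities learned from bilingual data. The phrase table below is hand-made and purely hypothetical.

```python
def ngrams(tokens, n):
    """Return all contiguous n-word phrases from a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Hypothetical hand-made phrase table: source n-gram -> list of
# (target n-gram, probability) pairs, as if learned from bilingual data.
phrase_table = {
    "the pedestrian": [("den Fußgänger", 0.8), ("der Fußgänger", 0.2)],
    "run over": [("überfahren", 0.7), ("umfahren", 0.3)],
}

tokens = "I run over the pedestrian".split()
for phrase in ngrams(tokens, 2):
    if phrase in phrase_table:
        # Pick the most probable target phrase -- the step where,
        # as described above, the faulty selection can happen.
        best = max(phrase_table[phrase], key=lambda t: t[1])
        print(phrase, "->", best[0])
```

A real decoder would also score how the chosen target phrases fit together using the monolingual target data (a language model), which is the validation step mentioned above.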

Neural machine translation (NMT) -- and let's pause and be thankful that one of this technology's first proposed terms, "recursive hetero-associative memories for translation" (by Mikel L. Forcada et al. in 1997), did not survive -- is an extremely computing-power-heavy process (which is why it didn't go anywhere in 1997), and is part of the larger field of "machine learning." According to one of the field's pioneers, Arthur Samuel, machine learning is the "field of study that gives computers the ability to learn without being explicitly programmed" (1959).

In SMT, the focus is on translated phrases that the computer is taught, which are then reused and fitted together according to statistics; NMT, on the other hand, uses neural networks that consist of many nodes (conceptually modeled after the human brain) which relate to each other and can hold single words, phrases, or any other segment. These nodes build relationships with each other based on bilingual texts with which you train the system. Because of these manifold and detailed relationships, it's possible to look not just at limited n-grams as in SMT, but at whole segments or even beyond individual segments, allowing the formation of significantly more educated guesses about the context and therefore the meaning of any word in a segment that needs to be translated. For instance, it's at least theoretically unlikely that an NMT system would translate "Prince" as a (royal) prince in a sentence like "The music world mourns the death of Prince," as Google, Microsoft, Yandex, and Baidu all do at the moment (and, by the way, I'm mourning as well).
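The "Prince" example can be sketched in miniature. The vectors below are made up for illustration (a real NMT system learns high-dimensional representations from training data), but the principle is the same: represent the whole segment's context as a vector and pick the word sense that lies closest to it.

```python
import math

# Hypothetical 3-dimensional "embeddings": (royalty, music, grief).
# Real systems learn hundreds of dimensions; these are invented.
sense_vectors = {
    "Prince (royal)": (1.0, 0.1, 0.1),
    "Prince (musician)": (0.1, 1.0, 0.3),
}
word_vectors = {
    "music": (0.0, 1.0, 0.0),
    "world": (0.2, 0.2, 0.1),
    "mourns": (0.1, 0.1, 1.0),
    "death": (0.1, 0.1, 0.9),
}

def cosine(a, b):
    """Cosine similarity: how closely two vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Average the context vectors of the whole segment...
context = ["music", "world", "mourns", "death"]
avg = tuple(sum(word_vectors[w][i] for w in context) / len(context) for i in range(3))

# ...and pick the sense of "Prince" closest to that context.
best = max(sense_vectors, key=lambda s: cosine(sense_vectors[s], avg))
print(best)  # the musician reading wins for this segment
```

Because the context is taken from the entire segment rather than from a short n-gram window, the musician reading beats the royal one here.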

In languages like German with separable verbs such as umfahren ("run over"), there is a much greater likelihood that the system will notice the missing part of the verb at the end of the sentence if the machine does not have to bother with chopping it into n-grams first. Take, for example, the simple sentence "Ich fahre den Fußgänger um" -- "I run over the pedestrian." Bing translates it (today) as "I'm going to the pedestrian" and Google as "I drive around the pedestrian"; only Yandex gets it right (Baidu does not offer this language combination).
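The separable-verb problem is easy to demonstrate. In this toy sketch (my own illustration), the sentence is chopped into trigram windows, and no single window ever contains both "fahre" and its separated prefix "um" -- which is exactly why a phrase-based system struggles with this sentence.

```python
def ngrams(tokens, n):
    """Return all contiguous n-token windows from a token list."""
    return [tokens[i:i + n] for i in range(len(tokens) - n + 1)]

tokens = "Ich fahre den Fußgänger um".split()
trigrams = ngrams(tokens, 3)
print(trigrams)

# No trigram window sees "fahre" together with its prefix "um",
# so the n-gram view cannot recognize the verb "umfahren" (run over).
together = any("fahre" in g and "um" in g for g in trigrams)
print(together)  # False
```

An NMT system that reads the whole segment at once has at least a chance of connecting the two halves of the verb.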

Machine learning (itself a subfield of artificial intelligence) additionally comes into play as common usage gradually forges certain linguistic connections ("music world" and "Prince"; "fahren" and "um"), so the computer continues to "learn" without explicitly being programmed, just as Samuel predicted.

At least theoretically, therefore, the NMT approach is very promising for generic engines like those of the search engines mentioned above (Google, Microsoft, Yandex, Baidu) because "context" does not necessarily have to be specified by the training data but can be recognized by the system evaluating the context (provided that the user supplies more than just a word or single phrase). So you won't be surprised to hear that all those companies have already entered the realm of NMT. Naturally they don't reveal how much of their present system is "neural" vs. "statistical only," but chances are it's a mix of both. And that would make all the more sense since one of the ways to use NMT is in combination with SMT -- either as a quasi-independent verification process or as an integrated process that helps in selecting the "right" n-grams.

In some areas similar processes have already demonstrated remarkable success -- including some that are used by the search engines. Remember when I mentioned how to use the search on Google Image for translation tasks? That is one of the areas where neural network processes have proven to be quite successful.

You probably read that Facebook launched its own machine translation system earlier this year, specifically geared toward the very casual language of its users. While that system is still mostly SMT-based, they're working on an NMT solution as well. You might want to take a look at a talk by Facebook's Alan Packer (formerly of Microsoft) right here.

One misconception in his talk is his description of all this as a linear development. He paints SMT as more or less having run its course, now to be taken over by NMT. While I understand that someone so deeply embedded in one particular field must automatically think it the only worthwhile one, it's really unlikely to be the case. The same was said in the early days by proponents of SMT about RbMT (rules-based MT), and that assumption has not proven to be true. Many systems are using a hybrid approach between SMT and RbMT, and for some language combinations RbMT might still be a better solution (especially for language pairs that are very close to each other, like Catalan and Spanish or Croatian and Serbian).

But are we on the verge of a big new breakthrough overall? To answer that, you might want to look through this joint presentation by Tauyou and Prompsit. Since there is no open-source toolkit for NMT like Moses for SMT, very few companies actually offer customized NMT systems. There are components like the deep learning frameworks Theano and Torch and specific NMT software like GroundHog and seq2seq, but these are anything but user-friendly and require significant expertise. Using them to build an NMT engine takes a lot of computing power (10 CPUs or 1 GPU -- graphics processing unit) and time (about two weeks of training per language pair once the training data is assembled and cleaned). Tauyou and Prompsit are among the first vendors working on commercial versions of NMT (interestingly, Tauyou comes from an SMT background and Prompsit from an RbMT background). While they are not actively selling the NMT solutions yet, they are doing a lot of pilots, as you can see from the presentation. And the results of these pilots are mixed.

I already mentioned the much larger processing and time requirements. There are also limitations on the number of words per language that can be trained with the processing power currently available to mere mortals (as opposed to companies like Google), the roughly threefold time the system takes to actually translate, and the fact that retraining the system with new data would once again take two weeks. But there are some improvements in quality -- although, according to the presentation, these are not adequately appreciated by translators (which I assume has to do with even less predictability when it comes to post-editing -- and presumably even more erratic decisions when it comes to partial suggestions). However, this is still very early in the game, so I wouldn't be surprised to see the quality continue to improve.

Do we need to start shivering in fear when we hear folks talking about neural machine translation? Although I don't completely understand the technology, I (and now you) have seen numbers showing only moderate progress. So, no, we'll continue to be very assured of our jobs for a long time. I do look forward, though, to seeing how NMT will creatively find its way into our translation environment and improve our work.

Thanks very much to Kirti Vashee (formerly of Asia Online but now very eager to consult on any kind of machine translation implementation), Jay Marciano (who will not only present his "regular" talk on demystifying MT at this year's ATA but also one on what the increased use of artificial intelligence means for translators over the next several years), and Gema Ramírez Sánchez and Sergio Ortiz Rojas of Prompsit.



Your June memoQ Build is Out:
More Speed and Even More Quality

New Build - New Features: the most comprehensive CAT system in the market offers you more in June!


2. Little Bit of Lilt

I was thinking to myself: How can I write something about Lilt that would finally communicate why it's so different? And why would I want to do that? Because I think Lilt's functionality hasn't yet been communicated well enough to be widely understood. I've looked at a number of articles and blog posts about Lilt that talk about it positively but miss the essential point. That point? Lilt represents a complete paradigm change. Pretty much nothing that we've assumed to be unshakable about how translation environment tools work holds true in Lilt (like needing to have a separate translation memory, termbase, or machine translation engine, or having to take care of formatting -- through tags or otherwise -- while translating), and given its astonishing productivity results in a recent case study, there is far too little noise about this.

So rather than writing about it again, I thought it might just be easier to record myself narrating the process as I translate in it. (For those who have never heard me speak, you might finally discover that the fine English in the Tool Box Journal is all thanks to my good editor, who unfortunately could do very little to change my rambling in the video. Hats off to all interpreters -- there's a reason why I don't talk while I translate!)

There are just a couple items that should be mentioned separately: Lilt now also supports Danish and Swedish as well as Simplified Chinese, so the available language combinations are now English <> Danish, Dutch, French, German, Italian, Portuguese, Spanish, and Swedish, and English > Simplified Chinese. And it now supports virtually the whole range of formats you'd expect, including Trados SDLPPX packages.

Lilt video




Fresh on Memsource's blog:


3. Useful Numbers?

About three years ago, I reported on some data that David Canek from Memsource gave me regarding the use of translation resources. You can read the article right here.

The most important numbers were these:

  • 71% used the TM feature
  • 38% used the termbase feature
  • 46% used machine translation

Now, three years later, these numbers have changed quite drastically.

Machine translation is used by only 31% (vs. 46%), TM by 85% (vs. 71%), and the termbase feature by 50% (vs. 38%).

I was immediately struck by the rise of termbase data use (and, yes, I felt very encouraged), but the decreased use of machine translation is also very interesting.

How is that possible?

First of all, these numbers have to be viewed with a degree of caution. We are talking about just one (kind of) tool with a specific kind of user group. In addition, the usage of Memsource has risen from about 100 million words per month to between 600 and 800 million a month. But knowing all that could also give us some good clues.

So while I don't have a final answer, here is what I think/hope: Overall MT use might actually be rising, but I think that's mostly due to translators' increased use as an additional resource in the form of auto-suggest-like scenarios (and there of course are language combination-specific differences in how much this applies). For LSPs and translation agencies (the primary license owners for Memsource and therefore typically those who determine whether translators have access to MTs), MT usage might actually be on the decline. There were quite a few early LSP starters who may have eventually realized that it's a lot harder to do machine translation right and profitably than they first thought. This may be particularly true for small and medium-sized companies. This leaves fewer, larger, and technologically more sophisticated and experienced agencies still using MT. Again, I could be wrong about my conclusions, but I bet I'm not completely off.



MateCat. More matches than any other CAT.

Translate in the cloud, faster than with any other CAT tool.
Supporting over 60 file formats and now also Google Drive files.


4. Hieronymic Oath

I was inspired last week to read about the (now retired) University of Helsinki's Andrew Chesterman's proposal for a "Hieronymic Oath" that would be comparable to a Hippocratic Oath for translators and "would help to distinguish between professionals and amateurs, and promote professionalization." I have to say I really like it.

It now feels doubly good every morning to wink at the little Jerome statue next to my door as I walk into my office. 

jerome statue



Leave the office 20 minutes earlier today!

Using MindReader for Outlook, you can write e-mails more quickly and more consistently.

Get your free trial license at STAR Group webshop 


5. New Password for the Tool Box Archive

As a subscriber to the Premium version of this journal you have access to an archive of Premium journals going back to 2007.

. . . you can find the access link and password in the premium edition. An annual subscription to the premium edition costs just $25. Or you can purchase the new edition of the Translator's Tool Box ebook and receive an annual subscription for free.


The Last Word on the Tool Box Journal

If you would like to promote this journal by placing a link on your website, I will in turn mention your website in a future edition of the Tool Box Journal. Just paste the code you find here into the HTML code of your webpage, and the little icon shown on that page will be displayed with a link to my website.

If you are subscribed to this journal with more than one email address, it would be great if you could unsubscribe redundant addresses through the links Constant Contact offers below.

Should you be interested in reprinting one of the articles in this journal for promotional purposes, please contact me for information about pricing.

© 2016 International Writers' Group    

