You can view earlier editions of the Tool Box Journal going all the way back to 2007
in the archives, to which you have access if you support my work on the Journal.



 A computer journal for translation professionals


Issue 19-4-299
(the two hundred ninety-ninth edition)

Contents

1. "Neural Machine Translation Is Not Translation"

2. Translation Commons

3. This 'n' That

4. Using Neural Machine Translation Beyond Post-Editing -- A Conversation (Part 2)

The Last Word on the Tool Box

Journalism and Translation

My name is Jost and I'm a bit of a news junkie. I'm not particularly proud of it, and I certainly don't think it's something to emulate. I am very happy that two of my three adult kids at least follow the news -- which I think is important -- but I hope they won't develop the same obsession that I have.

Anyway...

During the last couple of years, I have been (mostly) in awe at how journalism has blossomed. In a way, it's crazy to say that. At least here in the U.S., journalism has never been more criticized, and print journalism especially has struggled -- and often failed -- to stay afloat. But those media that have been successful have often produced amazing stories. Does anyone remember the long piece in the New York Times in 2015 on nail salons in New York? To me this heralded the movement toward (even more) amazing journalism. (And, yes, the fact that the article was translated into three other languages besides English so it would be accessible to the workers in the nail salons, something the NYT in particular has often done since, certainly made it near and dear to my heart as well.)

Journalism has also increasingly flourished with the use of multimedia tools, even in venues not traditionally known for their cachet. Check out this piece about the US-Mexican border by The Arizona Republic and USA Today, or this one about the heroin epidemic by the Cincinnati Enquirer, both of which were awarded a Pulitzer Prize for their reporting this week.

The point is this: The highest levels of journalism (here in the U.S., but I'm sure elsewhere as well) have responded to the threats they face day-in and day-out not with resignation but with new heights of excellence, not only in (multilingual) content but also in innovative and creative ways never used before. I find this incredibly encouraging.

Translation and interpreting are not facing the same threat as journalism -- not yet, and likely not for a long time to come. But if they ever do, I hope we remember these lessons. (And there's no reason not to ramp up our excellence and product offering even at this point.)

Have a blessed Passover and Easter.

ADVERTISEMENT

Translating subtitles is now easy with SDL Trados Studio

Introducing our new Studio Subtitling app. This app will make translating subtitles easier than ever, whilst enabling you to utilize the QA checks, termbases and translation memories within SDL Trados Studio.

Join our live webinar on 29 April to see the Studio Subtitling app in action.  

Sign up here >> 

 

1. "Neural Machine Translation Is Not Translation"

Mikel L. Forcada, who teaches at the Universitat d'Alacant, co-founded and works for Prompsit, and has been the president of the European Association for Machine Translation for the last four years (and an all-around nice man), recently said this in a comment for a blog post:

"Neural machine translation is sometimes intelligible and fluent, but not really a translation of the source. Post-editing is simply editing the 'machine translation' of the source so that it is a 'translation' of the source. Therefore, post-editors have to be translators, not just 'linguists.' Intelligibility is just one ingredient in the mix."

This is such an interesting statement from someone who is deeply involved in machine translation.

Let's pick this apart a little bit. The one comment I differ with somewhat is calling the interaction of translators with MT "post-editing." I still maintain that this is only one of the possibilities, and often not the most effective. But let's not focus on that for now (especially because Mikel agrees). Let's instead look at two other things that he is saying.

"Neural machine translation is sometimes intelligible and fluent, but not really a translation of the source."

Coming from a translator this would not be worth much attention, but coming from one of the leading MT experts in Europe, this is really meaningful. Mind you, he is not making a judgment here about the quality of the output of neural machine translation per se. He is talking about the nature of the output and its generation. And the one thing it is not is translation. It can be similar or even identical to translation, but because of how it is generated and what might be missing, it's not a translation.

The other statement that is just as important -- maybe more so, in fact -- is that translators are needed to fix the outcome. I used that quote for a presentation I gave in Forlì recently, and someone from the audience quoted me on Twitter, which in turn spawned some rather critical comments. The fact is that with statistical machine translation, many thought that translators were not needed for a first pass of post-editing. Massive errors were relatively easy to spot, and it was often thought to be cheaper to have those marked by an underpaid bi- or even monolingual reader and then sent off to a translator rather than have the translator do the whole text.

With neural machine translation, this is not the case. As Mikel correctly asserts, NMT tends to be fluent, no matter whether it's correct or not. This makes it harder for a translator to spot errors, and virtually impossible for a non-translator (who can't evaluate the source adequately). This in turn means that it's (once again) us, and only us, who are needed, and who therefore can have a say about the way machine translation is being used.

And speaking of machine translation in Europe, the MT Summit in August in Dublin has a promising-looking translator track. 

 

ADVERTISEMENT

Now in Beta: Memsource Translate - Unlock the Full Potential of Machine Translation

Use the best-performing machine translation engine for your content, easily identify high-quality machine translation suggestions, increase translator productivity, and reduce costs.  

Try it now.

 

2. Translation Commons

Translation Commons is a very ambitious project, spearheaded by Jeannette Stewart. In Jeannette's own words:

"Translation Commons is a nonprofit self-managed volunteer online platform offering free access to translation tools and resources for everyone. It is a one-stop platform for all information and resources relating to language translation and interpreting. It facilitates the creation of volunteer working groups and helps language students and graduates get mentors and enter the marketplace."

You can already find a number of translation tools and management tools there, as well as training modules, such as "eCoLo" (electronic Content Localisation) (formerly "eCoLoRe" or "eContent Localization Resources for Translator Training"), "a set of shareable resources for translation training" that was originally developed in the early 2000s with EU funding.

As Jeannette mentions in the quote above, this is also volunteer-driven, and accordingly there are a number of activities in which you -- or any translation professional -- can engage. Most of the activities occur within the framework of interest groups. Two that I find particularly interesting are the Certifications group and the Professors and Lecturers group.

The Certifications group is vaguely based on an idea I had a few years ago and then donated to Translation Commons: a place to highlight the importance of certifications and degrees in our field, allowing you to distinguish yourself so clients can find you on the site. The first deliverables of this group will likely be listings of certifications and degrees, but the possibilities are endless.

The Professors group comprises 45 or so members from universities and colleges around the world. Maybe it's because they don't have other platforms to talk to each other, but things are hopping with these folks, including exciting plans in relation to the International Year of Indigenous Languages. (More on the IYIL in the next Tool Box Journal.)

This is how they describe themselves:

"The Professors and Lecturers Group is a space where instructors from different universities all over the world can collaborate and share resources in order to achieve an excellent quality of education in localization, translation, and interpretation. This group also provides a bridge between academia and the professional world, by helping educators promote the profession of translation, interpreting and localization, creating contacts for possible internships, exchanges, and opportunities for students to get hands on training."

So, all this to say, if you want to step out of your routine, meet some new people, and advance your own career in the process, one of the many facets of Translation Commons is a good place to do just that.

 

ADVERTISEMENT

Meet and learn from the best at memoQfest 2019

Soon it will be that time of year again: memoQfest will return on 29 - 31 May 2019 to Budapest, Hungary.

  • Meet Jost Zetzsche. Yes, Jost will be one of the keynote speakers at memoQfest!
  • Stay ahead. Be among the first to learn about the newest industry trends, developments, and technologies from the best professionals worldwide.
  • Got difficult questions? Get detailed answers at the conference!

Register now!  

 

3. This 'n' That

Recently I talked with Michael Farrell, the maker of über-terminology-search-tool IntelliWebSearch, and asked him when the new and promised version of his tool would be released. Sort of in the nudge-nudge-I-guess-you-didn't-live-up-to-your-promises kind of way.

Michael just fixed me with a slightly puzzled look and asked whether I hadn't seen the release of version 5.1 last December.

Oops, my bad!... But then he mentioned that he actually released it on December 24, and I felt somewhat better -- I had a few other things on my mind on that and the following days.

So what's this all about? With the first incarnation of the new version of IATE, the massive and much-used EU terminology database, it looked like the only way to use a search tool like IntelliWebSearch would be through IATE's application programming interface (API) rather than relying on search parameters in the URL (the Internet address). After hearing plenty of complaints about that, the powers-that-be at IATE quickly changed course and enabled the old search capabilities so tools like IWS would again work in connection with IATE (proving that large institutions can react promptly!).

However, since Michael had already started to work on enabling API-based searches, he continued and released the version mentioned above.

It comes with four different pre-configured APIs (DeepL, Microsoft Translator, and the neural and statistical engines for Google Translate), but you can also create your own.

IWS API

And why, you might ask, should you use these paid API versions rather than the free web-based versions? The late Aretha Franklin would have said C-O-N-F-I-D-E-N-T-I-A-L-I-T-Y (perhaps a few too many letters for the tune; if you're singing along, try it this way: CON-FI-DEN-TI-A-LI-TY!).

As has been said many times before, only by using the API-based search do you have the assurance from these three companies that they will not use your data elsewhere.
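For readers curious what "API-based search" means in practice, here is a minimal sketch in Python. It only assembles the request URL a tool like IWS would send for one lookup, using DeepL's v2 REST endpoint as the example; the endpoint and parameter names follow DeepL's public API documentation of the time, and "YOUR-KEY" is of course a placeholder for a real subscription key.

```python
# Sketch: how an API-based MT lookup is assembled (DeepL v2 as example).
# Only builds the request URL; nothing is sent over the network here.
from urllib.parse import urlencode

DEEPL_ENDPOINT = "https://api.deepl.com/v2/translate"

def build_request_url(text, target_lang, auth_key="YOUR-KEY"):
    """Assemble the query string for a single translation lookup."""
    params = {"auth_key": auth_key, "text": text, "target_lang": target_lang}
    return DEEPL_ENDPOINT + "?" + urlencode(params)

url = build_request_url("confidentiality", "DE")
print(url)
```

The point of the paid key in `auth_key` is precisely the confidentiality guarantee discussed above: requests authenticated this way fall under the provider's API terms rather than the free web service's terms.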

 

SDL just announced that it will release an add-on that will allow you to translate subtitles in its translation interface while watching the corresponding video sequence as you translate. This feature follows similar ones already available in Star Transit and memoQ, and I'm very glad about that (as I'm sure all Trados users are). Subtitle translation has skyrocketed, and it will likely continue to do so for the foreseeable future.

Note, though, that this add-on is not quite available yet. I talked to SDL's Paul Filkin about it, and he said "probably next week, or the week after." He did share this link with some more details, and he also mentioned this:

"I'd also be happy to learn more about the preferred subtitling formats. So far it's based on what I could find out so we support SRT, STL (Spruce), webVTT, SBV (YouTube). We intend to do TTML (bigger task this one... but important) and I'll happily look at anything we find out is more helpful."

It sounds like this is a real opportunity for the subtitling professionals among you to make your wishes known. You can find Paul on Twitter right here.

 

Although this last item "has been designed with non-technical people in mind," it is for the more technically inclined, a fact that makes it no less interesting. The Firefox browser (and related tools) is localized into about 100 languages (you can see a list here, or you can select them from within your Firefox browser under Options > General > Language). Based on frustrations related to localizing into so many languages, the team under Jeff Beatty at Mozilla (Firefox's parent organization) has developed a new localization-friendly language for the translatable user interface files -- Fluent -- that aims to take care of the gazillion exceptions in numbers, cases, gender, and on and on that different languages require. The concept is this: Rather than having to anticipate every possible exception when writing the original code and forcing every language to deal with it whether it needs it or not, each language can add its own variations as necessary. The makers behind this call the concept "asymmetric localization." Clever.
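To make the asymmetry concrete, here is a toy sketch in Python. This is not Fluent's actual syntax or API, and the plural-selection rule is deliberately simplified; it just illustrates the principle that each locale carries only the grammatical variants it needs, without the source language having to declare them.

```python
# Toy illustration of "asymmetric localization" (not Fluent's real syntax):
# each locale defines only the plural variants it needs.

def select_variant(variants, n):
    """Pick a plural form for count n; 'other' is the required fallback.
    The 'few' rule below is a simplified Polish-style rule for illustration."""
    if n == 1 and "one" in variants:
        return variants["one"]
    if 2 <= n <= 4 and "few" in variants:
        return variants["few"]
    return variants["other"]

# English gets by with two forms...
en = {"one": "{n} file", "other": "{n} files"}
# ...while Polish adds a "few" category English never had to declare.
pl = {"one": "{n} plik", "few": "{n} pliki", "other": "{n} plików"}

print(select_variant(en, 3).format(n=3))  # 3 files
print(select_variant(pl, 3).format(n=3))  # 3 pliki
```

The English source string never mentions the "few" category; the Polish translation adds it on its own side, which is exactly the asymmetry Fluent is after.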

You can read about it right here and find links to documentation on the project's homepage. I like that the Mozilla team is "inviting translation tool authors to try it out and provide feedback" from the get-go. 

 

ADVERTISEMENT

Your style, your wording, your content. Utilize translations quickly and reliably throughout your company. Corporate machine translation powered by STAR MT

Watch the short video for more information on STAR MT functionality and usage:
https://youtu.be/vvO1DsUwNKU 

www.star-group.net  

 

4. Using Neural Machine Translation Beyond Post-Editing -- A Conversation (Part 2)

Many of you will remember the conversation I started in the last Tool Box Journal with Félix do Carmo, a translator and machine translation researcher, about best practices for using neural machine translation. (If you don't have that issue handy, you can read it right here.) This is how our conversation continued and ended:

 

JOST: ... What's being done in academia with NMT in a more practical manner to move beyond "post-editing," as vague as that term might be?

FELIX: I would say that current research is still very much focused on using and applying NMT to produce better output to feed to traditional tools. We should mention four areas of current research that will affect the way NMT output will be presented to translators: INMT, AMT, APE, and QE.

  • Interactive Neural Machine Translation (INMT) is dedicated to developing ways to incrementally feed output to translators from neural networks trained on parallel corpora. These systems model the translation work as described above: the translator generates the translation, starts writing, the NMT system suggests the next fragment, and, all going well, the translation is created faster than if the translator did not have this "voice over the shoulder." For these systems to be accepted and become regular tools translators use, they need to feed suggestions that are adjusted to each context. Since INMT outputs words that are constrained on the words already written, there is the expectation that the suggestions presented by these systems will be better than those possible with SMT engines. However, this is still an area which raises more questions than answers. For example, can you constrain the output not just on the previous target words, but also on a list of validated terminology, and control how accurate the whole process is?
  • Adaptive Machine Translation (AMT) has been proposed as a term to describe systems that learn specific traits of each translator's work and adapt suggestions to those traits. It is not yet clear how this will be done, which traits these are (some call it "style," which is one of the vaguest terms one can use), and how effective this actually is.
  • Another complementary area that is being researched is Automatic Post-editing (APE). The name may sound like another way to replace translators, now not only in the translation stage but also in the editing and revision stages. Actually, I would say this is just another way to improve the output. It has been shown that applying NMT technology to APE improves the output of MT systems. However, again, despite this improvement in the output, this does not change the nature of the translating/editing work that is required, and the fact that this work requires professional translators.
  • An area we must also refer to is Quality Estimation (QE), which tries to give some indication of the segments that may not require much editing, and those that may require extensive translating work. QE may also serve to highlight words that are probably wrong in a translation suggestion. This is complementary information which may help in the translation decision process. The use of NMT methods for QE has also enhanced the capacities of QE methods.
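As a toy illustration of the QE idea above, a tool might use segment-level confidence scores to triage MT output. The scores and threshold here are invented purely for the sketch; real QE models predict such scores from features of the source and the MT suggestion.

```python
# Toy sketch of QE-based triage: high-confidence segments are offered for
# light editing, low-confidence ones are flagged for full translation.
# Scores and threshold are invented for illustration.

def triage(scored_segments, threshold=0.7):
    light_edit, full_translate = [], []
    for text, qe_score in scored_segments:
        (light_edit if qe_score >= threshold else full_translate).append(text)
    return light_edit, full_translate

mt_output = [
    ("Das ist ein einfacher Satz.", 0.92),  # fluent and likely correct
    ("Er bank den Fluss entlang.", 0.31),   # fluent-looking but wrong
]
light, full = triage(mt_output)
print(len(light), len(full))  # 1 1
```

Note that this only *guides attention*, as Félix says: the low-scored segment still goes to a professional translator, and even the high-scored one gets validated by a human.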

So, these four areas -- INMT, AMT, APE, and QE -- complement each other in helping the translator: they provide the translator with better suggestions (as interactive/dynamic pieces for the translator to build his translation or as better full sentences for him to edit), and they help filter out bad suggestions, guiding his attention to what may really require more work.

To describe how to leverage this technology to give the translator more than just better output for him to edit, the discussions have been going on around terms like "augmented translation" or "knowledge-assisted translation," but the discussion started a few years ago when we started talking about next generation translation tools. Apart from the integration of some of the concepts above, like INMT in Lilt, or QE in Memsource, most of these ideas have still not made it off the page to become a reality in the daily lives of most translators.

There is a tendency in academia and the industry to discuss the names more than make the revolution. One of the most recent signs of that is the suggestion to stop talking about NMT (because it is said that it is now officially the same as MT), and to talk instead about Artificial Intelligence (AI). But all these new terms simply express the challenge to combine not just the plethora of sources we mentioned earlier but also the plethora of technological approaches into the same tools.

JOST: I do actually like the suggestion to talk about AI instead of talking about NMT, and it's also interesting to see that some of the research areas have already found their way into tools, including the tools that you mention but also SDL, Intento, and ModernMT. As a last question, I would like to ask you something practical, though. The typical translator does not have access to customized MT engines (with the possible exceptions of the adaptive engines mentioned above, or if the client gives access to a customized MT). If the translator chooses to use an MT engine, they will end up using engines like Google, Microsoft, or DeepL. How can one of these engines -- or indeed several at the same time -- be used more productively or creatively than having the translator essentially just responding to the suggestions that these engines make? How can the translator be in the "driver's seat" when using these resources?

FELIX: For me, the next technological step will be personalization. (Actually, it is not such a ground-breaking proposal; this is another buzzword that has been hanging around for a while.) As our industry matures, we should identify the value of each node in the supply chain, and we should have technology and management of resources adapted to each of those nodes. Corporations will go on managing big data, but they will suffer from the anonymity and genericity of that data. LSPs will need to manage their client's data judiciously, and freelancers will need tools that help them manage their own data locally.

So, to be in the driver's seat, translators will need to have a clear right to manage the data they produce, and to keep personal TMs of all translations they do, more than to have access to other translators' and companies' resources, or to an increasing number of tools and technologies. Translators need to know their work better, and they will need tools that record and give them better insight into what they have been doing in previous projects, whether these are individual projects or collaborative ones.

In a scenario in which your translation tool receives input from MT engines, personal, client or collaborative project TMs, terminology databases, previous answers to queries, online discussions on translation suggestions, and many other resources, a translator needs different things (see below).

The main thing about tools that are adapted to specialized translators is that they should work in the background to feed the best suggestions possible, but the whole translation decision needs to be done by the translator.

As for the details of how to use these technologies productively and creatively, instead of just responding to suggestions, let's think about a futuristic scenario in which translators work in a mode simply called "Interactive Translation," a scenario which integrates MT and TM, different text resources and online features, and supports both translating and editing work. And it supports both "interactive" and "pre-translation" translators, those who prefer to type over some text, and those who prefer to write from scratch.

In Interactive Translation, everything comes down to the challenges of building a good interaction with the translator, and this means having an interface that adapts dynamically to his needs. I can describe parts of how I envisage a tool that adapts to translators in the future.

The interface should be very clean and uncluttered at the beginning, helping the translator read the text he has to translate, maybe even presenting him with an automatic summary of the text. It may also show him other projects in his pool of resources that may be associated with that text, and terms and segments which may constitute the main issues he will deal with throughout the translation. Or it may make those choices for him and not show them at this stage. At this initial stage, the tool will also have very detailed statistics which estimate effort, quality of the MT output, and other details which may be useful for more advanced users, like the possibility to extract rules from style guides and client instructions and to automate their checking.

The translator may approach the translation in many different ways, from the first segment to the last, starting with those problematic instances, or following any other structure he identifies in the data to translate. In the background, the tool selects the best resources for each segment, either a TM, an MT engine solution, or a composition from fuzzy matches, terminology, and any other resources.

When the translator starts translating, he will see the best suggestion the machine comes up with for each segment. If he sees that this suggestion is perfect, he will validate it. If he wants to know more about that suggestion, he will have a simple way to dig deeper and find where it comes from, how reliable it is, if there are other alternatives from other sources which he might prefer. And he can decide to act on these suggestions one by one or to aggregate them -- for example, dealing with all full matches from a reliable TM at once. But if he needs to edit the suggestion, he will have several forms of support described in a bit more detail below.

The suggestions from the tool are always presented in full, but the translator manipulates them at his will, moving things around, deleting words and inserting new ones. When he selects a word to apply any of these actions, the tool adapts and shows different supports. For example, when he decides to replace words without moving them, the system should be ready to present alternatives for that position, which may simply be a change in the form of that word; when he moves words around, the system should be able to suggest changes that depend on the new position of those words. These suggestions are not the same for each translator or for each project. So, it is fundamental that the tools learn from the translator's behavior, to predict regular edits, and to save and reuse them in similar contexts in other projects.

There are other activities translators do which may be supported by these new tools, like web searching, or making annotations and queries. The knowledge behind decisions supported by these resources is not integrated into translation tools, and it would be great to have this closer at hand.

When the translator stops, the tool can show him statistics on how far he is in terms of the whole project, or other assignments he is currently engaged in, and how the project is in terms of final checks. Before he decides to submit, the tool can do a QA check and reuse the records of the decisions he made to guide him in revising the project. For example, it may help him prepare a report for the reviser with the most troublesome passages, or a list of the sources he used for new terminology.

We could go on dreaming of the details of such tools, but our dreams as translators are not the same for everyone. We realized in our conversation that you dream of tools which are not so focused on editing as the ones I dream of, but which rely on the translator generating the translation and the tool playing a not so intervening role.

But the main idea I take from this conversation is how we moved from the impact of existing technologies to a discussion on how we use them. For me, this is the right way to discuss technology: not to be afraid of how MT or any other technology determines our work methods or even the definition of our tasks, but in the type of research on technology that we need. There is still a lot of research to be done on how each one of us writes, edits, searches, trusts his tool to search for him, or prefers to choose himself, how regular our methods are, how we deal with more productivity and more tiredness, or how all these factors change according to project, motivation, or even mood. It was great to see how you and I share the excitement to think in terms of the future, and to try to imagine how current and new generations of translators will use smart tools that adapt to them.

 

ADVERTISEMENT

Speed up Your Translation Processes with Across v7

The new version of the Across Language Server and the Across Translator Edition is now available! We have addressed numerous subject areas in order to improve the user-friendliness, to reduce flow times, and to enable new working styles. 

Get your Across Translator Edition v7 

 

The Last Word on the Tool Box Journal

If you would like to promote this journal by placing a link on your website, I will in turn mention your website in a future edition of the Tool Box Journal. Just paste the code you find here into the HTML code of your webpage, and the little icon with a link to my website will be displayed on that page.

This subscriber uploaded the icon last month:

isabelsanllehi.wordpress.com 

If you are subscribed to this journal with more than one email address, it would be great if you could unsubscribe redundant addresses through the links Constant Contact offers below.

Should you be interested in reprinting one of the articles in this journal for promotional purposes, please contact me for information about pricing.

© 2019 International Writers' Group  

 


Home || Subscribe to the Tool Box Journal