MateCat: Machines and humans are stronger working together
Take care of the creative and highly
specialized parts of translation.
MateCat uses Machine Learning to
handle repetitive tasks automatically.
1. Borderless Translating
No better time to write about the efforts of
Translators Without Borders (TWB) than the present -- especially if the
"present" is the joyous season.
Sticking to the theme of this publication --
translation technology -- I will report on exactly that in relation to TWB,
but here are some numbers that may illustrate the far reach of TWB.
There are presently 3762 registered
translators in the TWB Workspace,
a platform for claiming projects (the platform was developed and donated by
ProZ, but if you are not a member of ProZ you can still use Workspace),
plus a much smaller group of rapid-response translators. TWB works in
approximately 190 language pairs and has translated 39.4 million words so
far (10 million this year). The hot spots that TWB has been
involved in include the European and Burundi refugee crises, the
earthquakes in Nepal and Haiti, the typhoon in the Philippines, and the Ebola
and Zika outbreaks.
I talked with Mirko Plitt, who joined TWB as
the head of technology last June, about what technology TWB is using and
why things in that area may have been moving a little more slowly than if
it were a commercial entity. (Some background to this: At the ATA
conference earlier this year, a well-known Canadian EN>FR translator and
TWB volunteer complained rather forcefully to me about what seemed to her
the backward use of technology at TWB. That conversation and my subsequent
contact with TWB finally resulted in my talk with Mirko.)
He described TWB's work over the years as that of an
NGO rushing from crisis to crisis, each with a different set of parameters
and requirements. This left essentially no time to pause
and streamline technological efforts, even for basic tools that many of us
take for granted, such as a shared translation memory.
It wasn't that there were no solutions
available; in fact, many translation technology companies had offered free
access to their technology ("free" as in "no monetary
compensation" but not necessarily in "no PR benefits"), so
it wasn't finances that stood in the way. Instead, what was needed was the
hiring of a dedicated specialist -- Mirko -- who is not necessarily subject
to the ongoing operations but is specifically in charge of developing the
technological framework for translation and other language-related tasks.
This is where things stand at the moment.
Mirko has developed and launched the TWB
Translation Server, which is a customized version of the same
system that MateCat is based on. This system was chosen because a)
it was open-source and therefore customizable without having to rely on the
technology vendor, and b) it was easy to use without the need of
specialized training (unlike some earlier experiences with donated
memoQ licenses and accompanying training in Kenya). The TWB
Translation Server in connection with the TWB Workspace now
enables partners (TWB lingo for NGO clients) to upload translatable
documents (in MS Word, PowerPoint, OpenOffice
and text formats -- for other formats a project manager has to be involved
at this point), have translators in the right language combination claim
and translate the docs while using the language resources of MyMemory
(a mixture of contributed TM resources, aligned materials, and machine
translated data) -- if those are available -- and store and share TM data
with other TWB translators in an otherwise private translation memory.
These are not TWB's only recent
technological achievements. It has, for instance, been involved in the
translation of Translation
Cards, an app by Google and Mercy Corps that consists of audio
snippets in various languages that can be used by aid workers. It recently
launched machine translation systems for Kurmanji (Northern Kurdish) and
Sorani (Central Kurdish), which are the main languages for Kurdish
refugees. I was particularly intrigued with the Kurdish MT solution.
Neither of those languages had existing workable offline solutions, so Prompsit
(which recently was mentioned in the Tool Box Journal in
connection with neural machine translation) took three weeks to prepare the
open-source rules-based machine translation system Apertium for the
training of Kurmanji and Sorani and then guided 10 Kurdish
translators for a week via Skype for the data and rules entry.
Can we expect great results from that
engine? Linguistically speaking, I would say likely not. But from a
humanitarian perspective? No doubt.
There are other aspects aside from the
humanitarian side that I found meaningful when talking to Mirko, including
this: As Mirko correctly noted, the technological gap between languages has
increased even more just this year. Neural machine translation has
propelled languages that are deemed "more important" into a
territory that seems unreachable for the other 99+% of languages. It's
organizations like TWB that try to give some of those languages a
technological underpinning that they are unlikely to get elsewhere.
If you are interested in contributing to
TWB, you can find information about becoming a volunteer
right here and about financial support right here.
crossMarket Premium: Boost Your Productivity
The online network for all Across users brings together translation service
providers and buyers. Become a premium member and benefit from unlimited
possibilities to find and contact potential customers.
For prices and a list of all premium features, visit www.crossmarket.net
2. Unlimited Finding
The Swedish dictionary/thesaurus tool WordFinder
will be changing its business and service model come January 1. Rather than
selecting individual dictionaries or specific groups of dictionaries to search through
either via its web interface (WordFinder Online) or via Windows,
Mac, Android or iOS applications, you will now have
access to all dictionaries available for one price (9.99 euro/month).
The language combinations of those
dictionaries are between English and Danish, Finnish, French, German, Italian,
Norwegian, Polish, Portuguese, Russian, Spanish, and Swedish. The
corresponding resources differ significantly, with Swedish clearly being
particularly blessed with nearly 140 dictionaries going from Swedish into
another language. You can find a list of 199 dictionaries and thesauruses right
here, but you should be aware that this list is not complete. For
instance, I noticed that the massive Langenscheidt's Muret-Sanders English
<> German dictionary is missing in the list but is actually included
in the offering.
You can find a good video of the advanced
functionality of WordFinder's Windows app right here.
I'm impressed that WordFinder was
able to negotiate this deal with the most prestigious dictionary makers in
the world, and I hope they will be able to continue to grow their list of
languages and language combinations.
I also hope, and I have talked about this a
number of times with Ola Persson, WordFinder's CEO, that they will find a
way to bring the data even more closely into our translation environments
so that we don't even have to search anymore -- much like data from
termbases that is displayed automatically.
Translations: New Plug-in for InDesign CC 2017
STAR Group has expanded its plug-ins for DTP
applications, and now supports InDesign CC 2017 (Windows and Mac).
The plug-ins simplify the translation of InDesign
projects through a smooth data exchange with the Transit NXT
translation memory system.
For more information, please contact: email@example.com.
3. Measured Compensating -- A Reader's
Mats Linder, the author of the widely used
and just updated Trados
manual, sent us some thoughts on new compensation models that I had
mentioned in a couple of Tool Box Journals. Here they are:
"In issue 255 of the Tool Box
Journal, Jost wrote about new compensation models, and in issue 267 he
continued the discussion [Premium subscribers can find these issues in the
archives]. The point he stresses is that with new technology extending the
useful source segment repositories from the usual TMs to more and more
advanced uses of machine translation (MT), plus -- as Jost added at the
Europe Forum -- better uses of TM fragments (or subsegments), we
will arrive at a situation where the so-called Trados rates grid is
no longer valid as a measuring mechanism for calculating a fair
"The solution to this, says Jost, is to
'completely move away from pricing by the word, line, or page and learn how
to quote by project and/or time, which, after all, is something that
virtually everyone in the professional world (outside of translation)
does.' And, 'I can't wait to throw off the shackles of word counts and
operate like a professional who can figure out how much to charge for a
project, just like my electrician or lawyer does.'
"While I'm all for this, and hope for
such a development, I'm not sure of the rationale for this, nor that it
will happen very soon.
"For one thing, how will clients, or
even translation agencies, be able to tell in advance (as is possible with
the fuzzy matches analyses) the effect of fragment matches for the
translation efforts? For some time now, both Déjà Vu and memoQ
have been giving statistics for 'internal repetitions'/'homogeneity,' that
is, possible usage of such matches not only in TMs but also within
documents (something which Trados Studio does not, for some reason).
But as far as I know, no clients or even translation agencies have tried to
base payment on such statistics. The fact that Trados Studio, as a
result of its job analysis, now gives the number of 'fragment matches' in
TMs is not likely to change that. Even more problematic would be to base
payment on the possible use of MT, however advantageous that may be for the
"Thus for the foreseeable future, I
believe only fuzzy matches statistics may be used as basis for payment in
the old style, i.e. 'per word.' Which means we shall be able to use the
other improvements without detrimental effects on our compensation; oh
"But of course, many of us refuse to
accept the 'Trados grid' as basis for payment even though they
charge per word. And furthermore, many clients who are not translation
agencies may very well still be unaware of both the fuzzy matches
mechanisms and the improvements and hence not require use of the 'Trados
grid' as payment structure.
"So I think the technical development
is not likely to necessitate new methods for measuring the work we do --
except for ourselves, when we try to estimate the work load before we
"But as I said, I still agree with Jost
that charging per hour of work, or per job, would be preferable. And I
believe the 'per job' alternative is by far the best method. Many clients
-- perhaps even translation agencies -- may find paying what we really make
per hour (my guess is that for a translator in, e.g., UK, Sweden, or the
US, USD100 per hour is far from unusual) too high a price, even if they
actually do pay that when they pay per word (i.e., per job). Also, charging
per hour without a top limit is probably not going to be popular, even if
we pay the plumber according to that principle.
"Charging per job 'hides' our income
per hour and, being a set price, is probably more palatable to all clients.
However, as we are able to use all these new methods, such as 'fragment
matching' and even better MT, we shall need to spend some more time
calculating the actual job effort needed. But that may also have the
advantage that we force ourselves to investigate in depth how best to
utilize the new methods; in particular, MT. (The alternative is of course
to continue to base our offers on a per-word charge.)
"And all this means that post-editing
of MT (PEMT), in the strict sense -- i.e., not 'post-editing' the
suggestions we get from MT while translating -- will be less and less
attractive, since none of the technological advances described above is
applicable there. Yet we hear that PEMT is the fastest-growing type of
translation, or so the Common Sense Advisory research tells us -- while at
the same time, there seem to be signs that PEMT may nonetheless be a
passing phenomenon... At least we live in interesting times."
Mats is right, we do live in interesting
times, and there are a lot of things we need to continue to talk about.
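To make the arithmetic behind this discussion a little more concrete, here is a minimal sketch of how a fuzzy-match analysis turns into a per-word price, and how that same price looks as an hourly or per-job figure. Everything in it is an assumption for illustration only: the match bands, the discount percentages, the word counts, the per-word rate, and the hours worked are all made up and do not represent any tool's or agency's actual grid.

```python
# Hypothetical discount grid: the fraction of the full per-word rate
# paid for each match band. These values are illustrative assumptions,
# not an actual "Trados grid" used by any vendor or agency.
GRID = {
    "no_match": 1.00,
    "fuzzy_75_94": 0.60,
    "fuzzy_95_99": 0.30,
    "exact_or_repetition": 0.10,
}


def weighted_words(analysis, grid=GRID):
    """Return the number of payable words after applying the grid."""
    return sum(count * grid[band] for band, count in analysis.items())


def job_price(analysis, rate_per_word, grid=GRID):
    """Return the price of a job quoted per word with grid discounts."""
    return weighted_words(analysis, grid) * rate_per_word


def effective_hourly_rate(price, hours_worked):
    """Return what a fixed job price amounts to per hour of work."""
    return price / hours_worked


# Made-up analysis of a 5,000-word document (words per match band).
analysis = {
    "no_match": 2000,
    "fuzzy_75_94": 1000,
    "fuzzy_95_99": 500,
    "exact_or_repetition": 1500,
}

price = job_price(analysis, rate_per_word=0.15)
print(f"Payable words: {weighted_words(analysis):.0f}")
print(f"Job price:     {price:.2f}")
print(f"Per hour:      {effective_hourly_rate(price, 4.5):.2f}")
```

The point of the sketch is that the client sees the same total whether it is presented as a per-word calculation or as a fixed job price. What changes with fragment matches and better MT is only the number of hours in the last line, and the grid has no band for those.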
memoQ translator pro for only 6 euros/7 bucks?! -- Crazy Group Buy '16!
A Crazy Year - a Crazy Raffle: get an
exceptionally great price on memoQ
+ participate in the
raffle - every fifth buyer wins.
4. A Couple More Things . . .
Quite a few of you have written to me about
a broken link in the article about morphology in the last Tool Box
Journal. The correct link is https://www.aclweb.org/anthology/N/N15/N15-1186.pdf.
You might consider following me on Twitter or at least
checking into my Twitter feed occasionally to keep abreast of updates such
Also, I wasn't able to give you detailed
information in the last journal about API access pricing for the new Google
neural machine translation (which you will have to use if you want to use
it within a translation environment tool). The answer to that is that there
is no new pricing yet. If you've applied for and been given a new API key,
you'll have to pay the same as you have so far. Until January 31, that is.
After that it'll be more, though nothing specific has been announced yet.
And, yes, the API key is usable in any translation environment tool in
which you can enter one. I'm indebted to Samuel Murray and Cenk Yalavaç
for this information.
The small things that make a big
difference in SDL Trados Studio 2017
As well as transformative self-learning
machine translation and translation memory features, Studio 2017 is packed
full of enhancements that make everyday tasks easier.
- Drag and drop files to start a project quickly
- Merge segments over a hard return
- Customize filters to
- Easy file filter
You can see many more time saving features
in our online
The Last Word on the Tool Box Journal
If you would like to promote this journal by
placing a link on your website, I will in turn mention your website in a
future edition of the Tool Box Journal. Just paste the code you find here into
the HTML code of your webpage, and the little icon that is displayed on
that page with a link to my website will be displayed.
If you are subscribed to this journal with more than one email address, it would be great if
you could unsubscribe redundant addresses through the links Constant
Contact offers below.
Should you be interested in reprinting one
of the articles in this journal for promotional purposes, please contact me for
information about pricing.
© 2016 International Writers'