Discover the New SDL Trados Studio 2017
Whether you're a beginner or an advanced
user of CAT tools, join a webinar
in December to learn about the leading software for translators -- SDL
Trados Studio 2017.
Spotlight on upLIFT
Neural Jump (Premium Content)
Surely there's not a single one among us who
hasn't heard about the changes
in how Google Translate works for a number of languages (EN <>
FR, DE, ES, PT, ZH, JA, KO, TR -- and many more will certainly soon be
announced). In these language combinations, the previous phrase-based
statistical model has been replaced by a neural machine translation system
that potentially offers better translations.
It might be surprising to hear, though, that
while this system is in full force at the web-based translate.google.com,
the API that many translation environment tools use to access Google
Translate uses the old system.
. . . you can find the rest of this article
in the Premium edition. If you'd like to read more, an annual subscription
to the Premium edition costs just $25 at www.internationalwriters.com/toolkit. This will also give you access to the
archives of the Tool Box Journal going back all the way to 2007.
memoQ translator pro for only 6 euros/7 bucks?! -- Crazy Group Buy '16!
Crazy Year - a Crazy Raffle: get an exceptionally great price on memoQ
+ participate in the raffle - every fifth
2. Morphing into the Promised Land
Many of you know that I've been very interested in morphology. No, let me put
that differently: I've been very frustrated that the translation
environment tools we use don't offer morphology. There are some exceptions
-- such as SmartCat, Star Transit, Across, and OmegaT
-- that offer some morphology support. But all of them are limited to a
small number of languages, and any effort to expand these would require
painful and manual coding.
Other tools, such as memoQ, have decided that they're better off with
fuzzy recognition than specific morphological language rules, but that
clearly is not the best possible answer either.
So what is the problem? And what is morphology in translation environment
tools about in the first place?
Wouldn't it be nice to have all inflected forms of any given word in your
source text be automatically associated with the uninflected form that is
located in your termbase or glossary and have that displayed in your
terminology search results? And does it feel a little silly to even have to
ask that question at a point when it should be a no-brainer to have any
given tool provide that service? In case you wondered: The answer to both
questions is "Yes, yes, resoundingly yes!"
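To make the idea concrete, here is a minimal Python sketch of what such a lookup could do. It is an illustration only: the suffix rules are a tiny hand-written English sample (exactly the kind of manual coding that does not scale), and the names SUFFIX_RULES, candidate_lemmas, and lookup, as well as the sample termbase, are invented for this example and do not come from any tool.

```python
# Toy English suffix rules (hypothetical and far from complete): (suffix, replacement)
SUFFIX_RULES = [("ies", "y"), ("es", ""), ("s", ""), ("ing", ""), ("ed", "")]

def candidate_lemmas(word):
    """Return the word itself plus every base form the toy rules can derive."""
    word = word.lower()
    candidates = [word]
    for suffix, replacement in SUFFIX_RULES:
        # Require a stem of at least three letters to avoid absurd reductions
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            candidates.append(word[: -len(suffix)] + replacement)
    return candidates

def lookup(word, termbase):
    """Return the first termbase entry matching the word or one of its base forms."""
    for candidate in candidate_lemmas(word):
        if candidate in termbase:
            return candidate, termbase[candidate]
    return None

# Sample termbase (invented): uninflected source term -> target term
termbase = {"glossary": "Glossar", "match": "Treffer", "align": "alignieren"}

print(lookup("glossaries", termbase))  # ('glossary', 'Glossar')
print(lookup("aligned", termbase))     # ('align', 'alignieren')
print(lookup("geese", termbase))       # None: irregular forms escape suffix rules
```

An exact-match lookup would have missed all three inflected forms; the rule-based version finds two and still fails on the irregular one.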
On the other hand, there is a reason why we're stuck where we are. It happens
to be cost. If you really have to manually enter morphology rules for all
languages, it quickly becomes a Sisyphean exercise
(starting with: "What exactly are all languages?"). If you
do it just for the "important" (which in the eyes of the
technology vendors means "profitable") languages, you end up with
the situation we already have with the tools mentioned above.
A few years ago, a group of folks including myself had the idea to
crowdsource the collection of morphology rules for and with each
language-specific group of translators. Once the rules were collected, they
could then be integrated into the various technologies. It sounded good,
but it was hard to get the project started due to a lack of funds to build
the necessary infrastructure and/or the time it would have taken to raise
funds, among other issues.
Enter translation environment tool Lilt with a very cool proposal that
may very well be the solution. Lilt's latest version introduces a
"neural morphology" engine for all presently supported languages
minus Chinese (so: EN, DA, NL, FR, DE, IT, NO, PL, PT, RU, ES, SV).
Here is the honest truth, though: When I first read the press release a couple of
weeks ago, I fondly rolled my eyes and thought to myself that the folks
from Lilt were just thinking it was wise to throw a little
"neural" around while it's hot.
It turns out I was mistaken, however, as I found out when I talked with Lilt's
John DeNero, who is the architect of this part of Lilt's system.
John tried to explain to me what the system does and why it can make a big
difference. It was not so hard to understand the second part, but my feeble
untechnical mind had a hard time with the first part.
(By the way, we always assume that it's us, the less-technically-inclined, who
are to be pitied when we don't understand technology. But can you imagine
how pitiful life is for the more-technically-inclined who have to speak
baby talk when communicating to us?)
This article provides a good summary of
the system, which essentially analyzes large monolingual corpora, detects
morphological modifications (in theory, they could be any kind of
modification; in practice, Lilt focusses on suffixes right now), and
classifies them. Since any word is evaluated and also classified within
a context, the system is able to distinguish between the adverbial
ending -ly in English when it encounters "gladly" vs.
"only." Using the same contextual analysis, the system is also
able to make very educated guesses about the morphological transformation
of unknown words. (For instance, it might never have encountered
"loquacious," but chances are it would assume -- correctly --
that the adverbial transformation would be "loquaciously").
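The general idea of learning suffix patterns from monolingual data, rather than coding them by hand, can be sketched in a few lines of Python. To be clear, this is a simple counting heuristic and not a neural model, and it is not Lilt's engine. The function names, the tiny vocabulary, and the thresholds are all assumptions made for illustration; in particular, the crude minimum stem length only stands in for the contextual analysis described above.

```python
from collections import Counter

def learn_suffix_rules(vocabulary, max_suffix_len=3):
    """Count how often stripping a suffix from a known word yields another known word."""
    counts = Counter()
    for word in vocabulary:
        for n in range(1, max_suffix_len + 1):
            # Crude guard: the remaining stem must have at least three letters,
            # which (by accident, not insight) keeps "only" from counting as "on" + "ly"
            if len(word) > n + 2:
                stem, suffix = word[:-n], word[-n:]
                if stem in vocabulary:
                    counts[suffix] += 1
    return counts

def inflect(word, suffix, rules, min_support=2):
    """Apply a suffix to an unseen word only if the pattern was observed often enough."""
    return word + suffix if rules[suffix] >= min_support else None

# Stand-in for the word list of a large monolingual corpus (invented)
vocabulary = {"glad", "gladly", "quick", "quickly", "serious", "seriously",
              "on", "only", "walk", "walked"}
rules = learn_suffix_rules(vocabulary)

print(rules["ly"])                          # 3
print(inflect("loquacious", "ly", rules))   # loquaciously
print(inflect("talk", "ed", rules))         # None: "ed" was seen only once
```

A real system needs far more data and, as described above, context to tell apart words that merely look alike.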
This works with every language (that uses morphology -- therefore excluding
Chinese, for instance), provided there is enough corpus material to train
the system. The time it takes for a new language to be trained is about 2.5
days (on very powerful computers). That's it.
It's not perfect (whatever is??). John was very open in his assessment
about where the system fails. It tends to fail with irregular morphology
(it might not recognize "geese" as the plural of
"goose" or "well" as the adverbial form of
"good"), and there are about 5% of all cases where John felt that
the engine should have made a correct judgment and it did not.
On the other hand, terminology hits have increased by a third for its users
since Lilt introduced the system two weeks ago.
I consider this a quantum leap -- in particular because it will not only
benefit the large European and Asian languages (where applicable) but the
long tail-end of other languages as well. Well, you might say, Lilt
covers only a handful of languages, so doesn't that end up being the same
thing? The answer to that is (a two-fold) no. First of all, you can expect Lilt
to continue to add languages, and -- even more importantly -- the module
used to build these neural morphology engines is open-source and available
for every translation technology developer right here.
Here is what John said about the available engine and its usability:
"Here's our open-source release of the morphology
system. It's released as an academic project and does not have any formal
support, so it's not a product. If someone wanted to use it, they'd have to
figure it out on their own (though of course I'm happy to answer
So get on it, Kilgray and SDL and Atril and Wordfast and, and, and . . . .
It's also very promising that there are other areas where morphological
knowledge can be used by a translation system: How about actively changing
the inflection of a term that is automatically inserted based on its usage
in the source? Or how about changing that inflection when repairing fuzzy
matches? Or when repairing machine translation suggestions?
The sky's the limit with this. Be creative!
The Words You Want. Anywhere, Anytime
WordFinder opens a new world of opportunities -- get access to millions of words and translations
from the best dictionaries, on your computer, via a web browser, on your
smartphone or tablet. Stuffed with lots of smart features. WordFinder
has what you need as a translator in your everyday work -- anywhere, anytime.
Read more at www.wordfinder.com.
3. The Tech-Savvy Interpreter: The Rise of
Interpreting Management Systems and Why You Should Care (Column by
Barry Slaughter Olsen)
Talk to anyone responsible for staffing interpreting assignments and you'll
discover quickly just how time consuming and inefficient the task can be.
It is complex, with a lot of moving parts to coordinate. Language
combinations, expertise, time, location, duration, subject matter,
turnaround time, certifications, compliance, client preference,
availability, type of assignment, interpreting equipment, and the
list goes on...
In fact, market research conducted in one country recently revealed that
agencies spend an average of 40 minutes to staff just one interpreted
encounter, and that doesn't include all of the administrative work that
comes after the interpreting assignment is complete to get an interpreter
paid! Factor in that most of the growth in interpreting is coming in areas
where interpreting assignments often last two hours or less and you can
begin to understand why increasing the efficiency of all the administrative
aspects around an interpreting assignment is so important.
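A quick back-of-the-envelope calculation shows why those 40 minutes matter. The snippet below simply divides the cited staffing time by the length of the assignment; the function name and the sample durations are chosen for illustration only.

```python
STAFFING_MINUTES = 40  # average staffing time per encounter, as cited above

def staffing_overhead(assignment_minutes, staffing_minutes=STAFFING_MINUTES):
    """Staffing time as a fraction of the interpreted time it supports."""
    return staffing_minutes / assignment_minutes

for minutes in (30, 60, 120):
    print(f"{minutes:>3}-minute assignment: {staffing_overhead(minutes):.0%} overhead")
```

For a two-hour assignment, staffing alone adds a third of the interpreted time on top; for a half-hour encounter it takes longer than the encounter itself.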
As the demand for interpreting grows and the types of interpreted encounters
continue to diversify, the process of matching interpreters to clients must
become more efficient in order to meet demand while reducing administrative
interpreting management systems or IMSes. The best core definition I have
found for an IMS is from Hélène Pielmeier at Common Sense Advisory, who
defines them as "applications designed to schedule and manage
interpreting assignments, whether on site or remote." This definition
gets at the heart of what an IMS does, but as you will see from the list
provided in this month's column, many IMSes go far beyond the core
definition to include delivery platforms, community building, and referral
programs, to name just a few innovations. There will surely be more
innovation to come as this space continues to evolve.
Two clear trends emerging in this IMS space are increased efficiency and
convergence.
Increased efficiency is the new black. Lead
times for staffing interpreted encounters are getting shorter and shorter.
This means clients need to have open, fluid channels of communication with
interpreters, and response times are critical. Expect to see more systems
use instant messaging to communicate with interpreters. Interpreters should
think carefully about how they are willing to interact, what communication
technologies they are willing to monitor regularly, and how responsive they
will be. Responsiveness, even when rejecting jobs, is becoming a key metric
for project managers when deciding which interpreters to work with. In many
cases, getting back to them tomorrow won't be good enough anymore.
All roads lead to convergence. While each of the various IMSes listed
below is focused on a specific niche of the interpreting market (e.g.,
medical interpreting, conference interpreting, business interpreting,
etc.), they all seek to integrate the various aspects of the staffing
process into a single workflow. Some, like BoostLingo and TikkTalk, have
actually built in their own video remote interpreting platform as well.
They aim to be one-stop shops. Other platform innovations include GPS
tracking to offer assignments to the interpreter closest to the job, smart
matching using artificial intelligence to assign work based on interpreter
availability and credentials, features to confirm interpreter check-in at
assignments, client and interpreter evaluation, billing, invoicing, payment
processing, compliance, report generation, and more. The competitive edge
will go to agencies and interpreters that are able to adapt to and thrive
in this new environment.
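For readers curious what location-based matching might look like under the hood, here is a minimal Python sketch: filter the pool by availability, language pair, and credentials, then offer the job to the nearest interpreter. None of this reflects how any of the platforms named here actually work. The Interpreter fields, the function names, and the sample data are invented, and a real system would weigh many more of the factors listed earlier.

```python
import math
from dataclasses import dataclass

@dataclass
class Interpreter:
    name: str
    languages: frozenset
    certified_medical: bool
    available: bool
    lat: float
    lon: float

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometers."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def closest_match(pool, language_pair, needs_medical, lat, lon):
    """Filter by availability and credentials, then pick the nearest interpreter."""
    eligible = [
        person for person in pool
        if person.available
        and language_pair <= person.languages
        and (person.certified_medical or not needs_medical)
    ]
    return min(eligible, key=lambda p: haversine_km(lat, lon, p.lat, p.lon), default=None)

# Invented sample pool
pool = [
    Interpreter("Ana", frozenset({"en", "es"}), True, True, 36.60, -121.89),
    Interpreter("Ben", frozenset({"en", "es"}), False, True, 36.61, -121.90),
    Interpreter("Carla", frozenset({"en", "fr"}), True, True, 36.61, -121.90),
    Interpreter("Dan", frozenset({"en", "es"}), True, False, 36.61, -121.90),
]

job = closest_match(pool, frozenset({"en", "es"}), needs_medical=True, lat=36.61, lon=-121.90)
print(job.name)  # Ana: Ben is closer but lacks the credential, Dan is unavailable
```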
The following is a list of different interpreting management systems currently
on the market. I have loosely organized them into three categories. I
encourage you to check them out. The interpreter matchmaking sites and
one-stop shops are potential sources of work for freelance interpreters.
The last category, IMSes for interpreting service providers, contains
software platforms designed for ISPs or other entities that have large
interpreting programs to coordinate, such as hospitals and courts.
IMSes for Interpreting Service Providers
This list is not exhaustive but does give a sense of the growing interest in
making the interpreting workflow more efficient, which will only improve
access to this important service. Many interpreting agencies and large
institutions are developing or have already developed custom IMSes for
their own operations.
Word of Caution to Developers
There is a definite trend toward consolidation and convergence, but I am
skeptical of platforms that are seeking to be all things to all clients.
Over the last ten years, I have seen several platforms take the
any-language-anytime-anywhere approach. None, to my knowledge, has been
successful. Most sputtered out under the weight of the promises made.
IMSes that are showing promise are those that seek to differentiate their
platform from the competition either by offering unique services that meet a
specific market's need (e.g., end-to-end service for the medical
interpreting market) or carving out a niche for a certain type of
interpreted encounter (e.g., focusing on business interpreting gigs and
becoming adept at staffing last-minute requests).
We can look to the more mature translation management system (TMS) market,
which has many different product offerings today, to see that there is
ample room for competition and differentiation.
Promise of Disintermediation for Interpreters
Interpreters interested in developing more direct relationships with
clients will find the interpreter matchmaking sites of greatest interest.
These sites offer the convenience of an IMS (simple process for accepting
or rejecting assignments, no invoicing necessary, direct deposit of
payment, etc.) but allow the interpreter to negotiate fees directly with
the end client and provide full transparency regarding rates and fees.
These sites have built their business models around alleviating the
administrative burden for the interpreter while still providing access to
the end client.
Do you have a question about a specific technology? Or would you like to learn
more about a specific interpreting platform, interpreter console, or
supporting technology? Send us an email at email@example.com.
Talk Business Anywhere with Cadence
effortlessly earns you money. Average wage is $120/hr. Join our matchmaking
platform that uses technology plus the human touch to connect and prepare
you for the interpreting jobs you want. Over $1M paid out in the last 18
months. Sign up for free at www.talkbusinessanywhere.com
4. TAUS . . . (Premium Content)
. . . or the "Translation Automation and User
Society," may not have the best reputation among some of our
colleagues. While I have had a few disagreements with some TAUS members'
viewpoints, I have appreciated many of TAUS' endeavors. In fact, I was
present at its first meeting (in Taos, New Mexico, in 2007) where much of
the initial focus was set out. (That was also the meeting where I locked
myself out of my room while completely naked in the hot tub on my terrace
and had to proudly walk in all my very natural glory across the huge
terrain of the conference resort to ask the receptionist to please, PLEASE!
give me a key. I promised my wife I would never tell that story publicly.
TAUS has gone through a number of
. . . you can find the rest of this article
in the Premium edition. If you'd like to read more, an annual subscription
to the Premium edition costs just $25 at www.internationalwriters.com/toolkit.
Translator Premium Edition: The Translation Software for Freelancers
Boost your productivity with an efficient,
comprehensive work environment. Become a premium user and integrate your
personal translation memory and terminology in the Offline Client.
For prices and a list of all premium
features, visit www.crossmarket.net
Jean-François Richard from Quebecois
translation technology developer and vendor Terminotix spent some time
earlier this week on the phone with me to give me an idea about the newest
version of AlignFactory,
due to be released in the first quarter of 2017.
I wrote the following in my Tool Box ebook about
the old version of AlignFactory:
Terminotix's AlignFactory offers an uncommonly high
accuracy of alignment [=converting independent source and target files into
a translation memory or corpus] because a) it uses a highly sophisticated
alignment engine and b) it uses a number of filters that filter out any
unlikely match (for instance, based on differing lengths of segments).
Furthermore, with AlignFactory you can also select
thousands of file pairs (including PDF files), have them matched up (they
have to follow certain naming conventions such as a language identifier),
and then have them aligned in one big swoosh. And it really is one big
swoosh: the speed of the alignment is mind-boggling. In fact, it's so fast
that I have repeatedly thought that something had gone wrong only to find
that it had already successfully completed the alignment. While it's not
perfect, it certainly has brought alignment to a different level.
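The length-based filtering mentioned in that description is easy to picture. Here is a minimal Python sketch, assuming a simple character-length ratio with made-up thresholds; AlignFactory's actual filters are not documented here, and the function name and limits are purely illustrative.

```python
def plausible_pair(source, target, min_ratio=0.5, max_ratio=2.0):
    """Keep a candidate pair only if the target/source length ratio is plausible."""
    if not source or not target:
        return False
    ratio = len(target) / len(source)
    return min_ratio <= ratio <= max_ratio

candidates = [
    ("Click the Save button.", "Cliquez sur le bouton Enregistrer."),
    ("OK", "Cette page décrit les options de configuration avancées."),
]
kept = [pair for pair in candidates if plausible_pair(*pair)]
print(len(kept))  # 1: the second pair is rejected because the lengths differ too much
```

Sensible thresholds depend on the language pair, which is one reason real aligners combine several signals rather than relying on length alone.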
AlignFactory really is a great tool, and I love doing demos of it
during workshops -- the speed and accuracy never fail to impress an
In the upcoming version, Jean-François and
his team have integrated an interesting feature: a web crawler. A web
crawler is a tool that can download complete websites onto your hard drive.
Originally developed when it was expensive to spend a long time online, it
allowed you to mirror websites on your computer instead and browse them
without having to be online. While this isn't generally a need any more for
most general users, translators have been real beneficiaries of that legacy
technology. Tools like HTTrack, Teleport, and Quadsucker
are all helpful tools for downloading complete translated websites or (more
likely) just certain file types that are helpful for alignment purposes.
So it was just a logical next step for
Terminotix to build this right onto the tool. Since they didn't feel that
any of the existing solutions really matched their needs, they developed it
from scratch. In fact, they'd already done that a while back to help
existing clients download their own websites and build a corpus that they
could then query with Terminotix's LogiTerm (see edition 220 of the Tool
Box Journal for a review of LogiTerm).
With the new web crawler, AlignFactory
now downloads the relevant files from translated websites (excluding image,
video, or audio files or files that contain only coding), automatically
matches them up according to defined language pairs, and aligns them to be
output as TMX (translation memory exchange) files or corpus files that can
be used with Terminotix's other tools.
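If you have never looked inside a TMX file, the following Python sketch shows roughly what such output contains. It uses only the standard library and the basic TMX 1.4 elements (tmx, header, body, tu, tuv, seg). The header values are placeholders, the function name is invented, and this says nothing about how AlignFactory itself writes its files.

```python
import xml.etree.ElementTree as ET

def build_tmx(pairs, source_lang="en", target_lang="fr"):
    """Serialize aligned (source, target) segment pairs as a minimal TMX 1.4 document."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        # Placeholder header values for illustration
        "creationtool": "example-sketch", "creationtoolversion": "0.1",
        "segtype": "sentence", "o-tmf": "plain", "adminlang": "en",
        "srclang": source_lang, "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")  # one translation unit per aligned pair
        for lang, text in ((source_lang, source), (target_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {"xml:lang": lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")

tmx_text = build_tmx([("Click the Save button.", "Cliquez sur le bouton Enregistrer.")])
print(tmx_text)
```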
If the website is clearly structured (such
as all French files in a directory called FR and identical HTML file
names), AlignFactory will just use those markers to match the file
pairs. If it's not quite that obvious, it will use a "heuristic"
method where it looks for "fingerprints" within each file to
learn its language and then associate it with the corresponding language.
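The simple case, where a language directory is the only difference between two paths, can be sketched in Python as follows. This is an assumption-laden illustration rather than AlignFactory's logic: the function name and sample paths are invented, and the "fingerprint" heuristic for less tidy sites is not attempted here.

```python
from pathlib import PurePosixPath

def pair_by_language_dir(paths, source_lang="en", target_lang="fr"):
    """Pair files whose paths differ only in a language directory name."""
    by_key = {}
    for path in paths:
        parts = PurePosixPath(path).parts
        for index, part in enumerate(parts):
            if part.lower() in (source_lang, target_lang):
                # Key = the path with the language directory blanked out
                key = parts[:index] + ("*",) + parts[index + 1:]
                by_key.setdefault(key, {})[part.lower()] = path
                break
    return [
        (langs[source_lang], langs[target_lang])
        for langs in by_key.values()
        if source_lang in langs and target_lang in langs
    ]

paths = ["site/EN/index.html", "site/FR/index.html",
         "site/EN/about.html", "site/FR/contact.html"]
print(pair_by_language_dir(paths))
# [('site/EN/index.html', 'site/FR/index.html')]
```

Files without a counterpart (here about.html and contact.html) are simply left out, which is where a content-based method would have to take over.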
The regular version of AlignFactory
is not exactly cheap at CAN$1500, and the price will actually go up to
CAN$2000 once the new version is released. The "Light"
version is a lot cheaper but will not contain the new crawler tool, but . .
. since the Terminotix team is eager to find out how its newly featured
tool works in all kinds of situations, it'll give you full support
alongside the free 45-day trial version. This should allow you to get to
the data you've long pined for and make it useful. Just send an email to firstname.lastname@example.org
and they'll give you access. (And you'll finally have something to do over
those boring Christmas holidays...).
MateCat: machines and humans are stronger working together
Take care of the creative and highly
specialized parts of translation.
MateCat uses Machine Learning to
handle repetitive tasks automatically.
There are a couple of important follow-ups
from my article about the new version of SDL Trados Studio.
One is a correction. In the last Tool Box
Journal I said this:
Of course, one thing you'll need to take into account when
thinking about using this feature is that it requires an additional paid
subscription to the SDL Language Cloud -- much like what Google
Translate also requires with its cost to use its API (which in turn
makes it possible to use that service within a translation environment tool).
The link is correct, but I apparently hadn't
studied that web page adequately. Four hundred thousand characters per month
are indeed free with SDL's new machine translation offering. From a single
freelance translator's perspective, that essentially means you can use the
solution for free.
My bad that I portrayed that differently.
The other thing I didn't mention had to do
with the much-praised upLIFT solution. I had not realized (and hadn't even
considered looking for) a feature reader Amy Bryant informed me of: unlike
competing tools, SDL takes the fragment recall into consideration for
project analysis -- which can then potentially be used in pricing projects
with a new kind of fuzzy match rate.
I asked upLIFT's architect Kevin Flanagan
about it, and this is some of what he said:
"I just checked which way this went, and yes, in Studio
2017, leverage reports will include upLIFT statistics ... but it's
important to be clear that they're totaled separately from existing
statistics, and (as I understand it) SDL is not advising their use for any
kind of discounting, and the LSP arm of SDL has been briefed not to use
them in that way. You might think (as it struck me) that we should
therefore leave those numbers off the reports. The reasoning for having
them that interested me most goes along these lines: 'If we don't quantify
fragment recall for translators -- notwithstanding that recalled fragments
(like recalled fuzzy segment matches) won't always be useful -- how will a
translator judge whether she/he will be able to complete the project in a
day less, and therefore be able to take on more work and earn more?'
Bearing in mind that a Studio user can set fragment minima, e.g.
minimum fragment length of 6 words, this kind of reporting does help
identify the not-that-uncommon cases where good segment matches are few,
but fragment recall really can make a big productivity difference (though
we might need better reporting for that, e.g. identifying where a 20-word
segment had no segment-level match but was entirely covered by two 10-word fragments).
"Does that mean that fragment recall statistics should
never, and will never, be used for any kind of discounting? That's harder
to say. On the one hand, (...) there are reasons to think technology is
moving us towards an hourly-rate charging system anyway, in which case the
question may become irrelevant."
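The reporting case Kevin describes, a segment with no segment-level match that is nonetheless fully covered by fragments, can be made tangible with a small Python sketch. It computes the share of a segment's words covered by recalled fragments that meet a minimum length. This is not how Studio calculates its statistics; the function name, the exact-match comparison, and the sample data are assumptions for illustration.

```python
def fragment_coverage(segment, fragments, min_words=6):
    """Fraction of a segment's words covered by recalled fragments of sufficient length."""
    words = segment.split()
    covered = [False] * len(words)
    for fragment in fragments:
        frag_words = fragment.split()
        n = len(frag_words)
        if n < min_words:
            continue  # below the configured fragment minimum
        for start in range(len(words) - n + 1):
            if words[start:start + n] == frag_words:
                covered[start:start + n] = [True] * n
    return sum(covered) / len(words) if words else 0.0

# A 20-word segment built from placeholder tokens w0 ... w19
tokens = [f"w{i}" for i in range(20)]
segment = " ".join(tokens)
first_half, second_half = " ".join(tokens[:10]), " ".join(tokens[10:])

print(fragment_coverage(segment, [first_half, second_half]))  # 1.0
print(fragment_coverage(segment, [first_half]))               # 0.5
print(fragment_coverage(segment, [" ".join(tokens[:5])]))     # 0.0 (fragment too short)
```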
I sort of get what he's saying. And who am I
to say that a tool should not display all available data that could have an
impact on the productivity of said tool? What I find notable is that other
tool vendors that use fragment recall (AKA subsegment matching) could have
done something similar but chose not to.
But then, it's up to the individual
translator to not accept a rate change that takes subsegment use into
consideration. And maybe (make that: hopefully) we'll already have switched
to a time-based model at the point where it could become standard to use
Oh, and there is another erratum from the
last Tool Box Journal. I implied that memoQ's Muses (the
databases that contain subsegments) are being dynamically updated. As
reader Anthony Green pointed out, this is simply and sadly not true.
Last Word on the Tool Box Journal
If you would like to promote this journal by
placing a link on your website, I will in turn mention your website in a
future edition of the Tool Box Journal. Just paste the code you find here into
the HTML code of your webpage, and the little icon that is displayed on
that page with a link to my website will be displayed.
Should you be interested in reprinting one
of the articles in this journal for promotional purposes, please contact me for
information about pricing.
© 2016 International Writers'