1. Translators Are Not Equal
In the last two editions of the Tool Box Journal I
mentioned OCR (optical character recognition) systems for languages that
support writing systems used in India. For a long time I've been very aware
of my privileged situation as an English-into-German translator in regard
to translation technology (more on that later), and in that spirit I felt
those mentions were relevant.
Following those comments I received a note from
Subhashri D V, a translator in Bangalore, India, who works in Indian
languages. She said this:
"Your brief remarks in the last two journals about Indian
language tools, OCR, etc., spurred me to write to you about the
difficulties in Indian language translations thanks to the chronic lack of
"Of course, Google is playing savior by incorporating
most Indian languages in its latest tools and initiatives. But, as compared
to the variety of tools, databases, etc., available/supported for many
other languages, our scene is dismal. The Indian government/private
companies are naturally also to be blamed for this situation, for not
vesting enough interest in developing resources for our languages. I
believe the government does have a head start in this matter,
and in spite of the recent efforts to propagate these
multilingual computing resources, they are not readily accessible or
good enough to be used by professionals.
"That aside, I would also like to highlight the glaring
inadequacy of typing tools for Indian languages. The most commonly used
tools for typing in Indian languages right now are the Indic language
transliteration tools (MS, Google, etc.). Inscript keyboards/layouts
are available but rarely used, as most people start off with an English
keyboard and it is difficult to 'learn' a new one for 'each' language (most
Indians are multilingual). As you may already know, there is a world of
difference in the way in which English and Indian language characters
(Devanagari and other South Indian languages) are written. As a simple
example, there are no half letters in
English, while they abound in Indic scripts. The transliteration method is
ineffective for professionals because it reduces typing speed greatly and
leads to a number of errors. Typically Indian words are long, and
transliterating them into English requires many double aa, oo, uu, ii (or a
mix of capital and small letters if written phonetically), and then one
still has to 'select' the correct word from the many options displayed by
the software. This is a cumbersome process, even though most of us are now
used to it for want of another option.
"On the other hand, I think Indic typing tools for
mobiles have seen greater
development, but sadly almost none have been adapted to the PC/laptop.
For a professional translator/writer, it is next to impossible to work on a
"Of all the existing options so far, the handwriting mode
(Google Translate) looks to be the best, as this is the fastest and most
efficient way of writing Indian languages. But again, it is not really
useful to translators unless it becomes a common tool (touch-screen laptops
"I'm also a fan of the speech-to-text option
offered by Google (supported for most Indic languages), although it is
still a bit shaky and cannot be used in any other tools except Google
"Coming to Indic OCR tools, they are still in a very
nascent stage and don't work most of the time, but I believe this could be
a cost-effective and easier option for translators (to handwrite a
translation and OCR it, because at the current speed of transliterating,
writing seems faster).
"[And] I forgot to add that there are not even decent
spelling/grammar checkers for Indian languages, including Hindi. Given that
the Indian market is now increasingly calling for localization, especially
in e-commerce and content, it makes sense to build/integrate suitable tools
That last sentiment is likely something we all could agree
Almost exactly 10 years ago, I
suggested a stacked fee schedule determined by the degree to which
technology supports any given language. It was supposed to look like this:
"There would be three different levels of languages.
Level 1 languages would include languages with full support in areas like
voice recognition, optical character recognition, seamless support by
translation environment tools, support by major online dictionaries and/or
other language resources, and spell- and grammar-checkers. Level 2
languages would include those that are missing one or two of the tools
listed above, and Level 3 languages would be those that lack more than two
of those same enablers.
"The fee scale would be calculated like this: If you
translate between two different Level 1 languages, you would earn 10% less
per word than if you had a Level 2 language involved. With two Level 2
languages you would make yet another 10% more per word, and this would be the
same as what a translator between a Level 1 and a Level 3 language would
make.... You get the point.
"Can you imagine the rejoicing among our colleagues who
translate Level 3 languages like Amharic, Inuktitut, Haitian Creole, or
many other technically less-supported languages? Even translators of Level
2 languages like Arabic, Urdu, or Hebrew probably would not mind such a
system. Only folks like me, who translate between two Level 1
languages -- English and German -- would be left in the dust."
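To make the arithmetic of that proposal concrete, here is a minimal Python sketch. The enabler names, the function names, and the choice to add a flat 10% of the base rate per step are assumptions made for illustration only; the quoted text fixes just the three levels and the 10% steps.

```python
# Illustrative sketch of the stacked fee idea; all names here are hypothetical.

# The five technology "enablers" listed in the proposal.
ENABLERS = frozenset({
    "voice_recognition",
    "ocr",
    "translation_environment_tools",
    "online_dictionaries",
    "spell_grammar_checkers",
})


def language_level(supported):
    """Return 1, 2, or 3 depending on how many enablers a language lacks."""
    missing = len(ENABLERS - set(supported))
    if missing == 0:
        return 1          # full support
    if missing <= 2:
        return 2          # one or two enablers missing
    return 3              # more than two enablers missing


def rate_multiplier(level_a, level_b, step=0.10):
    """Multiplier on the Level 1/Level 1 base rate.

    Assumption: every level above 1, on either side of the pair,
    adds one flat step of 10% of the base rate.
    """
    steps = (level_a - 1) + (level_b - 1)
    return round(1 + step * steps, 2)


if __name__ == "__main__":
    for pair in [(1, 1), (1, 2), (2, 2), (1, 3), (3, 3)]:
        print(pair, rate_multiplier(*pair))
```

Under these assumptions a Level 2/Level 2 pair and a Level 1/Level 3 pair both land on the same multiplier, which is the equivalence the proposal describes. A compounding step would be an equally defensible reading of the text.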
It was not meant as a completely serious suggestion -- after
all, it's not how a market economy works -- but it gives the many of us who
do work in "Level 1" languages a greater appreciation of how
blessed (in the sense of "receiving something undeserved") we
are, and how much sense it would make to -- if possible -- aid translators
in those other language combinations to be more productive.
So what can be done?
Some of the still relevant suggestions that I had back then
universities or other non-commercial entities that may have developed
solutions but not released them to the general public.
- Contact independent
developers of existing tools and ask them what it would take to add
support for your language.
- Find out
what kind of grants might be available to support private development
Another item that I would add today is to create a place
to collect (and verify) some of the needs that Subhashri mentions above.
This would have a number of benefits: a) In some cases there actually might be
solutions (or workarounds) that individuals are not aware of. b) It would
alert developers of all kinds to needs they otherwise wouldn't even be
aware of. And c) it would create a community that could pool its resources
to create more muscle and options than its individual members possess.
I have created a "topic"
in the Language
Technology Wiki that could be a starting point. So come all ye who
feel (justifiably) disadvantaged and let the world and each other know what
2. Augmented Translation
Common Sense Advisory (CSA) coined the term "augmented translation"
some time back, and while I always felt that I understood what it meant, I
never really looked into the specifics. Now I have and I think it's an
interesting term that we would all do well to be acquainted with (whether
we like the term or not).
Merriam-Webster defines "augmented" as "made greater, larger, or more
complete" and "augmented reality" (the context where most of
us have encountered "augmented"
recently, not to be confused with "virtual reality") as "an
enhanced version of reality created by the use of technology to overlay
digital information on an image of something being viewed through a
Though it may sound a little spacey and very unlike "crotchety" St. Jerome's style of translation (Happy International
Translation Day, everyone!!!), "augmented translation" might
still be a pretty good fit. After all, we do use a lot of digital
information to aid us in our translation, and we've done so for a long
time. The difference between now and, say, five years ago when the term
wasn't in use is that there is more digital data in additional formats
available to us now than there was previously.
After a number of emails back and forth between CSA and myself
(regarding when "augmentation" starts and how valid this concept
is across language combinations), they were kind enough to send me this
"CSA Research contends that augmented translation isn't
completely new. Translators have long used web searches, external
terminology databases, product information repositories and other such
items, but working with these required stepping out of the translation
environment to carry out some action, thus disrupting their cadence. The
difference with augmented translation is the degree of integration with
external informational resources, such as term discovery, outside MT, and
semantic linking within the tool, which prevents the need to leave and hunt
"The part the market research firm emphasizes is that
this does put the translator back in the center. In it, MT and all the
other parts are suggestions designed to add additional, contextually
relevant information, but the translator is in control. CSA Research chose
the term 'augmented translation' on the model of industrial 'augmented
reality' tools that allow you to take a tablet or phone or goggles and view
a device or assembly and have overlays of things like sensory data, part
names, part numbers, disassembly or assembly instructions, or diagnostic
steps you should take."
Spacey? A bit. But I like it. Quite a bit, actually. I love
the fact that the
astronaut translator is in the center and uses
just the tools that help them drive the process. This is also well
exemplified by this image (also generously provided by CSA):
We have been using terminology management, project
management (whether on our own or through the client), and translation
memory for a long time. "Adaptive Neural MT" is also a reality
for many of us, and (hurray!) it's not driving the process; instead, it's
just one more valuable tool for the translator -- not primarily through
post-editing but as a resource that can be harvested for fragments of
various sizes. (I probably would have said "Adaptive Neural MTs"
just to emphasize that there could be more than one engine at a time -- but
then you'd have to add a plural ending to TMs and termbases as well.)
Automated Content Enrichment (ACE) is defined
as "a new technology that scans content to identify the concepts,
dates, places, and other information in it, then link them to online
resources" -- i.e., manual web research in an automated fashion. Not
something that many of us deal with a lot at this point, but I can
certainly understand and appreciate that this is gaining importance.
And here's a good example of where I'd want to make a point if
I were introducing someone to the concept of augmented translation. All translators
do it to some degree, but no one does it as much as is possible,
particularly because "possible" is an ever-moving target. In
reality, there is such a wide variety of approaches to translation and
translation technology (whether by our choosing or because some
technologies just might not be available or accessible in our language
combination or geographical location) that we all have to put on the
spacesuit that fits us. And sometimes it might be only a boot or two.
3. The Tech-Savvy Interpreter: What Kind of Interpreting Work Is There in the Cloud? (Column by Barry Slaughter Olsen)
This is my first column after a much-needed summer break.
While I was on vacation, innovation in the interpreting technology space
continued to charge ahead. To be sure, there are still plenty of software
programs to explore and new interpreting delivery platforms to test. But
one question has been in the back of my mind for some time now: just how
much remote interpreting work is there in the cloud?
It's a valid question that interpreters are rightfully
interested in. Unfortunately, it's not a question that I can answer because
the data hasn't been collected. But in early September, I reached out to
seven different cloud-based remote interpreting companies of the 15 or so I
am tracking and invited them to provide me with the number of interpreted
events they held on their platforms from January 2016 to August 31, 2018.
Four of the seven companies provided the data I asked for. So,
what follows is just a snapshot of some of the cloud-based remote
interpreting taking place today and should not be considered a
comprehensive survey. I have plotted the data out on the graph below. Each
line represents a type of interpreted event and not the number of
interpreted events hosted on a particular platform. In other words, as a
general rule, platforms can host more than one kind of interpreted event.
What constitutes an interpreted event? I divided them into
four different categories:
- Over-the-phone consecutive (OPI Consec)
- Over-the-phone simultaneous (OPI Simul)
- Webinars with remote simultaneous (Webinars)
- Web meetings with remote simultaneous (Web Conferences)
Over-the-phone consecutive here refers exclusively to what I
call "high-value consecutive" or consecutive interpreting that
usually pays between US$100.00 and US$200.00 per hour and requires training
in long consecutive note-taking skills. Clients are usually from the
international finance sector or government entities. These interpreted
events usually last between 30 and 90 minutes.
Of the four groups, OPI Consec has grown the most from 2016 to
August 31, 2018. This is a market segment that many interpreters are
unaware of and that requires significant domain knowledge and consecutive
Among these four companies, over the three-year period, there
were 2,819 OPI Consec events with 1,038 in 2017 and 1,664 through August
31, 2018, which shows a 2017-2018 year-over-year growth rate through August
31 of 164%. With four months remaining in 2018, and the direction of the
trend line, the final growth rate of this group will likely be even
larger. The significantly higher growth of this high-value
consecutive interpreting vis-à-vis all other types of simultaneous may be
indicative of customers not realizing that remote simultaneous could
greatly improve their meeting flow.
Over-the-phone simultaneous here refers to simultaneous
interpretation for bilingual or multilingual conference calls or audio
conferences. In these interpreted events participants are usually on a
multi-channel audio bridge that makes simultaneous interpretation possible,
although in some cases, these conference calls are conducted using more
than one telephone line (and interpreters juggling two different phones as
they interpret). The length of these events usually varies from 30 minutes
to two hours but can go longer in some cases.
The number of OPI Simul events over the period measured
(January 1, 2017 - August 31, 2018) has remained roughly constant: 376 interpreted
events in 2017 and 232 so far in 2018. No data was submitted for 2016. The
trend line for this group is basically flat but may show a slight
year-over-year increase by the end of 2018.
In both OPI Consec and OPI Simul, neither the participants nor
the interpreters can see each other. With no visual input and the limited
frequency response of some phones, these two modes are arguably the most
difficult and taxing on the interpreter. However, the term OPI is, in some
cases, a misnomer, as this kind of interpreting is delivered with
increasing frequency using voice over Internet
protocol (VoIP) technology that does not entail
the use of traditional telephony and has an expanded frequency response
range, which makes for higher fidelity audio.
Webinars, or seminars conducted online, are typically
presentations with a talking head in one corner of the screen and
presentation slides taking up most of the remaining screen space. They
usually include one speaker talking to many online audience members.
Webinar platforms usually include a chat function for questions and answers
and other kinds of interaction. They have become a huge part of
training, sales and public relations in both the public and private
sectors. Given the potential international audience for webinars, providing
remote simultaneous interpretation for these web events seems like a no-brainer.
Oddly enough, this is the only line on the graph that appears
to be trending downward over the period covered for these four companies
surveyed. In 2016, there were 200 webinars with simultaneous
interpretation; in 2017, 132; and through August 31, 2018, only 109.
It will be interesting to see whether the demand for multilingual webinars grows
as the technology evolves and more potential users learn about the service
or whether they simply prove to be a short-lived novelty. There are a couple of
reasons why this market segment may be flat. First, there are still too few
players in the space, and second, remote simultaneous platforms cannot
integrate easily with existing monolingual webinar platforms that dominate
the market, and the main webinar organizers are reluctant to switch to a
new platform just for the multilingual capability it may offer.
I define a web meeting as one where all participants
(including the interpreters) are connected through a web conferencing
platform, as opposed to a meeting where most participants and the
interpreters are in the same physical space and a remote participant may be
connected via some web conferencing service. Web meetings truly take place
in the cloud with participants and interpreters distributed
geographically. The use of web conferencing services has grown
dramatically in recent years as these services have migrated to the cloud
and no longer require expensive proprietary video and audio equipment.
Among the surveyed companies, Multilingual Web Meetings are
showing a clear upward trend over the three-year period. There were 44 in
2016, 135 in 2017, and 215 through August 31, 2018. The year-over-year
growth rate between 2017 and 2018, as of August 31, was a notable 140%.
So, my initial question of just how much remote interpreting
work there is in the cloud remains unanswered, but I do have a partial answer.
It's important to keep in mind that the data in this graph
represent the operations of just four startup companies. The law
of small numbers is definitely applicable. Even so, these four types of
remote interpreting in the cloud are a growing segment of the market. I
would like to see a more exhaustive study covering a larger number of
platform providers in the future.
The total number of interpreted events over the period studied
was 4,415, which comes out to an average of 138 events per month. Again,
not huge in the grand scheme of things, but significant, especially if you
are one of the interpreters hired to do the work. It's also worth noting
that none of these interpreted interactions are replacing face-to-face
interpreting assignments in conference, court or medical interpreting but
they are increasing the overall volume of interpreting work.
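The monthly average above is simple arithmetic, and a short Python sketch makes the calculation explicit. The function names are hypothetical, and the sketch assumes the period runs over whole calendar months from January 2016 through August 2018.

```python
from datetime import date


def months_inclusive(start, end):
    """Count calendar months from start's month through end's month."""
    return (end.year - start.year) * 12 + (end.month - start.month) + 1


def average_per_month(total_events, start, end):
    """Average number of events per calendar month over the period."""
    return total_events / months_inclusive(start, end)


if __name__ == "__main__":
    start, end = date(2016, 1, 1), date(2018, 8, 31)
    print(months_inclusive(start, end))                 # number of months covered
    print(round(average_per_month(4415, start, end)))   # rounded monthly average
```

With 4,415 events spread over the 32 months from January 2016 through August 2018, the rounded result is the 138 events per month cited above.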
For context, the global web conferencing market is forecast to
grow at a compound
annual growth rate (CAGR) of 10% between 2017 and
2024, and reach US$8.82 billion, according to one
study. Web conferencing is big and will only get bigger. I don't
have a dollar figure I can assign to the interpreted events on cloud-based
platforms included in this article, but it's safe to say they are a very
small drop in an enormous and growing bucket. If interpreting wants to be a
part of the new communication paradigm, we need these multilingual
platforms that will allow us to work professionally in the
Do you have a question about a specific technology? Or would
you like to learn more about a specific interpreting platform, interpreter
console or supporting technology? Send us an email at firstname.lastname@example.org.
A couple of weeks ago I talked with David Canek from Memsource (a company that
has experienced exceptional growth, especially last year, and now boasts 90
employees) about an AI-driven (artificial intelligence) feature that will
be released at the beginning of October (some of you might have also seen this teaser
that I put out on Twitter).
We've all been talking a lot about AI and translation, but
almost without fail only in regard to neural machine translation (NMT).
However, there are many more ways to use advanced cloud computing in the
process of translation, and Memsource is particularly interested in
exploring some of them. Eight months ago I reported on their first
non-NMT AI feature: recognition of language combination-specific
non-translatables. Now, the next AI feature will be MTQE (or "machine
translation quality estimation"). MTQE is a per-segment process that
uses language combination-specific data collected across users to estimate
the quality of an MT suggestion. It gives the MT-translated segment a
"fuzziness percentage," a process similar to its translation
memory matches. This works (after its unveiling in October) for 70 language
That's a lot of information, so let's look at it more closely.
First of all, while the system uses the scoring system that we
know from translation memory matches, the percentages describe something
very different. In the case of TM matches, the percentage number describes
the grade of similarity to a segment in the TM (100% being exactly the same
and anything less than that being gradually different). The MT match provided
by Memsource is a probability score calculated via a neural network that
looks at post-editing history and estimates the likelihood of its being
correct. Totally different thing. While a 100% TM match will always be
correct (not necessarily in the specific translation project, but as far as
its similarity to an existing translation), a 100% MT match has a high
probability of correctness according to Memsource statistics, which are
likely well informed but not guaranteed (and certainly not for your specific
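To illustrate what a TM percentage measures, here is a minimal Python sketch that scores how similar a new source segment is to one stored in a translation memory. It uses the standard library's difflib.SequenceMatcher purely as a stand-in; this is an assumption for illustration and not how Memsource or any other tool actually calculates fuzzy matches. The function name is hypothetical.

```python
from difflib import SequenceMatcher


def tm_match_percent(new_segment, tm_source):
    """Similarity (0-100) between a new source segment and a stored TM source segment.

    This only measures how close the two source strings are; it says
    nothing about whether the stored translation is correct.
    """
    ratio = SequenceMatcher(None, new_segment, tm_source).ratio()
    return round(100 * ratio)


if __name__ == "__main__":
    print(tm_match_percent("Click the Save button.", "Click the Save button."))
    print(tm_match_percent("Click the Save button.", "Click the Cancel button."))
```

An MTQE score cannot be reproduced this way, because there is no stored segment to compare against. It is the output of a trained model estimating how likely the MT suggestion is to need no editing.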
And then there is the "collected across users." Memsource
is a cloud-based application. By signing its Terms
of Services agreement ("We will use Your data to train machine
learning models"), users agree to have their data analyzed for the
larger purpose of enabling services like the above-mentioned AI services.
While this setting can be switched off, the default is to have it
activated. It's an interesting concept that some will find hard to stomach
and others will welcome as a move toward modern cloud computing.
You can use this feature as part of a pretranslation batch
process or in real time, segment by segment. As I said above, this works on
a per-segment level and theoretically also with multiple machine
translation engines attached from which the best "match" is
picked (for an impressive list of engines supported by Memsource, see
here). However, the multi-engine feature will not be available in this
Unlike the first AI feature -- the non-translatables -- this
is a feature that requires extra payment. (David said that it will likely
be comparable to what Microsoft charges for its MT engine, which is $10 per
million characters.) This would have to be added on top of the fee that
might have to be paid to the MT engine provider.
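For a rough sense of scale, the per-character pricing can be sketched in a few lines of Python. The function name is hypothetical, and the default of $10 per million characters is only the comparison figure mentioned above, not a confirmed Memsource price.

```python
def mtqe_cost_usd(characters, price_per_million=10.0):
    """Estimated charge in USD for scoring the given number of characters."""
    return characters / 1_000_000 * price_per_million


if __name__ == "__main__":
    # A hypothetical job of 250,000 characters.
    print(mtqe_cost_usd(250_000))
```

Any fee charged by the MT engine provider would come on top of this figure.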
What are the savings then? According to pilot programs, the
number of words that are 85% or above in the MT probability rating plus the
non-translatables range from approximately 5% (EN>JP -- a language
combination with very few non-translatables of course) to 10% or slightly
above in EN>ES and RU>EN. So this could prove to be a solid time
And for those who wrinkle their noses at including
"85%" in those numbers, an 85% MT "match" could very
well be "correct," unlike a TM match of the same percentage,
which is incorrect for sure. Of course, the question that still has to be
answered is what it takes for a translator to evaluate that. In the case of
the TM match, the difference is already marked and only has to be
"fixed." The MT suggestion, on the other hand, has to be
evaluated from scratch.
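The kind of savings figure quoted from the pilot programs can be approximated with a small Python sketch that computes the share of words sitting in segments at or above a score threshold. The function name, the (word count, score) data shape, and the sample numbers are all made up for illustration. Non-translatables, which the pilot figures also include, are not modeled here.

```python
def share_of_words_at_or_above(segments, threshold=85):
    """Fraction of all words that sit in segments scoring at or above the threshold.

    segments: iterable of (word_count, score_percent) pairs.
    """
    segments = list(segments)
    total_words = sum(words for words, _ in segments)
    if total_words == 0:
        return 0.0
    qualifying = sum(words for words, score in segments if score >= threshold)
    return qualifying / total_words


if __name__ == "__main__":
    # Hypothetical project: three segments with invented scores.
    sample = [(10, 90), (30, 60), (60, 84)]
    print(share_of_words_at_or_above(sample))
```

In this invented sample, only the 10-word segment clears the 85% threshold, so 10% of the words would count toward the savings.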
The whole MTQE feature, by the way, is also available through
Memsource's API, so it is possible to use it within completely different
and independent translation environment tools.
5. This 'n' That
1. If you're going to the ATA conference in New Orleans and
have dictionaries (yes, those paper things) that you're no longer using,
please bring them and donate them to colleagues who would benefit from
them. There's going to be a table where you can offer them as gifts (and
where you can take some that you might want for yourself to fill those now
empty spaces in your luggage).
If you're not planning to come and have dictionaries that
you'd like to share, please contact me and I'll let you know how to send
them so I can take them to said table.
2. Please (!) follow @translationtalk on Twitter. It has been really, really
amazing so far, and it's just getting better with every week. Just in case
you still don't know what it is: It's a Twitter account for the whole
community of translators and interpreters where every week someone else has
a say. I've already learned a lot and so will you.
3. In the last Tool Box Journal I mentioned the new
edition of my Translator's Tool Box
ebook (version 13.5). It's a rich and exhaustive resource that might very
well become your favorite go-to resource when it comes to questions of how
to optimize your work with your computer. For the rest of this month (just
a couple more days!), I'm offering a package of the Translator's Tool
Box ebook (value $50), my new Translation
Matters book in PDF format (value $9.95), and a one-year
subscription to the Premium edition of the Tool Box Journal (value
$17) for $30. (Just enter "30" right here.)
Of course, if you've already purchased version 13 of my book,
just use the same download information and password to get this new version
for free. If you have any other earlier version of the book and would like
this new version, you can purchase it for the upgrade price of $25 right here as well.
The Last Word on the Tool Box Journal
If you would like to promote this journal by placing a link on
your website, I will in turn mention your website in a future edition of
the Tool Box Journal. Just paste the code you find here into
the HTML code of your webpage, and the little icon that is displayed on
that page with a link to my website will be displayed.
If you are subscribed to this journal with
more than one email address, it would be great if you could unsubscribe
redundant addresses through the links Constant Contact offers below.
Should you be interested in reprinting one of the articles in
this journal for promotional purposes, please
contact me for information about pricing.
© 2018 International Writers' Group