You can view earlier editions of the Tool Box Journal going all the way back to 2007
in the archives to which you have access if you support my work on the Journal.


 A computer journal for translation professionals


Issue 18-9-292
(the two hundred ninety-second edition)

Contents

1. All Translators Are Not Equal

2. Augmented Translation

3. The Tech-Savvy Interpreter: What Kind of Interpreting Work Is There in the Cloud?

4. MTQE

5. This 'n' That

The Last Word on the Tool Box Journal

Poem to the First Generation of People to Exist After the Death of the English Language (by Billy Collins)

I'm not going to put a lot of work into this
because you won't be able to read it anyway,
and I've got more important things to do
this morning, not the least of which
is to try to write a fairly decent poem
for the people who can still read English.

Who could have foreseen English finding
a place in the cemetery of dead languages?

I once imagined English placing flowers
at the tombstones of its parents, Latin and Anglo-Saxon,
but you people can actually visit its grave
on a Sunday afternoon if you still have days of the week.

I remember the story of the last speaker
of Dalmatian being tape-recorded in his hut
as he was dying under a horse-hair blanket.

But English? English seemed for so many of us
the only true way to describe the world
as if reality itself were English
and Adam and Eve spoke it in the garden
using words like snake, apple, and perdition.

Of course, there are other words for things
but what could be better than boat,
pool, swallow (both the noun and the verb),
statuette, tractor, squiggly, surf, and underbelly?

I'm sorry.
I've wasted too much time on this already.
You carry on however you do
without the help of English, communicating
with dots in the air or hologram hats or whatever.
You're just like all the ones who say
they can't understand poetry
but at least you poor creatures have an excuse.

So I'm going to turn the page
and not think about you and your impoverishment.
Instead, I'm going to write a poem about red poppies
waving by the side of the railroad tracks,
and you people will never even know what you're missing.

 

You can find this and other amazing poems by Billy Collins right here.

1. All Translators Are Not Equal

In the last two editions of the Tool Box Journal I mentioned OCR (optical character recognition) systems for languages that support writing systems used in India. For a long time I've been very aware of my privileged situation as an English-into-German translator in regard to translation technology (more on that later), and in that spirit I felt those mentions were relevant.

Following those comments I received a note from Subhashri D V, a translator in Bangalore, India, who works in Indian languages. She said this:

"Your brief remarks in the last two journals about Indian language tools, OCR, etc., spurred me to write to you about the difficulties in Indian language translations thanks to the chronic lack of tools.

"Of course, Google is playing savior by incorporating most Indian languages in its latest tools and initiatives. But, as compared to the variety of tools, databases, etc., available/supported for many other languages, our scene is dismal. The Indian government/private companies are naturally also to be blamed for this situation, for not vesting enough interest in developing resources for our languages. I believe the government does have a head start in this matter, and in spite of the recent efforts to propagate these multilingual computing resources, they are not readily accessible or good enough to be used by professionals.

"That aside, I would also like to highlight the glaring inadequacy of typing tools for Indian languages. The most commonly used tools for typing in Indian languages right now are the Indic language transliteration tools (MS, Google, etc.). Inscript keyboards/layouts are available but rarely used, as most people start off with an English keyboard and it is difficult to 'learn' a new one for 'each' language (most Indians are multilingual). As you may already know, there is a world of difference in the way in which English and Indian language characters (Devanagari and other South Indian languages) are written. As a simple example, there are no half letters in English, while they abound in Indic scripts. The transliteration method is ineffective for professionals because it reduces typing speed greatly and leads to a number of errors. Typically Indian words are long, and transliterating them into English requires many double aa, oo, uu, ii (or a mix of capital and small letters if written phonetically), and then one still has to 'select' the correct word from the many options displayed by the software. This is a cumbersome process, even though most of us are now used to it for want of another option.

"On the other hand, I think Indic typing tools for mobiles have seen greater development, but sadly almost none have been adapted to the PC/laptop. For a professional translator/writer, it is next to impossible to work on a mobile.

"Of all the existing options so far, the handwriting mode (provided in Google Translate) looks to be the best, as this is the fastest and most efficient way of writing Indian languages. But again, it is not really useful to translators unless it becomes a common tool (touch-screen laptops are expensive).

"I'm also a fan of the speech-to-text option offered by Google (supported for most Indic languages), although it is still a bit shaky and cannot be used in any other tools except Google Docs.

"Coming to Indic OCR tools, they are still in a very nascent stage and don't work most of the time, but I believe this could be a cost-effective and easier option for translators (to handwrite a translation and OCR it, because at the current speed of transliterating, writing seems faster).

"[And] I forgot to add that there are not even decent spelling/grammar checkers for Indian languages, including Hindi. Given that the Indian market is now increasingly calling for localization, especially in e-commerce and content, it makes sense to build/integrate suitable tools for translators."

That last sentiment is likely something we all could agree with.

Almost exactly 10 years ago, I suggested a stacked fee schedule determined by the degree to which technology supports any given language. It was supposed to look like this:

"There would be three different levels of languages. Level 1 languages would include languages with full support in areas like voice recognition, optical character recognition, seamless support by translation environment tools, support by major online dictionaries and/or other language resources, and spell- and grammar-checkers. Level 2 languages would include those that are missing one or two of the tools listed above, and Level 3 languages would be those that lack more than two of those same enablers.

"The fee scale would be calculated like this: If you translate between two different Level 1 languages, you would earn 10% less per word than if you had a Level 2 language involved. With two Level 2 languages you would make yet another 10% per word, and this would be the same as what a translator between a Level 1 and a Level 3 language would make.... You get the point.

"Can you imagine the rejoicing among our colleagues who translate Level 3 languages like Amharic, Inuktitut, Haitian Creole, or many other technically less-supported languages? Even translators of Level 2 languages like Arabic, Urdu, or Hebrew probably would not mind such a system. Only folks like me, who translate between two Level 1 languages-English and German-would be left in the dust."

It was not meant as a completely serious suggestion -- after all, it's not how a market economy works -- but it gives the many of us who do work in "Level 1" languages a greater appreciation of how blessed (in the sense of "receiving something undeserved") we are, and how much sense it would make to -- if possible -- aid translators in those other language combinations to be more productive.
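For anyone who wants to see how the tiers in that tongue-in-cheek fee scale would stack up, here is a minimal Python sketch. It assumes a literal reading of the quote: every step down in technology support (from Level 1 to Level 2, or from Level 2 to Level 3, on either side of the language pair) moves the pair up one tier, and each tier earns 10% less per word than the tier above it. The function names and the base rate are hypothetical and only there for illustration.

```python
def tier(level_a: int, level_b: int) -> int:
    """Count how many tiers a language pair sits above Level 1 <-> Level 1."""
    for level in (level_a, level_b):
        if level not in (1, 2, 3):
            raise ValueError("levels must be 1, 2, or 3")
    return (level_a - 1) + (level_b - 1)


def rate_per_word(base_rate: float, level_a: int, level_b: int) -> float:
    """Per-word rate, assuming each tier pays 10% less than the tier above it.

    base_rate is the (hypothetical) rate for a Level 1 <-> Level 1 pair.
    """
    return base_rate / (0.9 ** tier(level_a, level_b))


if __name__ == "__main__":
    base = 0.10  # hypothetical per-word rate for two Level 1 languages
    for pair in [(1, 1), (1, 2), (2, 2), (1, 3), (2, 3), (3, 3)]:
        print(pair, "tier", tier(*pair), "rate", round(rate_per_word(base, *pair), 4))
```

Under this reading, a pair of two Level 2 languages lands in the same tier as a Level 1/Level 3 pair, which is the equivalence the quote describes.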

So what can be done?

Some of the still relevant suggestions that I had back then were these:

  • Contact universities or other non-commercial entities that may have developed solutions but not released them to the general public.
  • Contact independent developers of existing tools and ask them what it would take to add support for your language.
  • Find out what kind of grants might be available to support private development activities.

Other items that I would add today would be to create a place to collect (and verify) some of the needs that Subhashri mentions above. This would have a number of benefits: a) In some cases there actually might be solutions (or workarounds) that individuals are not aware of. b) It would alert developers of all kinds to needs they otherwise wouldn't even be aware of. And c) it would create a community that could pool its resources to create more muscle and options than its individual members possess.

I have created a "topic" in the Language Technology Wiki that could be a starting point. So come all ye who feel (justifiably) disadvantaged and let the world and each other know what you need!

 

ADVERTISEMENT

Across Quick Tutorials

Are you new to the Across Translator Edition? If so, take a look at our new YouTube channel. The channel features various tips and tricks to help you get started.

Go to across.net/youtube.

 

2. Augmented Translation

Common Sense Advisory (CSA) coined the term "augmented translation" some time back, and while I always felt that I understood what it meant, I never really looked into the specifics. Now I have, and I think it's an interesting term that we'd all do well to be acquainted with (whether we like the term or not).

Merriam-Webster defines "augmented" as "made greater, larger, or more complete" and "augmented reality" (the context where most of us have encountered "augmented" recently, not to be confused with "virtual reality") as "an enhanced version of reality created by the use of technology to overlay digital information on an image of something being viewed through a device."

Though it may sound a little spacey and very unlike "crotchety" St. Jerome's style of translation (Happy International Translation Day, everyone!!!), "augmented translation" might still be a pretty good fit. After all, we do use a lot of digital information to aid us in our translation, and we've done so for a long time. The difference between now and, say, five years ago when the term wasn't in use is that there is more digital data in additional formats available to us now than there was previously.

After a number of emails back and forth between CSA and myself (regarding when "augmentation" starts and how valid this concept is across language combinations), they were kind enough to send me this definition:

"CSA Research contends that augmented translation isn't completely new. Translators have long used web searches, external terminology databases, product information repositories and other such items, but working with these required stepping out of the translation environment to carry out some action, thus disrupting their cadence. The difference with augmented translation is the degree of integration with external informational resources, such as term discovery, outside MT, and semantic linking within the tool, which prevents the need to leave and hunt for information.

"The part the market research firm emphasizes is that this does put the translator back in the center. In it, MT and all the other parts are suggestions designed to add additional, contextually relevant information, but the translator is in control. CSA Research chose the term 'augmented translation' on the model of industrial 'augmented reality' tools that allow you to take a tablet or phone or goggles and view a device or assembly and have overlays of things like sensory data, part names, part numbers, disassembly or assembly instructions, or diagnostic steps you should take."

Spacey? A bit. But I like it. Quite a bit, actually. I love the fact that the astronaut translator is in the center and uses just the tools that help them drive the process. This is also well exemplified by this image (also generously provided by CSA):

Augmented Translation

We have been using terminology management, project management (whether on our own or through the client), and translation memory for a long time. "Adaptive Neural MT" is also a reality for many of us, and (hurray!) it's not driving the process; instead, it's just one more valuable tool for the translator -- not primarily through post-editing but as a resource that can be harvested for fragments of various sizes. (I probably would have said "Adaptive Neural MTs" just to emphasize that there could be more than one engine at a time -- but then you'd have to add a plural ending to TMs and termbases as well.)

Automated Content Enrichment (ACE) is defined as "a new technology that scans content to identify the concepts, dates, places, and other information in it, then link them to online resources" -- i.e., manual web research in an automated fashion. Not something that many of us deal with a lot at this point, but I can certainly understand and appreciate that this is gaining importance.

And here's a good example of where I'd want to make a point if I were introducing someone to the concept of augmented translation. All translators do it to some degree, but no one does it as much as is possible, particularly because "possible" is an ever-moving target. In reality, there is such a wide variety of approaches to translation and translation technology (whether by our choosing or because some technologies just might not be available or accessible in our language combination or geographical location) that we all have to put on the spacesuit that fits us. And sometimes it might be only a boot or two. 

 

ADVERTISEMENT

The SDL Trados Roadshow is back - Book your free seat

The SDL Trados Roadshow returns this October, visiting 17 cities across Europe and North America. Come and meet the SDL Trados team, network with translation professionals from across the whole supply chain and hear about the latest industry news and trends.

Book your free seat today >>  

 

3. The Tech-Savvy Interpreter: What Kind of Interpreting Work Is There in the Cloud? (Column by Barry Slaughter Olsen)

This is my first column after a much-needed summer break. While I was on vacation, innovation in the interpreting technology space continued to charge ahead. To be sure, there are still plenty of software programs to explore and new interpreting delivery platforms to test. But one question has been in the back of my mind for some time now: just how much remote interpreting work is there in the cloud?

It's a valid question that interpreters are rightfully interested in. Unfortunately, it's not a question that I can answer because the data hasn't been collected. But in early September, I reached out to seven different cloud-based remote interpreting companies of the 15 or so I am tracking and invited them to provide me with the number of interpreted events they held on their platforms from January 2016 to August 31, 2018.

Four of the seven companies provided the data I asked for. So, what follows is just a snapshot of some of the cloud-based remote interpreting taking place today and should not be considered a comprehensive survey. I have plotted the data out on the graph below. Each line represents a type of interpreted event and not the number of interpreted events hosted on a particular platform. In other words, as a general rule, platforms can host more than one kind of interpreted event.  

Remote Interpreting

 

What constitutes an interpreted event? I divided them into four different categories:

  • Over-the-phone consecutive (OPI Consec)
  • Over-the-phone simultaneous (OPI Simul)
  • Webinars with remote simultaneous (Webinars)
  • Web meetings with remote simultaneous (Web Conferences)

OPI Consec

Over-the-phone consecutive here refers exclusively to what I call "high-value consecutive," or consecutive interpreting that usually pays between US$100.00 and US$200.00 per hour and requires training in long consecutive note-taking skills. Clients are usually from the international finance sector or government entities. These interpreted events usually last between 30 and 90 minutes.

Of the four groups, OPI Consec has grown the most from 2016 to August 31, 2018. This is a market segment that many interpreters are unaware of and that requires significant domain knowledge and consecutive interpreting skills.

Among these four companies, over the three-year period, there were 2,819 OPI Consec events with 1,038 in 2017 and 1,664 through August 31, 2018, which shows a 2017-2018 year-over-year growth rate through August 31 of 164%. With four months remaining in 2018, and the direction of the trend line, the final growth rate of this group will likely be even larger.  The significantly higher growth of this high-value consecutive interpreting vis-à-vis all other types of simultaneous may be indicative of customers not realizing that remote simultaneous could greatly improve their meeting flow.

OPI Simul

Over-the-phone simultaneous here refers to simultaneous interpretation for bilingual or multilingual conference calls or audio conferences. In these interpreted events participants are usually on a multi-channel audio bridge that makes simultaneous interpretation possible, although in some cases, these conference calls are conducted using more than one telephone line (and interpreters juggling two different phones as they interpret). The length of these events usually varies from 30 minutes to two hours but can go longer in some cases.

The number of OPI Simul events over the period measured (January 1, 2017 - August 31, 2018) has remained constant: 376 interpreted events in 2017 and 232 so far in 2018. No data was submitted for 2016. The trend line for this group is basically flat but may show a slight year-over-year increase by the end of 2018.

In both OPI Consec and OPI Simul, neither the participants nor the interpreters can see each other. With no visual input and the limited frequency response of some phones, these two modes are arguably the most difficult and taxing on the interpreter. However, the term OPI is, in some cases, a misnomer, as this kind of interpreting is delivered with increasing frequency using voice over Internet protocol (VoIP) technology that does not entail the use of traditional telephony and has an expanded frequency response range, which makes for higher fidelity audio.

Webinars

Webinars, or seminars conducted online, are typically presentations with a talking head in one corner of the screen and presentation slides taking up most of the remaining screen space. They usually include one speaker talking to many online audience members. Webinar platforms usually include a chat function for questions and answers and other kinds of interaction. They have become a huge part of training, sales and public relations in both the public and private sectors. Given the potential international audience for webinars, providing remote simultaneous interpretation for these web events seems like a no-brainer.

Oddly enough, this is the only line on the graph that appears to be trending downward over the period covered for the four companies surveyed. In 2016, there were 200 webinars with simultaneous interpretation; in 2017, 132; and through August 31, 2018, only 109. It will be interesting to see whether the demand for multilingual webinars grows as the technology evolves and more potential users learn about the service, or whether they simply prove to be a short-lived novelty. There are a couple of reasons why this market segment may be flat. First, there are still too few players in the space, and second, remote simultaneous platforms cannot integrate easily with existing monolingual webinar platforms that dominate the market, and the main webinar organizers are reluctant to switch to a new platform just for the multilingual capability it may offer.

Web Meetings

I define a web meeting as one where all participants (including the interpreters) are connected through a web conferencing platform, as opposed to a meeting where most participants and the interpreters are in the same physical space and a remote participant may be connected via some web conferencing service. Web meetings truly take place in the cloud with participants and interpreters distributed geographically. Web conferencing services have grown dramatically in recent years as these services have migrated to the cloud and no longer require expensive proprietary video and audio equipment.

Among the surveyed companies, Multilingual Web Meetings are showing a clear upward trend over the three-year period. There were 44 in 2016, 135 in 2017, and 215 through August 31, 2018. The year-over-year growth rate between 2017 and 2018, as of August 31, was a notable 140%.

My Take

So, my initial question of just how much remote interpreting work there is in the cloud remains unanswered, but I do have a partial answer.

It's important to keep in mind that the data in this graph represent the operations of just four startup companies. The law of small numbers is definitely applicable. Even so, these four types of remote interpreting in the cloud are a growing segment of the market. I would like to see a more exhaustive study covering a larger number of platform providers in the future.

The total number of interpreted events over the period studied was 4,415, which comes out to an average of 138 events per month. Again, not huge in the grand scheme of things, but significant, especially if you are one of the interpreters hired to do the work. It's also worth noting that none of these interpreted interactions are replacing face-to-face interpreting assignments in conference, court or medical interpreting but they are increasing the overall volume of interpreting work.
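The monthly average can be checked with a few lines of Python. This is only a sketch of the arithmetic, assuming the period runs from January 2016 through August 2018 as described earlier and counting both the first and the last month; the helper name is made up for this example.

```python
from datetime import date


def months_inclusive(start: date, end: date) -> int:
    """Number of calendar months from start to end, counting both end months."""
    return (end.year - start.year) * 12 + (end.month - start.month) + 1


total_events = 4415  # total reported in the column
period_months = months_inclusive(date(2016, 1, 1), date(2018, 8, 31))
average_per_month = total_events / period_months

print(f"{period_months} months, about {round(average_per_month)} events per month")
```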

For context, the global web conferencing market is forecast to grow at a compound annual growth rate (CAGR) of 10% between 2017 and 2024, and reach US$8.82 billion, according to one study. Web conferencing is big and will only get bigger. I don't have a dollar figure I can assign to the interpreted events on cloud-based platforms included in this article, but it's safe to say they are a very small drop in an enormous and growing bucket. If interpreting wants to be a part of the new communication paradigm, we need these multilingual platforms that will allow us to work professionally in the cloud.  

Do you have a question about a specific technology? Or would you like to learn more about a specific interpreting platform, interpreter console or supporting technology? Send us an email at inquiry@interpretamerica.com.  

 

ADVERTISEMENT

Machine Translation Quality Estimation:  

The Latest AI-powered Feature from Memsource

 

Machine Translation Quality Estimation (MTQE) provides a quality score for machine translated segments BEFORE you start post-editing. It works just like translation memory matches. When you get a 100% score, this means the quality is high and you typically don't have to post-edit. Learn more about MTQE on our blog.

Want to give it a try? Sign up for a free trial of Memsource and set up MTQE in your account.

 

4. MTQE

A couple of weeks ago I talked with David Canek from Memsource (a company that has experienced exceptional growth, especially last year, and now boasts 90 employees) about an artificial intelligence (AI)-driven feature that will be released at the beginning of October (some of you might have also seen this teaser that I put out on Twitter).

We've all been talking a lot about AI and translation, but almost without fail only in regard to neural machine translation (NMT). However, there are many more ways to use advanced cloud computing in the process of translation, and Memsource is particularly interested in exploring some of them. Eight months ago I reported on their first non-NMT AI feature: recognition of language combination-specific non-translatables. Now, the next AI feature will be MTQE (or "machine translation quality estimation"). MTQE is a per-segment process that uses language combination-specific data collected across users to estimate the quality of an MT suggestion. It gives the MT-translated segment a "fuzziness percentage," much like the percentages shown for translation memory matches. It will work for 70 language pairs once it is unveiled in October.

That's a lot of information, so let's look at it more closely.

First of all, while the system uses the scoring system that we know from translation memory matches, the percentages describe something very different. In the case of TM matches, the percentage number describes the grade of similarity to a segment in the TM (100% being exactly the same and anything less than that being gradually different). The MT match provided by Memsource is a probability score calculated via a neural network that looks at post-editing history and estimates the likelihood of its being correct. Totally different thing. While a 100% TM match will always be correct (not necessarily in the specific translation project, but as far as its similarity to an existing translation), a 100% MT match has a high probability of correctness according to Memsource statistics, which are likely well informed but not guaranteed (and certainly not for your specific text).
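To make the contrast concrete, here is a rough Python sketch of the TM side of that comparison. It uses the standard library's difflib as a stand-in for a fuzzy-match algorithm; actual translation environment tools use their own (usually more sophisticated) matching, so the exact percentages here are illustrative only. The function name tm_match_percent is made up for this example. There is no equivalent few-line sketch for the MTQE score, since that comes from a trained model rather than from a string comparison.

```python
from difflib import SequenceMatcher


def tm_match_percent(new_source: str, tm_source: str) -> int:
    """Similarity between a new source segment and a stored TM source segment.

    A stand-in for a TM fuzzy-match score: 100 means the two source segments
    are identical; lower values mean they differ more. The result is rounded
    down so that only identical segments reach 100.
    """
    ratio = SequenceMatcher(None, new_source, tm_source).ratio()
    return int(ratio * 100)


if __name__ == "__main__":
    stored = "Click the Save button."
    print(tm_match_percent("Click the Save button.", stored))
    print(tm_match_percent("Click the Cancel button.", stored))
```

The point of the sketch is that a TM percentage measures how close the new source text is to a stored source text, which says nothing directly about the quality of a translation, whereas the MTQE percentage is an estimate of how likely the MT output is to be usable as is.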

And then there is the "collected across users." Memsource is a cloud-based application. By signing its Terms of Services agreement ("We will use Your data to train machine learning models"), users agree to have their data analyzed for the larger purpose of enabling services like the above-mentioned AI services. While this setting can be switched off, the default is to have it activated. It's an interesting concept that some will find hard to stomach and others will welcome as a move toward modern cloud computing.

You can use this feature as part of a pretranslation batch process or in real time, segment by segment. As I said above, this works on a per-segment level and theoretically also with multiple machine translation engines attached from which the best "match" is picked (for an impressive list of engines supported by Memsource, see here). However, the multi-engine feature will not be available in this first round.

Unlike the first AI feature -- the non-translatables -- this is a feature that requires extra payment. (David said that it will likely be comparable to what Microsoft charges for its MT engine, which is $10 per million characters.) This would have to be added on top of the fee that might have to be paid to the MT engine provider.

What are the savings then? According to pilot programs, the share of words that score 85% or above in the MT probability rating, plus the non-translatables, ranges from approximately 5% (EN>JP -- a language combination with very few non-translatables of course) to 10% or slightly above in EN>ES and RU>EN. So this could prove to be a solid time saver.

And for those who wrinkle their noses at including "85%" in those numbers, an 85% MT "match" could very well be "correct," unlike a TM match of the same percentage, which is incorrect for sure. Of course, the question that still has to be answered is what it takes for a translator to evaluate that. In the case of the TM match, the difference is already marked and only has to be "fixed." The MT suggestion, on the other hand, has to be evaluated from scratch.

The whole MTQE feature, by the way, is also available through Memsource's API, so it is possible to use it within completely different and independent translation environment tools. 

 

ADVERTISEMENT

Translation and Localization powered by STAR Transit

Watch the short video for more information on Transit functionality and usage: youtu.be/D7_pJCQ7N8s

www.star-group.net

 

5. This 'n' That

Three things:

1. If you're going to the ATA conference in New Orleans and have dictionaries (yes, those paper things) that you're no longer using, please bring them and donate them to colleagues who would benefit from them. There's going to be a table where you can offer them as gifts (and where you can take some that you might want for yourself to fill those now empty spaces in your luggage).

If you're not planning to come and have dictionaries that you'd like to share, please contact me and I'll let you know how to send them so I can take them to said table.

2. Please (!) follow @translationtalk on Twitter. It has been really, really amazing so far, and it's just getting better with every week. Just in case you still don't know what it is: It's a Twitter account for the whole community of translators and interpreters where every week someone else has a say. I've already learned a lot and so will you.

3. In the last Tool Box Journal I mentioned the new edition of my Translator's Tool Box ebook (version 13.5). It's a rich and exhaustive resource that might very well become your favorite go-to resource when it comes to questions of how to optimize your work with your computer. For the rest of this month (just a couple more days!), I'm offering a package of the Translator's Tool Box ebook (value $50), my new Translation Matters book in PDF format (value $9.95), and a one-year subscription to the Premium edition of the Tool Box Journal (value $17) for $30. (Just enter "30" right here.)

Of course, if you've already purchased version 13 of my book, just use the same download information and password to get this new version for free. If you have any other earlier version of the book and would like this new version, you can purchase it for the upgrade price of $25 right here as well.

 

ADVERTISEMENT

PDF Translation for Professionals

Translating PDFs is easier and quicker with TransPDF.

  • Available within Memsource and memoQ
  • Compatible with all CAT tools.
  • Fast log-in for Proz members.

FREE for your next PDF project.  

All file formats, all languages, all target groups, better quality, shorter time-to-market -- one standard solution.

 

The Last Word on the Tool Box Journal

If you would like to promote this journal by placing a link on your website, I will in turn mention your website in a future edition of the Tool Box Journal. Just paste the code you find here into the HTML code of your webpage, and the little icon shown on that page will be displayed with a link to my website.

If you are subscribed to this journal with more than one email address, it would be great if you could unsubscribe redundant addresses through the links Constant Contact offers below.

Here is a reader who has added the icon to their website:

swithunwells.com

Should you be interested in reprinting one of the articles in this journal for promotional purposes, please contact me for information about pricing.

© 2018 International Writers' Group    

 

