Here is the truth: At first I was a little skeptical when Peter Reynolds
of Kilgray started to show me TaaS -- Terminology as a Service -- at this
year's
ATA. The lingo that surrounded the product was just a bit too bureaucratic
and jargonic (and, just in case you wonder, yes, from now on that's an
official word). Need some proof? "The motivation for the TaaS project
is to address the need for instant access to the most up-to-date terms,
user participation in the acquisition and sharing of multilingual
terminological data, and efficient solutions for terminology resources
reuse." All right, then.
But the more he -- and later Tatiana Gornostay from Latvian translation
and technology provider Tilde -- showed me, the more I was won over by the
depth and thoughtfulness with which this project was designed.
TaaS is a project that has received major funding from the European
Union Seventh Framework Programme. It has five collaborators:
Fachhochschule Köln, Kilgray, University of Sheffield, TAUS, and -- as the
coordinator -- Tilde. The project is presently still in its beta phase
(which was just launched on November 1), but it will eventually be a large,
cloud-based terminology resource in all official working languages of the
EU for translators, interpreters, terminologists, and technical writers
("language workers," according to their lingo -- a term I rather
Presently (November 27), English, French, German, Hungarian, Italian,
Latvian, Lithuanian, and Spanish are supported; in just a few days, 16 more
languages will be supported (Bulgarian, Croatian, Czech, Danish, Dutch,
Estonian, Greek, Irish, Maltese, Polish, Portuguese, Romanian, Slovak,
Slovene, Swedish, and Russian). Note that "support" for one
language does not necessarily mean the same for every other, but before we
go into this, we should probably see what this tool does in the first place.
Once you've registered, you can upload one or several files in various
formats (PDF, DOC(X), XLS(X), PPTX, RTF, TXT, XLIFF, XML, or HTML), have
terminology extracted from the file(s), apply content within existing
terminology resources to those terms, select from the suggested
translations and/or translate the terms, and then export the result so that
you can use it within your terminology database or glossary.
All this would still not be too overwhelming -- after all, there are
plenty of tools that extract data -- were it not for a number of advanced
tools that are (optionally) applied to this process. These include Tilde's
wrapper system for CollTerm, which performs a linguistic
analysis (part-of-speech tagging, lemmatization, morpho-syntactic patterns,
etc.) as well as statistical analysis; Kilgray's terminology extractor,
which also performs a language-independent statistical analysis; and a tool
to normalize terms, which brings terms into their canonical forms
(typically nominative singular or infinitive). This last, very cool
feature is unfortunately only available for English and Latvian at this
point (thus the aforementioned different levels of support).
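To give a feel for what a language-independent statistical analysis can look like, here is a minimal sketch in Python. It is not Kilgray's or Tilde's algorithm (neither is described here); it simply counts word sequences of up to three words and keeps those that recur and do not start or end with a common function word. The function name, the tiny stopword list, and the thresholds are all assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption); a real extractor needs far more.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "that", "with"}

def extract_candidates(text, max_len=3, min_freq=2):
    """Return {candidate: frequency} for word n-grams of 1..max_len words
    that occur at least min_freq times and do not start or end with a stopword."""
    # Lowercase and keep alphabetic words, allowing internal hyphens.
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            counts[" ".join(gram)] += 1
    return {term: freq for term, freq in counts.items() if freq >= min_freq}

sample = ("The translation memory stores segments. "
          "A translation memory is reused. The termbase is small.")
print(extract_candidates(sample))
```

Note that this sketch ignores sentence boundaries and does no normalization, so "memory" and "memories" would be counted separately. That gap is exactly what the linguistic analysis and the normalization tool described above are meant to close.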
Once that is done, the extracted list of terms will be run against a
number of (again, optional) resources in the following order: 1. your own
personal resources that you might have collected on the site; 2. other
users' terminology (I'll explain in a second); 3. the EuroTermBank; 4.
the EU's inter-institutional terminology database IATE; 5. the TAUS corpus; and
6. the TaaS statistical database (SDB) that consists of aligned web data.
Once these databases have been queried for translations, they will be shown
as suggestions from which you can choose by just clicking on them and/or
you can enter your own translation.
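The lookup order described above can be sketched roughly as follows. This is only an illustration of querying several resources in a fixed priority order and collecting the suggestions; the function name, the dictionary-based "resources," and the sample German translations are invented for the example and say nothing about how TaaS implements this or what its API looks like.

```python
def collect_suggestions(term, resources):
    """Query each resource in order and return (translation, source) pairs.

    `resources` is an ordered list of (name, glossary) pairs, where each
    glossary maps a source term to a list of translations. A translation
    found in several resources is credited to the first one that has it.
    """
    suggestions = []
    seen = set()
    for name, glossary in resources:
        for translation in glossary.get(term, []):
            if translation not in seen:
                seen.add(translation)
                suggestions.append((translation, name))
    return suggestions

# Hypothetical sample data, listed in the priority order given in the text.
resources = [
    ("personal", {"termbase": ["Termbank"]}),
    ("other users", {}),
    ("EuroTermBank", {"termbase": ["Terminologiedatenbank", "Termbank"]}),
    ("IATE", {}),
]

print(collect_suggestions("termbase", resources))
print(collect_suggestions("translatable content", resources))
```

In this sketch an empty result corresponds to the case where no resource offers a translation and you have to enter your own.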
To test the system, I uploaded a rather technical 9,000-word English
file out of which 430 terms were extracted. Of these 430 terms,
approximately half were very good candidates for
my termbase -- which I estimate to be a good average -- and of the
remaining 200-some terms, about 50 had various translation suggestions into
German, usually including one that I chose. The terms for which no translation
was found included "stellar research challenge," "greater
cost accountability," and "translatable content" (yes, it
was a text about translation technology!) -- so no surprise that these
were not found in an existing termbase.
The suggested translations typically came from the EuroTermBank (ETB) and
some (not particularly helpful ones) from undefined web sources -- I suspect that
the IATE was not properly queried because it is presently under maintenance
(you might have noticed that also in your private searches), and the
connection to the TAUS data is either still buggy or just takes so long
that it really is not viable at this point.
Of course, one of the ideas behind this project is to make it possible
to share terminology data. At the outset of each project you can enter a
whole lot of optional data, but you will need to make a decision on
the language combination, the domain of your text, and whether you want to
share the data with other users. The shared data will not include the
complete texts that you upload but only the term pairs that you will end up
with in your termbases (and only on an individual term pair level rather
than complete lists of term pairs). Presently, the suggested term pairs are
provided anonymously, but at some point in the project the source of the
respective data will be visible along with an option to contact that person
or company. (I assume that it will also be an option whether I want to be
The shared data will also be used for other purposes, including
machine translation. Both Tilde and TAUS have a strong interest in machine
translation (and so does the EU as the funder of this project), and
high-quality termbases are naturally helpful for machine translation.
We will continue to see the tool as a standalone product, but we will also see it
integrated into translation environments. Kilgray's memoQ will most
likely be the first to offer a "plugin which will allow users to send
a document from memoQ for term extraction there, and for users to
use their TaaS termbases within memoQ" (quote from Peter
Reynolds). There is no doubt others will follow, though. Paul Filkin from
SDL has already expressed some interest for Trados Studio (I imagine
that in that case it will take the form of an app in the SDL OpenExchange)
and further integrations are likely, especially because there is a good API (application
programming interface) that is being made available.
I'm eager to see what kind of response this tool will get. Or let me
correct myself: I'm pretty sure that it will get a good response from
translators -- the extraction feature is just too clever (at least in
English) not to be used widely. What I'm eager to find out is what kind of
response the sharing feature will get. It seems tantalizing to be able to
communicate with others about their terminology and to share or use others'
experience in the form of their terminology. Will that also include our
willingness to opt into sharing, knowing that the data will be used by
machine translation engines?