Most
of you are familiar with Microsoft's long and at times circuitous route
toward terminology sharing. When it started sharing most, if not all, of
its user interface translations in the mid-nineties (the files that were
often incorrectly called "The Microsoft Glossaries" were really
translation memories), it was widely, and rightly, welcomed as a very
visionary step. While the files had to be downloaded from an FTP site and
were in a rather cumbersome comma-separated values (CSV) format, a good number of
tools supported that particular Microsoft CSV format, either specifically or as
an added feature.
This
was a visionary step: Making these translation memories freely available
ensured that all tools running on the Windows platform would use the
same terminology in their translated versions, making it so much easier
for users to switch between products. (When some vendors, SAP especially,
deliberately chose not to use the MS terminology for political reasons, it
only served as a sort of backhanded confirmation of Microsoft's vision.)
After
12 years of offering these databases to the general public, Microsoft
suddenly withdrew them and offered them only to (paid) subscribers of MSDN
(and, later, also Microsoft TechNet).
Instead, the general public was offered first a multilingual CSV glossary and,
a few months later, the Microsoft Language Portal.
In its first incarnation, the Language
Portal offered access to terminology searches, style guides in various
languages, language-specific blogs (which were very infrequently updated),
and a sort of crowd-sourced site for the terminology of some Microsoft
products. At first, many viewed the site as a poor substitute for the free
and large databases, but it eventually became the standard for Microsoft
terminology. And since search queries were passed in the URL, it was even
possible to search the portal with third-party tools like IntelliWebSearch
(something that unfortunately is not possible with the otherwise very helpful
TAUS engine).
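Because each search was simply a URL with the search term embedded in it, a lookup tool only needs a URL template and a way to encode the selected term. The Python sketch below illustrates that idea. The template address and its `q` and `lang` parameters are hypothetical placeholders, not the Language Portal's actual URL scheme, and `build_search_url` is an illustrative name, not part of IntelliWebSearch.

```python
from urllib.parse import quote_plus

# Hypothetical template: NOT the real Language Portal address or parameters.
SEARCH_TEMPLATE = "https://terminology.example/search?q={term}&lang={lang}"


def build_search_url(term: str, lang: str) -> str:
    """Insert a URL-encoded search term and language code into the template."""
    # quote_plus turns spaces into '+' and escapes characters such as '+' or '&'
    return SEARCH_TEMPLATE.format(term=quote_plus(term), lang=quote_plus(lang))


print(build_search_url("save as", "de-DE"))
```

Any tool that can substitute the highlighted text into such a template can run the lookup. A search engine that does not expose its queries in the URL cannot be driven this way.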
Then last week, many of you contacted me
directly or sent cries for help to Twitter: The Language Portal was gone!
Alas, it was true, but only for a few hours, and those hours seemed longer
because the old URLs remained dead. In their place, under a new address, a
completely new and handsome Portal had emerged.
I know it's very easy to be critical of
Microsoft (and Google and SDL and Apple and . . .), but it's also important
to give credit where credit is due. And credit is due right here.
Aside from a new look and an easier path to
certain things (such as clear instructions for what to do if you are
interested in the whole set of TMs), many of the features have stayed the
same (terminology search, style guides, access to blogs). However, some
features have been updated and expanded (including the number of languages
covered and a commitment to publish new blog posts more proactively), and some
things are completely new, including the
ability to download bilingual files (English into other languages or other
languages into English) in TBX format.
Is that a big deal? I think it is.
First of all, it's super helpful to once
again be able to download data so that it can be integrated into your own
terminology resources or translation environment. And second, there is the TBX element.
I realize that some of you might ask what
exactly TBX is. TBX, the TermBase eXchange standard, is an XML standard
that allows for the interchange of terminology data, including detailed
lexical information. Adoption of TBX has been very slow, partly because many
felt it was too complicated (see, for instance, an article by Maxprograms'
Rodolfo Raya, who has developed his own competing and much simpler standard).
Still, many tools have now bought
into it and are supporting it, including Across, Heartsome, Swordfish,
XML-Intl, SDL MultiTerm, Wordfast Pro, Alchemy
Publisher, and Star TermStar. If you own one of those
tools, the TBX is easy to import (note that in the case of SDL MultiTerm
you will first have to use the MultiTerm Convert program) --
especially since this is a TBX file with a simple structure: just source,
target, and definition.
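To make "a simple structure" more concrete, here is a minimal Python sketch that reads a TBX-style file and pulls out source term, target term, and definition. The sample is a made-up, simplified example in the general TBX element layout (`martif`, `termEntry`, `langSet`, `tig`, `term`, `descrip type="definition"`); Microsoft's actual files may nest elements differently (for example, `ntig` instead of `tig`), which is why the code looks for `term` elements at any depth. `extract_entries` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Made-up, simplified sample in the general TBX layout (not Microsoft's data).
SAMPLE_TBX = """<martif type="TBX" xml:lang="en-US">
  <text>
    <body>
      <termEntry id="example-1">
        <langSet xml:lang="en-US">
          <descrip type="definition">To store a copy of a file under a new name.</descrip>
          <tig><term>save as</term></tig>
        </langSet>
        <langSet xml:lang="de-DE">
          <tig><term>speichern unter</term></tig>
        </langSet>
      </termEntry>
    </body>
  </text>
</martif>"""


def extract_entries(tbx_text, source_lang, target_lang):
    """Return (source term, target term, definition) tuples, one per termEntry."""
    root = ET.fromstring(tbx_text)
    rows = []
    for entry in root.iter("termEntry"):
        terms = {}
        for lang_set in entry.iter("langSet"):
            term = lang_set.find(".//term")  # matches tig or ntig nesting alike
            if term is not None and term.text:
                terms[lang_set.get(XML_LANG)] = term.text.strip()
        # Take the first definition found anywhere inside the entry, if any.
        definition = next(
            (d.text.strip() for d in entry.iter("descrip")
             if d.get("type") == "definition" and d.text),
            "",
        )
        # Skip entries that lack either the source or the target language.
        if source_lang in terms and target_lang in terms:
            rows.append((terms[source_lang], terms[target_lang], definition))
    return rows


print(extract_entries(SAMPLE_TBX, "en-US", "de-DE"))
```

Calling `extract_entries(SAMPLE_TBX, "en-US", "de-DE")` returns a single tuple with the English term, the German term, and the English definition. Asking for a language the file does not contain returns an empty list.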
But what about the rest of us, those who
don't have any of these TBX-supporting tools? XBench to the rescue! XBench is a free downloadable
tool (I keep waiting for the day when its makers start charging for it) that
is useful for many different tasks, including quality assurance of translation
files; lookups in glossaries, terminology databases, and translation memories
in many, many different formats; and the import and export of such files. So
it's quite easy to import the TBX file
and then export it into a TMX (Translation Memory eXchange) or CSV file,
which can then be processed by your tool of choice (note that you might
lose the Definition field in this process).
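If you would rather script the conversion yourself than use XBench, the sketch below shows roughly what such an export involves, using only Python's standard library. It is not how XBench works internally. It assumes the term pairs have already been extracted into (source, target, definition) tuples; `to_tmx` and `to_csv` are hypothetical helper names, and the TMX header values (such as `creationtool`) are placeholders. As in the conversion described above, the TMX output drops the definition, while the CSV keeps it as a third column.

```python
import csv
import io
import xml.etree.ElementTree as ET

# ElementTree serializes this key as the standard xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def to_tmx(pairs, source_lang, target_lang):
    """Build a minimal TMX 1.4 document; the definition column is discarded."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "tbx-to-tmx-sketch",  # placeholder tool name
        "creationtoolversion": "0.1",
        "segtype": "phrase",
        "o-tmf": "TBX",
        "adminlang": "en-US",
        "srclang": source_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target, _definition in pairs:
        tu = ET.SubElement(body, "tu")  # one translation unit per term pair
        for lang, text in ((source_lang, source), (target_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


def to_csv(pairs, source_lang, target_lang):
    """Write source, target, and definition as CSV, with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow([source_lang, target_lang, "definition"])
    writer.writerows(pairs)
    return buffer.getvalue()


pairs = [("save as", "speichern unter", "To store a copy of a file under a new name.")]
print(to_tmx(pairs, "en-US", "de-DE"))
print(to_csv(pairs, "en-US", "de-DE"))
```

TMX has no natural place for a term definition, which is why it is lost in that direction; CSV simply carries it along as another column.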
So why is this so big of Microsoft? Well,
maybe "big" is the wrong term, but I join many others in being
thankful that Microsoft has reached out a hand to support this important
standard. What's also been very refreshing in my dealings with this
particular team at Microsoft is that bugs I've pointed out to them have
typically been fixed within a matter of hours, which is not something I'm
used to seeing from a very large software vendor.
One more thing about the Microsoft
glossaries: They have also been integrated into the Evroterm termbase of Slovenia and the mighty EuroTermBank -- so if those are your preferred places to search,
you'll get the terminology that way.
Oh, and in case you wonder which languages
are supported (either in or out of English), here is a list: Afrikaans,
Albanian, Amharic, Arabic, Armenian, Assamese, Azeri (Latin), Basque,
Bengali (Bangladesh and India), Bosnian (Cyrillic and Latin), Bulgarian,
Catalan, Chinese (Simplified and Traditional), Croatian, Czech, Danish,
Dutch, Estonian, Filipino, Finnish, French, Galician, Georgian, German,
Greek, Gujarati, Hausa, Hebrew, Hindi, Hungarian, Icelandic, Igbo,
Indonesian, Inuktitut, Irish, isiXhosa, isiZulu, Italian, Japanese,
Kannada, Kazakh, Khmer, Kinyarwanda, Kiswahili (Kenya), Konkani, Korean,
Kyrgyz, Lao, Latvian, Lithuanian, Luxembourgish, Macedonian (FYROM), Malay
(Brunei Darussalam and Malaysia), Malayalam, Maltese, Maori, Mapudungun,
Marathi, Nepali, Norwegian (Bokmål and Nynorsk), Oriya, Pashto, Persian,
Polish, Portuguese (Brazil and Portugal), Punjabi, Quechua, Romanian,
Romansch, Russian, Sanskrit, Serbian (Cyrillic and Latin), Sesotho sa
Leboa, Setswana, Sinhala, Slovak, Slovenian, Spanish, Swedish, Tamil,
Tatar, Telugu, Thai, Turkish, Ukrainian, Urdu, Uzbek (Latin), Vietnamese,
Welsh, Wolof, and Yoruba.
The number of translated terms varies by language, from about
2,000 to 18,000 terms (you can find more information on this in
Jeromobot's Twitter stream).