Integrating Translation Tools in Document Creation
Fortunately, this article was written for an English-speaking audience, which gives us the luxury of setting aside many globalization and localization considerations in our writing. Such is not the case, however, for the technical writers at Caterpillar Inc., the Illinois-based, multinational heavy equipment and engine manufacturer.

Approximately 50 percent of Caterpillar's annual sales are generated outside the United States, in non-English-speaking countries. Legal requirements in many of these countries mandate that country-specific-language owner's manuals accompany product shipments. This ensures that the product's users understand its proper operation and maintenance and minimizes safety risks to consumers.

With improvements in technology that make information more readily accessible, the demand for translations has grown rapidly. As with all businesses, the pressure to reduce costs and cycle times has increased along with demand, propelling Caterpillar Corporate Translations from a typewriter-and-dictaphone translation environment to computer-assisted translation (CAT).



Figure: A fuzzy match turned perfect.

Caterpillar Corporate Translations processes multiple project types and many unique translations. Tools such as STAR Transit, TRADOS, Déjà Vu and Arbortext are often used in the translation business. Depending on the nature of the material and the languages required, Caterpillar selects the optimal solution for each project, not only from off-the-shelf tools, but also from Caterpillar-developed products: Service Information System (SIS) Authoring, Caterpillar Technical English (CTE) and Automated Machine Translation (AMT). CTE and AMT were developed in conjunction with Carnegie Mellon University. For purposes of this article, we will examine SIS Authoring and the CAT tool Déjà Vu.

SIS Authoring, which uses SGML tagging, is the system Caterpillar uses to produce support documentation for machines and equipment. In SIS Authoring, information is stored in pieces called Information Elements (IEs). An IE is a small, self-contained unit of information, a subset of a document. Because products within a group are similar, many IEs recur across different product documents and can be shared between them.

In 1991, Caterpillar launched the CTE/AMT development project with Carnegie Mellon University to further improve the creation and translation of technical documentation into three core languages: Spanish, French and German. CTE consists of a controlled vocabulary (approximately 80,000 technical terms) and all of the English grammatical structures required when writing technical documentation. CTE ensures that AMT is able to translate what authors write in English. New CTE terms are added into the dictionary to support new product introduction. These new CTE terms are translated into AMT languages and are loaded, along with corresponding linguistic information, into AMT.
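The vocabulary side of a controlled language can be illustrated with a minimal sketch. It assumes the approved vocabulary is a plain set of lowercase words; the function name `unapproved_terms` and the six-word vocabulary are hypothetical, and the grammatical constraints that CTE also imposes are not modeled here.

```python
import re
from typing import List, Set


def unapproved_terms(sentence: str, vocabulary: Set[str]) -> List[str]:
    """Return the words of a sentence that are not in the approved vocabulary."""
    # Lowercase the sentence and pull out alphabetic words (apostrophes allowed).
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", sentence.lower())
    return [word for word in words if word not in vocabulary]


# Tiny, made-up vocabulary; a real controlled vocabulary is far larger.
vocabulary = {"stop", "the", "engine", "check", "oil", "level"}
flagged = unapproved_terms("Stop the engine and inspect the oil level.", vocabulary)
print(flagged)
```

An authoring tool built on this idea would prompt the writer to rephrase each flagged word before the sentence reaches machine translation.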

The development of machine translation was justified by the volume of SIS Authoring documents requiring translation for the core languages. AMT reduces translation efforts by about a third, but it still requires review by a human editor.

In an effort to reduce post-editing time, Caterpillar developed Translation Memory Tool (TMT), an in-house tool that reuses information by locating exact or partial matches between new English sentences and previously translated sentences. After translation, the source and target language IEs are aligned into source-target language pairs and stored in the TMT memory database. Documents bound for the AMT languages are first processed through TMT and then sent through AMT, which completes the machine translation of each sentence that TMT left untranslated.
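The two-stage flow can be sketched roughly as follows. This assumes a dictionary stands in for the translation memory database and a placeholder function stands in for machine translation; `translate_ie`, `fake_mt` and the sample sentence pair are all hypothetical, and only exact matches are handled.

```python
from typing import Callable, Dict, List, Tuple


def translate_ie(
    sentences: List[str],
    memory: Dict[str, str],
    machine_translate: Callable[[str], str],
) -> List[Tuple[str, str, str]]:
    """Return (source, target, origin) triples for the sentences of one IE."""
    results = []
    for source in sentences:
        if source in memory:
            # Stage 1: reuse a previously translated sentence.
            results.append((source, memory[source], "memory"))
        else:
            # Stage 2: fall back to machine translation for the rest.
            results.append((source, machine_translate(source), "machine"))
    return results


def fake_mt(sentence: str) -> str:
    """Placeholder for a machine translation engine (assumption)."""
    return f"<MT:{sentence}>"


# Made-up memory entry for illustration.
memory = {"Stop the engine.": "Arrêtez le moteur."}
rows = translate_ie(["Stop the engine.", "Check the oil level."], memory, fake_mt)
for row in rows:
    print(row)
```

Recording the origin of each target sentence lets a human editor concentrate on the machine-translated output, which is where review is still required.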

Within the last ten years, external developments in the language technology industry, especially in the field of sophisticated fuzzy-matching algorithms, have produced tools with functionality similar to that of TMT. These tools have given Caterpillar Corporate Translations the ability to reduce internal development costs, translation costs and cycle times. In 2001, Caterpillar added Déjà Vu to its repertoire of translation memory tools, based on five criteria: existence of a strong fuzzy match module; existence of a strong terminology component; ability to exchange data with other systems in its terminology and memory components; ability to deal securely with Caterpillar's SGML format; and the capability to batch process hundreds of files.

Fuzzy Matching

Fuzzy matching is the ability to find exact and partial matches for a source segment and its components. Among the translation memory tools now available, features such as fuzzy matching for the terminology and translation memory components, a customizable degree of fuzziness and an automatic lookup of found components have become widely accepted standards. In addition to these features, Déjà Vu employs example-based machine translation (EBMT), which allows it to turn fuzzy matches into perfect matches, thus emulating machine translation while still using translation memory algorithms.

Here is an example of how this works. For the source segment

Caterpillar, the heavy equipment and engine manufacturer

the translation memory's target

Caterpillar, le producteur de matériel lourd et de moteurs (Caterpillar, the heavy equipment and engine producer)

would be a fuzzy match and, with the correct settings, would be displayed and inserted as such. If the French term for manufacturer is also in the terminology database, the tool will display it, allowing the user to delete producteur (producer) and add fabricant (manufacturer). If the translations for both producer and manufacturer are in the terminology database, Déjà Vu will assemble the translation by automatically deleting producteur and adding fabricant at the appropriate location, thus turning a fuzzy match into a perfect match without any user intervention:

Caterpillar, le fabricant de matériel lourd et de moteurs (Caterpillar, the heavy equipment and engine manufacturer)
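The idea behind this example can be sketched in a few lines. The code below is not Déjà Vu's algorithm, which the article does not describe in detail; it is a simplified illustration that uses Python's standard `difflib` for the similarity score and assumes a word-level terminology dictionary. The function names, the 0.75 threshold and the data layout are all assumptions.

```python
import difflib
from typing import Dict, Optional, Tuple


def best_fuzzy_match(
    source: str, memory: Dict[str, str], threshold: float = 0.75
) -> Optional[Tuple[str, str, float]]:
    """Return the (memory source, memory target, score) most similar to source."""
    best = None
    for mem_source, mem_target in memory.items():
        score = difflib.SequenceMatcher(None, source, mem_source).ratio()
        if score >= threshold and (best is None or score > best[2]):
            best = (mem_source, mem_target, score)
    return best


def assemble(
    source: str, mem_source: str, mem_target: str, terms: Dict[str, str]
) -> Optional[str]:
    """Repair a fuzzy match by swapping known terms; None if it cannot be repaired."""
    old_words, new_words = mem_source.split(), source.split()
    matcher = difflib.SequenceMatcher(None, old_words, new_words)
    result = mem_target
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # Only a one-word-for-one-word substitution can be repaired here.
        if not (tag == "replace" and i2 - i1 == 1 and j2 - j1 == 1):
            return None
        old_term, new_term = old_words[i1], new_words[j1]
        if old_term not in terms or new_term not in terms:
            return None
        if terms[old_term] not in result:
            return None
        result = result.replace(terms[old_term], terms[new_term], 1)
    return result


memory = {
    "Caterpillar, the heavy equipment and engine producer":
        "Caterpillar, le producteur de matériel lourd et de moteurs",
}
terms = {"producer": "producteur", "manufacturer": "fabricant"}

source = "Caterpillar, the heavy equipment and engine manufacturer"
match = best_fuzzy_match(source, memory)
assembled = assemble(source, match[0], match[1], terms) if match else None
print(assembled)
```

When either term is missing from the dictionary, `assemble` returns `None` and the segment would remain an ordinary fuzzy match for the translator to correct by hand.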

Especially for AMT languages, with their existing sophisticated terminology databases from the AMT system and their in-house TMT databases in sentence pairs, the gains from fuzzy matching for Caterpillar Corporate Translations were immediate.



Figure: A user-defined SGML filter file.

SGML

By definition, SGML is a user-definable format. On one hand, this is the essence of its power, because its behavior can be defined exactly; on the other, this very lack of standardization is what makes working with it so difficult. The translation technology industry, and the translation memory industry in particular, has developed different methods to respond to this challenge.

One method is to take the Document Type Definition (DTD) file, which defines all tags and attributes of a certain set of SGML files, and let the translation memory tool automatically generate a specific filter file that will then help to interpret the SGML file(s) correctly. Another method is to completely ignore the DTD file and offer a way for the user to decide on how each of the attributes and tags is to be treated purely on the basis of the SGML files. The user decides whether an SGML tag will be embedded (if it is a part of a segment that may, for example, format a certain section within a translation unit) or completely hidden from view, or whether keys within tags are to be extracted and translated or hidden from view.

The makers of Déjà Vu have chosen this second option.
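A user-defined filter of this second kind can be pictured with a small sketch. The rule format, the tag names (`para`, `emph`, `xref`, `figure`) and the function `extract_segments` are hypothetical and do not reflect Déjà Vu's actual filter files; the regular expressions also assume quoted attribute values and explicit end tags, which full SGML does not require.

```python
import re
from typing import Dict, List, Set

TAG = re.compile(r"<(/?)([A-Za-z][\w.-]*)([^<>]*)>")
ATTR = re.compile(r'([\w.-]+)\s*=\s*"([^"]*)"')

# Hypothetical user decisions: which tags stay embedded inside a segment,
# and which (tag, attribute) values are themselves translatable text.
RULES: Dict[str, Set] = {
    "embedded": {"emph", "xref"},
    "translatable_attributes": {("figure", "title")},
}


def extract_segments(sgml: str, rules: Dict[str, Set]) -> List[str]:
    """Split SGML-like text into translatable segments according to the rules."""
    segments: List[str] = []
    buffer = ""

    def flush() -> None:
        nonlocal buffer
        text = " ".join(buffer.split())
        # Keep the segment only if something remains once tags are ignored.
        if re.sub(r"<[^<>]*>", "", text).strip():
            segments.append(text)
        buffer = ""

    for piece in re.split(r"(<[^<>]*>)", sgml):
        tag = TAG.fullmatch(piece)
        if tag is None:
            buffer += piece  # plain text
            continue
        name = tag.group(2).lower()
        if name in rules["embedded"]:
            buffer += piece  # inline tag travels with the segment
            continue
        flush()  # any other tag is hidden and ends the current segment
        for attr, value in ATTR.findall(tag.group(3)):
            if (name, attr.lower()) in rules["translatable_attributes"]:
                segments.append(value)
    flush()
    return segments


sample = (
    '<para>Refer to <xref id="t1"> for details.</para>'
    '<figure title="Tool control"><para>Stop the <emph>engine</emph>.</para></figure>'
)
segments = extract_segments(sample, RULES)
for segment in segments:
    print(segment)
```

Because the rules are plain data, the user can reclassify a tag without touching a DTD, which is the flexibility this approach trades against automatic filter generation.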

Each method has its benefits. The first assures a faster, more automated generation of the filter files, whereas the second gives more power to the users, allowing them to fine-tune the filter. The second method also does not rely on the DTD files, which are often difficult to obtain, and it allows the user to work with any tag-based file format that follows SGML conventions (including XML, Cold Fusion and so on).

Caterpillar felt comfortable with Déjà Vu's way of dealing with SGML files because it could be customized in detail for Caterpillar's highly uncommon set of SGML files. Furthermore, Déjà Vu displays only translatable content, with any formatting codes that fall inside a translation unit embedded and protected within that unit, thus guaranteeing the greatest possible security and integrity for the files.

For example, Déjà Vu displays the following strings from an Information Element file:

Several tool control systems can be equipped on your machine. However, only one tool control system can be equipped at one time. Refer to Table for the tool control systems that are available for your machine. After you determine the correct tool control system for your machine consult your Caterpillar dealer.

with all internal tagging information protected and hidden in read-only codes ({123}) as shown in the accompanying screen shot.



Figure: An Information Element file.
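The read-only codes described above can be imitated with a short sketch. It assumes inline tags are replaced by numbered placeholders such as {1} and restored after translation; `protect`, `restore` and the `xref` tag are hypothetical, and the sketch does not handle source text that already contains braces around digits.

```python
import re
from typing import Dict, Tuple

INLINE_TAG = re.compile(r"<[^<>]+>")
CODE = re.compile(r"\{(\d+)\}")


def protect(segment: str, start: int = 1) -> Tuple[str, Dict[int, str]]:
    """Replace each tag with a numbered code and remember the original tag."""
    codes: Dict[int, str] = {}

    def to_code(match: "re.Match[str]") -> str:
        number = start + len(codes)
        codes[number] = match.group(0)
        return "{%d}" % number

    return INLINE_TAG.sub(to_code, segment), codes


def restore(translated: str, codes: Dict[int, str]) -> str:
    """Put the original tags back, refusing text whose codes were altered."""
    found = [int(number) for number in CODE.findall(translated)]
    if sorted(found) != sorted(codes):
        raise ValueError("placeholder codes were added, removed or duplicated")
    return CODE.sub(lambda match: codes[int(match.group(1))], translated)


visible, codes = protect('Refer to <xref id="t1"> for the available systems.')
print(visible)
restored = restore("Consultez {1} pour les systèmes disponibles.", codes)
print(restored)
```

The check in `restore` is what gives the translator freedom to move a code within the sentence while preventing a tag from being lost or duplicated.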

Batch Processing

The idea of batch processing SGML projects with hundreds of files was also important. Déjà Vu allows batch processing at the file management level and the actual translation level. It is much easier to work with one project file that contains hundreds of IE files instead of opening and closing each of them individually. In the translation and editing stages, consistency checks, global search and replace, sort and filter options and conventional functions such as spell checks not only save time but also improve quality by allowing the user to perform these functions in the project as a whole, rather than piece by piece.

Another important feature associated with Déjà Vu's batch functionality is that it enables propagation, which automatically replaces duplicate source sentences with the appropriate translated sentence throughout the entire project file. Propagation is not possible in workflow task management systems that force a translator to translate one IE file at a time.
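Propagation across a project can be sketched as follows, assuming a project is a mapping from file names to lists of segments held in memory together. The data layout, the file names and the function `propagate` are hypothetical.

```python
from typing import Dict, List, Optional

Segment = Dict[str, Optional[str]]
Project = Dict[str, List[Segment]]


def propagate(project: Project, source: str, target: str) -> int:
    """Copy a translation to every untranslated segment with the same source."""
    filled = 0
    for segments in project.values():
        for segment in segments:
            if segment["source"] == source and segment["target"] is None:
                segment["target"] = target
                filled += 1
    return filled


# Made-up project with the same sentence repeated in two IE files.
project: Project = {
    "ie_0001.sgm": [
        {"source": "Stop the engine.", "target": None},
        {"source": "Check the oil level.", "target": None},
    ],
    "ie_0002.sgm": [
        {"source": "Stop the engine.", "target": None},
    ],
}

count = propagate(project, "Stop the engine.", "Arrêtez le moteur.")
print(count, "segments filled")
```

The sketch only works because every file's segments are reachable from one project object, which is why propagation is unavailable when files are handed out one at a time.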

In Conclusion

Overall, Caterpillar Corporate Translations' introduction of Déjà Vu has been successful, providing reductions in both cycle times and translation costs. The conversion of existing terminology and memory databases and the integration with existing processes have gone smoothly. With the savings realized as a result of the addition of Déjà Vu to Caterpillar's translation tools, the purchase and implementation costs will be recovered within a short period of time.

Kirsi Rintanen is Asian language manager and IT liaison at Caterpillar Corporate Translations, Peoria, Illinois (http://www.caterpillar.com/).

Jost Zetzsche is a localization consultant and translator and a co-founder of International Writers' Group, LLC, in Oregon. He can be reached at jzetzsche@internationalwriters.com.
--MultiLingual Press, reprinted with permission of translationzone.com

 


©1999-2005 International Writers' Group, LLC