On to Adaptive MT.
It's relatively easy to train rules-based
machine translation (like PROMT, SYSTRAN, and Lucy MT)
on the fly -- even if the process of entering new terms and phrases tends
to be rather cumbersome. (You have to go through a number of options to
teach the system not only the new word or phrase but also associated
grammatical information, such as its part of speech.)
But the good thing about an RbMT system is
that it is possible to finagle the outcome -- which traditionally has not
been the case with statistical machine translation systems.
Here the data that the system uses to create
translated text sits in so-called "phrase tables" that typically
cannot be written to interactively. So rather than learning interactively
as you make changes to MT suggestions, you (or a system administrator) will
have to set the system up to rebuild the MT engine with updated data on a
regular basis. This is a cumbersome and time-consuming task, not to
mention super-frustrating: you may find yourself correcting the same poor
output again and again for days or longer before the rebuild finally takes effect.
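To make the phrase-table idea a little more concrete, here is a deliberately simplified sketch of what such a table is and why it can only be refreshed in batch. The function name, the sample phrases, and the relative-frequency scoring are all my own illustration, not any vendor's actual format.

```python
# Conceptual sketch of a statistical MT "phrase table": a static mapping
# from source phrases to scored target phrases, extracted offline from a
# word-aligned parallel corpus.
from collections import defaultdict

def build_phrase_table(aligned_pairs):
    """Count source/target phrase co-occurrences and convert them into
    relative-frequency translation probabilities."""
    counts = defaultdict(lambda: defaultdict(int))
    for src_phrase, tgt_phrase in aligned_pairs:
        counts[src_phrase][tgt_phrase] += 1
    table = {}
    for src, tgts in counts.items():
        total = sum(tgts.values())
        table[src] = {tgt: n / total for tgt, n in tgts.items()}
    return table

# Rebuilding means re-running this over the *entire* corpus, which is
# why your corrections only show up after the next scheduled retraining.
pairs = [("guidelines", "directives"),
         ("guidelines", "lignes directrices"),
         ("guidelines", "directives")]
table = build_phrase_table(pairs)
# "directives" wins with probability 2/3
```

The point of the sketch is simply that the table is the frozen output of a batch process; nothing in it changes when you edit an individual MT suggestion.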
There are some exceptions to this. One came out
of the EU-funded MateCat project, which developed a
process that uses a technique called "cache-based online adaptation
for machine translation." (You can read about it right here.)
Unfortunately, this technology did not make it into the commercialized
version of MateCat, but it looks like the technology's current
"guardian" -- the Fondazione
Bruno Kessler -- has ported this technology to yet another EU
initiative, the ModernMT
project, which "will overcome four technology barriers that still
hinder the wide adoption of currently available MT software by end-users
and language service providers." We'll see what will come out of that.
I certainly hope that in this case, the technology will end up seeing the
public light of day.
Another tool that has tackled this is Lilt,
and I just recently wrote about it and recorded a video to explain just how
it manages to do this. (You can watch the video right here.)
And later this year, SDL Trados Studio
(as well as other SDL translation products) will be equipped with a feature
that, while using a statistical machine translation engine, (almost)
immediately learns from your corrections. All you have to do to make that
happen is to, well, make said corrections.
Here's how it works behind the scenes:
Rather than training a machine translation engine with translation memory
data you might have collected, you select the base-line engine (the
non-specialized engine) of the SDL Language Cloud MT offering and
then customize that.
Ah, you might say, my work will benefit
everyone else as well! No, not quite (in fact, not at all). That's because
while you do share the core MT engine with others, the customizations that
you enter will not be seen or used by others. How is that possible?
Well, just as in most statistical machine translation products, the actual
phrase table (see above) stays unchanged by your corrections, but whatever
corrections you do make are stored in your "private phrase table"
within Language Cloud. That private collection of changes
essentially modifies the output of the engine to make it produce results
that are more similar to your previous corrections.
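The overlay idea described above can be sketched in a few lines. To be clear, this is my own illustration of the concept of a private correction store consulted before the shared engine, not SDL's actual implementation; the function and variable names are hypothetical.

```python
# Hedged sketch of the "private phrase table" concept: a per-user overlay
# of stored corrections that is consulted before the shared engine's
# (unchanged) phrase table.

def translate_phrase(phrase, shared_table, private_overrides):
    """Prefer the user's stored correction; fall back to the shared engine;
    pass unknown phrases through untranslated."""
    if phrase in private_overrides:
        return private_overrides[phrase]
    return shared_table.get(phrase, phrase)

shared = {"guidelines": "directives"}   # shared engine, never modified
private = {}                            # your private correction store

translate_phrase("guidelines", shared, private)  # "directives" (shared engine)
private["guidelines"] = "lignes directrices"     # you make a correction
translate_phrase("guidelines", shared, private)  # now your corrected term
```

The design point is that the shared table stays read-only, so every user's corrections remain invisible to everyone else while still steering their own output.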
The demo that Daniel and Massi showed me was
really rather impressive. One term within a lengthy sentence was altered,
and after a short delay (thus the "almost" immediate learning
mentioned above) that term became the preferred choice in the next lengthy
sentence, even though it was not a commonly used term.
The tool allows you to create as many
"instances" of a customized engine as you want (which might well
be necessary for different clients), and while it will be available at
first only between English and French, Italian, German, Spanish, and Dutch,
you can expect other language combinations to follow suit.
I'm happy to see this feature and look
forward to seeing how it will perform with a large number of changes and
what the actual improvements will be (in an internal test that SDL did last
year, they said they had to perform an average of 250 fewer edits in a
post-editing scenario of 300 segments in an EN>FR project). I'm also
interested in how this feature will be beneficial in a non-post-editing
scenario with the machine translation as just one additional resource accessed
by features like AutoSuggest (early feedback from SDL is that your
corrections will indeed carry over there as well).