Most of you know that I like to mention
where tool vendors come from -- partly to emphasize that we truly are an
international bunch of folks, but also because it intrigues me when
countries that would typically not be on my mental list of hotbeds of
development are among the places where translation technology is being
developed, including countries like Uruguay or Ukraine. But only rarely
does a vendor whom I interview stress that his company is located in a
certain country because of the positive associations with it -- especially
for his particular product.
Well, that's what happened when I talked to
(the native German) Mirko Plitt of Modulo Language about his Swiss Post-Editing Score
product. A quality assurance product from the land of super-accurate
watches! Can there be a better match?!
Now, you might ask why "Swiss" is
mentioned in the name of the tool at all. It risks misleading potential
non-Swiss clients, and it's surely not because "Swiss Post-Editing
Score" is an easy-flowing and highly marketable name! Well, listen to
what it does and you might agree that the particular Swiss brand of
precision goes to the heart of the tool.
Consider this actual example (don't worry if
your German is a little rusty -- it's OK if you don't really understand):
Titelverteidiger Rafael Nadal hat
erneut das Endspiel der French Open erreicht.
Nun trifft Nadal am Sonntag entweder auf
den französischen Lokalmatador Jo-Wilfried Tsonga oder seinen Landsmann
David Ferrer. Djokovic konnte sich damit nicht für die letztjährige
Niederlage im Endspiel revanchieren. "Das ist ein sehr spezieller Sieg
für mich", sagte Nadal nach seinem 20. Sieg im 35. Duell mit seinem
Dauerrivalen: "Dieser Platz ist für mich etwas ganz Besonderes. Novak
wird in einem anderen Jahr hier gewinnen, er ist ein großer Champion."
And then this:
Defending champion Rafael Nadal has
again reached the final hell of the French Open.
Now Nadal meets on Sunday either on the
French local hero Jo-Wilfried Tsonga or his compatriot experienced David
Ferrer. Djokovic could not reciprocate for last year's defeat in staying
the final. "This is a very special win for me," said ago Nadal after
his 20th victory in the 35th duel with rival duration: "This place is
for me something special. Novak will win in another year here, he's was a
Without a doubt you will stumble over the
unidiomatic English -- that's not surprising since the English text was
produced by machine translation. But you should really stumble over
the surprising "final hell" of the French Open (unless you were
just deeply engrossed in re-reading Dante and thought final hells could be
The "final hell" was inserted by Swiss
Post-Editing Score. And it did that for no other reason than to be
caught by the post-editor of this machine-translated text. The makers of SPES
(sorry about the acronym, but my fingers are tired!) are not trying to
evaluate the quality of machine translation; instead, they want to give MT
users a way to evaluate post-editors. Companies that use MT will tell you
it's hard to find good post-editors (if they find any at all), and there
really are only very subjective ways to evaluate their quality. What about
combining the well-proven ideas of sampling and error injection and merging
them with measures of editing distance (how much a post-editor changes in
the machine-translated text) to identify positive or negative outliers?
This is what SPES does.
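For the curious, here is a minimal sketch of what a measure of editing
distance could look like. Nothing here is taken from SPES itself: it
simply assumes a word-level Levenshtein distance between the raw MT output
and the post-edited text, divided by the length of the longer of the two.
The function names `word_edit_distance` and `edit_ratio` are made up for
illustration.

```python
def word_edit_distance(mt_words, pe_words):
    """Levenshtein distance counted in whole words (assumed metric, not SPES's)."""
    # prev[j] holds the distance between the MT words seen so far
    # and the first j post-edited words.
    prev = list(range(len(pe_words) + 1))
    for i, mt_word in enumerate(mt_words, start=1):
        curr = [i]
        for j, pe_word in enumerate(pe_words, start=1):
            cost = 0 if mt_word == pe_word else 1
            curr.append(min(
                prev[j] + 1,         # word deleted from the MT output
                curr[j - 1] + 1,     # word inserted by the post-editor
                prev[j - 1] + cost,  # word kept or substituted
            ))
        prev = curr
    return prev[-1]


def edit_ratio(mt_text, pe_text):
    """Edit distance normalized by the longer text's word count (0.0 = untouched)."""
    mt_words = mt_text.split()
    pe_words = pe_text.split()
    if not mt_words and not pe_words:
        return 0.0
    return word_edit_distance(mt_words, pe_words) / max(len(mt_words), len(pe_words))
```

Under these assumptions, removing the injected "hell" from "reached the
final hell of" counts as one edit across five words, so
`edit_ratio("reached the final hell of", "reached the final of")` comes out
at 0.2.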
By injecting errors and automatically
checking whether those have been corrected, it can come up with reports on
the reliability of the individual translators. And if those numbers are also
related to editing distance (and word count), it's possible to see whether
the post-editor was just an (unnecessarily) eager beaver and corrected
everything and anything anyway, or whether she focused on the "right
kind of errors" (and, yes, dear passionate MT foe, I know, I know...).
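To make the idea concrete, here is a rough sketch of how such a check
might be put together. It is an illustration under my own assumptions, not
the SPES implementation: an injected error counts as "caught" if its wording
no longer appears in the post-edited segment (a deliberately naive substring
test), and the thresholds and labels in `classify` are hypothetical values
picked only for the example. `InjectedError`, `catch_rate` and `classify`
are invented names.

```python
from dataclasses import dataclass


@dataclass
class InjectedError:
    segment_id: int
    injected_text: str  # the deliberately wrong wording planted in the MT output


def catch_rate(injected, post_edited):
    """Share of planted errors that are gone from the post-edited segments.

    `post_edited` maps segment ids to the post-editor's final text.
    Returns None when nothing was injected.
    """
    if not injected:
        return None
    caught = sum(
        1 for err in injected
        if err.injected_text not in post_edited[err.segment_id]
    )
    return caught / len(injected)


def classify(rate, edit_ratio, min_rate=0.8, max_edit_ratio=0.4):
    """Relate catch rate to editing distance (thresholds are hypothetical)."""
    if rate < min_rate:
        return "missed errors"
    if edit_ratio > max_edit_ratio:
        return "eager beaver"  # caught the errors, but rewrote a great deal
    return "focused"
```

A post-editor who removes the "final hell" but changes little else would
land in the "focused" group here, while one who catches it only because she
rewrote most of the text would show up as an "eager beaver."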
You can see a sample "dashboard"
report right here.
So far this product is in an alpha stage
with only two LSPs using it. In fact, how they're using it at this point is
less than sophisticated: they have to have the error insertion and the
analysis done via intermediate XLIFF files. But when the product is
launched in October it will be introduced as an API, allowing it to be
directly integrated into any machine translation engine so the process will
be automated. The price? It will be charged as a service, and will be
approximately at the level of what Google charges for its Google Translate
API, says Mirko.
I'll let you know how this tool and concept
Oh, Mirko brought up something else that was
interesting. As I mentioned above, it's very difficult to find qualified MT
post-editors. How to eventually solve this? Let the laws of the market sort
it out by significantly raising compensation. That's an idea! (And, yes
again, dear MT foe, I know that this still does not mean that you'll touch
it with a ten-foot pole.)