Voices, I Hear Voices

by Jost Zetzsche, Ph.D.

There are a lot of complaints that speech recognition (the ability to dictate to your computer) is geeky technology. But I think the very opposite is true. How geeky is it to hack on a keyboard to make your computer understand what you are trying to say? Really: think about it. It makes so much sense to speak to your computer, dictate text, and navigate through programs. The only geeky part about it is that we're not used to it.

Andrew Levine agrees. The March 2012 edition of the ATA Chronicle carries his very informed article on speech recognition, with a number of approaches that I think are great. For example, he says that, like fingers, vocal cords can get tired, too, so he "normally translate[s] for about 45 minutes dictating, then switch[es] to keyboard typing for 15 minutes to give [his] voice a break."

That's a great approach. I do it a little differently, since I am even less of a "purist." I use voice recognition only when I think I need to speed things up a little, when I have a text that is well suited, or when my fingers just don't work the way I want them to (which unfortunately happens more often than I care to admit). But even when I dictate, I don't unplug my keyboard or simply refuse to use it. Some things are just more practical to do on the keyboard, and this is particularly true if you need to switch between languages, an obviously common occurrence for translators. Dragon NaturallySpeaking, the only reasonable third-party program on the market for the PC, supports at least the native language and English in the Dutch, German, Spanish, Italian, French, and Japanese editions; however, unloading one language and re-loading the other takes at least a couple of minutes.

User profiles for different languages and input devices in Dragon

The same loading and unloading needs to be done for different kinds of microphones you might be using, each of which requires a different user profile. If you do have access to a Bluetooth microphone, you will want to use that; otherwise, a USB mic will do as well. I actually even use an analog microphone quite frequently and do rather well with it.

So, which texts are well suited—or better, which texts are not well suited—for speech recognition? The answer to this depends partly on your particular translation subject. In my work, I avoid using voice recognition in texts that contain a lot of proper names and/or loan words. This does not mean that you can't teach the program to recognize the proper names and loan words, but it's one of those judgment things: If you want to use speech recognition (or anything else for that matter) to become more effective, you'd better make sure that you truly are gaining efficiency. If you have to spend an hour to train it to recognize a bunch of new terms before translating for an hour and a half on a job that would otherwise have taken you only two hours, that seems like wasted time to me. Plus, while I enjoy translating, I can think of better things to do than training speech recognition. On the other hand, if I can expect those proper names and loan words to occur again in future projects, I may just as well spend the time to train.
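The judgment call above can be put into numbers. What follows is only a rough sketch in Python: the function name is made up for illustration, and the idea that the same saving repeats on every later job that uses the trained terms is an assumption, not something measured.

```python
def net_hours_saved(typing_hours, dictation_hours, training_hours, jobs=1):
    """Hours gained (positive) or lost (negative) by dictating instead of typing.

    Assumption: training is a one-time cost, and the per-job saving is the
    same for every job in which the trained terms come up again.
    """
    saving_per_job = typing_hours - dictation_hours
    return saving_per_job * jobs - training_hours


# Figures from the text: a 2-hour job, 1.5 hours when dictated,
# 1 hour of training beforehand.
print(net_hours_saved(2.0, 1.5, 1.0))          # -0.5: half an hour lost
print(net_hours_saved(2.0, 1.5, 1.0, jobs=2))  # 0.0: break-even
print(net_hours_saved(2.0, 1.5, 1.0, jobs=3))  # 0.5: training starts to pay off
```

Under these assumptions, the hour of training only pays for itself once the same terms show up in a third comparable job.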

My first rule for success with speech recognition software will probably have the "purists" shaking their heads in agony. After having used the software for some time, I know some of the weak spots of my speech engine (or my pronunciation). Rather than using the "correct" function again and again, I prefer to type those problem terms even while dictating the rest.

My next rule: Take some time to get used to not "thinking with your fingers." Instead, try to pre-formulate longer segments and then speak them coherently for better results.

This goes right along with the next kind of text that is not well suited for speech recognition because it is hard to speak naturally: text with a lot of formatting. Depending on which translation environment tool you're working with and how it handles formatting, it may be easier to use the keyboard shortcuts you are already used to. If there is really a LOT of formatting, it may be easier to just type the whole thing.

Now, technically, there is no formatting function or other fancy maneuver that your speech recognition can't do, provided you have the right version. The Dragon NaturallySpeaking Premium version (formerly Preferred) comes with all basic formatting in environments like MS Word or its own editor, DragonPad. When you use a translation environment tool that makes you work in an interface other than Word, you will have a hard time doing everything with voice commands unless you have the Professional edition, in which you can easily write macros with virtually unlimited possibilities.

Macro Editor in Dragon's Professional edition

The problem is that while the Premium version has a relatively modest price tag, the Professional version does not. Once you have the Professional version, you can either stay there and pay premium prices for upgrades because you are interested in the slightly better recognition that typically comes with each new version, or you can go the cheap route, downgrading at some point but then losing all your macros. That was the problem I ran into with Dragon, and the cheap route was the solution I eventually settled on.

Windows 7 also contains an internal voice recognition program for Chinese, Japanese, German, French, Spanish, and English.

Speech Recognition dialog in Windows 7
(select Speech Recognition in the Control Panel and then Advanced Speech Options)

This feature has suffered some very public criticism, but I was rather impressed with its accuracy and user-friendliness in a couple of unscientific tests that I ran. I dictated the same paragraphs in both programs, and recognition in Windows was only slightly worse than in Dragon (96% vs. 98%).
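To get a feel for what those two percentages mean in practice, here is a small sketch in Python. It assumes the figures are per-word accuracy, which the tests described here do not actually specify, and the 1,000-word text length is a made-up example.

```python
def expected_errors(word_count, accuracy):
    """Expected number of misrecognized words, assuming per-word accuracy."""
    return round(word_count * (1 - accuracy))


words = 1000  # hypothetical length of a dictated text
for name, accuracy in [("Windows", 0.96), ("Dragon", 0.98)]:
    print(name, expected_errors(words, accuracy))
# Windows 40
# Dragon 20
```

Read this way, a two-point gap means roughly twice as many corrections, so "slightly worse" is still something you would notice on a long text.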

So, unless you are an awesome typist and refuse to change that geeky habit of exclusively using your fingers to enter text, speech recognition is a great alternative to "typing," even before carpal tunnel syndrome hits.

We're told that a good relationship can't be built with one party doing all the talking, and that bond with your computer is no exception. When you get tired of dictating, you can have your computer talk to you.

Text-to-Speech is not a technology that makes sense for everyone. But for those who translate readable texts (as opposed to those who translate cryptic error messages) and need a second set of (virtual) eyes, this is a great way to catch errors that you tend to overlook, especially when you edit your own translations, where your ability to catch errors is inherently limited.

If you are already using Dragon NaturallySpeaking for voice recognition, you're set: it contains a decent text-to-speech engine as well, and obviously one that will speak the language you usually dictate in.

Options for Dragon's text-to-speech engine

If you just want to use the features that Windows offers, you can give those a shot as well. Outside of MS Office I'm not particularly impressed with them, but within the latest versions of Word, Excel, and PowerPoint they work well. To make access more user-friendly, you should probably add the Speak button to the Quick Access Toolbar (in Office 2010 applications).

To do this, you can select the little dropdown arrow to the right of the toolbar and select More Commands. Then select All Commands under Choose commands from and add the Speak command to the Quick Access Toolbar field.

Now you can have whole documents read to you, or you can highlight specific parts and click the Speak button to hear just those. And if you need to change the speed of "Microsoft Anna" (sorry, I didn't come up with that name, but she will be replaced with David, Hazel, and Zira in Windows 8), you can do that in the Windows Speech Properties dialog.

Text to Speech dialog in Windows 7
(select Speech Recognition in the Control Panel and then Text to Speech)

If you need to have your texts read to you outside Office, or if you don't like Anna for English or Lili for Chinese—the only languages that are available—you can use one of the many third-party applications. As mentioned above, Dragon NaturallySpeaking is a good option. A quick Internet search will reveal the many other programs that are available for your language and taste.

As you wade through the voices vying for your attention today, don't forget to make the most of that voice connection to your primary tool through voice recognition and text-to-speech technology. It's a compelling one.