Found in Translation

InterACT helps change science fiction into science fact
By Karen Hoffmann (S'04)

The Thai toddler cries and cries. Her father doesn't speak any English, and the visiting American doctor barely knows a word of Thai. Still, the doctor asks, "What's wrong?" A rugged laptop computer translates the question and speaks it aloud in Thai. The girl's father answers, "Her stomach hurts," and the computer repeats it in English. In this way, the doctor is able to diagnose and treat the girl's pain.

Five people enter the boardroom of a major European company. Each speaks a different language, so ordinarily they would not be able to understand one another. Cameras on the wall recognize each person and track where each one sits. Then individualized audio translations of what the others are saying are beamed to each listener, without wires or headsets, so that only that person can hear them.

In situations like these and many others, technology developed at the International Center for Advanced Communication Technologies (interACT) is changing lives. InterACT is a partnership between Carnegie Mellon and several international universities. Under the direction of Alex Waibel, a professor in CMU's Language Technologies Institute, the center is developing software that translates spoken English, Spanish, German and Japanese in real time.

InterACT scientists started out translating broadcast news, then moved on to speeches and lectures. In October, a start-up company founded by Waibel rolled out an iPhone app called Jibbigo, which translates live conversations between English and Spanish. "We've been working on this technology since 1987, and this is the first time we've come out with something everybody can use," says Waibel, who also has an appointment as a professor of computer science at Universität Karlsruhe in Germany. "I'm just ecstatic."

The progress toward a Star Trek-style "universal translator" is the product of many years of work, including the first speech translation device interACT created in the early 1990s. Instead of trying to teach computers the rules and idiosyncrasies of each language, interACT relies on statistical analysis.

Researchers transcribe lengthy sound files into text, then feed both the audio and the text through a series of algorithms. The software learns to associate certain sounds with certain text patterns, and the more data used to train it, the better its translations become.
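To make "learning by association" a little more concrete, here is a toy sketch in Python. It is not interACT's actual method; real speech systems model acoustic signals with far richer statistical models. The sketch assumes hypothetical training pairs of a sound label and its transcribed word, counts how often each pair occurs, and estimates the probability of a word given a sound by relative frequency. More paired examples make those counts more reliable, which mirrors the point that more training data yields better results.

```python
from collections import Counter, defaultdict


def train(pairs):
    """Count how often each (sound label, word) pair occurs in the training data."""
    counts = defaultdict(Counter)
    for sound, word in pairs:
        counts[sound][word] += 1
    return counts


def probability(counts, sound, word):
    """Relative-frequency estimate of P(word | sound); 0.0 for unseen sounds."""
    seen = counts.get(sound)
    if not seen:
        return 0.0
    return seen[word] / sum(seen.values())


def best_word(counts, sound):
    """Return the word most often paired with a sound, or None if the sound is unseen."""
    seen = counts.get(sound)
    if not seen:
        return None
    return seen.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical training pairs: (sound label, transcribed word).
    data = [("tu", "two"), ("tu", "to"), ("tu", "to"), ("kat", "cat")]
    model = train(data)
    print(best_word(model, "tu"))          # to
    print(probability(model, "tu", "to"))  # 0.6666666666666666
```

Here the hypothetical sound "tu" was heard three times, twice transcribed as "to," so the toy model picks "to" with an estimated probability of two-thirds.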

News and political speeches proved relatively simple to translate because they tended to follow a set pattern and proceed smoothly, Waibel says. But developing Lecture Translator, which debuted in 2005, presented more of a challenge.

Lecturers were prone to stuttering, vocalizing their pauses (saying "um" and "ah") and making false starts in their sentences. Plus, the vocabulary needed to translate lectures on thermodynamics, for instance, was very different from that of everyday speech, or even from lectures on other topics such as philosophy.

Waibel's group cleared those technical hurdles by making its translators adaptable to specialized vocabularies. Now a lecturer can enter the text of her talk before she begins speaking. Lecture Translator then goes online and looks up the meanings of any unfamiliar words, along with terms related to them.

During the lecture, the recognition algorithms run with this modified vocabulary. As a professor lectures in English, her words and a real-time Spanish translation of them are shown on the screen behind her. A German version is also in the works.

Another technology developed at interACT uses highly directional loudspeakers, a sort of "acoustic laser beam," to whisper translations to individual people in a room. Waibel says it's like getting a personalized translation at the United Nations General Assembly without wearing a big, clunky headset.

The most difficult arena to translate so far has been that bane of many people's existence: the meeting. "Meetings and phone conversations are particularly nasty because they are particularly disfluent," Waibel says. People interrupt each other, for example. "The whole vision we're trying to realize is building a conference room where we could all sit in our respective chairs, talk in our respective languages, and hear our respective languages," Waibel says. "My life's dream is to realize that. With targeted audio and simultaneous translation capability you can feel that this would be possible."

InterACT also brings its technology out of the conference room and into the field. The Jibbigo app, which costs $25, is unique among translation applications in that it runs locally on the iPhone and doesn't require an Internet connection. (The name comes from "gibberish" and the Japanese suffix "-go," meaning language.)

At a lecture in October, Waibel showed video of Jibbigo being used to help doctors communicate with patients at a clinic in Honduras. Even in the United States, many hospitals and clinics aren't equipped to deal with people who don't speak English. Waibel's team provides its Spanish translation technology to Pittsburgh's East Liberty Family Health Care Center, which treats people without health insurance.

In the spring, Waibel and graduate student Matthias Eck attended Cobra Gold, the annual two-week joint military exercise between the United States and various Southeast Asian nations. Every day, they were up at 5 a.m. to travel with American doctors to remote villages in Thailand, bringing their technology to help patients voice their needs. These and other field exercises provided real-world tests far more severe than anything researchers could concoct in a lab, Waibel says. "You're there with a laptop and dealing with everything from (speech and translation) algorithms, to 'how do you confront a Thai farmer with a microphone and explain he has to speak at the beep?' And in the background your noise problems consist of chickens squawking!"

In the end, says Waibel, the goal is to create flexible technology that's adaptable to local situations. "We can't possibly, in the lab, adapt it for every language or dialect situation; things will always have to be done locally," he says. "We need to build technology that is simple enough to be maintained by people themselves, so that someone in the field can put a word into the system themselves."

Waibel notes that he isn't trying to replace human translators; they're particularly valuable in diplomatic situations, where they can provide context and calm people down. Instead, he wants to help in situations where the choice isn't between human and machine translation: it's between using a machine and having no communication at all. "The older I get, the more helping people becomes important to me," Waibel says. "It's less about having the latest widget and more about actually touching people's lives."

Karen Hoffmann (S'04) is a freelance writer who frequently covers science and technology. She interviewed Maxine Eskenazi for the Spring 2009 issue of The Link. For more information on interACT, visit www.is.cs.cmu.edu.