Carnegie Mellon Scientists Orchestrate an International Video Conference Demonstrating Spontaneous Speech-to-Speech Translation in Six Languages

By Byron Spice - Thu, 1999-07-22 12:00

Carnegie Mellon University scientists and their colleagues in the international Consortium for Speech Translation Advanced Research (C-STAR) are conducting an international video conference to demonstrate a travel planning system on the Web, which employs groundbreaking computer speech-to-speech translation technology that can translate among six languages at six different locations around the world.

The demonstration will highlight research breakthroughs in large-vocabulary (more than 10,000 words) spontaneous speech translation systems, as well as in speech recognition and machine translation. It features a Web-based interface for the travel domain and also illustrates the role that wearable computers with translating capabilities can play in this area.

Video conference participants will plan trips to Heidelberg, Germany; Kyoto, Japan, or New York City, speaking in English, French, German, Italian, Japanese and Korean. They will converse with each other in their native languages as they plan their trips while the computer systems in each of their respective laboratories orally provide the necessary translation of their spoken conversation.

In addition to Carnegie Mellon, participants include Advanced Telecommunications Research (ATR), Kyoto, Japan; Electronics and Telecommunications Research Institute (ETRI), Taejon, Korea; Communication Langagiere et Interaction Personne-Systeme (CLIPS++), University of Grenoble, France, with the University of Geneva, Switzerland; Istituto per la Ricerca Scientifica e Tecnologica, Istituto Trentino di Cultura (IRST), Trento, Italy, and the Interactive Systems Laboratories at the University of Karlsruhe, with the European Media Laboratory (EML), Heidelberg, Germany.

During the speech-translated conversations, "travel agents" (scientists in the various laboratories) will show the travelers pictures, schedules and 3D animations of their destinations, allowing them to preview the sights and accommodations at a particular locale. When everything has been finalized, they will receive their itineraries on the Web.

At Carnegie Mellon's Interactive Systems Laboratory, the demonstration site in the United States, researchers interacting with their colleagues in Japan, Korea, Italy and Germany will use the JANUS speech translation system to communicate and make reference to shared Web documents as they jointly plan a trip to Heidelberg and another to Kyoto.

A video link from Carnegie Mellon to Heidelberg will show an American tourist on location using a wearable computer translator to communicate with the local populace. The wearable system not only provides translation, it also will act as a tour guide and give directions by voice. This wearable system represents results of cooperative research between EML's DeepMap navigation project and Carnegie Mellon's JANUS speech translation project.

Carnegie Mellon, the University of Karlsruhe and ETRI also have experimented with cross-lingual avatars. Here the image of the conversant is transmitted to the other side as before, but their lip motion is automatically altered to synchronize with the translated speech. Eye motion is also adjusted to provide the kind of eye contact that two people conversing in person would have.

"Speech translation technology has matured to the point of allowing free, spontaneous dialogues using large vocabularies that can be translated into a variety of languages," said C-STAR chairman Alex Waibel, a professor at Carnegie Mellon's School of Computer Science and the University of Karlsruhe in Germany. "While earlier demonstrations showed that speech translation is possible, technology at that time permitted only a limited vocabulary and demanded perfect syntax and speaking style. In addition, speech recognition systems have been improved to handle the sloppy speech people produce when talking spontaneously to each other. The ums, urs, interruptions, hesitations and stutterings of spontaneous speech are automatically recognized, filtered and properly prepared for translation."

Waibel said there also have been technological advances in machine translation, in two approaches: one based on Interlingua and another that is example-based. With the former, people's utterances are transformed into Interlingua, an independent intermediary language that represents the intended thought of the speaker. The system can then be extended to a new language by building translation only between that language and Interlingua, rather than between the new language and every other language. The example-based approach uses parallel bodies of text to automatically infer corresponding phrases. The advantage of this technique is that it can learn from examples.
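The scaling argument behind the Interlingua approach can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not the JANUS or C-STAR implementation: the concept labels (GREETING, THANKS), the phrase tables and all function names are hypothetical and chosen only for illustration. It compares how many translators six languages would need if built pair by pair with how many components a shared intermediary language needs, and shows that adding a language touches only that language's own tables.

```python
# Toy illustration only: the concept labels and phrase tables below are
# hypothetical and are not taken from JANUS or any C-STAR system.

LANGUAGES = ["English", "French", "German", "Italian", "Japanese", "Korean"]


def direct_translator_count(n_languages: int) -> int:
    """Directed translators needed when every language pair is built directly."""
    return n_languages * (n_languages - 1)


def interlingua_component_count(n_languages: int) -> int:
    """One analyzer (into the pivot) and one generator (out of it) per language."""
    return 2 * n_languages


# Generators: pivot concept -> surface phrase, one table per language.
GENERATORS = {
    "English": {"GREETING": "hello", "THANKS": "thank you"},
    "German": {"GREETING": "hallo", "THANKS": "danke"},
    "French": {"GREETING": "bonjour", "THANKS": "merci"},
}

# Analyzers: surface phrase -> pivot concept, derived by inverting each table.
ANALYZERS = {
    lang: {phrase: concept for concept, phrase in table.items()}
    for lang, table in GENERATORS.items()
}


def translate(text: str, source: str, target: str) -> str:
    """Translate by going source phrase -> pivot concept -> target phrase."""
    concept = ANALYZERS[source][text.lower()]
    return GENERATORS[target][concept]


def add_language(name: str, table: dict) -> None:
    """Adding a language touches only its own analyzer and generator."""
    GENERATORS[name] = table
    ANALYZERS[name] = {phrase: concept for concept, phrase in table.items()}


if __name__ == "__main__":
    n = len(LANGUAGES)
    print(direct_translator_count(n), interlingua_component_count(n))
    print(translate("hello", "English", "German"))
    add_language("Italian", {"GREETING": "ciao", "THANKS": "grazie"})
    print(translate("merci", "French", "Italian"))
```

In this sketch, adding Italian requires no change to the English, German or French tables, which is the property the Interlingua approach is meant to provide. A real system would replace the lookup tables with full analysis and generation components.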

Researchers have found that a Web-based interface allows for convenient cross-lingual interaction between two parties through the Internet. Video conferencing allows both parties to see each other as they share the same view of the object of discussion, including images, sounds or text that illustrate the conversation. "Commercial firms working in this field are already selling speech recognition and text translation software," Waibel said. "But spontaneous speech translation is challenging because of the ambiguities of spoken language."

The C-STAR consortium was established in 1991 to conduct research in spoken language translation. Its founding members included ATR, Carnegie Mellon, the University of Karlsruhe and Siemens, A.G. Today, in addition to the principals, there are more than a dozen affiliates in Europe, Asia, North America and India.

For more information, check the Web site: http://www.c-star.org

C-STAR Affiliates List

AT&T (USA)

DFKI, German Research Center for Artificial Intelligence GmbH

DLR, the German Space Agency

EML, European Media Laboratory, Heidelberg, Germany

Fonix Corporation (USA)

HKUST (Hong Kong) Hong Kong University of Science and Technology

IIT (India) Indian Institute of Technology, Tamil Nadu, India

Lernout & Hauspie Speech Products (Belgium)

LIMSI (France) Laboratory of Information Sciences for Mechanics and Engineering Sciences, Paris, France

Massachusetts Institute of Technology - Lincoln Labs (USA)

MIT (USA)

NLPR (China) National Laboratory of Pattern Recognition, Chinese Academy of Sciences, Beijing, China

Siemens, A.G.

SONY

SRI (USA)

For More Information: 

Byron Spice | 412-268-9068 | bspice@cs.cmu.edu