Speech / Language Translator

Smart Module


Project Description

The Speech Translator Smart Module includes a Language Translator (LT) and a Speech Recognizer and Synthesizer (SR).

The Language Translation Smart Module is based upon Carnegie Mellon's language translation software, originally used in the Diplomat Project. The original desktop PC translation code was profiled to identify "hot spots" for software and hardware acceleration, and improvements were made in those areas. These improvements reduced the computational and storage resources required by the prototype software by at least a factor of five, and optimizing the translation algorithms led to roughly a 6x speedup over the original desktop PC system.
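The profile-then-optimize workflow described above can be illustrated with a minimal sketch. The original system used period tooling on a 586-class machine; the Python example below is only a modern stand-in, and the dictionary lookup it profiles is a hypothetical hot spot, not the actual Diplomat code:

```python
import cProfile
import io
import pstats

def lookup_translation(word, dictionary):
    # Hypothetical hot spot: a naive linear scan of the dictionary.
    for source, target in dictionary:
        if source == word:
            return target
    return word  # pass unknown words through untranslated

def translate(text, dictionary):
    # Word-by-word translation built on the lookup above.
    return " ".join(lookup_translation(w, dictionary) for w in text.split())

# Toy bilingual dictionary (illustrative entries only).
dictionary = [("hello", "hola"), ("world", "mundo")]

profiler = cProfile.Profile()
profiler.enable()
for _ in range(1000):
    translate("hello world hello", dictionary)
profiler.disable()

# Rank calls by cumulative time so the hot spots appear first.
stats = pstats.Stats(profiler, stream=io.StringIO())
stats.sort_stats("cumulative")
```

Once profiling has ranked the calls, optimization effort goes only into the functions at the top of the list, which is the approach the project took for both software and hardware acceleration.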

The module accepts ASCII text as input and produces ASCII text as output. The LT computer is a "smart module" that is added to a general-purpose computer; in the first prototype this is a Newton MessagePad 2000, where the English and translated text are displayed. Text input/output is via a serial line. The functional prototype is based upon a 133 MHz 586 processor, 32 MB of DRAM, and a PCMCIA disk drive.
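A serial text interface of this kind implies some line-oriented framing of the ASCII messages. The sketch below assumes a hypothetical newline-terminated frame format (the document does not specify the actual wire protocol) to show how text messages could be packed onto and recovered from a serial byte stream:

```python
def encode_message(text: str) -> bytes:
    # Frame one message as a newline-terminated ASCII line.
    if "\n" in text:
        raise ValueError("embedded newline would break framing")
    return text.encode("ascii") + b"\n"

def decode_stream(data: bytes) -> list[str]:
    # Split a received byte stream back into complete messages,
    # ignoring any trailing partial line still in transit.
    lines = data.split(b"\n")
    return [line.decode("ascii") for line in lines[:-1]]

# Round trip: two framed messages over one stream.
stream = encode_message("hello") + encode_message("world")
messages = decode_stream(stream)
```

With framing like this, either end of the serial line can recover whole messages regardless of how the bytes are chunked by the hardware.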

The Speech Recognizer is based upon CMU's speech recognition software, providing speaker-independent continuous speech recognition. The code was profiled to identify and remove performance bottlenecks. Two-way translation between English and a foreign language has been implemented.

The image to the left is the Functional Prototype. The image to the right is the optimized version of the Speech Translator Smart Module, which is the size of a bar of soap.

The smart modules are a family of wearable computers dedicated to the speech-processing application. A smart module provides a service almost instantaneously and is configurable for different applications. The speech recognition module uses CMU's Sphinx 2 continuous, speaker-independent system. The speech recognition code was profiled and tuned: profiling identified hot spots for hardware and software acceleration and reduced the computational and storage resources required by the software. Input to the module is audio and output is ASCII text. The speech recognition module is augmented with speech synthesis. The figure below and to the left illustrates a combination of the language translation (LT) and speech recognizer (SR) modules, forming a complete stand-alone audio-based interactive dialogue system for speech translation.
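The combined system described above is a chain of stages: audio into the recognizer, text through the translator, and text back out through the synthesizer. The sketch below shows that composition; every function here is a hypothetical stub standing in for the real SR, LT, and synthesis modules:

```python
def recognize(audio: bytes) -> str:
    # SR module stand-in: audio in, ASCII text out.
    return "hello"

def translate(text: str) -> str:
    # LT module stand-in: ASCII text in, ASCII text out (toy lexicon).
    return {"hello": "hola"}.get(text, text)

def synthesize(text: str) -> bytes:
    # Synthesizer stand-in: text in, audio out.
    return text.encode("ascii")

def dialogue_turn(audio: bytes) -> bytes:
    # One turn of the stand-alone dialogue system: SR -> LT -> synthesis.
    return synthesize(translate(recognize(audio)))
```

Because each stage consumes and produces a simple type (audio bytes or ASCII text), the modules can be developed, profiled, and replaced independently, which matches the smart-module design.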

The figure below and to the right depicts the structure of the speech translator, from English (L1) to a foreign language (L2) and vice versa. Speech enters the system through a microphone, and background noise is removed by filtering. Next, the sound is converted into its corresponding phonemes. The list of phonemes is then converted into words using speaker models, dictionaries, and syntactic models: the speaker models capture the linguistic characteristics of individual users, the dictionaries map phonemes to possible words, and the syntactic models decide which of the candidate words is correct. The resulting text is fed to the Translation module, which performs a text-to-text translation. A clarification dialogue takes place on-screen in the Edit operation, ensuring that misrecognized words are corrected. A speech synthesizer performs text-to-speech conversion at the output stage.
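The phoneme-to-word step above can be sketched as two lookups: a dictionary stage that proposes candidate words for a phoneme sequence, and a syntactic stage that picks the most likely candidate. The phoneme dictionary and scores below are toy, hypothetical data; a real recognizer uses trained acoustic and language models:

```python
# Hypothetical phoneme dictionary: one phoneme sequence maps to
# several candidate words that sound alike.
PHONEME_DICT = {("HH", "AH", "L", "OW"): ["hello", "hollow"]}

# Hypothetical syntactic-model scores for the candidates.
SYNTAX_SCORES = {"hello": 0.9, "hollow": 0.1}

def phonemes_to_candidates(phonemes):
    # Dictionary stage: map a phoneme sequence to possible words.
    return PHONEME_DICT.get(tuple(phonemes), [])

def pick_word(candidates):
    # Syntactic-model stage: choose the highest-scoring candidate.
    return max(candidates, key=lambda w: SYNTAX_SCORES.get(w, 0.0))

best = pick_word(phonemes_to_candidates(["HH", "AH", "L", "OW"]))
```

In the full system the chosen words then flow to the Translation module, with the on-screen Edit operation available to correct any remaining misrecognitions.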


Last updated on 13 January 1999