Recognition

We used an HMM-based recognizer developed by Multimodal Technologies Inc (MTI) and specifically tuned for PDAs. The recognizer supports a tightly coupled grammar, which offers important efficiencies given the limited computational power of the device. With only minor modifications, we were able to generate our interlingua interchange format (IF) representation directly as output from the recognizer, removing one module from the process.
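The idea of emitting a semantic representation directly from a grammar-constrained recognizer can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the grammar rules, the tag strings, and the function names are hypothetical, and the tags only loosely imitate an interchange-format label rather than reproduce the actual IF syntax or MTI's recognizer interface. Each rule pairs a word pattern with the tag to emit, so a successful match yields the representation without a separate parsing step.

```python
# Toy illustration only: the grammar, the tag strings, and the function names
# are hypothetical and do not reflect the actual recognizer or the real IF syntax.

# Each rule pairs a word pattern with a semantic tag template.
# A pattern token starting with "$" is a slot that captures exactly one word.
GRAMMAR = [
    (("where", "is", "the", "$place"),
     "request-information+location(place={place})"),
    (("i", "need", "a", "$item"),
     "give-information+need(item={item})"),
]


def match_rule(pattern, words):
    """Return the slot bindings if words match pattern exactly, else None."""
    if len(pattern) != len(words):
        return None
    slots = {}
    for token, word in zip(pattern, words):
        if token.startswith("$"):
            slots[token[1:]] = word   # capture the word under the slot name
        elif token != word:
            return None               # literal token mismatch
    return slots


def recognize_to_tag(words):
    """Emit the tag of the first rule that matches, or None if none match."""
    for pattern, template in GRAMMAR:
        slots = match_rule(pattern, words)
        if slots is not None:
            return template.format(**slots)
    return None
```

For example, `recognize_to_tag("where is the hospital".split())` returns a tag with the slot filled in, while an utterance outside the grammar returns `None`. A real decoder would search over acoustic hypotheses rather than match a fixed word string, but the coupling of rule and output is the point of the sketch.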

MTI's recognizer requires under 1 MB of memory, with acoustic models of around 3 MB per language. Special optimizations compensate for the slow processor and keep memory use low during decoding. The Arabic models were bootstrapped from the GlobalPhone [2] Arabic collections as well as from data collected as part of this project.

Alan W Black 2003-06-12