1997 Project Summary
Research on Spoken Language Systems
Carnegie Mellon University
ARPA Order No. A052


Research Data
Related Information
Project URL: http://www.cs.cmu.edu/~air/ARPA/SLS.html (additional project information provided by the performing organization)
Objective: Carnegie Mellon's objective is to create technology for real-time, unlimited-vocabulary spoken language processing in the context of practical applications. Its long-term goals continue to be the enhancement of the accuracy, robustness, portability, scalability and utility of spoken language systems, through the development of strategies to automatically acquire knowledge at all levels of the process and through the unification of the structures used to represent this knowledge.
Approach:

Carnegie Mellon pursues a broad program of research in the context of practical applications of value to the Department of Defense. Three application systems currently under development draw on this work: (1) a speech interface for a wearable computer that gives the mobile warfighter access to information resources; (2) the News on Demand system, which uses speech recognition to automatically index broadcast material and to support its subsequent retrieval; and (3) DIPLOMAT, a flexible multi-modal speech-to-speech translation system. Each application focuses on an aspect of speech technology that is currently difficult.

The wearable speech system work focuses on developing speech-only interfaces to small, wearable devices for which conventional interface modalities are inappropriate. Such systems require strategies for interacting fluently with the user, including support for orientation and error correction. We first addressed this problem in the context of an amphibious assault vehicle inspection task for the Marines, where speech is the means of recording inspection data and of accessing maintenance resources. Our current work broadens this focus to the DIPLOMAT system and to information access, where speech may be used in conjunction with other input modalities in a reconfigurable multimedia interface.

The News on Demand work focuses on the problem of decoding "found speech", that is, speech that was not originally produced with the intent of being automatically decoded. This domain requires the ability to automatically segment a broadcast stream and to automatically adapt the language model to its content. It emphasizes basic recognition techniques and adaptation to evolving situations. The testbed for News on Demand is the indexing of broadcast news, with the goals of increasing transcription accuracy and the sophistication of the queries that can be posed against the resulting database.

The DIPLOMAT work focuses on two problems: cross-language communication with a wide range of individuals, and the development of procedures for rapidly deploying speech and translation capabilities. This domain emphasizes speech interfaces that are simple to use and easy to learn. It also emphasizes procedures for rapidly acquiring and processing language-specific information: the information needed for acoustic, lexical and language modeling in speech recognition, and for creating both transfer-based and example-based translation knowledge bases. The testbed for DIPLOMAT is the rapid development of speech-to-speech translation capabilities for a succession of non-English languages.

The application-specific work described above draws on a core of spoken language research carried out at Carnegie Mellon. The goal of this work is to improve speech recognition through fundamental advances in technology, including the automatic acquisition of phonetic, lexical, syntactic and semantic knowledge. Specific efforts include dynamic domain language adaptation; algorithms that detect and assimilate new words and extend grammatical coverage to them; and algorithms for environmental robustness that maintain good recognition performance under varied conditions.
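Dynamic domain language adaptation can take many forms. One widely used, simple scheme, shown here only as an illustration and not as the method used in this project, interpolates a static background model with a "cache" model estimated from recently recognized words, so that topic words that have just appeared become more likely. The sketch assumes a unigram background table, a fixed-size cache window and a fixed interpolation weight; all three, and the example words, are hypothetical.

```python
from collections import Counter, deque


class CacheAdaptedLM:
    """Background unigram model interpolated with a cache of recent words.

    Hypothetical sketch: the background table, cache size (must be positive)
    and weight are illustrative assumptions, not values from the project.
    """

    def __init__(self, background, cache_size=200, weight=0.2, floor=1e-6):
        self.background = background            # word -> P_bg(word) from a static LM
        self.cache = deque(maxlen=cache_size)   # most recently recognized words
        self.counts = Counter()                 # word counts inside the cache window
        self.weight = weight                    # lambda: mass assigned to the cache
        self.floor = floor                      # P_bg for words missing from the table

    def observe(self, word):
        """Add a recognized word, evicting the oldest one when the window is full."""
        if len(self.cache) == self.cache.maxlen:
            oldest = self.cache[0]
            self.counts[oldest] -= 1
            if self.counts[oldest] == 0:
                del self.counts[oldest]
        self.cache.append(word)
        self.counts[word] += 1

    def prob(self, word):
        """P(word) = lambda * P_cache(word) + (1 - lambda) * P_bg(word)."""
        p_bg = self.background.get(word, self.floor)
        p_cache = self.counts[word] / len(self.cache) if self.cache else 0.0
        return self.weight * p_cache + (1 - self.weight) * p_bg


# Usage: a rare word becomes much more likely once it recurs in the stream.
lm = CacheAdaptedLM({"the": 0.05, "bosnia": 0.0001}, cache_size=4, weight=0.5)
before = lm.prob("bosnia")
for w in ["bosnia", "talks", "bosnia", "peace"]:
    lm.observe(w)
after = lm.prob("bosnia")
```

A larger weight adapts faster but also lets recognition errors that enter the cache reinforce themselves, so in practice the weight is a trade-off normally tuned on held-out data.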

Recent FY-97 Accomplishments:

Extended the DIPLOMAT cross-lingual speech communications system to work with two additional languages of interest to DoD, Haitian Creole and Korean. Generalized and improved rapid deployment techniques.

Extended our mobile/wearable speech systems work to a new domain: an indexing system (e.g., for license-plate lookup) developed in cooperation with the City of Pittsburgh Police. Extensions included a more sophisticated dialog-level model of interaction and integral use of spoken responses.

Delivered and installed a version of the News on Demand system in the DARPA TIE facility. Significantly improved retrieval accuracy for News on Demand by tailoring retrieval to the characteristics of the domain (errorful transcribed speech). Accurate retrieval remains possible even when transcription accuracy is significantly degraded.

FY-98 Plans:

Generalize and systematize our existing software for mobile speech systems, including wearable-based recognition and open-microphone algorithms, techniques for dynamic grammar creation, and Web-browser-based recognition. Deliver a concise and flexible API for the Sphinx recognition system.

Demonstrate and deliver a decoder incorporating a language model based on token trigrams, in which tokens can point to recursive transition networks, allowing statistical and rule-based language models to be combined flexibly. Initial evaluation indicates that this decoder architecture should produce a 15% improvement in command detection.
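A minimal sketch of how a token-trigram model with network-backed tokens could score one already-segmented hypothesis. Assumptions: each recursive transition network is written as an equivalent small probabilistic grammar, its probability for a word span is computed with a standard inside (sum-over-derivations) recursion, and unseen trigrams get a fixed floor instead of proper backoff. The token names, grammar and probabilities are hypothetical, and a real decoder would search over segmentations during recognition rather than score one given segmentation.

```python
import math
from functools import lru_cache


def rtn_inside_prob(grammar, symbol, words):
    """Probability that `symbol` generates exactly `words`.

    grammar maps each nonterminal to a list of (rhs_tuple, probability) pairs;
    any symbol not in grammar is a terminal word. Assumes no empty rules and
    no cycles of single-symbol rules (A -> B -> A).
    """
    words = tuple(words)

    @lru_cache(maxsize=None)
    def inside(sym, i, j):
        if sym not in grammar:  # terminal: must match exactly one word
            return 1.0 if j == i + 1 and words[i] == sym else 0.0
        return sum(p * seq(rhs, i, j) for rhs, p in grammar[sym])

    @lru_cache(maxsize=None)
    def seq(rhs, i, j):
        # Probability that the symbol sequence rhs covers words[i:j],
        # with every symbol covering at least one word.
        if len(rhs) == 1:
            return inside(rhs[0], i, j)
        return sum(inside(rhs[0], i, k) * seq(rhs[1:], k, j)
                   for k in range(i + 1, j))

    return inside(symbol, 0, len(words))


def token_trigram_logprob(segments, trigram, grammar, floor=1e-6):
    """Log probability of one segmented hypothesis.

    segments: plain words (str) or (nonterminal, [words]) pairs. The trigram
    is over tokens, so a whole network match counts as one history token.
    """
    tokens = [s if isinstance(s, str) else s[0] for s in segments] + ["</s>"]
    history = ("<s>", "<s>")
    total = 0.0
    for seg, tok in zip(segments + [None], tokens):
        total += math.log(trigram.get((history[0], history[1], tok), floor))
        if isinstance(seg, tuple):  # token points into a network: add P(words | network)
            p = rtn_inside_prob(grammar, seg[0], seg[1])
            if p == 0.0:
                return -math.inf  # words not accepted by the network
            total += math.log(p)
        history = (history[1], tok)
    return total


# Hypothetical command grammar and token trigram table.
grammar = {
    "$DIGIT": [(("one",), 0.5), (("two",), 0.5)],
    "$NUMBER": [(("$DIGIT",), 0.5), (("$DIGIT", "$NUMBER"), 0.5)],  # recursive rule
}
trigram = {
    ("<s>", "<s>", "run"): 0.5,
    ("<s>", "run", "plate"): 0.5,
    ("run", "plate", "$NUMBER"): 0.8,
    ("plate", "$NUMBER", "</s>"): 0.9,
}
score = token_trigram_logprob(["run", "plate", ("$NUMBER", ["one", "two"])],
                              trigram, grammar)
```

Treating an entire network match as one token lets the trigram capture command-level structure (such as "plate" followed by a number) while the network enforces the exact word-level syntax of the number itself.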

Develop and deploy server-based recognition systems that provide a range of services suited to communications-based spoken language systems in computationally scalable environments. Make these services available to other sites on an ongoing basis.

Demonstrate improved algorithms for recognizing speech transmitted over narrow-bandwidth communication channels, and halve the current performance penalty relative to wide-band speech.

Technology Transition:

Carnegie Mellon provides a suite of speech recognition tools for use by technologically sophisticated research and development organizations as a point of departure for research and productization efforts. We have migrated our software from a research environment to successively more accessible development environments, currently including Windows and Visual Basic. The suite currently includes tools for acoustic modeling and language modeling, a decoder, a spoken language parser and example applications. We also provide, in the public domain, a pronouncing dictionary of more than 100,000 words used for lexical modeling, as well as implementations of speech coding algorithms.

The Speech Library has been used (and productized) by Apple, DEC, IBM, Kurzweil AI, Microsoft, Verbex, VPC, and Sun. Impact is measured by the availability of products from computer manufacturers (such as IBM and Apple) based on CMU technology. At the same time, we provide our software to universities, federal contractors (in the past year, Lockheed Martin and SAIC) and government labs for research into speech interfaces and for the development of prototype systems. In the past year we have provided systems to NIST, NRL, and the U.S. Army Corps of Engineers.

For information on access to products based on our software, contact: Dr. Alexander Rudnicky, School of Computer Science, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213. Phone: (412) 268-2622; Email: air@cs.cmu.edu

Principal Investigator: Raj Reddy
School of Computer Science, Carnegie Mellon University
5000 Forbes Avenue
Pittsburgh, Pennsylvania 15213-3891
(412) 268-2597
(412) 683-5348 fax
reddy+@cs.cmu.edu
Co-Principal Investigator: Roni Rosenfeld
School of Computer Science, Carnegie Mellon University
5000 Forbes Avenue
Pittsburgh, Pennsylvania 15213-3891
(412) 268-7678
(412) 268-5576 fax
roni@cs.cmu.edu
Co-Principal Investigator: Alexander Rudnicky
School of Computer Science, Carnegie Mellon University
5000 Forbes Avenue
Pittsburgh, Pennsylvania 15213-3891
(412) 268-2622
(412) 268-5576 fax
air@cs.cmu.edu
Co-Principal Investigator: Richard Stern
Department of Electrical and Computer Engineering, Carnegie Mellon University
5000 Forbes Avenue
Pittsburgh, Pennsylvania 15213
(412) 268-2535
(412) 268-3890 fax
rms@cs.cmu.edu
Co-Principal Investigator: Wayne Ward
School of Computer Science, Carnegie Mellon University
5000 Forbes Avenue
Pittsburgh, Pennsylvania 15213-3891
(412) 268-2597
(412) 683-5348 fax
whw@cs.cmu.edu

This page generated: Mon Jul 14 10:00:18 1997