PITTSBURGH—More than 1,000 scientists from around the world will explore the myriad ways in which people and computers use and understand the spoken word when they meet here for Interspeech 2006, the Ninth International Conference on Spoken Language Processing.
The conference, one of two major scientific meetings on speech processing held each year, will take place Sept. 17-21 at the Westin Convention Center. Sponsored by the International Speech Communication Association, Interspeech 2006 will be hosted by Carnegie Mellon University's Language Technologies Institute and its Department of Electrical and Computer Engineering, along with the University of Pittsburgh.
Pittsburgh is an apt venue for such a meeting, as the city has emerged as a center for both developing and commercializing speech technology. Speech scientists have always studied how people learn and use the spoken word, but increasingly speech science also includes the study of how computers learn to decipher and generate speech. New technologies have emerged from this work, such as automated phone centers, automated transcription, and computers and robots that respond to a user's voice rather than only to keystrokes. Locally, academic speech and language researchers have spun off such firms as Carnegie Speech, Cepstral, Multimodal Technologies, Akustica and Vivisimo.
"Meetings such as Interspeech are important because they foster closer ties between academia and industry," said Jaime Carbonell, director of the Language Technologies Institute in Carnegie Mellon's School of Computer Science.
"We've been a key player in language technologies since the mid-1970s," Carbonell said. Two early programs developed at Carnegie Mellon, Harpy and Hearsay, were precursors of modern speech recognition systems, and Sphinx, developed in the 1980s, was the first large-vocabulary, speaker-independent speech recognition system. In the 1990s, researchers here developed the first speech-to-speech translation system, Janus, and one of the first Internet search engines, Lycos.
Since then, researchers in the Language Technologies Institute have made breakthroughs in text-to-speech programs, educational software for teaching reading and English as a second language, and handheld devices that translate speech between Arabic and English. The latest advance involves devices that can detect speech by using electrodes to monitor facial muscle movements.
Interspeech 2006 attendees will be able to obtain conference information through ConQuest, an automated, telephone-based system developed by Carnegie Mellon's speech group.
"One of the major challenges is to make speech technologies more robust," Carbonell said. "Today, speech recognition systems will get completely confused by overlapping conversations at a cocktail party, or if a train rumbles by, or if a speaker has a cold or becomes excited."
Solving this problem is the goal of the first Speech Separation Challenge, an international competition presenting its findings at Interspeech 2006. Eight groups from the United States, United Kingdom, Canada and Finland participated in the challenge, attempting to develop computer methods to recognize speech from one of two people speaking at the same time. Success could result in more useful devices for the hearing impaired, among other things.
Featured speakers include Raj Reddy, the Mozah Bint Nasser University Professor of Computer Science and Robotics at Carnegie Mellon, who will address how to extend the benefits of speech technologies to underdeveloped nations. Other speakers are John J. Ohala, professor emeritus of linguistics at the University of California, Berkeley; Michael Phillips, a speech technologist who heads Mobeus Corp.; and Elissa L. Newport, chair of brain and cognitive sciences at the University of Rochester.
Pittsburgh researchers and companies will demonstrate their devices during a reception at 5:30 p.m. on Sept. 17 at the Westin. Among the technologies will be Let's Go, the automated, phone-based system for Port Authority transit information; SmartNotes, a multimedia system for taking meeting notes; ITSPOKE, a computerized tutoring system that engages students in spoken dialogue; Healthline, an information system for semi-literate community health workers in developing countries; and a PDA that performs English-Thai speech-to-speech translation for medical terms. Speech technology firms Carnegie Speech, Cepstral, Acusis, Multimodal, Akustica, Lucas Systems and Vocollect will also demonstrate their products.
Richard M. Stern, professor of electrical and computer engineering and computer science at Carnegie Mellon, is the general chair for Interspeech 2006. Associate Research Professor Alan W. Black and Research Assistant Professor Tanja Schultz, both of the Language Technologies Institute, are general co-chairs. For more information, visit www.interspeech2006.org.