How May I Help You?":
Automated Customer Service via Natural Spoken Dialog

Authors: Alicia Abella, Allen Gorin, Giuseppe Riccardi, Jeremy Wright, Tirso Alonso
Organization: AT&T Shannon Laboratories, 180 Park Ave. Florham Park, New Jersey 07932

The next generation of voice-based user interfaces will enable easy-to-use automation of new and existing communication services. A critical issue is moving away from highly structured menus toward a more natural human-machine communication paradigm.

In this tutorial we will cover the large-vocabulary speech recognition, language modeling, spoken language understanding, dialog management, and logging functionalities of our system. We will show how finite-state representations and stochastic modeling provide rich tools for building a variety of language models: n-grams, word phrases, word classes, and phrase grammars. We will also present our latest results on automatically learned head-dependency grammars and on language models that account for speech disfluencies.
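
As a minimal illustration of the kind of stochastic language modeling involved, the Python sketch below estimates a bigram model with add-one smoothing over a toy corpus of operator-services utterances. It is a hypothetical example for exposition only; the actual system uses much richer finite-state models built from phrases, word classes, and learned grammars.

    from collections import defaultdict
    import math

    class BigramLM:
        """Toy bigram model with add-one smoothing (illustrative only)."""
        def __init__(self, sentences):
            self.unigram = defaultdict(int)  # history-word counts
            self.bigram = defaultdict(int)   # word-pair counts
            self.vocab = set()
            for words in sentences:
                tokens = ["<s>"] + words + ["</s>"]
                self.vocab.update(tokens)
                for a, b in zip(tokens, tokens[1:]):
                    self.unigram[a] += 1
                    self.bigram[(a, b)] += 1

        def logprob(self, words):
            # Sum smoothed log conditionals log P(b | a) over the sentence.
            tokens = ["<s>"] + words + ["</s>"]
            v = len(self.vocab)
            return sum(math.log((self.bigram[(a, b)] + 1) /
                                (self.unigram[a] + v))
                       for a, b in zip(tokens, tokens[1:]))

    lm = BigramLM([
        "i want to make a collect call".split(),
        "i need to charge this call to my home number".split(),
    ])
    print(lm.logprob("i want to charge this call".split()))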

The Spoken Language Understanding (SLU) module is based on salient grammar fragments, acquired automatically from a corpus of transcribed and labeled training utterances. Each grammar fragment is a finite state machine representing a cluster of phrases with similar meanings. Matches of these fragments against the recognizer output for a test utterance are grouped in semantically coherent ways, and the best interpretation of the utterance is selected, taking dialog context into account.
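
The toy sketch below mimics this matching step with a handful of hand-written regular expressions standing in for the automatically acquired fragment machines; the fragment patterns, call-type labels, and scores are all invented for illustration.

    import re

    # Hypothetical salient fragments and their call-type associations.
    # In the real system these clusters are learned from labeled data
    # and represented as finite state machines, not regexes.
    FRAGMENTS = [
        (r"\bcollect call\b", "CollectCall", 0.92),
        (r"\bcharge (this|it) to\b", "ThirdNumberBilling", 0.85),
        (r"\bwrong number\b", "BillingCredit", 0.88),
    ]

    def interpret(utterance, context_prior=None):
        """Score each call type by its best-matching fragment, reweighted
        by an optional dialog-context prior, and return the top one."""
        context_prior = context_prior or {}
        scores = {}
        for pattern, call_type, strength in FRAGMENTS:
            if re.search(pattern, utterance):
                score = strength * context_prior.get(call_type, 1.0)
                scores[call_type] = max(scores.get(call_type, 0.0), score)
        if not scores:
            return None
        return max(scores.items(), key=lambda kv: kv[1])

    print(interpret("yes i need to make a collect call please"))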

Based on the output of the SLU, the dialog manager must determine whether to ask the customer a question, create a database query, transfer the call, and so on. It is flexible enough to be used in a wide variety of applications. The dialog manager is built from general dialog principles, captured quantitatively in a Construct Algebra, together with an object-oriented task representation that not only structures the task knowledge but also shapes the dialog manager's behavior.
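
The fragment below sketches only the surface decision logic, using invented confidence thresholds and action names; it does not reproduce the Construct Algebra or the object-oriented task representation, which are what actually drive the deployed dialog manager.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class SLUResult:
        call_type: Optional[str]  # best interpretation, if any
        confidence: float         # score from the understanding module

    ROUTABLE = {"CollectCall", "ThirdNumberBilling", "BillingCredit"}

    def next_action(slu: SLUResult):
        """Pick a dialog action from the SLU output (illustrative only)."""
        if slu.call_type is None or slu.confidence < 0.3:
            return ("reprompt", "I'm sorry. How may I help you?")
        if slu.confidence < 0.7:
            return ("confirm", f"You would like {slu.call_type}?")
        if slu.call_type in ROUTABLE:
            return ("route", slu.call_type)  # transfer the call
        return ("query", slu.call_type)      # e.g. a database lookup

    print(next_action(SLUResult("CollectCall", 0.95)))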

The HMIHY platform has extensive instrumentation built in to track its internal operation. The collected information is logged to files for later analysis. The analysis tool applied to these log files is object-oriented, modular, reusable, and extensible. The collected data includes items such as the prompts the dialog manager selects to play and the audio fed to the ASR engine.
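
A minimal sketch of such event logging follows, with invented field names rather than the HMIHY log schema: each component appends timestamped records to a session log that an analysis tool can later parse.

    import json
    import time

    def log_event(logfile, component, event, **fields):
        """Append one timestamped event as a JSON line (field names are
        illustrative; the actual HMIHY log format is not shown here)."""
        record = {"time": time.time(), "component": component,
                  "event": event, **fields}
        logfile.write(json.dumps(record) + "\n")

    with open("session-0001.log", "w") as f:
        log_event(f, "dialog_manager", "prompt_selected",
                  prompt="How may I help you?")
        log_event(f, "asr", "audio_received", audio_file="utt-0001.wav")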

Each of these components was first integrated into a prototype in 1997, which automated over 10,000 customer requests for operator services. This year a Wizard-of-Oz version of the system for customer care conducted more than 25,000 dialogs. Based on this data collection, a fully autonomous system has been deployed in the AT&T network to handle customer care requests.