Title: "How May I Help You?": Automated Customer Service via Natural Spoken Dialog
Authors:
Alicia Abella, Allen Gorin, Giuseppe Riccardi, Jeremy Wright, Tirso Alonso
Organization:
AT&T Shannon Laboratories, 180 Park Ave., Florham Park, New Jersey
The next generation of
voice-based user interfaces will enable easy-to-use automation of new and
existing communication services. A critical issue is to move away from
highly structured menus toward a more natural human-machine paradigm. In this tutorial we will cover the large-vocabulary speech recognition, language modeling, spoken language understanding, dialog management, and logging components of our system. We will show how finite-state representations and stochastic modeling provide rich tools for building a variety of language models: n-grams, word phrases, word classes, and phrase grammars. We will also present our latest results on automatically learned head-dependency grammars and on language models that account for speech disfluencies.
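As a rough illustration of the stochastic modeling behind such language models, the sketch below estimates a smoothed bigram model in Python; the toy corpus, function names, and add-one smoothing are our own assumptions, not the system's actual training procedure.

    import math
    from collections import Counter

    def train_bigram_lm(sentences):
        """Estimate add-one-smoothed bigram probabilities from tokenized sentences."""
        unigrams, bigrams = Counter(), Counter()
        for words in sentences:
            tokens = ["<s>"] + words + ["</s>"]
            unigrams.update(tokens)
            bigrams.update(zip(tokens, tokens[1:]))
        vocab_size = len(unigrams)

        def log_prob(words):
            """Log-probability of a sentence under the smoothed bigram model."""
            tokens = ["<s>"] + words + ["</s>"]
            return sum(
                math.log((bigrams[(w1, w2)] + 1) / (unigrams[w1] + vocab_size))
                for w1, w2 in zip(tokens, tokens[1:])
            )
        return log_prob

    lm = train_bigram_lm([["i", "want", "to", "make", "a", "collect", "call"],
                          ["i", "need", "credit", "for", "a", "wrong", "number"]])
    print(lm(["i", "want", "a", "collect", "call"]))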
Spoken language understanding (SLU) is based on salient grammar fragments acquired automatically from a corpus of transcribed and labeled training utterances. Each grammar fragment is a finite-state machine representing a cluster of phrases with similar meanings. Matches of these fragments against the recognizer output for a test utterance are grouped in semantically coherent ways, and the best interpretation of the utterance is selected, taking dialog context into account.
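To make the fragment-matching idea concrete, here is a toy Python sketch in which each salient fragment stands in for a finite-state machine and carries weights toward call types; the fragments, weights, and call-type labels are invented for illustration, and simple substring search stands in for FSM matching.

    from collections import defaultdict

    # Hypothetical fragment phrase -> {call type: salience weight}
    FRAGMENTS = {
        "collect call":     {"CollectCall": 0.9},
        "charge it to":     {"ThirdNumberBilling": 0.8},
        "wrong number":     {"Credit": 0.85},
        "person to person": {"PersonToPerson": 0.9},
    }

    def interpret(recognizer_output):
        """Match fragments against ASR output and score candidate call types."""
        scores = defaultdict(float)
        matched = []
        for phrase, weights in FRAGMENTS.items():
            if phrase in recognizer_output:      # stand-in for FSM matching
                matched.append(phrase)
                for call_type, weight in weights.items():
                    scores[call_type] += weight
        best = max(scores, key=scores.get) if scores else "Other"
        return best, matched

    print(interpret("yes i was trying to make a collect call and got a wrong number"))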
Based on the output of the SLU, the dialog manager must determine whether to ask the customer a question, create a database query, transfer the call, and so on. It is flexible enough to be used in a wide variety of applications. The dialog manager is built from general dialog principles that are captured quantitatively in a Construct Algebra, together with an object-oriented task representation that both structures the task knowledge and shapes the dialog manager's behavior.
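The following Python sketch suggests, under our own assumptions, how an object-oriented dialog manager might choose its next action from an SLU result; the class names, confidence threshold, and action set are hypothetical and do not reproduce the Construct Algebra itself.

    from dataclasses import dataclass

    @dataclass
    class SLUResult:
        call_type: str
        confidence: float

    class DialogManager:
        def __init__(self, routing_table, threshold=0.7):
            self.routing_table = routing_table   # call type -> destination
            self.threshold = threshold

        def next_action(self, slu):
            """Choose among asking a question, querying, and transferring."""
            if slu.confidence < self.threshold:
                return ("ask", f"Did you want {slu.call_type}?")
            if slu.call_type in self.routing_table:
                return ("transfer", self.routing_table[slu.call_type])
            return ("query", slu.call_type)      # e.g. a billing database lookup

    dm = DialogManager({"CollectCall": "operator-collect"})
    print(dm.next_action(SLUResult("CollectCall", 0.9)))   # ('transfer', ...)
    print(dm.next_action(SLUResult("Credit", 0.5)))        # ('ask', ...)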
The HMIHY platform has extensive built-in instrumentation to track its internal operation. The collected information is logged to files for later analysis with a tool that is object-oriented, modular, reusable, and extensible. The logged data includes, for example, the prompts selected by the dialog manager for playback and the audio fed to the ASR engine.
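A minimal sketch of this kind of structured event logging might look as follows; the field names and JSON-lines format are our own choice, not the platform's actual log format.

    import json, time

    def log_event(logfile, event_type, **fields):
        """Append one timestamped event record for later offline analysis."""
        record = {"time": time.time(), "event": event_type, **fields}
        logfile.write(json.dumps(record) + "\n")

    with open("dialog.log", "a") as f:
        log_event(f, "prompt_played", prompt_id="greeting")
        log_event(f, "asr_input", audio_file="utt0001.wav")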
Each of these components was first integrated into a prototype in 1997 that automated over 10,000 customer requests for operator services. This year a Wizard-of-Oz version of the system for customer care conducted more than 25,000 dialogs.
Based on this data collection, a fully autonomous system has been deployed in
the AT&T network to handle customer care requests.