The RavenClaw Dialog Management Architecture

Olympus

Olympus is a dialog system architecture with roots in the earlier CMU Communicator architecture. It includes components for speech recognition, language understanding, confidence annotation, language generation, and speech synthesis, among others, all connected through the Galaxy message-passing communication infrastructure.

Olympus Architecture / Components

Olympus is a classical pipeline dialog system architecture (see the image below). The input from the user first passes through a speech recognizer, and then through a language understanding module. Based on the semantic input it receives, the dialog manager decides which action to take next. The dialog manager may also talk to one or more back-end agents (e.g. a database). The output from the dialog manager is rendered as text by the language generation module, and then transformed into audio by a synthesis component.
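
The flow described above can be sketched as a chain of functions, one per stage. This is only a toy illustration of the pipeline idea: every function name, the city list, and the back-end lookup are hypothetical stand-ins, not part of Olympus or of any of its components.

```python
from typing import Callable, Dict, List

# Hypothetical toy vocabulary used by the stand-in "understanding" stage.
KNOWN_CITIES = {"boston", "pittsburgh"}


def recognize(audio: str) -> str:
    """Stand-in recognizer: the 'audio' is assumed to be a transcript already."""
    return audio.lower().strip()


def understand(text: str) -> Dict[str, str]:
    """Stand-in understanding: fill a 'destination' slot with the first known city."""
    for word in text.split():
        if word in KNOWN_CITIES:
            return {"destination": word}
    return {}


def manage_dialog(frame: Dict[str, str],
                  backend: Callable[[str], List[str]]) -> Dict[str, str]:
    """Stand-in dialog manager: ask for a missing slot, else query the back end."""
    if "destination" not in frame:
        return {"act": "request", "slot": "destination"}
    flights = backend(frame["destination"])
    return {"act": "inform",
            "destination": frame["destination"],
            "count": str(len(flights))}


def generate(output: Dict[str, str]) -> str:
    """Stand-in template-based generation: semantic output to text."""
    if output["act"] == "request":
        return "Where would you like to go?"
    return f"I found {output['count']} flights to {output['destination']}."


def synthesize(text: str) -> bytes:
    """Stand-in synthesis: encode the text instead of producing audio."""
    return text.encode("utf-8")


def run_turn(audio: str, backend: Callable[[str], List[str]]) -> bytes:
    """Push one user turn through every stage, in pipeline order."""
    return synthesize(generate(manage_dialog(understand(recognize(audio)), backend)))
```

Each stage only sees the output of the stage before it, which is what makes the design a pipeline; the dialog manager is the one stage that also consults a back end.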

The next image provides a more detailed view of the various system components.

The main components involved are:

Recognition	The recognition server obtains an audio stream from a sound card and forwards it to multiple Sphinx-II recognition engines. Several decoder engines can be configured to work in parallel (for instance, we often use parallel decoders configured with male and female acoustic models); a DTMF recognition engine can also be connected to the recognition server. Each engine produces a result. The recognition server collects these results and forwards them to the language understanding component (Phoenix). A Sphinx-III recognition engine has also been developed recently, but we are not currently using it due to real-time constraints.
Language Understanding	Language understanding is performed via the Phoenix parser. Phoenix is a robust semantic parser based on manually constructed semantic grammars. Phoenix parses all the recognition results and forwards the parses to the next module, Helios.
Confidence Annotation	Helios is a confidence annotation module. It receives several alternate semantic hypotheses, either from the Phoenix parser or from other components (such as a GUI in a multimodal system). Based on various features, Helios computes a confidence score for each semantic hypothesis. The highest-scoring hypothesis is then forwarded to the dialog manager.
Dialog Management	The dialog manager is constructed using the RavenClaw dialog management framework. It receives semantic inputs and it sends out semantic outputs to the language generation module.
Domain Reasoning	The dialog manager may communicate with a number of back-end components that handle domain reasoning, such as a database.
Language Generation	The language generation module (Rosetta) takes semantic output from the dialog manager and generates the corresponding surface forms. The language generation module is based on templates.
Synthesis	Clients for several synthesis engines (e.g. Festival, Theta, Swift) are available and can be used within the Olympus architecture.
Text I/O	For debugging purposes a text I/O component for the system is also provided.
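
The hand-off from parallel recognition engines through parsing and confidence annotation can be sketched as follows. This is a minimal illustration under stated assumptions: the `Hypothesis` type, the toy grammar, and the scoring formula (an even mix of a made-up acoustic score and parse coverage) are all hypothetical, and say nothing about the features or model Helios actually uses.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Toy semantic grammar: surface word -> (slot, value). Purely illustrative.
TOY_GRAMMAR = {"fly": ("action", "fly"), "boston": ("destination", "boston")}


@dataclass
class Hypothesis:
    engine: str                 # which (hypothetical) decoder produced this result
    text: str                   # recognized word string
    acoustic_score: float       # assumed to lie in [0, 1]
    frame: Dict[str, str] = field(default_factory=dict)
    confidence: float = 0.0


def parse(hyp: Hypothesis) -> Hypothesis:
    """Fill the semantic frame with every slot the toy grammar recognizes."""
    for word in hyp.text.lower().split():
        if word in TOY_GRAMMAR:
            slot, value = TOY_GRAMMAR[word]
            hyp.frame[slot] = value
    return hyp


def annotate(hyp: Hypothesis) -> Hypothesis:
    """Assumed scoring rule: average of acoustic score and parse coverage."""
    words = hyp.text.lower().split()
    covered = sum(1 for w in words if w in TOY_GRAMMAR)
    coverage = covered / len(words) if words else 0.0
    hyp.confidence = 0.5 * hyp.acoustic_score + 0.5 * coverage
    return hyp


def select_best(results: List[Hypothesis]) -> Hypothesis:
    """Parse and score every engine's result, then keep the highest-scoring one."""
    if not results:
        raise ValueError("no recognition results to choose from")
    annotated = [annotate(parse(h)) for h in results]
    return max(annotated, key=lambda h: h.confidence)
```

With one result per engine, `select_best` returns the single hypothesis that would be passed on to the dialog manager; the other results are scored but discarded.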

These components are connected via the Galaxy message-passing communication infrastructure. Galaxy uses a central hub and a set of rules for relaying messages from one component to another.
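
The hub-and-rules idea can be illustrated with a small sketch. It assumes messages are plain dictionaries and that a rule is a condition on the message paired with a target component; the `Hub` class and its methods are hypothetical and do not reproduce the Galaxy API or its rule syntax.

```python
from typing import Callable, Dict, List, Optional, Tuple

Message = Dict[str, object]
Handler = Callable[[Message], Optional[Message]]
Condition = Callable[[Message], bool]


class Hub:
    """Toy central hub: components never call each other, only the hub does."""

    def __init__(self) -> None:
        self.servers: Dict[str, Handler] = {}
        self.rules: List[Tuple[Condition, str]] = []
        self.trace: List[str] = []   # names of components visited, in order

    def register(self, name: str, handler: Handler) -> None:
        self.servers[name] = handler

    def add_rule(self, condition: Condition, target: str) -> None:
        self.rules.append((condition, target))

    def route(self, message: Message, max_hops: int = 10) -> Message:
        """Relay the message until no rule matches or a component returns None."""
        for _ in range(max_hops):    # guard against rule sets that loop forever
            target = next((t for cond, t in self.rules if cond(message)), None)
            if target is None:
                break
            self.trace.append(target)
            reply = self.servers[target](message)
            if reply is None:
                break
            message = reply
        return message


def build_demo_hub() -> Hub:
    """Wire two hypothetical components together purely through rules."""
    hub = Hub()
    hub.register("parser", lambda m: {"parse": str(m["text"]).upper()})
    hub.register("dialog_manager", lambda m: {"prompt": f"got {m['parse']}"})
    hub.add_rule(lambda m: "text" in m, "parser")
    hub.add_rule(lambda m: "parse" in m, "dialog_manager")
    return hub
```

Calling `build_demo_hub().route({"text": "hello"})` sends the message to the parser and then to the dialog manager. Because the routing lives in the rules rather than in the components, a component can be swapped or added by changing the rule set alone.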