|  |
|
| |
|
|
| |
|
| - |
Bohus, D., Raux, A., Harris, T., Eskenazi, M., and Rudnicky, A. (2007) - Olympus: an open-source framework for conversational spoken language interface research, to appear in HLT-NAACL 2007 workshop on Bridging the Gap: Academic and Industrial Research in Dialog Technology, Rochester, NY [abs]
|
|
| |
We introduce Olympus, a freely available framework for research in conversational interfaces. Olympus’ open, transparent, flexible, modular and scalable nature facilitates the development of large-scale, real-world systems, and enables research leading to technological and scientific advances in conversational spoken language interfaces. In this paper, we describe the overall architecture, several systems spanning different domains, and a number of current research efforts supported by Olympus.
|
|
|
| |
|
|
| |
|
| - |
Bohus, D., Grau, S., Huggins-Daines, D., Keri, V., Krishna, G., Kumar, R., Raux, A., and Tomko, S. (2007) - Conquest - an Open-Source Dialog System for Conferences, to appear in Proceedings of HLT-NAACL 2007, Rochester, NY [abs]
|
|
| |
We describe ConQuest, an open-source, reusable spoken dialog system that provides technical program information during conferences. The system uses a transparent, modular and open infrastructure, and aims to enable applied research in spoken language interfaces. The conference domain is a good platform for applied research since it permits periodical redeployments and evaluations with a real user-base. In this paper, we describe the system’s functionality, overall architecture, and we discuss two initial deployments.
|
|
|
| |
|
|
| |
|
| - |
Tetreault, J., and Bohus, D. (2007) - Estimating the Reliability of MDP Policies: a Confidence Interval Approach, to appear in HLT-NAACL 2007, Rochester, NY [abs]
|
|
| |
Data sparsity is one of the major issues that NLP researchers always wrestle with. That is, does one have enough data to make reliable conclusions in an experiment? Using Reinforcement Learning to improve a spoken dialogue system is
no exception. Past approaches in this area have simply assumed that there was enough collected data to derive reliable dialog control policies or used thousands of user simulations to overcome the sparsity issue. In this paper we present a methodology for numerically constructing confidence bounds on the expected reward for a constructed policy, and use these bounds to better estimate the reliability of that policy. We apply this methodology to a prior
experiment of using MDPs to predict the best features to include in a model of the dialogue state. Our results show that policies developed in the prior work were not as reliable as previously determined but the overall ranking of features remains the same.
|
|
|
| |
|
|
|  |
|
| |
|
|
| |
|
| - |
Bohus, D., Langner, B., Raux, A., Black, A., Eskenazi, M., and Rudnicky, A. (2006) - Online Supervised Learning of Non-understanding Recovery Policies, in SLT-2006, Palm Beach, Aruba [abs]
|
|
| |
Spoken dialog systems typically use a limited number of non-understanding
recovery strategies and simple heuristic policies to
engage them (e.g. first ask user to repeat, then give help, then
transfer to an operator). We propose a supervised, online method
for learning a non-understanding recovery policy over a large set
of recovery strategies. The approach consists of two steps: first, we
construct runtime estimates for the likelihood of success of each
recovery strategy, and then we use these estimates to construct a
policy. An experiment with a publicly available spoken dialog
system shows that the learned policy produced a 12.5% relative
improvement in the non-understanding recovery rate.
|
|
|
| |
|
|
| |
|
| - |
Bohus, D., and Rudnicky, A. (2006) - A K Hypotheses + Other Belief Updating Model, in AAAI Workshop on Statistical and Empirical Approaches to Spoken Dialogue Systems, 2006, Boston, MA [abs]
|
|
| |
Spoken dialog systems typically rely on recognition confidence
scores to guard against potential misunderstandings.
While confidence scores can provide an initial assessment
for the reliability of the information obtained from the user,
ideally systems should leverage information that is available
in subsequent user responses to update and improve the accuracy
of their beliefs. We present a machine-learning
based solution for this problem. We use a compressed representation
of beliefs that tracks up to k hypotheses for each
concept at any given time. We train a generalized linear
model to perform the updates. Experimental results show
that the proposed approach significantly outperforms heuristic
rules used for this task in current systems. Furthermore, a
user study with a mixed-initiative spoken dialog system
shows that the approach leads to significant gains in task
success and in the efficiency of the interaction, across a
wide range of recognition error-rates.
|
|
|
| |
|
|
| |
|
| - |
Raux, A., Bohus, D., Langner, B., Black, A., and Eskenazi, M. (2006) - Doing Research in a Deployed Spoken Dialog System: One Year of Let's Go! Public Experience, in Interspeech-2006, Pittsburgh, PA [abs]
|
|
| |
This paper describes our work with Let’s Go, a telephone-based
bus schedule information system that has been in use by
the Pittsburgh population since March 2005. Results from
several studies show that while task success correlates
strongly with speech recognition accuracy, other aspects of
dialogue such as turn-taking, the set of error recovery strategies,
and the initiative style also significantly impact system
performance and user behavior.
|
|
|
| |
|
|
|  |
|
| |
|
|
| |
|
| - |
Bohus, D., and Rudnicky, A. (2005) - Constructing Accurate Beliefs in Spoken Dialog Systems, in ASRU-2005, San Juan, Puerto Rico [abs] [poster]
|
|
| |
We propose a novel approach for constructing more accurate
beliefs over concept values in spoken dialog systems by
integrating information across multiple turns in the conversation.
In particular, we focus our attention on updating the confidence
score of the top hypothesis for a concept, in light of subsequent
user responses to system confirmation actions. Our data-driven
approach bridges previous work in confidence annotation and
correction detection, providing a unified framework for belief
updating. The approach significantly outperforms heuristic rules
currently used in most spoken dialog systems.
|
|
|
| |
|
|
| |
|
| - |
Bohus, D., and Rudnicky, A. (2005) - Error Handling in the RavenClaw dialog management architecture, in HLT-EMNLP-2005, Vancouver, Canada [abs]
|
|
| |
We describe the error handling architecture
underlying the RavenClaw dialog
management framework. The architecture
provides a robust basis for current and future
research in error detection and recovery.
Several objectives were pursued in its
development: task-independence, ease-of-use,
adaptability and scalability. We describe
the key aspects of architectural design
which confer these properties, and
discuss the deployment of this architecture
in a number of spoken dialog systems
spanning several domains and interaction
types. Finally, we outline current research
projects supported by this architecture.
|
|
|
| |
|
|
| |
|
| - |
Bohus, D., and Rudnicky, A. (2005) - Sorry, I Didn't Catch That! - An Investigation of Non-understanding Errors and Recovery Strategies, in SIGdial-2005, Lisbon, Portugal [abs] [sigdial book chapter]
|
|
| |
We present results from an extensive empirical analysis of non-understanding
errors and ten non-understanding recovery strategies, based on a corpus of
dialogs collected with a spoken dialog system that handles conference room
reservations. More specifically, the issues we investigate are: what are the
main sources of non-understanding errors? What is the impact of these errors on
global performance? How do various strategies for recovery from
non-understandings compare to each other? What are the relationships between these
strategies and subsequent user response types, and which response types are more
likely to lead to successful recovery? Can dialog performance be improved by
using a smarter policy for engaging the non-understanding recovery strategies?
If so, can we learn such a policy from data? Whenever available, we compare and
contrast our results with other studies in the literature. Finally, we summarize
the lessons learned and present our plans for future work inspired by this
analysis.
|
|
|
| |
|
|
| |
|
| - |
Bohus, D., and Rudnicky, A. (2005) - A Principled Approach for Rejection Threshold Optimization in Spoken Dialog Systems, in Interspeech-2005, Lisbon, Portugal [abs]
|
|
| |
A common design pattern in spoken dialog systems is to reject
an input when the recognition confidence score falls below a
preset rejection threshold. However, this introduces a
potentially non-optimal tradeoff between various types of
errors such as misunderstandings and false rejections. In this
paper, we propose a data-driven method for determining the
relative costs of these errors, and then use these costs to
optimize state-specific rejection thresholds. We illustrate the
use of this approach with data from a spoken dialog system
that handles conference room reservations. The results
obtained confirm our intuitions about the costs of the errors,
and are consistent with anecdotal evidence gathered throughout
the use of the system.
|
|
|
| |
|
|
| |
|
| - |
Raux, A., Langner, B., Bohus, D., Black, A., and Eskenazi, M. (2005) - Let's Go Public! Taking a Spoken Dialog System to the Real World, in Interspeech-2005, Lisbon, Portugal [abs]
|
|
| |
In this paper, we describe how a research spoken dialog system
was made available to the general public. The Let’s Go Public
spoken dialog system provides bus schedule information to the
Pittsburgh population during off-peak times. This paper describes
the changes necessary to make the system usable for the general
public and presents analysis of the calls and strategies we have
used to ensure high performance.
|
|
|
| |
|
|
|  |
|
| |
|
|
| |
|
| - |
Bohus, D. (2004) - Error Awareness and Recovery in Task-Oriented Spoken Dialogue Systems, Ph.D. Thesis Proposal, Carnegie Mellon University, Pittsburgh, PA [abs] [slides]
|
|
| |
A persistent and important problem in spoken language interfaces is their lack of robustness when faced with understanding errors. The problem is present across all domains and interaction types, and stems primarily from the unreliability of the speech recognition process. I propose to alleviate this problem by (1) endowing spoken dialogue systems with better error awareness, (2) constructing a richer repertoire of error recovery strategies, and (3) developing a practical data-driven approach for making error handling decisions. The proposed work will address questions and make contributions in each of these three areas. For the first part, I propose to develop a belief updating mechanism that integrates confidence annotation and correction detection into a unified framework, and allows spoken dialogue systems to continuously track the reliability of the information they use. For the second part, I propose to implement and investigate an extended set of error recovery strategies addressing common problems in human-computer dialogue. Finally, I plan to bring these two capabilities together in a scalable reinforcement-learning based approach for making error handling decisions in task-oriented spoken dialogue systems.
|
|
|
| |
|
|
| |
|
| - |
Bohus, D., and Rudnicky, A. (2004) - Task-Independent Conversational Strategies in the RavenClaw Dialogue Management Framework, unpublished manuscript [abs]
|
|
| |
We present the implementation of task-independent conversational strategies in the RavenClaw dialogue management framework. The proposed approach decouples the implementation and the control of these strategies from the actual system task, and brings forth several advantages: it increases the consistency in the interaction style, while at the same time it lessens the development and testing efforts by allowing for the easy reuse of these strategies across different systems. We plan to illustrate the repertoire of task-independent conversational strategies in the RavenClaw dialogue management framework by giving a live demonstration of RoomLine, a spoken dialogue system for conference room reservation and scheduling.
|
|
|
| |
|
|
| |
|
| - |
Aist, G., Bohus, D., Boven, B., Campana, E., Early, S., and Phan, S. (2004) - Initial Development of a Voice-Activated Astronaut Assistant for Procedural Tasks: From Need to Concept to Prototype, in Journal of Interactive Instruction Development, Volume 16, Nr. 3, Winter 2004, pp. 32-36
|
|
| |
|
|
|  |
|
| |
|
|
| |
|
| - |
Bohus, D., and Rudnicky, A. (2003) - RavenClaw: Dialog Management Using Hierarchical Task Decomposition and an Expectation Agenda, in Eurospeech-2003, Geneva, Switzerland [abs] [poster]
|
|
| |
We describe RavenClaw, a new dialog management framework developed as a successor to the Agenda architecture used in the CMU Communicator. RavenClaw introduces a clear separation between task and discourse behavior specification, and allows rapid development of dialog management components for spoken dialog systems operating in complex, goal-oriented domains. The system development effort is focused entirely on the specification of the dialog task, while a rich set of domain-independent conversational behaviors are transparently generated by the dialog engine. To date, RavenClaw has been applied to five different domains allowing us to draw some preliminary conclusions as to the generality of the approach. We briefly describe our experience in developing these systems.
|
|
|
| |
|
|
| |
|
| - |
Aist, G., Dowding, J., Hockey, B.A., Rayner, M., Hieronymus, J., Bohus, D., Boven, B., Blaylock, N., Campana, E., Early, S., Gorrell, G., and Phan, S. (2003) - Talking through procedures: An intelligent Space Station procedure assistant, in Demo Session at EACL-2003, Budapest, Hungary [abs]
|
|
| |
We present a prototype system aimed at
providing spoken dialogue support for
complex procedures aboard the International
Space Station. The system allows
navigation one line at a time or in larger
steps. Other user functions include issuing
spoken corrections, requesting images
and diagrams, recording voice notes and
spoken alarms, and controlling audio volume.
|
|
|
| |
|
|
|  |
|
| |
|
|
| |
|
| - |
Bohus, D., and Rudnicky, A. (2002) - LARRI: A Language-Based Maintenance and Repair Assistant, in IDS-2002, Kloster Irsee, Germany [abs]
|
|
| |
LARRI (Language-based Agent for Retrieval of Repair Information) is a dialog-based system for support of maintenance and repair domains, characterized by large amounts of documentation and by procedural information. LARRI is based on an architecture developed by Carnegie Mellon University for the DARPA Communicator program and is integrated with a wearable computer system developed by the Wearable Computing group at Carnegie Mellon University.
LARRI adapts a dialog-management architecture developed and optimized for a telephone-based problem solving task (travel planning), and applies it to a very different domain -- aircraft maintenance. The system was taken on a field trial on two occasions where it was used by professional aircraft mechanics. We found that our architecture, AGENDA, extended readily to a multi-modal and multi-media framework. At the same time we found that assumptions that were reasonable in a services domain turn out to be inappropriate for a maintenance domain. Apart from the need to manage integration between input modes and output modalities, we found that the system needed to support multiple categories of tasks and that a different balance between user and system goals was required. A significant problem in the maintenance domain is the need to assimilate and make available for language processing appropriate domain information.
|
|
|
| |
|
|
| |
|
| - |
Bohus, D., and Rudnicky, A. (2002) - Integrating Multiple Knowledge Sources for Utterance-Level Confidence Annotation in the CMU Communicator Spoken Dialog System, Technical Report CS-190, Carnegie Mellon University, Pittsburgh, PA [abs]
|
|
| |
In recent years, automated speech recognition has been the main drive behind
the advent of spoken language interfaces, but at the same time a severe limiting
factor in the development of these systems. We believe that increased robustness
in the face of recognition errors can be achieved by making the systems aware of
their own misunderstandings, and employing appropriate recovery techniques when
breakdowns in interaction occur. In this paper we address the first problem: the
development of an utterance-level confidence annotator for a spoken dialog
system. After a brief introduction to the CMU Communicator spoken dialog system
(which provided the target platform for the developed annotator), we cast the
confidence annotation problem as a machine learning classification task, and
focus on selecting relevant features and on empirically identifying the best
classification techniques for this task. The results indicate that significant
reductions in classification error rate can be obtained using several different
classifiers. Furthermore, we propose a data driven approach to assessing the
impact of the errors committed by the confidence annotator on dialog
performance, with a view to optimally fine-tuning the annotator. Several models
were constructed, and the resulting error costs were in accordance with our
intuition. We found, surprisingly, that, at least for a mixed-initiative spoken
dialog system such as the CMU Communicator, these errors trade off equally over a
wide operating characteristic range.
|
|
|
| |
|
|
|  |
|
| |
|
|
| |
|
| - |
Bohus, D., and Rudnicky, A. (2001) - Modeling the Cost of Misunderstandings in the CMU Communicator Dialog System, in ASRU-2001, Madonna di Campiglio, Italy [abs] [slides] [poster]
|
|
| |
We describe a data-driven approach that allows us to quantify the costs of various types of errors made by the utterance-level confidence annotator in the Carnegie Mellon Communicator system. Knowing these costs, we can determine the optimal tradeoff point between these errors, and tune the confidence annotator accordingly. We describe several models, based on concept transmission efficiency. The models fit our data quite well and the relative costs of errors are in accordance with our intuition. We also find, surprisingly, that for a mixed-initiative system such as the CMU Communicator, false positive and false negative errors trade off equally over a wide operating range.
|
|
|
| |
|
|
| |
|
| - |
Carpenter, P., Jin, C., Wilson, D., Zhang, R., Bohus, D., and Rudnicky, A. (2001) - Is This Conversation on Track?, in Eurospeech-2001, Aalborg, Denmark [abs] [slides]
|
|
| |
Confidence annotation allows a spoken dialog system to accurately assess the likelihood of misunderstanding at the utterance level and to avoid breakdowns in interaction. We describe experiments that assess the utility of features from the decoder, parser and dialog levels of processing. We also investigate the effectiveness of various classifiers, including Bayesian Networks, Neural Networks, SVMs, Decision Trees, AdaBoost and Naive Bayes, to combine this information into an utterance-level confidence metric. We found that a combination of a subset of the features considered produced promising results with several of the classification algorithms considered, e.g., our Bayesian Network classifier produced a 45.7% relative reduction in confidence assessment error and a 29.6% reduction relative to a handcrafted rule.
|
|
|
| |
|
|
|  |
|
| |
|
|
| |
|
| - |
Bohus, D., and Boldea, M. (2000) - A Web-based Text Corpora Development System, in LREC-2000, Athens, Greece [abs]
|
|
| |
One of the most important starting points for any NLP endeavor is the construction of text corpora of appropriate size and quality. This paper presents a web-based text corpora development system that focuses both on the size and the quality of these corpora. The quantitative problem is solved by using the Internet as a practically limitless resource of texts. To ensure a certain quality, we enrich the text with relevant information to be fit for further use by resolving in an integrated manner the problems of diacritic characters restoration, lexical ambiguity resolution and morphosyntactic annotation. Although at this moment it is targeted at texts in Romanian, a number of mechanisms have been provided that allow it to be easily adapted to other languages.
|
|
|
| |
|
|