April 10, 2000
POMDPs are well-suited to many real-world problems that require decision-making under pervasive uncertainty, and as such have gained much attention in recent years. Despite the representational power of POMDPs, their use is significantly limited by the high computational cost of finding an optimal policy for the learner. Consequently, research efforts have focused on developing efficient algorithms that generate near-optimal policies. Current algorithms, however, are still generally unable to handle complex problems. We believe this limitation can be overcome by exploiting the structure present in many real-world domains, in contrast to the conventional POMDP approach of treating the problem as a monolithic whole.
The domain currently targeted by our work is dialogue management for speech-based interaction between a robot and a human user. Our conversational interface acts mostly as an information provider, whose structure is dictated by the information content available in a given application. We believe this domain decomposes naturally into a hierarchy, and that our approach can be used to solve large POMDPs in this context. This would allow the development of a large-scale dialogue system that is more robust to speech recognition errors and ambiguous statements than previous dialogue management systems.
A significant amount of work exists on exploiting hierarchy to reduce the complexity of planning tasks. Hierarchical decomposition of Markov Decision Processes (MDPs) has been the subject of substantial research [7,2,3]; however, there has been little work extending it to POMDP problems. Preliminary attempts make strict assumptions about prior knowledge of sub-tasks and their ordering [1,8], which we do not require.
The hierarchical POMDP approach relies on a division of tasks that gives higher-level POMDPs direct control over more specialized POMDPs. At the high level, the POMDP policy consists of selecting appropriate sub-modules (i.e. hierarchical actions), while at the lower levels, specialized POMDPs have policies involving direct actions on the problem domain (i.e. non-hierarchical actions). This division of tasks is not strict, and POMDPs at all levels can have both hierarchical and non-hierarchical actions, with the exception of the lowest-level POMDPs, which have only non-hierarchical actions. Belief states are maintained over all POMDPs of the hierarchy at all times, and are updated after each action/observation pair. When a specific specialized POMDP is selected, it is identified as an "active module" and is responsible for the learner's action selection at that time step. Other, non-active modules act as HMMs during that time step; their belief states are updated according to a default action, thereby ensuring that they capture the entire history and will produce the appropriate behaviour when later selected as an "active module".
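The per-module bookkeeping described above can be sketched in code. The following is a minimal illustration, not the authors' implementation: module class names, the `default_action` field, and the array layouts (`T[a, s, s']`, `O[a, s', o]`) are assumptions made for the example. It shows the standard discrete belief update applied to every module at each step, with only the active module executing the chosen action.

```python
import numpy as np

class PomdpModule:
    """One sub-task in the hierarchy (hypothetical minimal interface).

    T[a, s, s'] : transition probabilities for each action
    O[a, s', o] : observation probabilities for each action
    default_action : action assumed when the module is NOT active, so its
                     belief still tracks the full history (HMM-style update)
    """
    def __init__(self, T, O, default_action=0):
        self.T = np.asarray(T, dtype=float)
        self.O = np.asarray(O, dtype=float)
        self.default_action = default_action
        n_states = self.T.shape[1]
        self.belief = np.full(n_states, 1.0 / n_states)  # uniform prior

    def update_belief(self, action, obs):
        # Standard Bayes filter: b'(s') ∝ O(s',a,o) * sum_s T(s,a,s') b(s)
        predicted = self.T[action].T @ self.belief
        posterior = self.O[action, :, obs] * predicted
        total = posterior.sum()
        if total > 0:  # guard against zero-probability observations
            self.belief = posterior / total

def step_hierarchy(modules, active_idx, action, obs):
    """One time step: the active module executes `action`; every other
    module passively updates its belief under its default action."""
    for i, m in enumerate(modules):
        a = action if i == active_idx else m.default_action
        m.update_belief(a, obs)
```

For instance, a two-state module whose states emit observation 0 with probabilities 0.9 and 0.2 will, after seeing observation 0 from a uniform prior, shift its belief toward state 0 while remaining a valid distribution. The key design point the sketch captures is that the non-active modules' updates are what let the hierarchy hand control to any module later without losing history.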
We have shown that in a domain such as a conversational speech interface, structure can indeed be exploited to obtain a policy much faster than with a conventional POMDP; furthermore, it allowed us to build a larger POMDP-based dialogue manager than was possible with non-hierarchical POMDP representations. Such structure is not present in all domains, but it is found in a great variety of applications, ranging from the dialogue task to robot navigation tasks, where the domain could be divided into a set of separate problems according to building topology (e.g. each room gets a separate POMDP). The speedup obtained by applying our hierarchical approach can be remarkable in domains with appropriate structure.