**Daniel Nikovski and Illah Nourbakhsh**

Currently, there are no general methods for the autonomous acquisition of discrete models of robot actions in continuous environments with hidden state. For some domains, such as navigation in office spaces, methods exist for acquiring maps of the environment, but they are domain-specific and unlikely to transfer to other domains. The field of reinforcement learning, on the other hand, has made substantial progress in learning general probabilistic models with hidden state, but that research has focused almost exclusively on discrete worlds such as mazes, and extending these methods to continuous domains has proven very hard, even when the world is fully observable.

Since POMDP models are simply hidden Markov models (HMMs) augmented with actions, it should be possible, at least in theory, to learn them with general methods for probabilistic models with hidden state, such as the Baum-Welch algorithm. In practice, however, such algorithms often get stuck in shallow local maxima of the data likelihood even when the observations are discrete, which leads to catastrophic performance when the models are used to plan actions [1]. Learning such models with continuous observations is even harder: these algorithms almost never converge to usable models unless they are initialized with emission models that are already close to the final solution.
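To make the local-maximum issue concrete, here is a minimal NumPy sketch of Baum-Welch re-estimation for a discrete-observation HMM, using the standard scaled forward-backward recursions. The toy two-state model, the initialization, and the iteration count are our own illustrative choices, not those of the cited experiments. Each EM step can only increase the data likelihood, so the algorithm climbs to whichever maximum the initialization happens to lie in the basin of.

```python
import numpy as np

def baum_welch_step(obs, A, B, pi):
    """One EM (Baum-Welch) step for a discrete HMM via scaled
    forward-backward. Returns re-estimated (A, B, pi) and the
    log-likelihood of obs under the *input* parameters."""
    T, N = len(obs), A.shape[0]
    alpha = np.zeros((T, N)); c = np.zeros(T)
    alpha[0] = pi * B[:, obs[0]]
    c[0] = alpha[0].sum(); alpha[0] /= c[0]
    for t in range(1, T):                      # scaled forward pass
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        c[t] = alpha[t].sum(); alpha[t] /= c[t]
    beta = np.zeros((T, N)); beta[-1] = 1.0
    for t in range(T - 2, -1, -1):             # scaled backward pass
        beta[t] = (A @ (B[:, obs[t + 1]] * beta[t + 1])) / c[t + 1]
    gamma = alpha * beta                       # state posteriors
    gamma /= gamma.sum(axis=1, keepdims=True)
    xi = np.zeros((N, N))                      # expected transition counts
    for t in range(T - 1):
        x = alpha[t][:, None] * A * (B[:, obs[t + 1]] * beta[t + 1])[None, :]
        xi += x / x.sum()
    A_new = xi / xi.sum(axis=1, keepdims=True)
    B_new = np.vstack([gamma[obs == k].sum(axis=0)
                       for k in range(B.shape[1])]).T
    B_new /= B_new.sum(axis=1, keepdims=True)
    return A_new, B_new, gamma[0], float(np.log(c).sum())

# Toy 2-state, 2-symbol HMM: sample a sequence, then run EM from a
# rough initialization; the likelihood rises monotonically until it
# stalls at a (possibly local) maximum.
rng = np.random.default_rng(0)
true_A = np.array([[0.9, 0.1], [0.2, 0.8]])
true_B = np.array([[0.8, 0.2], [0.1, 0.9]])
state, obs = 0, []
for _ in range(300):
    obs.append(rng.choice(2, p=true_B[state]))
    state = rng.choice(2, p=true_A[state])
obs = np.array(obs)

A = np.full((2, 2), 0.5)
B = np.array([[0.6, 0.4], [0.4, 0.6]])
pi = np.array([0.5, 0.5])
lls = []
for _ in range(30):
    A, B, pi, ll = baum_welch_step(obs, A, B, pi)
    lls.append(ll)
```

Running the loop from several random initializations and comparing the final log-likelihoods is the quickest way to see how often EM stalls short of the global optimum.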

In order to overcome these shortcomings of iterative optimization
algorithms such as Baum-Welch, we are investigating a completely
different approach to learning probabilistic models with hidden state,
based on merging states. Proposed originally by Stolcke and Omohundro
for the purpose of learning HMMs for speech recognition, this
approach starts with a completely deterministic, albeit overfit, model
and gradually merges states so as to simplify the model and improve
its generalization ability. However, that algorithm proceeds greedily
and never reconsiders suboptimal state mergers, which sometimes
results in models of poor quality [2]. To solve this problem,
we developed a novel algorithm (SMTC), which considers all possible
mergers before carrying out any of them. The algorithm performs
clustering in the embedding space of the trajectories leading into and
out of each state, and employs an efficient algorithm for
computing the similarities between pairs of trajectories. So far,
experimental results in small test worlds have shown the superiority of
our algorithm over Baum-Welch and the greedy state-merging algorithm
of Stolcke and Omohundro [3]. Table 1 shows results for three
learning methods and five planners: AP (assumptive planning); MLS
(most-likely-state, MDP-based); Voting (voting, MDP-based); and *Q*_{MDP}
(MDP-based, proportional to Q-values). Shown are the average reward over
five trials, the associated standard deviation, and the statistical
*z* test for the difference between the achieved reward and that of random
action selection. The last row shows the performance of the planners
when the true POMDP model is available to them.
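The merge-everything-at-once idea can be sketched as follows. Note that the Gaussian trajectory kernel and the fixed similarity threshold below are hypothetical placeholders chosen for illustration; the actual embedding, similarity computation, and clustering criterion used by SMTC are described in [3].

```python
import numpy as np

def traj_similarity(t1, t2, sigma=1.0):
    """Similarity of two observation trajectories (arrays of shape
    (T, d)): mean Gaussian kernel over the aligned common prefix.
    A placeholder for SMTC's actual trajectory similarity."""
    L = min(len(t1), len(t2))
    if L == 0:
        return 0.0
    d = np.sum((t1[:L] - t2[:L]) ** 2, axis=1)
    return float(np.mean(np.exp(-d / (2.0 * sigma ** 2))))

def cluster_states(in_trajs, out_trajs, threshold=0.8):
    """Merge model states whose incoming and outgoing trajectories
    look alike. All pairwise similarities are computed up front, so
    no merge is decided on partial information (unlike the greedy
    merger, which commits to one pair at a time)."""
    n = len(in_trajs)
    sim = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            sim[i, j] = 0.5 * (traj_similarity(in_trajs[i], in_trajs[j])
                               + traj_similarity(out_trajs[i], out_trajs[j]))
    # single-linkage-style merging on the precomputed matrix
    labels = list(range(n))
    for i in range(n):
        for j in range(i + 1, n):
            if sim[i, j] >= threshold:
                old, new = labels[j], labels[i]
                labels = [new if l == old else l for l in labels]
    return labels

# Demo: states 0 and 1 are surrounded by identical trajectories and
# should merge; state 2 sees very different observations and should not.
t = np.array([[0.0], [1.0], [2.0]])
labels = cluster_states([t, t, t + 100.0], [t, t, t + 100.0])
```

Precomputing the full similarity matrix is quadratic in the number of states, which is why an efficient pairwise trajectory-similarity computation matters in practice.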

**[1]** D. Nikovski. Learning stationary temporal probabilistic networks. In *Proceedings of the Conference on Automated Learning and Discovery CONALD'99*, Carnegie Mellon University, Pittsburgh, PA, 1998.

**[2]** D. Nikovski and I. Nourbakhsh. Learning discrete Bayesian models for autonomous agent navigation. In *Proceedings of the 1999 IEEE International Symposium on Computational Intelligence in Robotics and Automation CIRA'99*, pages 137-143, IEEE, Monterey, CA, 1999.

**[3]** D. Nikovski and I. Nourbakhsh. Learning probabilistic models for decision-theoretic navigation of mobile robots. In *Proceedings of the Seventeenth International Conference on Machine Learning ICML'2000*, Stanford, CA, 2000.


The translation was initiated by Daniel Nikovski on 2000-05-23