next up previous
Next: Partially Observable Environments Up: Hierarchical Methods Previous: Compositional Q-learning

Hierarchical Distance to Goal

Especially if we consider reinforcement learning modules to be part of larger agent architectures, it is important to consider problems in which goals are dynamically input to the learner. Kaelbling's HDG algorithm [51] uses a hierarchical approach to solving problems when goals of achievement (the agent should get to a particular state as quickly as possible) are given to an agent dynamically.

The HDG algorithm works by analogy with navigation in a harbor. The environment is partitioned (a priori, but more recent work [5] addresses the case of learning the partition) into a set of regions whose centers are known as ``landmarks.'' If the agent is currently in the same region as the goal, then it uses low-level actions to move to the goal. If not, then high-level information is used to determine the next landmark on the shortest path from the agent's closest landmark to the goal's closest landmark. Then, the agent uses low-level information to aim toward that next landmark. If errors in action cause deviations in the path, there is no problem; the best aiming point is recomputed on every step.

Leslie Pack Kaelbling
Wed May 1 13:19:13 EDT 1996