As the name suggests, this domain is a noisy version of the famous Towers of Hanoi problem. The domain has two types of objects, disks and pegs. The problem that was used in the competition had five disks and three pegs. The actions of the domain are “single-move-big-not-moved”, “single-move-big-moved”, “double-move-big-not-moved” and “double-move-big-moved”. As the action names suggest, one can move either one or two disks at a time (single-move/double-move) and the outcome of the move is dependent on whether or not the biggest disk has been moved yet (big-not-moved/big-moved). The objective of the domain is to maximize the probability of reaching a goal configuration (no rewards).

The initial configuration defines the starting positions of the disks (as in Towers of Hanoi, the five disks are stacked on the first peg in bottom to top, largest to smallest order). The goal configuration defines the destination positions of the disks (again, the destination positions are the same as that of Towers of Hanoi, namely all five disks are stacked in the same order as the initial configuration but on the last peg). The goal of the problem is to move the disks from the starting configuration to the goal configuration. All actions in Towers of Hanoise have noisy outcomes. In particular, when executing an action it is possible to drop a disk and have it be lost forever, thus bringing execution to a dead end. The success probabilities are:

Action |
Success Probability |

“single-move-big-not-moved” | 0.99 |

“single-move-big-moved” | 0.95 |

“double-move-big-not-moved” | 0.80 |

“double-move-big-moved” | 0.90 |

Every sequence of actions has some success probability less than one in this
problem, so it is not possible to reach the goal with certainty. To
maximize the probability of reaching the goal, a careful comparison must be
made. To move the big disk from the first to last peg, it is
necessary to move the four smaller disks to the middle peg. This
subgoal can be achieved by executing “single-move-big-not-moved”
fifteen times on the
smaller disks, resulting in a success probability of
0.99^{15} ≈ 0.86. It can also be accomplished by moving the four smaller
disks as two units of two using “double-move-big-not-moved” three
times, resulting in a low
success probability of approximately 0.51.

Next, the big disk can be moved from the first to last peg with a success probability of 0.99 (“single-move-big-not-moved”). Then, the four smaller disks again need to be moved, this time from the middle peg to the last peg. Now that the big disk has been moved, the success probabilities change and the two strategies yield success probabilities of about 0.46 for “single-move-big-moved” and 0.73 for “double-move-big-moved”.

A planner that chooses optimally at each step would switch from single
moves to double moves after the big disk is in place resulting in a
total success probability of
0.99^{15} `x` 0.99 `x` 0.9^{3} ≈ 0.62. One
that ignores probabilities and always uses single moves has a lower
success probability of
0.99^{15} `x` 0.99 `x` 0.95^{15} ≈ 0.39. A planner that ignores probabilities and minimizes the number
of steps by always using double moves has a lower success probability
still of
0.8^{3} `x` 0.99 `x` 0.9^{3} ≈ 0.37. Thus, for
optimum performance, a planner must realize that its policy should
consider the success probabilities of actions and how they are influenced by the
status of the big disk.

Håkan L. S. Younes

2005-12-06