As the name suggests, this domain is a noisy version of the famous Towers of Hanoi problem. The domain has two types of objects, disks and pegs. The problem that was used in the competition had five disks and three pegs. The actions of the domain are “single-move-big-not-moved”, “single-move-big-moved”, “double-move-big-not-moved” and “double-move-big-moved”. As the action names suggest, one can move either one or two disks at a time (single-move/double-move) and the outcome of the move is dependent on whether or not the biggest disk has been moved yet (big-not-moved/big-moved). The objective of the domain is to maximize the probability of reaching a goal configuration (no rewards).
The initial configuration defines the starting positions of the disks (as in Towers of Hanoi, the five disks are stacked on the first peg in bottom to top, largest to smallest order). The goal configuration defines the destination positions of the disks (again, the destination positions are the same as that of Towers of Hanoi, namely all five disks are stacked in the same order as the initial configuration but on the last peg). The goal of the problem is to move the disks from the starting configuration to the goal configuration. All actions in Towers of Hanoise have noisy outcomes. In particular, when executing an action it is possible to drop a disk and have it be lost forever, thus bringing execution to a dead end. The success probabilities are:
Every sequence of actions has some success probability less than one in this problem, so it is not possible to reach the goal with certainty. To maximize the probability of reaching the goal, a careful comparison must be made. To move the big disk from the first to last peg, it is necessary to move the four smaller disks to the middle peg. This subgoal can be achieved by executing “single-move-big-not-moved” fifteen times on the smaller disks, resulting in a success probability of 0.9915 ≈ 0.86. It can also be accomplished by moving the four smaller disks as two units of two using “double-move-big-not-moved” three times, resulting in a low success probability of approximately 0.51.
Next, the big disk can be moved from the first to last peg with a success probability of 0.99 (“single-move-big-not-moved”). Then, the four smaller disks again need to be moved, this time from the middle peg to the last peg. Now that the big disk has been moved, the success probabilities change and the two strategies yield success probabilities of about 0.46 for “single-move-big-moved” and 0.73 for “double-move-big-moved”.
A planner that chooses optimally at each step would switch from single moves to double moves after the big disk is in place resulting in a total success probability of 0.9915 x 0.99 x 0.93 ≈ 0.62. One that ignores probabilities and always uses single moves has a lower success probability of 0.9915 x 0.99 x 0.9515 ≈ 0.39. A planner that ignores probabilities and minimizes the number of steps by always using double moves has a lower success probability still of 0.83 x 0.99 x 0.93 ≈ 0.37. Thus, for optimum performance, a planner must realize that its policy should consider the success probabilities of actions and how they are influenced by the status of the big disk.
Håkan L. S. Younes