Tireworld is another domain that tests the planners' ability to plan ahead under uncertainty. The domain consists of one type of object, namely locations. This domain comes in two flavors, a goal version and a reward version. The actions common to both versions are “move-car”, “load-tire” and “change-tire”. In the reward version, there is the additional action “call-AAA”.
Within the reward version, each execution of one of the actions “move-car”, “load-tire” or “change-tire” incurs a cost of 1, and each execution of the action “call-AAA” incurs a cost of 100. The initial configuration of a problem defines a set of locations, a road graph over these locations, the subset of locations holding spare tires, and the car's starting location on the graph. The goal configuration designates a destination location on the graph. The objective is to move the car from the starting location to the goal location.
The noise in Tireworld comes from the action “move-car”. Each time this action is executed, the car drives from one location to another and gets a flat tire with probability 0.15. Once the car has a flat tire, it cannot execute “move-car” again until the tire is fixed. The car can store at most one spare tire, which it picks up by executing “load-tire” at a location with a spare. If the car is carrying a spare, the “change-tire” action can be invoked to fix the flat; if the car has no spare, this action is disabled. In the goal version, a flat tire therefore results in a dead end whenever the car carries no spare. In the reward version, the planner can fix a flat either with “change-tire” (if the car has a spare) or with “call-AAA” (at a high cost), so there are no dead ends and the goal is always reachable. Note that since the cost of “call-AAA” is large compared to the costs of “change-tire” and “load-tire”, fixing a flat is always less expensive when the car has a spare tire.
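The dynamics just described can be exercised with a small Monte Carlo sketch. This is our own illustration, not part of the competition domain: the function name `traverse_segment` is hypothetical, and we assume a flat acquired while a spare is carried is always repaired immediately with “change-tire”.

```python
import random

FLAT_PROB = 0.15  # probability that "move-car" produces a flat tire

def traverse_segment(n_moves, spares, rng):
    """Attempt `n_moves` executions of "move-car" while carrying `spares`
    spare tires. A flat is repaired with "change-tire" while spares remain;
    with no spare left, the car is stranded (a dead end in the goal version)."""
    for _ in range(n_moves):
        if rng.random() < FLAT_PROB:
            if spares == 0:
                return False
            spares -= 1  # "change-tire" consumes a carried spare
    return True

rng = random.Random(0)
trials = 100_000
estimate = sum(traverse_segment(4, 1, rng) for _ in range(trials)) / trials
print(f"four moves with one spare: {estimate:.2f}")
```

With this many trials the estimate should land near 0.89, the four-step-segment probability used in the analysis below.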
Figure 6 illustrates the Tireworld problem used in the competition. We next compare the probability of reaching a goal state for two different plans for this problem to illustrate what an ideal plan looks like in this domain.
An optimal plan would look ahead and attempt to keep spare tires as accessible as possible to avoid dead ends. From the start state, the car must make three steps without a flat tire to reach the first spare at cc, which will occur with probability 0.85³ ≈ 0.61. Now, the car needs to go four steps with at most one flat to make it to the next spare at d5. It gets zero flats with probability 0.85⁴ ≈ 0.52 and exactly one flat with probability 4 × 0.85³ × 0.15 ≈ 0.37, so a four-step segment can be traversed with one spare tire with probability 0.52 + 0.37 = 0.89. There are three four-step segments that must be traversed successfully to reach ck. Finally, with a spare, the last two steps can be traveled with certainty. Thus, the total success probability of this event sequence is 0.61 × 0.89³ ≈ 0.43. Note that this estimate is a lower bound on the success probability of the optimal strategy, because it does not account for the possibility of repairing a flat tire acquired upon arrival at a location with a spare. Furthermore, if the car is in location cf or ch with a spare and no flat, it is unnecessary to traverse the loop to pick up the spare tire in location d5 or cm. Accounting for these factors gives a success probability of just over 0.57.
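The arithmetic above reduces to a binomial tail: a segment is survivable as long as the number of flats does not exceed the number of carried spares. A short sketch (the helper name `p_segment` is ours) reproduces the numbers exactly:

```python
from math import comb

P_FLAT = 0.15  # per-move probability of a flat tire

def p_segment(n_moves, spares):
    """Probability of completing `n_moves` with at most `spares` flats,
    each flat being fixed by "change-tire" with a carried spare."""
    return sum(comb(n_moves, k) * P_FLAT**k * (1 - P_FLAT)**(n_moves - k)
               for k in range(spares + 1))

first_leg = p_segment(3, 0)           # three flat-free steps to the first spare
segment = p_segment(4, 1)             # a four-step segment with one spare
lower_bound = first_leg * segment**3  # three such segments in sequence
print(round(first_leg, 2), round(segment, 2), round(lower_bound, 2))
```

This prints 0.61, 0.89, and 0.43, matching the lower bound derived in the text.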
In contrast, a greedy replanning algorithm would not gather spares, since their utility comes from the realization that something might go wrong. For such a planner, the best plan is to go directly from c0 to c9 on the shortest (9-step) route. Its success probability is 0.85⁹ ≈ 0.23, which is just 40 percent of the best success probability computed above.
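The greedy route's success probability is a one-line check under the same assumptions: all nine moves must be flat-free.

```python
P_MOVE_OK = 0.85  # probability that "move-car" does not produce a flat

# The direct 9-step route succeeds only if every move is flat-free;
# the ratio to 0.57 gives the roughly-40-percent figure in the text.
greedy = P_MOVE_OK ** 9
print(round(greedy, 2))
```

This prints 0.23, as stated above.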
In the reward version of the planning problem, the optimal success probability is one because the “call-AAA” action is always available. However, the cost of this action equals the reward for reaching the goal, so it is always better to end execution with the “done” action than to repair a flat tire with the “call-AAA” action. Hence, the best strategy for the goal version is also optimal for the reward version and yields an expected reward of just under 45. The greedy strategy outlined above would result in an expected reward of just over 22. If the “call-AAA” action is used to fix flat tires, the expected reward drops to −29.
Håkan L. S. Younes