We have shown how to construct an MDP from the PPDDL encoding of a
planning problem. The plan objective is to maximize the expected
reward for the MDP. This objective can be interpreted in different ways, for example as expected *discounted* reward or expected *total* reward. The suitable interpretation depends on the problem.
For process-oriented planning problems (for example, the “Coffee
Delivery” problem), discounted reward is typically desirable, while
total reward is often the interpretation chosen for goal-oriented problems
(for example, the “Bomb and Toilet” problem). PPDDL
does not include any facility for enforcing a given interpretation
or for specifying a discount factor.
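
To make the two interpretations concrete, the following minimal sketch computes both quantities for a single finite reward sequence. The function name `discounted_return`, the discount factor `gamma`, and the sample rewards are illustrative assumptions and are not part of PPDDL. Total reward is the special case `gamma = 1`, and the planning objective is the expectation of this quantity over the trajectories induced by a policy.

```python
def discounted_return(rewards, gamma=1.0):
    """Return the sum of gamma**t * r_t over a finite reward sequence.

    With gamma == 1.0 this is the plain (undiscounted) total reward.
    """
    total = 0.0
    # Accumulate from the last reward backwards (Horner's scheme),
    # so each earlier step discounts everything that follows it.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Hypothetical trajectory: three actions costing 1 each, then a goal reward of 10.
rewards = [-1.0, -1.0, -1.0, 10.0]
total_reward = discounted_return(rewards)            # gamma = 1: total reward
discounted_reward = discounted_return(rewards, 0.5)  # gamma = 0.5: discounted reward
```

On this sample trajectory the total reward is positive while the heavily discounted reward is negative, which illustrates why the choice of interpretation can change which plan is preferred.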

For the competition, we used expected total reward as the optimality criterion. Without discounting, some care is required in the design of planning problems to ensure that the expected total reward is bounded for the optimal policy. The following restrictions were made for problems used in the planning competition:

- Each problem had a goal statement, identifying a set of absorbing goal states.
- A positive reward was associated with transitioning into a goal state.
- A negative reward (cost) was associated with each action.
- A “done” action was available in all states, which could be used to end further accumulation of reward.
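
The effect of these restrictions can be sketched with value iteration on a toy problem. Everything in the example is an assumption made for illustration, not a competition domain: there is a single non-goal state, a hypothetical `try` action that reaches the goal with probability `p_success` and otherwise leaves the state unchanged, and a `done` action that is assumed to yield zero reward. The goal state is absorbing, so no reward accumulates after it is entered.

```python
def value_iteration(goal_reward, action_cost, p_success, tol=1e-9, max_iter=10_000):
    """Expected total reward (no discounting) for a one-state toy MDP.

    Returns the value of the non-goal state and the greedy action there.
    """
    v = 0.0  # initial estimate for the non-goal state
    for _ in range(max_iter):
        # "try": pay the action cost, collect the goal reward on success,
        # and remain in the same state (with value v) on failure.
        q_try = -action_cost + p_success * goal_reward + (1.0 - p_success) * v
        # "done": stop accumulating reward (assumed to have zero reward).
        q_done = 0.0
        new_v = max(q_try, q_done)
        best = "try" if q_try > q_done else "done"
        if abs(new_v - v) < tol:
            return new_v, best
        v = new_v
    return v, best


# Goal reward outweighs the expected cost of reaching the goal: keep trying.
v_high, policy_high = value_iteration(goal_reward=10.0, action_cost=1.0, p_success=0.5)
# Goal reward too small to justify the cost: the optimal policy is "done".
v_low, policy_low = value_iteration(goal_reward=1.0, action_cost=1.0, p_success=0.5)
```

Under these assumptions the optimal value is bounded in both cases. The goal reward is collected at most once because goal states are absorbing, and the “done” action guarantees that the optimal expected total reward is never below zero even when every other action has a cost.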

Håkan L. S. Younes

2005-12-06