### Equivalence

The clever algorithms developed to solve MDPs cannot be directly applied to NMRDPs. One way of dealing with this problem is to translate the NMRDP into an equivalent MDP with an expanded state space [2]. The expanded states in this MDP (e-states, for short) augment the states of the NMRDP by encoding additional information sufficient to make the reward history-independent. For instance, if we only want to reward the very first achievement of goal $g$ in an NMRDP, the states of an equivalent MDP would carry one extra bit of information recording whether $g$ has already been true. An e-state can be seen as labelled by a state of the NMRDP (via the function $\tau$ in Definition 1 below) and by history information. Since the dynamics of NMRDPs are Markovian, the actions and their probabilistic effects in the MDP are exactly those of the NMRDP. The following definition, adapted from that given by Bacchus et al. [2], makes this concept of equivalent MDP precise. Figure 2 gives an example.
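The goal-$g$ example above can be made concrete with a small sketch (our own illustration, not code from the paper; the function name `expand` and the dictionary encoding of transitions are assumptions). Each e-state pairs an NMRDP state with a bit recording whether $g$ has already held strictly before the current state, which is exactly the extra information needed to make the reward Markovian:

```python
from itertools import product

def expand(states, actions, trans, goal, reward=1.0):
    """Expand an NMRDP rewarding the first achievement of goal g into an
    equivalent MDP. E-states are pairs (s, seen), where `seen` records
    whether g has already held strictly before the current state, making
    the reward history-independent.

    trans: dict (s, a) -> {s2: prob}; goal: set of NMRDP goal states.
    Returns (e_states, e_trans, e_reward)."""
    e_states = list(product(states, [False, True]))
    e_trans, e_reward = {}, {}
    for (s, seen), a in product(e_states, actions):
        if (s, a) not in trans:
            continue
        seen2 = seen or s in goal          # update the history bit on exit
        dist = {}
        for s2, p in trans[(s, a)].items():
            dist[(s2, seen2)] = dist.get((s2, seen2), 0.0) + p
        e_trans[((s, seen), a)] = dist
    for s, seen in e_states:
        # reward only the very first time g holds
        e_reward[(s, seen)] = reward if (s in goal and not seen) else 0.0
    return e_states, e_trans, e_reward
```

The initial e-state is $(s_0, \mathit{false})$. Because the history bit is updated deterministically, each NMRDP transition has exactly one image in the expanded MDP, in line with the uniqueness requirement of Definition 1 below.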

Definition 1   MDP $D' = \langle S', s'_0, A', \Pr{}', R' \rangle$ is equivalent to NMRDP $D = \langle S, s_0, A, \Pr, R \rangle$ if there exists a mapping $\tau : S' \mapsto S$ such that:

1. $\tau(s'_0) = s_0$.
2. For all $s' \in S'$, $A'(s') = A(\tau(s'))$.
3. For all $s_1, s_2 \in S$, if there is $a \in A(s_1)$ such that $\Pr(s_1, a, s_2) > 0$, then for all $s'_1 \in S'$ such that $\tau(s'_1) = s_1$, there exists a unique $s'_2 \in S'$ with $\tau(s'_2) = s_2$, such that for all $a \in A'(s'_1)$, $\Pr{}'(s'_1, a, s'_2) = \Pr(s_1, a, s_2)$.
4. For any feasible state sequence $\Gamma$ of $D$ and $\Gamma'$ of $D'$ such that $\tau(\Gamma'_i) = \Gamma_i$ for all $i$, we have $R'(\Gamma'_i) = R(\Gamma(0 \ldots i))$ for all $i$.

Items 1-3 ensure that there is a bijection between feasible state sequences in the NMRDP and feasible e-state sequences in the MDP. Therefore, a stationary policy for the MDP can be reinterpreted as a non-stationary policy for the NMRDP. Furthermore, item 4 ensures that the two policies have identical values, and that consequently, solving an NMRDP optimally reduces to producing an equivalent MDP and solving it optimally [2]:

Proposition 1   Let $D$ be an NMRDP, $D'$ an equivalent MDP for it, and $\pi'$ a policy for $D'$. Let $\pi$ be the function defined on the sequence prefixes $\Gamma(0 \ldots n)$ of $D$ by $\pi(\Gamma(0 \ldots n)) = \pi'(\Gamma'_n)$, where $\tau(\Gamma'_i) = \Gamma_i$ for all $i \le n$. Then $\pi$ is a policy for $D$ such that $V_{\pi}(s_0) = V_{\pi'}(s'_0)$.
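The induced policy $\pi$ can be sketched as follows (our own illustration; `induced_policy`, `step`, and `e_policy` are assumed names, not from the paper). Item 3 of Definition 1 guarantees that a feasible NMRDP prefix traces out a unique matching e-state sequence, so it suffices to walk that sequence and consult the stationary MDP policy at its last e-state:

```python
def induced_policy(prefix, e_s0, step, e_policy):
    """Non-stationary NMRDP policy induced by a stationary MDP policy.

    prefix: NMRDP state sequence Gamma(0..n) starting at s0.
    e_s0: initial e-state.
    step(e, s2): the unique successor e-state of e labelled with NMRDP
    state s2 (unique by item 3 of Definition 1).
    e_policy: stationary policy for the equivalent MDP.
    Returns the action e_policy chooses at the matching e-state Gamma'_n."""
    e = e_s0
    for s in prefix[1:]:            # trace Gamma'_1 .. Gamma'_n
        e = step(e, s)
    return e_policy(e)
```

For the goal-bit expansion discussed earlier, `step` would be `(s, seen), s2 -> (s2, seen or s in goal)`, so the walk simply replays the history bit along the prefix.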

Sylvie Thiebaux 2006-01-20