Translation Into MDP
We now exploit the compact representation of a non-Markovian reward
function as a reward function specification to translate an NMRDP into
an equivalent MDP amenable to state-based anytime solution methods.
Recall from Section 2 that each e-state in the MDP
is labelled by a state of the NMRDP and by history information
sufficient to determine the immediate reward. In the case of a compact
representation as a reward function specification $\phi_0$, this
additional information can be summarised by the progression of $\phi_0$
through the sequence of states passed through. So an e-state
will be of the form $\langle s, \phi \rangle$, where $s \in S$ is a
state, and $\phi$ is a reward function
specification (obtained by progression). Two e-states
$\langle s_1, \phi_1 \rangle$ and $\langle s_2, \phi_2 \rangle$
are equal if $s_1 = s_2$, the
immediate rewards are the same, and the results of progressing
$\phi_1$ and $\phi_2$ through $s_1$ are semantically equivalent.^{8}
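This equality test can be sketched as follows. The sketch is illustrative only: specifications are opaque values, and `reward_of`, `progress`, and `equivalent` are assumed helper callbacks (deciding semantic equivalence of specifications is delegated to `equivalent`, which an implementation might approximate syntactically).

```python
def estates_equal(e1, e2, reward_of, progress, equivalent):
    """E-state equality: the underlying NMRDP states coincide, the
    immediate rewards coincide, and progressing each specification
    through the (shared) state yields semantically equivalent results.
    All helper names are illustrative, not from the paper."""
    (s1, spec1), (s2, spec2) = e1, e2
    return (s1 == s2
            and reward_of(s1, spec1) == reward_of(s2, spec2)
            and equivalent(progress(s1, spec1), progress(s2, spec2)))
```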
Definition 5
Let $D = \langle S, s_0, A, \Pr, R \rangle$
be an NMRDP, and let $\phi_0$ be a
reward function specification representing $R$ (see
Definition 4).
We translate $D$ into the MDP
$D' = \langle S', s'_0, A', \Pr', R' \rangle$ defined as follows:

1. $S' \subseteq S \times \mathcal{F}$, where $\mathcal{F}$ is the set of reward function specifications
2. $s'_0 = \langle s_0, \phi_0 \rangle$
3. $A'(\langle s, \phi \rangle) = A(s)$
4. If $a \in A(s)$, then
$\Pr'(\langle s, \phi \rangle, a, \langle s', \phi' \rangle) = \Pr(s, a, s')$,
where $\phi'$ is the progression of $\phi$ through $s$.
If $a \notin A(s)$, then
$\Pr'(\langle s, \phi \rangle, a, \cdot)$ is undefined
5. $R'(\langle s, \phi \rangle)$ is the immediate reward determined by $\phi$ at $s$
6. For all $e' \in S'$, $e'$ is reachable under $A'$ from $s'_0$.
Item 1 says that the e-states are labelled by a state and a reward
function specification. Item 2 says that the initial e-state is
labelled with the initial state and with the original reward function
specification. Item 3 says that an action is applicable in an e-state
if it is applicable in the state labelling it. Item 4 explains how
successor e-states and their probabilities are computed. Given an
action $a$ applicable in an e-state $\langle s, \phi \rangle$, each
successor e-state will be labelled by a successor state $s'$ of $s$ via
$a$ in the NMRDP and by the progression $\phi'$ of $\phi$ through $s$. The
probability of that e-state is $\Pr(s, a, s')$, as in the NMRDP. Note
that the cost of computing $\Pr'$ is linear in that of computing
$\Pr$ and in the sum of the lengths of the formulae in $\phi$. Item 5 has
been motivated before (see Section 3.6). Finally, since
items 1-5 leave open the choice of many MDPs differing only in the
unreachable states they contain, item 6 excludes all such irrelevant
extensions.
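The construction of Definition 5 can be realised as a forward reachability computation from the initial e-state. The following is a sketch only, not the authors' implementation: specifications are modelled as opaque hashable values, `applicable`, `transitions`, and `progress` are assumed callbacks supplying the NMRDP dynamics and the progression operation, and syntactic equality of specifications stands in for the semantic-equivalence check discussed above.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

State = str
Action = str
Spec = FrozenSet[Tuple[str, float]]   # illustrative: (formula, reward) pairs
EState = Tuple[State, Spec]           # item 1: an e-state <s, phi>

def translate(s0: State,
              spec0: Spec,
              applicable: Callable[[State], List[Action]],
              transitions: Callable[[State, Action], Dict[State, float]],
              progress: Callable[[State, Spec], Tuple[Spec, float]]):
    """Build the equivalent MDP by forward search from <s0, spec0> (item 2).

    `progress(s, spec)` is assumed to return the progressed specification
    together with the immediate reward at s.  Only reachable e-states are
    ever generated, so item 6 holds by construction.
    """
    e0: EState = (s0, spec0)
    estates = {e0}
    pr: Dict[Tuple[EState, Action, EState], float] = {}
    rew: Dict[EState, float] = {}
    frontier = [e0]
    while frontier:
        s, spec = frontier.pop()
        spec_next, rew[(s, spec)] = progress(s, spec)       # item 5
        for a in applicable(s):                             # item 3
            for s_next, p in transitions(s, a).items():     # item 4
                e_next = (s_next, spec_next)
                pr[((s, spec), a, e_next)] = p
                if e_next not in estates:
                    estates.add(e_next)
                    frontier.append(e_next)
    return estates, pr, rew
```

Because the frontier only ever contains e-states generated as successors of the initial e-state, the returned MDP contains no unreachable extensions.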
It is easy to show that this translation leads to an equivalent MDP,
as defined in Definition 1. Obviously, the function mapping e-states
to states required for Definition 1 is given by
$\langle s, \phi \rangle \mapsto s$, and then
the proof is a matter of checking conditions.
In our practical implementation, the labelling is one step ahead of
that in the definition: we label the initial e-state with the
progression of $\phi_0$ through $s_0$, and compute the current reward and the current
reward specification label by progression of predecessor reward
specifications through the current state rather than through the
predecessor states. As will be apparent below, this has the potential
to reduce the number of states in the generated MDP.
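The one-step-ahead labelling can be sketched as follows, again with illustrative names: successors of an e-state are labelled by progressing its specification through the successor state itself, so successors reached from predecessors whose specifications progress identically in that state collapse into a single e-state.

```python
def successors_ahead(s, spec, a, transitions, progress):
    """One-step-ahead variant: the specification labelling a successor
    e-state is obtained by progressing `spec` through the *successor*
    state s_next rather than through s.  `transitions` and `progress`
    are assumed callbacks as in the earlier sketch; coinciding
    progressions yield a single successor e-state."""
    out = {}
    for s_next, p in transitions(s, a).items():
        spec_next, _ = progress(s_next, spec)
        key = (s_next, spec_next)
        out[key] = out.get(key, 0.0) + p
    return out
```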
Figure 7 shows the equivalent MDP produced for the
$FLTL version of our NMRDP example in Figure 3.
Recall the PLTL reward formula used for this example, and the
$FLTL formula describing the same allocation of rewards.
The figure also shows the
relevant formulae labelling the e-states, obtained by
progression of this reward formula. Note that without progressing one step
ahead, there would be 3 e-states with the same state on the
left-hand side, each labelled with a different progressed specification.
Figure 7:
Equivalent MDP Produced by FLTL

Sylvie Thiebaux
2006-01-20