When solving NMRDPs in this setting, the central issue is to define a
non-Markovian reward specification language and a translation into an
MDP *adapted* to the class of MDP solution methods and
representations we would like to use for the type of problems at hand.
More precisely, there is a tradeoff between the effort spent in the
translation, e.g., in producing a *small* equivalent MDP without
many irrelevant history distinctions, and the effort required to solve
it. Appropriate resolution of this tradeoff depends on the type of
representations and solution methods envisioned for the MDP. For
instance, *structured* representations and solution methods which
have some ability to ignore irrelevant information may cope with a
crude translation, while *state-based* (flat) representations and
methods will require a more sophisticated translation producing an MDP
as small as feasible.
Both previous proposals within this line of research rely on
past linear temporal logic (PLTL) formulae to specify the behaviours
to be rewarded [2,3]. A nice feature
of PLTL is that it yields a straightforward semantics of non-Markovian
rewards, and lends itself to a range of translations from the crudest
to the finest. The two proposals adopt very different translations
adapted to two very different types of solution methods and
representations. The first [2] targets classical
state-based solution methods, such as policy iteration [32],
which generate *complete* policies at the cost of enumerating all
states in the entire MDP. Consequently, it adopts an expensive
translation which attempts to produce a *minimal* MDP. By
contrast, the second translation [3] is very
efficient but crude, and targets structured solution methods and
representations (see e.g., [29,12,21]), which do not
require explicit state enumeration.
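To make the crudest style of translation concrete, the following is a minimal sketch (not taken from either cited proposal; the function name, data structures, and the choice of formula are illustrative assumptions). It translates the simple PLTL reward "once p" (reward every state in which proposition p has held at some point in the past, including the present) into an equivalent Markovian reward by augmenting each MDP state with a single bit of history:

```python
def translate_once_p(states, transitions, labels, reward=1.0):
    """Translate the PLTL reward formula "once p" into a Markovian reward.

    Illustrative sketch: extended states are (state, seen_p) pairs, where
    seen_p records whether p has held at some visited state (including
    the current one).

    transitions: dict state -> dict action -> {successor: probability}
    labels:      dict state -> set of propositions true in that state
    Returns (extended transitions, Markovian reward function as a dict).
    """
    ext_transitions = {}
    ext_reward = {}
    for s in states:
        for seen in (False, True):
            ext_s = (s, seen)
            ext_transitions[ext_s] = {}
            for a, dist in transitions[s].items():
                # The history bit is updated on entering the successor.
                ext_transitions[ext_s][a] = {
                    (t, seen or 'p' in labels[t]): prob
                    for t, prob in dist.items()
                }
            # "once p" holds exactly when the history bit is set.
            ext_reward[ext_s] = reward if seen else 0.0
    return ext_transitions, ext_reward
```

Note that this blind cross-product construction may produce unreachable or irrelevant (state, history) pairs, which is precisely the kind of crudeness a structured solution method can tolerate but a state-based one cannot.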
Sylvie Thiebaux
2006-01-20