Discussion
Existing approaches [2,3] advocate
the use of PLTL over a finite past to specify non-Markovian
rewards. In the PLTL style of specification, we describe the past
conditions under which we get rewarded now, while with $FLTL we
describe the conditions on the present and future under which future
states will be rewarded. While the behaviours and rewards may be the
same in each scheme, the naturalness of thinking in one style or the
other depends on the case. Letting the kids have a strawberry dessert
because they have been good all day fits naturally into a
past-oriented account of rewards, whereas promising that they may
watch a movie if they tidy their room (indeed, making sense of the
whole notion of promising) goes more naturally with $FLTL. One
advantage of the PLTL formulation is that it trivially enforces the
principle that present rewards do not depend on future states. In
$FLTL, this responsibility is placed on the domain modeller. The
best we can offer is an exception mechanism to recognise mistakes when
their effects appear, or syntactic restrictions. On the other hand,
the greater expressive power of $FLTL opens the possibility of
considering a richer class of decision processes, e.g. with
uncertainty as to which rewards are received (the dessert or the
movie) and when (some time next week, before it rains).
At any rate, we believe that $FLTL is better suited than PLTL to
solving NMRDPs using anytime state-based solution methods. While the
PLTLSIM translation could be easily embedded in such a solution
method, it loses the structure of the original formulae when
considering subformulae individually. Consequently, the expanded state
space easily becomes exponentially bigger than the blind-minimal one.
This is problematic with the solution methods we consider, because
size severely affects their performance in solution quality. The
pre-processing phase of PLTLMIN uses PLTL formula regression to find
sets of subformulae as potential labels for possible predecessor
states, so that the subsequent generation phase builds an MDP
representing all and only the histories which make a difference to the
way actually feasible execution sequences should be rewarded. Not only
does this recover the structure of the original formula, but in the best
case, the MDP produced is exponentially smaller than the blind-minimal
one. However, the prohibitive cost of the pre-processing phase makes
it unsuitable for anytime solution methods. We do not consider that
any method based on PLTL and regression will achieve a meaningful
relaxed notion of minimality without a costly pre-processing
phase. FLTL is an approach based on $FLTL and progression which
does precisely that, letting the solution method resolve the tradeoff
between quality and cost in a principled way intermediate between the
two extreme suggestions above.
The structured representation and solution methods targeted by Bacchus
et al. [3] differ from the anytime
state-based solution methods FLTL primarily aims at, in particular
in that they do not require explicit state enumeration at all. Here,
non-minimality is not as problematic as with the state-based
approaches. In virtue of the size of the MDP produced, the
PLTLSTR translation is, as PLTLSIM, clearly unsuitable to
anytime state-based methods.9 In another sense, too, FLTL represents a
middle way, combining the advantages conferred by state-based and
structured approaches, e.g. by PLTLMIN on one side, and PLTLSTR on the other. From the former FLTL inherits a meaningful notion of
minimality. As with the latter, approximate solution methods can be
used and can perform a restricted dynamic analysis of the reward
formulae. In particular, formula progression enables even state-based
methods to exploit some of the structure in `$FLTL space'. However,
the gap between blind and true minimality indicates that progression
alone is insufficient to always fully exploit that structure. There is
a hope that PLTLSTR is able to take advantage of the full
structure of the reward function, but also a possibility that it will
fail to exploit even as much structure as FLTL, as efficiently. An
empirical comparison of the three approaches is needed to answer this
question and identify the domain features favoring one over the other.
Sylvie Thiebaux
2006-01-20