We are faced with three substantially different approaches that are
not easy to compare, as their performance will depend on domain
features as varied as the structure of the transition model; the type,
syntax, and length of the temporal reward formula; the presence of
rewards that are unreachable or irrelevant to the optimal policy; the
availability of good heuristics and control knowledge; and the
interactions between these factors. In this section, we report an
experimental investigation into the influence of some of these factors
and try to answer the questions raised previously. In some cases,
but not all, we were able to identify systematic patterns. The results
in this section were obtained on a Pentium 4 2.6 GHz machine running
GNU/Linux 2.4.20 with 500 MB of RAM. Specifically, we ask:
- is the dynamics of the domain the predominant factor affecting performance?
- is the type of reward a major factor?
- is the syntax used to describe rewards a major factor?
- is there an overall best method?
- is there an overall worst method?
- does the preprocessing phase of PLTLMIN pay off, compared to PLTLSIM?
- does the simplicity of the FLTL translation compensate for
blind-minimality, or does the benefit of true minimality outweigh the
cost of PLTLMIN preprocessing?
- are the dynamic analyses of rewards in PLTLSTR and FLTL effective?
- is one of these analyses more powerful, or are they rather complementary?
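To make the notion of a temporal reward formula concrete before turning to the results, the following is a minimal illustrative sketch, not the paper's implementation: it rewards a simple past-tense (PLTL-style) condition, "p has held at every step so far", evaluated incrementally over a state trajectory. The predicate `p` and the trajectory are hypothetical.

```python
def rewarded_steps(trajectory):
    """Return the indices t at which the formula
    'p holds now and has held at all earlier steps'
    is satisfied, computed in a single incremental pass."""
    rewarded = []
    held_so_far = True  # vacuously true before the first step
    for t, p in enumerate(trajectory):
        held_so_far = held_so_far and p
        if held_so_far:
            rewarded.append(t)
    return rewarded

# p holds at steps 0 and 1, fails at step 2, holds again at step 3:
print(rewarded_steps([True, True, False, True]))  # -> [0, 1]
```

The one-pass evaluation illustrates why the syntax and length of the reward formula matter: the amount of history that must be summarized at each step (here, a single boolean) grows with the formula, which is precisely what the translation-based approaches compared in this section trade off differently.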