For the choice of the $\Psi$s, Bacchus et al. [2] consider two cases. In the simple case, which we call PLTLSIM, an MDP obeying properties (1) and (2) is produced by simply labelling each e-state with the set of all subformulae in $\overline{\mbox{Sub}(F)}$ which are true of the sequence leading to that e-state. This MDP is generated forward, starting from the initial e-state labelled with $s_0$ and with the set $\Psi_0 \subseteq \overline{\mbox{Sub}(F)}$ of all subformulae which are true of the sequence $\langle s_0 \rangle$. The successors of any e-state labelled by NMRDP state $s$ and subformula set $\Psi$ are generated as follows: each of them is labelled by a successor $s'$ of $s$ in the NMRDP and by the set of subformulae $\{\psi' \in
\overline{\mbox{Sub}(F)} \mid \Psi \models \mbox{Reg}(\psi',s')\}$. For instance, consider the NMRDP shown in Figure 3. The set $F = \{q \wedge \circleddash \circleddash p\}$ consists of a single reward formula. The set $\overline{\mbox{Sub}(F)}$ consists of all subformulae of this reward formula, and their negations, that is $\overline{\mbox{Sub}(F)}=
\{p,q,\circleddash p, \circleddash \circleddash p, q ...
...neg \circleddash \circleddash p, \neg (q \wedge \circleddash \circleddash
p)\}$. The equivalent MDP produced by PLTLSIM is shown in Figure 4.
Figure 3: Another Simple NMRDP
$\textstyle \parbox{0.5\textwidth}{\includegraphics[height=0.18\textheight]{figures/pq}}$$\textstyle \parbox{0.4\textwidth}{\footnotesize
In the initial state, both $p$\...
...til $p$\ gets produced, and
then to apply $a$\ or $b$\ indifferently forever. }$
Sylvie Thiebaux 2006-01-20