Principles Behind the Translations

All three translations into an MDP (PLTLSIM, PLTLMIN, and PLTLSTR) rely on the equivalence $\mbox{$f_1 \mathbin{\mbox{\sf S}}f_2$} \equiv
f_2 \vee (f_1 \wedge \circleddash (f_1 \mathbin{\mbox{\sf S}}f_2))$, which lets us decompose temporal modalities into a requirement about the last state $\Gamma_i$ of a sequence $\Gamma(i)$, and a requirement about the prefix $\Gamma(i-1)$ of the sequence. More precisely, given a state $s$ and a formula $f$, one can compute in ${\cal O}(\vert\vert f\vert\vert)$ time a new formula $\mbox{Reg}(f,s)$, called the regression of $f$ through $s$. Regression has the property that, for $i > 0$, $f$ is true of a finite sequence $\Gamma(i)$ ending with $\Gamma_i =s$ iff $\mbox{Reg}(f,s)$ is true of the prefix $\Gamma(i-1)$. That is, $\mbox{Reg}(f,s)$ represents what must have been true previously for $f$ to be true now. $\mbox{Reg}$ is defined as follows:
\begin{eqnarray*}
\mbox{Reg}(p,s) & = & \top \mbox{ if } p \mbox{ holds in } s, \mbox{ and } \bot \mbox{ otherwise, for atomic } p\\
\mbox{Reg}(\neg f,s) & = & \neg \mbox{Reg}(f,s)\\
\mbox{Reg}(f_1 \wedge f_2,s) & = & \mbox{Reg}(f_1,s) \wedge \mbox{Reg}(f_2,s)\\
\mbox{Reg}(\circleddash f,s) & = & f\\
\mbox{Reg}(f_1 \mathbin{\mbox{\sf S}}f_2,s) & = & \mbox{Reg}(f_2,s) \vee (\mbox{Reg}(f_1,s) \wedge (f_1 \mathbin{\mbox{\sf S}}f_2))
\end{eqnarray*}

For instance, take a state $s$ in which $p$ holds and $q$ does not, and take $f = (\circleddash \neg q) \wedge (p \mathbin{\mbox{\sf S}}q)$, meaning that $q$ must have been false one step ago, but that it must have held at some point in the past and that $p$ must have held since $q$ last did. Then $\mbox{Reg}(f,s) = \neg q \wedge (p \mathbin{\mbox{\sf S}}q)$; that is, for $f$ to hold now, at the previous stage $q$ had to be false and the $p \mathbin{\mbox{\sf S}}q$ requirement still had to hold. When $p$ and $q$ are both false in $s$, then $\mbox{Reg}(f,s) = \bot$, indicating that $f$ cannot be satisfied, regardless of what came earlier in the sequence. For notational convenience, where $X$ is a set of formulae we write $\overline{X}$ for $X \cup \{\neg x \mid x \in X\}$.

The translations exploit the PLTL representation of rewards as follows. Each expanded state (e-state) in the generated MDP can be seen as labelled with a set $\Psi \subseteq \overline{\mbox{Sub}(F)}$ of subformulae of the reward formulae in $F$ (and their negations). The subformulae in $\Psi$ must be (1) true of the paths leading to the e-state, and (2) sufficient to determine the current truth of all reward formulae in $F$, as this is needed to compute the current reward. Ideally, the sets $\Psi$ should also be (3) small enough to enable just that, i.e. they should not contain subformulae which draw history distinctions that are irrelevant to determining the reward at one point or another. Note, however, that in the worst case, the number of distinctions needed, even in the minimal equivalent MDP, may be exponential in $\vert\vert F\vert\vert$. This happens for instance with the formula $\circleddash^k f$, which requires $k$ additional bits of information memorising the truth of $f$ over the last $k$ steps.
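As an illustration only (not the paper's implementation), the regression operator can be sketched in Python. Formulae are assumed to be nested tuples, e.g. `('since', f1, f2)` for $f_1 \mathbin{\mbox{\sf S}}f_2$ and `('prev', f)` for $\circleddash f$, and a state is the set of atoms true in it; the tuple encoding and the `simplify` helper are conventions introduced here, not part of the original text.

```python
# Minimal sketch of PLTL regression through a state, assuming formulae
# are nested tuples: ('atom', p), ('not', f), ('and', f1, f2),
# ('or', f1, f2), ('prev', f), ('since', f1, f2).
# A state s is the set of atoms true in it.

TRUE, FALSE = ('true',), ('false',)

def reg(f, s):
    """Regress f through state s: reg(f, s) holds of the prefix
    Gamma(i-1) iff f holds of a sequence Gamma(i) ending in s."""
    op = f[0]
    if op in ('true', 'false'):
        return f
    if op == 'atom':                 # atoms are evaluated in s itself
        return TRUE if f[1] in s else FALSE
    if op == 'not':
        return simplify(('not', reg(f[1], s)))
    if op in ('and', 'or'):
        return simplify((op, reg(f[1], s), reg(f[2], s)))
    if op == 'prev':                 # "previously f" now means f held one step ago
        return f[1]
    if op == 'since':                # f1 S f2  =  f2 or (f1 and prev(f1 S f2))
        return simplify(('or', reg(f[2], s),
                         simplify(('and', reg(f[1], s), f))))
    raise ValueError(f"unknown operator {op!r}")

def simplify(f):
    """Constant propagation only, enough for the running example."""
    op = f[0]
    if op == 'not':
        return FALSE if f[1] == TRUE else TRUE if f[1] == FALSE else f
    if op == 'and':
        if FALSE in (f[1], f[2]): return FALSE
        if f[1] == TRUE: return f[2]
        if f[2] == TRUE: return f[1]
        return f
    if op == 'or':
        if TRUE in (f[1], f[2]): return TRUE
        if f[1] == FALSE: return f[2]
        if f[2] == FALSE: return f[1]
        return f
    return f
```

On the running example $f = (\circleddash \neg q) \wedge (p \mathbin{\mbox{\sf S}}q)$, regressing through a state where $p$ holds and $q$ does not yields $\neg q \wedge (p \mathbin{\mbox{\sf S}}q)$, and regressing through a state where both are false yields $\bot$, matching the text. Note that each clause of `reg` recurses once into each subformula (the `since` clause reuses `f` without recursing into it), which is what makes the computation linear in $\vert\vert f\vert\vert$.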
Sylvie Thiebaux 2006-01-20