A Class of Reward-Normal Formulae

The existing decision procedure [41] for determining whether a formula is reward-normal is guaranteed to terminate finitely, but involves the construction and comparison of automata and is rather intricate in practice. It is therefore useful to give a simple syntactic characterisation of a set of constructors for obtaining reward-normal formulae even though not all such formulae are so constructible. We say that a formula is material iff it contains no $\mbox{\$}$ and no temporal operators - that is, the material formulae are the boolean combinations of atoms. We consider four operations on behaviours representable by formulae of $FLTL. Firstly, a behaviour may be delayed for a specified number of timesteps. Secondly, it may be made conditional on a material trigger. Thirdly, it may be started repeatedly until a material termination condition is met. Fourthly, two behaviours may be combined to form their union. These operations are easily realised syntactically by corresponding operations on formulae. Where $m$ is any material formula:

\mbox{{\sf delay}}[f] & = & \raisebox{0.6mm}{$\scriptstyle \b...
...\sf U}}$}m \\
\mbox{{\sf union}}[f_1,f_2] & = & f_1 \wedge f_2

We have shown [41] that the set of reward-normal formulae is closed under delay, cond (for any material $m$), loop (for any material $m$) and union, and also that the closure of $\{\mbox{\$}\}$ under these operations represents a class of behaviours closed under intersection and concatenation as well as union. Many familiar reward-normal formulae are obtainable from $\mbox{\$}$ by applying the four operations. For example, $\mbox{$\Box$}(p \rightarrow \mbox{\$})$ is $\mbox{{\sf loop}}[\mbox{$\bot$},\mbox{{\sf cond}}[p,\mbox{\$}]]$. Sometimes a paraphrase is necessary. For example, $\mbox{$\Box$}((p \wedge
\raisebox{0.6mm}{$\scriptstyle \bigcirc$}q) \rightarrow \raisebox{0.6mm}{$\scriptstyle \bigcirc$}\mbox{\$})$ is not of the required form because of the $\raisebox{0.6mm}{$\scriptstyle \bigcirc$}$ in the antecedent of the conditional, but the equivalent $\mbox{$\Box$}(p \rightarrow \raisebox{0.6mm}{$\scriptstyle \bigcirc$}(q \rightarrow \mbox{\$}))$ is $\mbox{{\sf loop}}[\mbox{$\bot$},\mbox{{\sf cond}}[p,\mbox{{\sf
delay}}[\mbox{{\sf cond}}[q,\mbox{\$}]]]]$. Other cases are not so easy. An example is the formula $\neg p \makebox[1em]{$\mathbin{\mbox{\sf U}}$}(p
\wedge \mbox{\$})$ which stipulates a reward the first time $p$ happens and which is not at all of the form suggested. To capture the same behaviour using the above operations requires a formula like $(p \rightarrow \mbox{\$})
\wedge (\raisebox{0.6mm}{$\scriptstyle \bigcirc$}(p \rightarrow \mbox{\$}) \makebox[1em]{$\mathbin{\mbox{\sf U}}$}p)$.
Sylvie Thiebaux 2006-01-20