Proofs of Theorems

Property 1 Where $b \Leftrightarrow (\Gamma(i) \in B)$, $(\Gamma,i)\mbox{$\,\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} f$ iff $(\Gamma,i+1)
\mbox{$\,\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} \mbox{Prog}(b, \Gamma_i,f)$.

Proof: Induction on the structure of $f$. There are several base cases, all fairly trivial. If $f=\mbox{$\top$}$ or $f=\mbox{$\bot$}$ there is nothing to prove, as these progress to themselves and hold everywhere and nowhere respectively. If $f = p$ then if $f$ holds in $\Gamma_i$ then it progresses to $\mbox{$\top$}$ which holds in $\Gamma_{i+1}$ while if $f$ does not hold in $\Gamma_i$ then it progresses to $\mbox{$\bot$}$ which does not hold in $\Gamma_{i+1}$. The case $f = \neg p$ is similar. In the last base case, $f = \mbox{\$}$. Then the following are equivalent:
$(\Gamma,i)\mbox{$\,\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} f$
$\Gamma(i)\in B$
$\mbox{Prog}(b,\Gamma_i,f) = \mbox{$\top$}$
\mbox{$\,\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} \mbox{Prog}(b, \Gamma_i,f)$

Induction case 1: $f = g \wedge h$. The following are equivalent:
$(\Gamma,i)\mbox{$\,\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} f$
$(\Gamma,i) \mbox{$\,\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} g$ and $(\Gamma,i) \mbox{$\,\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} h$
$(\Gamma,i+1) \mbox{$\,\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} \mbox{Prog}(b, \Gamma_i,g)$ and $(\Gamma,i+1) \mbox{$\,\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} \mbox{Prog}(b, \Gamma_i,h)$ (by induction hypothesis)
$(\Gamma,i+1) \mbox{$\,\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} \mbox{Prog}(b, \Gamma_i,g) \wedge
\mbox{Prog}(b, \Gamma_i,h)$
\mbox{$\,\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} \mbox{Prog}(b, \Gamma_i,f)$

Induction case 2: $f = g \vee h$. Analogous to case 1.

Induction case 3: $f = \raisebox{0.6mm}{$\scriptstyle \bigcirc$}g$. Trivial by inspection of the definitions.

Induction case 4: $f = g \makebox[1em]{$\mathbin{\mbox{\sf U}}$}h$. Then $f$ is logically equivalent to $h \vee (g \wedge \raisebox{0.6mm}{$\scriptstyle \bigcirc$}(g \makebox[1em]{$\mathbin{\mbox{\sf U}}$}h)$ which by cases 1, 2 and 3 holds at stage $i$ of $\Gamma$ for behaviour $B$ iff $\mbox{Prog}(b, \Gamma_i, f)$ holds at stage i+1. $\Box$

Theorem 1 Let $f$ be reward-normal, and let $\langle f_0, f_1, \ldots \rangle$ be the result of progressing it through the successive states of a sequence $\Gamma$. Then, provided no $f_i$ is $\mbox{$\bot$}$, for all $i$ $\mbox{\em Rew}(\Gamma_i,f_i)$ iff $\Gamma(i)\in B_{f}$.

Proof: First, by the definition of reward-normality, if $f$ is reward-normal then $\Gamma
\mbox{$\,\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} f$ iff for all $i$, if $\Gamma(i) \in B_f$ then $\Gamma(i)\in B$. Next, if $\Gamma
\mbox{$\,\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} f$ then progressing $f$ through $\Gamma$ according to $B$ (that is, letting each $b_i$ be true iff $\Gamma(i)\in B$) cannot lead to a contradiction because by Property 1, progression is truth-preserving. It remains, then, to show that if $\Gamma \mbox{$\,\not\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} f$ then progressing $f$ through $\Gamma$ according to $B$ must lead eventually to $\mbox{$\bot$}$. The proof of this is by induction on the structure of $f$ and as usual the base case in which $f$ is a literal (an atom, a negated atom or $\mbox{$\top$}$, $\mbox{$\bot$}$ or $\mbox{\$}$) is trivial. Case $f = g \wedge h$. Suppose $\Gamma \mbox{$\,\not\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} f$. Then either $\Gamma \mbox{$\,\not\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} g$ or $\Gamma \mbox{$\,\not\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} h$, so by the induction hypothesis either $g$ or $h$ progresses eventually to $\mbox{$\bot$}$, and hence so does their conjunction. Case $f = g \vee h$. Suppose $\Gamma \mbox{$\,\not\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} f$. Then both $\Gamma \mbox{$\,\not\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} g$ and $\Gamma \mbox{$\,\not\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} h$, so by the induction hypothesis each of $g$ and $h$ progresses eventually to $\mbox{$\bot$}$. Suppose without loss of generality that $g$ does not progress to $\mbox{$\bot$}$ before $h$ does. Then at some point $g$ has progressed to some formula $g'$ and $f$ has progressed to $g' \vee
\mbox{$\bot$}$ which simplifies to $g'$. Since $g'$ also progresses to $\mbox{$\bot$}$ eventually, so does $f$. Case $f = \raisebox{0.6mm}{$\scriptstyle \bigcirc$}g$. Suppose $\Gamma \mbox{$\,\not\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} f$. Let $\Gamma
= \Gamma_0;\Delta$ and let $B' = \{\gamma\vert \Gamma_0;\gamma \in
B\}$. Then $\Delta \mbox{$\,\not\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B'$}}\:$} g$, so by the induction hypothesis $g$ progressed through $\Delta$ according to $B'$ eventually reaches $\mbox{$\bot$}$. But The progression of $f$ through $\Gamma$ according to $B$ is exactly the same after the first step, so that too leads to $\mbox{$\bot$}$. Case $f = g \makebox[1em]{$\mathbin{\mbox{\sf U}}$}h$. Suppose $\Gamma \mbox{$\,\not\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} f$. Then there is some $j$ such that $(\Gamma,j) \mbox{$\,\not\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} g$ and for all $i \leq j$, $(\Gamma,i) \mbox{$\,\not\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} h$. We proceed by induction on $j$. In the base case $j=0$, and both $\Gamma \mbox{$\,\not\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} g$ and $\Gamma \mbox{$\,\not\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} h$ whence by the main induction hypothesis both $g$ and $h$ will eventually progress to $\mbox{$\bot$}$. Thus $h \vee (g \wedge f')$ progresses eventually to $\mbox{$\bot$}$ for any $f'$, and in particular for $f' = \raisebox{0.6mm}{$\scriptstyle \bigcirc$}f$, establishing the base case. For the induction case, suppose $\Gamma \mbox{$\,\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} g$ (and of course $\Gamma \mbox{$\,\not\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} h$). Since $f$ is equivalent to $h \vee (g \wedge \raisebox{0.6mm}{$\scriptstyle \bigcirc$}f)$ and $\Gamma \mbox{$\,\not\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} f$, $\Gamma \mbox{$\,\not\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} h$ and $\Gamma \mbox{$\,\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} g$, clearly $\Gamma \mbox{$\,\not\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B$}}\:$} \raisebox{0.6mm}{$\scriptstyle \bigcirc$}f$. Where $\Delta$ and $B'$ are as in the previous case, therefore, $\Delta
\mbox{$\,\not\models_{\!\!\!\raisebox{-0.7ex}{\scriptsize$B'$}}\:$} f$ and the failure occurs at stage $j-1$ of $\Delta$. Therefore the hypothesis of the induction on $j$ applies, and $f$ progressed through $\Delta$ according to $B'$ goes eventually to $\mbox{$\bot$}$, and so $f$ progressed through $\Gamma$ according to $B$ goes similarly to $\mbox{$\bot$}$. $\Box$

Theorem 3 Let $S'$ be the set of e-states in an equivalent MDP $D'$ for $D=\langle S,s_0,A,\Pr ,R\rangle$. $D'$ is minimal iff every e-state in $S'$ is reachable and $S'$ contains no two distinct e-states $s'_1$ and $s'_2$ with $\tau(s'_1) = \tau(s'_2)$ and $\mu(s'_1) = \mu(s'_2)$.

Proof: Proof is by construction of the canonical equivalent MDP $D^c$. Let the set of finite prefixes of state sequences in $\widetilde{D}(s_0)$ be partitioned into equivalence classes, where ${\Gamma}1(i) \equiv {\Gamma}2(j)$ iff ${\Gamma}1_i = {\Gamma}2_j$ and for all $\Delta
\in S^*$ such that ${\Gamma}1(i);\Delta \in
\widetilde{D}(s_0)$, $R({\Gamma}1(i);\Delta) =
R({\Gamma}2(j);\Delta)$. Let $[\Gamma(i)]$ denote the equivalence class of $\Gamma(i)$. Let ${\cal E}$ be the set of these equivalence classes. Let ${\cal A}$ be the function that takes each $[\Gamma(i)]$ in ${\cal E}$ to $A(\Gamma_i)$. For each $\Gamma(i)$ and $\Delta(j)$ and for each $a \in {\cal A}([\Gamma(i)])$, let ${\cal T}( [\Gamma(i)],a,[\Delta(j)] )$ be $\Pr(\Gamma_i,a,s)$ if $[\Delta(j)] = [\Gamma(i);\langle s \rangle]$. Otherwise let ${\cal
T}( [\Gamma(i)],a,[\Delta(j)] ) = 0$. Let ${\cal R}([\Gamma(i)])$ be $R(\Gamma(i))$. Then note the following four facts:

  1. Each of the functions $A$, $T$ and $R$ is well-defined.
  2. $D^c = \langle {\cal E}, [\langle s_0 \rangle], {\cal A}, {\cal
T}, {\cal R} \rangle$ is an equivalent MDP for $D$ with $\tau([\Gamma(i)])=\Gamma_i$.
  3. For any equivalent MDP $D''$ of $D$ there is a mapping from a subset of the states of $D''$ onto $\cal E$.
  4. $D'$ satisfies the condition that every e-state in $S'$ is reachable and $S'$ contains no two distinct e-states $s'_1$ and $s'_2$ with $\tau(s'_1) = \tau(s'_2)$ and $\mu(s'_1) = \mu(s'_2)$ iff $D^c$ is isomorphic to $D'$.
What fact 1 above amounts to is that if $\Gamma{1}(i) \equiv
\Gamma{2}(j)$ then it does not matter which of the two sequences is used to define ${\cal A}$, ${\cal T}$ and ${\cal R}$ of their equivalence class. In the cases of ${\cal A}$ and ${\cal T}$ this is simply that $\Gamma{1}_i = \Gamma{2}_j$. In the case of ${\cal R}$, it is the special case $\Delta = \langle \Gamma{1}_i \rangle$ of the equality of rewards over extensions. Fact 2 is a matter of checking that the four conditions of Definition 1 hold. Of these, conditions 1 ( $\tau([s_0]) =
s_{0}$) and 2 ( ${\cal A}([\Gamma(i)])= A(\Gamma_i)$) hold trivially by the construction. Condition 4 says that for any feasible state sequence $\Gamma \in \widetilde{D}(s_0)$, we have ${\cal
R}([\Gamma(i)]) = R(\Gamma(i))$ for all $i$. This also is given in the construction. Condition 3 states:
For all $s_1,s_2\in S$, if there is $a\in A(s_1)$ such that $\Pr(s_1,a,s_2) > 0$, then for all $\Gamma(i) \in
\widetilde{D}(s_0)$ such that $\Gamma_i = s_1$, there exists a unique $[\Delta(j)] \in {\cal E}$, $\Delta_j = s_2$, such that for all $a \in {\cal A}([\Gamma(i)])$, ${\cal
T}([\Gamma(i)],a,[\Delta[j]]) = \Pr(s_1,a,s_2)$.
Suppose $\Pr(s_1,\alpha,s_2) > 0$, $\Gamma(i) \in
\widetilde{D}(s_0)$ and $\Gamma_i = s_1$. Then the required $\Delta(j)$ is $\Gamma(i);\langle s_2 \rangle$, and of course ${\cal A}([\Gamma(i)])= A(\Gamma_i)$, so the required condition reads:
$[\Gamma(i);\langle s_2 \rangle]$ is the unique element $X$ of ${\cal E}$ with $\tau(X) = s_2$ such that for all $a \in
A(\Gamma_i)$, ${\cal T}( [\Gamma(i)],a,X ) = \Pr(s_1,a,s_2)$.
To establish existence, we need that if $a \in
A(\Gamma_i)$ then ${\cal T} ( [ \Gamma(i) ],a, [\Gamma(i);\langle s_2 \rangle] ) =
\Pr(\Gamma_i,a,s_2)$, which is immediate from the definition of $\cal T$ above. To establish uniqueness, suppose that $\tau(X) = s_2$ and ${\cal T}( [\Gamma(i)],a,X ) = \Pr(s_1,a,s_2)$ for all actions $a \in
A(\Gamma_i)$. Since $\Pr(s_1,\alpha,s_2) > 0$, the transition probability from $[\Gamma(i)]$ to $X$ is nonzero for some action, so by the definition of $\cal T$, $X$ can only be $[\Gamma(i);\langle s_2 \rangle]$. Fact 3 is readily observed. Let $M$ be any equivalent MDP for $D$. For any states $s_1$ and $s_2$ of $D$, and any state $X$ of $M$ such that $\tau(X) = s_1$ there is at most one state $Y$ of $M$ with $\tau(Y) =
s_2$ such that some action $a\in A(s_1)$ gives a nonzero probability of transition from $X$ to $Y$. This follows from the uniqueness part of condition 3 of Definition 1 together with the fact that the transition function is a probability distribution (sums to 1). Therefore for any given finite state sequence $\Gamma(i)$ there is at most one state of $M$ reached from the start state of $M$ by following $\Gamma(i)$. Therefore $M$ induces an equivalence relation $\approx_M$ on $S^*$: $\Gamma(i) \approx_M \Delta(j)$ iff they lead to the same state of $M$ (the sequences which are not feasible in $M$ may all be regarded as equivalent under $\approx_M$). Each reachable state of $M$ has associated with it a nonempty equivalence class of finite sequences of states of $D$. Working through the definitions, we may observe that $\approx_M$ is a sub-relation of $\equiv$ (if $\Gamma(i) \approx_M \Delta(j)$ then $\Gamma(i) \equiv \Delta(j)$). Hence the function that takes the equivalence class under $\approx_M$ of each feasible sequence $\Gamma(i)$ to $[\Gamma(i)]$ induces a mapping $h$ (an epimorphism in fact) from the reachable subset of states of $M$ onto $\cal E$. To establish Fact 4, it must be shown that in the case of $D'$ the mapping can be reversed, or that each equivalence class $[\Gamma(i)]$ in $D^c$ corresponds to exactly one element of $D'$. Suppose not (for contradiction). Then there exist sequences $\Gamma{1}(i)$ and $\Gamma{2}(j)$ in $\widetilde{D}(s_0)$ such that $\Gamma{1}(i) \equiv
\Gamma{2}(j)$ but on following the two sequences from $s_0'$ we arrive at two different elements $s'_1$ and $s'_2$ of $D'$ with $\tau(s'_1) =
\Gamma{1}_i = \Gamma{2}_j = \tau(s'_2)$ but with $\mu(s'_1) \neq
\mu(s'_2)$. Therefore there exists a sequence $\Delta(k) \in
\widetilde{D}(s)$ such that $R(\Gamma{1}(i-1);\Delta(k)) \neq
R(\Gamma{2}(j-1);\Delta(k))$. But this contradicts the condition for $\Gamma{1}(i) \equiv
\Gamma{2}(j)$. $\Box$
Theorem 3 follows immediately from facts 1-4.

Theorem 4 Let $D'$ be the translation of $D$ as in Definition 5. $D'$ is a blind minimal equivalent MDP for $D$.

Proof: Reachability of all the e-states is obvious, as they are constructed only when reached. Each e-state is a pair $\langle s, \phi\rangle$ where $s$ is a state of $D$ and $\phi$ is a reward function specification. In fact, $s = \tau(\langle s, \phi \rangle)$ and $\phi$ determines a distribution of rewards over all continuations of the sequences that reach $\langle s, \phi\rangle$. That is, for all $\Delta$ in $S^*$ such that $\Delta_0 = s$, the reward for $\Delta$ is $\sum_{(f:r) \in \phi} \{r\mid \Delta \in
B_{f}\}$. If $D'$ is not blind minimal, then there exist distinct e-states $\langle s, \phi\rangle$ and $\langle s, \phi' \rangle$ for which this sum is the same for all $\Delta$. But this makes $\phi$ and $\phi'$ semantically equivalent, contradicting the supposition that they are distinct. $\Box$

Sylvie Thiebaux 2006-01-20