Proofs of Theorems

**Proof: ** Induction on the structure of . There are several base
cases, all fairly trivial. If
or
there is
nothing to prove, as these progress to themselves and hold
everywhere and nowhere respectively. If then if holds in
then it progresses to which holds in
while if does not hold in then it
progresses to which does not hold in . The
case is similar. In the last base case, .
Then the following are equivalent:

Induction case 1:
. The following are equivalent:

and

and
(by induction
hypothesis)

Induction case 2: . Analogous to case 1.

Induction case 3:
. Trivial by inspection of the definitions.

Induction case 4:
. Then is logically equivalent
to
which by cases 1, 2 and 3 holds at
stage of for behaviour iff
holds at stage i+1.

**Theorem 1**
*Let be reward-normal, and let
be the result of progressing it through the successive
states of a sequence . Then, provided no is ,
for all
iff
.*

**Proof: ** First, by the definition of
reward-normality, if is reward-normal then
iff for all , if
then
.
Next, if
then progressing through
according to (that is, letting each be true iff
) cannot lead to a contradiction because by
Property 1, progression is truth-preserving.
It remains, then, to show that if
then
progressing through according to must lead
eventually to . The proof
of this is by induction on the structure of and as usual the
base case in which is a literal (an atom, a negated atom or
, or ) is trivial.
Case
. Suppose
. Then
either
or
, so
by the induction hypothesis either or progresses
eventually to , and hence so does their conjunction.
Case . Suppose
. Then both
and
, so by the
induction hypothesis each of and progresses eventually to
. Suppose without loss of generality that does not
progress to before does. Then at some point has
progressed to some formula and has progressed to
which simplifies to . Since also progresses to
eventually, so does .
Case
. Suppose
. Let
and let
. Then
, so by the induction
hypothesis progressed through according to
eventually reaches . But The progression of through
according to is exactly the same after the first
step, so that too leads to .
Case
. Suppose
. Then
there is some such that
and for
all ,
. We proceed by
induction on . In the base case , and both
and
whence by the main
induction hypothesis both and will eventually progress to
. Thus
progresses eventually to
for any , and in particular for
,
establishing the base case. For the induction case, suppose
(and of course
).
Since is equivalent to
and
,
and
, clearly
. Where
and are as in the previous case, therefore,
and the failure occurs at stage of
. Therefore the hypothesis of the induction on
applies, and progressed through according to
goes eventually to , and so progressed through
according to goes similarly to .

**Theorem 3**
*Let be the set of e-states in an equivalent MDP for
. is minimal iff every e-state
in is reachable and contains no two distinct e-states
and with
and
.*

**Proof: ** Proof is by construction of the canonical equivalent MDP
. Let the set of finite prefixes of state sequences in
be partitioned into equivalence classes, where
iff
and for all such that
,
. Let denote the equivalence
class of . Let be the set of these equivalence
classes. Let be the function that takes each
in to . For each
and and for each
, let
be
if
. Otherwise let
. Let
be . Then note the following four facts:

- Each of the functions , and is well-defined.
- is an equivalent MDP for with .
- For any equivalent MDP of there is a mapping from a subset of the states of onto .
- satisfies the condition that every e-state in is reachable and contains no two distinct e-states and with and iff is isomorphic to .

For all , if there is such that , then for all such that , there exists a unique , , such that for all , .Suppose , and . Then the required is , and of course , so the required condition reads:

is the unique element of with such that for all , .To establish existence, we need that if then , which is immediate from the definition of above. To establish uniqueness, suppose that and for all actions . Since , the transition probability from to is nonzero for some action, so by the definition of , can only be . Fact 3 is readily observed. Let be any equivalent MDP for . For any states and of , and any state of such that there is at most one state of with such that some action gives a nonzero probability of transition from to . This follows from the uniqueness part of condition 3 of Definition 1 together with the fact that the transition function is a probability distribution (sums to 1). Therefore for any given finite state sequence there is at most one state of reached from the start state of by following . Therefore induces an equivalence relation on : iff they lead to the same state of (the sequences which are not feasible in may all be regarded as equivalent under ). Each reachable state of has associated with it a nonempty equivalence class of finite sequences of states of . Working through the definitions, we may observe that is a sub-relation of (if then ). Hence the function that takes the equivalence class under of each feasible sequence to induces a mapping (an epimorphism in fact) from the reachable subset of states of onto . To establish Fact 4, it must be shown that in the case of the mapping can be reversed, or that each equivalence class in corresponds to exactly one element of . Suppose not (for contradiction). Then there exist sequences and in such that but on following the two sequences from we arrive at two different elements and of with but with . Therefore there exists a sequence such that . But this contradicts the condition for .

Theorem 3 follows immediately from facts 1-4.

**Theorem 4**
*Let be the translation of as in
Definition 5. is a blind minimal
equivalent MDP for .*

**Proof: ** Reachability of all the e-states is obvious, as they are
constructed only when reached. Each e-state is a pair
where is a state of and is a reward
function specification. In fact,
and determines a distribution of rewards over all
continuations of the sequences that reach
.
That is, for all in such that , the
reward for is
. If is not blind minimal, then there exist distinct
e-states
and
for which this sum is the same for all . But this makes
and semantically equivalent, contradicting the
supposition that they are distinct.