Proofs of Theorems

Property 1 Where , iff .

Proof: Induction on the structure of . There are several base cases, all fairly trivial. If or there is nothing to prove, as these progress to themselves and hold everywhere and nowhere respectively. If then if holds in then it progresses to which holds in while if does not hold in then it progresses to which does not hold in . The case is similar. In the last base case, . Then the following are equivalent:

Induction case 1: . The following are equivalent:

and
and (by induction hypothesis)

Induction case 2: . Analogous to case 1.

Induction case 3: . Trivial by inspection of the definitions.

Induction case 4: . Then is logically equivalent to which by cases 1, 2 and 3 holds at stage of for behaviour iff holds at stage i+1.

Theorem 1 Let be reward-normal, and let be the result of progressing it through the successive states of a sequence . Then, provided no is , for all iff .

Proof: First, by the definition of reward-normality, if is reward-normal then iff for all , if then . Next, if then progressing through according to (that is, letting each be true iff ) cannot lead to a contradiction because by Property 1, progression is truth-preserving. It remains, then, to show that if then progressing through according to must lead eventually to . The proof of this is by induction on the structure of and as usual the base case in which is a literal (an atom, a negated atom or , or ) is trivial. Case . Suppose . Then either or , so by the induction hypothesis either or progresses eventually to , and hence so does their conjunction. Case . Suppose . Then both and , so by the induction hypothesis each of and progresses eventually to . Suppose without loss of generality that does not progress to before does. Then at some point has progressed to some formula and has progressed to which simplifies to . Since also progresses to eventually, so does . Case . Suppose . Let and let . Then , so by the induction hypothesis progressed through according to eventually reaches . But The progression of through according to is exactly the same after the first step, so that too leads to . Case . Suppose . Then there is some such that and for all , . We proceed by induction on . In the base case , and both and whence by the main induction hypothesis both and will eventually progress to . Thus progresses eventually to for any , and in particular for , establishing the base case. For the induction case, suppose (and of course ). Since is equivalent to and , and , clearly . Where and are as in the previous case, therefore, and the failure occurs at stage of . Therefore the hypothesis of the induction on applies, and progressed through according to goes eventually to , and so progressed through according to goes similarly to .

Theorem 3 Let be the set of e-states in an equivalent MDP for . is minimal iff every e-state in is reachable and contains no two distinct e-states and with and .

Proof: Proof is by construction of the canonical equivalent MDP . Let the set of finite prefixes of state sequences in be partitioned into equivalence classes, where iff and for all such that , . Let denote the equivalence class of . Let be the set of these equivalence classes. Let be the function that takes each in to . For each and and for each , let be if . Otherwise let . Let be . Then note the following four facts:

1. Each of the functions , and is well-defined.
2. is an equivalent MDP for with .
3. For any equivalent MDP of there is a mapping from a subset of the states of onto .
4. satisfies the condition that every e-state in is reachable and contains no two distinct e-states and with and iff is isomorphic to .
What fact 1 above amounts to is that if then it does not matter which of the two sequences is used to define , and of their equivalence class. In the cases of and this is simply that . In the case of , it is the special case of the equality of rewards over extensions. Fact 2 is a matter of checking that the four conditions of Definition 1 hold. Of these, conditions 1 ( ) and 2 ( ) hold trivially by the construction. Condition 4 says that for any feasible state sequence , we have for all . This also is given in the construction. Condition 3 states:
For all , if there is such that , then for all such that , there exists a unique , , such that for all , .
Suppose , and . Then the required is , and of course , so the required condition reads:
is the unique element of with such that for all , .
To establish existence, we need that if then , which is immediate from the definition of above. To establish uniqueness, suppose that and for all actions . Since , the transition probability from to is nonzero for some action, so by the definition of , can only be . Fact 3 is readily observed. Let be any equivalent MDP for . For any states and of , and any state of such that there is at most one state of with such that some action gives a nonzero probability of transition from to . This follows from the uniqueness part of condition 3 of Definition 1 together with the fact that the transition function is a probability distribution (sums to 1). Therefore for any given finite state sequence there is at most one state of reached from the start state of by following . Therefore induces an equivalence relation on : iff they lead to the same state of (the sequences which are not feasible in may all be regarded as equivalent under ). Each reachable state of has associated with it a nonempty equivalence class of finite sequences of states of . Working through the definitions, we may observe that is a sub-relation of (if then ). Hence the function that takes the equivalence class under of each feasible sequence to induces a mapping (an epimorphism in fact) from the reachable subset of states of onto . To establish Fact 4, it must be shown that in the case of the mapping can be reversed, or that each equivalence class in corresponds to exactly one element of . Suppose not (for contradiction). Then there exist sequences and in such that but on following the two sequences from we arrive at two different elements and of with but with . Therefore there exists a sequence such that . But this contradicts the condition for .
Theorem 3 follows immediately from facts 1-4.

Theorem 4 Let be the translation of as in Definition 5. is a blind minimal equivalent MDP for .

Proof: Reachability of all the e-states is obvious, as they are constructed only when reached. Each e-state is a pair where is a state of and is a reward function specification. In fact, and determines a distribution of rewards over all continuations of the sequences that reach . That is, for all in such that , the reward for is . If is not blind minimal, then there exist distinct e-states and for which this sum is the same for all . But this makes and semantically equivalent, contradicting the supposition that they are distinct.

Sylvie Thiebaux 2006-01-20