## Reward Functions

With the language defined so far, we can compactly represent behaviours. The extension to non-Markovian reward functions is straightforward. We represent such a function by a set $\phi$ of formulae associated with real-valued rewards, and call $\phi$ a *reward function specification*. Where formula $f$ is associated with reward $r$ in $\phi$, we write $f : r$. The rewards are assumed to be independent and additive, so that the reward function $R$ represented by $\phi$ is given by:

$$R(\Gamma(i)) \;=\; \sum \{\, r \mid (f : r) \in \phi \text{ and } \Gamma(i) \in B_f \,\}$$

where $B_f$ is the behaviour represented by $f$.

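The additive semantics above can be sketched in Python, modelling each behaviour $B_f$ abstractly as a predicate on prefixes $\Gamma(i)$ of the state sequence. The encoding, helper names, and the reward value 1.0 are illustrative assumptions, not from the text:

```python
# Sketch: a reward function specification as (behaviour, reward) pairs,
# where a behaviour is modelled as a predicate on prefixes Gamma(i).
# Encoding and helper names are illustrative assumptions.

def reward(spec, prefix):
    """Additive reward: sum r over every (f : r) in the specification
    whose behaviour contains the current prefix Gamma(i)."""
    return sum(r for in_behaviour, r in spec if in_behaviour(prefix))

# "Reward the first time p holds": p holds now and at no earlier stage.
def first_p(prefix):
    return prefix[-1]["p"] and not any(s["p"] for s in prefix[:-1])

# "Reward from the first time q holds onwards": q has held at some stage.
def q_seen(prefix):
    return any(s["q"] for s in prefix)

spec = [(first_p, 5.2), (q_seen, 1.0)]

run = [{"p": False, "q": False},
       {"p": True,  "q": True},   # first p, and q holds: both rewards
       {"p": True,  "q": False}]  # only the ongoing q-reward remains
print([round(reward(spec, run[:i + 1]), 1) for i in range(len(run))])
```

At the second state both behaviours contain the prefix, so the rewards add up, as the independence and additivity assumption dictates.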
E.g., if $\phi$ is $\{\neg p \,\mathsf{U}\, (p \wedge \$) : 5.2,\;\; \neg q \,\mathsf{U}\, (q \wedge \Box\,\$) : r\}$, we get a reward of 5.2 the first time that $p$ holds, a reward of $r$ at every stage from the first time that $q$ holds onwards, a reward of $5.2 + r$ when both conditions are met (the rewards being additive), and 0 otherwise. Again, we can progress a reward function specification to compute the reward at all stages $i$ of $\Gamma$. As before, progression defines a sequence of reward function specifications $\langle \phi_i \rangle$, with $\phi_0 = \phi$ and $\phi_{i+1} = \mathrm{Prog}(\Gamma_i, \phi_i)$, where $\mathrm{Prog}$, overloaded, is the function that applies $\mathrm{Prog}$ to all formulae in a reward function specification:

$$\mathrm{Prog}(s, \phi) \;=\; \{\, \mathrm{Prog}(s, f) : r \mid (f : r) \in \phi \,\}$$

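The lifting of progression from formulae to whole specifications can be sketched as follows. The single-formula `prog` here covers only a toy fragment (constants, atoms, conjunction, "next") and is an illustrative stand-in for the language's actual progression, not a definitive implementation:

```python
# Sketch: phi_0 = phi and phi_{i+1} = Prog(Gamma_i, phi_i), with Prog
# applied to every formula while its reward is carried along unchanged.
# The single-formula prog covers only a toy fragment as an illustration.

TRUE, FALSE = ("true",), ("false",)

def prog(state, f):
    tag = f[0]
    if tag in ("true", "false"):
        return f
    if tag == "atom":                  # ("atom", "p"): does p hold now?
        return TRUE if state.get(f[1]) else FALSE
    if tag == "next":                  # ("next", f1): f1 at the next stage
        return f[1]
    if tag == "and":                   # ("and", f1, f2)
        a, b = prog(state, f[1]), prog(state, f[2])
        if FALSE in (a, b):
            return FALSE
        if a == TRUE:
            return b
        if b == TRUE:
            return a
        return ("and", a, b)
    raise ValueError(f"unknown connective {tag}")

def prog_spec(state, spec):
    """Apply Prog to all formulae of a reward function specification,
    keeping each formula's associated reward."""
    return [(prog(state, f), r) for f, r in spec]

def progress(states, spec):
    """The sequence phi_0, phi_1, ... induced by the run Gamma."""
    specs = [spec]
    for s in states:
        specs.append(prog_spec(s, specs[-1]))
    return specs
```

For instance, progressing $\{(p \wedge \bigcirc q) : 2.0\}$ through a state where $p$ holds leaves $\{q : 2.0\}$ to be checked at the next stage.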
Then, the total reward received at stage $i$ is simply the sum of the real-valued rewards granted by the progression function to the behaviours represented by the formulae in $\phi_i$:

$$\sum \{\, r \mid (f : r) \in \phi_i \text{ and } \mathrm{Rew}(\Gamma_i, f) \,\}$$

where $\mathrm{Rew}(\Gamma_i, f)$ holds iff progressing $f$ through $\Gamma_i$ grants the reward.

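To illustrate, each formula can be modelled as a function whose progression through a state reports both its successor formula and whether the reward is granted there (the role played by Rew). The behaviour "reward the first time $p$ holds" is hand-coded below; the encoding is an illustrative assumption, not the paper's machinery:

```python
# Sketch: reward at stage i as the sum of rewards granted by progression.
# Each formula is hand-coded as a function state -> (successor, granted),
# where `granted` plays the role of Rew. Illustrative encoding only.

def first_time_p(state):
    """~p U (p & $): grants its reward the first time p holds, then
    progresses to a formula that never grants a reward again."""
    if state["p"]:
        return never, True
    return first_time_p, False

def never(state):
    return never, False

def run_rewards(states, spec):
    """Progress the specification through the run, collecting the
    total reward received at each stage."""
    rewards = []
    for s in states:
        total, nxt = 0.0, []
        for f, r in spec:
            f2, granted = f(s)
            total += r if granted else 0.0
            nxt.append((f2, r))
        rewards.append(total)
        spec = nxt
    return rewards

print(run_rewards([{"p": False}, {"p": True}, {"p": True}],
                  [(first_time_p, 5.2)]))   # reward only at the second stage
```

Note that the later occurrence of $p$ earns nothing: the formula has already progressed past the point where its reward can be granted, which is exactly what makes the reward non-Markovian.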
By proceeding that way, we get the expected analog of Theorem 1, which states that progression correctly computes non-Markovian reward functions:

**Theorem 2**   Let $\phi_0$ be a reward-normal reward function specification, and let $\langle \phi_i \rangle$ be the result of progressing it through the successive states of a sequence $\Gamma$. Then, provided no formula in any $\phi_i$ progresses to $\bot$, the reward computed at each stage $i$ equals $R(\Gamma(i))$.

Proof: Immediate from Theorem 1.
Sylvie Thiebaux 2006-01-20