## Reward Functions

With the language defined so far, we are able to compactly represent
behaviours. The extension to a non-Markovian reward function is
straightforward. We represent such a function by a set^{6}
of formulae associated with real-valued
rewards. We call a finite set $\phi \subseteq \$\mathrm{FLTL} \times \mathbb{R}$ a *reward function specification*.
Where formula $f$ is associated with reward $r$ in $\phi$, we write
`$f : r$' $\in \phi$. The rewards are assumed to be independent and
additive, so that the reward function $R_\phi$ represented by $\phi$
is given by:

$$R_\phi(\Gamma(i)) \;\equiv\; \sum \{\, r \mid (f : r) \in \phi \text{ and } \Gamma(i) \in B_f \,\}$$

where $B_f$ is the behaviour represented by $f$.

E.g., if $\phi$ is
$\{\neg p \,\mathsf{U}\, (p \wedge \$) : 5.2,\;\; \neg q \,\mathsf{U}\, (q \wedge \Box\$) : r\}$, we get a reward of 5.2
the first time that $p$ holds, a reward of $r$ from the first time
that $q$ holds onwards, a reward of $5.2 + r$ when both conditions are
met, and 0 otherwise.
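To make the representation concrete, here is a minimal Python sketch of a reward function specification as a list of (formula, reward) pairs. The encoding, the formula labels, and the value 7.3 are illustrative assumptions rather than anything from the text; the point is only that rewards are independent and additive, so the reward at a stage is the sum over the formulae whose behaviours are triggered:

```python
# A reward function specification as (formula, reward) pairs.  Formulae are
# opaque labels here; 7.3 is an arbitrary value chosen for illustration.
spec = [("first_time_p", 5.2), ("from_first_q_onwards", 7.3)]

def reward(spec, triggered):
    # Rewards are independent and additive: sum the rewards of every
    # formula whose associated behaviour currently holds.
    return sum(r for f, r in spec if f in triggered)

print(reward(spec, {"first_time_p"}))                          # 5.2
print(reward(spec, {"first_time_p", "from_first_q_onwards"}))  # 12.5 (the sum)
print(reward(spec, set()))                                     # 0
```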
Again, we can progress a reward function specification to
compute the reward at all stages $i$ of $\Gamma$. As before,
progression defines a sequence
$\langle \phi_0, \phi_1, \ldots \rangle$ of reward function specifications, with
$\phi_0 = \phi$ and $\phi_{i+1} = \$\mathrm{Prog}(\Gamma_i, \phi_i)$, where
$\$\mathrm{Prog}$ is the function that
applies $\mathrm{Prog}$ to all formulae in a reward function specification:

$$\$\mathrm{Prog}(s, \phi) \;\equiv\; \{\, \mathrm{Prog}(s, f) : r \mid (f : r) \in \phi \,\}$$
Then, the total reward received at stage $i$ is simply the sum of
the real-valued rewards granted by the progression function to the
behaviours represented by the formulae in $\phi_i$:

$$R(\Gamma(i)) \;=\; \sum \{\, r \mid (f : r) \in \phi_i \text{ and } \mathrm{Rew}(\Gamma_i, f) \,\}$$
Proceeding in this way, we get the expected analog of
Theorem 1, which states that
progression correctly computes non-Markovian reward functions:

**Theorem 2**
*Let $\phi_0$ be a reward-normal*^{7} *reward function specification, and let
$\langle \phi_1, \phi_2, \ldots \rangle$ be the result of progressing it through the
successive states of a sequence $\Gamma$ using the function
$\$\mathrm{Prog}$. Then, provided
$(\bot : r) \not\in \phi_i$ for any
$i$, then
$R_{\phi_0}(\Gamma(i)) = \sum \{\, r \mid (f : r) \in \phi_i \text{ and } \mathrm{Rew}(\Gamma_i, f) \,\}$.*
**Proof:** Immediate from Theorem 1.
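The proviso of the theorem can be illustrated with a tiny sketch (the tuple encoding, the helper names, and the example specification are assumptions, mirroring a minimal-reward progression): a specification whose formula is falsified regardless of whether the reward is granted progresses to the falsum, and $(\bot : r)$ appearing in the progressed set signals that the theorem's premise fails.

```python
# Minimal progression restricted to atoms, '$', and conjunction.
TOP, BOT, REWARD = ("top",), ("bot",), ("$",)

def prog(b, state, f):
    # Progress formula f through `state`; b says whether reward is granted now.
    if f in (TOP, BOT): return f
    if f == REWARD: return TOP if b else BOT
    if f[0] == "atom": return TOP if f[1] in state else BOT
    if f[0] == "and":
        left, right = prog(b, state, f[1]), prog(b, state, f[2])
        if BOT in (left, right): return BOT
        if left == TOP: return right
        if right == TOP: return left
        return ("and", left, right)

def rew(state, f):
    # A reward is forced exactly when refusing it would falsify the formula.
    return prog(False, state, f) == BOT

spec = [(("and", ("atom", "p"), REWARD), 1.0)]  # 'p & $' : 1.0
state = set()                                   # p does not hold here
spec = [(prog(rew(state, f), state, f), r) for f, r in spec]
print(spec)  # [(('bot',), 1.0)] -- falsum in the set: the proviso is violated
```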

Sylvie Thiebaux
2006-01-20