Reward Functions
With the language defined so far, we are able to compactly represent
behaviours. The extension to a non-Markovian reward function is
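straightforward. We represent such a function by a set$^{6}$ $\phi$ of formulae associated with real-valued rewards, and we call $\phi$ a reward function specification. Where formula $f$ is associated with reward $r$ in $\phi$, we write $(f : r) \in \phi$. The rewards are assumed to be independent and additive, so that the reward function $R_\phi$ represented by $\phi$ is given by:

$$R_\phi(\Gamma(i)) = \sum_{(f : r) \in \phi \,:\, \Gamma(i) \in B_f} r$$

where $B_f$ is the behaviour represented by $f$ (the set of prefixes that $f$ rewards) and $\Gamma(i)$ denotes the prefix $\Gamma_0 \ldots \Gamma_i$ of a state sequence $\Gamma$.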
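For example, if $\phi$ is $\{(\neg p \,\mathsf{U}\, (p \wedge \$) : 5.2),\ (\Box(q \rightarrow \Box\$) : 7.3)\}$, we get a reward of 5.2 the first time that $p$ holds, a reward of 7.3 from the first time that $q$ holds onwards, a reward of 12.5 when both conditions are met, and 0 otherwise.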
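To make the additive semantics concrete, the Python sketch below evaluates $R_\phi(\Gamma(i))$ directly for this example, without progression. It assumes that states are frozensets of true propositions and that each behaviour $B_f$ is supplied as a hand-written membership test on prefixes (the hypothetical helpers `first_p` and `q_so_far`), encoding the informal reading given above rather than being derived from the formulae themselves.

```python
# Direct (non-progression) evaluation of R_phi(Gamma(i)) for the example specification.
# ASSUMPTIONS: states are frozensets of true propositions, and each behaviour B_f
# is a hand-written membership test on prefixes, not derived from the formula.

def first_p(prefix):
    """Gamma(i) in B_f for  not p U (p and $): p holds now and held at no earlier stage."""
    *earlier, last = prefix
    return "p" in last and all("p" not in s for s in earlier)

def q_so_far(prefix):
    """Gamma(i) in B_f for  box(q -> box $): q has held at some stage up to now."""
    return any("q" in s for s in prefix)

# A reward function specification as (behaviour test, reward) pairs.
spec = [(first_p, 5.2), (q_so_far, 7.3)]

def reward(spec, prefix):
    """R_phi(Gamma(i)): rewards are independent and additive, so sum the matches."""
    return sum((r for in_behaviour, r in spec if in_behaviour(prefix)), 0.0)

states = [frozenset(), frozenset({"q"}), frozenset({"p"}), frozenset({"p"})]
print([reward(spec, states[: i + 1]) for i in range(len(states))])  # → [0.0, 7.3, 12.5, 7.3]
```

On this sequence, $q$ holds at stage 1 and $p$ at stages 2 and 3, so the rewards are 7.3, then 12.5 when $p$ first holds, then 7.3 again once $p$ is no longer new.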
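Again, we can progress a reward function specification $\phi$ to compute the reward at all stages $i$ of $\Gamma$. As before, progression defines a sequence $\langle \phi_0, \phi_1, \ldots \rangle$ of reward function specifications, with $\phi_0 = \phi$ and $\phi_{i+1} = \mathrm{RProg}(\Gamma_i, \phi_i)$, where $\mathrm{RProg}$ is the function that applies Prog to all formulae in a reward function specification:

$$\mathrm{RProg}(s, \phi) = \{ (\mathrm{Prog}(s, f) : r) \mid (f : r) \in \phi \}$$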
Then, the total reward received at stage $i$ is simply the sum of the real-valued rewards granted by the progression function to the behaviours represented by the formulae in $\phi_i$:

$$\sum_{(f : r) \in \phi_i \,:\, \mathrm{Rew}(\Gamma_i, f)} r$$
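The progression-based computation can be sketched in Python as follows. This is a simplified reconstruction for illustration only, not the authors' implementation: formulae are assumed to be in negation normal form and encoded as tuples, $\Box f$ is encoded as $f \,\mathsf{U}\, \bot$ and $q \rightarrow \Box\$$ as $\neg q \vee \Box\$$, and the hypothetical helpers `prog_b`, `rew` and `prog` follow our reading of the progression rules (a reward is granted in state $s$ iff progressing $f$ while withholding it yields $\bot$). Detecting $\bot$ relies on simple Boolean simplification, which suffices for this example but is not a complete unsatisfiability test.

```python
# A minimal sketch of reward computation by progression (hypothetical reconstruction).
# ASSUMPTIONS: NNF formulae encoded as tuples; states are frozensets of true
# propositions; prog_b/rew/prog follow our simplified reading of the $FLTL rules.
TOP, BOT, DOLLAR = ("top",), ("bot",), ("dollar",)

def atom(p): return ("atom", p)
def natom(p): return ("natom", p)          # the negated proposition "not p"
def until(a, b): return ("until", a, b)    # f1 U f2
def box(a): return until(a, BOT)           # box f encoded as f U bottom

def conj(a, b):
    """Conjunction with the usual top/bottom simplifications."""
    if BOT in (a, b):
        return BOT
    return b if a == TOP else a if b == TOP else ("and", a, b)

def disj(a, b):
    """Disjunction with the usual top/bottom simplifications."""
    if TOP in (a, b):
        return TOP
    return b if a == BOT else a if b == BOT else ("or", a, b)

def prog_b(b, s, f):
    """Progress f through state s; occurrences of $ are satisfied iff b is True."""
    tag = f[0]
    if tag in ("top", "bot"):
        return f
    if tag == "atom":
        return TOP if f[1] in s else BOT
    if tag == "natom":
        return BOT if f[1] in s else TOP
    if tag == "dollar":
        return TOP if b else BOT
    if tag == "and":
        return conj(prog_b(b, s, f[1]), prog_b(b, s, f[2]))
    if tag == "or":
        return disj(prog_b(b, s, f[1]), prog_b(b, s, f[2]))
    if tag == "next":                      # ("next", g) progresses to g
        return f[1]
    if tag == "until":  # f1 U f2 becomes Prog(f2) or (Prog(f1) and f1 U f2)
        return disj(prog_b(b, s, f[2]), conj(prog_b(b, s, f[1]), f))
    raise ValueError(f"unknown formula {f!r}")

def rew(s, f):
    """Reward is due in s iff withholding it would make f progress to bottom."""
    return prog_b(False, s, f) == BOT

def prog(s, f):
    return prog_b(rew(s, f), s, f)

def rprog(s, spec):
    """RProg(s, phi): apply prog to every formula, keeping its reward."""
    return [(prog(s, f), r) for f, r in spec]

def stage_reward(s, spec):
    """Sum of r over the pairs (f : r) in phi_i for which Rew(Gamma_i, f) holds."""
    return sum((r for f, r in spec if rew(s, f)), 0.0)

def rewards_by_progression(spec, states):
    rewards = []
    for s in states:
        rewards.append(stage_reward(s, spec))
        spec = rprog(s, spec)  # phi_{i+1} = RProg(Gamma_i, phi_i)
    return rewards

# The example specification; q -> box $ is written as (not q) or box $.
phi = [
    (until(natom("p"), conj(atom("p"), DOLLAR)), 5.2),
    (box(disj(natom("q"), box(DOLLAR))), 7.3),
]
states = [frozenset(), frozenset({"q"}), frozenset({"p"}), frozenset({"p"})]
print(rewards_by_progression(phi, states))  # → [0.0, 7.3, 12.5, 7.3]
```

Threading the specification through `rewards_by_progression` mirrors $\phi_{i+1} = \mathrm{RProg}(\Gamma_i, \phi_i)$: the reward at each stage depends only on the current state and the progressed specification, never on the stored history. Once $p$ has been rewarded, the first formula progresses to $\top$ and grants nothing further.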
Proceeding this way, we obtain the expected analog of Theorem 1, stating that progression correctly computes non-Markovian reward functions:
Theorem 2
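Let $\phi$ be a reward-normal$^{7}$ reward function specification, and let $\langle \phi_0, \phi_1, \ldots \rangle$ be the result of progressing it through the successive states of a sequence $\Gamma$ using the function $\mathrm{RProg}$. Then, provided that $(\bot : r) \notin \phi_i$ for all $i$, we have for every stage $i$:

$$\sum_{(f : r) \in \phi_i \,:\, \mathrm{Rew}(\Gamma_i, f)} r = R_\phi(\Gamma(i)).$$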
Proof:
Immediate from Theorem 1.
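Theorem 2 can be sanity-checked on the example specification $\{(\neg p \,\mathsf{U}\, (p \wedge \$) : 5.2),\ (\Box(q \rightarrow \Box\$) : 7.3)\}$ by comparing progression-based rewards with a direct evaluation of $R_\phi$ on every state sequence of length 4 over $\{p, q\}$. The sketch assumes a tuple encoding of negation-normal-form formulae, a simplified reconstruction of Prog and Rew (a reward is granted iff progressing without it yields $\bot$, with $\bot$ detected by Boolean simplification), and hand-written membership tests for the two behaviours. It illustrates the theorem on one specification and is no substitute for the proof.

```python
# Exhaustive check of Theorem 2 for the example specification (illustration only).
# ASSUMPTIONS: NNF formulae as tuples, states as frozensets, a simplified
# reconstruction of Prog/Rew, and hand-written membership tests for each B_f.
from itertools import product

TOP, BOT, DOLLAR = ("top",), ("bot",), ("dollar",)

def conj(a, b):
    if BOT in (a, b):
        return BOT
    return b if a == TOP else a if b == TOP else ("and", a, b)

def disj(a, b):
    if TOP in (a, b):
        return TOP
    return b if a == BOT else a if b == BOT else ("or", a, b)

def prog_b(b, s, f):
    """Progress NNF formula f through state s, satisfying $ iff b."""
    tag = f[0]
    if tag in ("top", "bot"):
        return f
    if tag in ("atom", "natom"):  # an atom is true iff present; a negated atom iff absent
        return TOP if (f[1] in s) == (tag == "atom") else BOT
    if tag == "dollar":
        return TOP if b else BOT
    if tag in ("and", "or"):
        op = conj if tag == "and" else disj
        return op(prog_b(b, s, f[1]), prog_b(b, s, f[2]))
    if tag == "next":
        return f[1]
    if tag == "until":
        return disj(prog_b(b, s, f[2]), conj(prog_b(b, s, f[1]), f))
    raise ValueError(f"unknown formula {f!r}")

def rew(s, f):
    return prog_b(False, s, f) == BOT  # reward iff withholding it yields bottom

def prog(s, f):
    return prog_b(rew(s, f), s, f)

# phi = {(not p U (p and $) : 5.2), (box((not q) or box $) : 7.3)}, box f = f U bottom.
phi = [
    (("until", ("natom", "p"), ("and", ("atom", "p"), DOLLAR)), 5.2),
    (("until", ("or", ("natom", "q"), ("until", DOLLAR, BOT)), BOT), 7.3),
]
# Hand-written tests for Gamma(i) in B_f, aligned with phi.
behaviours = [
    lambda pre: "p" in pre[-1] and all("p" not in s for s in pre[:-1]),
    lambda pre: any("q" in s for s in pre),
]

def check(states):
    """Compare progression-based and direct rewards at every stage i."""
    spec = phi
    for i, s in enumerate(states):
        if any(f == BOT for f, _ in spec):
            return True  # proviso of Theorem 2 violated: no counterexample possible
        by_progression = sum((r for f, r in spec if rew(s, f)), 0.0)
        prefix = states[: i + 1]
        direct = sum((r for (_, r), in_b in zip(phi, behaviours) if in_b(prefix)), 0.0)
        if abs(by_progression - direct) > 1e-9:
            return False
        spec = [(prog(s, f), r) for f, r in spec]  # RProg(Gamma_i, phi_i)
    return True

all_states = [frozenset(c) for c in ([], ["p"], ["q"], ["p", "q"])]
print(all(check(seq) for seq in product(all_states, repeat=4)))  # → True
```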
Sylvie Thiebaux
2006-01-20