The fundamental property of $\mathrm{Prog}$ is the following. Where $b$ stands for $\Gamma(i) \in B$:

\[
(\Gamma, i) \models_B \phi \quad\mbox{iff}\quad (\Gamma, i+1) \models_B \mathrm{Prog}(b, \Gamma_i, \phi)
\]

Like $\models_B$, the function $\mathrm{Prog}$ seems to require $B$ (or at least $b$) as input, but of course when progression is applied in practice we only have $\Gamma(i)$ and one new state of $\Gamma$ at a time, and what we really want to do is determine the value of $b$, i.e. decide whether to allocate a reward, as we go.

To decide whether a prefix $\Gamma(i)$ of $\Gamma$ is to be rewarded, the algorithm first tries progressing the formula through $\Gamma_i$ with the boolean flag set to `false'. If that gives a consistent result, we need not reward the prefix and we continue without rewarding $\Gamma(i)$, but if the result is $\bot$ then we know that $\Gamma(i)$ must be rewarded in order for $\Gamma$ to satisfy $\phi$. In that case, to obtain $\phi_{i+1}$ we must progress $\phi_i$ through $\Gamma_i$ again, this time with the boolean flag set to the value `true'. To sum up, the behaviour corresponding to $\phi$ is $\{\Gamma(i) \mid \mathrm{Prog}(\mathit{false}, \Gamma_i, \phi_i) = \bot\}$, where $\phi_0 = \phi$ and each $\phi_{i+1}$ is obtained from $\phi_i$ by the two-pass progression just described.

To illustrate the behaviour of \$FLTL progression, consider the formula $\neg p \mathbin{U} (p \wedge \$)$, stating that a reward will be received the first time $p$ is true. Let $s$ be a state in which $p$ holds; then $\mathrm{Prog}(\mathit{false}, s, \neg p \mathbin{U} (p \wedge \$)) = \bot$. Therefore, since the formula has progressed to $\bot$, $b$ is true and a reward is received. $\mathrm{Prog}(\mathit{true}, s, \neg p \mathbin{U} (p \wedge \$)) = \top$, so the reward formula fades away and will not affect subsequent progression steps. If, on the other hand, $p$ is false in $s$, then $\mathrm{Prog}(\mathit{false}, s, \neg p \mathbin{U} (p \wedge \$)) = \neg p \mathbin{U} (p \wedge \$)$. Therefore, since the formula has not progressed to $\bot$, $b$ is false and no reward is received. The formula is returned unchanged by progression, so the reward formula persists as is for subsequent progression steps.

The following theorem states that under weak assumptions, rewards are correctly allocated by progression: if the original formula $\phi$ is reward-normal, and no $\phi_i$ in the sequence of progressions is $\bot$, then a reward is allocated to $\Gamma(i)$ exactly when the behaviour represented by $\phi$ requires it.
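The two-pass procedure and the worked example above can be sketched in a few lines of Python. This is a minimal illustration only: the tuple encoding of formulas and the names `prog` and `dollar_prog` are our assumptions for exposition, not the authors' implementation.

```python
# Formulas are nested tuples: ("top",), ("bot",), ("atom", name), ("$",),
# ("not", f), ("and", f, g), ("or", f, g), ("next", f), ("until", f, g).
TOP = ("top",)
BOT = ("bot",)

def neg(f):  # simplifying connectives so bottom is detected syntactically
    if f == TOP: return BOT
    if f == BOT: return TOP
    return ("not", f)

def conj(f, g):
    if f == BOT or g == BOT: return BOT
    if f == TOP: return g
    if g == TOP: return f
    return ("and", f, g)

def disj(f, g):
    if f == TOP or g == TOP: return TOP
    if f == BOT: return g
    if g == BOT: return f
    return ("or", f, g)

def prog(b, s, f):
    """Progress formula f through state s (a set of atoms) with reward flag b."""
    op = f[0]
    if op in ("top", "bot"): return f
    if op == "atom": return TOP if f[1] in s else BOT
    if op == "$":    return TOP if b else BOT
    if op == "not":  return neg(prog(b, s, f[1]))
    if op == "and":  return conj(prog(b, s, f[1]), prog(b, s, f[2]))
    if op == "or":   return disj(prog(b, s, f[1]), prog(b, s, f[2]))
    if op == "next": return f[1]
    if op == "until":  # Prog(f1 U f2) = Prog(f2) or (Prog(f1) and f1 U f2)
        return disj(prog(b, s, f[2]), conj(prog(b, s, f[1]), f))
    raise ValueError(op)

def dollar_prog(s, f):
    """Two-pass step: try b=false; if that yields bottom, reward and retry."""
    r = prog(False, s, f)
    if r != BOT:
        return r, False           # no reward needed for this prefix
    return prog(True, s, f), True  # prefix must be rewarded

# "Reward the first time p is true":  ~p U (p & $)
P = ("atom", "p")
PHI = ("until", ("not", P), ("and", P, ("$",)))

print(dollar_prog({"p"}, PHI))  # p holds: formula fades to top, reward given
print(dollar_prog(set(), PHI))  # p false: formula persists, no reward
```

Running the last two lines reproduces the example: in a state satisfying $p$ the formula progresses to $\top$ with a reward, while in a state falsifying $p$ it is returned unchanged with no reward.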

The premise of the theorem is that $\phi$ never progresses to $\bot$. Indeed if $\phi_i$ progresses to $\bot$ for some $i$, it means that even rewarding $\Gamma(i)$ does not suffice to make $\phi$ true, so something must have gone wrong: at some earlier stage, the boolean flag was made false where it should have been made true. The usual explanation is that the original $\phi$ was not reward-normal. For instance $\bigcirc p \rightarrow \$$, which is reward-unstable, progresses to $\bot$ in the next state if $p$ is true there: regardless of the state $s_0$, $\mathrm{Prog}(\mathit{false}, s_0, \bigcirc p \rightarrow \$) = \neg p$ while $\mathrm{Prog}(\mathit{true}, s_0, \bigcirc p \rightarrow \$) = \top$, so the two-pass rule selects `false' and the formula progresses to $\neg p$; if $p$ then holds in the next state $s_1$, $\mathrm{Prog}(b, s_1, \neg p) = \bot$ for either value of $b$. However, other (admittedly bizarre) possibilities exist: for example, although $\bigcirc p \rightarrow \$$ is reward-unstable, its substitution instance $\bigcirc(p \vee \neg p) \rightarrow \$$, which also progresses to $\bot$ in a few steps, is logically equivalent to $\$$ and is reward-normal.

If the progression method were to deliver the correct minimal behaviour in all cases (even in all reward-normal cases) it would have to backtrack on the choice of values for the boolean flags. In the interest of efficiency, we choose not to allow backtracking. Instead, our algorithm raises an exception whenever a reward formula progresses to $\bot$, and informs the user of the sequence which caused the problem. The onus is thus placed on the domain modeller to select sensible reward formulae so as to avoid possible progression to $\bot$. It should be noted that in the worst case, detecting reward-normality cannot be easier than the decision problem for \$FLTL, so it is not to be expected that there will be a simple syntactic criterion for reward-normality. In practice, however, commonsense precautions such as avoiding making rewards depend explicitly on future-tense expressions suffice to keep things normal in all routine cases. For a generous class of syntactically recognisable reward-normal formulae, see Appendix A.

Sylvie Thiebaux 2006-01-20
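The exception behaviour on a reward-unstable formula such as $\bigcirc p \rightarrow \$$ (encoded here as $\neg\bigcirc p \vee \$$) can be sketched as follows. As before, the tuple encoding and the names, including the exception class, are illustrative assumptions rather than the authors' implementation.

```python
# Minimal progression for the connectives needed here; formulas are tuples:
# ("top",), ("bot",), ("atom", name), ("$",), ("not", f), ("or", f, g), ("next", f).
TOP, BOT = ("top",), ("bot",)

def neg(a):
    return BOT if a == TOP else TOP if a == BOT else ("not", a)

def disj(a, b):
    if TOP in (a, b): return TOP
    if a == BOT: return b
    if b == BOT: return a
    return ("or", a, b)

def prog(b, s, f):
    op = f[0]
    if op in ("top", "bot"): return f
    if op == "atom": return TOP if f[1] in s else BOT
    if op == "$":    return TOP if b else BOT
    if op == "not":  return neg(prog(b, s, f[1]))
    if op == "or":   return disj(prog(b, s, f[1]), prog(b, s, f[2]))
    if op == "next": return f[1]
    raise ValueError(op)

class ProgressionError(Exception):
    """Raised when even rewarding the current prefix cannot rescue the formula."""

def dollar_prog(s, f):
    r = prog(False, s, f)
    if r != BOT:
        return r, False
    r = prog(True, s, f)
    if r == BOT:
        raise ProgressionError("reward formula progressed to bottom")
    return r, True

# next(p) -> $, i.e. (not next p) or $ : reward-unstable
P = ("atom", "p")
PHI = ("or", ("not", ("next", P)), ("$",))

f1, rewarded = dollar_prog({"p"}, PHI)  # greedy choice b=false yields ~p
try:
    dollar_prog({"p"}, f1)              # p true in the next state: bottom either way
except ProgressionError as e:
    print("exception:", e)
```

The first step commits to `false' because $\neg p$ is not syntactically $\bot$; when $p$ then holds, both passes yield $\bot$ and the exception fires, mirroring the no-backtracking policy described above.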