A(x,u) = max(Q(x,u')) + (Q(x,u) - max(Q(x,u'))) * k/dt

where the max is taken over all choices of action u'.
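The definition above can be written directly in code. This is a minimal sketch, not from any published implementation: it assumes a tabular Q-function stored as a dict mapping (state, action) pairs to values, and the names `q`, `k`, and `dt` are illustrative.

```python
def advantage(q, x, u, k, dt):
    """Compute A(x,u) = max(Q(x,u')) + (Q(x,u) - max(Q(x,u'))) * k/dt.

    q  -- dict mapping (state, action) pairs to Q-values (illustrative storage)
    x  -- state; u -- action
    k  -- scaling constant; dt -- time-step size
    """
    # Max of Q over all actions available in state x
    v = max(value for (state, _), value in q.items() if state == x)
    return v + (q[(x, u)] - v) * k / dt

# Two actions whose Q-values differ only slightly (a gap of 0.01):
q = {("s", "a"): 1.00, ("s", "b"): 0.99}
a_best = advantage(q, "s", "a", k=1.0, dt=0.01)   # advantage of the best action
a_other = advantage(q, "s", "b", k=1.0, dt=0.01)  # scaled down by k/dt = 100
```

Note that the 0.01 gap between the two Q-values becomes a gap of about 1.0 between the two advantages, which is the point of the k/dt scaling.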

*Advantage updating* is an older algorithm than *advantage
learning*. In advantage updating, the definition of *A(x,u)*
was slightly different, and the algorithm required storing a value
function *V(x)* in addition to the advantage function. Advantage
learning is a more recent algorithm that supersedes advantage updating,
and requires only that the advantages *A(x,u)* be stored. The two
algorithms have essentially identical behavior, but the latter requires
less information to be stored and is a simpler algorithm, so it is
generally recommended.

Advantage learning and Q-learning learn equally quickly when used with a
lookup table. When a function approximator is used, even a linear one,
advantage learning can learn many orders of magnitude faster than
Q-learning in some cases. Specifically, if
time steps are "small" in the sense that the state changes a very
small amount on each time step, then advantage learning would be
expected to learn much faster than Q-learning. Or, for a semi-Markov
Decision Process (SMDP), if even one action consistently causes small
state changes, that also counts as "small" time steps. In that case,
the *dt* in the equation would be different for each action.
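The effect of small time steps can be seen numerically. The sketch below is purely illustrative, with made-up numbers: it assumes the Q-values of two actions differ by an amount proportional to *dt* (since one step changes the state only slightly), so as *dt* shrinks the Q-values converge and any fixed function-approximation error swamps the difference between actions, while the advantages keep an O(1) separation.

```python
def q_gap(dt):
    # Hypothetical: the difference between the Q-values of the best and
    # second-best actions shrinks in proportion to the time step dt.
    return 0.5 * dt

def advantage_gap(dt, k=1.0):
    # Advantage learning rescales that difference by k/dt,
    # restoring a separation between actions that is independent of dt.
    return q_gap(dt) * k / dt

for dt in (0.1, 0.01, 0.001):
    print(dt, q_gap(dt), advantage_gap(dt))
```

As *dt* decreases, `q_gap` vanishes while `advantage_gap` stays constant, which is why a function approximator of limited accuracy can still distinguish actions when trained on advantages.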


Harmon, M. E., and Baird, L. C. (1996). Multi-player residual advantage learning with general function approximation. (Technical Report WL-TR-96-1065). Wright-Patterson Air Force Base, Ohio: Wright Laboratory. (Available from the Defense Technical Information Center, Cameron Station, Alexandria, VA 22304-6145). - Advantage learning applied to a game with nonlinear dynamics and a nonlinear
function approximator.

Harmon, M. E., and Baird, L. C. (1996). Residual Advantage Learning Applied to a Differential Game. Proceedings of the International Conference on Neural Networks (ICNN'96), Washington D.C., 3-6 June. - Advantage learning applied to a game with linear dynamics and a linear
function approximator.

Harmon, M. E., Baird, L. C., and Klopf, A. H. (1995). Reinforcement Learning Applied to a Differential Game. Adaptive Behavior, MIT Press, 4(1), pp. 3-28. - Advantage learning applied to a game with linear dynamics and a linear
function approximator.

Harmon, M. E., Baird, L. C., and Klopf, A. H. (1994). Advantage Updating Applied to a Differential Game. Gerald Tesauro et al., eds. Advances in Neural Information Processing Systems 7. pp. 353-360. MIT Press, 1995. - The old advantage updating algorithm is defined and simulation results are given.

Baird, L. C. (1993). Advantage Updating. (Technical Report WL-TR-93-1146). Wright-Patterson Air Force Base, Ohio: Wright Laboratory. (Available from the Defense Technical Information Center, Cameron Station, Alexandria, VA 22304-6145). - The old advantage updating algorithm is defined and simulation results are given.

Baird, L. C. (1994). Reinforcement Learning in Continuous Time: Advantage Updating. Proceedings of the International Conference on Neural Networks, Orlando, FL, June. - The old advantage updating algorithm is applied to a 2-player game.
