Advantage Learning
Advantage learning is a form of reinforcement learning similar to
Q-learning except that it uses advantages rather than
Q-values. For a state x and action u,
the advantage for that state-action pair A(x,u) is related to the
Q value Q(x,u) as:
A(x,u)=max(Q(x,u')) + (Q(x,u) - max(Q(x,u'))) * k/dt
where the max is taken over all choices of action u'. Notice
that the maximum advantage in a given state will equal the
maximum Q-value in that state, which will be the value of that
state. If k/dt=1 , then all of the advantages are
identical to the Q values. So Q-learning is a special case of
advantage learning. If k is a constant and dt
is the size of a time step, then advantage learning differs from
Q-learning for small time steps in that the differences between
advantages in a given state are larger than the differences between
Q values.
Advantage updating is an older algorithm than advantage
learning. In advantage updating, the definition of A(x,u)
was slightly different, and it required storing a value function
V(x) in addition to the advantage function. Advantage learning
is a more recent algorithm that supercedes advantage updating, and requires
only that the A(x,u) advantages be stored. The two algorithms have
essentially identical behavior, but the later algorithm requires less
information to be stored, and is a simpler algorithm, so it is generally
recommended.
Advantage learning and Q-learning learn equally quickly when used with a
lookup table. Advantage learning can learn many orders of
magnitude faster than Q-learning in some cases where a function
approximator is used, even a linear function approximator.
Specifically, if
time steps are "small" in the sense that the state changes a very
small amount on each time step, then advantage learning would be
expected to learn much faster than Q-learning. Or, for a semi Markov
Decision Problems (SMDP), if even one action consistently causes small
state changes, that also counts as "small" time steps. In that case,
the dt in the equation would be different for each action.
More Information
- Advantage learning applied to a game with nonlinear dynamics and a nonlinear
function approximator.
Harmon, M. E., and Baird, L. C. (1996).
Multi-player residual advantage learning with general function approximation.
(Techical Report WL-TR-96-1065).
Wright-Patterson Air Force Base Ohio: Wright Laboratory.
(available from the Defense Technical Information Center,
Cameron Station, Alexandria, VA 22304-6145).
- Advantage learning applied to a game with nonlinear dynamics and a nonlinear
function approximator
Harmon, M. E., and Baird, L. C. (1995).
Residual Advantage Learning Applied to a Differential Game.
Proceedings of the International Conference on Neural Networks
(ICNN'96), Washington D.C., 3-6 June.
- Advantage learning applied to a game with linear dynamics and a linear
function approximator
Harmon, M. E., Baird, L. C., and Klopf, A. H. (1995).
Reinforcement Learning Applied to a Differential Game.
Adaptive Behavior, MIT Press, (4)1, pp. 3-28.
- Advantage learning applied to a game with linear dynamics and a linear
function approximator
Harmon, M. E., Baird, L. C., and Klopf, A. H. (1994).
Advantage Updating Applied to a Differential Game.
Gerald Tesauro, et. al., eds.
Advances in Neural Information Processing Systems 7.
pp. 353-360. MIT Press, 1995.
- The old advantage updating algorithm is defined and simulation results are given.
Baird, L. C. (1993).
Advantage Updating. (Technical Report WL-TR-93-1146).
Wright-Patterson Air Force Base Ohio: Wright Laboratory.
(available from the Defense Technical Information Center,
Cameron Station, Alexandria, VA 22304-6145).
- The old advantage updating algorithm is defined and simulation results are given.
Baird, L. C. (1994).
Reinforcement Learning in Continuous Time: Advantage Updating.
Proceedings of the International Conference on
Neural Networks.
Orlando, FL. June.
- The old advantage updating algorithm is applied to a 2-player game.
Harmon, M. E., Baird, L. C., and Klopf, A. H. (1994).
Advantage Updating Applied to a Differential Game.
Gerald Tesauro, et. al., eds.
Advances in Neural Information Processing Systems 7.
pp. 353-360. MIT Press, 1995.
Back to Glossary Index