Friday 8 July, WeH 4601, 12:30
Risk-Averse Reinforcement Learning
Matthias Heger, University of Bremen (Germany)
Most Reinforcement Learning (RL) work regards as optimal those policies
for sequential decision tasks that minimize the expected total
discounted cost (e.g. Q-learning, the AHC architecture). On the other
hand, it is well known that the expected value is not always a reliable
decision criterion and can be treacherous to use. Many alternative
decision criteria that treat risk more carefully have been proposed in
decision theory, but most RL researchers have not concerned themselves
with this subject until now.
In this talk, some problems with the expected-value criterion in Markov
decision processes will be demonstrated, and dynamic programming
algorithms for the minimax criterion will be given. A counterpart to
Watkins' Q-learning for the minimax criterion will be presented. The
new algorithm, called Q-hat-learning, finds policies that minimize the
worst-case total discounted cost. This talk is more detailed than the
presentation of a related ML94 paper and extends it with two topics:
(a) Modification of Q-hat-learning for less risk-averse agents;
(b) Q-hat-learning in domains with hidden states.
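As a rough sketch (not part of the announcement itself; state names,
function names, and the toy example below are my own assumptions), the
minimax counterpart to Watkins' Q-learning replaces the averaging of
observed returns with a worst-case accumulation: a Q-value can only be
pushed up by the worst transition seen so far.

```python
# Hedged sketch of a worst-case Q-value update in the spirit of
# Q-hat-learning. Assumptions: nonnegative costs, Q initialized to 0
# (an optimistic lower bound), deterministic cost observation per step.

GAMMA = 0.9  # discount factor (illustrative choice)

def q_hat_update(q, s, a, cost, s_next, actions):
    """One worst-case update:
    Q(s,a) <- max(Q(s,a), cost + GAMMA * min_a' Q(s_next, a')).
    q maps (state, action) pairs to lower bounds on worst-case cost."""
    best_next = min(q[(s_next, a2)] for a2 in actions)
    q[(s, a)] = max(q[(s, a)], cost + GAMMA * best_next)
    return q[(s, a)]

# Toy two-state example: from state 0, action 'risky' sometimes costs
# 10 and sometimes 0, while 'safe' always costs 2; state 1 is absorbing
# with zero cost for every action.
actions = ['risky', 'safe']
q = {(s, a): 0.0 for s in (0, 1) for a in actions}

q_hat_update(q, 0, 'risky', 10.0, 1, actions)  # bad outcome raises Q
q_hat_update(q, 0, 'risky', 0.0, 1, actions)   # good outcome cannot lower it
q_hat_update(q, 0, 'safe', 2.0, 1, actions)

# A risk-averse agent picks the action with the smallest worst-case cost.
worst_case_policy = min(actions, key=lambda a: q[(0, a)])
```

Here the 'risky' action keeps its worst observed cost of 10, so the
greedy-with-respect-to-Q policy chooses 'safe'; an expected-value
learner averaging the two 'risky' outcomes could prefer the opposite.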