Friday 8 July, WeH 4601, 12:30

Risk-Averse Reinforcement Learning
Matthias Heger, University of Bremen (Germany)

Most Reinforcement Learning (RL) work considers policies for sequential decision tasks to be optimal if they minimize the expected total discounted cost (e.g. Q-learning, the AHC architecture). On the other hand, it is well known that using the expected value as a decision criterion is not always reliable and can be treacherous. Many alternative decision criteria have been proposed in decision theory to allow a more sophisticated treatment of risk, but most RL researchers have not concerned themselves with this subject until now. This talk will show some problems of the expected-value criterion in Markov decision processes and present dynamic programming algorithms for the minimax criterion. A counterpart to Watkins' Q-learning based on the minimax criterion will be presented. The new algorithm, called Q-hat-learning, finds policies that minimize the worst-case total discounted cost. The talk is more detailed than the presentation of the related ML94 paper and extends it with two topics: (a) a modification of Q-hat-learning for less risk-averse agents; (b) Q-hat-learning in domains with hidden states.
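To give a rough feel for the minimax ("worst-case") flavor of the approach, the sketch below contrasts a standard expected-cost Q-learning backup with a worst-case backup that only ratchets the cost estimate upward. It is a minimal illustration, not the algorithm from the talk or the ML94 paper; the environment interface, action set, learning rate, discount factor, and the lower-bound initialization are all assumptions made for the example.

```python
# Illustrative sketch: expected-cost vs. worst-case (minimax-style) Q-backups.
# All names, constants, and the initialization scheme are assumptions for this example.
from collections import defaultdict

GAMMA = 0.9          # discount factor (assumed)
ALPHA = 0.1          # learning rate for the expected-cost baseline (assumed)
ACTIONS = [0, 1]     # placeholder action set

# Standard Q-learning: estimates the expected total discounted cost (to be minimized).
q_expected = defaultdict(float)
# Worst-case Q-values: raised monotonically toward the largest discounted cost
# observed for each state-action pair; start from a lower bound (0 here,
# assuming non-negative costs).
q_worst = defaultdict(float)

def update_expected(s, a, cost, s_next):
    """Usual expected-value backup: move toward cost + gamma * min_a' Q(s', a')."""
    target = cost + GAMMA * min(q_expected[(s_next, b)] for b in ACTIONS)
    q_expected[(s, a)] += ALPHA * (target - q_expected[(s, a)])

def update_worst_case(s, a, cost, s_next):
    """Worst-case backup: keep the largest discounted cost seen for (s, a)."""
    target = cost + GAMMA * min(q_worst[(s_next, b)] for b in ACTIONS)
    q_worst[(s, a)] = max(q_worst[(s, a)], target)

def greedy_action(q, s):
    """Pick the action with the smallest cost estimate (expected or worst-case)."""
    return min(ACTIONS, key=lambda b: q[(s, b)])
```

The essential difference is that the worst-case update never averages over outcomes: it only raises the estimate toward the most expensive transition observed, so the greedy policy with respect to these values hedges against the worst outcome rather than the average one.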