THE OPTIMAL REWARD PROBLEM
OR
WHERE DO REWARDS COME FROM?
SATINDER SINGH
Joint work with Jonathan Sorg and Richard Lewis
Computer Science and Engineering, University of Michigan
Research on autonomous agents has produced impressive results by starting
with a given reward function and developing theory and algorithms for
learning or planning policies that achieve high cumulative reward. In a
departure from this work, we recognize that in
many situations the starting point is an agent designer with a reward
function seeking to build an autonomous agent to act on its behalf. What
reward function should the designer build into the autonomous agent? In
this new view, setting the parameters (agent's reward function) equal to
the given preferences (designer's reward function) implements a
preferences-parameters confound. If an agent is bounded, as most agents
are in practice, we expect that breaking the preferences-parameters
confound would be beneficial. We define the optimal reward problem: that
of choosing the agent's reward function from a set of candidate reward
functions, given a designer's reward function, an agent architecture, and a
distribution over environments. The talk will focus on empirical and
theoretical insights obtained by solving the optimal reward problem.
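
As a rough illustration of the optimal reward problem defined above, the
following minimal sketch searches a small set of candidate reward functions
for the one that yields the highest expected designer's reward when handed to
a bounded agent. Everything in it is an assumption made for illustration and
is not taken from the talk: the chain environments, the one-step greedy
"architecture", the progress bonus, and all function names are hypothetical.

```python
from statistics import mean

# Hypothetical toy setup: a chain of states 0..n-1. The designer only
# values being in the last state.
ACTIONS = (-1, +1)  # step left or step right


def step(state, action, n):
    """Deterministic chain dynamics, clipped to the range [0, n-1]."""
    return min(max(state + action, 0), n - 1)


def designer_reward(state, n):
    """Designer's preferences: 1 in the goal state, 0 elsewhere."""
    return 1.0 if state == n - 1 else 0.0


def make_candidate(bonus):
    """Candidate agent reward: designer's reward plus a progress bonus."""
    def reward(state, n):
        return designer_reward(state, n) + bonus * state / (n - 1)
    return reward


def run_agent(agent_reward, n, horizon):
    """Bounded agent (assumed architecture): greedy one-step lookahead on
    its own reward. Returns the designer's cumulative reward."""
    state, total = 0, 0.0
    for _ in range(horizon):
        # max() keeps the first action on ties, so a reward that gives no
        # local signal leaves this agent stepping left at state 0.
        action = max(ACTIONS, key=lambda a: agent_reward(step(state, a, n), n))
        state = step(state, action, n)
        total += designer_reward(state, n)
    return total


def optimal_reward(candidates, env_sizes, horizon):
    """Score each candidate by the mean designer return over a uniform
    distribution of environments, and return the best one."""
    scores = {
        name: mean(run_agent(reward, n, horizon) for n in env_sizes)
        for name, reward in candidates.items()
    }
    return max(scores, key=scores.get), scores


candidates = {
    "designer": make_candidate(0.0),  # agent reward equals designer reward
    "shaped": make_candidate(0.1),    # agent reward differs from it
}
best, scores = optimal_reward(candidates, env_sizes=[4, 6, 8], horizon=10)
print(best, scores)
```

In this toy setting the one-step agent never sees the designer's reward from
its start state, so giving it the designer's own reward function leaves it
stuck, while the shaped candidate leads it to the goal. The sketch is only
meant to show the shape of the search over reward functions, not any result
from the talk.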
BIO
Satinder Singh is a Professor of Computer Science and Engineering at the
University of Michigan, where he also currently serves as Director of the
AI Lab. He contributes to the research areas of reinforcement learning,
decision-theoretic planning, and computational game theory.