===========================
Distributed Value Functions
===========================
- Jeff Schneider, joint work with Weng-Keen Wong and Andrew Moore
Many interesting problems that are candidates for solving with
reinforcement learning (RL), such as power grids, network switches,
and traffic flow, also have properties that make distributed solutions
desirable. We propose an algorithm for distributed reinforcement
learning based on distributing the representation of the value
function across nodes. Each node in the system can only sense state
locally, choose actions locally, and receive reward locally (the goal
of the system is to maximize the sum of the rewards over all nodes
and over all time). However, each node is allowed to give its
neighbors its current value function estimates for the states it
passes through. We present a learning rule that uses this information
to let each node learn a value function estimating a weighted sum of
future rewards for all the nodes in the network. With this
representation, each node can
choose actions to improve the performance of the overall system.
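As a concrete illustration, the sketch below shows one plausible form
such an update could take. It is a minimal, hypothetical Python
sketch, not the exact rule from the paper: the tabular value function,
the fixed mixing weights, and the learning-rate and discount
parameters are all assumptions introduced here for illustration.

    # Hypothetical sketch of a distributed value-function update for one
    # node. Assumptions not given in the abstract: a tabular value
    # function, fixed mixing weights over the node's own estimate and its
    # neighbors' reported estimates (summing to 1), and a TD(0)-style
    # update with learning rate alpha and discount gamma.
    def update_value(V_i, state, reward_i, next_state,
                     neighbor_estimates, weights, alpha=0.1, gamma=0.9):
        # First weight applies to this node's own next-state estimate;
        # the rest apply to the estimates its neighbors passed along.
        estimates = [V_i.get(next_state, 0.0)] + list(neighbor_estimates)
        backup = sum(w * v for w, v in zip(weights, estimates))
        # Temporal-difference step toward the mixed target, so V_i comes
        # to estimate a weighted sum of all nodes' future rewards.
        td_error = reward_i + gamma * backup - V_i.get(state, 0.0)
        V_i[state] = V_i.get(state, 0.0) + alpha * td_error
        return V_i

    # Example: a node with two neighbors, weighting its own estimate 0.5
    # and each neighbor's reported estimate 0.25 (all names hypothetical).
    V_i = {}
    V_i = update_value(V_i, state=0, reward_i=1.0, next_state=1,
                       neighbor_estimates=[0.4, 0.2],
                       weights=[0.5, 0.25, 0.25])

Because each node's backup target mixes in its neighbors' estimates,
reward information can propagate through the network even though no
node ever observes the global reward directly.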
We demonstrate our algorithm on the distributed control of a simulated
power grid. We compare it against other methods, including the use of
a global reward signal, nodes that act locally with no communication,
and nodes that share reward (but not value function) information with
each other. Our results show that the distributed value function
algorithm outperforms the others, and we conclude with an analysis of
which problems are best suited for distributed value functions and the
new research directions opened up by this work.