A New Algorithm For Approximating Value Functions
Geoff Gordon
I'll start by talking about the connection between Markov decision
problems and linear programs. Then I'll describe a new algorithm,
motivated by this connection, that finds an approximation to the
value function of an MDP. The algorithm is similar to interior point
methods for convex programming. Its advantages include: it is fast;
it can handle any representation of the value function that is linear
in its adjustable parameters; and it is not troubled by local minima.
Finally, I'll present some initial experimental results from using
this algorithm to control a well-known MDP with about 10^{66}
states and 40 actions.
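
For background, the MDP–LP connection mentioned above is the classical exact linear-programming formulation of the value function (this is the standard textbook formulation, not the approximate algorithm of the talk). Assuming finite state and action sets, reward function R(s, a), transition probabilities P(s' | s, a), discount factor gamma in [0, 1), and arbitrary positive state weights c(s), the optimal value function V* is the unique solution of:

\begin{align*}
\min_{V} \quad & \sum_{s} c(s)\, V(s) \\
\text{s.t.} \quad & V(s) \;\ge\; R(s, a) + \gamma \sum_{s'} P(s' \mid s, a)\, V(s') \qquad \forall\, s, a.
\end{align*}

Each constraint says V(s) must dominate the one-step backup for action a; minimizing the weighted sum drives V down onto the Bellman optimality equation. Restricting V to a linear architecture, V(s) = sum_i w_i phi_i(s), turns this into an approximate LP over the weights w, which is the setting where interior-point-style methods become attractive.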