We study the problem of computing the optimal value function for a
Markov decision process with positive costs. Computing this function
quickly and accurately is a basic step in many schemes for deciding
how to act in stochastic environments. There are efficient algorithms
that compute value functions for special types of MDPs: for
deterministic MDPs with S states and A actions, Dijkstra's
algorithm runs in time O(AS log S), and for single-action MDPs
(Markov chains), standard linear-algebraic algorithms find the value
function in time O(S^3), or faster by exploiting sparsity or good
conditioning. Algorithms for solving general MDPs can take
much longer: we are not aware of any speed guarantees better than
those for comparably-sized linear programs. We present a family of
algorithms that reduce to Dijkstra's algorithm when applied to
deterministic MDPs, and to standard techniques for solving linear
equations when applied to Markov chains. More importantly, we
demonstrate experimentally that these algorithms perform well when
applied to MDPs that ``almost'' have the required special structure.
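
As a concrete illustration of the two special cases named above (a
sketch of ours, not an algorithm from this paper), the Python code
below computes value functions under assumed input formats: Dijkstra's
algorithm run over the reversed transition graph of a deterministic
MDP with positive costs and zero-cost goal states, and a direct linear
solve of V = c + PV for a Markov chain whose transition matrix over
non-goal states is substochastic. The function names and data layouts
are hypothetical.

\begin{verbatim}
import heapq
import numpy as np

def deterministic_mdp_values(S, transitions, goals):
    """Dijkstra over the reversed transition graph: V[s] is the
    minimum total cost from s to a goal.  transitions is a list of
    (state, cost, next_state) triples with cost > 0."""
    rev = [[] for _ in range(S)]          # edges entering each state
    for s, cost, s_next in transitions:
        rev[s_next].append((cost, s))
    V = [float("inf")] * S
    heap = [(0.0, g) for g in goals]      # goal states have value 0
    for g in goals:
        V[g] = 0.0
    while heap:
        d, s = heapq.heappop(heap)
        if d > V[s]:
            continue                      # stale heap entry
        for cost, s_prev in rev[s]:
            if d + cost < V[s_prev]:
                V[s_prev] = d + cost
                heapq.heappush(heap, (d + cost, s_prev))
    return V

def markov_chain_values(P, c):
    """Solve V = c + P V directly.  P is the (substochastic)
    transition matrix over non-goal states; c holds one-step costs."""
    return np.linalg.solve(np.eye(P.shape[0]) - P, c)

if __name__ == "__main__":
    # Deterministic chain 0 -> 1 -> 2 with costs 1 and 2; state 2 is the goal.
    print(deterministic_mdp_values(3, [(0, 1.0, 1), (1, 2.0, 2)], [2]))
    # Markov chain: cost 1 per step; unassigned probability flows to the goal.
    P = np.array([[0.5, 0.5], [0.0, 0.5]])
    c = np.array([1.0, 1.0])
    print(markov_chain_values(P, c))      # expected values: [4.0, 2.0]
\end{verbatim}

A general MDP interleaves both ingredients, which is why neither
routine alone suffices: stochastic transitions break the ordering
Dijkstra's algorithm relies on, and the minimization over actions
breaks the linearity the direct solve relies on.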