Orbital library

Class MarkovDecisionProcess

java.lang.Object
  extended by orbital.algorithm.template.MarkovDecisionProcess
All Implemented Interfaces:
java.io.Serializable, AlgorithmicTemplate
Direct Known Subclasses:
MarkovDecisionProcess.DynamicProgramming
public abstract class MarkovDecisionProcess
extends java.lang.Object
implements AlgorithmicTemplate, java.io.Serializable

Markov decision process (MDP). For closed-loop planning.

Let γ∈[0,1] be a discount factor and π:S→A(s) a policy.
expected cost sum
One evaluation function (also called cost function, utility function, or value function) f_π corresponding to the policy π is
f_π:S→R; f_π(s) = E[∑_{t=0}^∞ γ^t·c(s_t,a_t) | s_0=s, π]
i.e. the expected value of the discounted sum of immediate costs, starting from state s under policy π.
The action-value cost of an action a∈A(s) in state s∈S as evaluated by f:S→R is
Q_f:S×A(s)→R; Q_f(s,a) = c(s,a) + γ·E[f(s_{t+1}) | s_t=s, a_t=a]
  = c(s,a) + γ·∑_{s'∈S} P(s'|s,a)·f(s')
If f=f_π, then Q_{f_π}(s,a) is the expected cost of performing action a∈A(s) in state s∈S and thereafter following the policy π.
greedy policy
A policy π is greedy with respect to an action-value function Q:S×A(s)→R if
∀s∈S  Q(s,π(s)) = min_{a∈A(s)} Q(s,a)
Given an action-value function Q:S×A(s)→R, one greedy policy π_Q with respect to Q is given by
π_Q(s) = argmin_{a∈A(s)} Q(s,a)
Similarly, π_f := π_{Q_f} is one greedy policy with respect to an evaluation function f:S→R.
A solution π* is optimal if it has minimum expected cost (alias maximum expected utility), i.e. f_{π*} = f* ∧ π_{f*} = π*. Furthermore,
f*:S→R is the optimal evaluation function  ⇔  ∀s∈S f*(s) = min_{a∈A(s)} Q_{f*}(s,a)
which is one form of the Bellman optimality equation.
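As an illustration only (a minimal standalone sketch, not part of the Orbital API; the two-state problem, its costs, and all names are hypothetical), value iteration repeatedly applies the Bellman optimality equation f(s) ← min_{a} Q_f(s,a) until f converges to f*, from which a greedy policy can be read off:

```java
import java.util.Locale;

public class MdpSketch {
    // Hypothetical 2-state MDP: state 0 (start), state 1 (absorbing goal).
    static final double GAMMA = 0.9;                 // discount factor γ
    static final String[] ACTIONS = {"safe", "risky"};

    // Immediate cost c(s,a); acting in the goal state costs nothing.
    static double cost(int s, String a) {
        if (s == 1) return 0.0;
        return a.equals("safe") ? 2.0 : 1.0;
    }

    // Transition probability P(s'|s,a).
    static double p(int next, int s, String a) {
        if (s == 1) return next == 1 ? 1.0 : 0.0;    // goal is absorbing
        if (a.equals("safe")) return next == 1 ? 1.0 : 0.0;
        return 0.5;                                  // risky: reach goal or stay, each 0.5
    }

    // Action-value Q_f(s,a) = c(s,a) + γ·∑_{s'} P(s'|s,a)·f(s')
    static double q(double[] f, int s, String a) {
        double expected = 0.0;
        for (int next = 0; next < f.length; next++)
            expected += p(next, s, a) * f[next];
        return cost(s, a) + GAMMA * expected;
    }

    public static void main(String[] args) {
        double[] f = new double[2];                  // start from f ≡ 0
        // Value iteration: f(s) ← min_a Q_f(s,a) until (approximate) fixed point.
        for (int iter = 0; iter < 1000; iter++) {
            double[] g = new double[2];
            for (int s = 0; s < 2; s++) {
                double best = Double.POSITIVE_INFINITY;
                for (String a : ACTIONS) best = Math.min(best, q(f, s, a));
                g[s] = best;
            }
            f = g;
        }
        // Greedy policy π_f(s) = argmin_a Q_f(s,a)
        String pi0 = q(f, 0, "safe") <= q(f, 0, "risky") ? "safe" : "risky";
        System.out.printf(Locale.ROOT, "f*(0)=%.4f policy(0)=%s%n", f[0], pi0);
        // prints: f*(0)=1.8182 policy(0)=risky
    }
}
```

Here the cheaper but stochastic "risky" action wins because f*(0) = 1/(1 − 0.45) = 20/11 ≈ 1.8182 < 2, the cost of "safe".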

By the way, a minor generalization is sometimes used that would change a policy to a stochastic function π:S×A→[0,1] specifying the probability π(s,a) with which it chooses the action a∈A(s) in state s∈S. However, this does not usually improve the quality of solution policies.
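For illustration only (the class and method names are hypothetical, not part of the Orbital API), such a stochastic policy can be represented per state as an action-probability distribution and sampled by inverse transform:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

public class StochasticPolicy {
    // π(s,·): the action-probability distribution for one state s,
    // assumed normalized so the values sum to 1.
    static String sample(Map<String, Double> dist, Random rng) {
        double u = rng.nextDouble();   // uniform draw in [0,1)
        double cum = 0.0;
        String last = null;
        for (Map.Entry<String, Double> e : dist.entrySet()) {
            cum += e.getValue();
            last = e.getKey();
            if (u < cum) return last;
        }
        return last;                   // guards against rounding of the cumulative sum
    }

    public static void main(String[] args) {
        Map<String, Double> pi = new LinkedHashMap<>();
        pi.put("left", 0.5);
        pi.put("right", 0.5);
        System.out.println(sample(pi, new Random()));  // "left" or "right"
    }
}
```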

To apply MDP planning, perform something like:

 MarkovDecisionProcess planner = ...;
 // obtain a plan
 Function plan = planner.solve(problem);
 // follow the plan
 for (Object state = initial; !problem.isSolution(state); state = observe()) {
     perform(plan.apply(state));
 }

Author:
André Platzer
See Also:
MarkovDecisionProblem; Hector Geffner, "Modelling and Problem Solving"; A. Barto, S. Bradtke, and S. Singh. Learning to act using real-time dynamic programming. Artificial Intelligence, 72:81-138, 1995; Serialized Form

Nested Class Summary
static class MarkovDecisionProcess.DynamicProgramming
          Abstract base class for Markov decision processes solved per dynamic programming.
Nested classes/interfaces inherited from interface orbital.algorithm.template.AlgorithmicTemplate
Constructor Summary
MarkovDecisionProcess()
Method Summary
protected  MarkovDecisionProblem getProblem()
          Get the current problem.
protected abstract  Function plan()
          Run the planning.
 java.lang.Object solve(AlgorithmicProblem p)
          Solves an MDP problem.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface orbital.algorithm.template.AlgorithmicTemplate
complexity, spaceComplexity

Constructor Detail


public MarkovDecisionProcess()
Method Detail


protected final MarkovDecisionProblem getProblem()
Get the current problem.

Returns:
the problem specified in the last call to solve.


public java.lang.Object solve(AlgorithmicProblem p)
Solves an MDP problem.

Specified by:
solve in interface AlgorithmicTemplate
Parameters:
p - algorithmic problem hook class which must fit the concrete algorithmic template framework implementation.
Returns:
the solution policy function S→A found, or null if no solution could be found.

protected abstract Function plan()
Run the planning.

Returns:
the solution policy function S→A found, or null if no solution could be found.
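A simplified, standalone analogue of this template-method structure (a sketch only, not the actual Orbital implementation; the class names are hypothetical, though the method roles mirror the documentation above) might look like:

```java
import java.util.function.Function;

// Sketch of the template-method pattern described above: solve(p) stores the
// problem and delegates the actual planning work to the concrete subclass.
abstract class MiniPlanner {
    private Object problem;

    // "Get the current problem": the problem specified in the last call to solve.
    protected final Object getProblem() {
        return problem;
    }

    // Run the planning; returns the policy S→A, or null if none could be found.
    protected abstract Function<Object, Object> plan();

    // Template method: remember the problem, then run the concrete planning.
    public final Function<Object, Object> solve(Object p) {
        this.problem = p;
        return plan();
    }
}

class TrivialPlanner extends MiniPlanner {
    // A degenerate planner whose policy always chooses the action "wait".
    protected Function<Object, Object> plan() {
        return state -> "wait";
    }
}
```

Here `MiniPlanner.solve` plays the role of `solve(AlgorithmicProblem)`, and a concrete subclass (such as `MarkovDecisionProcess.DynamicProgramming` in the real library) supplies `plan()`.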

1.3.0: 11 Apr 2009

Copyright © 1996-2009 André Platzer
All Rights Reserved.