## orbital.algorithm.template Class MarkovDecisionProcess

```
java.lang.Object
  orbital.algorithm.template.MarkovDecisionProcess
```
All Implemented Interfaces:
java.io.Serializable, AlgorithmicTemplate
Direct Known Subclasses:
MarkovDecisionProcess.DynamicProgramming

`public abstract class MarkovDecisionProcess extends java.lang.Object implements AlgorithmicTemplate, java.io.Serializable`

Markov decision process (MDP) for closed-loop planning.

Let γ∈[0,1] be a discount factor and π:S→A(s) a policy.

**expected cost sum**
The evaluation function (also called cost function, utility function, or value function) f^π corresponding to the policy π is
f^π:S→ℝ;  f^π(s) = E[∑_{t=0}^∞ γ^t·c(s_t,a_t) | s_0=s, π]
i.e. the expected value of the discounted sum of immediate costs when starting in state s and following the policy π.

**Q-values**
The action-value cost of an action a∈A(s) in state s∈S, as evaluated by f:S→ℝ, is
Q_f:S×A(s)→ℝ;  Q_f(s,a) = c(s,a) + γ·E[f(s_{t+1}) | s_t=s, a_t=a]
             = c(s,a) + γ·∑_{s'∈S} P(s'|s,a)·f(s')
If f=f^π, then Q_{f^π}(s,a) is the expected cost of performing action a∈A(s) in state s∈S and thereafter following the policy π.

**greedy policy**
A policy π is greedy with respect to an action-value function Q:S×A(s)→ℝ if
∀s∈S  Q(s,π(s)) = min_{a∈A(s)} Q(s,a)
Given an action-value function Q:S×A(s)→ℝ, one greedy policy π_Q with respect to Q is given by
π_Q(s) = argmin_{a∈A(s)} Q(s,a)
Similarly, π_f := π_{Q_f} is a greedy policy with respect to an evaluation function f:S→ℝ.

A solution π* is optimal if it has minimum expected cost (equivalently, maximum expected utility), i.e. f^{π*} = f* and π_{f*} = π*. The optimal evaluation function is characterized by
f*:S→ℝ is optimal  ⇔  ∀s∈S  f*(s) = min_{a∈A(s)} Q_{f*}(s,a)
which is one form of the Bellman optimality equation.
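These definitions can be made concrete on a small explicit MDP. The following standalone sketch is independent of the Orbital API; the class name, the example costs, and the transition probabilities are all illustrative. It computes Q_f(s,a) from an evaluation function f, extracts a greedy policy, and iterates the Bellman update f(s) ← min_{a∈A(s)} Q_f(s,a) until it approximates f*:

```
/**
 * Standalone illustration of Q-values, greedy policies and the Bellman
 * update on a tiny explicit MDP. All names and numbers are illustrative;
 * this is not part of the Orbital API.
 */
public class BellmanSketch {
    static final int S = 3, A = 2;           // 3 states, 2 actions
    static final double GAMMA = 0.9;         // discount factor gamma
    // c[s][a]: immediate cost c(s,a) of action a in state s
    static final double[][] c = {{1, 4}, {2, 1}, {0, 0}};
    // P[s][a][s2]: transition probability P(s2|s,a); each row sums to 1
    static final double[][][] P = {
        {{0.8, 0.2, 0.0}, {0.1, 0.0, 0.9}},
        {{0.0, 0.5, 0.5}, {0.0, 0.0, 1.0}},
        {{0.0, 0.0, 1.0}, {0.0, 0.0, 1.0}}   // state 2 is absorbing at cost 0
    };

    /** Q_f(s,a) = c(s,a) + gamma * sum_{s'} P(s'|s,a)*f(s'). */
    static double q(double[] f, int s, int a) {
        double expectation = 0;
        for (int s2 = 0; s2 < S; s2++)
            expectation += P[s][a][s2] * f[s2];
        return c[s][a] + GAMMA * expectation;
    }

    /** One greedy policy pi_Q(s) = argmin_{a in A(s)} Q_f(s,a). */
    static int[] greedy(double[] f) {
        int[] pi = new int[S];
        for (int s = 0; s < S; s++)
            for (int a = 1; a < A; a++)
                if (q(f, s, a) < q(f, s, pi[s]))
                    pi[s] = a;
        return pi;
    }

    public static void main(String[] args) {
        double[] f = new double[S];          // start from f = 0
        double delta;
        do {                                 // Bellman update, iterated:
            delta = 0;                       // f(s) <- min_a Q_f(s,a)
            double[] next = new double[S];
            for (int s = 0; s < S; s++) {
                double best = q(f, s, 0);
                for (int a = 1; a < A; a++)
                    best = Math.min(best, q(f, s, a));
                next[s] = best;
                delta = Math.max(delta, Math.abs(next[s] - f[s]));
            }
            f = next;
        } while (delta > 1e-9);              // contraction since gamma < 1
        int[] pi = greedy(f);                // greedy w.r.t. f* is optimal
        for (int s = 0; s < S; s++)
            System.out.println("f*(" + s + ") = " + f[s]
                               + ",  pi*(" + s + ") = " + pi[s]);
    }
}
```

Since γ<1, the Bellman update is a contraction, so the iteration converges to f*, and the policy greedy with respect to f* is optimal.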

Incidentally, a minor generalization is sometimes used that changes a policy into a stochastic function π:S×A→[0,1] specifying the probability π(s,a) with which action a∈A(s) is chosen in state s∈S. However, this does not usually improve the quality of the solution policies.
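
For illustration, such a stochastic policy can be represented as a per-state probability distribution over actions; a deterministic policy is then just the special case where each row places probability 1 on a single action. The following sketch (class name and representation are assumptions, not part of the Orbital API) stores π(s,a) in a table and samples an action:

```
import java.util.Random;

/** Illustrative stochastic policy pi:S×A→[0,1]; not part of the Orbital API. */
public class StochasticPolicySketch {
    // pi[s][a]: probability of choosing action a in state s; rows sum to 1
    private final double[][] pi;
    private final Random rng = new Random();

    public StochasticPolicySketch(double[][] pi) {
        this.pi = pi;
    }

    /** Sample an action in state s according to the distribution pi(s,·). */
    public int chooseAction(int s) {
        double u = rng.nextDouble(), cumulative = 0;
        for (int a = 0; a < pi[s].length; a++) {
            cumulative += pi[s][a];
            if (u < cumulative)
                return a;
        }
        return pi[s].length - 1;  // guard against rounding error
    }
}
```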

To apply MDP planning, perform something like:

```
MarkovDecisionProcess planner = ...;
// obtain a plan, i.e. a policy S→A; solve returns java.lang.Object, so cast
Function plan = (Function) planner.solve(problem);
// closed loop: observe() senses the actual successor state (placeholder),
// perform(a) executes the action a in the environment (placeholder)
for (Object state = initial; !problem.isSolution(state); state = observe()) {
    perform(plan.apply(state));
}
```

Author:
André Platzer
See Also:
`MarkovDecisionProblem`; Hector Geffner, Modelling and Problem Solving; A. Barto, S. Bradtke, and S. Singh, "Learning to act using real-time dynamic programming", Artificial Intelligence, 72:81-138, 1995; Serialized Form

Nested Class Summary
`static class MarkovDecisionProcess.DynamicProgramming`
Abstract base class for Markov decision processes solved by dynamic programming.

Nested classes/interfaces inherited from interface orbital.algorithm.template.AlgorithmicTemplate
`AlgorithmicTemplate.Configuration`

Constructor Summary
`MarkovDecisionProcess()`

Method Summary
`protected MarkovDecisionProblem getProblem()`
Get the current problem.
`protected abstract Function plan()`
Run the planning.
`java.lang.Object solve(AlgorithmicProblem p)`
Solves an MDP problem.

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Methods inherited from interface orbital.algorithm.template.AlgorithmicTemplate
`complexity, spaceComplexity`

Constructor Detail

### MarkovDecisionProcess

`public MarkovDecisionProcess()`
Method Detail

### getProblem

`protected final MarkovDecisionProblem getProblem()`
Get the current problem.

Returns:
the problem specified in the last call to solve.

### solve

`public java.lang.Object solve(AlgorithmicProblem p)`
Solves an MDP problem.

Specified by:
`solve` in interface `AlgorithmicTemplate`
Parameters:
`p` - algorithmic problem hook class which must fit the concrete algorithmic template framework implementation.
Returns:
the solution policy function S→A found, or `null` if no solution could be found.
See Also:
`Function.apply(Object)`
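
Because solve declares its return type as java.lang.Object, callers typically cast the result before using it. A hedged fragment, where planner, problem, and state are placeholders as in the usage example above:

```
// planner, problem and state as in the usage example above (placeholders)
Function policy = (Function) planner.solve(problem);
if (policy != null) {                     // null means no solution was found
    Object action = policy.apply(state);  // pi(state): the action to take
}
```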

### plan

`protected abstract Function plan()`
Run the planning.

Returns:
the solution policy function S→A found, or `null` if no solution could be found.
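
To make the returned object concrete: the policy computed by plan() is a state→action mapping exposed through apply. A minimal standalone sketch, assuming a table-backed policy; the PolicyFunction interface merely stands in for Orbital's Function and, like the map-based table, is an assumption rather than the library's API:

```
import java.util.HashMap;
import java.util.Map;

/** Standalone sketch: a table-backed policy exposed as a state -> action
 * function, shaped like the object plan() returns. Illustrative only. */
public class TablePolicySketch {
    /** Stand-in for Orbital's Function interface (assumption). */
    interface PolicyFunction { Object apply(Object state); }

    /** Wrap a precomputed state -> action table as a policy function. */
    public static PolicyFunction asPolicy(final Map table) {
        return new PolicyFunction() {
            public Object apply(Object state) {
                return table.get(state);    // pi(state), an action in A(state)
            }
        };
    }

    public static void main(String[] args) {
        Map table = new HashMap();
        table.put("s0", "a1");              // illustrative states and actions
        table.put("s1", "a0");
        PolicyFunction pi = asPolicy(table);
        System.out.println(pi.apply("s0")); // prints a1
    }
}
```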

Orbital library
1.3.0: 11 Apr 2009