Input language

The input language enables the specification of actions, initial states, rewards, and search control-knowledge. The format for the action specification is essentially the same as in the SPUDD system [29]. The reward specification is one or more formulae, each associated with a name and a real number. These formulae are in either PLTL or $FLTL. Control knowledge is given in the same language as that chosen for the reward. Control knowledge formulae will have to be verified by any sequence of states feasible under the generated policies. Initial states are simply specified as part of the control knowledge or as explicit assignments to propositions. For instance, consider a simple example consisting of a coin showing either heads or tails ( $\neg \mbox{heads}$). There are two actions that can be performed. The flip action changes the coin to show heads or tails with a 50% probability. The tilt action changes it with 10% probability, otherwise leaving it as it is. The initial state is tails. We get a reward of 5.0 for the very first head (this is written $\mbox{heads}\wedge \neg \circleddash \makebox[8pt][c]{\makebox[0pt][c]{$\Diamond$}\makebox[0pt][c]{\raisebox{0.5pt}{-}}}\mbox{heads}$ in PLTL) and a reward of 1.0 each time we achieve the sequence heads, heads, tails ( $\circleddash ^{2}
\mbox{heads}\wedge \circleddash \mbox{heads}\wedge \neg \mbox{heads}$ in PLTL). In our input language, this NMRDP is described as shown in Figure 8.
Figure: Input for the Coin Example. prv (previously) stands for $\circleddash $ and
pdi (past diamond) stands for $\Diamond$-.
\begin{figure}\begin{verbatim}action flip
heads (0.5)
endactionaction tilt
...(prv heads) and ~heads\end{verbatim}
Figure 9: Sample Session
> loadWorld('coin') load coin NMRDP
> preprocess('sPltl') PLTLSTR preprocessing
> startCPUtimer
> spudd(0.99, 0.0001) solve MDP with SPUDD $(\beta,\epsilon)$
> stopCPUtimer
> readCPUtimer report solving time
> iterationCount report number of iterations
> displayDot(valueToDot) display ADD of value function
> displayDot(policyToDot) display policy
> preprocess('mPltl') PLTLMIN preprocessing
> expand completely expand MDP
> domainStateSize report MDP size
> printDomain ("") | 'show-domain.rb' display postcript rendering of MDP
> valIt(0.99, 0.0001) solve MDP with VI $(\beta,\epsilon)$
> iterationCount report number of iterations
> getPolicy output policy (textual)
Sylvie Thiebaux 2006-01-20