
Belief Space Planners

Our planning formulation uses regression search to find strong conformant plans, and progression search to find strong conformant and conditional plans. A strong plan guarantees that, after a finite number of actions executed from any of the possible initial states, all resulting states are goal states. Conformant plans are a special case in which the plan has no conditional branches, as in classical planning. Conditional plans are a more general case in which plans are structured as graphs because they include conditional actions (i.e., actions with both causative and observational effects). In this presentation we restrict conditional plans to DAGs, but there is no conceptual reason why they cannot be general graphs. Our plan quality metric is the maximum plan path length.

We formulate search in the space of belief states, a technique described by [6]. The planning problem $ P$ is defined as the tuple $ \langle D, BS_I, BS_G\rangle$ , where $ D$ is a domain description, $ BS_I$ is the initial belief state, and $ BS_G$ is the goal belief state (consisting of all states satisfying the goal). The domain $ D$ is a tuple $ \langle F, A\rangle$ , where $ F$ is a set of fluents and $ A$ is a set of actions.

Logical Formula Representation: We make extensive use of logical formulas over $ F$ to represent belief states, actions, and $ LUG$ labels, so we first explain a few conventions. Each fluent in $ F$ may appear as a positive or a negative literal, either of which is denoted by $ l$ ; the literal of opposite polarity is denoted $ \neg l$ . Thus if $ l$ = $ \neg$ at(location1), then $ \neg l$ = at(location1). We reserve the symbols $ \perp$ and $ \top$ to denote logical false and true, respectively. Throughout the paper we define the conjunction of an empty set as $ \top$ , and the disjunction of an empty set as $ \perp$ .

Logical formulas are propositional sentences composed of literals, disjunction, conjunction, and negation. We refer to the set of models of a formula $ f$ as $ {\cal M}(f)$ . We consider the disjunctive normal form of a formula $ f$ , denoted $ \hat{\xi}(f)$ , and its conjunctive normal form, denoted $ \kappa(f)$ . The DNF is a disjunction of ``constituents'' $ \hat{S}$ , each of which is a conjunction of literals; the CNF is a conjunction of ``clauses'' $ C$ , each of which is a disjunction of literals. We find it useful to think of DNF and CNF as sets: a disjunctive set of constituents or a conjunctive set of clauses. We also refer to the complete representation $ \xi(f)$ of a formula $ f$ , a DNF in which every constituent (in this case a state $ S$ ) is a model of $ f$ .
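These conventions can be made concrete with a small sketch. In the sketch below (our own illustrative encoding, not part of the formalism), a literal is a signed string, a complete state is a frozenset containing one literal per fluent, and a CNF is a conjunctive set of clauses; enumerating all complete states that satisfy the CNF yields the complete representation $ \xi(f)$ :

```python
from itertools import product

def neg(l):
    # opposite-polarity literal: maps l to ¬l and back
    return l[1:] if l.startswith('-') else '-' + l

def complete_rep(cnf, fluents):
    """xi(f): the complete representation of a CNF formula f,
    i.e. the DNF whose constituents are exactly the models of f."""
    states = []
    for values in product([True, False], repeat=len(fluents)):
        S = frozenset(f if v else neg(f) for f, v in zip(fluents, values))
        # S satisfies a clause C iff S contains some literal of C
        if all(S & C for C in cnf):
            states.append(S)
    return states

fluents = ['p', 'q']
# f = p ∨ q: in CNF a single clause; its DNF constituents are {p} and {q}
kappa_f = [frozenset({'p', 'q'})]
xi_f = complete_rep(kappa_f, fluents)
print(len(xi_f))  # the three models of p ∨ q
```

Note the asymmetry the sketch makes visible: the DNF $ \hat{\xi}(f)$ of $ p \vee q$ has two constituents, but the complete representation $ \xi(f)$ has three, since each constituent of $ \xi(f)$ must mention every fluent.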

Belief State Representation: A world state, $ S$ , is represented as a complete interpretation over fluents. We also refer to states as possible worlds. A belief state $ BS$ is a set of states and is symbolically represented as a propositional formula over $ F$ . A state $ S$ is in the set of states represented by a belief state $ BS$ if $ S \in {\cal M}(BS)$ , or equivalently $ S \models BS$ .

For pedagogical purposes, we use the bomb and toilet with clogging and sensing problem, BTCS, as a running example throughout this paper. BTCS includes two packages, one of which contains a bomb, and a toilet in which packages can be dunked to defuse potential bombs. The goal is to disarm the bomb, and the only allowable actions are dunking a package in the toilet (DunkP1, DunkP2), flushing the toilet after it becomes clogged from dunking (Flush), and using a metal detector to sense whether a package contains the bomb (DetectMetal). The fluents encoding the problem denote whether the bomb is armed (arm), whether the bomb is in a package (inP1, inP2), and whether the toilet is clogged (clog). We also consider a conformant variation of BTCS, called BTC, in which there is no DetectMetal action.

The belief state representation of the BTCS initial condition, in clausal representation, is:

$ \kappa(BS_I)$ = arm $ \wedge \neg$ clog $ \wedge$ (inP1 $ \vee$ inP2) $ \wedge$ ($ \neg$ inP1 $ \vee \neg$ inP2),

and in constituent representation is:

$ \hat{\xi}(BS_I)$ = (arm $ \wedge \neg$ clog $ \wedge$ inP1 $ \wedge \neg$ inP2) $ \vee$ (arm $ \wedge \neg$ clog $ \wedge \neg$ inP1 $ \wedge$ inP2).

The goal of BTCS has the clausal and constituent representation:

$ \kappa(BS_G)$ = $ \hat{\xi}(BS_G)$ = $ \neg$ arm.

However, the goal has the complete representation:

$ \xi(BS_G)$ = ($ \neg$ arm $ \wedge$ clog $ \wedge$ inP1 $ \wedge \neg$ inP2) $ \vee$ ($ \neg$ arm $ \wedge$ clog $ \wedge \neg$ inP1 $ \wedge$ inP2) $ \vee$
($ \neg$ arm $ \wedge \neg$ clog $ \wedge$ inP1 $ \wedge \neg$ inP2) $ \vee$ ($ \neg$ arm $ \wedge \neg$ clog $ \wedge \neg$ inP1 $ \wedge$ inP2) $ \vee$
($ \neg$ arm $ \wedge$ clog $ \wedge \neg$ inP1 $ \wedge \neg$ inP2) $ \vee$ ($ \neg$ arm $ \wedge$ clog $ \wedge$ inP1 $ \wedge$ inP2) $ \vee$
($ \neg$ arm $ \wedge \neg$ clog $ \wedge \neg$ inP1 $ \wedge \neg$ inP2) $ \vee$ ($ \neg$ arm $ \wedge \neg$ clog $ \wedge$ inP1 $ \wedge$ inP2).

The last four states (disjuncts) in the complete representation are unreachable, but consistent with the goal description.
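The three representations above can be checked mechanically. The following sketch (our own encoding, with literals as signed strings and clausal formulas as conjunctive sets of clauses) enumerates the models of $ \kappa(BS_I)$ and $ \kappa(BS_G)$ to recover $ \hat{\xi}(BS_I)$ and $ \xi(BS_G)$ :

```python
from itertools import product

FLUENTS = ['arm', 'clog', 'inP1', 'inP2']

def neg(l):
    # opposite-polarity literal
    return l[1:] if l.startswith('-') else '-' + l

def models(cnf):
    """All complete states over FLUENTS satisfying a conjunctive set of clauses."""
    out = []
    for vals in product([True, False], repeat=len(FLUENTS)):
        S = frozenset(f if v else neg(f) for f, v in zip(FLUENTS, vals))
        if all(S & C for C in cnf):
            out.append(S)
    return out

# kappa(BS_I) = arm ∧ ¬clog ∧ (inP1 ∨ inP2) ∧ (¬inP1 ∨ ¬inP2)
BS_I = [frozenset({'arm'}), frozenset({'-clog'}),
        frozenset({'inP1', 'inP2'}), frozenset({'-inP1', '-inP2'})]
# kappa(BS_G) = ¬arm
BS_G = [frozenset({'-arm'})]

print(len(models(BS_I)), len(models(BS_G)))  # 2 constituents vs. 8 goal states
```

As in the text, $ \hat{\xi}(BS_I)$ has exactly the two constituents listed above, while $ \xi(BS_G)$ contains all eight states consistent with $ \neg$ arm, reachable or not.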

Action Representation: We represent actions as having both causative and observational effects. Each action $ a$ is described by a tuple $ \langle \rho^e(a), \Phi(a), \Theta(a) \rangle$ , where $ \rho^e(a)$ is an execution precondition, $ \Phi(a)$ is a set of causative effects, and $ \Theta(a)$ is a set of observations. The execution precondition, $ \rho^e(a)$ , is a conjunction of literals that must hold for the action to be executable. If an action is executable, we apply the set of causative effects to find successor states and then apply the observations to partition the successor states into observational classes.

Each causative effect $ \varphi^j(a) \in \Phi(a)$ is a conditional effect of the form $ \rho^j(a)$ $ \implies$ $ \varepsilon^j(a)$ , where the antecedent $ \rho^j(a)$ and consequent $ \varepsilon^j(a)$ are each a conjunction of literals. We handle disjunction in $ \rho^e(a)$ or a $ \rho^j(a)$ by replicating the respective action or effect with different conditions, so without loss of generality we assume conjunctive preconditions. However, we cannot split disjunction in the effects: disjunction in an effect amounts to representing a set of non-deterministic outcomes. Hence we do not allow disjunction in effects, thereby restricting attention to deterministic effects. By convention, $ \varphi^0(a)$ is an unconditional effect, which is equivalent to a conditional effect where $ \rho^0(a) = \top$ .
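Under this deterministic-effects restriction, progressing a single complete state through an action's causative effects yields a unique successor. A minimal sketch (the tuple encoding of actions is our own, not the paper's data structures), using the semantics of the BTCS action DunkP1 as the example:

```python
def neg(l):
    # opposite-polarity literal
    return l[1:] if l.startswith('-') else '-' + l

def holds(conj, S):
    # a conjunction of literals holds in a complete state S iff all are in S
    return all(l in S for l in conj)

def progress(S, action):
    """Apply all applicable conditional effects rho^j => eps^j of an
    executable action to complete state S; returns the unique successor."""
    pre, effects = action
    assert holds(pre, S), "execution precondition rho^e must hold"
    # collect consequents of effects whose antecedents hold in S
    adds = [l for rho, eps in effects if holds(rho, S) for l in eps]
    # deterministic update: delete opposite-polarity literals, add consequents
    return frozenset(l for l in S if neg(l) not in adds) | frozenset(adds)

# DunkP1-style action: <rho^e = ¬clog, {phi^0 = clog, phi^1 = inP1 => ¬arm}>
dunkP1 = (['-clog'], [([], ['clog']), (['inP1'], ['-arm'])])
S = frozenset({'arm', '-clog', 'inP1', '-inP2'})
print(sorted(progress(S, dunkP1)))  # successor: ¬arm ∧ clog ∧ inP1 ∧ ¬inP2
```

The empty antecedent of $ \varphi^0$ is the empty conjunction, i.e. $ \top$ by the convention above, so the unconditional effect fires in every state.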

The only way to obtain observations is to execute an action with observations. Each observation formula $ o^j(a) \in \Theta(a)$ is a possible sensor reading. For example, an action $ a$ that observes the truth values of two fluents $ p$ and $ q$ defines $ \Theta(a) = \{p \wedge q, \neg p \wedge q, p \wedge \neg q, \neg p \wedge \neg q\}$ . This differs slightly from the conventional description of observations in the conditional planning literature. Some works (e.g., [28]) describe an observation as a list of observable formulas, then define the possible sensor readings as all boolean combinations of those formulas. We instead define the possible sensor readings directly, as illustrated by our example. This convention is helpful in problems where some boolean combinations of the observable formulas can never be sensor readings.
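Given this convention, an action's observations partition a belief state into observational classes, one per possible sensor reading. A small sketch (our own encoding: belief states as sets of complete states, sensor readings as conjunctions of literals):

```python
def holds(conj, S):
    # a conjunction of literals holds in complete state S iff all are members
    return all(l in S for l in conj)

def partition(belief, Theta):
    """Split a belief state by the possible sensor readings o^j in Theta;
    states consistent with o^j land in class j."""
    return [frozenset(S for S in belief if holds(o, S)) for o in Theta]

# an action observing only the fluent p: Theta = {p, ¬p}
belief = {frozenset({'p', 'q'}), frozenset({'p', '-q'}), frozenset({'-p', 'q'})}
classes = partition(belief, [['p'], ['-p']])
print([len(c) for c in classes])  # 2 states observe p, 1 observes ¬p
```

Because the sensor readings in $ \Theta(a)$ are mutually exclusive, each state falls into exactly one class, and the plan branches on which class the true state belongs to.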

The causative and sensory actions for the example BTCS problem are:

DunkP1: $ \langle \rho^e = \neg$ clog, $ \Phi = \{\varphi^0 =$ clog, $ \varphi^1 =$ inP1 $ \implies \neg$ arm$ \}, \Theta = \{\}\rangle$ ,

DunkP2: $ \langle \rho^e = \neg$ clog, $ \Phi = \{\varphi^0 =$ clog, $ \varphi^1 =$ inP2 $ \implies \neg$ arm$ \}, \Theta = \{\}\rangle$ ,

Flush: $ \langle \rho^e = \top$ , $ \Phi = \{\varphi^0 = \neg$ clog$ \}, \Theta = \{\}\rangle$ , and

DetectMetal: $ \langle \rho^e = \top, \Phi = \emptyset, \Theta = \{o^0 =$ inP1, $ o^1 = \neg$ inP1$ \}\rangle$ .
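These four descriptions transcribe directly into a sketch representation (signed-string literals; the tuple encoding is our own, not the paper's data structures). Sensing with DetectMetal from $ BS_I$ and then dunking the indicated package on each branch illustrates both effect application and observational partitioning, and realizes the strong conditional plan for BTCS:

```python
def neg(l):
    return l[1:] if l.startswith('-') else '-' + l

def holds(conj, S):
    # a conjunction of literals holds in complete state S iff all are members
    return all(l in S for l in conj)

def progress(BS, action):
    """Progress a belief state (set of complete states) through causative effects."""
    pre, Phi, _ = action
    assert all(holds(pre, S) for S in BS), "rho^e must hold in every state"
    out = set()
    for S in BS:
        adds = [l for rho, eps in Phi if holds(rho, S) for l in eps]
        out.add(frozenset(l for l in S if neg(l) not in adds) | frozenset(adds))
    return out

def observe(BS, action):
    """Partition a belief state into observational classes, one per o^j."""
    _, _, Theta = action
    return [frozenset(S for S in BS if holds(o, S)) for o in Theta]

# <rho^e, Phi, Theta> for the four BTCS actions ([] is the empty conjunction, T)
DunkP1 = (['-clog'], [([], ['clog']), (['inP1'], ['-arm'])], [])
DunkP2 = (['-clog'], [([], ['clog']), (['inP2'], ['-arm'])], [])
Flush = ([], [([], ['-clog'])], [])
DetectMetal = ([], [], [['inP1'], ['-inP1']])

BS_I = {frozenset({'arm', '-clog', 'inP1', '-inP2'}),
        frozenset({'arm', '-clog', '-inP1', 'inP2'})}

classes = observe(BS_I, DetectMetal)   # branch on the sensor reading
plan0 = progress(classes[0], DunkP1)   # metal detected: bomb is in P1
plan1 = progress(classes[1], DunkP2)   # no metal: bomb is in P2
print(all(holds(['-arm'], S) for S in plan0 | plan1))  # prints True
```

Every leaf belief state entails $ \neg$ arm, so the plan is strong: it reaches the goal from both possible initial states, with a maximum plan path length of two actions.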
