We use top-down AO* search [24] in the planner to generate conformant and conditional plans. In the search graph, the nodes are belief states and the hyper-edges are actions. We need AO* because applying an action with observations to a belief state divides the belief state into observational classes. We use hyper-edges for actions because actions with observations have several possible successor belief states, all of which must be included in a solution.

The AO* search consists of two repeated steps: expand the current partial solution, and then revise the current partial solution. Search ends when every leaf node of the current solution is a belief state that satisfies the goal and no better solution exists (given our heuristic function). Expansion involves following the current solution to an unexpanded leaf node and generating its children. Revision is a dynamic programming update at each node in the current solution that selects a best hyper-edge (action). The update assigns the action with minimum cost to start the best solution rooted at the given node. The cost of a node is the cost of its best action plus the average cost of its children (the nodes connected through the best action). When expanding a leaf node, the children of all applied actions are given a heuristic value to indicate their estimated cost.

The main differences between our formulation of AO* and that of [24] are that we do not allow cycles in the search graph, we update the costs of nodes with an average rather than a summation, and we use a weighted estimate of future cost. The first difference ensures that plans are strong (there are a finite number of steps to the goal), the second guides search toward plans with lower average path cost, and the third biases our search to trust the heuristic function. We define our plan quality metric (maximum plan path length) differently from the metric our search minimizes for two reasons. First, it is easier to compare with other competing planners because they measure the same plan quality metric. Second, search tends to be more efficient using the average instead of the maximum cost of an action's children. By using the average instead of the maximum, the measured cost of a plan is lower; this means we are likely to search a shallower search graph to prove a solution is not the best solution.
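Presumably the weighted estimate takes the standard weighted-A* form (the weight symbol w is our notation; the source does not show it):

```latex
f(b) \;=\; g(b) + w \cdot h(b), \qquad w \ge 1,
```

where g(b) is the accumulated cost of reaching belief state b and h(b) is the heuristic estimate of future cost; larger w trusts the heuristic more.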

Conformant planning, using actions without observations, is a special case of AO* search that is similar to A* search: the hyper-edges that represent actions are singletons, each leading to a single successor belief state. Consider the BTC problem (BTCS without the DetectMetal action) with the future cost (heuristic value) set to zero for every search node. We show the search graph in Figure 2 for this conformant example as well as a conditional example, described shortly. We can expand the initial belief state b_I by progressing it over all applicable actions. We get:

b_1 = Progress(b_I, DunkP1) = {(inP1 ∧ ¬inP2 ∧ clog ∧ ¬arm), (¬inP1 ∧ inP2 ∧ clog ∧ arm)}

and

b_2 = Progress(b_I, DunkP2) = {(inP1 ∧ ¬inP2 ∧ clog ∧ arm), (¬inP1 ∧ inP2 ∧ clog ∧ ¬arm)}.

Since ¬clog already holds in every state of the initial belief state, applying Flush to b_I would create a cycle. Hence, a hyper-edge for Flush is not added to the search graph for b_I. We assign a cost of zero to b_1 and b_2, update the internal nodes of our best solution, and add DunkP1 to the best solution rooted at b_I (whose cost is now one).
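The progression above can be reproduced with a toy encoding of BTC. The encoding is our own assumption for illustration: a state is a frozenset of true propositions (anything absent is false), and Dunk/Flush semantics are the usual ones for this domain (Dunk clogs the toilet and disarms the bomb if it is in the dunked package; Flush unclogs).

```python
# Toy BTC belief-state progression (our own encoding, not the planner's).
S1 = frozenset({"inP1", "arm"})   # bomb in package 1, armed, ¬clog
S2 = frozenset({"inP2", "arm"})   # bomb in package 2, armed, ¬clog
b_I = {S1, S2}                    # initial belief state


def dunk(pkg, state):
    """Dunk package `pkg`: clogs the toilet; disarms if the bomb is in pkg."""
    new = set(state) | {"clog"}
    if f"in{pkg}" in state:
        new.discard("arm")
    return frozenset(new)


def flush(state):
    """Flush: unclogs the toilet."""
    return frozenset(set(state) - {"clog"})


def progress(belief, act):
    """Progress a belief state over a (non-observing) action."""
    return {act(s) for s in belief}


b_1 = progress(b_I, lambda s: dunk("P1", s))
# DunkP1 disarms only the state where the bomb is actually in package 1;
# Flush applied to b_I leaves it unchanged, which is why it forms a cycle.
```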

We expand the leaf nodes of our best solution, a single node b_1, with all applicable actions. The only applicable action is Flush, so we get:

b_3 = Progress(b_1, Flush) = {(inP1 ∧ ¬inP2 ∧ ¬clog ∧ ¬arm), (¬inP1 ∧ inP2 ∧ ¬clog ∧ arm)}.

We assign a cost of zero to b_3 and update our best solution. We choose Flush as the best action for b_1 (whose cost is now one), and choose DunkP2 as the best action for b_I (whose cost is now one). DunkP2 is chosen for b_I because its successor b_2 has a cost of zero, as opposed to b_1, which now has a cost of one.

Expanding the leaf node b_2 with the only applicable action, Flush, we get:

b_4 = Progress(b_2, Flush) = {(¬inP1 ∧ inP2 ∧ ¬clog ∧ ¬arm), (inP1 ∧ ¬inP2 ∧ ¬clog ∧ arm)}.

We update b_4 (to have cost zero) and b_2 (to have a cost of one), and choose Flush as the best action for b_2. The root node b_I has two children, each with cost one, so we arbitrarily choose DunkP1 as the best action.

We expand b_3 with the relevant actions to get b_G with the DunkP2 action. DunkP1 creates a cycle back to b_1, so it is not added to the search graph. We now have a solution where all leaf nodes are terminal. While it is only required that a terminal belief state contain a subset of the states in b_G, in this case the terminal belief state contains exactly the states in b_G. The cost of the solution is three because, through revision, b_3 has a cost of one, which sets b_1 to a cost of two. However, this means that b_I now has a cost of three if its best action is DunkP1. Instead, revision sets the best action for b_I to DunkP2 because its cost is currently two.

We then expand b_4 with DunkP1 to find that its successor is b_G. DunkP2 creates a cycle back to b_2, so it is not added to the search graph. We now have our second valid solution because it contains no unexpanded leaf nodes. Revision sets the cost of b_4 to one, b_2 to two, and b_I to three. Since all solutions starting at b_I have equal cost (meaning there are no cheaper solutions), we can terminate with the plan DunkP2, Flush, DunkP1, shown in bold with dashed lines in Figure 2.

As an example of search for a conditional plan, consider the BTCS example, whose search graph is also shown in Figure 2. Expanding the initial belief state b_I, we get:

b_1 = Progress(b_I, DunkP1),

b_2 = Progress(b_I, DunkP2),

and

{b_5, b_6} = Progress(b_I, DetectMetal) = {(inP1 ∧ ¬inP2 ∧ ¬clog ∧ arm)}, {(¬inP1 ∧ inP2 ∧ ¬clog ∧ arm)}.

Each of the leaf nodes is assigned a cost of zero, and DunkP1 is chosen arbitrarily for the best solution rooted at b_I because the cost of each solution is identical. The cost of including each hyper-edge is the average cost of its children plus its cost, so the cost of using DetectMetal is (0+0)/2 + 1 = 1. Thus, our root has a cost of one.

As in the conformant problem, we expand b_1, giving its child b_3 a cost of zero and b_1 a cost of one. This changes our best solution at b_I to use DunkP2, and we expand b_2, giving its child b_4 a cost of zero and b_2 a cost of one. Then we choose DetectMetal to start the best solution at b_I because it gives a cost of one, whereas using either Dunk action would give a cost of two.

We expand the first child of DetectMetal, b_5, with DunkP1 to get:

{(inP1 ∧ ¬inP2 ∧ clog ∧ ¬arm)},

which is a goal state, and DunkP2 to get:

b_7 = Progress(b_5, DunkP2) = {(inP1 ∧ ¬inP2 ∧ clog ∧ arm)}.

We then expand the second child, b_6, with DunkP2 to get:

{(¬inP1 ∧ inP2 ∧ clog ∧ ¬arm)},

which is also a goal state, and DunkP1 to get:

b_8 = Progress(b_6, DunkP1) = {(¬inP1 ∧ inP2 ∧ clog ∧ arm)}.

While none of these new belief states is equivalent to b_G, two of them entail the goal, so we can treat them as terminal by connecting the hyper-edges for these actions to b_G. We choose DunkP1 and DunkP2 as the best actions for b_5 and b_6, respectively, and set the cost of each node to one. This in turn sets the cost of using DetectMetal for b_I to (1+1)/2 + 1 = 2. We terminate here because this plan has cost equal to the other possible plans starting at b_I and all leaf nodes satisfy the goal. The plan is shown in bold with solid lines in Figure 2.
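Execution of the resulting conditional plan can be sketched in the same toy encoding used for BTC above (a state is a frozenset of true propositions; the encoding and function names are our own illustration, not the planner's output format):

```python
# Executing the BTCS conditional plan: DetectMetal observes which package
# holds the bomb, then the matching branch dunks that package.
def dunk(pkg, state):
    """Dunk package `pkg`: clogs the toilet; disarms if the bomb is in pkg."""
    new = set(state) | {"clog"}
    if f"in{pkg}" in state:
        new.discard("arm")
    return frozenset(new)


def execute(state):
    """Branch on the DetectMetal observation, then dunk the observed package."""
    branch = "P1" if "inP1" in state else "P2"   # observation splits b_I
    return dunk(branch, state)

# Whichever initial state holds, one Dunk after sensing reaches ¬arm (the goal).
```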