While the scope of our presentation and evaluation is restricted to planning with initial state uncertainty and deterministic actions, some of the planning graph techniques can be extended to include non-deterministic actions of the type described by . Non-deterministic actions have effects that are described in terms of a set of outcomes. For simplicity, we consider Rintanen's conditionality normal form, where actions have a set of conditional effects (as before) and each consequent is a set of mutually exclusive conjunctions (outcomes), exactly one of which will result at random. We outline the generalization of our single, multiple, and labelled planning graphs to reason with non-deterministic actions.
Single Planning Graphs: Single planning graphs, which are built from an approximate belief state or a sampled state, do not lend themselves to a straightforward extension. A single graph ignores uncertainty in a belief state by either unioning its literals or sampling a state to form the initial planning graph layer. Continuing with the single graph assumptions about uncertainty, it makes sense to treat non-deterministic actions as deterministic. Just as we approximate a belief state by a set of literals or by a sampled state to form the initial literal layer, we can assume that a non-deterministic effect either adds all literals appearing in any of its outcomes or adds the literals of one sampled outcome, as if the action were deterministic (i.e., the effect gives a single set of literals). Single graph relaxed plan heuristics thus remain unchanged.
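The two ways of treating a non-deterministic effect as deterministic can be sketched as follows. This is a minimal illustration, not the implementation evaluated here: the function names, the string encoding of literals, and the example effect are all hypothetical.

```python
import random

def union_outcomes(outcomes):
    """Over-approximate a non-deterministic effect: add every literal
    that appears in any of its outcomes."""
    literals = set()
    for outcome in outcomes:
        literals |= set(outcome)
    return literals

def sample_outcome(outcomes, rng=None):
    """Pick a single outcome and treat it as the deterministic effect."""
    rng = rng or random.Random()
    return set(rng.choice(list(outcomes)))

# Hypothetical effect with two outcomes, each a conjunction of literals.
outcomes = [frozenset({"p", "q"}), frozenset({"not_p"})]
all_literals = union_outcomes(outcomes)
one_outcome = sample_outcome(outcomes, random.Random(0))
```

In either case the effect contributes a single set of literals to the next literal layer, so the rest of the single graph construction is unaffected.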
Multiple Planning Graphs: Multiple planning graphs are very much like Conformant GraphPlan (CGP). We can generalize splitting the non-determinism in the current belief state into multiple initial literal layers to splitting the outcomes of non-deterministic effects into multiple literal layers. The idea is to root a set of new planning graphs at each level, where each has an initial literal layer containing the literals supported by one interpretation of the previous effect layer. By interpretations of the effect layer we mean every possible set of joint effect outcomes. A set of effect outcomes is possible if no two of its outcomes are outcomes of the same effect. Relaxed plan extraction still involves finding a relaxed plan in each planning graph. However, since each planning graph is split many times (in a tree-like structure), a relaxed plan is extracted from each ``path of the tree''.
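As an illustration of what an interpretation of an effect layer looks like, the sketch below enumerates joint outcome choices with a Cartesian product. It assumes exactly one outcome is chosen per effect and it ignores literal persistence; the function name, the dictionary encoding, and the example effects are hypothetical.

```python
from itertools import product

def interpretations(effect_layer):
    """Yield one initial literal layer per joint choice of outcomes.

    effect_layer maps an effect name to its list of outcomes; each outcome
    is a set of literals. Picking exactly one outcome per effect guarantees
    that no two chosen outcomes belong to the same effect.
    """
    names = sorted(effect_layer)
    for choice in product(*(effect_layer[n] for n in names)):
        # Union the literals of the chosen outcomes into one literal layer.
        yield frozenset().union(*choice)

# Hypothetical effect layer: e1 has two outcomes, e2 has three.
effect_layer = {
    "e1": [{"p"}, {"q"}],
    "e2": [{"r"}, {"s"}, {"t"}],
}
layers = list(interpretations(effect_layer))
```

Here two effects already give six new graphs rooted at one level, and the count multiplies again at every later level that contains non-deterministic effects.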
We note that this technique is not likely to scale because of the exponential growth in redundant planning graph structure over time. Further, in our experiments CGP already has enough trouble with initial state uncertainty alone. We expect that we should be able to do much better with the labelled uncertainty graph.
Labelled Uncertainty Graph: With multiple planning graphs we are forced to capture non-determinism by splitting the planning graphs not only at the initial literal layer, but also at each literal layer that follows at least one non-deterministic effect. We saw in the labelled uncertainty graph that labels can capture the non-determinism that drove us to split the initial literal layer in multiple graphs. As such, these labels took on a syntactic form that describes subsets of the states in our source belief state. In order to generalize labels to capture non-determinism resulting from uncertain effects, we need to extend their syntactic form. Our objective is to have a label represent which sources of uncertainty (arising from the source belief state or from effects) causally support the labelled item. We also introduce a graph layer to represent outcomes and how they connect effects and literals.
It might seem natural to describe the labels for outcomes in terms of their affected literals, but this can lead to trouble. The problem is that the literals in effect outcomes describe states at a different time than the literals in the projected belief state. Further, an outcome that appears in two levels of the graph describes a random event at two different times. Using state literals to describe all labels will lead to confusion as to which random events (state uncertainty and effect outcomes at distinct steps) causally support a labelled item. A pathological example is an effect whose set of outcomes matches one-to-one with the states in the source belief state. In such a case, labels defined in terms of state literals cannot distinguish which random event (the state uncertainty or the effect uncertainty) the label describes.
We have two choices for describing effect outcomes in labels. In both choices we introduce a new set of label variables to describe how a literal layer is split. These new variables will be used to describe effect outcomes in labels and will not be confused with variables describing initial state uncertainty. In the first case, these variables will have a one-to-one matching with our original set of literals, but can be thought of as time-stamped literals. The number of variables we add to the label function is on the order of twice the number of fluents per level (the number of fluent literals, assuming boolean fluents). The second option is to describe outcomes in labels with a new set of fluents, where each interpretation over the fluents is matched to a particular outcome. In this case, we add on the order of the logarithm of the number of outcomes in the outcome layer. The number would actually be lower if many of the outcomes were from deterministic effects, because there is no need to describe them in labels. The former approach is likely to introduce fewer variables when there are many non-deterministic effects that affect many of the same literals. The latter will introduce fewer variables when there are relatively few non-deterministic effects whose outcomes are fairly independent.
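The trade-off between the two encodings can be made concrete with a small counting sketch. It assumes boolean fluents, a binary encoding of outcome indices, and that outcomes of deterministic effects need no label variables; the function names and the example numbers are hypothetical.

```python
def timestamped_literal_vars(num_fluents):
    """First option: one new variable per fluent literal at a level."""
    return 2 * num_fluents

def outcome_index_vars(outcomes_per_effect):
    """Second option: enough boolean variables to give every outcome of a
    non-deterministic effect its own interpretation."""
    # Effects with a single outcome are deterministic and are skipped.
    n = sum(k for k in outcomes_per_effect if k > 1)
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1.
    return (n - 1).bit_length() if n > 0 else 0

# Hypothetical level: 10 fluents; effects with 1, 1, 2 and 3 outcomes.
vars_first = timestamped_literal_vars(10)
vars_second = outcome_index_vars([1, 1, 2, 3])
```

With few non-deterministic effects the second count stays small regardless of the number of fluents, which matches the comparison above.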
With the generalized labelling, we can still say that an item is reachable from the source belief state when its label is entailed by the source belief state. This is because even though we are adding variables to labels, we are implicitly adding the fluents to the source belief state. For example, say we add a fluent $v$ to describe two outcomes of an effect. One outcome is labelled $v$, the other $\neg v$. We can express the source belief state that is projected by the labelled uncertainty graph with the new fluent as the original belief state conjoined with $(v \vee \neg v)$, which is logically equivalent to the original. An item whose label is the source belief state conjoined with $v$ alone will not be entailed by the projected belief state (i.e., it is unreachable) because only one outcome causally supports it. If both outcomes support the item, then it will be reachable.
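The reachability test with outcome fluents can be sketched by enumerating worlds explicitly, where a world pairs a source state with an assignment to the outcome fluents. This is an illustrative set-based encoding under that assumption, not the label representation used in the graph itself; all names and the example states are hypothetical.

```python
from itertools import product

def extend_belief_state(states, num_outcome_fluents):
    """Implicitly add outcome fluents to the source belief state: every
    assignment to the new fluents is possible in every source state."""
    assignments = product([False, True], repeat=num_outcome_fluents)
    return {(s, a) for a in assignments for s in states}

def reachable(label, extended_belief):
    """An item is reachable when the belief state entails its label,
    i.e. every world of the belief state is also a world of the label."""
    return extended_belief <= label

states = {"s1", "s2"}                      # hypothetical source belief state
belief = extend_belief_state(states, 1)    # one outcome fluent

# Item supported by only one outcome of the effect.
label_one = {(s, (True,)) for s in states}
# Item supported by both outcomes.
label_both = label_one | {(s, (False,)) for s in states}

one_reachable = reachable(label_one, belief)
both_reachable = reachable(label_both, belief)
```

The item supported by a single outcome fails the test because the worlds in which the other outcome occurs are not covered by its label.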
Given our notion of reachability, we can determine the level from which to extract a relaxed plan. The relaxed plan procedure does not change much in terms of its semantics, other than having the extra graph layer for outcomes. We still have to ensure that, in a relaxed plan, literals are causally supported in all worlds they are labelled with, whether those worlds arise from initial state uncertainty or from the outcomes of supporting non-deterministic effects.