Assumptive Planning and Execution:
a Simple, Working Robot Architecture
Illah R. Nourbakhsh
illah@cs.stanford.edu
Department of Computer Science, Stanford University,
Stanford, CA 94305
Michael R. Genesereth
genesereth@cs.stanford.edu
Department of Computer Science, Stanford University,
Stanford, CA 94305
Abstract.
In this paper, we present a simple
approach to interleaving planning and execution
based on the idea of making simplifying assumptions.
Since assumptions can be tragically wrong, this
assumptive approach must ensure both that the robot
does not believe it has solved a problem when it has
not and that it does not take steps that make a problem
unsolvable. We present an assumptive algorithm that
preserves goal reachability and in addition we specify
conditions under which the assumptive
architecture is sound and complete.
We have successfully implemented the
assumptive architecture on several real-world robots.
Students in an introductory robot lab at Stanford
University implement an assumptive system on
robots that have incomplete information about their maze
world. Dervish, our winning entry in the 1994 AAAI
National Robot Competition, implements an
assumptive architecture to cope with partially
specified environments and unreliable effectors and
sensors.
Keywords: interleaving planning and execution,
mobile robotics, assumptions, robot architecture,
planning with incomplete information
Figure captions:
Fig. 1. The interaction paradigm between the robot and the
environment.
Fig. 2. State-set transitions using (a) actions and (b) percepts.
Fig. 3. Property sets and corresponding sets of possible world
states.
Fig. 4. Positional uncertainty for a mobile robot system.
Fig. 5. A conditional graph segment and an embedded
conditional plan (emphasized).
Fig. 6. An initial uncertainty region (a) and its simplification
(b).
Fig. 7. Arc transitions corresponding to (a) effect and (b)
seffect.
Fig. 8. Arc transitions corresponding to (a) perceive and (b)
sperceive.
Fig. 9. A conditional plan (a) displayed graphically and (b) as
a lisp program.
Fig. 10. The overlay of the conditional plan from Fig. 9 using
the actual initial conditions I.
Fig. 11. An abstract example of the overcommitment problem.
Fig. 12. CS 224 students developing navigation code in a small
maze.
Fig. 13. The CS 224 assumptive student evaluation maze.
Fig. 14. An office environment represented (a) geometrically
and (b) as a topological map.
Fig. 15. A quantization of the office environment of Fig. 14
based on connectivity.
Table titles:
Table 1. A comparison of the cost of conditional and assumptive
planning.
Table 2. A comparison of the conditional and assumptive search
spaces for CS 224.
Table 3. A comparison of the conditional and assumptive search
spaces for the final contest at the AAAI 1994 National Robot
Competition.
Contact author information:
Illah R. Nourbakhsh
415-725-8086 office
415-497-0511 home
415-725-7411 fax
Computer Science Department
Stanford University
Stanford, CA 94305
electronic mail: illah@cs.stanford.edu
1 Introduction
We plan and act in the world with the help of assumptions that
enable us to ignore many absurdly complex worst-case scenarios.
Yet we often program mobile robots to create complete plans
from initial conditions that detail every conceivable case. In
domains with pathological but improbable possibilities, this
planning task quickly becomes intractable.
Rather than forgoing planning because of the complexity of
complete planning under these conditions, we propose that the
robot use assumptions to make the planning task more tractable.
When such simplifying assumptions are correct, the robot will
reach the goal after executing the resultant plan. But if
assumptions are chosen naively and turn out to be invalid, then
the robot's actions may be disastrous.
In this paper we present an architecture that incorporates
simplifying assumptions, yet preserves the robot's ability to reach
the goal even when the assumptions are wrong. This assumptive
architecture is one method for interleaving planning and
execution. Interleaved planning and execution was first
introduced in the context of plan execution monitoring (Fikes,
1971). This work used the robot's runtime inputs to modify plan
execution; however, the replanning was limited to modifying the
program counter's position with respect to a fixed sequence of
actions. Sacerdoti outlined the use of multiple episodes of
execution preceding completion of the planning task, offering a
paradigm that embodies the conventional view of interleaving
planning and execution (Sacerdoti, 1974). A number of recent
approaches to interleaving planning and execution have involved
cutting a planning session short and executing early so
that the robot can receive sensory feedback
from the environment as early as possible. The resulting
information gain can then be used to simplify the remaining
planning task (Ambros-Ingerson & Steel, 1988; Dean et al.,
1990; Olawsky et al., 1993; Genesereth & Nourbakhsh, 1993).
The present work expands upon current research in this area
by providing a formal framework for expressing incomplete
information and presenting sound and complete algorithms for
interleaving, even in domains with irreversibilities. After
presenting the formal framework in section 2, we describe two
assumptive algorithms in detail and compare their computational
costs to those of a classical advance-planning system. Finally,
we describe two successful implementations of the assumptive
architecture on real-world robots in nontrivial domains.
2 A Representation for Action and Perception
We begin by describing a formal model that expresses complete
information about the interaction of a robot with its environment.
We then show how common types of incomplete information can
be formally represented within this model. Finally, we
demonstrate the representation-independent nature of the model
by summarizing two instantiations using different
representations.
2.1 A Model of Complete Information
A robot is an agent with one (aggregate) input line and one
(aggregate) output line. We represent the set of possible robot
input vectors, or percepts, as P. The robot's set of possible
outputs, or actions, is O. We refer to a particular action (i.e., a
member of O) as a, and to a particular percept as p.
The robot's behavior at each point in time may be viewed as a
function from a history of percepts to an action: P* → O. As
shown in Fig. 1, the robot is connected to its environment
through its percepts and actions. The robot's outputs are inputs to
the environment, which we assume is otherwise closed. The
environment's output is coupled to the robot's input, allowing the
environment's state s to determine the robot's perceptual input
vector, or percept.
A percept is a complete description of the robot's input. It
corresponds to a vector of the values of every input line. This model
of perception contrasts with the common view of robot perception as
an environmental event or a perceptual "process" (Etzioni et al.,
1992; Olawsky et al., 1993). In our model, a percept is not an active
percept. We view a perceptual "process" as a sequence of actions
that transform the state of the environment such that the robot's
subsequent percept conveys meaningful information (Rivest and
Schapire, 1993; Rosenschein and Kaelbling, 1995).
We formally define the relationship between the robot's
percepts and actions and the states S of the environment using
two functions:
result(a, s) ∈ S
see(s) ∈ P
Result maps an action and a state into the state that results
from the application of the given action in the given state. See
maps each environment state into its corresponding percept.
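As a concrete transcription of these two functions, consider a toy corridor domain of our own (not an example from the text): three cells, two movement actions, and a percept that only reports whether the robot is at an end of the corridor.

```python
# Toy complete-information model (an illustrative domain of our own).
# States are cells 0..2 of a corridor; actions move the robot one cell;
# the percept reports whether the robot is at an end of the corridor.

S = {0, 1, 2}          # environment states
O = ["left", "right"]  # actions
P = ["end", "mid"]     # percepts

def result(a, s):
    """result(a, s): the state produced by applying action a in state s."""
    return min(s + 1, 2) if a == "right" else max(s - 1, 0)

def see(s):
    """see(s): the percept emitted by environment state s."""
    return "end" if s in (0, 2) else "mid"
```

Note that see cannot distinguish the two ends of the corridor, a point that becomes important once incomplete information is introduced below.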
In the complete information case, the robot's goal can be
expressed as the transformation of an initial state i into any
member of a set of desirable states, G.[footnote 1] Formally, a robot
will
achieve its goal when execution of its plan from i eventually
leads to a state in G. We express this concept by temporally
indexing the world state:
s_1 = i
∃n : s_n ∈ G
The above formalism describes the complete relationship
between the robot and the world. Unfortunately, robots do not
always have such complete information.
2.2 A Model of Incomplete Information
Incomplete information commonly occurs in three places: initial
conditions, actions and percepts. For each type of incomplete
information, the model is modified to allow partial specification
of state and behavior.
Incomplete information about initial conditions involves
partial specification of the initial world state, in which case the
robot must plan and act to reach the goal from any initial state
consistent with the specification. We can partially specify the
initial conditions using an initial stateset, I, rather than an initial
state i. The state set denotes the logical disjunction of its
member states. The concept of goal fulfillment changes
accordingly to allow the agent to achieve the goal despite
incomplete state information. Here S_n represents a state set, where
s_i was a single state:
S_1 = I
∃n : S_n ⊆ G
This model of goal achievement means that the robot has
achieved the goal when its state set is a subset of the goal set, G.
The robot may still have incomplete information about its actual
state; nevertheless, it must be at some goal state when this subset
condition holds.
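The subset condition translates directly into code (sets of states standing in for state sets; our own transcription):

```python
# Goal achievement under incomplete information: the robot is done
# exactly when every state it might still occupy is a goal state.

def achieved(S_n, G):
    """True iff the current state set S_n is a subset of the goal set G."""
    return S_n <= G
```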
Incomplete information about the result of actions can
capture the behavior of both unreliable actions as well as truly
nondeterministic actions. In both cases, the results of performing
an action are only partially specified.
Incomplete information about the discerning power of
sensors can also capture both concepts of sensory unreliability
and poor sensory resolution. In the case of poor sensory
resolution, multiple states may be consistent with a single
percept. This type of uncertainty, called "perceptual aliasing,"
can complicate planning if the robot's uncertainty encompasses
more than one such state (McCallum, 1995). McCallum also
notes that a single state may be consistent with multiple percepts.
This corresponds to the case of sensory unreliability. Taken
together, these two types of uncertainty cause many-to-many
relationships between possible states and possible percepts.
The new functions, results and sees, express the above types
of incomplete information using state sets in lieu of single
states, where U is some state set:
results(a, U) ⊆ S
sees(p, U) ⊆ U
Note that results is identical to result, if the parameters are
changed from state sets to states. Sees is less intuitive, however.
It is essentially a filter function, returning that subset of the
specified state set that is consistent with the given percept. The
return value of sees is a state set whereas the return value of see
is a percept.
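Over enumerated state sets, results is simply the image of the set under an action and sees is a filter; the corridor domain below is our own toy illustration:

```python
# State-set versions of the toy corridor functions (our own
# illustration): `results` is the image of a state set under an action,
# and `sees` filters a state set by a percept.

def result(a, s):
    return min(s + 1, 2) if a == "right" else max(s - 1, 0)

def see(s):
    return "end" if s in (0, 2) else "mid"

def results(a, U):
    """results(a, U): every state reachable by doing a in some s in U."""
    return {result(a, s) for s in U}

def sees(p, U):
    """sees(p, U): the subset of U consistent with percept p."""
    return {s for s in U if see(s) == p}
```

Note how sees("end", {0, 1, 2}) returns {0, 2}: the two ends are perceptually aliased, exactly the many-to-many relationship described above.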
Fig. 2a depicts an action arc in the case of incomplete
information. The grab action transforms the
state set containing states a and b into a larger state set
containing four new states. The fact that the sink set is larger
implies that the grab action is nondeterministic. For example, e
and f may correspond to states in which the grab fails while g and
h may correspond to different cases of successful grabs.
Fig. 2b depicts perceptual branching in the case of
incomplete perceptual information. The two arcs are labeled
with percepts used to conduct the filtering operation. Therefore,
the sink of the arc labeled p1 corresponds to sees(p1, {a,b,c}). In
this case, only states a and b are consistent with p1. Note that
state b is consistent with both p1 and p2. This means that this
pair of percepts is only capable of discriminating state a from
state c. If the solution from a to the goal differs greatly with the
solution from c to the goal, this perceptual branch may permit a
solution whereas a plan that ignores sensory input may fail.
2.3 Representation Independence
Robots frequently take advantage of efficient, domain-specific
representations that do not require explicit enumeration of state
sets. Of course, a formal mapping from the special-purpose
representation to state sets will exist. We abstract the above
definitions to a representation-independent level in order to
emphasize the model's applicability to any domain-specific
representation.
The central objects transformed by results and sees are state
sets in the model we have presented. We abstract these objects
to a representation-independent concept called nodes. There
must be a well-defined mapping from nodes to state sets, including
the state-set-specific functions results and sees. We define these
mappings using the states function and the new functions effect
and perceive, which transform a node n rather than a state set:
states(effect(a,n)) = results(a,states(n))
states(perceive(p,n)) = sees(p,states(n))
Perceive and effect refer to the domain-specific implementations
of the robot's input and output functions, respectively. The above
formulas specify the correspondence between perceive and sees
and between effect and results.
The number of possible domain-specific representations is of
course unlimited. We discuss two representations below to
clarify the relationship between state sets and more specific
representation schemes.
2.4 Property Set Representation
One well-known representation technique has its origins in
STRIPS and involves the specification of nodes as sets of
properties, which formally denote the intersection of the
corresponding state sets (Fikes & Nilsson, 1971). The property
set {r1,r2,r3} denotes the state set
.
Within the property set approach, a common way to implement
effect and perceive involves defining add lists and delete lists. In
that case, effect(a,n) is defined as (n − delete(a)) ∪ add(a). Fig. 3
depicts two property sets and their corresponding state sets. Note
that Fig. 3b maps to a singleton state set and thus represents
complete information about the world state.
Using a property set formalism obviates the need to translate
down into the explicit state set representation when
implementing the assumptive architecture because we will
specify the assumptive architecture in terms of nodes, effect, and
perceive. Yet, a precise mapping from property sets
to state sets does exist because a property can be mapped to a
state set and, therefore, a set of properties can be mapped to the
intersection of a set of state sets.
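A minimal sketch of the add/delete-list implementation of effect described above, using a hypothetical blocks-world action of our own:

```python
# Add/delete-list implementation of effect over property sets
# (the unstack action and its lists are our own hypothetical example).

ACTIONS = {
    # action name: (delete list, add list)
    "unstack_a_b": ({"on(a,b)"}, {"clear(b)", "holding(a)"}),
}

def effect(a, n):
    """effect(a, n) = (n - delete(a)) | add(a) over a property-set node n."""
    delete, add = ACTIONS[a]
    return (n - delete) | add
```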
2.5 Geometric Representation
In the case of mobile robotics, a useful way of capturing
positional uncertainty (i.e., world state) is to geometrically
describe the smallest two-dimensional region, Ur, certain to
contain the robot. Fig. 4 depicts an example of Ur for a robot
traveling down an outdoor arcade.
A clear mapping to state sets again exists: Ur represents the
state set containing every state in which the robot is positioned
inside the region delimited by Ur. Of course, in practice a finite
mapping would require a discretization of the continuous
geometric space. In this geometric space, effect and perceive
can be defined in terms of polygonal transformations to effect
highly efficient expansion of the nodes during search, allowing
for fast motion planning algorithms (Latombe et al., 1991).
2.6 Representation Requirements
These are just two of many possible representation schemes that
are computationally practical for specific classes of problems.
We go on to describe assumptive architectures without regard to
any particular representation. In order for our algorithm to be
applicable to a particular representation, two conditions must be
met. (1) The representation must define effect and perceive. (2)
The representation must define subsumes, which can be specified
as follows:
subsumes(n_1, n_2) iff states(n_1) ⊇ states(n_2)
For example, the following formula is true for Fig. 3:
subsumes({on(a,b)}, {on(a,b), on(b,c)})
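For property-set nodes, subsumes reduces to reversed set inclusion on the property sets themselves, since adding properties shrinks the denoted state set. A sketch of our own:

```python
# subsumes for property-set nodes (our sketch): a node with fewer
# properties denotes a LARGER state set (the intersection of fewer
# constraints), so subsumption is reversed set inclusion.

def subsumes(n1, n2):
    """states(n1) is a superset of states(n2) exactly when n1 <= n2."""
    return n1 <= n2
```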
3 Advance-Planning Architectures
The traditional approach to problem solving is easy to
understand because it clearly subdivides the general problem into
two subproblems: planning and executing. We call this the
advance-planning architecture because the problem-solving method is a
two-step process: (1) Plan from the initial conditions to the goal
conditions; (2) Execute the plan.
This approach has the benefit of allowing the system designer
to avoid wrestling with the complexities of interleaving planning
and execution. However, the incurred cost is that the planning
episode must result in a solution from the initial conditions all
the way to the goal. In this section, we motivate the use of
conditional plans and describe a general conditional advance
planner using the formalisms provided in section 2. Although
the resulting planner will be similar to standard state-set-based
planners, this exercise will make the role of incomplete
information explicit in the search process, facilitating the shift to
interleaving architectures.
3.1 Conditional Planning
Finding a sequential solution from the initial conditions to the
goal is often impossible when the robot is faced with incomplete
information. The solution is to construct conditional plans that
take advantage of runtime perceptual feedback as information
gain. Several classical instantiations of general conditional
planning have been described in the literature (Warren, 1976;
Drummond, 1985; Genesereth, 1993). Unfortunately, robot
problems often cannot take advantage of general conditional
planning because its search space continues to be prohibitively
large, as a conditional search engine branches not only on
possible actions to perform but also on possible percepts that the
robot may encounter.
More recently researchers have introduced limited forms of
conditionality into planning, combining more tractable search
methods with the powerful idea of runtime information gain
(Kaelbling & Rosenschein, 1987; Doyle, Atkinson & Doshi,
1986; Etzioni et al., 1992; Peot & Smith, 1992). In domains
where irreversibility is rampant and optimal performance is
required, this form of advance-planning is sometimes the only
option.
3.2 A Node-Space Conditional Planner
Consider the search space of a node-space planner. Because of
the incompleteness of its information, the robot must use
run-time perceptual inputs as feedback in order to potentially achieve
information gain. The planner must allow for such perceptual
discrimination in advance by searching a graph that not only
branches based on possible actions but also branches based on
possible perceptual inputs. Thus, the planner searches a
conditional graph.
A conditional graph is formally a directed bipartite graph with
alternating perceptory nodes and effectory nodes. Fig. 5 depicts
such a graph, using squares to indicate effectory nodes and
circles to indicate perceptory nodes.
A perceptory node has action-labelled arcs leading to it and
percept-labelled arcs emanating from it. An effectory node has
percept arcs leading to it and action arcs emanating from it. An
action arc labelled with action a connects node n1 to node n2 iff
effect(a,n1)=n2. Similarly, a percept arc labelled with percept p
connects n1 to n2 iff perceive(p,n1)=n2. Note that this means
that the sink of a percept arc is always subsumed by its source:
subsumes(n1,perceive(p,n1)).
Of course, building the entire conditional graph is
unreasonably expensive since the number of nodes is 2^|S|.
Therefore, a planner searches for a conditional solution by
incrementally constructing the graph using effect and perceive.
The search process is identical to an AND-OR graph search because
the conditional plan must select action sequences (OR-branch),
but must do so over all percept sequences that the system can
encounter during execution (AND-branch).
Unfortunately, complete conditional planning quickly
becomes too costly. For a system with |O| actions and |P|
percepts, the cost is exponential in the length of the
solution, k: O(|P|^k |O|^k). Of course, |P| and |O| represent
theoretical worst-case branching factors. In instances where
some percepts are inconsistent with world conditions or some
actions have no impact on world state, the effective branching
factors may be much smaller.
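The AND-OR search process can be sketched as follows. This is our own schematic, depth-bounded planner over explicitly enumerated state sets, not the authors' implementation; a plan is either [] (the node is subsumed by the goal) or a tuple ("do", action, {percept: subplan}).

```python
# Depth-bounded AND-OR search for a conditional plan over explicitly
# enumerated state-set nodes (a schematic sketch, not the authors'
# planner). A plan is [] (goal reached) or ("do", action, branches).

def plan(n, G, actions, percepts, effect, perceive, depth):
    if n <= G:                  # goal test: node subsumed by goal set
        return []
    if depth == 0:
        return None
    for a in actions:           # OR-branch: choose an action
        m = effect(a, n)
        branches = {}
        for p in percepts:      # AND-branch: every consistent percept
            sub = perceive(p, m)
            if not sub:
                continue        # percept p cannot occur at node m
            subplan = plan(sub, G, actions, percepts,
                           effect, perceive, depth - 1)
            if subplan is None:
                break           # this percept branch is unsolvable
            branches[p] = subplan
        else:
            return ("do", a, branches)
    return None
```

In the toy corridor domain from section 2 (states 0..2, goal {2}), the search from the node {0, 1} finds the plan "go right; if the end is perceived, stop, otherwise go right again."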
4 An Assumptive Approach
Assumptive architectures decrease total search cost by
incorporating simplifying assumptions that make the planning
task tractable. We consider three types of assumptions in this
paper: assumptions about initial conditions, assumptions about
the behavior of actions, and assumptions about the behavior of
percepts. After describing these assumptions, we provide an
assumptive algorithm that interleaves planning and execution,
then provide soundness and completeness results.
4.1 Assumptions about Initial Conditions
One of the forms of a robot's uncertainty can be incomplete
knowledge about its exact initial conditions. For example, Fig.
6a depicts a navigation robot faced with a high degree of
positional uncertainty.
Rather than trying to produce a plan guaranteed to work from
every possible position captured by Ur, the robot can make
simplifying assumptions to "virtually" reduce its initial positional
uncertainty and produce plans based on these assumptions, as in
Fig. 6b. Since the number of cases being considered under the
assumption is a proper subset of the actual initial conditions, any
plan from the actual conditions will certainly work from the
assumed conditions. Therefore, the solution from the assumed
conditions is guaranteed to be the same length or shorter than the
solution from the actual conditions.
We define a selection function sel that is constrained to
simplify a node by making assumptions that correspond to
information gain:
Definition. sel returns a node representing information gain:
subsumes(I, sel(I)).
We will refer to the assumed initial conditions, sel(I), as A.
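One hypothetical way to realize sel (the paper does not prescribe one): given a probability prior over states, an assumption we introduce purely for illustration, keep only the most probable members of I. The result is always a sub-node of I, so subsumes(I, sel(I)) holds.

```python
# A hypothetical selection function (the prior over states is our own
# addition, for illustration only): keep only the most probable
# initial states.

def sel(I, prior):
    """Return the subset of I with maximal prior probability."""
    best = max(prior[s] for s in I)
    return {s for s in I if prior[s] == best}
```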
4.2 Assumptions about Actions
A robot often has actions available that are almost always
reliable. One way to capture this unreliability is to model the
action, via effect, as nondeterministic. For example, consider a
STRIPS gripper that sometimes slips. Fig. 7a depicts the result
of the grab operation from a particular node. Note that the nodes
in Fig. 7 are specified as explicitly enumerated state sets.
The robot can make simplifying assumptions about the reliability
of its actions in order to reduce or even eliminate this
nondeterministic property. Fig. 7b depicts the result of grab
from the same source node when the planner assumes away the
potential for gripper failure.
Definition. The function seffect represents a simplification of
effect iff:
subsumes(effect(a,n),seffect(a,n))
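A sketch of the gripper example under our own encoding: effect models the slip nondeterministically, while seffect assumes the grab always succeeds, so every seffect outcome is among the effect outcomes.

```python
# A nondeterministic grab and its assumed-reliable simplification
# (our own encoding; the state tags are invented for illustration).

def effect(a, n):
    # each state can progress to a success outcome or a slip outcome
    return {out for s in n for out in (("held", s), ("dropped", s))}

def seffect(a, n):
    # assumption: the gripper never slips -> keep only the successes
    return {("held", s) for s in n}
```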
4.3 Assumptions about Percepts
Analogous to unreliable actions, the percepts as defined by
perceive may be incapable of distinguishing between states of the
world because of the existence of false positives or false
negatives. Given the robot's set of percepts {p1, p2, ..., pm}, we
know from the definition of see and perceive that the sink nodes
corresponding to all perceptual arcs from a specific source node
must together cover the source node. Fig. 8a depicts this
situation for a robot with two percepts.
Note that the two sink nodes are not mutually exclusive in Fig.
8a. The fact that the sink nodes share states b and d indicates
that the robot's sensors cannot discriminate these states, perhaps
because of false positives.
Simplification of perceive is more complex than the
simplification of effect.
Definition. Given a robot with percepts P = {p1,..,pm}, the
function sperceive is a simplification of perceive iff:
∀i : subsumes(perceive(p_i, n), sperceive(p_i, n)) AND
sperceive(p_1, n) ∪ … ∪ sperceive(p_m, n) = n
This formula allows sperceive to exhibit better discrimination
than perceive, but requires all percepts, together, to cover all
states. Fig. 8b is an example of such a simplification because
state b appears in only one of the two sink nodes, and all source
node states can be found in at least one sink node. Insofar as 8b
eliminates one false positive, the simplification represents greater
information and may make planning cheaper.
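A sketch of such a simplification with hypothetical percept tables of our own: sperceive drops the false positive for state "b" while the coverage condition still holds.

```python
# Hypothetical percept tables illustrating a simplification of
# perceive: state "b" is a false positive for p1 under perceive, and
# sperceive assumes it away while preserving coverage.

def perceive(p, n):
    table = {"p1": {"a", "b"}, "p2": {"b", "c"}}   # b matches both percepts
    return n & table[p]

def sperceive(p, n):
    table = {"p1": {"a"}, "p2": {"b", "c"}}        # assume b only yields p2
    return n & table[p]

def covers(f, percepts, n):
    """Coverage condition: the sinks of all percepts union to the source."""
    union = set()
    for p in percepts:
        union |= f(p, n)
    return union == n
```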
4.4 The Assumptive Algorithm
The assumptive algorithm we present uses a conditional planner
and implements the simplifying assumptions described above.
Of course, the assumptions may turn out to be wrong. In order to
deal with this possibility, the robot monitors the execution of its
plan. If the robot's actual percept contradicts its expectation, the
robot knows that it made a bad assumption. In that case, it must
take the new sensory data into account to change its assumptions
and replan.
Algorithm 1: Assumptive Algorithm
1. A = sel(I)
2. Construct a plan C from A to G using seffect and
sperceive
3. Execute the next step of C
4. Update I and A using the action just executed, a:
I = effect(a,I)
A = seffect(a,A)
5. Update I and A using the current percept, p:
I = perceive(p,I)
A = sperceive(p,A)
6. If A = {}, goto step 1 (perceptual contradiction)
7. Execution of C complete? Terminate
8. Goto step 3
An important facet of this algorithm is the necessity of updating
both I and A based upon the actions taken and percepts
encountered. A must be updated in order to detect any
contradiction between the assumptions resulting from the sel call
and the percepts actually encountered. I must be updated in case
such a contradiction is encountered. If a perceptual contradiction
occurs, A will become the empty set in step 5. Then, step 6 will
lead back to step 1, where the robot will redefine A in terms of
the current value of I, which takes into account the partial plan
executed thus far and the percepts encountered along the way.
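Algorithm 1 can be rendered schematically as below (our own sketch). For readability the plan is treated as a flat action sequence rather than a conditional plan; planner stands in for step 2 (which would use seffect and sperceive) and execute performs an action in the world, returning the resulting percept.

```python
# Schematic rendering of Algorithm 1 (our own sketch; a real plan
# would be conditional, choosing the next action from percept branches).

def assumptive_run(I, G, sel, planner, execute,
                   effect, seffect, perceive, sperceive, max_episodes=100):
    for _ in range(max_episodes):
        A = sel(I)                                   # step 1
        plan = planner(A, G)                         # step 2
        contradiction = False
        for a in plan:
            p = execute(a)                           # step 3
            I, A = effect(a, I), seffect(a, A)       # step 4
            I, A = perceive(p, I), sperceive(p, A)   # step 5
            if not A:                                # step 6: contradiction
                contradiction = True
                break
        if not contradiction:                        # step 7: plan finished
            return I
    return None
```

In the toy corridor domain, with sel the identity and a two-action plan, the loop finishes in one episode with I = {2} ⊆ G; soundness in general, of course, relies on the planner's false goal detection.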
4.5 Conserving Reachability
The greatest risk of making assumptions is that the ensuing
execution may be tragic if the assumptions are inappropriate.
Formally, the execution of an assumptive plan may transform
some state s into a state r such that there is no path from r to
a goal state. This means that the assumptive algorithm presented
thus far is in fact incomplete.[footnote 2]
One solution is to force sel to be conservative by including
all potentially irreversible states in sel(I):
Definition. path_k(s,r) denotes that there exists an action-labeled
path of length k or less in the state-space graph from state s to r.
Definition. Given a problem domain, k, and I, sel is strong iff:
∀s ∈ states(I) :
[∃r : path_k(s, r) ∧ ¬path_k(r, s)] ⇒ s ∈ states(sel(I))
Note that strong selection is a constraint on the behavior of sel
and not a constructive method for specifying a selection function.
One approach to designing a sel is to define sel without regard to
irreversibility issues, then augment the resulting node until strong
selection is satisfied.
In practice, the strong selection check is relatively
inexpensive. Strong selection involves a search conducted on the
state-space graph, which denotes complete information and is
therefore less expensive than search on the node-space graph.
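The check can be sketched as a bounded reachability computation on the state-space graph (our own transcription; succ enumerates the action-labeled successors of a state):

```python
# Bounded reachability check for strong selection (our own sketch).
# succ(s) returns the set of states reachable from s in one action.

def reach_k(s, k, succ):
    """All states reachable from s by paths of length k or less."""
    frontier, seen = {s}, {s}
    for _ in range(k):
        frontier = {t for u in frontier for t in succ(u)} - seen
        seen |= frontier
    return seen

def must_include(s, k, succ):
    """True iff s can reach, within k steps, a state with no k-step
    path back; such s must appear in sel(I) for sel to be strong."""
    return any(s not in reach_k(r, k, succ) for r in reach_k(s, k, succ))

def is_strong(sel_I, I, k, succ):
    """Check the strong selection constraint for a candidate sel(I)."""
    return all(s in sel_I for s in I if must_include(s, k, succ))
```

A two-state "one-way door" (0 leads to 1, but 1 is absorbing) illustrates the constraint: state 0 can reach the irreversible state 1, so any strong sel must include 0.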
4.6 False Goal Detection
Before any claims can be made about the completeness of this
assumptive architecture, one further problem must be addressed.
This more subtle problem occurs because an incorrect
assumption may go undiscovered during execution, leading the
robot to believe it has reached the goal when in fact it has not.
Definition. Given a conditional plan C, if construction of a
conditional graph from I using effect and perceive results in some
fringe node n s.t. ¬subsumes(G, n), then C contains a false goal.
Definition. A false goal detector prunes any plan containing a
false goal.
The existence of false goals is a result of the fact that the planner
can use simplifying assumptions that are sufficiently extreme as
to make the planning episode unsound. Thus, actual execution of
the resultant plan can lead to fringe nodes containing unforeseen
and indistinguishable situations.
Consider a robot in initial node I={a,b,c} with the goal set
G={x,y}. Furthermore, assume that sel(I) = {a,b}. Fig. 9a
graphically depicts the result of invoking a planner using sel(I),
seffect, and sperceive. Fig. 9b is the resultant plan extracted from
this conditional graph.
The next step is to construct a conditional graph from I using
effect, perceive and the plan in Fig. 9b. The result is the plan
overlay shown in Fig. 10. Constructing this plan overlay is
inexpensive because it involves no search; it consists only of the
application of effect and perceive functions to "simulate" plan
execution.
Note that the fringe node of this conditional graph unexpectedly
contains state w. This is the unsound false goal case.
What else could have happened to state c? If a perceptual
contradiction were possible, then state f would have fallen off the
conditional plan overlay at the perceptual branch point because it
would have been consistent only with percept p3, which is not an
expected input during plan execution.
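Constructing the plan overlay amounts to simulating the plan with the unsimplified effect and perceive and collecting the fringe nodes; a false goal exists iff some fringe is not a subset of G. A sketch in our own encoding (plans as nested ("do", action, {percept: subplan}) tuples, with [] marking termination):

```python
# Plan overlay by simulation (our own encoding): no search is needed,
# only repeated application of effect and perceive along the plan.

def overlay(n, plan, effect, perceive):
    """Collect every fringe node execution of `plan` can end in."""
    if plan == []:                      # plan terminates at this node
        return [n]
    _, a, branches = plan
    m = effect(a, n)
    fringes = []
    for p, subplan in branches.items():
        child = perceive(p, m)          # states consistent with percept p
        if child:
            fringes.extend(overlay(child, subplan, effect, perceive))
    return fringes

def has_false_goal(I, plan, G, effect, perceive):
    """A false goal exists iff some overlay fringe is not a subset of G."""
    return any(not (f <= G) for f in overlay(I, plan, effect, perceive))
```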
4.7 Results
Soundness and completeness results follow if we allow
simplifying assumptions to be made only about the initial
conditions. Of course, since there may be multiple episodes of
planning and acting, there may be multiple points at which the
sel function can be invoked and therefore multiple "initial"
conditions.
Theorem 1 Algorithm 1 with strong sel and false goal detection
is sound and preserves goal reachability if seffect = effect and
sperceive = perceive.
The proof is actually two separate proofs: one for goal
reachability, and another for soundness. The goal reachability
proof is based on the following four assumptions:
Assumption 1: The planner used in Algorithm 1 step 2 is
sound and complete.
Assumption 2: The planner used in Algorithm 1 step 2 uses a
false goal detector (Definition 7).
Assumption 3: The sel function is consistent with the Strong
Selection constraint (Definition 5).
Assumption 4: State reachability implies state-set reachability
(Footnote 2).
The goal reachability proof uses induction on the number of
planning episodes j (i.e. the number of times step 2 of Algorithm
1 is executed).
The base case is the single planning episode case: j = 1. Since
the proof is about preservation of goal reachability, we are
interested only in the case when a conditional plan D does exist
from I to G. The proof involves showing that, during execution
of Algorithm 1, there is always a plan from each current state s to
the goal. Once this is shown, the state-set reachability follows by
Assumption 4. Each such s is the progression of some
state i from I. By case analysis, i is either (1) a member of
sel(I), or (2) a member of I − sel(I). In case (1), there must
be a path from s to the goal by Assumption 1. In case (2), there
is obviously a path from i to s; therefore, there must be a path
from s back to i (Definition 5; Assumption 3), and from i the plan D
guarantees a path to the goal.
The induction step is easy to see by showing that, at the end
of each planning and execution episode, the resultant state set
preserves goal reachability. Thus, the base-case argument applies
to every subsequent planning and execution episode.
The soundness proof, which is based on Assumption 2, is
extremely simple. The proof shows that, when Algorithm 1
terminates (step 7), the current node I corresponds to a state set
that is a subset of G. Step 7 only invokes termination upon plan
completion, in which case the current node I is equal to
some plan overlay fringe node F (Fig. 10). Definition 7
guarantees that all plan overlay fringe nodes of an executed plan
are subsets of G, and so the soundness proof is complete. Note
that soundness does not require seffect and sperceive to be equal
to effect and perceive, respectively.
Although algorithm 1 preserves goal reachability, it is not
necessarily complete. This is because of the potential for the
robot to enter a pathological loop consisting of mistaken
assumptions, partial execution, mistaken assumptions, and so on.
This can only occur if the robot's actions (as specified by effect)
are nondeterministic and the state-set contraction caused by
perceptual discrimination is overwhelmed by state-set expansion
caused by effectory uncertainty.
Theorem 2. Algorithm 1 with strong sel and false goal detection
is sound and complete if seffect = effect, sperceive = perceive,
and states(effect(a,n)) ≤ states(n).
The only additional step in this proof, beyond the proof of
Theorem 1, is to show that Algorithm 1 eventually terminates
under the additional assumption of deterministic actions. Under
Assumption 1, we know that the planner is complete; therefore,
the only source of incompleteness is the potential for the
number of planning episodes, j, to be unbounded. In fact, we can
show that j must be less than or equal to states(I). This uses the
fact that a planning episode can end prematurely only if a
perceptual contradiction occurs (Algorithm 1, step 6). In this
case, at least one member of the initial conditions I has been
shown to be incorrect, since states(sel(I)) must contain at least
one state. Therefore, the consistent subset K of I is a proper
subset of I, and the current node N must correspond to fewer
states than I, since states(N) ≤ states(K). Since the
corresponding state set shrinks by at least one state with every
planning and execution episode, it follows that there can be no
more than states(I) episodes.
5 Continuous Selection
The continuous selection architecture is born of a desire to
maximize the use of runtime perceptual inputs. We begin by
demonstrating that Algorithm 1 will, in some cases, fail to make
optimal use of runtime perception. Then, we describe an
algorithm that maximizes the use of runtime feedback by
replanning after every action execution. Finally, we present
soundness and completeness results for the continuous selection
architecture.
5.1 Maximizing the Use of Information
Three forms of information gain appear in Algorithm 1: (1) the
conditional planner branches based on possible perceptual inputs,
breaking one large problem into several smaller problems, each
of which is easier because of information gain; (2) the
assumptive architecture introduces "virtual" information gain by
making simplifying assumptions; and (3) the robot uses
execution-time perceptual inputs as a direct form of information
gain to both update and "narrow" I.
Given that Algorithm 1 depends to such a large extent on the
concept of information gain, it is surprising to note that the
algorithm does not make maximal use of perceptual inputs
during runtime. In particular, Algorithm 1 has the ability to
overcommit to a particular assumption, despite perceptual
feedback that could be used to change the assumption.
For example, assume that I = {a,b,c,d} for a robot whose sel
function is consistent with the following: sel({a,b,c,d,e}) = {a,b}
and sel({g,h,i}) = {g,h}. Fig. 11 depicts the scenario for
overcommitment in this situation.
The planner will use A = sel(I) = {a,b} for planning and will
update both I and A during execution. Suppose that during
execution the robot reaches node I_n, as shown in Fig. 11. The
shaded area indicates A_n, which is always subsumed by I_n.
If the robot were to reinvoke sel on node I_n, the result would
be {g,h} and not the singleton state set {h}. Algorithm 1 would
never conduct such a reselection unless and until A_n became
wholly empty because of contradictory perceptual inputs.
The basic idea is that selection and progression are not
necessarily commutative. If we index A and I temporally, the
problem can be expressed simply:
A_n ≠ sel(I_n)
5.2 Continuous Selection Architecture
We propose an alternative algorithm that favors the right side of
this inequality by reselecting and replanning continuously.
Algorithm 2: Continuous Selection
1. A = sel(I)
2. Construct a plan C from A to G using seffect and sperceive
3. Execute the next step of C
4. Update I using the action just executed, a: I = effect(a,I)
5. Update I using the current percept, p: I = perceive(p,I)
6. If I ⊆ G, return Success
7. Go to 1
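As a concrete illustration, the loop above can be sketched in Python. All helper signatures here (sel, plan, effect, perceive, world) are our own assumptions for the sketch, not the paper's implementation:

```python
def continuous_selection(I, G, sel, plan, effect, perceive, world, max_steps=100):
    """A sketch of Algorithm 2 (Continuous Selection).
      sel(I)         -> simplified state set A (the assumption)
      plan(A, G)     -> action list computed with seffect/sperceive
      effect(a, I)   -> progression of state set I by action a
      perceive(p, I) -> subset of I consistent with percept p
      world(a)       -> percept returned by actually executing a
    """
    for _ in range(max_steps):
        A = sel(I)                 # step 1: make the simplifying assumption
        C = plan(A, G)             # step 2: plan under the assumption
        if not C:
            return False, I        # no plan exists under the current assumption
        a = C[0]                   # step 3: execute only the next step
        p = world(a)
        I = effect(a, I)           # step 4: progress I by the action
        I = perceive(p, I)         # step 5: filter I by the percept
        if I <= G:                 # step 6: success test
            return True, I
    return False, I                # step 7 (goto 1) is the loop itself
```

In a toy corridor domain, each executed step both advances the true state and filters I, so the state set shrinks monotonically toward the goal set.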
Algorithm 2 will invoke the planner far more frequently than
Algorithm 1; this characteristic has both advantages and
disadvantages. On the one hand, the total conditional-planning
cost of Algorithm 2 may be much higher than that of Algorithm 1.
On the other hand, Algorithm 2 forces a tighter interleaving of
planning and execution. In domains with incomplete information,
where execution-time percepts are crucial, delaying planning can
make future planning episodes much less expensive. An
additional benefit of this high planning frequency is that only the
first step of a plan is ever used. This allows the strong selection
constraint that preserves goal reachability to be weakened
significantly:
Definition. Given a problem domain and I, sel is weak iff:
∀s ∈ states(I): [∃r (path_1(s,r) ∧ ¬path_1(r,s))] ⟹ s ∈ states(sel(I))
The intuition behind this weak selection constraint is that since
only a single step of a given plan is executed, the reversibility
horizon is never deeper than one step.
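A minimal sketch of how this condition could be checked on an explicit one-step reachability graph; the encoding of path_1 as a successor map is our own assumption:

```python
def is_weak_selection(states_I, selected, succ):
    """Check the weak-selection condition (a sketch).
    succ[s] is the set of states reachable from s in one action
    (i.e., path_1). If some one-step successor r of s cannot return
    to s in one step -- an irreversible first move -- then s must be
    retained in the selected set."""
    for s in states_I:
        irreversible = any(s not in succ.get(r, set())
                           for r in succ.get(s, set()))
        if irreversible and s not in selected:
            return False
    return True
```

In a fully reversible domain every one-step move can be undone, so any selection (even the empty one) satisfies the condition; only states with a potentially irreversible first step must be kept.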
Of course, this weakening of the selection constraint is not
free. In particular, the robot will fail to recognize deep, useless
plan sequences until it has wandered much further down such a
path at execution time. Therefore, weak selection admits the
potential for greater execution-time backtracking than strong
selection.
5.3 Results
Theorem 3. Algorithm 2 with weak sel and false goal detection is
sound and preserves goal reachability if seffect = effect and
sperceive = perceive.
The proof of Theorem 3 is almost identical to that of Theorem 1.
The difference is that only one step of a plan is ever executed.
Therefore, in proving the base case of the induction, the strong
selection criterion is required only to depth 1, which is precisely
weak selection.
An important advantage of Algorithm 2 involves the constant
depth of weak selection. Because of this fixed depth, the cost of
weak selection is constant and therefore, unlike strong selection,
independent of the solution search depth.
6 The Price of Assumptions
An exhaustive comparison of Algorithm 1 and Algorithm 2 is
very difficult because the relative costs depend on the actual
percepts encountered and the specific sel functions used. If the
two algorithms use identical sel functions (modulo strong and
weak selection), the initial node A of Algorithm 2 will be
subsumed by the initial node of Algorithm 1. Therefore, the first
planning episode of Algorithm 1 will always be of equal or
greater cost.
Although this analysis applies only to the first planning
episode, it is significant from an information-gain perspective.
Any ability to stop planning early and execute is an opportunity
to gain information through execution and thus simplify future
planning episodes.
A more quantitative cost comparison is possible between
assumptive planning in general and conditional advance
planning, which exhibits no interleaving of planning and
execution. We present two types of cost comparison between
conditional and assumptive planning: planning-complexity cost
and execution-length cost.
6.1 Planning Complexity
For the analysis of planning complexity we introduce two new
variables: l represents the search depth for the first assumptive
episode; and j refers to the total number of incorrect-assumption
episodes. Note that j is identical to the counterexample measure
defined by Rivest & Shapire (1993).
We have looked at the relative costs of conditional and assumptive
architectures in three classes of problem domains.
Double whammy represents a problem domain that is so
unforgiving that every step executed under an incorrect
assumption puts the robot two more steps away from the goal.
Furthermore, double whammy assumes that assumption errors
are not detected until the very last step of the conditional plan
being executed. Under these extreme conditions, the assumptive
approach is at a clear disadvantage because of the exponential
backtracking cost.
Ceiling distance captures what may appear to lie at the other
extreme: domains in which the robot cannot get further than a
certain ceiling, c, from the goal. In this case, l turns out to
dominate the total assumptive planning cost. A further intuition
behind this domain lies in the possibility that actions taken under
incorrect assumptions can still be benign and even useful.
In real-world domains, we have discovered that both of these
analyses can overestimate the assumptive planning cost.
One reason is that the probability distribution over the states in I
is usually not uniform. Therefore, the assumption made by sel
tends to be correct. In addition, execution under an invalid
assumption still tends to take the robot closer to the goal. In the
kind world case, we assume that every l actions executed by the
robot place it l steps closer to the goal.
6.2 Execution Length
The second type of comparison we make between conditional
advance planners and assumptive architectures involves the
number of steps actually executed. This comparison requires the
introduction of two new domain characteristics:
m represents the reversibility ratio of the specified domain. For
every action executed from a state s in the domain, m is an upper
bound on the number of actions required to return to state s.
n represents an upper bound on the number of actions executed
under an invalid assumption before the robot encounters an
assumption error.
Given I, m, and n we can express an upper bound for the
competitive ratio of the number of steps executed by the
conditional preplanner versus the number of steps executed by
an assumptive robot:
(|I| − 1) · m · n.
This is a weak bound in that it assumes the (unrealistic) worst-case
scenario in which every assumption error requires full
execution-time reversal. More commonly, assumptive execution
results in nonoptimal progress toward the goal, despite an invalid
assumption; therefore, m can be much smaller in practice than
the actual reversibility ratio.
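As an arithmetic illustration of this bound (the numeric values below are hypothetical, not from the paper):

```python
def execution_length_bound(size_I, m, n):
    """Worst-case competitive-ratio bound described above: up to
    (|I| - 1) incorrect assumptions, each costing at most n steps
    executed in error, with at most m reversal actions per step."""
    return (size_I - 1) * m * n
```

For instance, with |I| = 5 possible initial states, reversibility ratio m = 2, and at most n = 3 steps executed per invalid assumption, the bound evaluates to 24; with complete initial information (|I| = 1) it vanishes.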
Taken to the extreme, assumptive architecture optimality
occurs precisely when the effective value of m drops to zero,
indicating that the execution lengths of the assumptive and
conditional approaches converge. The relationship of such an
optimal assumptive plan to an optimal conditional plan is easy to
describe: the plan overlay of the optimal assumptive plan from I
is a subplan of the optimal conditional plan.
Surprisingly, if an assumptive architecture produces such
optimal assumptive plans at every planning episode, then the
robot as a whole will have optimal execution behavior. This
follows from the fact that both the assumptive architecture (i.e.
Algorithms 1 and 2) and the conditional planner progress I based
on perceptual inputs. Thus, the assumptive architecture will
encode the same information gain during execution that the
conditional planner uses for its conditional decision points during
planning.
These conditions may seem unrealistic in real-world problem
domains. Yet the next section describes an implementation that
has in fact resulted in optimal assumptive execution behavior.
7 The Robots of CS 224
CS 224 is a hands-on introduction to robot programming for
undergraduate and graduate students. The course covers the
theory of perception and action as well as pragmatic aspects of
robot programming, culminating in robots that plan and act in the
real world despite incomplete information. The assumptive
architecture is one of several interleaving techniques
implemented by the students midway through the course, after
several weeks of constructing low-level reactive systems.
All student assignments for CS 224 involve programming
four actual robots; the students never use simulated robots.
These Nomad 150 robots have sixteen sonars, arranged radially,
and motor encoders as their only inputs. The robots are fully
selfcontained, housing Macintosh Powerbooks running
Common Lisp as the development environment.
The robots' domain is a maze world built with three-foot-high,
40-inch-long cardboard walls. The position of the wall
segments is highly constrained. All maze walls are rectilinear
(that is, parallel to either the x-axis or the y-axis) and are located
on the boundaries of uniformly spaced grid lines that subdivide
the maze. The size of these mazes is limited only by the space
available. By the end of the quarter, mazes are typically 8 x 5
grid lines large, with 35 internal walls. Fig. 12 pictures one of
this year's robot teams developing robot navigation code in a
small maze.
The constrained nature of this maze world (most importantly,
the fact that walls must occupy the edges of cells) allows a clear
abstraction to a highly granular representation of world state. In
particular, one can specify robot position and orientation in terms
of a cell and a cardinal direction without losing reachability
information. This means that whenever a solution exists in the
maze world at the fine level of degrees and inches, then there
must be a solution at the higher level in terms of cells and the
four cardinal orientations.
Although this abstraction is the basis of the robots' high-level
planning systems, the students first have to implement reliable
abstract actions and percepts using the robot's much finer-grained
outputs and sensors. The students implement 16 high-level
percepts, corresponding to every possible configuration of
walls immediately surrounding the robot (i.e., walls on the edges
of the cell containing the robot).
Of course, a real robot is rarely aligned perfectly with the
maze world and centered exactly in a cell. The students must
design a sensor interpreter that is reliable despite reasonable
orientation and position errors. This is a nontrivial challenge
since sonar echoes that strike walls at shallow angles tend to
coherently bounce away, providing no depth information. Given
a robot with only 16 sonars, this leaves very few sonars with
valid readings.
The right set of high-level actions is easy to predict: move-forward
one cell, turn-left (90 degrees), and turn-right. The move-forward
action may make certain assumptions about the robot's
initial conditions: the degree to which the robot is centered and
aligned within a cell. But the action must also guarantee that, at
termination, the robot will fulfill those assumptions. The
students generally use bounds of ±5 inches from the cell center
and ±23 degrees from rotational alignment.
Once the students have robust high-level actions and
percepts, they go on to design an assumptive architecture for
travelling from a known initial position in a maze to a specified
goal position. The challenge is that the robot has no knowledge
about the positions of the internal walls of the maze. The
number of possible initial conditions grows quickly with the size
of the maze. For a 4 x 5 maze, the number of possible initial
configurations is slightly larger than 2^27.
The assumptive approach is an excellent candidate in this
"safe" world that has far too many possible initial conditions.
But even the assumptive approach requires keeping track of I,
not just A. Explicitly enumerating the initial state set is
obviously not an option. Most teams choose a property-based
representation in which the set of possible states is partitioned
based on the existence (or lack) of a wall on a particular cell
edge. This is an extremely cheap representation method since a
placeholder for each cell edge suffices, indicating whether the
existence of a wall at that location is known or unknown. As the
robot moves through the world, it adds properties because it
gains information about the existence of walls at particular cell
edges, monotonically decreasing the number of possible maze
configurations.
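The representation described above can be sketched as follows; the class and edge encoding are our own illustration, not the students' Lisp code:

```python
# Property-based belief about the maze: an edge is "known" once a
# wall's presence or absence there has been observed; every still-unknown
# edge doubles the number of consistent maze configurations.
class MazeBelief:
    def __init__(self, all_edges):
        self.all_edges = set(all_edges)
        self.known = {}                 # edge -> True (wall) / False (no wall)

    def observe(self, edge, has_wall):
        """Add a property about one cell edge; information only accumulates."""
        self.known[edge] = has_wall

    def num_configurations(self):
        """Number of maze configurations consistent with current knowledge."""
        unknown = len(self.all_edges) - len(self.known)
        return 2 ** unknown
```

Each percept monotonically shrinks the set of possible mazes, mirroring the paper's observation that information gain only narrows I.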
Note that this property-based representation, although
practical, gives up some degree of expressibility. With an
explicit state-set enumeration, one could express the condition
that there is either one wall at position x1 OR one wall at position
x2. This type of disjunction is impossible in the above
representation because the properties of cell edges are taken to be
independent. The robot's position is justifiably orthogonal to the
maze configuration; therefore, it is specified as a separate tuple. In the case of incomplete information about
robot position, one simply uses a list of such tuples.
The perceive function is easy to implement as a Lisp function
that takes a node and a high-level percept and returns a new node
in which the existence of walls at cell edges adjacent to the robot
is known. This maps, in the state-set formalism, to a function
that filters a set of states, returning only the subset that is
consistent with the current percept.
The effect function simply changes the orientation of the
robot in the case of turn-left and turn-right. But when called with
move-forward, effect may progress the robot to the next cell if
there is no wall blocking that path, or it may keep the robot at its
current position if there is such a blockade.
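Under an assumed (x, y, heading) state encoding of our own, perceive and the move-forward effect for the maze world might look like:

```python
# Hypothetical state encoding: (x, y, heading) tuples. These names and
# signatures are a sketch, not the students' Lisp implementation.
DELTA = {'N': (0, 1), 'E': (1, 0), 'S': (0, -1), 'W': (-1, 0)}

def perceive(observed, model, states):
    """State-set filter: keep only states whose predicted percept
    (model(s)) matches the percept actually observed."""
    return {s for s in states if model(s) == observed}

def effect_forward(states, wall_ahead):
    """Move-forward effect: each state advances one cell unless
    wall_ahead(s) reports a blocking wall, in which case it stays put."""
    out = set()
    for (x, y, h) in states:
        dx, dy = DELTA[h]
        out.add((x, y, h) if wall_ahead((x, y, h)) else (x + dx, y + dy, h))
    return out
```

Applying effect_forward then perceive to a state set reproduces, in miniature, steps 4 and 5 of the interleaving loop.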
A final issue is the specification of the sel function. The
most commonly used sel function is one that chooses a node
representing complete information (i.e. a singleton state set) in
which the number of internal walls is minimized to only those
walls that must exist because they have been encountered.
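A sketch of this wall-minimizing sel under a hypothetical edge encoding (all names here are illustrative):

```python
def sel_minimal_walls(robot_state, known, all_edges):
    """Wall-minimizing selection: commit to the single complete world
    (a singleton state set) in which walls exist only where they are
    already known to exist, i.e., where they have been encountered."""
    walls = frozenset(e for e in all_edges if known.get(e) is True)
    return (robot_state, walls)   # complete information: one world state
```

The returned pair is a fully specified maze plus robot pose, so the planner can run as if it had complete information.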
Fig. 13 contrasts the actual maze used to test students'
assumptive solutions (13a) with the maze information given to
the robots at the beginning of each robot's run (13b).
Table 2 compares the properties of the search space of a
conditional planner for the particular maze in Fig.
13 with the search space of the assumptive solution described
below.

A surprising result is that, for this specific problem, the
assumptive Algorithm 1 with the wallminimizing sel function is
optimal. The optimal conditional plan must include a trace of
execution that moves the robot from its current position straight
to the goal since such a path may exist according to the initial
conditions. The assumptive architecture is indeed generating this
path; therefore, an overlay of its action sequence matches this
conditional trace perfectly.
Of course, the total planning cost of the assumptive approach
is trivial compared to the execution time, while the total planning
cost of conditional advance planning is 4^k to 16^k for a k-step
solution. The students' assumptive robots in the maze of Fig. 13
planned for a total of less than 1 second and typically
achieved the goal position within 9 minutes, after executing 18 to
27 actions.
8 A Whirling Dervish
The 1994 AAAI National Robot Competition consisted of two
events, the first of which challenged robots to reliably navigate a
simulated office environment. The office environment was built
with partition walls similar to those used to create cubicles in
modern offices. The building also contained real-world
furniture, such as chairs, tables, and trash cans. Additionally,
some hallways or foyers would be filled with large boxes or
clusters of furniture, making them impassable.
The robot would be given a topological map of the environment;
that is, purely nonmetric information about the connectivity of
rooms and hallways. Fig. 14 geometrically depicts a segment of
an office environment (14a) and the corresponding topological
map (14b).
The robot would start in a known initial room and was
required to travel to a goal room specified at the beginning of the
run. The challenge was to reach the goal room reliably in spite
of obstacles that render the supplied topological map inaccurate.
We chose to enter this contest because it provided us with an
excellent motivation for applying lessons learned from outdoor
navigation to a problem we had avoided: indoor office
navigation. Like the students of CS 224, we would avoid
simulation altogether. This meant developing and testing our
robot in several office buildings at Stanford University despite
the fact that the contest would be in a "simulated" office building
constructed with plastic partitions.
Dervish's hardware was born out of the modification of a
four-year-old Nomad 100 mobile robot. We removed infrared
and tactile (bumper) sensors, then rearranged the Nomad's
sonars, which were initially positioned in a traditional, radial
16-sonar ring. The sonars were repositioned in two ways: clusters of
sonars pointed in approximately the same direction, providing
redundancy and therefore more reliable free-space information;
and clusters of sonars were pointed up and down at radical
angles in order to detect ledges and tabletops that might decapitate
Dervish (Nourbakhsh, 1995).
The issue of representation proved to be more challenging in
this domain than in the world of CS 224. Here, there is no
regular breakdown of the world into cells, with obstacles
appearing only at the edges of cells. The strength of the CS 224
approach was that the representation specified world state at the
same level of granularity at which goals were specified and also
at the same level at which obstacles affected maze connectivity.
This same desirable quality is captured in Dervish's world by
choosing a representation at the same level of abstraction as the
topological map. The difference between this approach and that
of CS 224 is that Dervish's cells cannot be constrained to a fixed
size. Fig. 15 demonstrates the manner in which our
representation effectively quantizes the geometric world of Fig.
14(a) using the topological information of 14(b).
Unlike in CS 224's maze world, obstacles can appear anywhere in
an office environment. But the point of quantizing space into cells is to
ignore geometric position at any finer level of granularity than
the cells themselves. The solution is to represent obstacles as
being on cell edges despite the inherent inaccuracy. This is
acceptable because there is no need to model obstacles at a finer
level of detail than that of changing the connectivity of adjacent
cells. Of course, one must take care to place the obstacle at the
correct edge. For instance, an obstacle at cell 23 in Fig. 15 can
block either the path from 2 to 23 or the path from 3 to 23.
Having made this simplification, we are able to represent
nodes in the office environment exactly as in CS 224: as a set of
explicit robot positions (in terms of cell and orientation) and
explicit positions of known blockades.
Given a representation identical to that of CS 224, one would
rightly expect the high-level actions and percepts to be similar, as
they indeed are: Dervish's actions are move-forward, turn-left,
and turn-right. Move-forward is a more complex version of the
action written for CS 224: it is charged with maintaining the
hallway centerline, avoiding small obstacles, discovering
impassable obstacles, and preserving a continuous sense of
hallway width and orientation.
The high-level percepts Dervish uses are more complex and
much more difficult to implement than those used in CS 224.
Dervish detects hallways, closed doors, and open doors on each
side as it moves down a hallway. The percept-generating code
makes extensive use of histories of sensor data, comparing
expected sonar readings, based on hallway dimensions and
encoder counts, with actual sonar data (Nourbakhsh, 1995;
Nourbakhsh et al., 1995).
The reliability of these percepts is crucial because the
termination of the move-forward action depends upon perceptual
"events." Dervish's primary action is hallway-follow, and this
action must remain in force until the robot reaches an intersection
where it will turn. But Dervish can receive a variety of
percepts over the course of a single move-forward episode as a
result of perceptual false positives. There is a temporal
mismatch between the frequency of new perceptual inputs and
the duration of actions.
With this mismatch in mind, we combined perceive and effect
(i.e., steps 4 and 5 in Algorithm 2) into a single method for
updating the robot's state set using the current percept. We call
this approach state-set progression. As Dervish moves down a
hall, whenever it discerns a percept it updates the state set by
progressing each state in the state set based on the percept and
the current action.
Progression involves two steps: (1) removal of a state and
replacement with all possible subsequent states, based on the
direction of travel (i.e., nondeterministic effect); and (2)
attribution of a likelihood to each new state in the state set based
on the current percept (i.e., perceive with imperfect
discrimination).
To accomplish the computation of perceive, we include a
certainty matrix that associates a likelihood with every real-world
feature and possible perceptual input. For example, in the
Computer Science Department at Stanford, Dervish has a 60%
chance of detecting a closed door correctly and a 40% chance of
mistaking the closed door for a wall.
The representation thus associates a likelihood with every
robot position; the node is literally represented as a list of robot
position-likelihood pairs and a list of blockade positions.
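A sketch of this likelihood-weighted progression; the 0.6/0.4 closed-door figures come from the text, while the data layout and helper names are our own:

```python
# Certainty matrix: P(percept | real-world feature). The closed-door
# probabilities are from the text; the rest of the layout is illustrative.
CERTAINTY = {
    ('closed_door', 'closed_door'): 0.6,   # closed door seen correctly
    ('closed_door', 'wall'):        0.4,   # closed door mistaken for a wall
    ('wall', 'wall'):               1.0,
}

def progress(states, successors, feature_at, percept):
    """State-set progression: (1) replace each state with its possible
    successors (nondeterministic effect), carrying its likelihood;
    (2) weight each successor by the certainty of observing `percept`
    given the feature at that position (imperfect perceive)."""
    new = {}
    for s, likelihood in states.items():
        for s2 in successors(s):
            w = CERTAINTY.get((feature_at(s2), percept), 0.0)
            if w > 0.0:
                new[s2] = new.get(s2, 0.0) + likelihood * w
    return new
```

Positions whose predicted feature is inconsistent with the percept receive zero weight and drop out, while ambiguous positions survive with reduced likelihood.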
Dervish uses Continuous Selection (Algorithm 2) to reach the
goal from its specified initial conditions. The planning in step 2
is performed using an sperceive and seffect that are much more
reliable than perceive and effect: sperceive models perfect
sensors and, in the same vein, seffect models deterministic
actions (i.e., move-forward is assumed always to terminate
successfully at the next node).
Dervish's sel function chooses a singleton state set,
corresponding to complete information, in which the robot's
position is simply the position in the state set with the highest
associated likelihood. Furthermore, the world's connectivity is
maximized by assuming that there are no blockades except for
those directly sensed, identical to the internal maze walls of CS
224.
Table 3 compares Dervish's assumptive performance in the actual
AAAI contest finals maze with the theoretical performance of a
conditional advance-planner. Note that although the total
number of states is much smaller in this domain than in the
CS 224 maze world, the conditional advance-planning approach
is prohibitively costly because of the solution length required
when planning with perceive and effect rather than sperceive and
seffect.
In its final contest run, Dervish encountered one closed
door to the goal room, a blocked hallway, and three
false-positive percepts, and nevertheless executed correctly for
approximately 15 minutes to reach the goal room through
another entrance.
Even though Dervish replanned frequently, its planning time
was insignificant compared to its execution time. Recall that the
sel function forced the planner's initial condition to be one with
complete information and that seffect and sperceive are
deterministic and perfectly reliable, thus resulting in very low
planning costs. Yet Dervish's behavior in the final run was
(coincidentally) identical to the optimal behavior we would
expect from a complete conditional plan produced offline. As
with CS 224, the office building domain is a kind world:
incorrect assumptions are generally discovered at execution time
before the invalid assumption leads to suboptimal execution
behavior.
9 Discussion
The assumptive architecture enables interleaving of planning and
execution by planning with an unrealistic pretense of
information. Much of the recent work on interleaving planning
and execution falls in a separate category called deferred
planning, in which the planner terminates search before reaching
the goal conditions. Several researchers have developed
domain-specific heuristics for deferred planning that are based on
information gain.
Olawsky et al. provide methods for terminating search at
plan fringes that condition based on sensory input (Olawsky,
Krebsbach & Gini, 1993). The resulting domain-specific
heuristics allow nonoptimal but correct interleaving in both fully
reversible domains and more complex domains in which specific
heuristics are generated offline to avoid catastrophic early
execution.
Dean et al. introduce an interleaving system for a mobile-robot
navigation problem (Dean et al., 1990). Again, early
planning termination occurs when the planner discovers actions
that provide execution-time information gain (i.e., robot position
information). The termination heuristic presented is limited to
reversible domains in which information gain is always the right
option to choose; however, no formal analysis of the method is
presented.
Durfee & Lesser present a deferred planning system that
has a subtle assumptive element (Durfee & Lesser, 1988). Their
control architecture forms a particular hypothesis about an
object's trajectory, acting in accordance with that hypothesis until
receiving information to the contrary. The method is a special
case of the manner in which assumptive algorithms detect
assumption errors and then consistently modify the assumptive
set as a result.
Two domain-independent analyses of deferred planning and
execution are presented by Ambros-Ingerson & Steel and
Genesereth & Nourbakhsh (Ambros-Ingerson & Steel, 1988;
Genesereth & Nourbakhsh, 1993). Ambros-Ingerson & Steel
present two methods for interleaving. (1) Variable instantiation
in a STRIPS-based representation is seen as information gain and
is a point for planning deferral and early execution. (2) Runtime
errors cause reinvocation of the planner, as required, to modify
the intended path to the goal. This criterion is useful only when
the executable actions are in fact less reliable than the operator
descriptions used during planning. This method is therefore
similar to the assumptive system's use of seffect.
Although Ambros-Ingerson & Steel incorporate elements of
both deferred and assumptive planning in their architecture, they
do not support conditional planning. In problem domains
requiring plan-time conditioning because of high degrees of
incomplete information and irreversibility, this architecture can
prove to be incomplete.
Interleaving planning and execution is just one way of
addressing a more fundamental problem: the synthesis of robot
systems that perform capably in real-world domains with
incomplete information. Another class of solutions to this
problem involves the offline design of situated robots, avoiding
the runtime planning problem altogether
(Nilsson, 1992; Brooks, 1986; Firby, 1987; Slack, 1993). These
methods have demonstrated success when the offline design is
performed carefully and the domain is wellknown.
Yet, this approach ignores the potential
value added in highlevel planning. Such planning can imbue a
robot with more goaldirected behavior and can be crucial in
preserving goal reachability, especially in domains with
irreversibility and incomplete information.
A natural progression is to create systems that plan, or
design, situated robot systems. Both Kaelbling & Rosenschein
and Schoppers have presented compilers that produce, as output,
a universal plan mapping the robot's information states to actions
(Kaelbling & Rosenschein, 1990; Schoppers, 1987). In both
cases, knowledge state is represented in terms that map
semantically to state sets, making the representations instances
of the "nodes" defined here. However, the above work
requires complete universal plans to be generated before
execution begins. In contrast, the assumptive architecture takes
advantage of information gain during early execution to plan
only for cases that the robot actually encounters.
Yet another class of solutions involves the use of planners
and planning schedulers that reason about the availability of
limited resources, and thus about the value of planning versus
executing. Dean & Boddy present a formal framework for the
anytime planning approach and Zilberstein & Russell have
implemented such an anytime system to solve a mobile robot
problem (Dean & Boddy, 1988; Zilberstein & Russell, 1993).
Anytime planning is not a solution in itself, however. An
advance-planning system using an anytime planner still fails to
interleave planning and execution. Zilberstein & Russell create
an interleaving system by allowing multiple episodes of anytime
planning. Ironically, the anytime planner requires domain
information about the relationship of plan quality to planning
time despite the fact that the implementation is motivated by a
need to plan and act in cases of incomplete domain information.
Drummond has made progress in the construction of a planner
that fulfills the anytime requirement of monotonic plan quality
improvement while specifying an approach that is well-suited to
robot domains (Drummond, 1993).
In another research arena, the finite-automata learning
community has provided the most complete framework
concerning the interaction of a robot with its environment
in cases of incomplete information. Rivest & Schapire formalize
both the interaction paradigm involving
action and perception and the related concepts of perceptual
aliasing and state uncertainty (Rivest & Schapire, 1993).
Furthermore, they present expectation-tracking mechanisms and
the subsequent discovery of perceptual counterexamples (or
assumption errors). Both Shen and McCallum have described
algorithms for learning finite automata by assuming
simple versions of those automata and recognizing
assumption errors during execution (Shen, 1993; McCallum, 1995).
These methods are restricted to environments whose state-space
diagrams are strongly connected and, thus, devoid of irreversibility.
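The strong-connectivity restriction mentioned above is easy to state operationally: every state must be able to reach every other state, so no action is irreversible. As a rough illustration (our own sketch, not part of the cited work), the property can be checked with two reachability searches over a finite state-space graph; the `successors` adjacency function here is a hypothetical placeholder.

```python
def is_strongly_connected(states, successors):
    """Return True iff every state can reach every other state,
    i.e. the state-space graph contains no irreversible actions."""
    def reachable(start, succ):
        # Depth-first search collecting everything reachable from start.
        seen, stack = {start}, [start]
        while stack:
            s = stack.pop()
            for t in succ(s):
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
        return seen

    states = list(states)
    if not states:
        return True
    root = states[0]
    # Every state must be reachable from the root...
    if reachable(root, successors) != set(states):
        return False
    # ...and the root must be reachable from every state
    # (search the reversed graph).
    preds = {s: [] for s in states}
    for s in states:
        for t in successors(s):
            preds[t].append(s)
    return reachable(root, preds.__getitem__) == set(states)
```

A domain that fails this test contains at least one irreversible transition, which is exactly the situation the present work targets.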
In contrast, the present work has the aim of applying simplifying
assumptions to dangerous domains using selection criteria that
preserve goal reachability.
A research topic that has not been addressed is the automatic
synthesis of assumptions that simplify planning while imposing
little or no execution-time penalty. For example, one can
imagine an analysis of the stateset graph of a domain in order to
uncover sel rules that are safe and useful. Another approach may
be to learn sel functions for a particular domain using multiple,
simulated execution runs. Until such automated sel design is
possible, the robot designer shoulders the burden of considering a
domain's specific properties in choosing an assumptive selection
strategy that allows frequent but safe early planning termination
and execution.
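As a concrete illustration of the strategy discussed above, the control loop of an assumptive system can be sketched as follows. This is a minimal sketch of our own, not the paper's algorithm: every callable (`plan`, `execute`, `violated`, `goal_reached`) is a hypothetical placeholder, with `sel` standing in for the selection function named in the text. Note that preserving goal reachability, and hence termination, rests entirely on `sel` choosing safe assumptions; the loop itself guarantees neither.

```python
def assumptive_execute(stateset, sel, plan, execute, violated, goal_reached):
    """Skeleton of an assumptive loop: repeatedly adopt a simplifying
    assumption, plan for the assumed state set only, and execute until
    the goal is reached or the assumption is contradicted."""
    while not goal_reached(stateset):
        assumed = sel(stateset)        # simplifying assumption: a subset
        for step in plan(assumed):     # planning is cheap on the small set
            stateset = execute(step, stateset)   # track the true state set
            if violated(assumed, stateset):
                break                  # assumption error: replan from here
    return stateset
```

The payoff is that planning terminates early and often: each call to `plan` sees only the small assumed state set, while execution-time tracking of the full state set catches any assumption error and triggers replanning.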
10 Conclusion
We have presented a formal framework for explicitly
representing common types of incomplete information. The
assumptive algorithms presented use this formalism in a
representation- and domain-independent manner to effect the
interleaving of planning and execution. The algorithms make no
assumptions about reachability and connectedness in the state
space graph. These interleaving algorithms are unusual in that
they are sound and complete under clearly specified conditions; this
is a property that cannot be met by architectures that do not support
general conditional planning. Finally, both algorithms
have been demonstrated on real robots operating in nontrivial
domains.
The ultimate goal of robotics is to produce autonomous
robots that interact usefully with the real world. Such robots
will inherit from all of their ancestors: reactive, deferred,
assumptive and anytime systems. The assumptive architecture
makes its contribution by providing a means for behaving
safely and deliberately in spite of incomplete information.
Acknowledgements
We would like to thank Lise Getoor, Steve Ketchpel and David
Andre for insightful comments and questions on various versions
of this paper. The anonymous referees also deserve our gratitude
as their comments have led to a much improved paper.
References
Ambros-Ingerson, J. and Steel, S. 1988. Integrating Planning,
Execution and Monitoring. In Proceedings of the Seventh
National Conference on Artificial Intelligence, pp. 83-88.
Brooks, R. 1986. A robust layered control system for a mobile
robot. IEEE Journal of Robotics and Automation, 2:14-23.
Dean, T., Basye, K., Chekaluk, R., Hyun, S., Lejter, M. and
Randazza, M. 1990. Coping with Uncertainty in a Control
System for Navigation and Exploration. In Proceedings of the
Eighth National Conference on Artificial Intelligence, pp.
1010-1015.
Dean, T. and Boddy, M. 1988. An Analysis of Time-Dependent
Planning. In Proceedings of the Seventh National Conference
on Artificial Intelligence, pp. 49-54.
Doyle, R.J., Atkinson, D.J., and Doshi, R.S. 1986. Generating
Perception Requests and Expectations to Verify the Execution
of Plans. In Proceedings of the Fifth National Conference on
Artificial Intelligence.
Drummond, M. 1993. Reaction-First Search. In Proceedings of
the International Joint Conference on Artificial Intelligence,
pp. 1408-14.
Drummond, M. 1985. Refining and Extending the Procedural
Net. In Proceedings of the Ninth International Joint
Conference on Artificial Intelligence, pp. 10101012.
Durfee, E. and Lesser, V. 1988. Incremental Planning to Control
a Time-Constrained Blackboard-Based Problem Solver. IEEE
Transactions on Aerospace and Electronic Systems. 24(5):
647-62.
Etzioni, O., Hanks, S., Weld, D., Draper, D., Lesh, N. and
Williamson, M. 1992. An Approach to Planning with
Incomplete Information. In Proceedings of the 3rd
International Conference on Principles of Knowledge
Representation and Reasoning, pp. 115-125.
Fikes, R. 1971. Monitored Execution of Robot Plans Produced
by STRIPS. Technical Note 55, Stanford Research Institute.
Fikes, R. and Nilsson, N. 1971. STRIPS: A New Approach to the
Application of Theorem Proving to Problem Solving. Artificial
Intelligence, 2:189-208.
Firby, R.J. 1987. An Investigation into Reactive Planning in
Complex Domains. In Proceedings of the Sixth National
Conference on Artificial Intelligence, pp. 202-206.
Gat, E. 1992. Integrating Planning and Reacting in a
Heterogeneous Asynchronous Architecture for Controlling
Real-World Mobile Robots. In Proceedings of the Tenth
National Conference on Artificial Intelligence.
Genesereth, M. and Nourbakhsh, I. 1993. Timesaving Tips for
Problem Solving with Incomplete Information. In Proceedings
of the 11th National Conference on Artificial Intelligence, pp.
724-730.
Genesereth, M. 1993. Discrete Systems. Course notes for CS
222. Stanford, CA: Stanford University.
Hsu, J. 1990. Partial Planning with Incomplete Information.
Presented at the AAAI Spring Symposium on Planning in
Uncertain, Unpredictable, or Changing Environments, 27-29
March, Stanford, California.
Kaelbling, L.P. and Rosenschein, S.J. 1990. Action and planning
in embedded agents. Robotics and Autonomous Systems. 6:35-48.
Latombe, J.C., Lazanas, A. and Shekhar, S. 1991. Robot
Motion Planning with Uncertainty in Control and Reasoning.
Artificial Intelligence 52(1): 1-47.
McCallum, R.A. 1995. Instance-Based Utile Distinctions for
Reinforcement Learning with Hidden State. In Proceedings of
the XII International Conference on Machine Learning, pp.
387395.
Nilsson, N. 1992. Toward Agent Programs with Circuit
Semantics. Technical Report CS-TR-92-1412. Stanford,
California: Stanford University.
Nourbakhsh, I., Powers, R. and Birchfield, S. 1995. Dervish, An
Office-Navigating Robot. AI Magazine 16(2).
Nourbakhsh, I. 1995. The Sonars of Dervish. The Robot
Practitioner 1(4).
Olawsky, D., Krebsbach, K., and Gini, M. 1993. An Analysis of
Sensor-Based Task Planning. Technical Report #93-94.
Minneapolis, Minn.: University of Minnesota.
Peot, M. and Smith, D. 1992. Conditional Nonlinear Planning.
In Proceedings of the First Conference on AI Planning
Systems, pp. 189-197.
Rivest, R.L. and Schapire, R.E. 1993. Inference of Finite Automata
Using Homing Sequences. Information and Computation.
103:299-347.
Rosenschein, S.J. and Kaelbling, L.P. 1995. A situated view of
representation and control. Artificial Intelligence. 73:149-173.
Sacerdoti, E. 1974. Planning in a hierarchy of abstraction spaces.
Artificial Intelligence, 5:115-135.
Schoppers, M.J. 1987. Universal Plans for Reactive Robots in
Unpredictable Environments. In Proceedings of the
International Joint Conference on Artificial Intelligence, pp.
1039-1046.
Shen, W.M. 1993. Learning Finite Automata Using Local
Distinguishing Experiments. In Proceedings of the Thirteenth
International Joint Conference on Artificial Intelligence, pp.
1088-1093.
Slack, M. 1993. Navigation Templates: Mediating Qualitative
Guidance and Quantitative Control in Mobile Robots. IEEE
Transactions on Systems, Man, and Cybernetics 23(2): 452-466.
Takeda, H. and Latombe, J.C. 1994. Planning the motions of a
mobile robot in a sensory uncertainty field. IEEE Transactions
on Pattern Analysis and Machine Intelligence. 14(10): 1002-17.
Warren, D. 1976. Generating Conditional Plans and Programs.
In Proceedings of the AISB Summer Conference, pp. 344-354.
Zilberstein, S. and Russell, S. 1993. Anytime Sensing, Planning
and Action: A Practical Model for Robot Control. In
Proceedings of the International Joint Conference on Artificial
Intelligence, pp. 1402-07.
Footnotes
Affiliation of the Authors:
Computer Science Department, Stanford University, Stanford,
CA.
Illah Nourbakhsh: doctoral candidate in Computer Science
Michael Genesereth: associate professor of Computer Science
1 In the case of maintenance goals, the problem can often be
expressed as first transforming the initial world state to a member
of G, then guaranteeing that all successive transitions are only to
other states in G.
2 A more complex version of this problem involves stateset
unreachability, in which the uncertainty inherent in a
combination of states prevents the existence of any solution plan.
The same type of analysis applies to this more subtle problem.