Assumptive Planning and Execution: a Simple, Working Robot Architecture

Illah R. Nourbakhsh
illah@cs.stanford.edu
Department of Computer Science, Stanford University, Stanford, CA 94305

Michael R. Genesereth
genesereth@cs.stanford.edu
Department of Computer Science, Stanford University, Stanford, CA 94305

Abstract. In this paper, we present a simple approach to interleaving planning and execution based on the idea of making simplifying assumptions. Since assumptions can be tragically wrong, this assumptive approach must ensure both that the robot does not believe it has solved a problem when it has not and that it does not take steps that make a problem unsolvable. We present an assumptive algorithm that preserves goal-reachability, and in addition we specify conditions under which the assumptive architecture is sound and complete. We have successfully implemented the assumptive architecture on several real-world robots. Students in an introductory robot lab at Stanford University implement an assumptive system on robots that have incomplete information about their maze world. Dervish, our winning entry in the 1994 AAAI National Robot Competition, implements an assumptive architecture to cope with partially specified environments and unreliable effectors and sensors.

Keywords: interleaving planning and execution, mobile robotics, assumptions, robot architecture, planning with incomplete information

Figure captions:
Fig. 1. The interaction paradigm between the robot and the environment.
Fig. 2. State-set transitions using (a) actions and (b) percepts.
Fig. 3. Property sets and corresponding sets of possible world states.
Fig. 4. Positional uncertainty for a mobile robot system.
Fig. 5. A conditional graph segment and an embedded conditional plan (emphasized).
Fig. 6. An initial uncertainty region (a) and its simplification (b).
Fig. 7. Arc transitions corresponding to (a) effect and (b) seffect.
Fig. 8. Arc transitions corresponding to (a) perceive and (b) sperceive.
Fig. 9. A conditional plan (a) displayed graphically and (b) as a Lisp program.
Fig. 10. The overlay of the conditional plan from Fig. 9 using the actual initial conditions I.
Fig. 11. An abstract example of the overcommitment problem.
Fig. 12. CS 224 students developing navigation code in a small maze.
Fig. 13. The CS 224 assumptive student evaluation maze.
Fig. 14. An office environment represented (a) geometrically and (b) as a topological map.
Fig. 15. A quantization of the office environment of Fig. 14 based on connectivity.

Table titles:
Table 1. A comparison of the cost of conditional and assumptive planning.
Table 2. A comparison of the conditional and assumptive search spaces for CS 224.
Table 3. A comparison of the conditional and assumptive search spaces for the final contest at the AAAI 1994 National Robot Competition.

Contact author information:
Illah R. Nourbakhsh
415-725-8086 office
415-497-0511 home
415-725-7411 fax
Computer Science Department
Stanford University
Stanford, CA 94305
electronic mail: illah@cs.stanford.edu

1 Introduction

We plan and act in the world with the help of assumptions that enable us to ignore many absurdly complex worst-case scenarios. Yet we often program mobile robots to create complete plans from initial conditions that detail every conceivable case. In domains with pathological but improbable possibilities, this planning task quickly becomes intractable.
Rather than forgoing planning because of the complexity of complete planning under these conditions, we propose that the robot use assumptions to make the planning task more tractable. When such simplifying assumptions are correct, the robot will reach the goal after executing the resultant plan. But if assumptions are chosen naively and turn out to be invalid, then the robot's actions may be disastrous. In this paper we present an architecture that incorporates simplifying assumptions, yet preserves the robot's ability to reach the goal even when the assumptions are wrong. This assumptive architecture is one method for interleaving planning and execution.

Interleaved planning and execution was first introduced in the context of plan execution monitoring (Fikes, 1971). This work used the robot's run-time inputs to modify plan execution; however, the replanning was limited to modifying the program counter's position with respect to a fixed sequence of actions. Sacerdoti outlined the use of multiple episodes of execution preceding completion of the planning task, offering a paradigm that embodies the conventional view of interleaving planning and execution (Sacerdoti, 1974). A number of recent approaches to interleaving planning and execution have involved cutting a planning session short and executing early so that the robot can receive sensory feedback from the environment as early as possible. The resulting information gain can then be used to simplify the remaining planning task (Ambros-Ingerson & Steel, 1988; Dean et al., 1990; Olawsky et al., 1993; Genesereth & Nourbakhsh, 1993). The present work expands upon current research in this area by providing a formal framework for expressing incomplete information and presenting sound and complete algorithms for interleaving, even in domains with irreversibilities. After presenting the formal framework in section 2, we describe two assumptive algorithms in detail and compare their computational costs to those of a classical advance-planning system. Finally, we describe two successful implementations of the assumptive architecture on real-world robots in nontrivial domains.

2 A Representation for Action and Perception

We begin by describing a formal model that expresses complete information about the interaction of a robot with its environment. We then show how common types of incomplete information can be formally represented within this model. Finally, we demonstrate the representation-independent nature of the model by summarizing two instantiations using different representations.

2.1 A Model of Complete Information

A robot is an agent with one (aggregate) input line and one (aggregate) output line. We represent the set of possible robot input vectors, or percepts, as P. The robot's set of possible outputs, or actions, is O. We refer to a particular action (i.e., a member of O) as a, and to a particular percept as p. The robot's behavior at each point in time may be viewed as a function from a history of percepts to an action: p* → a. As shown in Fig. 1, the robot is connected to its environment through its percepts and actions. The robot's outputs are inputs to the environment, which we assume is otherwise closed. The environment's output is coupled to the robot's input, allowing the environment's state s to determine the robot's perceptual input vector, or percept. A percept is a complete description of the robot's input. It corresponds to a vector of the values of every input line.
This model of perception contrasts with the common view of robot perception as an environmental event or a perceptual "process" (Etzioni et al., 1992; Olawsky et al., 1993). In our model, a percept is a passive input rather than the result of a dedicated perceptual action. We view a perceptual "process" as a sequence of actions that transform the state of the environment such that the robot's subsequent percept conveys meaningful information (Rivest and Schapire, 1993; Rosenschein and Kaelbling, 1995). We formally define the relationship between the robot's percepts and actions and the states S of the environment using two functions:

result(a,s) ∈ S
see(s) ∈ P

Result maps an action and a state into the state that results from the application of the given action in the given state. See maps each environment state into its corresponding percept. In the complete information case, the robot's goal can be expressed as the transformation of an initial state i into any member of a set of desirable states, G.[footnote 1] Formally, a robot will achieve its goal when execution of its plan from i eventually leads to a state in G. We express this concept by temporally indexing the world state:

s_1 = i
∃n . s_n ∈ G

The above formalism describes the complete relationship between the robot and the world. Unfortunately, robots do not always have such complete information.

2.2 A Model of Incomplete Information

Incomplete information commonly occurs in three places: initial conditions, actions, and percepts. For each type of incomplete information, the model is modified to allow partial specification of state and behavior. Incomplete information about initial conditions involves partial specification of the initial world state, in which case the robot must plan and act to reach the goal from any initial state consistent with the specification. We can partially specify the initial conditions using an initial state set, I, rather than an initial state i. The state set denotes the logical disjunction of its member states. The concept of goal fulfillment changes accordingly to allow the agent to achieve the goal despite incomplete state information. Here S_n represents a state set, where s_n was a single state:

S_1 = I
∃n . S_n ⊆ G

This model of goal achievement means that the robot has achieved the goal when its state set is a subset of the goal set, G. The robot may still have incomplete information about its actual state; nevertheless, it must be at some goal state when this subset condition holds. Incomplete information about the result of actions can capture the behavior of both unreliable actions and truly nondeterministic actions. In both cases, the results of performing an action are only partially specified. Incomplete information about the discerning power of sensors can likewise capture both sensory unreliability and poor sensory resolution. In the case of poor sensory resolution, multiple states may be consistent with a single percept. This type of uncertainty, called "perceptual aliasing," can complicate planning if the robot's uncertainty encompasses more than one such state (McCallum, 1995). McCallum also notes that a single state may be consistent with multiple percepts. This corresponds to the case of sensory unreliability. Taken together, these two types of uncertainty cause many-to-many relationships between possible states and possible percepts.
The new functions, results and sees, express the above types of incomplete information using state sets in lieu of single states, where U is some state set:

results(a,U) ⊆ S
sees(p,U) ⊆ U

Note that results reduces to result when its state-set parameters are replaced by single states. Sees is less intuitive, however. It is essentially a filter function, returning that subset of the specified state set that is consistent with the given percept. The return value of sees is a state set, whereas the return value of see is a percept. Fig. 2a depicts an action arc in the case of incomplete information. The grab action transforms the state set containing states a and b into a larger state set containing four new states. The fact that the sink set is larger implies that the grab action is nondeterministic. For example, e and f may correspond to states in which the grab fails while g and h may correspond to different cases of successful grabs. Fig. 2b depicts perceptual branching in the case of incomplete perceptual information. The two arcs are labeled with percepts used to conduct the filtering operation. Therefore, the sink of the arc labeled p1 corresponds to sees(p1, {a,b,c}). In this case, only states a and b are consistent with p1. Note that state b is consistent with both p1 and p2. This means that this pair of percepts is only capable of discriminating state a from state c. If the solution from a to the goal differs greatly from the solution from c to the goal, this perceptual branch may permit a solution whereas a plan that ignores sensory input may fail.

2.3 Representation-Independence

Robots frequently take advantage of efficient, domain-specific representations that do not require explicit enumeration of state sets. Of course, a formal mapping from the special-purpose representation to state sets will exist. We abstract the above definitions to a representation-independent level in order to emphasize the model's applicability to any domain-specific representation. The central objects transformed by results and sees are state sets in the model we have presented. We abstract these objects to a representation-independent concept called nodes. There must be a well-defined mapping from nodes to state sets, including the state-set specific functions results and sees. We define these mappings using the states function and the new functions effect and perceive, which transform a node n rather than a state set:

states(effect(a,n)) = results(a,states(n))
states(perceive(p,n)) = sees(p,states(n))

Perceive and effect refer to the domain-specific implementations of the robot's input and output functions, respectively. The above formulas specify the correspondence between perceive and sees and between effect and results. The number of possible domain-specific representations is of course unlimited. We discuss two representations below to clarify the relationship between state sets and more specific representation schemes.

2.4 Property Set Representation

One well-known representation technique has its origins in STRIPS and involves the specification of nodes as sets of properties, which formally denote the intersection of the corresponding state sets (Fikes & Nilsson, 1971). The property set {r1,r2,r3} denotes the state set states(r1) ∩ states(r2) ∩ states(r3). Within the property set approach, a common way to implement effect and perceive involves defining add lists and delete lists. In that case, effect(a,n) is defined as (n - delete(a)) ∪ add(a). Fig. 3 depicts two property sets and their corresponding state sets. Note that Fig. 3b maps to a singleton state set and thus represents complete information about the world state.
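To make the property-set machinery concrete, the following is a minimal Python sketch of effect, perceive, and subsumes (defined formally in section 2.6 below) over property-set nodes. The encodings and names (frozensets of ground property strings, the delete and add fields) are our illustrative choices, not part of the formalism.

def effect(action, node):
    # STRIPS-style progression: remove the delete list, then add the add list.
    return (node - action["delete"]) | action["add"]

def perceive(percept_props, node):
    # Information gain: adding the properties established by a percept
    # intersects the denoted state sets, shrinking the set of possible worlds.
    return node | percept_props

def subsumes(n1, n2):
    # states(n1) is a superset of states(n2) exactly when n1 carries no
    # properties beyond n2's: fewer constraints denote more possible states.
    return n1 <= n2

# The subsumption example of section 2.6 below:
assert subsumes(frozenset({"on(a,b)"}), frozenset({"on(a,b)", "on(b,c)"}))

Note the inversion: subsumption at the state-set level corresponds to subset inclusion at the property level, which is why a singleton state set is denoted by a maximal property set.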
Using a property set formalism obviates the need to translate down into the explicit state set representation when implementing the assumptive architecture, because we will specify the assumptive architecture in terms of nodes, effect, and perceive. Yet a precise mapping from property sets to state sets does exist, because a property can be mapped to a state set and, therefore, a set of properties can be mapped to the intersection of a set of state sets.

2.5 Geometric Representation

In the case of mobile robotics, a useful way of capturing positional uncertainty (i.e., world state) is to geometrically describe the smallest two-dimensional region, Ur, certain to contain the robot. Fig. 4 depicts an example of Ur for a robot traveling down an outdoor arcade. A clear mapping to state sets again exists: Ur represents the state set containing every state in which the robot is positioned inside the region delimited by Ur. Of course, in practice a finite mapping would require a discretization of the continuous geometric space. In this geometric space, effect and perceive can be defined in terms of polygonal transformations to effect highly efficient expansion of the nodes during search, allowing for fast motion planning algorithms (Latombe et al., 1991).

2.6 Representation Requirements

These are just two of many possible representation schemes that are computationally practical for specific classes of problems. We go on to describe assumptive architectures without regard to any particular representation. In order for our algorithm to be applicable to a particular representation, two conditions must be met. (1) The representation must define effect and perceive. (2) The representation must define subsumes, which can be specified as follows:

subsumes(n_1,n_2) iff states(n_1) ⊇ states(n_2)

For example, the following formula is true for Fig. 3:

subsumes({on(a,b)}, {on(a,b), on(b,c)})

3 Advance-Planning Architectures

The traditional approach to problem-solving is easy to understand because it clearly subdivides the general problem into two subproblems: planning and executing. We call this the advance-planning architecture because the problem-solving method is a two-step process: (1) plan from the initial conditions to the goal conditions; (2) execute the plan. This approach has the benefit of allowing the system designer to avoid wrestling with the complexities of interleaving planning and execution. However, the incurred cost is that the planning episode must result in a solution from the initial conditions all the way to the goal. In this section, we motivate the use of conditional plans and describe a general conditional advance-planner using the formalisms provided in section 2. Although the resulting planner will be similar to standard state-set based planners, this exercise will make the role of incomplete information explicit in the search process, facilitating the shift to interleaving architectures.

3.1 Conditional Planning

Finding a sequential solution from the initial conditions to the goal is often impossible when the robot is faced with incomplete information. The solution is to construct conditional plans that take advantage of run-time perceptual feedback as information gain. Several classical instantiations of general conditional planning have been described in the literature (Warren, 1976; Drummond, 1985; Genesereth, 1993).
Unfortunately, robot problems often cannot take advantage of general conditional planning because its search space continues to be prohibitively large: a conditional search engine branches not only on possible actions to perform but also on possible percepts that the robot may encounter. More recently, researchers have introduced limited forms of conditionality into planning, combining more tractable search methods with the powerful idea of run-time information gain (Kaelbling & Rosenschein, 1987; Doyle, Atkinson & Doshi, 1986; Etzioni et al., 1992; Peot & Smith, 1992). In domains where irreversibility is rampant and optimal performance is required, this form of advance-planning is sometimes the only option.

3.2 A Node-Space Conditional Planner

Consider the search space of a node-space planner. Because of the incompleteness of its information, the robot must use run-time perceptual inputs as feedback in order to potentially achieve information gain. The planner must allow for such perceptual discrimination in advance by searching a graph that not only branches based on possible actions but also branches based on possible perceptual inputs. Thus, the planner searches a conditional graph. A conditional graph is formally a directed bipartite graph with alternating perceptory nodes and effectory nodes. Fig. 5 depicts such a graph, using squares to indicate effectory nodes and circles to indicate perceptory nodes. A perceptory node has action-labelled arcs leading to it and percept-labelled arcs emanating from it. An effectory node has percept arcs leading to it and action arcs emanating from it. An action arc labelled with action a connects node n1 to node n2 iff effect(a,n1) = n2. Similarly, a percept arc labelled with percept p connects n1 to n2 iff perceive(p,n1) = n2. Note that this means that the sink of a percept arc is always subsumed by its source: subsumes(n1, perceive(p,n1)). Of course, building the entire conditional graph is unreasonably expensive, since the number of nodes is 2^|S|. Therefore, a planner searches for a conditional solution by incrementally constructing the graph using effect and perceive. The search process is identical to an and-or graph search, because the conditional plan must select action sequences (or-branch), but must do so over all percept sequences that the system can encounter during execution (and-branch). Unfortunately, complete conditional planning quickly becomes too costly. For a system with |O| actions and |P| percepts, the cost is exponential in the length of the solution, k: O(|P|^k |O|^k). Of course, |P| and |O| represent theoretical worst-case branching factors. In instances where some percepts are inconsistent with world conditions or some actions have no impact on world state, the effective branching factors may be much smaller.
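As a concrete rendering of this and-or search, the sketch below gives a depth-limited conditional planner over explicit state-set nodes. It is an illustrative reconstruction in Python, not the planner used in our implementations: effect and perceive are supplied by the representation, and a plan is either the empty list (goal reached) or a pair of an action and a dictionary of percept-indexed subplans.

def plan(node, goal, actions, percepts, effect, perceive, depth):
    if node <= goal:                        # node subsumed by the goal set
        return []
    if depth == 0:
        return None
    for a in actions:                       # or-branch over actions
        successor = effect(a, node)
        branches = {}
        for p in percepts:                  # and-branch over percepts
            filtered = perceive(p, successor)
            if not filtered:
                continue                    # percept cannot occur here
            sub = plan(filtered, goal, actions, percepts,
                       effect, perceive, depth - 1)
            if sub is None:                 # one percept branch fails, so
                branches = None             # the whole action fails
                break
            branches[p] = sub
        if branches is not None:
            return [a, branches]
    return None

Each action contributes up to |P| recursive calls, so the running time of this sketch matches the O(|P|^k |O|^k) bound given above.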
4 An Assumptive Approach

Assumptive architectures decrease total search cost by incorporating simplifying assumptions that make the planning task tractable. We consider three types of assumptions in this paper: assumptions about initial conditions, assumptions about the behavior of actions, and assumptions about the behavior of percepts. After describing these assumptions, we provide an assumptive algorithm that interleaves planning and execution, then provide soundness and completeness results.

4.1 Assumptions about Initial Conditions

One form of a robot's uncertainty can be incomplete knowledge about its exact initial conditions. For example, Fig. 6a depicts a navigation robot faced with a high degree of positional uncertainty. Rather than trying to produce a plan guaranteed to work from every possible position captured by Ur, the robot can make simplifying assumptions to "virtually" reduce its initial positional uncertainty and produce plans based on these assumptions, as in Fig. 6b. Since the number of cases being considered under the assumption is a proper subset of the actual initial conditions, any plan from the actual conditions will certainly work from the assumed conditions. Therefore, the solution from the assumed conditions is guaranteed to be the same length or shorter than the solution from the actual conditions. We define a selection function sel that is constrained to simplify a node by making assumptions that correspond to information gain:

Definition. sel returns a node representing information gain: subsumes(I, sel(I)).

We will refer to the assumed initial conditions, sel(I), as A.

4.2 Assumptions about Actions

A robot often has actions available that are almost always reliable. One way to capture this unreliability is to model the action, via effect, as nondeterministic. For example, consider a STRIPS gripper that sometimes slips. Fig. 7a depicts the result of the grab operation from a particular node. Note that the nodes in Fig. 7 are specified as explicitly enumerated state sets. The robot can make simplifying assumptions about the reliability of its actions in order to reduce or even eliminate this nondeterministic property. Fig. 7b depicts the result of grab from the same source node when the planner assumes away the potential for gripper failure.

Definition. The function seffect represents a simplification of effect iff: subsumes(effect(a,n), seffect(a,n)).

4.3 Assumptions about Percepts

Analogous to unreliable actions, the percepts as defined by perceive may be incapable of distinguishing between states of the world because of the existence of false positives or false negatives. Given the robot's set of percepts {p_1, p_2, ..., p_m}, we know from the definition of see and perceive that the sink nodes corresponding to all perceptual arcs from a specific source node must together cover the source node. Fig. 8a depicts this situation for a robot with two percepts. Note that the two sink nodes are not mutually exclusive in Fig. 8a. The fact that the sink nodes share states b and d indicates that the robot's sensors cannot discriminate these states, perhaps because of false positives. Simplification of perceive is more complex than the simplification of effect.

Definition. Given a robot with percepts P = {p_1, ..., p_m}, the function sperceive is a simplification of perceive iff:

∀i subsumes(perceive(p_i,n), sperceive(p_i,n)), and
sperceive(p_1,n) ∪ ... ∪ sperceive(p_m,n) = n

This formula allows sperceive to exhibit better discrimination than perceive, but requires all percepts, together, to cover all states. Fig. 8b is an example of such a simplification because state b appears in only one of the two sink nodes, and all source node states can be found in at least one sink node. Insofar as Fig. 8b eliminates one false positive, the simplification represents greater information and may make planning cheaper.
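All three definitions above are subsumption constraints, so they can be checked mechanically. The following sketch states the checks over explicit state-set nodes (frozensets), where sel, seffect, and sperceive are whatever domain-specific simplifications the designer supplies; the function names are ours, for illustration only.

def valid_sel(I, sel):
    # subsumes(I, sel(I)): assumptions must represent information gain.
    return sel(I) <= I

def valid_seffect(effect, seffect, a, n):
    # subsumes(effect(a,n), seffect(a,n)): the assumed outcome set must be
    # contained in the true outcome set.
    return seffect(a, n) <= effect(a, n)

def valid_sperceive(perceive, sperceive, percepts, n):
    # Each assumed sink must be contained in the true sink, and together
    # the assumed sinks must still cover every state of the source node.
    cover = frozenset().union(*(sperceive(p, n) for p in percepts))
    return (all(sperceive(p, n) <= perceive(p, n) for p in percepts)
            and cover == n)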
4.4 The Assumptive Algorithm

The assumptive algorithm we present uses a conditional planner and implements the simplifying assumptions described above. Of course, the assumptions may turn out to be wrong. In order to deal with this possibility, the robot monitors the execution of its plan. If the robot's actual percept contradicts its expectation, the robot knows that it made a bad assumption. In that case, it must take the new sensory data into account to change its assumptions and replan.

Algorithm 1: Assumptive Algorithm
1. A = sel(I)
2. Construct a plan C from A to G using seffect and sperceive
3. Execute the next step of C
4. Update I and A using the action just executed, a: I = effect(a,I); A = seffect(a,A)
5. Update I and A using the current percept, p: I = perceive(p,I); A = perceive(p,A)
6. If A = {}, goto step 1 (perceptual contradiction)
7. Execution of C complete? Terminate
8. Goto step 3

An important facet of this algorithm is the necessity of updating both I and A based upon the actions taken and percepts encountered. A must be updated in order to detect any contradiction between the assumptions resulting from the sel call and the percepts actually encountered. I must be updated in case such a contradiction is encountered. If a perceptual contradiction occurs, A will become the empty set in step 5. Then, step 6 will lead back to step 1, where the robot will redefine A in terms of the current value of I, which takes into account the partial plan executed thus far and the percepts encountered along the way.
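A direct transcription into Python may make this bookkeeping explicit. The sketch below is schematic, not production robot code: it reuses the plan shape of the planner sketch in section 3.2, assumes make_plan closes over seffect and sperceive and always succeeds (cf. Assumption 1 below), and takes execute and sense as externally supplied primitives.

def assumptive_run(I, G, sel, make_plan, effect, seffect, perceive,
                   execute, sense):
    while True:
        A = sel(I)                                   # step 1
        C = make_plan(A, G)                          # step 2
        while C:                                     # C == [] means complete
            a, branches = C
            execute(a)                               # step 3
            I, A = effect(a, I), seffect(a, A)       # step 4
            p = sense()
            I, A = perceive(p, I), perceive(p, A)    # step 5
            C = branches.get(p)
            if not A or C is None:                   # step 6: contradiction
                break                                # back to step 1 with new I
        else:
            return I                                 # step 7: terminate

The while/else idiom returns only when the inner loop runs the plan to completion; a perceptual contradiction breaks out instead and triggers reselection and replanning from the updated I.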
4.5 Conserving Reachability

The greatest risk of making assumptions is that the ensuing execution may be tragic if the assumptions are inappropriate. Formally, the execution of an assumptive plan may transform some state into a state r such that there is no path from r to a goal state. This means that the assumptive algorithm presented thus far is in fact incomplete.[footnote 2] One solution is to force sel to be conservative by including all potentially irreversible states in sel(I):

Definition. path_k(s,r) denotes that there exists an action-labeled path of length k or less in the state-space graph from state s to r.

Definition. Given a problem domain, k, and I, sel is strong iff:
∀s ∈ states(I): [∃r (path_k(s,r) ∧ ¬path_k(r,s))] ⇒ s ∈ states(sel(I))

Note that strong selection is a constraint on the behavior of sel and not a constructive method for specifying a selection function. One approach to designing a sel is to define sel without regard to irreversibility issues, then augment the resulting node until strong selection is satisfied. In practice, the strong selection check is relatively inexpensive. Strong selection involves a search conducted on the state-space graph, which denotes complete information and is therefore less expensive than search on the node-space graph.

4.6 False Goal Detection

Before any claims can be made about the completeness of this assumptive architecture, one further problem must be addressed. This more subtle problem occurs because an incorrect assumption may go undiscovered during execution, leading the robot to believe it has reached the goal when in fact it has not.

Definition. Given a conditional plan C, if construction of a conditional graph from I using effect and perceive results in some fringe node n s.t. not(subsumes(G,n)), then C contains a false goal.

Definition. A false goal detector prunes any plan containing a false goal.

The existence of false goals is a result of the fact that the planner can use simplifying assumptions that are sufficiently extreme as to make the planning episode unsound. Thus, actual execution of the resultant plan can lead to fringe nodes containing unforeseen and indistinguishable situations. Consider a robot in initial node I = {a,b,c} with the goal set G = {x,y}. Furthermore, assume that sel(I) = {a,b}. Fig. 9a graphically depicts the result of invoking a planner using sel(I), seffect, and sperceive. Fig. 9b is the resultant plan extracted from this conditional graph. The next step is to construct a conditional graph from I using effect, perceive, and the plan in Fig. 9b. The result is the plan overlay shown in Fig. 10. Constructing this plan overlay is inexpensive because it involves no search; it consists only of the application of effect and perceive functions to "simulate" plan execution. Note that the fringe node of this conditional graph unexpectedly contains state w. This is the unsound false goal case. What else could have happened to state c? If a perceptual contradiction were possible, then state f would have fallen off the conditional plan overlay at the perceptual branch point, because it would have been consistent only with percept p3, which is not an expected input during plan execution.
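Because the overlay is only a simulation, the false goal check is cheap to implement. The sketch below walks a conditional plan from the true initial node using the unsimplified effect and perceive, and reports a false goal whenever a fringe node escapes the goal set. Percepts that fall outside the plan's branches are skipped, since, as just discussed, at run time they would surface as perceptual contradictions rather than false goals. The names are again illustrative.

def has_false_goal(C, node, G, effect, perceive, percepts):
    if not C:                             # a fringe node of the overlay
        return not node <= G              # false goal iff not inside G
    a, branches = C
    successor = effect(a, node)
    for p in percepts:
        filtered = perceive(p, successor)
        if not filtered or p not in branches:
            continue                      # impossible percept, or one that
                                          # would be caught as a contradiction
        if has_false_goal(branches[p], filtered, G,
                          effect, perceive, percepts):
            return True
    return False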
4.7 Results

Soundness and completeness results follow if we allow simplifying assumptions to be made only about the initial conditions. Of course, since there may be multiple episodes of planning and acting, there may be multiple points at which the sel function can be invoked and therefore multiple "initial" conditions.

Theorem 1. Algorithm 1 with strong sel and false goal detection is sound and preserves goal reachability if seffect = effect and sperceive = perceive.

The proof is actually two separate proofs: one for goal reachability, and another for soundness. The goal reachability proof is based on the following four assumptions:

Assumption 1: The planner used in Algorithm 1, step 2 is sound and complete.
Assumption 2: The planner used in Algorithm 1, step 2 uses a false goal detector (Definition 7).
Assumption 3: The sel function is consistent with the strong selection constraint (Definition 5).
Assumption 4: State reachability implies state-set reachability (footnote 2).

The goal reachability proof uses induction on the number of planning episodes j (i.e., the number of times step 2 of Algorithm 1 is executed). The base case is the single planning episode case: j = 1. Since the proof is about preservation of goal reachability, we are interested only in the case when a conditional plan D does exist from I to G. The proof involves showing that, during execution of Algorithm 1, there is always a plan from each current state s to the goal. Once this is shown, state-set reachability follows by Assumption 4. Each such s is the progression of some state i from I. By case analysis, i is either (1) a member of sel(I); or (2) a member of I - sel(I). In case (1), there must be a path from s to the goal by Assumption 1. In case (2), there is obviously a path from i to s. Therefore, there must be a path from s back to i (Definition 5; Assumption 3), and since the plan D guarantees that the goal is reachable from i, the goal remains reachable from s. The induction step is easy to see by showing that, at the end of each planning and execution episode, the resultant state set preserves goal reachability. Thus, the base case argument applies to every subsequent planning and execution episode.

The soundness proof, which is based on Assumption 2, is extremely simple. The proof shows that, when Algorithm 1 terminates (step 7), the updated fringe node I is a subset of G. Step 7 only invokes termination upon plan completion. In the case of plan completion, the current node I is in fact equal to some plan overlay fringe node F (Fig. 10). Definition 7 guarantees that all plan overlay fringe nodes of an executed plan are subsets of G, and so the soundness proof is complete. Note that soundness does not require seffect and sperceive to be equal to effect and perceive, respectively.

Although Algorithm 1 preserves goal reachability, it is not necessarily complete. This is because of the potential for the robot to enter a pathological loop consisting of mistaken assumptions, partial execution, mistaken assumptions, and so on. This can only occur if the robot's actions (as specified by effect) are nondeterministic and the state-set contraction caused by perceptual discrimination is overwhelmed by state-set expansion caused by effectory uncertainty.

Theorem 2. Algorithm 1 with strong sel and false goal detection is sound and complete if seffect = effect, sperceive = perceive, and |states(effect(a,n))| ≤ |states(n)|.

The only additional step in this proof, beyond the proof of Theorem 1, is to show that Algorithm 1 eventually terminates under the additional assumption of deterministic actions. Under Assumption 1, we know that the planner is complete, and therefore the only source of incompleteness involves the potential for the number of planning episodes, j, to be unbounded. In fact, we can show that j must be less than or equal to |states(I)|. This uses the fact that a planning episode can only end prematurely if a perceptual contradiction occurs (Algorithm 1, step 6). In this case, at least one member of the initial conditions I has been shown to be incorrect, since states(sel(I)) must contain at least one state. Therefore, the consistent subset K of I is properly subsumed by I. Therefore, the current node N must correspond to fewer states than I, since |states(N)| ≤ |states(K)|. Since the corresponding state set shrinks by at least one state as a result of every planning and execution episode, the result follows that there can be no more than |states(I)| episodes.

5 Continuous Selection

The continuous selection architecture is born out of a desire to maximize the use of run-time perceptual inputs. We begin by demonstrating that Algorithm 1 will, in some cases, fail to make optimal use of run-time perception. Then, we describe an algorithm that maximizes the use of run-time feedback by replanning after every action execution. Finally, we present soundness and completeness results for the continuous selection architecture.

5.1 Maximizing the Use of Information

Three forms of information gain appear in Algorithm 1: (1) the conditional planner branches based on possible perceptual inputs, breaking one large problem into several smaller problems, each of which is easier because of information gain; (2) the assumptive architecture introduces "virtual" information gain by making simplifying assumptions; and (3) the robot uses execution-time perceptual inputs as a direct form of information gain to both update and "narrow" I. Given that Algorithm 1 depends to such a large extent on the concept of information gain, it is surprising to note that the algorithm does not make maximal use of perceptual inputs during run-time. In particular, Algorithm 1 has the ability to overcommit to a particular assumption, despite perceptual feedback that could be used to change the assumption. For example, assume that I = {a,b,c,d} for a robot that has a sel function consistent with the following: sel({a,b,c,d,e}) = {a,b} and sel({g,h,i}) = {g,h}. Fig. 11 depicts the scenario for overcommitment in this situation.
The planner will use A_1 = {a,b} for planning and will update both I and A during execution. Suppose that during execution the robot reaches node I_n as shown in Fig. 11. The shaded area indicates A_n, which is always subsumed by I_n. If the robot were to reinvoke sel on node I_n, the result would be {g,h} and not the singleton state set {h}. Algorithm 1 would never conduct such reselection unless and until A_n became wholly empty because of contradictory perceptual inputs. The basic idea is that selection and progression are not necessarily commutative. If we index A and I temporally, the problem can be expressed simply:

A_n ≠ sel(I_n)

5.2 Continuous Selection Architecture

We propose an alternative algorithm that favors the right side of this inequality by reselecting and replanning continuously.

Algorithm 2: Continuous Selection
1. A = sel(I)
2. Construct a plan C from A to G using seffect and sperceive
3. Execute the next step of C
4. Update I using the action just executed, a: I = effect(a,I)
5. Update I using the current percept, p: I = perceive(p,I)
6. I ⊆ G? Return success
7. Goto 1

Algorithm 2 will invoke the planner far more frequently than Algorithm 1, a characteristic with both advantages and disadvantages. On one hand, the total conditional planning cost of Algorithm 2 may be much higher than that of Algorithm 1. But Algorithm 2 forces a tighter interleaving of planning and execution. In domains with incomplete information, where execution-time percepts are crucial, delaying planning longer can make the future planning episodes much less expensive. An additional benefit of this high planning frequency is that only the first step of a plan is ever used. This allows the strong selection constraint that preserves goal reachability to be weakened significantly:

Definition. Given a problem domain and I, sel is weak iff:
∀s ∈ states(I): [∃r (path_1(s,r) ∧ ¬path_1(r,s))] ⇒ s ∈ states(sel(I))

The intuition behind this weak selection constraint is that, since only a single step of a given plan is executed, the reversibility horizon is never deeper than one step. Of course, this weakening of the selection constraint is not free. In particular, the robot will fail to recognize deep, useless plan sequences until it has wandered much further down such a path at execution time. Therefore, weak selection admits a potential for greater execution-time backtracking than strong selection.
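The corresponding sketch of Algorithm 2 is shorter than that of Algorithm 1, because only the first action of each plan is executed and A never needs to be progressed; the weak selection constraint can likewise be checked state by state. Both fragments below are schematic, with illustrative names, reusing the helpers of the earlier sketches.

def continuous_selection_run(I, G, sel, make_plan, effect, perceive,
                             execute, sense):
    while not I <= G:                    # step 6: success test
        A = sel(I)                       # step 1
        C = make_plan(A, G)              # step 2: uses seffect/sperceive;
        a, _branches = C                 # false goal detection guarantees a
        execute(a)                       # nonempty plan while I is not in G
        I = effect(a, I)                 # steps 3-4
        I = perceive(sense(), I)         # step 5
    return I

def is_weak(selected, I, S, path1):
    # path1(s, r): an action-labeled path of length 1 or less from s to r.
    # Any state with a one-step irreversible successor must be kept.
    return all(s in selected for s in I
               if any(path1(s, r) and not path1(r, s) for r in S))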
5.3 Results

Theorem 3. Algorithm 2 with weak sel and false goal detection is sound and preserves goal reachability if seffect = effect and sperceive = perceive.

The proof of Theorem 3 is almost identical to that of Theorem 1. The difference is that only one step of a plan is ever executed. Therefore, in proving the base case of the induction, the strong selection criterion is only required to depth 1, which is precisely weak selection. An important advantage of Algorithm 2 involves the constant depth of weak selection. Because of this fixed depth, the cost of weak selection is constant and therefore, unlike strong selection, independent of the solution search depth.

6 The Price of Assumptions

An exhaustive comparison of Algorithm 1 and Algorithm 2 is very difficult because the relative costs depend on the actual percepts encountered and the specific sel functions used. If the two algorithms use identical sel functions (modulo strong and weak selection), the initial node A in Algorithm 2 will be subsumed by the initial node of Algorithm 1. Therefore, the first planning episode of Algorithm 1 will always be of equal or greater cost. Although this analysis applies only to the first planning episode, it is significant from an information-gain perspective. Any ability to stop planning early and execute is an opportunity to gain information through execution and thus simplify future planning episodes.

A more quantitative cost comparison is possible between assumptive planning in general and conditional advance planning, which exhibits no interleaving of planning and execution. We present two types of cost comparisons between conditional and assumptive planning: planning complexity cost and execution-length cost.

6.1 Planning Complexity

For the analysis of planning complexity we introduce two new variables: l represents the search depth for the first assumptive episode, and j refers to the total number of incorrect-assumption episodes. Note that j is identical to the counterexample measure defined by Rivest and Schapire (1993). We have looked at the relative costs of conditional and assumptive architectures in three classes of problem domains. Double whammy represents a problem domain that is so unforgiving that every step executed under an incorrect assumption puts the robot two more steps away from the goal. Furthermore, double whammy assumes that assumption errors are not detected until the very last step of the conditional plan being executed. Under these extreme conditions, the assumptive approach is at a clear disadvantage because of the exponential backtracking cost. Ceiling distance captures what may appear to lie at the other extreme: domains in which the robot cannot get further than a certain ceiling, c, from the goal. In this case, l turns out to dominate the total assumptive planning cost. A further intuition behind this domain lies in the possibility that actions taken under incorrect assumptions can still be benign and even useful. In real-world domains, we have discovered that both of these analyses can overestimate the assumptive planning cost. One reason is that the probability distribution over the states in I is usually not uniform. Therefore, the assumption made by sel tends to be correct. In addition, execution under an invalid assumption still tends to take the robot closer to the goal. In the kind world case, we assume that every l actions executed by the robot place it l steps closer to the goal.

6.2 Execution Length

The second type of comparison we make between conditional advance planners and assumptive architectures involves the number of steps actually executed. This comparison requires the introduction of two new domain characteristics: m represents the reversibility ratio of the specified domain (for every action executed from a state s in the domain, m is an upper bound on the number of actions required to return to state s); n represents an upper bound on the number of actions executed under an invalid assumption before the robot encounters an assumption error. Given |I|, m, and n, we can express an upper bound for the competitive ratio of the number of steps executed by an assumptive robot versus the number of steps executed by the conditional pre-planner: (|I|-1)mn. This is a weak bound in that it assumes the (unrealistic) worst-case scenario in which every assumption error requires full execution-time reversal.
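For concreteness, consider hypothetical values of our own choosing (not measured ones): an initial state set of size |I| = 8, a reversibility ratio of m = 2, and at most n = 3 actions executed per invalid assumption. The bound then gives

(|I| - 1)mn = (8 - 1) · 2 · 3 = 42

so, in this contrived worst case, the assumptive robot executes at most 42 times as many steps as the conditional advance-planner.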
More commonly, assumptive execution results in nonoptimal progress toward the goal, despite an invalid assumption; therefore, m can be much smaller in practice than the actual reversibility ratio. Taken to the extreme, assumptive architecture optimality occurs precisely when the effective value of m drops to zero, indicating that the execution lengths of the assumptive and conditional approaches converge. The relationship of such an optimal assumptive plan to an optimal conditional plan is easy to describe: the plan overlay of the optimal assumptive plan from I is a subplan of the optimal conditional plan. Surprisingly, if an assumptive architecture produces such optimal assumptive plans at every planning episode, then the robot as a whole will have optimal execution behavior. This follows from the fact that both the assumptive architecture (i.e., Algorithms 1 and 2) and the conditional planner progress I based on perceptual inputs. Thus, the assumptive architecture will encode the same information gain during execution that the conditional planner uses for its conditional decision points during planning. These conditions may seem unrealistic in real-world problem domains. Yet the next two sections describe implementations that have both resulted in optimal assumptive execution behavior.

7 The Robots of CS 224

CS 224 is a hands-on introduction to robot programming for undergraduate and graduate students. The course covers the theory of perception and action as well as pragmatic aspects of robot programming, culminating in robots that plan and act in the real world despite incomplete information. The assumptive architecture is one of several interleaving techniques implemented by the students midway through the course, after several weeks of constructing low-level reactive systems. All student assignments for CS 224 involve programming four actual robots; the students never use simulated robots. These Nomad 150 robots have sixteen sonars, arranged radially, and motor encoders as their only inputs. The robots are fully self-contained, housing Macintosh Powerbooks running Common Lisp as the development environment. The robots' domain is a maze world built with three-foot high, 40-inch long cardboard walls. The position of the wall segments is highly constrained. All maze walls are rectilinear -- that is, either parallel to the x-axis or the y-axis -- and are located on the boundaries of uniformly spaced grid lines that subdivide the maze. The size of these mazes is limited only by the space available. By the end of the quarter, mazes are typically 8 x 5 grid lines large, with 35 internal walls. Fig. 12 pictures one of this year's robot teams developing robot navigation code in a small maze. The constrained nature of this maze world -- most importantly, the fact that walls must occupy the edges of cells -- allows a clear abstraction to a highly granular representation of world state. In particular, one can specify robot position and orientation in terms of a cell and a cardinal direction without losing reachability information. This means that whenever a solution exists in the maze world at the fine level of degrees and inches, there must be a solution at the higher level in terms of cells and the four cardinal orientations. Although this abstraction is the basis of the robots' high-level planning systems, the students have to first implement reliable abstract actions and percepts using the robot's much more fine-grained outputs and sensors.
The students implement 16 high-level percepts, corresponding to every possible configuration of walls immediately surrounding the robot (i.e., walls on the edges of the cell containing the robot). Of course, a real robot is rarely aligned perfectly with the maze world and centered exactly in a cell. The students must design a sensor interpreter that is reliable despite reasonable orientation and position errors. This is a nontrivial challenge, since sonar echoes that strike walls at shallow angles tend to coherently bounce away, providing no depth information. Given a robot with only 16 sonars, this leaves very few sonars with valid readings. The right set of high-level actions is easy to predict: move forward one cell, turn left (90 degrees), turn right. The move-forward action may make certain assumptions about the robot's initial conditions: the degree to which the robot is centered and aligned within a cell. But the action must also guarantee that, at termination, the robot will fulfill those assumptions. The students generally use bounds of +/- 5 inches from the cell center and +/- 23 degrees from rotational alignment. Once the students have robust high-level actions and percepts, they go on to design an assumptive architecture for travelling from a known initial position in a maze to a specified goal position. The challenge is that the robot has no knowledge about the positions of the internal walls of the maze. The number of possible initial conditions grows quickly with the size of the maze. For a 4 x 5 maze, the number of possible initial configurations is slightly larger than 2^27. The assumptive approach is an excellent candidate in this "safe" world that has far too many possible initial conditions. But even the assumptive approach requires keeping track of I, not just A. Explicitly enumerating the initial state set is obviously not an option. Most teams choose a property-based representation in which the set of possible states is partitioned based on the existence (or lack) of a wall on a particular cell edge. This is an extremely cheap representation method, since a placeholder for each cell edge suffices, indicating whether the existence of a wall at that location is known or unknown. As the robot moves through the world, it adds properties because it gains information about the existence of walls at particular cell edges, monotonically decreasing the number of possible maze configurations. Note that this property-based representation, although practical, gives up some degree of expressiveness. With an explicit state-set enumeration, one could express the condition that there is either a wall at position x1 or a wall at position x2. This type of disjunction is impossible in the above representation because the properties of cell edges are taken to be independent. The robot's position is justifiably orthogonal to the maze configuration; therefore, it is specified as a separate tuple. In the case of incomplete information about robot position, one simply uses a list of such tuples. The perceive function is easy to implement as a Lisp function that takes a node and a high-level percept and returns a new node in which the existence of walls at cell edges adjacent to the robot is known. This maps, in the state-set formalism, to a function that filters a set of states, returning only the subset that is consistent with the current percept.
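A sketch of this representation and of perceive follows, in Python for uniformity with the earlier sketches (the course implementations are in Common Lisp). A node pairs the robot's pose with a dictionary mapping cell edges to known wall status; edges absent from the dictionary are unknown. The edge encoding, the percept encoding as a 4-tuple of booleans, and the assumption that percepts arrive in absolute rather than robot-relative orientation are all our illustrative choices.

DIRS = [(0, 1), (1, 0), (0, -1), (-1, 0)]   # N, E, S, W

def edge(cell, d):
    # An edge is the unordered pair of the two cells it separates, so the
    # same physical wall is shared by both neighboring cells.
    x, y = cell
    dx, dy = DIRS[d]
    return frozenset({(x, y), (x + dx, y + dy)})

def perceive(percept, node):
    # percept: (wall_north, wall_east, wall_south, wall_west) around the
    # robot's cell. Information gain: the status of the four adjacent
    # edges becomes known, shrinking the set of consistent mazes.
    (cell, heading), walls = node
    walls = dict(walls)
    for d in range(4):
        walls[edge(cell, d)] = percept[d]
    return ((cell, heading), walls)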
The effect function simply changes the orientation of the robot in the case of turn-left and turn-right. But when called with move-forward, effect may progress the robot to the next cell, if there is no wall blocking that path; or it may keep the robot at its current position if there is such a blockade. A final issue is the specification of the sel function. The most commonly used sel function is one that chooses a node representing complete information (i.e., a singleton state set) in which the number of internal walls is minimized to only those walls that must exist because they have been encountered.
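A hedged sketch of this wall-minimizing sel, continuing the maze representation above: it commits to the current pose and keeps only the walls whose existence has actually been observed, treating every unknown edge as open.

def sel(node):
    (cell, heading), walls = node
    # Keep only edges known to hold a wall; everything else is assumed open,
    # which minimizes the internal walls and yields complete information.
    assumed = {e: True for e, present in walls.items() if present}
    return ((cell, heading), assumed)

def wall_at(assumed_walls, e):
    # Planner-side lookup under the assumption: unknown edges have no wall.
    return assumed_walls.get(e, False)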
Fig. 13 contrasts the actual maze used to test students' assumptive solutions (13a) with the maze information given to the robots at the beginning of each robot's run (13b). Table 2 compares the properties of the search space of a conditional planner for the particular maze in Fig. 13 with the search space of the assumptive solution described below. A surprising result is that, for this specific problem, the assumptive Algorithm 1 with the wall-minimizing sel function is optimal. The optimal conditional plan must include a trace of execution that moves the robot from its current position straight to the goal, since such a path may exist according to the initial conditions. The assumptive architecture is indeed generating this path; therefore, an overlay of its action sequence matches this conditional trace perfectly. Of course, the total planning cost of the assumptive approach is trivial compared to the execution time, while the total planning cost of conditional advance-planning is 4^k · 16^k for a k-step solution. The students' assumptive robots in the maze of Fig. 13 planned for a total of less than 1 second and typically achieved the goal position within 9 minutes, after executing 18 to 27 actions.

8 A Whirling Dervish

The 1994 AAAI National Robot Competition consisted of two events, the first of which challenged robots to reliably navigate a simulated office environment. The office environment was built with partition walls similar to those used to create cubicles in modern offices. The building also contained real-world furniture, such as chairs, tables and trash cans. Additionally, some hallways or foyers would be filled with large boxes or clusters of furniture, making them impassable. The robot would be given a topological map of the environment; that is, purely nonmetric information about the connectivity of rooms and hallways. Fig. 14 geometrically depicts a segment of an office environment (14a) and the corresponding topological map (14b). The robot would start in a known initial room and was required to travel to a goal room specified at the beginning of the run. The challenge was to reach the goal room reliably in spite of obstacles that render the supplied topological map inaccurate. We chose to enter this contest because it provided us with an excellent motivation for applying lessons learned from outdoor navigation to a problem we had avoided: indoor office navigation. Like the students of CS 224, we would avoid simulation altogether. This meant developing and testing our robot in several office buildings at Stanford University despite the fact that the contest would be in a "simulated" office building constructed with plastic partitions. Dervish's hardware was born out of the modification of a four-year-old Nomad 100 mobile robot. We removed infrared and tactile (bumper) sensors, then rearranged the Nomad's sonars, which were initially positioned in a traditional, radial 16-sonar ring. The sonars were repositioned in two ways: clusters of sonars pointed in approximately the same direction, providing redundancy and therefore more reliable freespace information; and clusters of sonars were pointed up and down at steep angles in order to detect ledges and tabletops that might decapitate Dervish (Nourbakhsh, 1995). The issue of representation proved to be more challenging in this domain than in the world of CS 224. Here, there is no regular breakdown of the world into cells, with obstacles appearing only at the edges of cells. The strength of the CS 224 approach was that the representation specified world state at the same level of granularity at which goals were specified and also at the same level at which obstacles affected maze connectivity. This same desirable quality is captured in Dervish's world by choosing a representation at the same level of abstraction as the topological map.
The difference between this approach and that of CS 224 is that Dervish's cells cannot be constrained to a fixed size. Fig. 15 demonstrates the manner in which our representation effectively quantizes the geometric world of Fig. 14(a) using the topological information of Fig. 14(b). Unlike CS 224, obstacles can appear anywhere in an office environment. But the point of quantizing space into cells is to ignore geometric position at any finer level of granularity than the cells themselves. The solution is to represent obstacles as being on cell edges despite the inherent inaccuracy. This is acceptable because there is no need to model obstacles at a finer level of detail than that of changing the connectivity of adjacent cells. Of course, one must take care to place the obstacle at the correct edge. For instance, an obstacle at cell 2-3 in Fig. 15 can block either the path from 2 to 2-3 or the path from 3 to 2-3. Having made this simplification, we are able to represent nodes in the office environment exactly as in CS 224: as a set of explicit robot positions (in terms of cell and orientation) and explicit positions of known blockades. Given a representation identical to that of CS 224, one would rightly expect the high-level actions and percepts to be similar, as they indeed are: Dervish's actions are move-forward, turn-left, and turn-right. Move-forward is a more complex version of the one written for CS 224: it is charged with the task of maintaining the hallway centerline, avoiding small obstacles, discovering impassable obstacles, and preserving a continuous sense of hallway width and orientation. The high-level percepts Dervish uses are more complex and much more difficult to implement than those used in CS 224. Dervish detects hallways, closed doors, and open doors on each side as it moves down a hallway. The percept-generating code makes extensive use of histories of sensor data, comparing expected sonar readings based on hallway dimensions and encoder counts with actual sonar data (Nourbakhsh, 1995; Nourbakhsh et al., 1995). The reliability of these percepts is crucial because the termination of the move-forward action depends upon perceptual "events." Dervish's primary action is move-forward, in effect a hallway-following behavior, and this action must remain in force until the robot reaches an intersection where it will turn. But Dervish can receive a variety of percepts over the course of a single move-forward episode, as a result of perceptual false positives. There is a temporal mismatch between the frequency of new perceptual inputs and the duration of actions. With this mismatch in mind, we combined perceive and effect (i.e., steps 4 and 5 in Algorithm 2) into a single method for updating the robot's state set using the current percept. We call this approach state-set progression. As Dervish moves down a hall, whenever it discerns a percept it updates the state set by progressing each state in the state set based on the percept and the current action. Progression involves two steps: (1) removal of a state and replacement with all possible subsequent states, based on the direction of travel (i.e., nondeterministic effect); and (2) attribution of a likelihood to each new state in the state set based on the current percept (i.e., perceive with imperfect discrimination). To accomplish the computation of perceive, we include a certainty matrix that associates a likelihood with every real-world feature and possible perceptual input.
For example, in the Computer Science Department at Stanford, Dervish has a 60% chance of detecting a closed door correctly and a 40% chance of mistaking the closed door for a wall. The representation thus associates a likelihood with every robot position; the node is literally represented as a list of robot position-likelihood pairs and a list of blockade positions. Dervish uses Continuous Selection (Algorithm 2) to reach the goal from its specified initial conditions. The planning in step 2 is performed using an sperceive and seffect that are much more reliable than perceive and effect: sperceive models perfect sensors and, in the same vein, seffect models deterministic actions (i.e., move-forward is assumed to always terminate successfully at the next node). Dervish's sel function chooses a singleton state set, corresponding to complete information, in which the robot's position is simply the position in the state set with the highest associated likelihood. Furthermore, the world's connectivity is maximized by assuming that there are no blockades except for those directly sensed, identical to the treatment of internal maze walls in CS 224.
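State-set progression and Dervish's sel can be sketched compactly. Each entry maps a pose to a likelihood; a percept both progresses the poses (the nondeterministic effect) and reweights them (the imperfect perceive) via a certainty matrix keyed by pairs of actual feature and reported percept. The helper names and dictionary encodings are illustrative, and the certainty values echo the closed-door example above.

def progress(state_set, percept, successors, feature_at, certainty):
    new = {}
    for pose, likelihood in state_set.items():
        for nxt in successors(pose):               # (1) nondeterministic effect
            w = likelihood * certainty.get((feature_at(nxt), percept), 0.0)
            if w > 0.0:                            # (2) perceive: reweight and
                new[nxt] = new.get(nxt, 0.0) + w   #     drop inconsistent poses
    return new

def dervish_sel(state_set):
    # Commit to the most likely pose: a singleton, complete-information node.
    best = max(state_set, key=state_set.get)
    return {best: 1.0}

# e.g. certainty[("closed_door", "closed_door")] = 0.6
#      certainty[("closed_door", "wall")] = 0.4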
Table 3 compares Dervish's assumptive performance in the actual AAAI contest finals maze with the theoretical performance of a conditional advance-planner. Note that although the total number of states is much smaller in this domain than in the CS 224 maze world, the conditional advance-planning approach is prohibitively costly because of the solution length required when planning with perceive and effect rather than sperceive and seffect. In its final contest run, Dervish encountered one closed door to the goal room, a blocked hallway, and three false-positive percepts, yet it executed correctly for approximately 15 minutes and reached the goal room through another entrance. Even though Dervish replanned frequently, its planning time was insignificant compared to its execution time. Recall that the sel function forced the planner's initial condition to be one with complete information and that seffect and sperceive are deterministic and perfectly reliable, resulting in very low planning costs. Yet Dervish's behavior in the final run was (coincidentally) identical to the optimal behavior we would expect from a complete conditional plan produced off-line. As with CS 224, the office building domain is a kind world: incorrect assumptions are generally discovered at execution time before the invalid assumption leads to suboptimal execution behavior.

9 Discussion

The assumptive architecture enables interleaving of planning and execution by planning with an unrealistic pretense of information. Much of the recent work on interleaving planning and execution falls in a separate category called deferred planning, in which the planner terminates search before reaching the goal conditions. Several researchers have developed domain-specific heuristics for deferred planning that are based on information gain. Olawsky et al. provide methods for terminating search at plan fringes that condition based on sensory input (Olawsky, Krebsbach & Gini, 1993). The resulting domain-specific heuristics allow nonoptimal but correct interleaving both in fully reversible domains and in more complex domains for which specific heuristics are generated off-line to avoid catastrophic early execution. Dean et al. introduce an interleaving system for a mobile robot navigation problem (Dean et al., 1990). Again, early planning termination occurs when the planner discovers actions that provide execution-time information gain (i.e. robot position information). The termination heuristic presented is limited to reversible domains in which information gain is always the right option to choose; however, no formal analysis of the method is presented. Durfee & Lesser present a deferred planning system that has a subtle assumptive element (Durfee & Lesser, 1988). Their control architecture forms a particular hypothesis about an object's trajectory, acting in accordance with that hypothesis until receiving information to the contrary. This method is a special case of the manner in which assumptive algorithms detect assumption errors and then consistently modify the assumptive set as a result.

Two domain-independent analyses of deferred planning and execution are presented by Ambros-Ingerson & Steel and Genesereth & Nourbakhsh (Ambros-Ingerson & Steel, 1988; Genesereth & Nourbakhsh, 1993). Ambros-Ingerson & Steel present two methods for interleaving. (1) Variable instantiation in a STRIPS-based representation is seen as information gain and is a point for planning deferral and early execution.
(2) Run-time errors cause reinvocation of the planner, as required, to modify the intended path to the goal. This criterion is useful only when the executable actions are in fact less reliable than the operator descriptions used during planning. This method is therefore similar to the assumptive system's use of seffect. Although Ambros-Ingerson & Steel incorporate elements of both deferred and assumptive planning in their architecture, they do not support conditional planning. In problem domains requiring plan-time conditioning because of high degrees of incomplete information and irreversibility, this architecture can prove to be incomplete.

Interleaving planning and execution is just one way of addressing a more fundamental problem: the synthesis of robot systems that perform capably in real-world domains with incomplete information. Another class of solutions to this problem involves the off-line design of situated robots, avoiding the run-time planning problem altogether (Nilsson, 1992; Brooks, 1986; Firby, 1987; Slack, 1993). These methods have demonstrated success when the off-line design is performed carefully and the domain is well-known. Yet this approach ignores the potential value added by high-level planning. Such planning can imbue a robot with more goal-directed behavior and can be crucial in preserving goal reachability, especially in domains with irreversibility and incomplete information. A natural progression is to create systems that plan, or design, situated robot systems. Both Kaelbling & Rosenschein and Schoppers have presented compilers that produce, as output, a universal plan mapping the robot's information states to actions (Kaelbling & Rosenschein, 1990; Schoppers, 1987). In both cases, knowledge state is represented in terms that map semantically to state sets, making the representations instances of the "nodes" defined here. However, the above work requires complete universal plans to be generated before execution begins. In contrast, the assumptive architecture takes advantage of information gain during early execution to plan only for the cases that the robot actually encounters.

Yet another class of solutions involves the use of planners and planning schedulers that reason about the availability of limited resources, and thus about the value of planning versus executing. Dean & Boddy present a formal framework for the anytime planning approach, and Zilberstein & Russell have implemented such an anytime system to solve a mobile robot problem (Dean & Boddy, 1988; Zilberstein & Russell, 1993). Anytime planning is not a solution in itself, however: an advance-planning system using an anytime planner still fails to interleave planning and execution. Zilberstein & Russell create an interleaving system by allowing multiple episodes of anytime planning. Ironically, the anytime planner requires domain information about the relationship of plan quality to planning time, despite the fact that the implementation is motivated by a need to plan and act in cases of incomplete domain information. Drummond has made progress in the construction of a planner that fulfills the anytime requirement of monotonic plan-quality improvement while specifying an approach that is well-suited to robot domains (Drummond, 1993).

In another research arena, the finite-automata learning community has provided the most complete framework concerning the interaction of a robot with its environment in cases of incomplete information.
Rivest & Shapire formalize both the interaction paradigm involving action and perception and the related concepts of perceptual aliasing and state uncertainty (Rivest & Shapire, 1993). Furthermore, they present expectation-tracking mechanisms and the subsequent discovery of perceptual counterexamples (or assumption errors). Both Shen and McCallum have described algorithms for learning finite automata by assuming simple versions of those automata and recognizing assumption errors during execution (Shen, 1993; McCallum, 1995). These methods are restricted to environments whose state-space diagrams are strongly connected and, thus, devoid of irreversibility. In contrast, the present work has the aim of applying simplifying assumptions to dangerous domains using selection criteria that preserve goal reachability. A research topic that has not been addressed is the automatic synthesis of assumptions that simplify planning while imposing little or no execution-time penalty. For example, one can imagine an analysis of the state-set graph of a domain in order to uncover sel rules that are safe and useful. Another approach may be to learn sel functions for a particular domain using multiple, simulated execution runs. Until such automated sel design is possible, the robot designer shoulders the burden of considering a domain's specific properties in choosing an assumptive selection strategy that allows frequent but safe early planning termination and execution. 10 Conclusion We have presented a formal framework for explicitly representing common types of incomplete information. The assumptive algorithms presented use this formalism in a representation- and domain-independent manner to effect interleaving planning and execution. The algorithms make no assumptions about reachability and connectedness in the state- space graph. These interleaving algorithms are unusual in that they are sound and complete under clearly specified conditions; this is a property that cannot be met by architectures that do not support general conditional planning. Finally, both algorithms have been demonstrated on real robots operating in nontrivial domains. The ultimate goal of robotics is to produce autonomous robots that interact usefully with the real world. Such robots will inherit from all of their ancestors: reactive, deferred, assumptive and anytime systems. The assumptive architecture makes its contribution by providing a means for behaving safely and deliberately in spite of incomplete information. Acknowledgements We would like to thank Lise Getoor, Steve Ketchpel and David Andre for insightful comments and questions on various versions of this paper. The anonymous referees also deserve our gratitude as their comments have led to a much improved paper. References Ambros-Ingerson, J. and Steel, S. 1988. Integrating Planning, Execution and Monitoring. In Proceedings of the Seventh National Conference on Artificial Intelligence, pp. 83-88. Brooks, R. 1986. A robust layered control system for a mobile robot. IEEE Journal of Robotics and Automation., 2:14-23. Dean, T., Basye, K., Chekaluk, R., Hyun, S., Lejter, M. and Randazza, M. 1990. Coping with Uncertainty in a Control System for Navigation and Exploration. In Proceedings of the Eighth National Conference on Artificial Intelligence, pp. 1010-1015. Dean, T. and Boddy, M. 1988. An Analysis of Time-Dependent Planning. In Proceedings of the Seventh National Conference on Artificial Intelligence, pp. 49-54. Doyle, R.J., Atkinson, D.J., and Doshi, R.S. 1986. 
Generating Perception Requests and Expectations to Verify the Execution of Plans. In Proceedings of the Fifth National Conference on Artificial Intelligence.
Drummond, M. 1993. Reaction-First Search. In Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence, pp. 1408-1414.
Drummond, M. 1985. Refining and Extending the Procedural Net. In Proceedings of the Ninth International Joint Conference on Artificial Intelligence, pp. 1010-1012.
Durfee, E. and Lesser, V. 1988. Incremental Planning to Control a Time-Constrained Blackboard-Based Problem Solver. IEEE Transactions on Aerospace and Electronic Systems, 24(5):647-662.
Etzioni, O., Hanks, S., Weld, D., Draper, D., Lesh, N. and Williamson, M. 1992. An Approach to Planning with Incomplete Information. In Proceedings of the Third International Conference on Principles of Knowledge Representation and Reasoning, pp. 115-125.
Fikes, R. 1971. Monitored Execution of Robot Plans Produced by STRIPS. Technical Note 55, Stanford Research Institute.
Fikes, R. and Nilsson, N. 1971. STRIPS: A New Approach to the Application of Theorem Proving to Problem Solving. Artificial Intelligence, 2:189-208.
Firby, R.J. 1987. An Investigation into Reactive Planning in Complex Domains. In Proceedings of the Sixth National Conference on Artificial Intelligence, pp. 202-206.
Gat, E. 1992. Integrating Planning and Reacting in a Heterogeneous Asynchronous Architecture for Controlling Real-World Mobile Robots. In Proceedings of the Tenth National Conference on Artificial Intelligence.
Genesereth, M. and Nourbakhsh, I. 1993. Time-Saving Tips for Problem Solving with Incomplete Information. In Proceedings of the Eleventh National Conference on Artificial Intelligence, pp. 724-730.
Genesereth, M. 1993. Discrete Systems. Course notes for CS 222. Stanford, CA: Stanford University.
Hsu, J. 1990. Partial Planning with Incomplete Information. Presented at the AAAI Spring Symposium on Planning in Uncertain, Unpredictable, or Changing Environments, 27-29 March, Stanford, California.
Kaelbling, L.P. and Rosenschein, S.J. 1990. Action and planning in embedded agents. Robotics and Autonomous Systems, 6:35-48.
Latombe, J.-C., Lazanas, A. and Shekhar, S. 1991. Robot Motion Planning with Uncertainty in Control and Reasoning. Artificial Intelligence, 52(1):1-47.
McCallum, R.A. 1995. Instance-Based Utile Distinctions for Reinforcement Learning with Hidden State. In Proceedings of the Twelfth International Conference on Machine Learning, pp. 387-395.
Nilsson, N. 1992. Toward Agent Programs with Circuit Semantics. Technical Report CS-TR-92-1412. Stanford, CA: Stanford University.
Nourbakhsh, I., Powers, R. and Birchfield, S. 1995. Dervish, An Office-Navigating Robot. AI Magazine, 16(2).
Nourbakhsh, I. 1995. The Sonars of Dervish. The Robot Practitioner, 1(4).
Olawsky, D., Krebsbach, K., and Gini, M. 1993. An Analysis of Sensor-Based Task Planning. Technical Report 93-94. Minneapolis, MN: University of Minnesota.
Peot, M. and Smith, D. 1992. Conditional Nonlinear Planning. In Proceedings of the First Conference on AI Planning Systems, pp. 189-197.
Rivest, R.L. and Schapire, R.E. 1993. Inference of Finite Automata Using Homing Sequences. Information and Computation, 103:299-347.
Rosenschein, S.J. and Kaelbling, L.P. 1995. A situated view of representation and control. Artificial Intelligence, 73:149-173.
Sacerdoti, E. 1974. Planning in a hierarchy of abstraction spaces. Artificial Intelligence, 5:115-135.
Schoppers, M.J. 1987. Universal Plans for Reactive Robots in Unpredictable Environments.
In Proceedings of the Tenth International Joint Conference on Artificial Intelligence, pp. 1039-1046.
Shen, W.-M. 1993. Learning Finite Automata Using Local Distinguishing Experiments. In Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence, pp. 1088-1093.
Slack, M. 1993. Navigation Templates: Mediating Qualitative Guidance and Quantitative Control in Mobile Robots. IEEE Transactions on Systems, Man, and Cybernetics, 23(2):452-466.
Takeda, H. and Latombe, J.-C. 1994. Planning the motions of a mobile robot in a sensory uncertainty field. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(10):1002-1017.
Warren, D. 1976. Generating Conditional Plans and Programs. In Proceedings of the AISB Summer Conference, pp. 344-354.
Zilberstein, S. and Russell, S. 1993. Anytime Sensing, Planning and Action: A Practical Model for Robot Control. In Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence, pp. 1402-1407.

Footnotes

Affiliation of the authors: Computer Science Department, Stanford University, Stanford, CA.
Illah Nourbakhsh: doctoral candidate in Computer Science.
Michael Genesereth: associate professor of Computer Science.

1. In the case of maintenance goals, the problem can often be expressed as first transforming the initial world state to a member of G, then guaranteeing that all successive transitions are only to other states in G.
2. A more complex version of this problem involves state-set unreachability, in which the uncertainty inherent in a combination of states prevents the existence of any solution plan. The same type of analysis applies to this more subtle problem.