15-887*: Planning, Execution, and Learning
Schedule, Notes, Readings
Fall 2010
- Wednesday, September 8: Introduction
- Lecture Notes
- Points to remember from the discussion in class:
- Planning is equivalent to "problem solving," formalized in terms of states, goals, and actions.
- Planning is "thinking" and not executing. It needs models of states and actions.
- Execution performs the actions of a plan, and monitors their applicability on the perceived states.
- Learning improves planning and execution through experience.
- Different planning problems require different assumptions
about state and actions, which can make a problem easier or harder to solve.
- Remember the mutilated checkerboard example.
- No assigned readings besides lecture notes.
- Monday, September 13: Representation and Search
- Lecture Notes
- Points to remember from the discussion in class:
- State space grows exponentially in number of state variables
- Factoring / decomposing state space key to planning efficiency
- Strips-assumption (add/delete lists) deals with Frame Problem
- Writing a planning domain in terms of a set of predicates,
variables, and operators is a hard task.
- Planners characterized in terms of soundness, completeness,
optimality, and computational efficiency
- Many ways to find solutions to planning problems -- differ
in complexity and how well they scale with respect to size
of search space
- Search for solutions to planning problems can occur in
both state space and plan space
- No assigned readings besides lecture notes.
- Wednesday, September 15: Plan-Space Planning
- Lecture Notes
- Points to remember from the discussion in class:
- Search space is space of partial plans;
search actions modify plans (refine them)
- Partial-order planners work by adding causal links from effects
of one action to precondition of another, and then by eliminating
threats. Threats can be eliminated by promotion, demotion, or
separation. Backtracking occurs if a threat cannot be eliminated.
- Use of Least Commitment -- delaying choices of temporal
constraints and parameter bindings unless forced.
- Partial-order planners can easily (and often optimally) deal
with goal interactions
- Readings:
- Monday, September 20: State-Space Linear and Nonlinear Planning
- Lecture Notes
- Points to remember from the discussion in class:
- Linear planning maintains the state and a goal stack: it's efficient but can
generate unoptimal solutions, and it is incomplete.
- Nonlinear state-space planning uses a set of goals, backtracks over
plan orderings decisions. It's complete.
- State-space planning commits on plan step orderings
with the main goal of getting an updated state and hopefully
a "closer" state to the goal. Such commitments may lead to
backtracking, but may also lead to informed state-based choices
of actions.
- Readings:
- Wednesday, September 22: Comparison of Planners - State-space and Plan-space
- Lecture Notes
- Points to remember from the discussion in class:
- Planning involves search. Search means choices points.
Commitments, driven by heuristics or nondeterminiscatilly,
need to be made at choice points.
- State-space planning commits on plan step orderings
with the main goal of getting an updated state and hopefully
a "closer" state to the goal. Such commitments may lead to
backtracking, but may also lead to informed state-based choices
of actions.
- Plan-space planning does not commit on plan step orderings,
until it needs to solve threats. It is
therefore seen as a least-commitment search
approach. Plan-space planning does not backtrack over threat
resolution.
- Plan-space planning commits to causal links
leading to linkability problems and to the need to backtrack
over linking commitments. In the linkability sense,
plan-space planning is a strong-commitment planner.
- Readings:
- Monday, September 27: Graph-Based and Sat-Based Planning
- Lecture Notes
- Points to remember from the discussion in class:
- Readings:
- Wednesday, September 29: Heuristic Planning
- Lecture Notes
- Points to remember from the discussion in class:
- Heuristics can be automatically derived from planning domain
- Most planning heuristics come from solving relaxed form of problem
- Many heuristics are derived from planning graph
- While no single heuristic is best, they may be combined to good affect
- Readings:
-
Planning as Heuristic Search, B. Bonet and H. Geffner. Artificial Intelligence, Special issue on Heuristic Search. Vol 129 (1-2) 2001.
-
Admissible Heuristics for Optimal Planning P. Haslum H. Geffner. Proc. 5th
Int. Conf. on AI Planning and Scheduling (AIPS 2000).
-
The FF Planning System: Fast Plan Generation Through Heuristic Search,
J. Hoffmann, B. Nebel. Journal of Artificial Intelligence Research,
vol 14, pp. 253-302, 2001
-
Landmarks Revisited, S. Richter, M. Helmert, M. Westphal, AAAI, 2008.
-
The More, the Merrier: Combining Heuristic Estimators for Satisficing
Planning, G. Roger and M. Helmert, ICAPS, 2010
- Monday, October 4: Hierarchical Planning
- Lecture Notes
- Points to remember from the discussion in class:
- Problem reduction search decomposes tasks into subtasks
- Take advantage of structure inherent in domain
- Domain definition -- "how" rather than "what"
- Works best when problem is complex, but few unintended
goal interactions
- Readings:
-
SHOP2: An HTN Planning System
Nau, D.S., Au, T.C., Ilghami, O., Kuter, U., Murdock, J.W., Wu, D. and
Yaman, F, Volume 20, pages 379-404, 2003.
-
O-Plan: the Open Planning Architecture,
Ken Currie and Austin Tate.
Artificial Intelligence Vol. 52, pp 49-86, 1991
- Wednesday, October 6: Explanation-Based Learning
- Lecture Notes
- Points to remember from the discussion in class:
- Points to remember from the discussion in class:
- Control learning "explains" a search episode and outputs
an explanation that can be used in future search to avoid failure.
- Generating "correct" explanations may be expensive,
an alternative is to acquire explanations incrementally.
- Readings:
- Monday, October 11: Planning under Uncertainty I
- Lecture Notes
- Points to remember from the discussion in class:
- Uncertainty arises from state uncertainty, non-deterministic effects,
and sensor noise.
- Conformant planning produces non-branching plans that deal with
uncertainty by forcing the world into known states
- Conditional planning produces branching plans that use sensing actions
during execution to determine which branch to take
- Sensing actions can be used to distinguish between different possible worlds
- There are three BDD-based planning algorithms to
know: strong, strong cyclic, and optimistic.
- Readings:
-
Conformant Graphplan, D. Weld and D. Smith, Proceedings of AAAI, July 1998.
-
Extending Graphplan to Handle Uncertainty and Sensing Actions,
D. Weld, C. Andersonand D. Smith, Proceedings of AAAI, July 1998.
-
Strong Planning in Non-Deterministic Domains via Model Checking,
A. Cimatti, M. Rovero, P. Traverso.
In Proceedings of AIPS-98.
-
Automatic OBDD-based Generation of Universal Plans in
Non-Deterministic Domains,
A. Cimatti, M. Rovero, P. Traverso.
In Proceedings of AAAI-98.
-
OBBD-based Universal planning for synchronized agents in non-deterministic
domains, Rune Jensen and Manuela Veloso, Journal of Artificial Intelligence Research, 13,
pages 189-226, 2000.
- Wednesday, October 13: Planning under Uncertainty II
- Lecture Notes
- Points to remember from the discussion in class:
- Planning by determinizing and replanning during execution can be
effective, though incomplete.
- Different determinization approaches have different strengths and
weakneesses
- Pruning using plan dominance is effective strategy in planning to
maximize expected reward/utility
- Optimal strategy is to work on plan with highest upper bound
- Readings:
- Monday, October 18: Markov Decision Processes
- Lecture Notes
- Points to remember from the discussion in class:
- MDPs for planning under uncertainty
- MDPs reason about states and actions and rewards.
- Rewards capture goal achievement.
- A policy maps states to actions. An MDP with a
specific policy is a Markov Chain.
- We have seen value iteration and policy iteration
to solve Marchov Chains and MDPs.
- Readings:
- Wednesday, October 20: Reinforcement Learning
- Lecture Notes
- Points to remember from the discussion in class:
- The Q function captures the unknown reward and transition probability.
- Each step of the Q-learning algorithm updates the Q(s.a) of the particular
state the learner is in and the action the learner takes.
- During Q-learning, actions are chosen based on a balanced exploration and
exploitation.
- Readings:
- Chapter 13 - Machine Learning, Tom Mitchell
- Monday, October 25: Planning by Analogy and Policy Reuse
- Lecture Notes
- Points to remember from the discussion in class:
- Planning efficiency can be improved by using example of past planning episodes.
- Planning by analogy involves creating a planning episode, store it, retrieve it,
and replay it.
- Multiple past planning episodes can be retrieved as partial matches, and merged at
replay time.
- Readings:
- Flexible strategy
learning: Analogical replay of problem solving episodes,
Manuela M. Veloso.
In Proceedings of AAAI-94, the Twelfth National Conference on
Artificial Intelligence, pages 595-600, Seattle, WA, August 1994. AAAI
Press.
-
Integrating planning and learning: The Prodigy architecture, (Sections 3 and 4)
Manuela M. Veloso, Jaime Carbonell, M. Alicia P\'erez, Daniel Borrajo,
Eugene Fink, and Jim Blythe.
Journal of Experimental and Theoretical
Artificial Intelligence, 7(1):81--120, 1995.
-
Probabilistic policy reuse in a reinforcement learning agent,
Fernando Fernandez and Manuela Veloso.
In Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multi-Agent Systems , Hakodate, Japan, May 2006.
- Wednesday, October 27: POMDPs I
- Lecture Notes
- Points to remember from the discussion in class:
- Readings:
- Monday, November 1: POMDPs II
- Lecture Notes
- Points to remember from the discussion in class:
- Readings:
- Wednesday, November 3: Execution Architectures
- Lecture Notes
- Points to remember from the discussion in class:
- Importance of modularity and hierarchy (temporal, behavioral,
functional)
- Layered architectures very important - upper layers provide goals;
commands sent to lower layers
- Many modern hybrid architectures have three layers - planning,
executive, behavioral - each with different roles and using
different representations
- Readings:
-
Experiences with an Architecture for Intelligent, Reactive Agents,
R. P. Bonasso, R. J. Firby, E. Gat, D. Kortenkamp, D. Miller and M. Slack
Journal of Experimental and Theoretical Artificial Intelligence, 9(2),
1997.
-
A Task Description Language for Robot Control,
R. Simmons and D. Apfelbaum,
In Proceedings of Conference on Intelligent Robotics and Systems, Vancouver Canada, October 1998.
- Monday, November 8: Path Planning
- Lecture Notes
- Points to remember from the discussion in class:
- Replanning is needed when execution fails.
- Given a plan that failed on a step, two alternatives for replanning
are: to plan from scratch to the goal from the new state; and to plan
from the new state to try reach one of the steps of the past plan. Both
options are not ideal
- ERRT with a search bias to the past plan is the perfect solution
for replanning.
- Readings:
- Wednesday, November 10: Variable-Level-of-Detail Physics-Based Planning
- Lecture Notes
- Points to remember from the discussion in class:
- Modeling the real world and effects of actions on the real world is challenging.
- Physics-based planning (Zickler 2010) plans using a physics engine to model
the physical world.
- Planning with a variable level of detail is very effective to interleave
planning and execution, so that it models the world with decreasing detail
as a function of time.
- Readings:
- Monday, November 15: Multi-robot planning, execution, and learning
- Lecture Notes I
- Lecture Notes II
- Points to remember from the discussion in class:
- Readings:
-
Plays as team plans for coordination and adaptation,
Michael Bowling, Brett Browning, and Manuela Veloso.
In Proceedings of the 14th International Conference on Automated
Planning and Scheduling (ICAPS-04), Vancouver, June 2004.
-
Prioritized Multi-Hypothesis Tracking by a Robot with Limited Sensing,
Paul E. Rybski and Manuela M. Veloso.
EURASIP Journal on Advances in Signal Processing,
Vol. 2009, Article ID 284525, 2009.
-
Dynamic multi-robot coordination,
Douglas Vail and Manuela Veloso,
In Multi-Robot Systems: From Swarms to Intelligent Automata,
A. Schultz, L. Parker, and F. Schneider (eds.),
Volume II, pages 87--100. Kluwer Academic Publishers, 2003.
- Wednesday, November 17: Multi-agent Planning - DEC-POMDPs
- Lecture Notes
- Points to remember from the discussion in class:
- Readings:
- Monday, November 22: Social Navigation Planning
- Lecture Notes
- Points to remember from the discussion in class:
- Readings:
- Wednesday, November 24: Thanksgiving, No Class
- Monday, November 29: Project Presentations
- Wednesday, December 1: Project Presentations