15-887*: Planning, Execution, and Learning
Schedule, Notes, Readings
Fall 2012
- Mondday, September 10: Introduction
- Lecture Notes
- Points to remember from the discussion in class:
- Planning is equivalent to "problem solving," formalized in terms of states, goals, and actions.
- Planning is "thinking" and not executing. It needs models of states and actions.
- Execution performs the actions of a plan, and monitors their applicability on the perceived states.
- Learning improves planning and execution through experience.
- Different planning problems require different assumptions
about state and actions, which can make a problem easier or harder to solve.
- No assigned readings besides lecture notes.
- Wednesday, September 12: Representation and Search
- Lecture Notes
- Points to remember from the discussion in class:
- Many ways to represent states and actions; have
significant effects on planning expressiveness and complexity
- State space grows exponentially in number of state variables
- Strips-assumption (add/delete lists) deals with Frame Problem
- Many ways to find solutions to planning problems -- differ
in complexity and how well they scale with respect to size
of search space
- People bring lots of real-world knowledge to bear when
they solve planning problems; most planners do not have
that luxury
- Writing a planning domain in terms of a set of predicates,
variables, and operators is a hard task.
- Planners characterized in terms of soundness, completeness,
optimality, and computational efficiency
- No assigned readings besides lecture notes.
- Monday, September 17: State-Space Linear and Nonlinear Planning
- Lecture Notes
- Points to remember from the discussion in class:
- Linear planning maintains the state and a goal stack: it's efficient but can
generate unoptimal solutions, and it is incomplete.
- Nonlinear state-space planning uses a set of goals, backtracks over
plan orderings decisions. It's complete.
- State-space planning commits on plan step orderings
with the main goal of getting an updated state and hopefully
a "closer" state to the goal. Such commitments may lead to
backtracking, but may also lead to informed state-based choices
of actions.
- Readings:
- Wednesday, September 19: Plan-Space Planning
- Lecture Notes
- Points to remember from the discussion in class:
- Search space is space of partial plans;
search actions refine plans (add action, ordering, binding information)
- Partial-order planners work by adding causal links from effects
of one action to precondition of another, and then by eliminating
threats.
- Threats can be eliminated by promotion, demotion,
separation, or confrontation. Backtracking occurs if a threat cannot be eliminated.
- Use of Least Commitment: delaying choices of temporal
constraints and parameter bindings unless forced.
- Partial-order, least commitment planners can easily (and often optimally) deal
with goal interactions
- Readings:
- Monday, September 24: Comparison of Planners - State-space and Plan-space
- Lecture Notes
- Points to remember from the discussion in class:
- Planning involves search. Search means choices points.
Commitments, driven by heuristics or nondeterminiscatilly,
need to be made at choice points.
- State-space planning commits on plan step orderings
with the main goal of getting an updated state and hopefully
a "closer" state to the goal. Such commitments may lead to
backtracking, but may also lead to informed state-based choices
of actions.
- Plan-space planning does not commit on plan step orderings,
until it needs to solve threats. It is
therefore seen as a least-commitment search
approach. Plan-space planning does not backtrack over threat
resolution.
- Plan-space planning commits to causal links
leading to linkability problems and to the need to backtrack
over linking commitments. In the linkability sense,
plan-space planning is a strong-commitment planner.
- Readings:
- Wednesday, September 26: Graph-Based and Sat-Based Planning
- Lecture Notes
- Points to remember from the discussion in class:
- Two phases: forward extension and backwards plan search.
- Optimal in terms of number of time steps.
- Readings:
- Monday, October 1: Heuristic Planning I
- Lecture Notes
- Points to remember from the discussion in class:
- Search can be uninformed or informed (heuristic); heuristics should be
admissible and informed
- Most planning heuristics come from solving relaxed form of problem
- Heuristics can be automatically derived from planning graph
- While no single heuristic is best, they may be combined to good affect
- Readings:
- Wednesday, October 3: Heuristic Planning II
- Lecture Notes
- Points to remember from the discussion in class:
- Landmarks are formulae that must hold in any plan to achieve
a given set of goals
- Finding all landmarks is intractable, but good polynomial
algorithms exist, based on back-chaining, planning graphs,
and domain transition graphs
- Landmarks can be used as either subgoals or heuristic
estimates
- Admissible heuristics can be constructed from landmarks
- Readings:
- Monday, October 8: Explanation-Based Learning
- Lecture Notes
- Points to remember from the discussion in class:
- Control learning "explains" a search episode and outputs
an explanation that can be used in future search to avoid failure.
- Generating "correct" explanations may be expensive,
an alternative is to acquire explanations incrementally.
- Readings:
- Wednesday, October 10: Homework 1 Review
- Monday, October 15: Planning under Uncertainty I - Conditional and
Conformant Planning
- Lecture Notes
- Points to remember from the discussion in class:
- Uncertainty arises from state uncertainty, non-deterministic effects,
and sensor noise.
- Conformant planning produces non-branching plans that deal with
uncertainty by forcing the world into known states
- Conditional planning produces branching plans that use sensing actions
during execution to determine which branch to take
- Sensing actions can be used to distinguish between different possible worlds
- Graphplan can be easily extended to reason about possible worlds
- There are three BDD-based planning algorithms to
know: strong, strong cyclic, and optimistic.
- Readings:
-
Conformant Graphplan, D. Weld and D. Smith, Proceedings of AAAI, July 1998.
-
Extending Graphplan to Handle Uncertainty and Sensing Actions,
D. Weld, C. Anderson and D. Smith, Proceedings of AAAI, July 1998.
-
Automatic OBDD-based Generation of Universal Plans in
Non-Deterministic Domains,
A. Cimatti, M. Rovero, P. Traverso.
In Proceedings of AAAI-98.
- Wednesday, October 17: Planning under Uncertainty II -
Determinization and Replanning
- Lecture Notes
- Points to remember from the discussion in class:
- Planning by determinizing and replanning during execution can be
effective, though incomplete.
- Different determinization approaches have different strengths and
weakneesses
- Pruning using plan dominance is effective strategy in planning to
maximize expected reward/utility
- Optimal strategy is to work on plan with highest upper bound
- Readings:
- Monday, October 22: Markov Decision Processes
- Lecture Notes
- Points to remember from the discussion in class:
- MDPs for planning under uncertainty
- MDPs reason about states and actions and rewards.
- Rewards capture goal achievement.
- A policy maps states to actions. An MDP with a
specific policy is a Markov Chain.
- We have seen value iteration and policy iteration
to solve Marchov Chains and MDPs.
- Readings:
- Wednesday, October 24: Reinforcement Learning
- Lecture Notes
- Points to remember from the discussion in class:
- The Q function captures the unknown reward and transition probability.
- Each step of the Q-learning algorithm updates the Q(s.a) of the particular
state the learner is in and the action the learner takes.
- During Q-learning, actions are chosen based on an exploration vs
exploitation strategy.
- Readings:
- Monday, October 29: Planning by Analogy and Policy Reuse
- Lecture Notes
- Points to remember from the discussion in class:
- Planning efficiency can be improved by using example of past planning episodes.
- Planning by analogy involves creating a planning episode, store it, retrieve it,
and replay it.
- Multiple past planning episodes can be retrieved as partial matches, and merged at
replay time.
- Readings:
- Flexible strategy
learning: Analogical replay of problem solving episodes,
Manuela M. Veloso.
In Proceedings of AAAI-94, the Twelfth National Conference on
Artificial Intelligence, pages 595-600, Seattle, WA, August 1994. AAAI
Press.
-
Probabilistic policy reuse in a reinforcement learning agent,
Fernando Fernandez and Manuela Veloso.
In Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multi-Agent Systems , Hakodate, Japan, May 2006.
- Wednesday, October 31: Path Planning
- Lecture Notes
- Points to remember from the discussion in class:
- The planning space can be sampled instead of discretized.
- PRM generates a map of random points that are then connected to an initial position and final goal.
- RRT searches the space by sampling towards the goal and to random points building a connected tree as found needed.
- Replanning is needed when execution fails.
- Given a plan that failed on a step, two alternatives for replanning
are: to plan from scratch to the goal from the new state; and to plan
from the new state to try reach one of the steps of the past plan. Both
options are not ideal
- ERRT with a search bias to the past plan is the perfect solution
for replanning.
- Readings:
- Monday, November 5: POMDPs I
- Lecture Notes
- Points to remember from the discussion in class:
- POMDP is an extension of MDP with uncertainty over the
state one is in
- POMDP can be transformed into continuous-space MDP (over
belief space)
- POMDP policy (alpha vectors) can be represented as upper
surface of linear subspaces, which represent taking some
action and following fixed policy afterwards
- POMDPs are highly intractable to solve
- Readings:
- Wednesday, November 7: POMDPs II
- Lecture Notes
- Points to remember from the discussion in class:
- Focusing on reachable beliefs improves efficiency
tremendously
- Using tree-search and heuristics improves efficiency
significantly
- Greedy methods solve underlying MDP, assume that world
is observable after one step lookahead
- Readings:
- Monday, November 12: Oracular POMDPs (OPOMDPs), and Humans as Observation Providers in POMDPS (HOP-POMDPs)
- No handouts for lecture notes - please use readings.
- Points to remember from the discussion in class:
- There are methods to try to reduce state/belief uncertainty:
- Get more sensing (example of spinning in "confused" robot)
- Ask an oracle - Oracular POMDP (need to weigh the benefit and cost of information)
- Ask a human - HOP-POMDPs (include humans in planning model, and need to model humans as "special sensors," i.e., humans as observation providers)
- Readings:
-
Oracular Partially Observable Markov Decision Processes: A Very Special Case,
Nicholas Armstrong-Crews and Manuela Veloso.
In Proceedings of ICRA'2007, Rome, Italy, April 2007.
-
An Approximate Algorithm for Solving Oracular POMDPs,
Nicholas Armstrong-Crews and Manuela Veloso.
In Proceedings of ICRA'2008, Pasadena, CA, 2008.
-
Modeling humans as observation providers using POMDPs,
Stephanie Rosenthal and Manuela Veloso.
In Proceedings of the 20th IEEE International Symposium on Robot and
Human Interactive Communication (RO-MAN), Atlanta, GA, August 2011.
- Wednesday, November 14: Planning for Manipulation
- Monday, November 19: Execution Architectures
- Lecture Notes
- Points to remember from the discussion in class:
- Importance of modularity and hierarchy (temporal, behavioral,
functional)
- Layered architectures very important - upper layers provide goals;
commands sent to lower layers
- Many modern hybrid architectures have three layers - planning,
executive, behavioral - each with different roles and using
different representations
- No one architecture developed (yet) that is ideal for every application.
- Readings:
- Wednesday, November 21: Thanksgiving, No Class
- Monday, November 26: Execution Monitoring and Diagnosis
- Lecture Notes
- Points to remember from the discussion in class:
- Readings:
-
A Model-based Approach to Reactive Self-Configuring Systems
B.C. Williams, P.P. Nayak, U. Nayak, In Proceedings of National
Conference on Artificial Intelligence, 1996.
-
Strengthening Schedules Through Uncertainty Analysis,
L.M. Hiatt, T.L. Zimmerman, S.F. Smith and R. Simmons. In Proceedings of
International Joint Conference on Artificial Intelligence (IJCAI),
Pasadena CA, July 2009.
-
Risk-Variant Policy Switching to Exceed Reward Thresholds
B. Kane and R. Simmons. In Proceedings ICAPS 2012.
-
Thresholded Rewards: Acting Optimally in Timed, Zero-Sum Games,
C. McMillen and M. Veloso.
In Proceedings of AAAI'07, (Outstanding Paper Award), Vancouver,
Canada, July 2007.
- Wednesday, November 28: Multi-Robot State Estimation and Planning
- Lecture Notes I
- Points to remember from the discussion in class:
- Coordination between robots can be represented as multi-robot plays.
- Readings:
- Monday, December 3: Multi-Robot Planning and Coordination
- Lecture Notes II
- Points to remember from the discussion in class:
- Robots can communicate and share state in order to locally make their role assignment.
- Readings:
-
Prioritized Multi-Hypothesis Tracking by a Robot with Limited Sensing,
Paul E. Rybski and Manuela M. Veloso.
EURASIP Journal on Advances in Signal Processing,
Vol. 2009, Article ID 284525, 2009.
-
Dynamic multi-robot coordination,
Douglas Vail and Manuela Veloso,
In Multi-Robot Systems: From Swarms to Intelligent Automata,
A. Schultz, L. Parker, and F. Schneider (eds.),
Volume II, pages 87--100. Kluwer Academic Publishers, 2003.
- Wednesday, December 5: Review