15-887: Planning, Execution, and Learning
Schedule, Notes, Readings
Fall 2016
 Wednesday, September 7: (LV) Introduction
 Lecture Notes
 Points to remember from the discussion in class:
 No assigned readings besides lecture notes.
 Monday, September 12: (V) Representation for higher level planning
 Lecture Notes
 Points to remember from the discussion in class:
 There are many ways to represent the states and actions of a problem,
and the choice significantly affects planning expressiveness and complexity.
 Closed-world assumption: what is not listed as true in the state is false.
 STRIPS action representation (add / delete lists).
 There are many ways to find solutions to planning problems; they differ
in complexity and in how well they scale with the size
of the search space.
 Humans use real-world knowledge when they solve planning problems;
most planners do not have that luxury.
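As a concrete illustration of the STRIPS add/delete-list representation under the closed-world assumption, here is a minimal sketch (not from the lecture notes; the blocks-world facts and helper names are hypothetical):

```python
# A state is just the set of facts that are true; under the closed-world
# assumption, everything not in the set is false.

def applicable(state, preconds):
    """An action is applicable when all of its preconditions hold."""
    return preconds <= state

def apply_action(state, preconds, add_list, delete_list):
    """Apply a STRIPS action: remove the delete list, add the add list."""
    assert applicable(state, preconds)
    return (state - delete_list) | add_list

# Hypothetical blocks-world action: move block A from on top of B to the table.
state = {"on(A,B)", "clear(A)", "ontable(B)"}
new_state = apply_action(
    state,
    preconds={"on(A,B)", "clear(A)"},
    add_list={"ontable(A)", "clear(B)"},
    delete_list={"on(A,B)"},
)
```

After the action, `on(A,B)` has been deleted and `ontable(A)` and `clear(B)` have been added; all other facts carry over unchanged.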
 Readings:
 Wednesday, September 14: (V) Algorithms for classical planning
 Lecture Notes
 Points to remember from the discussion in class:
 Planners are characterized in terms of soundness, completeness,
optimality, and computational efficiency.
 Linear Planning: Work on one goal until completely solved. Only then move to another goal.
 GPS Algorithm: Sound, Not Optimal, Not Complete
 Prodigy Algorithm: Nonlinear planner, Sound
 Readings:
 Monday, September 19: (L) Graph representation for lower level planning
 Lecture Notes
 Points to remember from the discussion in class:
 Interleaving of graph construction and search is critical.
 C-space transform: makes planning faster, but constructing the C-space can be expensive.
 Multiple methods for constructing the graph of the planning problem.
 State discretization typically reduces computational complexity of motion planning at the expense of surrendering strict completeness.
 Readings:
 No assigned readings besides lecture notes.
 Wednesday, September 21: (V) Planning Graphs for Planning and Heuristic Search
 Lecture Notes
 Points to remember from the discussion in class:
 Graphplan has two phases: forward graph extension and backward plan search.
 Mutual-exclusion (mutex) relations are key to the efficiency of Graphplan.
 Graphplan is optimal in terms of number of time steps.
 Readings:
 Monday, September 26: (L) Heuristic A*; Weighted A*
 Lecture Notes
 Points to remember from the discussion in class:
 A heuristic applied to a state returns an estimate of the cost of the cheapest path from that state to the goal.
 An admissible heuristic never overestimates the cost of reaching the goal. A consistent heuristic satisfies the triangle inequality.
 A* is guaranteed to return an optimal path.
 Using heuristics may reduce the number of expanded states.
 Weighted A* trades off optimality for speed.
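The relationship between A* and weighted A* can be sketched in a few lines: inflating the heuristic by a factor w > 1 makes the search greedier and typically faster, at the cost of a solution that may be up to w times more expensive than optimal. A minimal illustration (the grid example is hypothetical, not from the lecture notes):

```python
import heapq

def astar(start, goal, neighbors, h, w=1.0):
    """A* search; with w > 1.0 this becomes weighted A*.
    neighbors(s) yields (successor, edge_cost) pairs; h(s) estimates
    cost-to-goal. Returns (path, cost) or (None, inf)."""
    g = {start: 0.0}
    parent = {start: None}
    open_list = [(w * h(start), start)]
    closed = set()
    while open_list:
        _, s = heapq.heappop(open_list)
        if s in closed:
            continue
        if s == goal:
            # reconstruct the path by following parent pointers
            path = []
            while s is not None:
                path.append(s)
                s = parent[s]
            return path[::-1], g[goal]
        closed.add(s)
        for t, c in neighbors(s):
            if g[s] + c < g.get(t, float("inf")):
                g[t] = g[s] + c
                parent[t] = s
                heapq.heappush(open_list, (g[t] + w * h(t), t))
    return None, float("inf")

# Toy usage: 5x5 4-connected grid with a Manhattan-distance heuristic,
# which is admissible and consistent for unit edge costs.
def grid_neighbors(p):
    x, y = p
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        if 0 <= x + dx < 5 and 0 <= y + dy < 5:
            yield (x + dx, y + dy), 1

path, cost = astar((0, 0), (4, 4), grid_neighbors,
                   lambda p: abs(p[0] - 4) + abs(p[1] - 4))
```

Passing, e.g., `w=2.0` yields a path whose cost is guaranteed to be at most twice optimal, usually after expanding far fewer states.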
 No assigned readings besides lecture notes.
 Wednesday, September 28: (L) Heuristic functions, Multi-heuristic A*
 Lecture Notes
 Points to remember from the discussion in class:
 When dealing with high-dimensional problems, it is often a good idea to solve a lower-dimensional approximation of the problem and use its solution cost as a heuristic.
 The design of heuristics is critical in heuristic search-based planning.

 Readings:
 Monday, October 3: (V) Probabilistic path planning
 Lecture Notes
 Points to remember from the discussion in class:
 PRM "discretizes" a continuous space by sampling points in the space.
 RRT searches the space by growing a tree, extending it toward the goal and toward randomly sampled points as needed.
 Replanning is needed when execution fails.
 When a plan fails at some step, the two basic alternatives for replanning
are to plan from scratch from the new state to the goal, or to plan
from the new state to try to reach one of the steps of the past plan.
Neither option is ideal.
 ERRT addresses this by biasing the search toward the past plan:
 ERRT expands the search tree toward the goal with probability p, toward a waypoint of the past plan (for reuse) with probability q, and toward a random target with probability 1-p-q.
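The ERRT target-selection rule above fits in a few lines. A sketch (the function name, the particular values of p and q, and the `sample_free` interface are illustrative assumptions, not from the lecture notes):

```python
import random

def errt_sample_target(goal, waypoints, sample_free, p=0.1, q=0.4):
    """Pick the next RRT extension target, ERRT-style:
    the goal with probability p, a waypoint of the previous plan with
    probability q, and a random free state with probability 1-p-q."""
    r = random.random()
    if r < p:
        return goal
    if r < p + q and waypoints:
        return random.choice(waypoints)
    return sample_free()
```

Each tree-extension step calls this to decide where to grow next; when the old plan is still mostly valid, the waypoint bias lets the new tree reconnect to it quickly.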
 Readings:
 Wednesday, October 5: (V) Planning under uncertainty, replanning and decision-theoretic planning, uncertainty of action
 Lecture Notes
 Points to remember from the discussion in class:
 Uncertainty arises from state uncertainty, nondeterministic effects,
and sensor noise.
 Conformant planning produces non-branching plans that deal with
uncertainty by forcing the world into known states.
 Planning models can change as a function of the planning "distance" to the state.
 Short-sighted probabilistic planning bridges the gap between policy planning and replanning.
 Readings:
 Monday, October 10: (L) Execution: heuristic search, anytime, replanning, real-time heuristic search I
 Lecture Notes
 Points to remember from the discussion in class:
 Planning is a repeated process, so there is a need to replan fast.
 Anytime heuristic search: returns the best plan possible within a time budget of T msec.
 Incremental heuristic search: speeds up search by reusing previous search efforts.
 Readings:
 Wednesday, October 12: (L) Execution: heuristic search, anytime, replanning, real-time heuristic search II
 Lecture Notes
 Points to remember from the discussion in class:
 Real-time heuristic search: plan only a few steps toward the goal and replan later.
This may reduce total planning time, but the resulting path may also be highly suboptimal.
 Free-space assumption: assume the path is clear unless it is known otherwise.
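One classic real-time scheme is LRTA*: do a one-step lookahead, make the heuristic at the current state more informed, and move. A minimal sketch (illustrative names and toy graph, not from the lecture notes):

```python
def lrta_star_step(s, neighbors, H):
    """One step of LRTA*-style real-time search: pick the successor that
    looks best under the current heuristic H, raise H(s) toward the best
    one-step lookahead value, and move there."""
    succ, cost = min(neighbors(s), key=lambda tc: tc[1] + H.get(tc[0], 0.0))
    # the lookahead value is a lower bound, so H(s) can only be raised
    H[s] = max(H.get(s, 0.0), cost + H.get(succ, 0.0))
    return succ

# Toy line graph 0 - 1 - 2 with unit edges and goal state 2.
graph = {0: [(1, 1)], 1: [(0, 1), (2, 1)], 2: [(1, 1)]}
H = {0: 2.0, 1: 1.0, 2: 0.0}   # initial heuristic (here already exact)
s, steps = 0, 0
while s != 2:
    s = lrta_star_step(s, lambda x: graph[x], H)
    steps += 1
```

Because each step only looks ahead locally, planning time per move is tiny, but repeated heuristic updates are what eventually steer the agent out of local minima.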
 Readings:
 Monday, October 17: (V) Planning under uncertainty: MDPs
 Lecture Notes
 Points to remember from the discussion in class:
 MDPs for planning under uncertainty
 MDPs reason about states, actions, and rewards.
 Rewards capture goal achievement.
 Solving an MDP is finding the optimal policy.
 A policy maps states to actions.
 An MDP with a specific policy is a Markov Chain.
 We have seen value iteration and policy iteration
for solving Markov chains and MDPs.
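A compact value-iteration sketch, with greedy policy extraction at the end (the data layout for the transition and reward tables is an illustrative assumption, not from the lecture notes):

```python
def value_iteration(states, actions, T, R, gamma=0.9, eps=1e-6):
    """Value iteration. T[s][a] is a list of (probability, next_state)
    pairs; R[s][a] is the immediate reward for taking action a in s."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            # Bellman backup: best expected value over actions
            best = max(R[s][a] + gamma * sum(p * V[t] for p, t in T[s][a])
                       for a in actions[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            break
    # the optimal policy is greedy with respect to the converged values
    pi = {s: max(actions[s],
                 key=lambda a: R[s][a] +
                 gamma * sum(p * V[t] for p, t in T[s][a]))
          for s in states}
    return V, pi
```

Fixing the policy `pi` turns the MDP into a Markov chain, whose values can then be computed by the same backup restricted to the chosen action.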
 No assigned readings besides lecture notes.
 Wednesday, October 19: (V) Planning under uncertainty: Reinforcement Learning
 Lecture Notes
 Points to remember from the discussion in class:
 Reinforcement learning is an online learning method that learns a policy to control an agent in a world that is assumed to be an MDP.
 Model-free RL learns a Q function that captures both the unknown reward and state-transition functions.
 An online RL agent balances exploitation and exploration.
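Tabular Q-learning with epsilon-greedy exploration illustrates both points: the Q table absorbs the unknown rewards and transitions, and epsilon controls the exploitation/exploration balance. A sketch under assumed interfaces (the `env_step`/`reset` signatures and the chain world are illustrative, not from the readings):

```python
import random

def q_learning(env_step, reset, actions, episodes=300,
               alpha=0.5, gamma=0.9, epsilon=0.2):
    """Tabular Q-learning. env_step(s, a) -> (next_state, reward, done);
    reset() -> start state. No model of the MDP is ever built."""
    Q = {}
    q = lambda s, a: Q.get((s, a), 0.0)
    for _ in range(episodes):
        s, done = reset(), False
        while not done:
            # explore with probability epsilon, otherwise exploit
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda a: q(s, a))
            s2, r, done = env_step(s, a)
            # one-step temporal-difference update toward the target
            target = r if done else r + gamma * max(q(s2, b) for b in actions)
            Q[(s, a)] = q(s, a) + alpha * (target - q(s, a))
            s = s2
    return Q

# Toy chain world 0-1-2-3: reaching state 3 gives reward 1 and ends.
def step(s, a):
    s2 = min(3, s + 1) if a == "right" else max(0, s - 1)
    return s2, (1.0 if s2 == 3 else 0.0), s2 == 3

random.seed(0)
Q = q_learning(step, lambda: 0, ["left", "right"])
```

After training, the greedy policy with respect to Q moves right along the whole chain, even though the agent was never given the transition or reward model.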
 Readings:
 Chapter 13, Machine Learning, Tom Mitchell.

Reinforcement Learning: A Survey.
L. P. Kaelbling, M. Littman, and A. Moore. JAIR, vol. 4, pp. 237-285, 1996.

Probabilistic Policy Reuse in a Reinforcement Learning Agent.
Fernando Fernandez and Manuela Veloso.
In Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multi-Agent Systems, Hakodate, Japan, May 2006.
 Monday, October 24: (L) Learning in Planning: Experience graphs
 Lecture Notes
 Points to remember from the discussion in class:
 Experience Graphs are a method for biasing search towards reuse of previously planned paths and demonstrations.
 They are based on the idea of recomputing heuristics to guide the search towards a set of paths rather than towards a goal.
 Can be useful in domains where a robot has a similar workspace across planning instances.
 Readings:
 Wednesday, October 26: (L) Dependent variables, Markov Property, Dominance
 Lecture Notes
 Points to remember from the discussion in class:
 Readings:
 Monday, October 31: (V) Learning in Planning: Explanation-based, Analogy-based
 Lecture Notes
 Points to remember from the discussion in class:
 Readings:
 Wednesday, November 2: (L) Learning in Planning: learning a cost function
 Lecture Notes
 Points to remember from the discussion in class:
 Readings:
 Monday, November 7: (V) Deep Learning and Deep Reinforcement Learning
 Lecture Notes
 Points to remember from the discussion in class:
 Readings:
 Wednesday, November 9: (L) Planning under uncertainty: POMDPs
 Lecture Notes
 Points to remember from the discussion in class:
 Readings:
 Monday, November 14: (V) Planning under uncertainty: Short-sighted probabilistic planning
 Lecture Notes
 Points to remember from the discussion in class:
 Readings:
 Wednesday, November 16: (V) Multiagent planning and execution: Dec-SIMDPs
 Lecture Notes
 Points to remember from the discussion in class:
 Readings:
 Monday, November 21: (V) Multiagent and human-agent planning and execution
 Lecture Notes
 Points to remember from the discussion in class:
 Readings:
 Wednesday, November 23: THANKSGIVING - NO CLASS
 Monday, November 28: (L) Application: planning for manipulation
 Lecture Notes
 Points to remember from the discussion in class:
 Readings:
 Wednesday, November 30: (L) Application: planning for ground and aerial robots
 Lecture Notes
 Points to remember from the discussion in class:
 Readings:
 Monday, December 5: (L) Application: planning for perception (MHA*, PERCH)
 Lecture Notes
 Points to remember from the discussion in class:
 Readings:
 Wednesday, December 7: (LV) Review
 Lecture Notes
 Points to remember from the discussion in class:
 Readings:
mmv@cs.cmu.edu
July 23, 2016