15-887: Planning, Execution, and Learning
Schedule, Notes, Readings
Fall 2016
 Wednesday, September 7: (LV) Introduction
 Monday, September 12: (V) Representation for higher-level planning
 Lecture Notes
 Points to remember from the discussion in class:
 There are many ways to represent the states and actions of a problem;
the choice significantly affects planning expressiveness and complexity.
 Closed-world assumption: whatever is not stated to be true in the state is assumed false.
 STRIPS action representation (add / delete lists).
 There are many ways to find solutions to planning problems; they differ
in complexity and in how well they scale with the size of the search space.
 Humans use real-world knowledge when they solve planning problems;
most planners do not have that luxury.
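As a concrete illustration of the closed-world assumption and STRIPS add/delete lists, here is a minimal sketch, assuming set-based states; the atoms and the unstack action are illustrative, not from the lecture:

```python
# Minimal STRIPS sketch: a state is a frozenset of ground atoms, and under
# the closed-world assumption any atom not in the set is false.

def applicable(state, preconditions):
    """An action applies when all of its preconditions hold in the state."""
    return preconditions <= state

def apply_action(state, preconditions, add_list, delete_list):
    """Return the successor state, or None if the action is not applicable."""
    if not applicable(state, preconditions):
        return None
    # Delete-list atoms become false; add-list atoms become true.
    return (state - delete_list) | add_list

# Toy blocks-world step (illustrative): unstack block A from block B.
state = frozenset({"on(A,B)", "clear(A)", "ontable(B)", "handempty"})
succ = apply_action(
    state,
    preconditions={"on(A,B)", "clear(A)", "handempty"},
    add_list={"holding(A)", "clear(B)"},
    delete_list={"on(A,B)", "clear(A)", "handempty"},
)
```
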
 Readings:
 Wednesday, September 14: (V) Algorithms for classical planning
 Lecture Notes
 Points to remember from the discussion in class:
 Planners are characterized in terms of soundness, completeness,
optimality, and computational efficiency.
 Linear Planning: Work on one goal until completely solved. Only then move to another goal.
 GPS Algorithm: Sound, Not Optimal, Not Complete
 Prodigy Algorithm: Nonlinear planner, Sound
 Readings:
 Monday, September 19: (L) Graph representation for lower-level planning
 Lecture Notes
 Points to remember from the discussion in class:
 Interleaving of graph construction and search is critical.
 C-space transform: makes planning faster, but constructing the C-space can be expensive.
 Multiple methods for constructing the graph of the planning problem.
 State discretization typically reduces computational complexity of motion planning at the expense of surrendering strict completeness.
 Readings:
 No assigned readings besides lecture notes.
 Wednesday, September 21: (V) Planning Graphs for Planning and Heuristic Search
 Lecture Notes
 Points to remember from the discussion in class:
 Graphplan has two phases: forward extension and backward plan search.
 Mutual exclusion (mutex) relations are key to the efficiency of Graphplan.
 Graphplan is optimal in terms of number of time steps.
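A minimal sketch of Graphplan's forward-extension phase, here on the delete-relaxed problem (mutexes omitted for brevity); the action encoding and the toy domain are illustrative. The level at which all goals first appear is a lower bound on the number of time steps, which is also how planning graphs yield heuristics:

```python
# Grow proposition layers until the goals appear or the graph levels off.

def goal_level(init, goals, actions, max_levels=50):
    """Return the first level whose proposition layer contains all goals."""
    props = set(init)
    for level in range(max_levels):
        if goals <= props:
            return level
        new_props = set(props)
        for pre, add in actions:       # each action: (preconditions, add list)
            if pre <= props:
                new_props |= add       # delete lists ignored (relaxation)
        if new_props == props:         # leveled off: goals unreachable
            return None
        props = new_props
    return None

# Illustrative toy domain: go to the shop, then buy milk.
actions = [
    (frozenset({"at_home"}), frozenset({"at_shop"})),
    (frozenset({"at_shop"}), frozenset({"have_milk"})),
]
h = goal_level({"at_home"}, {"have_milk"}, actions)
```
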
 Readings:
 Monday, September 26: (L) Heuristic A*; Weighted A*
 Lecture Notes
 Points to remember from the discussion in class:
 A heuristic applied to a state returns an estimate of the cost of the cheapest path from that state to the goal.
 An admissible heuristic never overestimates the cost of reaching the goal. A consistent heuristic satisfies the triangle inequality: h(s) <= c(s, s') + h(s').
 A* is guaranteed to return an optimal path.
 Using heuristics may reduce the number of expanded states.
 Weighted A* trades off optimality for speed.
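The A*/weighted A* trade-off can be sketched as follows; the grid world, unit edge costs, and Manhattan heuristic are illustrative assumptions. With w = 1 the search is plain A* (optimal, given an admissible heuristic); with w > 1 it typically expands fewer states but only guarantees a solution within w times the optimal cost:

```python
import heapq

def weighted_astar(blocked, start, goal, w=1.0):
    """A* on a 4-connected grid; `blocked` is a set of obstacle cells."""
    def h(cell):  # Manhattan distance: admissible for unit-cost 4-connectivity
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])
    open_list = [(w * h(start), 0, start)]      # entries: (f, g, state)
    g = {start: 0}
    while open_list:
        _, g_cur, cur = heapq.heappop(open_list)
        if cur == goal:
            return g_cur
        if g_cur > g.get(cur, float("inf")):
            continue                            # stale queue entry
        x, y = cur
        for nxt in [(x+1, y), (x-1, y), (x, y+1), (x, y-1)]:
            if nxt in blocked:
                continue
            if g_cur + 1 < g.get(nxt, float("inf")):
                g[nxt] = g_cur + 1
                heapq.heappush(open_list, (g[nxt] + w * h(nxt), g[nxt], nxt))
    return None

blocked = {(1, 0), (1, 1)}
cost = weighted_astar(blocked, (0, 0), (2, 0))  # w = 1: plain A*
```
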
 No assigned readings besides lecture notes.
 Wednesday, September 28: (L) Heuristic functions, Multi-heuristic A*
 Lecture Notes
 Points to remember from the discussion in class:
 When dealing with high-dimensional problems, it is often a good idea to solve a lower-dimensional approximation of the problem and use its solution as a heuristic.
 The design of heuristics is critical in heuristic search-based planning.
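One common instance of this idea: solve the 2D (x, y) projection of the problem exactly, e.g. by a backward search from the goal, and use those costs as the heuristic for a higher-dimensional (say, x, y, heading) search. A minimal sketch with an illustrative grid:

```python
from collections import deque

def backward_bfs(free_cells, goal):
    """Exact cost-to-goal on the 2D projection, computed by BFS from the goal."""
    dist = {goal: 0}
    frontier = deque([goal])
    while frontier:
        x, y = frontier.popleft()
        for nxt in [(x+1, y), (x-1, y), (x, y+1), (x, y-1)]:
            if nxt in free_cells and nxt not in dist:
                dist[nxt] = dist[(x, y)] + 1
                frontier.append(nxt)
    return dist  # dist.get(cell, float("inf")) then serves as h(cell)

# Illustrative 3x3 grid with the center cell blocked.
free = {(x, y) for x in range(3) for y in range(3)} - {(1, 1)}
h = backward_bfs(free, goal=(2, 2))
```

Because these 2D costs account for obstacles, they are usually far more informative than a plain Euclidean or Manhattan estimate in the full-dimensional search.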

 Readings:
 Monday, October 3: (V) Probabilistic path planning
 Lecture Notes
 Points to remember from the discussion in class:
 PRM "discretizes" a continuous space by sampling points in the space.
 RRT searches the space by growing a tree: each step extends the tree toward the goal or toward a random sample, keeping the tree connected.
 Replanning is needed when execution fails.
 Given a plan that failed at some step, there are two basic alternatives
for replanning: plan from scratch from the new state to the goal, or plan
from the new state to try to reach one of the steps of the old plan.
Neither option is ideal.
 ERRT replans with a search bias toward the past plan, combining both alternatives:
 ERRT grows the search tree toward the goal with probability p, toward a waypoint of the previously found plan (for reuse) with probability q, and toward a random target with probability 1 - p - q.
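ERRT's target-selection rule can be sketched directly; the function name `choose_target`, the probability values, and the workspace bounds are illustrative:

```python
import random

def choose_target(goal, waypoints, bounds, p=0.1, q=0.4, rng=random):
    """Pick the next point the tree is extended toward."""
    r = rng.random()
    if r < p:
        return goal                       # bias toward the goal
    if r < p + q and waypoints:
        return rng.choice(waypoints)      # reuse a waypoint of the old plan
    return (rng.uniform(0, bounds[0]),    # otherwise a uniform random sample
            rng.uniform(0, bounds[1]))

random.seed(0)
targets = [choose_target((9, 9), [(2, 2), (5, 5)], (10, 10))
           for _ in range(1000)]
frac_goal = sum(t == (9, 9) for t in targets) / len(targets)
```
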
 Readings:
 Wednesday, October 5: (V) Planning under uncertainty, replanning and decision-theoretic planning, uncertainty of action
 Lecture Notes
 Points to remember from the discussion in class:
 Uncertainty arises from state uncertainty, nondeterministic effects,
and sensor noise.
 Conformant planning produces non-branching plans that deal with
uncertainty by forcing the world into known states.
 Planning models can change as a function of the planning "distance" to the state.
 Short-sighted probabilistic planning bridges the gap between policy planning and replanning.
 Readings:
 Monday, October 10: (L) Execution: heuristic search, anytime, replanning, real-time heuristic search I
 Lecture Notes
 Points to remember from the discussion in class:
 Planning is a repeated process, so there is a need to replan fast.
 Anytime heuristic search: returns the best plan found within a time budget of T milliseconds.
 Incremental heuristic search: speeds up search by reusing previous search efforts.
 Readings:
 Wednesday, October 12: (L) Execution: heuristic search, anytime, replanning, real-time heuristic search II
 Lecture Notes
 Points to remember from the discussion in class:
 Real-time heuristic search: plan only a few steps toward the goal and replan later.
This may reduce total planning time, but the resulting path may be highly suboptimal.
 Free-space assumption: assume the path is clear unless known otherwise.
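A minimal sketch of navigation under the free-space assumption: the robot plans as if every unobserved cell is free, executes one step, senses adjacent obstacles, and replans. The grid, the adjacent-cell sensing model, and the BFS planner are illustrative simplifications:

```python
from collections import deque

def bfs_path(start, goal, blocked, size):
    """Shortest path on a size x size 4-connected grid, or None."""
    prev = {start: None}
    q = deque([start])
    while q:
        cur = q.popleft()
        if cur == goal:
            path = []
            while cur is not None:
                path.append(cur)
                cur = prev[cur]
            return path[::-1]
        x, y = cur
        for nxt in [(x+1, y), (x-1, y), (x, y+1), (x, y-1)]:
            if (0 <= nxt[0] < size and 0 <= nxt[1] < size
                    and nxt not in blocked and nxt not in prev):
                prev[nxt] = cur
                q.append(nxt)
    return None

def navigate(start, goal, true_blocked, size=4):
    known, pos, steps = set(), start, [start]
    while pos != goal:
        x, y = pos                      # sense the four adjacent cells
        known |= {c for c in [(x+1, y), (x-1, y), (x, y+1), (x, y-1)]
                  if c in true_blocked}
        path = bfs_path(pos, goal, known, size)  # unknown cells assumed free
        if path is None:
            return None                 # unreachable given what is known
        pos = path[1]                   # execute one step, then replan
        steps.append(pos)
    return steps

route = navigate((0, 0), (3, 0), {(1, 0), (1, 1)})
```
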
 Readings:
 Monday, October 17: (V) Planning under uncertainty: MDPs
 Lecture Notes
 Points to remember from the discussion in class:
 MDPs for planning under uncertainty
 MDPs reason about states, actions, and rewards.
 Rewards capture goal achievement.
 Solving an MDP is finding the optimal policy.
 A policy maps states to actions.
 An MDP with a specific policy is a Markov Chain.
 We have seen value iteration and policy iteration
for solving Markov chains and MDPs.
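A minimal value-iteration sketch on a toy two-state MDP; the transition encoding (lists of probability, next-state, reward triples) and all numbers are illustrative:

```python
def value_iteration(T, gamma=0.9, eps=1e-8):
    """T[s][a] is a list of (probability, next_state, reward) triples."""
    V = {s: 0.0 for s in T}
    while True:
        delta = 0.0
        for s in T:
            # Bellman backup: best expected one-step return plus discounted value.
            best = max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                for outcomes in T[s].values()
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            return V

# Toy MDP: from s0, "go" reaches the rewarding state s1 with probability 0.8.
T = {
    "s0": {"go": [(0.8, "s1", 0.0), (0.2, "s0", 0.0)],
           "stay": [(1.0, "s0", 0.0)]},
    "s1": {"stay": [(1.0, "s1", 1.0)]},
}
V = value_iteration(T)
```

The optimal policy is then recovered by acting greedily with respect to V; fixing any policy and iterating the same backup without the max evaluates that policy on the induced Markov chain.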
 No assigned readings besides lecture notes.
 Wednesday, October 19: (V) Planning under uncertainty: Reinforcement Learning
 Lecture Notes
 Points to remember from the discussion in class:
 Reinforcement learning is an online learning method that learns a policy to control an agent in a world that is assumed to be an MDP.
 Model-free RL learns a Q function that captures both the unknown reward and state-transition functions.
 An online RL agent balances exploitation and exploration.
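A minimal tabular Q-learning sketch with epsilon-greedy exploration; the corridor environment (move left/right, reward 1 at the right end) and all parameter values are illustrative:

```python
import random

def q_learning(n=5, episodes=500, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(n) for a in (-1, +1)}
    for _ in range(episodes):
        s = 0
        while s != n - 1:
            # Exploration vs. exploitation (epsilon-greedy).
            if rng.random() < eps:
                a = rng.choice((-1, +1))
            else:
                a = max((-1, +1), key=lambda a: Q[(s, a)])
            s2 = min(max(s + a, 0), n - 1)          # walls at both ends
            r = 1.0 if s2 == n - 1 else 0.0
            # Temporal-difference update toward the bootstrapped target.
            target = r + gamma * max(Q[(s2, -1)], Q[(s2, +1)])
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q

Q = q_learning()
policy = [max((-1, +1), key=lambda a: Q[(s, a)]) for s in range(4)]
```
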
 Readings:
 Chapter 13 of Machine Learning, Tom Mitchell

Reinforcement Learning: A Survey,
L.P. Kaelbling, M. Littman, A. Moore, JAIR vol. 4, pp. 237-285, 1996

Probabilistic policy reuse in a reinforcement learning agent,
Fernando Fernandez and Manuela Veloso.
In Proceedings of the Fifth International Joint Conference on Autonomous Agents and Multi-Agent Systems, Hakodate, Japan, May 2006.
 Monday, October 24: (L) Learning in Planning: Experience graphs
 Lecture Notes
 Points to remember from the discussion in class:
 Experience Graphs are a method for biasing search towards reuse of previously planned paths and demonstrations.
 They are based on the idea of recomputing heuristics to guide the search towards a set of paths rather than towards a goal.
 Can be useful in domains where a robot has a similar workspace across planning instances.
 Readings:
 Wednesday, October 26: (L) Dependent variables, Markov Property, Dominance
 Lecture Notes
 Points to remember from the discussion in class:
 Markov property: the cost and successor state depend only on the current state.
 Dominance relationships can be used to speed up search.
 Readings:
 Monday, October 31: (V) Learning in Planning: Explanation-based, Analogy-based
 Lecture Notes
 Points to remember from the discussion in class:
 Generating "correct" explanations is hard, as it may require additional knowledge about the domain for the generalization part.
 Control learning "explains" a search episode and outputs an explanation that the planner can use to speed up its search for solutions to future problems.
 Explanation-Based Generalization: assumes the domain theory provides all relevant features; its purpose is to speed up problem solving rather than to acquire new knowledge.
 HAMLET: learns control rules without proving them correct, generalizing from a single example, and then inductively revises its set of rules from multiple incremental examples.
 Readings:
 Wednesday, November 2: (V) Deep Learning and Deep Reinforcement Learning
 Lecture Notes
 Points to remember from the discussion in class:
 When the state space is continuous or infinite, we use Q-learning with function approximation.
 Deep Q-Network (DQN): approximates the Q function with a neural network.
 DQN: the network has one output node for each available action.
 Concept of replay memory: necessary for data efficiency and for avoiding overfitting.
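A minimal sketch of such a replay memory: transitions go into a bounded buffer and minibatches are sampled uniformly, which lets each experience be reused and breaks the temporal correlation of consecutive samples. The capacity, batch size, and transition format are illustrative, and no network is included:

```python
import random
from collections import deque

class ReplayMemory:
    def __init__(self, capacity, seed=0):
        self.buffer = deque(maxlen=capacity)   # oldest transitions fall out
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Uniform minibatch, e.g. for one gradient step on the Q-network."""
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)

memory = ReplayMemory(capacity=100)
for t in range(150):                 # only the most recent 100 survive
    memory.push(t, 0, 0.0, t + 1, False)
batch = memory.sample(32)
```
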
 Readings:

Playing Atari with Deep Reinforcement Learning,
V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra and M. Riedmiller
2013

Human-Level Control Through Deep Reinforcement Learning,
V. Mnih, K. Kavukcuoglu, D. Silver, A. Rusu, J. Veness, M. Bellemare, A. Graves,
M. Riedmiller, A. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou,
H. King, D. Kumaran, D. Wierstra, S. Legg and D. Hassabis
In Nature 518, 529-533, 2015
 Monday, November 7: (L) Learning in Planning: learning a cost function
 Lecture Notes
 Points to remember from the discussion in class:
 Learning a cost function is one way of learning from demonstration.
 The goal is to learn a cost function under which the demonstrations are optimal solutions to the problem.
 Performance depends on the basis functions used.
 Readings:
 Wednesday, November 9: (L) Planning under uncertainty: POMDPs
 Lecture Notes
 Points to remember from the discussion in class:
 A POMDP is an extension of an MDP with uncertainty over the
state one is in.
 A POMDP can be transformed into a continuous-space MDP (over
belief space).
 POMDPs are highly intractable to solve. Some approximation techniques exist, but intractability is still a big issue.
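The transformation to belief space rests on the Bayes-filter belief update: b'(s') is proportional to O(s', z) times the sum over s of T(s, a, s') b(s). A minimal sketch on an illustrative two-state, tiger-style toy problem (the dictionaries encoding T and O are assumptions of this sketch):

```python
def belief_update(belief, action, observation, T, O):
    """b'(s') ~ O(s', z) * sum_s T(s, a, s') * b(s), then normalize."""
    new_belief = {}
    for s2 in belief:
        pred = sum(belief[s] * T[s][action].get(s2, 0.0) for s in belief)
        new_belief[s2] = O[s2][observation] * pred
    norm = sum(new_belief.values())
    return {s: p / norm for s, p in new_belief.items()}

# Two hidden states; "listen" keeps the state; the observation is
# correct 85% of the time.
T = {"left": {"listen": {"left": 1.0}},
     "right": {"listen": {"right": 1.0}}}
O = {"left": {"hear_left": 0.85, "hear_right": 0.15},
     "right": {"hear_left": 0.15, "hear_right": 0.85}}
b = {"left": 0.5, "right": 0.5}
b = belief_update(b, "listen", "hear_left", T, O)
```

The continuous belief vector b is exactly the state of the transformed MDP, which is why POMDP solving scales so poorly: the new state space is a simplex over all hidden states.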
 Readings:
 Monday, November 14: (V) MultiRobot Decision Making: State Estimation and Coordination
 Lecture Notes
 Points to remember from the discussion in class:
 Details on the CMDragons SSL team.
 The STEP architecture
 Robots can communicate and share state in order to locally make their role assignment.
 Readings:
 Wednesday, November 16: (V) ShortSighted Probabilistic Planning
 Lecture Notes
 Points to remember from the discussion in class:
 Probabilistic planners: FF-Replan, UCT, SSiPP.
 FF-Replan: removes probabilities from actions (determinization) and replans when execution deviates from expectation.
 UCT: Limit the maximum number of actions that can be applied to reach the goal, and use sparse sampling to search for the solution.
 SSiPP: Prune states reachable only in the far future.
 Readings:
 Monday, November 21: (V) Review
 Lecture Notes
 Points to remember from the discussion in class:
 Readings:
 Wednesday, November 23: THANKSGIVING - NO CLASS
 Monday, November 28: (L) Application: planning for autonomous driving
 Lecture Notes
 Points to remember from the discussion in class:
 Planning at multiple levels.
 A multi-resolution lattice in the motion planner is beneficial.
 An anytime replanner is critical.
 Design of proper heuristics is key to efficiency.
 Readings:
 Wednesday, November 30: (L) Application: planning for mobile manipulation and articulated robots
 Lecture Notes
 Points to remember from the discussion in class:
 Multiple planners used for both domains.
 Start and goal configurations are often the most constrained; this can be exploited by the planners.
 The cost function can be really complex in planning for articulated robots.
 Designing proper heuristics is key to efficiency.
 Readings:

Single- and Dual-Arm Motion Planning with Heuristic Search,
Benjamin Cohen, Sachin Chitta and Maxim Likhachev
In The International Journal of Robotics Research, 2013

Search-based Planning for a Legged Robot over Rough Terrain,
Paul Vernaza, Maxim Likhachev, Subhrajit Bhattacharya, Sachin Chitta, Aleksandr Kushleyev, Daniel D. Lee
In International Conference on Robotics and Automation, 2009

Optimization and Learning for Rough-Terrain Legged Locomotion,
Matt Zucker, et al.
In The International Journal of Robotics Research, 30(2):175-191, 2011
 Monday, December 5: (L) Application: planning for perception
 Lecture Notes
 Points to remember from the discussion in class:
 Deliberative Perception: search for best “explanation” of the observed scene
 Discriminative guidance from one or more statistical learners.
 MultiHeuristic A* (MHA*) for graph search with distinct hypotheses
 Readings:
 Wednesday, December 7: (LV) Project presentations
 Lecture Notes
 Points to remember from the discussion in class:
 Readings:
mmv@cs.cmu.edu
July 23, 2016