15-887: Planning, Execution, and Learning
Schedule, Notes, and Readings (Fall 2016)

• Wednesday, September 7: (LV) Introduction
• Lecture Notes
• Points to remember from the discussion in class:
• No assigned readings besides lecture notes.

• Monday, September 12: (V) Representation for higher level planning
• Lecture Notes
• Points to remember from the discussion in class:
• Many ways to represent the states and actions of a problem, significantly affecting the planning expressiveness and complexity.
• Closed-world assumption: anything not asserted true in the state is assumed false.
• STRIPS action representation (add / delete lists).
• Many ways to find solutions to planning problems -- they differ in complexity and in how well they scale with the size of the search space.
• Humans use real-world knowledge when they solve planning problems; most planners do not have that luxury.
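The STRIPS add/delete-list representation mentioned above can be sketched in a few lines. This is a minimal illustration, not from the lecture notes; the blocks-world predicates and the `pickup(A)` action are hypothetical examples.

```python
# Minimal sketch of a STRIPS-style action with preconditions and
# add/delete lists. A state is the set of facts that are true;
# under the closed-world assumption, anything absent is false.

class Action:
    def __init__(self, name, preconditions, add_list, delete_list):
        self.name = name
        self.preconditions = frozenset(preconditions)
        self.add_list = frozenset(add_list)
        self.delete_list = frozenset(delete_list)

    def applicable(self, state):
        # All preconditions must be true in the current state.
        return self.preconditions <= state

    def apply(self, state):
        # Delete-list facts are removed, add-list facts are added.
        return (state - self.delete_list) | self.add_list

# Illustrative blocks-world "pickup" action.
pickup_a = Action(
    "pickup(A)",
    preconditions={"clear(A)", "ontable(A)", "handempty"},
    add_list={"holding(A)"},
    delete_list={"clear(A)", "ontable(A)", "handempty"},
)

state = frozenset({"clear(A)", "ontable(A)", "handempty"})
next_state = pickup_a.apply(state)
```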

• Wednesday, September 14: (V) Algorithms for classical planning
• Monday, September 19: (L) Graph representation for lower level planning
• Lecture Notes
• Points to remember from the discussion in class:
• Interleaving of graph construction and search is critical.
• C-Space Transform: makes planning faster, but constructing C-space can be expensive.
• Multiple methods for constructing the graph of the planning problem.
• State discretization typically reduces computational complexity of motion planning at the expense of surrendering strict completeness.
• No assigned readings besides lecture notes.
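The discretization point above can be made concrete with a small sketch: mapping continuous coordinates onto grid cells and enumerating neighbors for graph search. The resolution, 4-connectivity, and obstacle set are illustrative assumptions.

```python
# Sketch: discretizing a continuous 2-D space onto a grid for
# motion-planning search. A coarser resolution means fewer states
# (lower complexity) but may miss narrow passages, which is how
# discretization surrenders strict completeness.

def to_cell(x, y, resolution=0.5):
    # Map a continuous point to its grid cell.
    return (int(x // resolution), int(y // resolution))

def neighbors(cell, obstacles):
    # 4-connected grid; obstacle cells are excluded from the graph.
    cx, cy = cell
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nxt = (cx + dx, cy + dy)
        if nxt not in obstacles:
            yield nxt
```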

• Wednesday, September 21: (V) Planning Graphs for Planning and Heuristic Search
• Lecture Notes
• Points to remember from the discussion in class:
• Graphplan has two phases: forward graph extension and backward plan search.
• Exclusivity (mutex) relations are key to the efficiency of Graphplan.
• Graphplan is optimal in terms of number of time steps.
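One of the mutex tests used when extending the planning graph can be sketched as below: two actions at the same level are mutually exclusive if one deletes a precondition or an add-effect of the other. The dict-based action layout and the example actions are illustrative assumptions, not Graphplan's actual data structures.

```python
# Sketch of an action-mutex (interference / inconsistent effects)
# test for planning-graph construction. Each action is a dict with
# 'pre', 'add', and 'del' sets of facts.

def actions_mutex(a, b):
    # a deletes something b needs or produces, or vice versa.
    if a["del"] & (b["pre"] | b["add"]):
        return True
    if b["del"] & (a["pre"] | a["add"]):
        return True
    return False

# Illustrative actions: "move" deletes at(r1,x), which "stay" needs.
move = {"pre": {"at(r1,x)"}, "add": {"at(r1,y)"}, "del": {"at(r1,x)"}}
stay = {"pre": {"at(r1,x)"}, "add": set(), "del": set()}
```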

• Monday, September 26: (L) Heuristic A*; Weighted A*
• Lecture Notes
• Points to remember from the discussion in class:
• A heuristic applied to a state returns an estimate of the cost of the cheapest path from that state to the goal.
• An admissible heuristic never overestimates the cost of reaching the goal. A consistent heuristic satisfies the triangle inequality.
• A* is guaranteed to return an optimal path.
• Using heuristics may reduce the number of expanded states.
• Weighted A* trades off optimality for speed.
• No assigned readings besides lecture notes.
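The optimality-for-speed trade-off of Weighted A* can be sketched by inflating the heuristic with a weight w >= 1: w = 1 gives plain A* (optimal with an admissible heuristic), and w > 1 returns a path of cost at most w times optimal, typically expanding fewer states. The grid world and Manhattan heuristic below are illustrative assumptions.

```python
import heapq

# Sketch of (weighted) A*: priority = g + w * h.
def weighted_astar(start, goal, neighbors, h, w=1.0):
    open_list = [(w * h(start), 0, start, [start])]
    g_best = {start: 0}
    while open_list:
        _, g, node, path = heapq.heappop(open_list)
        if node == goal:
            return path
        for nxt, cost in neighbors(node):
            g2 = g + cost
            if g2 < g_best.get(nxt, float("inf")):
                g_best[nxt] = g2
                heapq.heappush(open_list, (g2 + w * h(nxt), g2, nxt, path + [nxt]))
    return None

# Illustrative obstacle-free 4-connected grid with unit move costs.
def grid_neighbors(cell):
    x, y = cell
    return [((x + dx, y + dy), 1) for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))]

def manhattan(cell, goal=(2, 2)):
    return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

path = weighted_astar((0, 0), (2, 2), grid_neighbors, manhattan, w=2.0)
```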

• Wednesday, September 28: (L) Heuristic functions, Multi-heuristic A*
• Lecture Notes
• Points to remember from the discussion in class:
• When dealing with high-dimensional problems, it is often a good idea to solve a lower-dimensional approximation of the problem and use its solution cost as a heuristic.
• The design of heuristics is critical in heuristic search-based planning.

• Monday, October 3: (V) Probabilistic path planning
• Lecture Notes
• Points to remember from the discussion in class:
• PRM "discretizes" a continuous space by sampling points in the space.
• RRT searches the space by growing a connected tree, extending it toward the goal and toward randomly sampled points as needed.
• Replanning is needed when execution fails.
• Given a plan that failed on a step, the two alternatives for replanning are: plan from scratch from the new state to the goal; or plan from the new state to try to reach one of the steps of the past plan. Neither option is ideal.
• ERRT, with a search bias toward the past plan, is an effective solution for replanning:
• ERRT expands the search tree toward the goal with probability p, toward a previously found waypoint (for reuse) with probability q, and toward a random target with probability 1-p-q.
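ERRT's biased target selection can be sketched in a few lines. The particular values of p and q, and the random-sampler interface, are illustrative assumptions.

```python
import random

# Sketch of ERRT's target selection when extending the tree:
# goal with probability p, a waypoint of the previous plan with
# probability q, and a uniformly random point otherwise
# (requires p + q <= 1).

def choose_target(goal, old_waypoints, sample_random, p=0.1, q=0.5, rng=random):
    r = rng.random()
    if r < p:
        return goal
    if r < p + q and old_waypoints:
        return rng.choice(old_waypoints)
    return sample_random()
```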

• Wednesday, October 5: (V) Planning under uncertainty, re-planning and decision-theoretic planning, uncertainty of action
• Lecture Notes
• Points to remember from the discussion in class:
• Uncertainty arises from state uncertainty, non-deterministic effects, and sensor noise.
• Conformant planning produces non-branching plans that deal with uncertainty by forcing the world into known states.
• Planning models can change as a function of the planning "distance" to the state.
• Short-sighted probabilistic planning bridges the gap between policy planning and replanning.

• Monday, October 10: (L) Execution: heuristic search, anytime, re-planning, real-time heuristic search I
• Lecture Notes
• Points to remember from the discussion in class:
• Planning is a repeated process, so there is a need to re-plan quickly.
• Anytime Heuristic Search: returns the best plan it can find within a given time budget (T msec).
• Incremental Heuristic Search: speeds up search by reusing previous search efforts.

• Wednesday, October 12: (L) Execution: heuristic search, anytime, re-planning, real-time heuristic search II
• Lecture Notes
• Points to remember from the discussion in class:
• Real-time Heuristic Search: plan a few steps toward the goal and re-plan later. May lead to smaller total planning time, but the result may also be highly sub-optimal.
• Freespace assumption: assume the path is clear unless known otherwise.

• Monday, October 17: (V) Planning under uncertainty: MDPs
• Lecture Notes
• Points to remember from the discussion in class:
• MDPs for planning under uncertainty
• MDPs reason about states, actions, and rewards.
• Rewards capture goal achievement.
• Solving an MDP is finding the optimal policy.
• A policy maps states to actions.
• An MDP with a specific policy is a Markov Chain.
• We have seen value iteration and policy iteration for solving Markov Chains and MDPs.
• No assigned readings besides lecture notes.
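Value iteration can be sketched on a tiny MDP. The two-state chain, rewards, and discount factor below are illustrative assumptions; the optimal policy falls out by taking the action that maximizes the backup at each state.

```python
# Sketch of value iteration: repeatedly apply the Bellman backup
#   V(s) <- max_a [ R(s,a) + gamma * sum_s' T(s,a,s') * V(s') ]
# until the largest change falls below eps.

def value_iteration(states, actions, T, R, gamma=0.9, eps=1e-6):
    # T[s][a] is a list of (prob, next_state); R[s][a] is the reward.
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(
                R[s][a] + gamma * sum(p * V[s2] for p, s2 in T[s][a])
                for a in actions
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            return V

# Illustrative 2-state MDP: "g" is absorbing with reward 1.
states = ["s", "g"]
actions = ["go", "stay"]
T = {
    "s": {"go": [(1.0, "g")], "stay": [(1.0, "s")]},
    "g": {"go": [(1.0, "g")], "stay": [(1.0, "g")]},
}
R = {
    "s": {"go": 0.0, "stay": 0.0},
    "g": {"go": 1.0, "stay": 1.0},
}
V = value_iteration(states, actions, T, R)
```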

• Wednesday, October 19: (V) Planning under uncertainty: Reinforcement Learning
• Lecture Notes
• Points to remember from the discussion in class:
• Reinforcement learning is an online learning method that learns a policy to control an agent in a world that is assumed to be an MDP.
• Model-free RL learns a Q function that captures both unknown reward and state transition functions.
• An online RL agent balances exploitation and exploration.

• Monday, October 24: (L) Learning in Planning: Experience graphs
• Lecture Notes
• Points to remember from the discussion in class:
• Experience Graphs are a method for biasing search towards reuse of previously planned paths and demonstrations.
• They are based on the idea of re-computing heuristics to guide the search towards a set of paths rather than towards a goal.
• Can be useful in domains where a robot has a similar workspace across planning instances.

• Wednesday, October 26: (L) Dependent variables, Markov Property, Dominance
• Lecture Notes
• Points to remember from the discussion in class:

• Monday, October 31: (V) Learning in Planning: Explanation-based, Analogy-based
• Lecture Notes
• Points to remember from the discussion in class:

• Wednesday, November 2: (L) Learning in Planning: learning a cost function
• Lecture Notes
• Points to remember from the discussion in class:

• Monday, November 7: (V) Deep Learning and Deep Reinforcement Learning
• Lecture Notes
• Points to remember from the discussion in class:

• Wednesday, November 9: (L) Planning under uncertainty: POMDPs
• Lecture Notes
• Points to remember from the discussion in class:

• Monday, November 14: (V) Planning under uncertainty: Short-sighted probabilistic planning
• Lecture Notes
• Points to remember from the discussion in class:

• Wednesday, November 16: (V) Multi-agent planning and execution: DecSIMDPs
• Lecture Notes
• Points to remember from the discussion in class:

• Monday, November 21: (V) Multi-agent and human-agent planning and execution
• Lecture Notes
• Points to remember from the discussion in class:

• Wednesday, November 23: THANKSGIVING - NO CLASS

• Monday, November 28: (L) Application: planning for manipulation
• Lecture Notes
• Points to remember from the discussion in class:

• Wednesday, November 30: (L) Application: planning for ground and aerial robots
• Lecture Notes
• Points to remember from the discussion in class: