# 15-887: Planning, Execution, and Learning: Schedule, Notes, Readings (Fall 2014)

• Monday, September 8: Introduction
• Lecture Notes
• Points to remember from the discussion in class:
• Planning is equivalent to "problem solving," formalized in terms of states, goals, and actions.
• Planning is "thinking" and not executing. It needs models of states and actions.
• Execution performs the actions of a plan, and monitors their applicability on the perceived states.
• Learning improves planning and execution through experience.
• Different planning problems require different assumptions about state and actions, which can make a problem easier or harder to solve.
• No assigned readings besides lecture notes.

• Wednesday, September 10: Representation and Search
• Lecture Notes
• Points to remember from the discussion in class:
• Many ways to represent states and actions, significantly affecting planning expressiveness and complexity
• Strips-assumption (add/delete lists) deals with Frame Problem
• Many ways to find solutions to planning problems -- differ in complexity and how well they scale with respect to size of search space
• People bring lots of real-world knowledge to bear when they solve planning problems; most planners do not have that luxury
• Writing a planning domain in terms of a set of predicates, variables, and operators is a hard task.
• Planners are characterized in terms of soundness, completeness, optimality, and computational efficiency
• No assigned readings besides lecture notes.
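The STRIPS add/delete-list treatment of the Frame Problem can be sketched in a few lines: facts not mentioned in an action's add or delete lists persist unchanged, so the action description never has to enumerate what does not change. A minimal sketch (the action and fact names are illustrative):

```python
# Minimal STRIPS-style action application (illustrative names).
# A state is a frozenset of ground facts; an action lists its
# preconditions, add list, and delete list explicitly.

def applicable(state, action):
    """An action applies when all its preconditions hold."""
    return action["pre"] <= state

def apply_action(state, action):
    """STRIPS assumption: facts not in add/delete lists persist."""
    assert applicable(state, action)
    return (state - action["del"]) | action["add"]

# Toy blocksworld-style action: pick up block A from the table.
pickup_a = {
    "pre": {"ontable A", "clear A", "handempty"},
    "add": {"holding A"},
    "del": {"ontable A", "clear A", "handempty"},
}

s0 = frozenset({"ontable A", "clear A", "handempty", "ontable B"})
s1 = apply_action(s0, pickup_a)
# "ontable B" persists even though the action never mentions it.
```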

• Monday, September 15: Plan-Space Planning
• Lecture Notes
• Points to remember from the discussion in class:
• Search space is space of partial plans; search actions modify plans (refine them)
• Partial-order planners work by adding causal links from effects of one action to precondition of another, and then by eliminating threats. Threats can be eliminated by promotion, demotion, or separation. Backtracking occurs if a threat cannot be eliminated.
• Use of Least Commitment -- delaying choices of temporal constraints and parameter bindings unless forced.
• Partial-order planners can easily (and often optimally) deal with goal interactions
• Modal Truth Criterion: A proposition p is necessarily true at a step s iff two conditions hold: There is a step t equal or necessarily previous to s in which p is necessarily asserted; and for every step C possibly before s and every proposition q possibly codesignating with p which C denies, there is a step W necessarily between C and s which asserts r, a proposition such that r and p codesignate whenever p and q codesignate.
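Threat detection and resolution by promotion and demotion can be sketched as below. The data structures are illustrative, and for brevity ordering consistency is checked only against explicit constraints, not their transitive closure:

```python
# Sketch of threat handling in partial-order planning (not a full
# POP implementation). A causal link (prod, p, cons) records that
# step `prod` achieves precondition p for step `cons`. A step c
# threatens the link if it deletes p and the ordering constraints
# allow it to fall between prod and cons.

def possibly_between(order, c, prod, cons):
    """True unless ordering already forces c before prod or after cons.
    (Sketch: ignores the transitive closure of `order`.)"""
    return (c, prod) not in order and (cons, c) not in order

def resolve_threat(order, c, prod, cons):
    """Try demotion (c before prod), then promotion (c after cons)."""
    if (prod, c) not in order:          # demotion is consistent
        return order | {(c, prod)}
    if (c, cons) not in order:          # promotion is consistent
        return order | {(cons, c)}
    return None                         # neither works: backtrack

# Step 2 deletes p and could fall between steps 0 and 1.
order = {(0, 1)}                        # only 0 < 1 is required so far
new_order = resolve_threat(order, 2, 0, 1)   # demotes step 2
```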

• Wednesday, September 17: State-Space Linear and Nonlinear Planning
• Lecture Notes
• Points to remember from the discussion in class:
• State-space planning is about committing to changing the state of the world while planning: "apply an applicable operator/action."
• Linear planning maintains the state and a goal stack: it's efficient but can generate suboptimal solutions, and it is incomplete.
• Nonlinear state-space planning uses a set of goals and backtracks over plan-ordering decisions. It's complete.
• Prodigy4.0 is a nonlinear state-space planner that uses a set of goals, and also has a choice point on whether to apply an applicable operator or to continue subgoaling on still-pending goals.
• State-space planning commits on plan step orderings with the main goal of getting an updated state and hopefully a "closer" state to the goal. Such commitments may lead to backtracking, but may also lead to informed state-based choices of actions.
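The "apply an applicable operator" commitment can be sketched as a minimal breadth-first forward state-space planner over STRIPS-style actions; the action format and toy domain are illustrative:

```python
from collections import deque

# Minimal forward state-space planner (BFS). Each action is a triple
# (preconditions, add list, delete list) over sets of facts.

def plan(init, goal, actions):
    frontier = deque([(frozenset(init), [])])
    seen = {frozenset(init)}
    while frontier:
        state, steps = frontier.popleft()
        if goal <= state:
            return steps
        for name, (pre, add, dele) in actions.items():
            if pre <= state:                    # applicable operator
                nxt = (state - dele) | add      # commit: update state
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, steps + [name]))
    return None

# Toy two-step domain.
actions = {
    "boil": ({"has_water"}, {"hot_water"}, {"has_water"}),
    "brew": ({"hot_water", "has_tea"}, {"tea"}, {"hot_water"}),
}
result = plan({"has_water", "has_tea"}, {"tea"}, actions)
# result == ["boil", "brew"]
```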

• Monday, September 22: Graph-Based and Sat-Based Planning
• Wednesday, September 24: Comparison of Planners - State, Plan, Graph
• Lecture Notes
• Points to remember from the discussion in class:
• Planning involves search. Search means choice points. Commitments are made at choice points. When search fails, planners backtrack to choice points and select alternative commitments.
• State-space planning commits on plan step orderings with the main goal of getting an updated state and hopefully a "closer" state to the goal. Such commitments may lead to backtracking, but may also lead to informed state-based choices of actions.
• Plan-space planning does not commit on plan step orderings, until it needs to solve threats. It is therefore seen as a least-commitment search approach. Plan-space planning does not backtrack over threat resolution.
• Plan-space planning commits to causal links leading to linkability problems and to the need to backtrack over linking commitments. In the linkability sense, plan-space planning is a strong-commitment planner.

• Monday, September 29: Heuristic Planning I
• Lecture Notes
• Points to remember from the discussion in class:
• Heuristics applied to a state return the estimated length of the plan from that state to the goal.
• There are several ways to compute the heuristic value of a state. We talked about the additive heuristic, used by HSP, and the FF heuristic, based on the number of actions returned by Graphplan on the relaxed problem in which actions have no delete effects.
• For the "Planning as Heuristic Search" reading, focus on sections 4, 5.1, 5.2, and 5.3 (Blocksworld).
• For the "FF Planning" reading, focus on sections 3, 4 (4.1), and 5; read section 6 if interested in further efficiency.
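The additive heuristic can be sketched as a fixpoint computation over the delete-relaxed problem: a fact already true costs 0, and otherwise costs 1 plus the summed costs of its cheapest achiever's preconditions. The toy actions below are illustrative:

```python
import math

# Sketch of the additive heuristic h_add used by HSP, computed to a
# fixpoint over the delete-relaxed task. Each action is a pair
# (preconditions, add list); delete lists are ignored by relaxation.

def h_add(state, goal, actions):
    cost = {f: 0.0 for f in state}
    changed = True
    while changed:
        changed = False
        for pre, add in actions:
            c = sum(cost.get(p, math.inf) for p in pre)
            if math.isinf(c):
                continue                      # achiever not reachable yet
            for f in add:
                if c + 1 < cost.get(f, math.inf):
                    cost[f] = c + 1
                    changed = True
    # h_add sums goal costs, so it is informative but inadmissible.
    return sum(cost.get(g, math.inf) for g in goal)

acts = [({"a"}, {"b"}), ({"b"}, {"c"}), ({"a"}, {"d"})]
h = h_add({"a"}, {"c", "d"}, acts)   # 3.0: two steps to c, one to d
```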

• Wednesday, October 1: Heuristic Planning II - Landmarks
• Lecture Notes
• Points to remember from the discussion in class:
• Landmarks are formulae that must hold in any plan to achieve a given set of goals
• Finding all landmarks is intractable, but good polynomial algorithms exist for extracting many of them
• Landmarks can be used as either subgoals or heuristic estimates
• Admissible heuristics can be constructed from landmarks
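One common polynomial landmark test can be sketched as follows: a fact that is not already true is a landmark if the delete-relaxed task becomes unsolvable once every action achieving it is removed. The toy domain is illustrative:

```python
# Sound polynomial landmark test over the delete relaxation.
# Each action is a pair (preconditions, add list).

def relaxed_reachable(init, goal, actions):
    """Fixpoint fact reachability in the delete-relaxed task."""
    facts = set(init)
    while True:
        new = set()
        for pre, add in actions:
            if pre <= facts:
                new |= add - facts
        if not new:
            return goal <= facts
        facts |= new

def is_landmark(f, init, goal, actions):
    """f is a landmark if removing its achievers breaks the task."""
    if f in init:
        return False
    pruned = [(pre, add) for pre, add in actions if f not in add]
    return not relaxed_reachable(init, goal, pruned)

acts = [({"a"}, {"b"}), ({"b"}, {"c"})]
found = is_landmark("b", {"a"}, {"c"}, acts)   # True: c needs b first
```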

• Monday, October 6: Explanation-Based Learning
• Lecture Notes
• Points to remember from the discussion in class:
• Guidance to the planner about its choices can be represented as "control rules" that use the state of the planner as preconditions of the rules. The rules can select, reject, or prefer choices.
• Control learning "explains" a search episode and outputs an explanation that can be used by the planner "to speed up" its search for solutions to problems.
• Generating "correct" explanations is hard, as it may require additional knowledge about the domain for the generalization part.
• HAMLET learns control rules without proving that they are correct by generalizing from a single example, but it inductively revises its set of rules from multiple incremental examples.

• Wednesday, October 8: Planning under Uncertainty I - Conditional and Conformant Planning
• Lecture Notes
• Points to remember from the discussion in class:
• Uncertainty arises from state uncertainty, non-deterministic effects, and sensor noise.
• Uncertainty can be represented through possible worlds
• Sensing actions can be used to represent knowledge effects that distinguish between different possible worlds
• Conformant planning produces non-branching plans that deal with uncertainty by forcing the world into known states
• Conditional planning produces branching plans that use sensing actions during execution to determine which branch to take
• Graphplan can be easily extended to reason about possible worlds
• There are three BDD-based planning algorithms to understand: strong, strong cyclic, and optimistic.

• Monday, October 13: Planning under Uncertainty II - Replanning and Decision-Theoretic Planning
• Wednesday, October 15: Markov Decision Processes
• Lecture Notes
• Readings:
• Russell and Norvig, Chapter 16.
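The Bellman backup at the heart of MDP solving can be sketched as value iteration; the two-state chain below and its transition/reward functions are illustrative:

```python
# Value iteration for a tiny MDP:
#   V(s) = max_a sum_s' P(s'|s,a) * (R(s,a,s') + gamma * V(s'))

def value_iteration(states, actions, P, R, gamma=0.9, eps=1e-6):
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(
                sum(p * (R(s, a, s2) + gamma * V[s2])
                    for s2, p in P(s, a).items())
                for a in actions
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:                 # converged to the fixpoint
            return V

# Two-state chain: "stay" self-loops; "go" from s0 earns reward 1.
states, actions = ["s0", "s1"], ["stay", "go"]
P = lambda s, a: {"s1": 1.0} if (s, a) == ("s0", "go") else {s: 1.0}
R = lambda s, a, s2: 1.0 if (s, a, s2) == ("s0", "go", "s1") else 0.0
V = value_iteration(states, actions, P, R)
```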

• Monday, October 20: Reinforcement Learning
• Lecture Notes
• Points to remember from the discussion in class:
• Reinforcement learning is an online learning method that learns a policy to control an agent in a world that is assumed to be an MDP.
• Model-free RL learns a Q function that captures both unknown reward and state transition functions.
• An online RL agent balances exploitation and exploration.
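Model-free Q-learning with an epsilon-greedy exploration/exploitation balance can be sketched on a toy chain world; the environment and all parameter values are illustrative:

```python
import random

# Tabular Q-learning on a deterministic 4-state chain (illustrative).
random.seed(0)

N, GOAL = 4, 3                          # states 0..3, reward at state 3
ACTIONS = (1, -1)                       # move right / move left
Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}

def step(s, a):
    """Unknown to the learner: the transition and reward models."""
    s2 = min(max(s + a, 0), N - 1)
    return s2, (1.0 if s2 == GOAL else 0.0)

alpha, gamma, epsilon = 0.5, 0.9, 0.2
for episode in range(200):
    s = 0
    while s != GOAL:
        # Exploration vs. exploitation.
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda a: Q[(s, a)])
        s2, r = step(s, a)
        # The Q function absorbs both reward and transition information.
        target = r + gamma * max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s2

# Greedy policy after learning: move right everywhere before the goal.
greedy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N)]
```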

• Wednesday, October 22: Planning by Analogy and Policy Reuse
• Monday, October 27: POMDPs I
• Lecture Notes
• Points to remember from the discussion in class:
• POMDP is an extension of MDP with uncertainty over the state one is in
• POMDP can be transformed into continuous-space MDP (over belief space)
• POMDP policy (alpha vectors) can be represented as upper surface of linear subspaces, which represent taking some action and following fixed policy afterwards
• POMDPs are highly intractable to solve
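The alpha-vector representation can be sketched directly: the value of a belief b is the upper surface max over alpha of the dot product alpha . b, and the action attached to the maximizing vector is the one to execute. The two-state vectors and action names below are illustrative:

```python
# A POMDP policy as alpha vectors (illustrative two-state example).
# Each vector gives, as a linear function of the belief, the value of
# taking its action and following a fixed policy afterwards.

def value(belief, alphas):
    """alphas: list of (action, vector); returns (value, best action)."""
    action, vec = max(
        alphas,
        key=lambda av: sum(p * x for p, x in zip(belief, av[1])),
    )
    return sum(p * x for p, x in zip(belief, vec)), action

alphas = [("listen", [1.0, 1.0]), ("open-left", [3.0, -2.0])]
v, a = value([0.9, 0.1], alphas)   # confident belief: commit to a door
```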

• Wednesday, October 29: POMDPs II
• Monday, November 3: Path Planning
• Lecture Notes
• Points to remember from the discussion in class:
• PRM "discretizes" a continuous space by sampling points in the space.
• RRT searches the space by sampling towards the goal and towards random points, building a connected tree as needed.
• Replanning is needed when execution fails.
• Given a plan that failed on a step, the two alternatives for replanning are: to plan from scratch from the new state to the goal, or to plan from the new state to try to reach one of the steps of the past plan. Neither option is ideal.
• ERRT, with a search bias toward the past plan, is an effective solution for replanning:
• ERRT expands the search tree toward the goal with probability p, toward a waypoint of the previously found plan (for reuse) with probability q, and toward a random target with probability 1 - p - q.
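ERRT's biased target sampling can be sketched as a single sampling function; the probabilities, waypoints, and workspace bounds below are illustrative:

```python
import random

# Sketch of ERRT's biased target sampling: aim at the goal with
# probability p, at a waypoint of the previous plan with probability
# q, and at a uniformly random point otherwise (values illustrative).

def errt_target(goal, waypoints, bounds, p=0.1, q=0.4, rng=random):
    r = rng.random()
    if r < p:
        return goal                          # bias toward the goal
    if r < p + q and waypoints:
        return rng.choice(waypoints)         # reuse the old plan
    return tuple(rng.uniform(lo, hi) for lo, hi in bounds)

random.seed(1)
goal, waypoints = (9, 9), [(2, 2), (5, 5)]
targets = [errt_target(goal, waypoints, [(0, 10), (0, 10)])
           for _ in range(1000)]
frac_goal = sum(t == goal for t in targets) / 1000
frac_reuse = sum(t in waypoints for t in targets) / 1000
```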

• Wednesday, November 5: Variable Level of Detail Planning and Short-Sighted Probabilistic Planning
• Monday, November 10: Multi-Robot Team Planning
• Lecture Notes
• Points to remember from the discussion in class:
• Coordination between robots can be represented as multi-robot plays.