# Midterm 2

## Learning Objectives

### Logic

1. Describe the definition of the (Boolean) Satisfiability Problem (SAT).
2. Describe conjunctive normal form (CNF).
3. Understand the DPLL algorithm for solving SAT problems.
4. Describe and create a Successor-State Axiom given a problem setup.
5. Describe SATPlan (Planning as Satisfiability) and apply it to real-world problems.
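As a concrete reference for objective 3, here is a minimal DPLL sketch, assuming a common encoding where each clause is a set of nonzero integers and `-v` denotes the negation of variable `v`. This is an illustrative simplification: it does unit propagation and branching only, omitting the pure-literal rule and modern optimizations such as clause learning.

```python
def dpll(clauses, assignment=None):
    """Return a satisfying assignment (dict var -> bool) or None if UNSAT.

    Each clause is a set of nonzero ints; -v means "not v".
    """
    if assignment is None:
        assignment = {}
    clauses = [set(c) for c in clauses]

    # Unit propagation: a unit clause {lit} forces lit to be true.
    while True:
        unit = next((c for c in clauses if len(c) == 1), None)
        if unit is None:
            break
        lit = next(iter(unit))
        assignment[abs(lit)] = lit > 0
        simplified = []
        for c in clauses:
            if lit in c:
                continue            # clause satisfied, drop it
            if -lit in c:
                c = c - {-lit}      # falsified literal, shrink clause
                if not c:
                    return None     # empty clause => conflict
            simplified.append(c)
        clauses = simplified

    if not clauses:
        return assignment           # every clause satisfied

    # Branch: pick a variable from some clause and try both truth values.
    var = abs(next(iter(clauses[0])))
    for branch in (var, -var):
        result = dpll(clauses + [{branch}], dict(assignment))
        if result is not None:
            return result
    return None
```

For example, `dpll([{1, 2}, {-1}, {-2, 3}])` is forced by unit propagation alone to `{1: False, 2: True, 3: True}`, while `dpll([{1}, {-1}])` returns `None`.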

### Classical Planning

1. Compare and contrast classical planning methods with planning via search or propositional logic.
2. Execute linear planning on real-world problems.
3. Identify properties of a given planning algorithm, namely whether it is sound, complete, and optimal.
4. Model and solve real-world problems using a GraphPlan solver.
5. Create or extend layers of a GraphPlan graph.
6. Identify termination conditions from a GraphPlan graph.
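For objectives 5 and 6, the following sketch extends one layer of a simplified planning graph. The `cook`/`wrap` actions are invented for illustration, and mutex links (which full GraphPlan tracks at every layer) are omitted. It also illustrates one termination condition: the graph "levels off" when extending a layer produces no new propositions.

```python
# Hypothetical STRIPS-style actions: (name, preconditions, add effects).
ACTIONS = [
    ("cook", {"cleanHands"}, {"dinner"}),
    ("wrap", {"quiet"}, {"present"}),
]

def extend_layer(props):
    """Given proposition layer P_i, return (action layer A_i, layer P_{i+1}).

    An action enters A_i if all its preconditions are in P_i. No-op
    (persistence) actions carry every proposition forward, so P_i is a
    subset of P_{i+1}. Mutex bookkeeping is omitted in this sketch.
    """
    applicable = [a for a in ACTIONS if a[1] <= props]
    next_props = set(props)             # effect of the no-op actions
    for _, _, adds in applicable:
        next_props |= adds
    return [name for name, _, _ in applicable], next_props
```

Starting from `{"cleanHands", "quiet"}`, one extension adds `dinner` and `present`; a second extension adds nothing new, so the graph has leveled off.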

### Markov Decision Processes (MDPs)

1. Describe the definition of a Markov Decision Process.
2. Compute the utility of a reward sequence given a discount factor.
3. Define the policy and the optimal policy of an MDP.
4. Define the state value and the (true) state value of an MDP.
5. Define the Q-value and the (true) Q-value of an MDP.
6. Derive the optimal policy from (true) state values or (true) Q-values.
7. Write the Bellman equation for the state value and the Q-value, both for the optimal policy and for a given policy.
8. Describe and implement the value iteration algorithm (via the Bellman update) for solving MDPs.
9. Describe and implement the policy iteration algorithm (via policy evaluation and policy improvement) for solving MDPs.
10. Understand convergence of value iteration and policy iteration.
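Objective 8 (value iteration via the Bellman update) can be sketched as follows. The dictionary encoding of the transition model `T` and reward function `R` is an assumed convention for this example, not part of the course material.

```python
def value_iteration(states, actions, T, R, gamma=0.9, eps=1e-6):
    """T[s][a]: list of (prob, next_state); R[s][a]: reward for a in s.

    Repeatedly applies the Bellman update
        V(s) <- max_a [ R(s, a) + gamma * sum_s' T(s, a, s') * V(s') ]
    until the largest change falls below eps, approximating V*.
    """
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            if not actions(s):          # terminal state: value stays 0
                continue
            best = max(
                R[s][a] + gamma * sum(p * V[s2] for p, s2 in T[s][a])
                for a in actions(s)
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            return V

# Usage on a one-state MDP: "stay" loops forever with reward 1,
# so the geometric series gives V* = 1 / (1 - gamma) = 10.
V = value_iteration(
    states=["s0"],
    actions=lambda s: ["stay"],
    T={"s0": {"stay": [(1.0, "s0")]}},
    R={"s0": {"stay": 1.0}},
)
```

The stopping test `delta < eps` reflects the convergence fact in objective 10: because the Bellman update is a contraction with factor `gamma`, the value estimates approach V* geometrically.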

### Reinforcement Learning

1. Understand the concepts of exploration, exploitation, and regret.
2. Describe the relationships and differences between:
    1. Markov Decision Processes (MDPs) vs. Reinforcement Learning (RL)
    2. Model-based vs. model-free RL
    3. Temporal-Difference Value Learning (TD Value Learning) vs. Q-Learning
    4. Passive vs. active RL
    5. Off-policy vs. on-policy learning
    6. Exploration vs. exploitation
3. Describe and implement:
    1. Temporal-difference learning
    2. Q-Learning
    3. $\epsilon$-greedy algorithm
    4. Approximate Q-learning (feature-based)
4. Derive the weight update for approximate Q-learning.
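Objectives 3.2 and 3.3 (tabular Q-learning with $\epsilon$-greedy action selection) can be sketched together as below. The `env_step(s, a) -> (next_state, reward, done)` interface and the chain environment in the usage example are assumptions for illustration.

```python
import random
from collections import defaultdict

def q_learning(env_step, start, actions, episodes=500,
               alpha=0.1, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning with epsilon-greedy exploration.

    Q-update: Q(s, a) <- Q(s, a) + alpha * (target - Q(s, a)),
    where target = r + gamma * max_b Q(s', b) (just r on terminal steps).
    """
    Q = defaultdict(float)
    for _ in range(episodes):
        s, done = start, False
        while not done:
            # epsilon-greedy: explore with probability epsilon, else exploit
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda b: Q[(s, b)])
            s2, r, done = env_step(s, a)
            target = r if done else r + gamma * max(Q[(s2, b)] for b in actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q

# Usage on a tiny 3-state chain (entering state 2 gives reward 1 and ends
# the episode); this toy environment is invented for the example.
def chain_step(s, a):
    s2 = min(s + 1, 2) if a == "R" else max(s - 1, 0)
    return (s2, 1.0, True) if s2 == 2 else (s2, 0.0, False)

random.seed(0)  # reproducibility of the sketch only
Q = q_learning(chain_step, start=0, actions=["L", "R"])
```

After training, moving right dominates moving left in every state, and `Q[(1, "R")]` approaches the true value of 1. Note the update uses `max_b Q(s', b)` rather than the value of the action actually taken next, which is what makes Q-learning off-policy (objective 2.5).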

## Practice Exams

Please find practice midterms and solutions attached below. We have included some recordings of TAs walking through solutions to some select problems. If you have any questions, please feel free to post on Piazza or ask during Office Hours.

Practice Midterm 2A: blank/sol

Practice Midterm 2B: blank/sol