Due in class at 1:30pm, July 23. You may work in a group of up to three students, but each individual must be involved in every question. Do not assign problems to individuals within a team! Please submit only one solution per group.
Consider the following state space, an extension of the one we did in class. There is a reward of 72 for taking the right (R) action from d. We'll start out in position a.

Fill out a table like the following. (I've done the first few steps for you.)
| step | 1 | 2 | 3 | 4 | 5 | ... |
| state (s) | a | b | c | d | a | ... |
| action (A) | R | R | R | R | R | ... |
| reward (r) | 0 | 0 | 0 | 72 | 0 | ... |
| new state (s') | b | c | d | a | b | ... |
| Q(a,R) | 0 | 0 | 0 | | | |
| Q(b,R) | 0 | 0 | 0 | | | |
| Q(c,R) | 0 | 0 | 0 | | | |
| Q(d,R) | 0 | 0 | 0 | | | |
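
If you want to check your hand calculations, the table above can be reproduced with a short script. This is only a sketch under several assumptions that the problem statement does not spell out: that the update is the standard tabular Q-learning rule Q(s,A) ← Q(s,A) + α(r + γ·max Q(s',·) − Q(s,A)), that R is the only action taken (so the max reduces to Q(s',R)), and that each Q row shows the value after that step's update. The names `NEXT_STATE`, `REWARD`, and `run_q_learning` are hypothetical, and the values α = 0.5 and γ = 0.5 in the usage lines are placeholders, not values given in this assignment. Substitute the ones from class.

```python
from typing import Dict, List

# Assumed deterministic cycle a -> b -> c -> d -> a under action R,
# read off the "state" and "new state" rows of the table.
NEXT_STATE: Dict[str, str] = {"a": "b", "b": "c", "c": "d", "d": "a"}

# Reward for taking R from each state; only d pays out (72).
REWARD: Dict[str, float] = {"a": 0.0, "b": 0.0, "c": 0.0, "d": 72.0}


def run_q_learning(num_steps: int, alpha: float, gamma: float,
                   start: str = "a") -> List[Dict[str, float]]:
    """Return the Q(s, R) values after each step, one dict per table column."""
    q = {s: 0.0 for s in NEXT_STATE}  # every Q(s, R) starts at 0
    history = []
    s = start
    for _ in range(num_steps):
        r = REWARD[s]
        s_next = NEXT_STATE[s]
        # Only action R is modeled, so max over actions is just Q(s', R).
        q[s] += alpha * (r + gamma * q[s_next] - q[s])
        history.append(dict(q))  # snapshot after this step's update
        s = s_next
    return history


# Placeholder hyperparameters (assumptions, not from the assignment).
history = run_q_learning(num_steps=8, alpha=0.5, gamma=0.5)
for step, q in enumerate(history, start=1):
    print(step, q)
```

Each printed line corresponds to one column of the table. The first three columns stay at zero under any choice of α and γ, which matches the entries already filled in.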