**How do you plan efficiently if the results of your actions are uncertain? There is some remarkably good news, and some significant computational hardship. We begin by discussing Markov Systems (which have no actions) and the notion of Markov Systems with Rewards. We then motivate and explain the idea of infinite-horizon discounted future rewards. And then we look at two competing approaches to deal with the following computational problem: given a Markov System with Rewards, compute the expected long-term discounted rewards. The two methods, which usually sit at opposite corners of the ring and snarl at each other, are straight linear algebra and dynamic programming. We then make the leap up to Markov Decision Processes, and find that we've already done 82% of the work needed to compute not only the long-term rewards of each MDP state, but also the optimal action to take in each state.**
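The two competing approaches mentioned above can be sketched in a few lines of Python. For a Markov System with Rewards (transition matrix P, per-state rewards R, discount factor g), the expected discounted reward vector V satisfies V = R + g·P·V, so it can be found either by solving the linear system (I − g·P)V = R directly, or by repeatedly applying the backup V ← R + g·P·V until it converges. The 2-state numbers below are invented purely for illustration:

```python
# A tiny made-up Markov System with Rewards: two states, a transition
# matrix P, a reward R for being in each state, and a discount factor g.
P = [[0.8, 0.2],    # transition probabilities out of state 0
     [0.5, 0.5]]    # transition probabilities out of state 1
R = [4.0, -1.0]     # reward collected in each state
g = 0.9             # discount factor

# Method 1: straight linear algebra.
# Solve (I - g*P) V = R exactly; for a 2x2 system, Cramer's rule suffices.
a11, a12 = 1 - g * P[0][0],     -g * P[0][1]
a21, a22 =     -g * P[1][0], 1 - g * P[1][1]
det = a11 * a22 - a12 * a21
V_exact = [(R[0] * a22 - a12 * R[1]) / det,
           (a11 * R[1] - a21 * R[0]) / det]

# Method 2: dynamic programming.
# Iterate the backup V <- R + g * P V until the values stop changing.
V = [0.0, 0.0]
for _ in range(1000):
    V_new = [R[i] + g * sum(P[i][j] * V[j] for j in range(2))
             for i in range(2)]
    converged = max(abs(V_new[i] - V[i]) for i in range(2)) < 1e-10
    V = V_new
    if converged:
        break

print(V_exact)  # the exact solve...
print(V)        # ...and the iterative values agree
```

Both methods produce the same answer; the tutorial's point is that the exact solve costs a matrix inversion while the iterative backup costs cheap sweeps whose error shrinks by a factor of g per sweep.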

- In addition to these slides, for a survey on Reinforcement Learning, please see this paper or Sutton and Barto's book.
- Visual simulation of Markov Decision Process and Reinforcement Learning algorithms by Rohit Kelkar and Vivek Mehta.
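The final step described in the abstract — computing the optimal action for each MDP state — can be sketched as value iteration: the backup is the same as before except that each state now maximizes over its available actions. The 2-state, 2-action MDP below (states, action names, rewards, and transitions) is entirely made up for illustration:

```python
# Value iteration for a tiny invented MDP.
# P[s][a] lists the transition probabilities to each next state;
# R[s][a] is the immediate reward for taking action a in state s.
P = {0: {'stay': [1.0, 0.0], 'go': [0.2, 0.8]},
     1: {'stay': [0.0, 1.0], 'go': [0.9, 0.1]}}
R = {0: {'stay': 1.0, 'go': 0.0},
     1: {'stay': 2.0, 'go': 5.0}}
g = 0.9  # discount factor

def backup(s, a, V):
    """One-step lookahead value of taking action a in state s."""
    return R[s][a] + g * sum(p * V[t] for t, p in enumerate(P[s][a]))

# Iterate V(s) <- max_a [ R(s,a) + g * sum_s' P(s'|s,a) V(s') ]
V = {0: 0.0, 1: 0.0}
for _ in range(1000):
    V_new = {s: max(backup(s, a, V) for a in P[s]) for s in P}
    converged = max(abs(V_new[s] - V[s]) for s in V) < 1e-10
    V = V_new
    if converged:
        break

# The optimal policy just reads off the maximizing action in each state.
policy = {s: max(P[s], key=lambda a: backup(s, a, V)) for s in P}
print(V, policy)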

**Powerpoint Format:** The Powerpoint originals of these slides are freely available to anyone
who wishes to use them for their own work, or who wishes to teach using
them in an academic institution. Please email
Andrew Moore at awm@cs.cmu.edu
if you would like him to send them to you. The only restriction is that
they are not freely available for use as teaching materials in classes
or tutorials outside degree-granting academic institutions.

- Detailed List of other Andrew Tutorial Slides
- Short List of other Andrew Tutorial Slides
- Andrew Moore's Homepage

**Advertisment: I have recently joined Google, and am starting up the new Google Pittsburgh office on CMU's campus. We are hiring creative computer scientists who love programming, and Machine Learning is one the focus areas of the office. If you might be interested, feel welcome to send me email: awm@google.com .**