Prof. Tuomas Sandholm
In this homework, you may use any sources that you want but you must cite the sources that you use. You may not use code written by someone else. Teamwork is not allowed. Due Friday, 12/6/2002 11:59pm (can be put under Prof. Sandholm's door). The maximum number of points is 100.
Pat Riley (email@example.com) is the main TA for this assignment.
This homework studies Q-learning for real-time control. The object to be controlled is a cart that moves on a track, and there is a stiff tall pole hinged to the cart (pointing straight up to start out with). The objective is to keep moving the cart so that the pole does not fall, i.e. the idea is to keep balancing the pole.
You should program a Q-learning agent that applies a force to the cart at every time step. The magnitude of the force is fixed, but the agent can choose whether to apply it to the left or to the right. The agent must exert force every step.
The code for the dynamics of the cart and the pole is given, as is the code that runs the simulation. You need only program the Q-learning agent. If you do the homework correctly, you should only have about 50 lines of code (possibly less) in your methods. However, think carefully about how to initialize the Q-values and how to choose actions. You should use the given learning rate (α) and discount factor (γ).
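To make the update rule concrete, here is a minimal sketch of a tabular Q-learning agent. The class name, the ALPHA and GAMMA values, the optimistic initialization, and the assumption of exactly two actions are all illustrative; they are not the parameters the handout specifies.

```java
// Hypothetical sketch of tabular Q-learning for the cart-pole task.
// ALPHA, GAMMA, and the initial Q-value are assumed, not specified values.
public class QSketch {
    static final double ALPHA = 0.3;  // learning rate (assumed value)
    static final double GAMMA = 0.99; // discount factor (assumed value)

    double[][] q; // q[state][action]

    QSketch(int numStates, int numActions) {
        q = new double[numStates][numActions];
        // Optimistic initialization encourages exploration: untried
        // state-action pairs look better than ones that led to failure.
        for (double[] row : q)
            java.util.Arrays.fill(row, 1.0);
    }

    // Standard one-step Q-learning update after observing
    // (state, action, reward, nextState). On failure there is no
    // successor value to bootstrap from.
    void update(int state, int action, double reward,
                int nextState, boolean failed) {
        double best = failed ? 0.0
                             : Math.max(q[nextState][0], q[nextState][1]);
        q[state][action] += ALPHA * (reward + GAMMA * best - q[state][action]);
    }

    // Greedy action choice; ties broken toward action 0.
    int greedyAction(int state) {
        return q[state][0] >= q[state][1] ? 0 : 1;
    }
}
```

Note how the failure case terminates the bootstrap: the pole has fallen, so no future reward is expected from the successor state.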
This homework should take well under 9 hours to complete. Your program should run rather quickly. Taking random actions, our solution runs through 100,000 failures in 23 seconds. Our best learning code balances the pole in well under a minute, but other reasonable strategies may take a bit longer. If your program runs for more than 20 minutes, there is probably a problem.
When you try your program, you should use a variety of random seeds (see the code section for how) to ensure that performance is consistent over different random sequences. If your learner only successfully balances the pole for a small fraction of the random seeds, it is not an acceptable solution.
The code is available in /afs/andrew/course/15/381/hw4, where JavaDoc-generated documentation is also available.
We provide you the following files
CartPoleState.java: Describes a state of the world for the simulation and can apply an action to the state. You should make no changes to this code. There are several values and methods in which you will be interested.
ACTION_LEFT, ACTION_RIGHT: constants for the actions you can take
NUMBER_ACTIONS: the number of actions (2)
NUMBER_STATES: the number of discrete states in the world
int getDiscreteIndex(): returns the discrete index (between 0 and NUMBER_STATES) for the current state.
boolean isFailure(): If this is true, then the pole has fallen. You do not need to learn what to do in this case, and you should not call
I_QLearner.java: An interface which your code will have to implement.
SimRunner.java: The main function for the simulation. You will have to add a line allocating your I_QLearner. Also, there are a few experimental parameters with which you may wish to experiment. Lastly, feel free to modify the output to make it easier to generate the hand-in items below. However, these are the only changes you should make to this file.
When you run your program, invoke it as java SimRunner seed, where seed is the random seed to use. If you do not specify a seed, a random one is chosen for you.
The following items should be in your write-up. Please turn in a hard-copy in class or under Prof. Sandholm's door by the due date. It is generally not acceptable to email us the writeup.
Feel free to experiment with the parameters (e.g. learning rate α, discount factor γ, MAX_STEPS) and different initialization and action selection methods if you like.
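One action selection method worth trying is epsilon-greedy: explore at random with small probability, otherwise act greedily on the current Q-values. The sketch below is illustrative only; the class name, the epsilon-as-parameter design, and the use of java.util.Random are assumptions, not part of the provided code.

```java
import java.util.Random;

// Illustrative epsilon-greedy action selection, one of the action
// selection methods you might experiment with. Not part of the
// provided assignment code.
public class EpsilonGreedy {
    // With probability epsilon, pick a uniformly random action;
    // otherwise pick the action with the highest Q-value for this state.
    static int select(double[][] q, int state, double epsilon, Random rng) {
        if (rng.nextDouble() < epsilon)
            return rng.nextInt(q[state].length);
        int best = 0;
        for (int a = 1; a < q[state].length; a++)
            if (q[state][a] > q[state][best]) best = a;
        return best;
    }
}
```

Passing epsilon as a parameter makes it easy to anneal exploration over time, e.g. decreasing it as the number of failures grows.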
The translation was initiated by Patrick Riley on 2002-11-25