Homework 6: Game Theory

xkcd capsule

Source : http://xkcd.com/601/

Introduction

In this homework, you will implement the randomized weighted majority algorithm for finding the Nash equilibrium mixed strategy profile of a zero sum game.

The code for this project contains the following files, which are available in a zip archive:

Files you will edit
game.py Contains the weighted_majority function, which you are required to implement. This function returns the strategy profiles of the two players, as well as the value for the Row player of the Nash equilibrium..
Files you should read but NOT edit
main.py Your interface with the game solver.
tests.py Contains unit test cases and test suite harness. Also defines the payoff matrices for the games that will be tested.

 

What to submit: You are required to make the test cases pass by implementing the weighted majority algorithm. You will fill in the weighted_majority function of game.py for the assignment. You should submit only this one file by placing it in /afs/andrew.cmu.edu/usr/dannyz/hw6/[your_andrew_id]. Please do not change or submit any of the other files from the distribution.

Evaluation: Your code will be autograded for technical correctness. Please do not change the names of any provided functions within the code, or you will wreak havoc on the autograder. However, the correctness of your implementation -- not the autograder's judgements -- will be the final judge of your score. If necessary, we will review and grade assignments individually to ensure that you receive due credit for your work.

Academic Dishonesty: We will be checking your code against other submissions in the class for logical redundancy. If you copy someone else's code and submit it with minor changes, we will know. We trust you all to submit your own work only; please don't let us down. If you do, we will pursue the strongest consequences available to us.

Getting Help: You are not alone! If you find yourself stuck on something, contact the course staff for help. Office hours and Piazza are there for your support; please use them. If you can't make our office hours, let us know and we will schedule more. We want these projects to be rewarding and instructional, not frustrating and demoralizing. But, we don't know when or how to help unless you ask.

 

Randomized Weighted Majority

As discussed in lecture, randomized weighted majority is an iterative algorithm that finds the mixed-strategy profile for both players by playing repeated games. As a reminder, the algorithm proceeds by iterating the following procedure (assuming that the matrix entries represent payoffs for the Row player, as is the convention):

  1. Column selects a mixed strategy consisting of his current weights on the moves available to him.
  2. Given Column's strategy, Row selects the move with the highest expected value.
  3. Column computes loss values for each possible move he could play. The loss value i for move i is simply the value from the payoff matrix corresponding to Row's move and Column playing move i.
  4. Column adjusts the weights of his mixed strategy to minimize the regret. The formula for reweighting is wi ← wie-εℓi, or equivalently (by the Taylor expasion), wi ← wi(1-εℓi). The weights are then re-normalized. (This formula assumes losses in the range [0,1], and needs to be adjusted slightly for losses outside this range.)
  5. The move picked by Row is recorded.

After sufficient iterations, the time-averaged weights on Column's moves converge to his Nash equilibrium mixed strategy, and the distribution of best moves made by Row converges to his Nash equilibrum strategy. You can tell that the distributions have converged by testing whether the expected values of each player's strategies sum to approximately 0.

To compute the expected value for Player X, compute the expected values for all possible moves of Player Y given Player X's mixed strategy. The value that is best for Player Y corresponds to the move that Y will presumably make, so that value (or its negation, if X is the minimizer) is the expected value of X's strategy. Remember that the minimizer will prefer low expected values, while the maximizer will prefer high expected values.

More details on the algorithm can be found in the first half of this paper (note that in this paper's notation, Q -- the Column player -- is the maximizer). You can find more information on the game theory concepts involved in Jerry Zhu's lecture notes on the topic.

Code

To get started, see the options available to you

python main.py --help

The game theory payoff matrices are stored as test cases, which you can ask the utility to print. For example, you can view the Rock-Paper-Scissors-Lizard-Spock matrix by running:

 python main.py --view rockSpock

To run the test cases, use the --test flag. You can run test case for the Rock-Paper-Scissors-Lizard-Spock game by running

 python main.py --test rockSpock

You can run all tests with:

 python main.py --test all

or simply

 python main.py

You will find that the test cases initially fail with a NotImplementedError.

The weighted_majority function is an abstraction for the weighted majority algorithm. You will need to define your own code and functions for proposing strategies, finding the best response to a mixed strategy, updating weights, assessing stopping criteria, etc. The unit-testing framework interacts with your code only via the weighted_majority function with the correct signature; how you implement the internals is up to you. Please don't hardcode values, as we will be reading your code for algorithmic correctness as well.

Time to code!

 

HTML template borrowed from the Pacman AI projects