First Annual March Madness Machine Learning Competition

Training Data

The competition is adapted from an existing Kaggle competition. That site has much relevant data for training ML models to predict the winners of men's and women's Division 1 basketball games. In particular, the site has data, in some cases going back decades, for games played during both the regular season and tournaments. Statistics for both the winning and losing teams include points scored, field goals attempted and made, 3-pointers attempted and made, free throws attempted and made, offensive and defensive rebounds, assists, turnovers, steals, blocks, and fouls. It also includes the rankings of each team by various organizations, such as ESPN and USA Today.

You can use the data in any way you choose to train any model you choose. You are free to use whatever other data you can find to help train the model.

Contact me if you need help accessing the data and definitions of the data formats.

Scoring

Points are awarded as follows:

1 point for first-round and play-in games (note that this differs from regular NCAA brackets, where play-in games are not scored);
2 points for second round;
4 points for third round (elite eight);
8 points for fourth round (quarter-finals);
16 points for fifth round (semi-finals);
32 points for correctly identifying the champion.

A perfect bracket will score 196 points.
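
The 196-point total can be checked with a few lines of Python. This is only a sketch: the number of games per round is an assumption based on the standard 68-team field (4 play-in games, then 32, 16, 8, 4, 2, and 1), and the round labels are just for readability.

```python
# (label, points per correct pick, number of games)
# Game counts are an assumption based on a standard 68-team field.
ROUNDS = [
    ("play-in", 1, 4),
    ("first round", 1, 32),
    ("second round", 2, 16),
    ("third round", 4, 8),
    ("fourth round", 8, 4),
    ("fifth round", 16, 2),
    ("champion", 32, 1),
]

# Sum points * games over every round to get the best possible score.
perfect_score = sum(points * games for _, points, games in ROUNDS)
print(perfect_score)  # 196
```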

Software Infrastructure

We are providing software to create brackets, seed them (which teams initially play each other), enter your predictions, and score the bracket. In particular, you can use the software to create a regular or progressive bracket, for either the men's or women's tournament; seed the bracket for a particular year (e.g., 2023); fill the bracket with your predictions (see the Submission section for the format); add the results for that year's tourney; show a bracket; and score a bracket. The same software will be used in generating and scoring your bracket(s) for the 2025 tournament.

Run python madness.py to see an example (in the download, the predictions are chosen at random, so the resulting bracket is not a great one). Once you have generated your own predictions (see the Submission section for the format), you can use the software to test how they would have done in a previous season's tournament. Look at the very end of the file to see how to create, seed, fill, and test a bracket (either regular or progressive).

A snippet of the bracket produced by madness.py is shown below.

It is interpreted as follows: In the first round, 1st seed Connecticut played 16th seed Stetson; Connecticut was predicted to win, and did. Then 8th seed FL Atlantic played 9th seed Northwestern; FL Atlantic was predicted to win, but Northwestern was the actual winner - the incorrect prediction is crossed out and the correct winner appears above. Similarly, San Diego was correctly predicted as the winner, but Auburn was incorrectly predicted. In the second round, Connecticut was predicted correctly, but Auburn was not. Note that this is a regular bracket - before the tournament began, it was predicted that Auburn would win all its games; once it lost in the first round, no points could be earned from it in the subsequent rounds. If this were a progressive bracket, the second round would be a prediction between San Diego and Yale, and similarly in the third round it would be between Connecticut and San Diego.
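
To make the regular versus progressive distinction concrete, here is a toy four-team sketch. It is not the code in madness.py; the team names, the picks, and the helper functions are all made up for illustration, and it reflects one reading of the description above: a regular bracket carries the predicted winners forward, while a progressive bracket re-predicts each round from the teams that actually advanced.

```python
# Toy example; names, picks, and helpers are hypothetical (not from madness.py).
POINTS = [1, 2]  # points per correct pick in round 1 and round 2

def winner(a, b, picks):
    """Predicted winner of a vs b, where picks is a set of (winner, loser) pairs."""
    return a if (a, b) in picks else b

def play_round(teams, picks):
    """Pair adjacent teams and return the predicted winner of each game."""
    return [winner(teams[i], teams[i + 1], picks) for i in range(0, len(teams), 2)]

def score(field, picks, actual_rounds, progressive):
    """Score a bracket; actual_rounds lists the real winners of each round."""
    total = 0
    current = list(field)
    for points, actual in zip(POINTS, actual_rounds):
        predicted = play_round(current, picks)
        total += points * sum(p == a for p, a in zip(predicted, actual))
        # Progressive: next round uses the real winners; regular: the predicted ones.
        current = actual if progressive else predicted
    return total

field = ["A", "B", "C", "D"]  # A plays B, C plays D
picks = {("A", "B"), ("C", "D"), ("A", "C"), ("A", "D"), ("C", "B"), ("B", "D")}
actual_rounds = [["B", "C"], ["C"]]  # B upsets A, then C wins the final

print(score(field, picks, actual_rounds, progressive=False))  # 1
print(score(field, picks, actual_rounds, progressive=True))   # 3
```

In the regular case the pick of A to win the final is dead once A loses in round one, which mirrors the Auburn example; in the progressive case the final is re-predicted as B versus C.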

You can download the software from here.

Note: Strikethroughs for incorrect predictions do not work on all terminals. If you are not seeing the strikethroughs, invoke bracket.show() with the use_unicode=True option.

Submission

A submission to the competition consists of a CSV file that contains two columns labeled 'WTeamID' and 'LTeamID' (the winning and losing team identifiers, respectively). We are following the convention of the Kaggle competition of using numeric identifiers to uniquely identify each team. Men's team ids run from 1000-1999 and women's ids from 3000-3999.

The file should contain all possible matchups during the competition. For simplicity, just provide predictions for all possible pairs of ids -- that way, you'll know you have all the bases covered. Name the file 'MTourneyPredictions.csv' or 'WTourneyPredictions.csv' for the men's and women's tournaments, respectively.
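
One way to produce such a file is sketched below. The predict_winner function is a hypothetical placeholder for your trained model (here it simply picks the lower id), and write_predictions is an illustrative helper, not part of the provided software. It writes one row per unordered pair of ids, with the predicted winner in the first column.

```python
import csv
from itertools import combinations

def predict_winner(team_a, team_b):
    """Hypothetical stand-in for a trained model: always picks the lower id."""
    return min(team_a, team_b)

def write_predictions(team_ids, path):
    """Write one (WTeamID, LTeamID) row for every unordered pair of team ids."""
    count = 0
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["WTeamID", "LTeamID"])
        for a, b in combinations(sorted(team_ids), 2):
            w = predict_winner(a, b)
            l = b if w == a else a
            writer.writerow([w, l])
            count += 1
    return count

# Example (the ids are placeholders; use the real list of team ids):
# write_predictions([1101, 1102, 1103], "MTourneyPredictions.csv")
```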

There should be 72,010 predictions for the men's tournament (the number of combinations for the 380 Division 1 men's teams) and 71,253 predictions for the women's tournament (the number of combinations for the 378 Division 1 women's teams).
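
These counts are the number of unordered pairs, n(n-1)/2, and can be confirmed with the standard library as a quick sanity check on your file's row count:

```python
from math import comb

men_teams, women_teams = 380, 378  # team counts quoted above

print(comb(men_teams, 2))    # 72010
print(comb(women_teams, 2))  # 71253
```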

Use this Google form to submit your predictions. You may submit as many times as you want, up until March 18 at 5pm EDT, but only the last submission will be used.

New Deadline for submission is March 18 at 5pm EDT
