Next: Controlled Experiments Up: Results Previous: TAC-01 Competition

TAC-02 Competition

A year after the TAC-01 competition, ATTac-2001 was re-entered in the TAC-02 competition using the models trained at the end of TAC-01. Specifically, the price predictors were left unchanged throughout (no learning). The seeding round included 19 agents, each playing 440 games over the course of about 2 weeks. ATTac-2001 was the top-scoring agent in this round, as shown in Table 11. Scores in the seeding round were weighted so as to emphasize later results over earlier results: scores on day $n$ of the seeding round were given a weight of $n$. This practice was designed to encourage experimentation early in the round. The official ranking in the competitions was based on the mean score after ignoring each agent's worst 10 results so as to allow for occasional program crashes and network problems.

Table 11: Top 8 scores during the seeding round of TAC-02. Each agent played 440 games, with its worst 10 games ignored when computing the rankings.
Agent            Mean   Weighted (worst 10 dropped)
ATTac-2001       3050   3131
SouthamptonTAC   3100   3129
UMBCTAC          2980   3118
livingagents     3018   3091
cuhk             2998   3055
Thalis           2952   3000
whitebear        2945   2966
RoxyBot          2738   2855
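The seeding-round scoring just described can be sketched in code. This is a minimal illustration, not the competition's actual scoring software: the function names are hypothetical, and since the text does not specify exactly how the day-weighting interacts with dropping the worst 10 games, the two mechanisms are shown separately.

```python
def ranking_score(game_scores, drop_worst=10):
    """Mean score after ignoring the agent's worst `drop_worst` games,
    allowing for occasional program crashes and network problems."""
    kept = sorted(game_scores)[drop_worst:]
    return sum(kept) / len(kept)

def weighted_mean(scores_by_day):
    """Seeding-round weighting: a game played on day n carries weight n,
    emphasizing later results over earlier ones.
    `scores_by_day` maps a day number to that day's list of game scores."""
    total = sum(day * s for day, games in scores_by_day.items() for s in games)
    weight = sum(day * len(games) for day, games in scores_by_day.items())
    return total / weight
```

For example, under this weighting a score earned on day 10 counts ten times as heavily as the same score earned on day 1, which is what made early experimentation cheap for the entrants.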

On the one hand, it is striking that ATTac-2001 was able to finish so strongly in a field of agents that had presumably improved over the course of the year. On the other hand, most agents were being tuned, for better and for worse, while ATTac-2001 remained consistent throughout. In particular, we are told that SouthamptonTAC experimented with its approach during the later days of the round, perhaps causing it to fall out of the lead (by weighted score) in the end. During the 14-game semifinal heat, ATTac-2001, now restored with its learning capability and retrained on the data from the 2002 seeding round, finished 6th out of 8, thereby failing to reach the finals.

There are a number of possible reasons for this sudden failure. One relatively mundane explanation is that the agent had to change computational environments between the seeding rounds and the finals, and a bug or computational resource constraint may have been introduced. Another possibility is that, due to the small number of games in the semifinals, ATTac-2001 simply got unlucky with respect to clients and the interaction of opponent strategies. However, it is also plausible that the training data from the 2002 qualifying and seeding rounds was less representative of the 2002 finals than was the training data from 2001; and/or that the competing agents improved significantly over the seeding round while ATTac-2001 remained unchanged. The TAC team at the University of Michigan has done a study of the price predictors of several 2002 TAC agents that suggests that the bug hypothesis is most plausible: the ATTac-2001 predictor from 2001 outperforms all other predictors from 2002 on the data from the 2002 semifinals and finals; and one other agent that uses the 2002 data did produce good predictions based on that data [Wellman, Reeves, Lochner, & Vorobeychik, 2003b].8

Peter Stone 2003-09-24