Discussion

The open-loop and closed-loop strategies of the previous section differ in their handling of price fluctuation. A fundamental way of taking price fluctuation into account is to place "safe bids." A very high bid exposes an agent to the danger of buying something at a ridiculously high price. If prices are in fact stable, then high bids are safe. But if prices fluctuate, then high bids, such as the bids of the stable-price strategy, are risky.

In TAC, hotel rooms are sold in a Vickrey-style $n$th-price auction. There is a separate auction for each day of each hotel, and these auctions close sequentially. Although the order of the auctions is randomized and not known to the agent, when placing bids in one of these auctions the agent assumes that this auction will close next. We assumed in the design of our agent that our bids in one auction do not affect prices in other auctions. This assumption is not strictly true, but in a large economy one expects the bids of a single individual to have a limited effect on prices. Furthermore, the price most affected by a bid is the price of the item being bid on; the effect on other auctions seems less direct and perhaps more limited.

Assuming bids in one auction do not affect prices in another, the optimal bidding strategy is the standard strategy for a Vickrey auction: the bid for an item should equal its utility to the bidder. So, to place a Vickrey-optimal bid, one must be able to estimate the utility of an item. The utility of owning an item is simply the expected final score assuming one owns the item minus the expected final score assuming one does not own the item. The problem of computing a Vickrey-optimal bid therefore reduces to the problem of predicting final scores for two alternative game situations. We use two score prediction procedures, which we call the stable-price score predictor (corresponding to Equation 5) and the unstable-price score predictor (Equation 4).
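As an illustration of this reduction, the following minimal Python sketch computes a bid as the difference between two predicted scores. The names `vickrey_bid` and `toy_predict_score`, the room labels, and all numbers are hypothetical and chosen only for illustration; they are not taken from the agent itself, and any score predictor could be plugged in.

```python
from typing import Callable, FrozenSet

Holdings = FrozenSet[str]

def vickrey_bid(predict_score: Callable[[Holdings], float],
                holdings: Holdings, item: str) -> float:
    """Bid = predicted score when owning `item` minus predicted score without it."""
    score_with = predict_score(holdings | {item})
    score_without = predict_score(holdings - {item})
    return score_with - score_without

# Hypothetical toy setting: one trip worth 300 that needs both hotel nights.
EST_PRICE = {"night1": 250.0, "night2": 120.0}  # assumed price estimates
TRIP_VALUE = 300.0                              # assumed client value

def toy_predict_score(holdings: Holdings) -> float:
    """Toy predictor: trip value minus estimated cost of the rooms still missing,
    or 0 if completing the trip is not worthwhile."""
    missing_cost = sum(p for room, p in EST_PRICE.items() if room not in holdings)
    return max(0.0, TRIP_VALUE - missing_cost)

bid_night1 = vickrey_bid(toy_predict_score, frozenset(), "night1")
bid_night2 = vickrey_bid(toy_predict_score, frozenset({"night1"}), "night2")
```

In this toy setting the bid for the first night is 180, because owning it turns a trip that is not worth completing into one with predicted score 180, while the bid for the second night, given the first, is 120.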

**The Stable-Price Score Predictor.** The stable-price score
predictor first estimates the expected prices in the rest of the game
using whatever information is available in the given game situation.
It then computes the value achieved by optimal purchases under the
estimated prices. In an economy with stable prices, this estimate will
be quite accurate--if we make the optimal purchases for the
expected price then, if the prices are near our estimates, our
performance will also be near the estimated value.
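A minimal sketch of this idea, under strong simplifying assumptions, is given below. It treats the agent's options as a small explicit set of candidate packages of goods, each with a fixed value, and scores the best package at the estimated prices. The function name `stable_price_score`, the package representation, and the numbers are hypothetical; the agent's actual optimization over purchases is not shown here.

```python
from typing import FrozenSet, Mapping

def stable_price_score(packages: Mapping[FrozenSet[str], float],
                       prices: Mapping[str, float],
                       holdings: FrozenSet[str]) -> float:
    """Value of the best purchase plan under a single price estimate.

    Each package maps a set of required goods to its value. Goods already
    held cost nothing further; buying nothing yields a score of 0.
    """
    best = 0.0
    for goods, value in packages.items():
        cost = sum(prices[g] for g in goods if g not in holdings)
        best = max(best, value - cost)
    return best

# Hypothetical example: one trip needing a flight and a hotel room.
packages = {frozenset({"flight", "hotel"}): 700.0}
prices = {"flight": 300.0, "hotel": 300.0}

score_empty = stable_price_score(packages, prices, frozenset())
score_with_flight = stable_price_score(packages, prices, frozenset({"flight"}))
```

Here the score counts only future purchases, so holding the flight raises the predicted score from 100 to 400.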

**The Unstable-Price Score Predictor.** Stable-price score
prediction does not take into account the ability of the agent to
react to changes in price as the game progresses. Suppose a given
room is often cheap but is sometimes expensive. If the agent can
first determine the price of the room, and then plan for that price,
the agent will do better than guessing the price ahead of time and
sticking to the purchases dictated by that price. The unstable-price score
predictor uses a model of the *distribution* of possible prices.
It repeatedly samples prices from this distribution, computes the
stable-price score prediction under the sampled price, and then takes
the average of these stable-price scores over the various price
samples. This score prediction algorithm is similar to the algorithm
used in Ginsberg's quite successful computer
bridge program [Ginsberg 2001], where the score is predicted by sampling the possible
hands of the opponent and, for each sample, computing the score of
optimal play in the case where all players have complete information
(double dummy play). While this approach has a simple intuitive
motivation, it is clearly imperfect. The unstable-price score
predictor assumes both that future decisions are made in the presence
of complete price information, and that the agent is free to change
existing bids in auctions that have not yet closed. Both of these
assumptions are only approximately true at best. Ways of compensating
for the imperfections in score prediction were described in
Section 5.
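The sampling procedure described above can be sketched as follows. This is a minimal illustration, not the agent's implementation: the helper `stable_price_score`, the sampler `toy_sampler`, the two-point price distribution, and all numbers are assumptions made for the example.

```python
import random
from typing import Callable, Dict, FrozenSet, Mapping

Prices = Dict[str, float]

def stable_price_score(packages: Mapping[FrozenSet[str], float],
                       prices: Mapping[str, float],
                       holdings: FrozenSet[str]) -> float:
    """Best achievable value under one fixed price vector (0 if nothing is bought)."""
    best = 0.0
    for goods, value in packages.items():
        cost = sum(prices[g] for g in goods if g not in holdings)
        best = max(best, value - cost)
    return best

def unstable_price_score(packages: Mapping[FrozenSet[str], float],
                         sample_prices: Callable[[random.Random], Prices],
                         holdings: FrozenSet[str],
                         n_samples: int,
                         rng: random.Random) -> float:
    """Average of stable-price scores over prices sampled from a price model."""
    total = 0.0
    for _ in range(n_samples):
        total += stable_price_score(packages, sample_prices(rng), holdings)
    return total / n_samples

def toy_sampler(rng: random.Random) -> Prices:
    """Assumed price model: the room is usually cheap (50), sometimes expensive (400)."""
    return {"hotel": 50.0 if rng.random() < 0.8 else 400.0}

packages = {frozenset({"hotel"}): 300.0}  # hypothetical trip worth 300
estimate = unstable_price_score(packages, toy_sampler, frozenset(),
                                10_000, random.Random(0))
# Stable-price prediction at the mean price 0.8 * 50 + 0.2 * 400 = 120.
mean_price_score = stable_price_score(packages, {"hotel": 120.0}, frozenset())
```

In this toy model the exact expectation of the sampled score is 0.8 × 250 = 200, whereas the stable-price prediction at the mean price is 180. The gap reflects the value of planning after the price is known, which is also why the sampled estimate tends to be optimistic when future prices are not in fact fully revealed.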

**Buy Now or Decide Later.** The trading agent must decide what
airline tickets to buy and when to buy them. In deciding whether to
buy an airline ticket, the agent can compare the predicted score in
the situation where it owns the airline ticket with the predicted
score in the situation where it does not own the airline ticket but
may buy it later. Airline tickets tend to increase in price, so if
the agent knows that a certain ticket is needed it should buy it as
soon as possible. But whether or not a given ticket is desirable may
depend on the price of hotel rooms, which may become clearer as the
game progresses. If airline tickets did not increase in price, as was
the case in TAC-00, then they should be bought at the last possible
moment [Stone et al. 2001]. To determine whether an airline ticket
should be bought now or not, one can compare the predicted score in
the situation where one has just bought the ticket at its current
price with the predicted score in the situation where the ticket has
not yet been bought and its price is somewhat higher. It is
interesting to note that if one uses the stable-price score predictor
for both of these predictions, and the ticket is purchased in the
optimal allocation under the current price estimate, then the
predicted score for buying the ticket now will always be
higher--increasing the price of the ticket can only reduce the score.
However, the unstable-price score predictor can yield an advantage for
delaying the purchase. This advantage comes from the fact that buying
the ticket may be optimal under some prices but not optimal under
others. If the ticket has not yet been bought, then the score will be
higher for those sampled prices where the ticket should not be bought.
This corresponds to the intuition that in certain cases the purchase
should be delayed until more information is available.
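A small worked example makes the contrast concrete. The sketch below is hypothetical throughout: a single trip worth 700 needs one ticket and one hotel room, the ticket costs 300 now and 310 later, and the hotel price is 100 or 500 with equal probability. For simplicity it computes the exact expectation over the two scenarios rather than sampling.

```python
TRIP_VALUE = 700.0     # assumed value of the completed trip
TICKET_NOW = 300.0     # assumed current ticket price
TICKET_LATER = 310.0   # assumed (higher) ticket price after waiting
HOTEL_SCENARIOS = [(0.5, 100.0), (0.5, 500.0)]  # (probability, hotel price)

def score_if_bought_now(hotel_price: float) -> float:
    """Ticket cost is already paid; the agent may still skip the hotel."""
    return -TICKET_NOW + max(0.0, TRIP_VALUE - hotel_price)

def score_if_delayed(hotel_price: float) -> float:
    """Agent buys ticket and hotel later only if the trip is still worthwhile."""
    return max(0.0, TRIP_VALUE - TICKET_LATER - hotel_price)

def stable(score_fn) -> float:
    """Stable-price prediction: score at the single mean hotel price."""
    mean_hotel = sum(p * h for p, h in HOTEL_SCENARIOS)
    return score_fn(mean_hotel)

def unstable(score_fn) -> float:
    """Unstable-price prediction: expected score over the price scenarios."""
    return sum(p * score_fn(h) for p, h in HOTEL_SCENARIOS)

stable_now, stable_later = stable(score_if_bought_now), stable(score_if_delayed)
unstable_now, unstable_later = unstable(score_if_bought_now), unstable(score_if_delayed)
```

With these numbers the stable-price predictor gives 100 for buying now and 90 for delaying, so it favors buying now. The unstable-price predictor gives 100 for buying now and 145 for delaying, because in the expensive-hotel scenario the agent that has not yet bought the ticket avoids a loss of 100.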

Our guiding principle in the design of the agent was, to the greatest extent possible, to have the agent analytically calculate optimal actions. A key component of these calculations is the score predictor, based either on a single estimated assignment of prices or on a model of the probability distribution over assignments of prices. Both score predictors, though clearly imperfect, seem useful. Of these two predictors, only the unstable-price predictor can be used to quantitatively estimate the value of postponing a decision until more information is available. The accuracy of price estimation is clearly of central importance. Future research will undoubtedly focus on ways of improving both price modeling and score prediction based on price modeling.