Approximation via Sampling

Next: The Full Approach Up: Simplified Abstraction Previous: Approximate Bidding

Approximation via Sampling

Even assuming that $\mbox{opt}(H,i,Y)$ can be solved in unit time, a literal interpretation of Equation 4 says we'll need to solve this optimization problem for an exponential number of cost vectors (or even more if the probability distributions $\Pr(y_j)$ are continuous). kearns99b kearns99b showed that values of partially observable Markov decision processes could be estimated accurately by sampling trajectories instead of exactly computing sums. littman00 littman00 did the same for stochastic satisfiability expressions. Applying this idea to Equation 4 leads to the following algorithm.

Generate a set of vectors of closing costs according to the product distribution $\Pr(y_i)\times\cdots\times\Pr(y_{n-1})$ .
For each of these samples, calculate $\mbox{opt}(H',i+1,Y)$ as defined above and average the results, resulting in the approximation

$\begin{displaymath} \mbox{value-est}_s(i,H) = \max_{r_i} \sum_{Y\in S} (v(\mbox{opt}(H',i+1,Y)+H')-\mbox{opt}(H',i+1,Y) \cdot Y)/\vert S\vert. \end{displaymath}$ (6)

This expression converges to value-est with increasing sample size.

A remaining challenge in evaluating Equation 6 is computing the real-valued bid that maximizes the value. Note that we want to buy item precisely at those closing prices for which the value of having the item (minus its cost) exceeds the value of not having the item; this maximizes profit. Thus, to make a positive profit, we are willing to pay up to, but not more than, the difference in value of having the item and not having the item.

Formally, let be the vector of current holdings and be the holdings modified to reflect winning item . Let $G_w(Y) = \mbox{opt}(H_w,i+1,Y)$ , the optimal set of purchases assuming item was won, and $G(Y) = \mbox{opt}(H,i+1,Y)$ the optimal set of purchases assuming otherwise (except in cases of ambiguity, we write simply and for and respectively). We want to select to achieve the equivalence

$\begin{displaymath} r_i \ge y_i \equiv \sum_{Y\in S} (v(G_w+H) - G_w \cdot Y) /... ...t - y_i \ge \sum_{Y\in S} (v(G+H) - G \cdot Y) / \vert S\vert. \end{displaymath}$

(7)

Setting

$\begin{displaymath} r_i = \sum_{Y\in S} ([v(G_w+H) - G_w \cdot Y] - [v(G+H) - G \cdot Y]) / \vert S\vert. \end{displaymath}$

(8)

achieves the equivalence desired in Equation 7, as can be verified by substitution, and therefore bidding the average difference between holding and not holding the item maximizes the value.⁴

Next: The Full Approach Up: Simplified Abstraction Previous: Approximate Bidding

Peter Stone 2003-09-24