The Random Shopper Model:
Predicting Preference Flips in Commerce Search

Samuel Ieong, Nina Mishra and Or Sheffet


Traditional approaches to ranking in web search follow the paradigm of rank-by-score: a learned function gives each query-URL combination an absolute score and URLs are ranked according to this score. This paradigm ensures that if the score of one URL is better than another then one will always be ranked higher than the other. Scoring contradicts prior work in behavioral economics that showed that users' preferences between two items depend not only on the items but also on the presented alternatives. Thus, for the same query, users' preference between items A and B depends on the presence/absence of item C. We propose a new model of ranking, the Random Shopper Model, that allows and explains such behavior. In this model, each feature is viewed as a Markov chain over the items to be ranked, and the goal is to find a weighting of the features that best reflects their importance. We show that our model can be learned under the empirical risk minimization framework, and give an efficient learning algorithm. Experiments on commerce search logs demonstrate that our algorithm outperforms scoring-based approaches including regression and listwise ranking.

Personal Comments

As I see it, there are three main ways to build on this work (or its spirit).

First, there's the classical route: trying to find other scenarios where one believe labels come from a combination of Markov chains, build better topologies for such problems and improve on our low accuracy rate.

Second route arises from a practical perspective: Many scenarios in ML aim at learning and predicting users' behavior. I believe that the ML community has much to gain from looking at the existing body of work in behavioral psychology and behavioral economics. This work undermines a prevailing approach in ranking -- that exists one true total ranking over all items that induces the ranking over subsets of items. I believe that many other assumptions, equally as common, can also (and should also) be altered.

Third, from a theoretical perspective, I find the problem of having non-numeric features to be very compelling (though admittedly, not very practical). In particular, suppose every day we're given k graphs and a label corresponding to whether a (known) global property holds or not for some (unknown) combination of the graphs. (E.g. all graphs are weighted and the label indicates whether the graph resulting from a certain linear combination of all input graphs has s-t cut greater than K.) Can we PAC learn the weights of such combination?