Download slides here (gzipped tar archive of PostScript files).
The talk generally addresses the largely-neglected theoretical issues raised by common marketing questions. The segmentation part is based on the STOC '98 paper ``Segmentation problems'', by Jon Kleinberg, Christos Papadimitriou, and Prabhakar Raghavan. The collaborative filtering work is based on joint work with Ravi Kumar, Sridhar Rajagopalan, and Andrew Tomkins.
The first category of marketing questions is customer segmentation. This is a general class of problems involving finding a selection of items that maximizes the sum over customers of the cost of the best item for the customer.
A specific, applied example is the problem of catalog segmentation. Suppose that we have a set of items I, and we know that each particular customer x would like to see in a catalog a subset Sx of I. We can only publish and distribute two catalogs C0 and C1 (subsets of I with fixed cardinality), but we would like to maximize the happiness of our customers. The happiness H(x) of customer x is the maximum of the size of C0 intersect Sx and the size of C1 intersect Sx (since we will mail the maximizing catalog to x). We want to maximize the sum over x of H(x).
The catalog segmentation problem, like almost any other segmentation problem thus far considered, is MAXSNP-complete. There is a simple 2-approximation: Publish only the best catalog for everybody (which can be determined using the greedy method). Nothing better is known for the general case, but it is natural to hope for something better than the naive 2-approximation. (For the dense case, where every customer likes at least a constant fraction of the products, there is a FPTAS for a fixed number of catalogs.) More generally, one might publish k catalogs (the one-catalog algorithm gives a k-approximation), or one might give a cost for publishing each additional catalog.
Another segmentation problem is maximum spanning tree segmentation. As a question from the audience pointed out, this is not necessarily related to any serious marketing problem, but it illustrates the breadth of segmentation problems that have yet to be considered. Here customers have their own sets of edge weights, and we would like to choose two spanning trees so that the sum over customers of the weight of the better of the two spanning trees for the customer's weights.
The other category of problems inspired by marketing questions is collaborative filtering. Here we would like to recommend items based on what customers have purchased. For each customer we have a sample of items that customer frequently buys (milk, orange juice, toast, bananas), and given this data from all customers we would like to recommend to each customer something new to consider (maybe oats for our breakfast-eater).
Even the proper model for collaborative filtering is not obvious, but a natural model is that the items are grouped into several clusters. Each customer is a distribution on clusters, and we want to identify which clusters a customer prefers based on the history.
These marketing-inspired problems - customer segmentation and collaborative filtering - represent important and very open territory for additional algorithmic research.