Given a large dataset of users' ratings of movies, what is the best model to accurately predict which movies a person will like? And how can we prevent spammers from tricking our algorithms into suggesting a bad movie? Is it possible to infer structure between movies simultaneously?
In this paper we describe a unified Bayesian approach to Collaborative Filtering that accomplishes all of these goals. It models the discrete structure of ratings and is flexible to the often non-Gaussian shape of the distribution. Additionally, our method finds a co-clustering of the users and items, which improves the model's accuracy and makes the model robust to fraud. We offer three main contributions: (1) We provide a novel model and Gibbs sampling algorithm that accurately models the quirks of real world ratings, such as convex ratings distributions. (2) We provide proof of our model's robustness to spam and anomalous behavior. (3) We use several real world datasets to demonstrate the model's effectiveness in accurately predicting user's ratings, avoiding prediction skew in the face of injected spam, and finding interesting patterns in real world ratings data.
This is work to be presented at WWW 2014 and done in collaboration with Kenton Murray, Christos Faloutsos, and Alex Smola.
maraujo [atsymbol] cs.cmu.edu