Thesis Committee:
Jamie Callan (Carnegie Mellon University, Chair)
Jaime Carbonell (Carnegie Mellon University)
Thomas Minka (Microsoft Research Cambridge)
Yiming Yang (Carnegie Mellon University)
Stephen Robertson (Microsoft Research Cambridge)
Abstract
A personal information filtering system monitors an incoming document stream to find the documents that match the information needs specified by user profiles. The most challenging aspect of adaptive filtering is developing a system that learns user profiles efficiently and effectively from very limited user supervision.
To overcome this challenge, the system needs to do the following: use a robust learning algorithm that works reasonably well when training data are scarce and becomes more effective as more training data arrive; explore what the user likes while satisfying the user's immediate information need, trading off exploration and exploitation; consider aspects of a document beyond relevance, such as novelty, readability and authority; use multiple forms of evidence, such as user context and implicit feedback, while interacting with the user; and handle various scenarios, such as missing data, robustly.
This dissertation uses the Bayesian graphical modeling approach as a unified framework for filtering. We customize the framework to the filtering domain and develop a set of novel solutions that enable us to build, in a principled way, a filtering system with the desired characteristics listed above. We evaluate and justify these solutions on a large and diverse collection of standard evaluation data sets. We also carry out a user study with a real web-based personal news filtering system and evaluate the proposed work on the new data set collected in that study.