Kevyn Collins-Thompson


Graduate Student
School of Computer Science (LTI)

Office: 3612B Newell-Simon Hall
Phone: +1-412-268-7296
Fax: +1-412-268-6298
Email: kct+web AT cs . cmu . edu

Mailing Address:
Carnegie Mellon University
5000 Forbes Avenue
4502 Newell-Simon Hall, LTI
Pittsburgh, PA 15213-8213


I've moved to Microsoft Research. Click here for my new MSR homepage.
The page below is out-of-date and no longer being maintained.

My primary research interests involve the application of machine learning to information organization and retrieval problems. I'm also interested in text mining, statistical language modeling, natural language processing, and computer-assisted language learning. My advisor is Jamie Callan.

In my thesis work (available here), I begin by developing methods for quantifying risk at important points in the retrieval process, estimating the variance of retrieval model parameters with efficient resampling. Then, by exploiting connections with portfolio theory in computational finance, I develop new models, algorithms, and evaluation methods that use this variance information to account for risk. For example, I show how the reliability of current query expansion algorithms can be greatly improved by formulating query expansion as a Markowitz-type constrained optimization problem that accounts for both risk and reward objectives. (A summary is available in my NIPS 2008 paper.) I also discuss applications of such risk optimization frameworks to other areas of information retrieval.
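As a rough illustration of the general idea only (this is not the formulation used in the thesis), the sketch below estimates the mean and variance of per-term scores by bootstrap resampling of documents, then picks term weights that trade expected reward against variance. The function names, the input layout (`doc_scores`), the `risk_aversion` parameter, and the simplifying assumption that terms are uncorrelated (a diagonal covariance, which makes the optimum available in closed form) are all assumptions made for this example.

```python
import random
import statistics

def bootstrap_term_stats(doc_scores, n_resamples=200, seed=0):
    """Estimate the mean and variance of each term's average score.

    doc_scores: one list of term scores per document (hypothetical input).
    """
    rng = random.Random(seed)
    n_docs, n_terms = len(doc_scores), len(doc_scores[0])
    samples = []
    for _ in range(n_resamples):
        # Resample documents with replacement, then average each term's score.
        picked = [doc_scores[rng.randrange(n_docs)] for _ in range(n_docs)]
        samples.append([sum(d[t] for d in picked) / n_docs for t in range(n_terms)])
    means = [statistics.fmean(s[t] for s in samples) for t in range(n_terms)]
    variances = [statistics.pvariance([s[t] for s in samples]) for t in range(n_terms)]
    return means, variances

def risk_adjusted_weights(means, variances, risk_aversion=1.0):
    """Maximize sum_i (mu_i * x_i - 0.5 * k * var_i * x_i**2) with 0 <= x_i <= 1.

    Assumes uncorrelated terms and risk_aversion (k) > 0, so each weight
    can be solved independently.
    """
    weights = []
    for mu, var in zip(means, variances):
        if mu <= 0:
            weights.append(0.0)   # no expected reward: drop the term
        elif var == 0:
            weights.append(1.0)   # no estimated risk: use the full weight
        else:
            weights.append(min(1.0, mu / (risk_aversion * var)))
    return weights

# Two terms with the same average score; the second varies widely across documents.
docs = [[0.8, 0.0], [0.8, 1.6]] * 5
means, variances = bootstrap_term_stats(docs)
weights = risk_adjusted_weights(means, variances, risk_aversion=50.0)
```

In this toy input both terms have similar expected scores, but the second is far less stable across resamples, so it receives a smaller weight as `risk_aversion` grows.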

Another research interest is improving search utility by incorporating user profiles into retrieval models, especially for educational tasks. My recent work also includes using simple statistical language models to predict the reading difficulty of web pages and other non-traditional documents, as well as algorithms for novelty detection and open-domain question answering.

Formerly, I was a member of the ePaper group at Microsoft Research in Redmond, Washington. My work there addressed automatic classification, segmentation, and searching of document images. Some earlier projects include the design and implementation of a component-based IR system, techniques for compression, storage, and display of large multimedia collections, and a prototype software architecture for building very fast, cache-intensive servers on SMP machines.

