Office: 3612B Newell-Simon Hall
My primary research interests involve the application of machine learning to information organization and retrieval problems. I'm also interested in text mining, statistical language modeling, natural language processing, and computer-assisted language learning. My advisor is Jamie Callan.
In my thesis work (available here), I begin by developing methods for quantifying risk at important points in the retrieval process by estimating the variance in retrieval model parameters using efficient resampling. Then, by exploiting connections with portfolio theory in computational finance, I develop new models, algorithms, and evaluation methods that use this variance information to account for risk. For example, I show how the reliability of current query expansion algorithms can be greatly improved by formulating query expansion as a Markowitz-type constrained optimization problem that accounts for both risk and reward objectives. (A summary is available in my NIPS 2008 paper.) I also discuss applications of such risk optimization frameworks to other areas of information retrieval.
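As a rough illustration of the two ideas above (resampling to estimate variance, then a mean-variance trade-off over expansion terms), here is a minimal Python sketch. It is not the formulation used in the thesis: the function names, the relative-frequency term weights, the box constraint 0 <= x_i <= 1, the risk-aversion parameter kappa, and the projected-gradient solver are all assumptions chosen to keep the example short and dependency-free.

```python
import random
from statistics import mean


def bootstrap_term_weights(doc_term_counts, n_resamples=200, seed=0):
    """Resample feedback documents with replacement.

    Each resample yields one weight vector: the relative frequency of every
    candidate term in the resampled document set (an assumed, simplistic
    stand-in for a real feedback model).
    """
    rng = random.Random(seed)
    terms = sorted({t for doc in doc_term_counts for t in doc})
    samples = []
    for _ in range(n_resamples):
        resample = [rng.choice(doc_term_counts) for _ in doc_term_counts]
        total = sum(sum(doc.values()) for doc in resample)
        samples.append(
            [sum(doc.get(t, 0) for doc in resample) / total for t in terms]
        )
    return terms, samples


def mean_and_covariance(samples):
    """Sample mean vector and (unbiased) covariance matrix of the weights."""
    n, k = len(samples), len(samples[0])
    mu = [mean(s[i] for s in samples) for i in range(k)]
    cov = [
        [
            sum((s[i] - mu[i]) * (s[j] - mu[j]) for s in samples) / (n - 1)
            for j in range(k)
        ]
        for i in range(k)
    ]
    return mu, cov


def risk_averse_weights(mu, cov, kappa=1.0, steps=500, lr=0.1):
    """Maximize mu.x - (kappa / 2) * x' cov x subject to 0 <= x_i <= 1.

    Solved by projected gradient ascent: take a gradient step, then clip
    each coordinate back into [0, 1]. The step size lr must be small
    relative to kappa times the largest eigenvalue of cov.
    """
    k = len(mu)
    x = [0.0] * k
    for _ in range(steps):
        grad = [
            mu[i] - kappa * sum(cov[i][j] * x[j] for j in range(k))
            for i in range(k)
        ]
        x = [min(1.0, max(0.0, x[i] + lr * grad[i])) for i in range(k)]
    return x
```

In this sketch, a term with a high mean weight but high variance across resamples receives a smaller final weight than an equally promising term whose weight is stable, which is the risk-reward trade-off in miniature. Larger values of kappa penalize variance more heavily.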
Another research interest is improving search utility by incorporating user profiles into retrieval models, especially for educational tasks. My recent work also covers predicting the reading difficulty of web pages and other non-traditional documents with simple statistical language models, algorithms for novelty detection, and open-domain question answering.
Formerly, I was a member of the ePaper group at Microsoft Research in Redmond, Washington. My work there addressed automatic classification, segmentation, and searching of document images. Some earlier projects include the design and implementation of a component-based IR system, techniques for compression, storage, and display of large multimedia collections, and a prototype software architecture for building very fast, cache-intensive servers on SMP machines.
K. Collins-Thompson. Optimization methods for query model estimation: Applying portfolio theory to mitigate risk in information retrieval. CMU DIR Group Technical Report 2007-09-03. (abstract)
M. Heilman, K. Collins-Thompson, J. Callan, and M. Eskenazi. Classroom success of an Intelligent Tutoring System for lexical practice and reading comprehension. Proceedings of Interspeech 2006. Pittsburgh, U.S.A. (abstract)
K. Collins-Thompson and J. Callan. Query expansion using random walk models. Proceedings of the Fourteenth International Conference on Information and Knowledge Management (CIKM'05). ACM. Bremen, Germany. (pdf)
K. Collins-Thompson and J. Callan. Predicting reading difficulty with statistical language models. Journal of the American Society for Information Science and Technology. Vol. 56, No. 13, 1448-1462, 2005.
K. Collins-Thompson, P. Ogilvie and J. Callan. Initial results with structured queries and language models on half a terabyte of text. Proceedings of TREC 2004, National Institute of Standards and Technology, special publication. (pdf)
K. Collins-Thompson and J. Callan. A language modeling approach to predicting reading difficulty. Proceedings of HLT / NAACL 2004, Boston, USA, May 2004. (pdf)
K. Collins-Thompson and J. Callan. Information retrieval for language tutoring: an overview of the REAP project (poster description), Proceedings of SIGIR 2004, Sheffield, UK. July 2004. (pdf)
K. Collins-Thompson, E. Terra, J. Callan, and C. Clarke. The effect of document retrieval quality on factoid question-answering performance (poster description), Proceedings of SIGIR 2004, Sheffield, UK. July 2004. (pdf)
J. Zhang, A. Toth, K. Collins-Thompson, and A. Black. Prominence prediction for super-sentential prosodic modeling based on a new database, ISCA Synthesis Workshop, Pittsburgh, USA, June 2004.
E. Nyberg, T. Mitamura, J. Callan, J. Carbonell, R. Frederking, K. Collins-Thompson, L. Hiyakumoto, Y. Huang, C. Huttenhower, S. Judy, J. Ko, A. Kupsc, L. V. Lita, V. Pedro, D. Svoboda, and B. Van Durme. (2004.) "The JAVELIN question-answering system at TREC 2003: A multi-strategy approach with dynamic planning." Proceedings of the 2003 Text REtrieval Conference (TREC 2003). National Institute of Standards and Technology, special publication. (pdf)
U.S. Patent 6,735,335. M. Liu, K. Collins-Thompson, D. Lawton. Method and apparatus for discriminating between documents in batch scanned document files. May 2004.
U.S. Patent 6,687,697. K. Collins-Thompson, C. Schweizer. System and method for improved string matching under noisy channel conditions. Feb. 2004.
K. Collins-Thompson, P. Ogilvie, Y. Zhang, and J. Callan. Information filtering, novelty detection, and named-page finding. In Proceedings of the 2002 Text REtrieval Conference (TREC 2002). National Institute of Standards and Technology, special publication. 107-118. (pdf)
E. Nyberg, T. Mitamura, J. Carbonell, J. Callan, K. Collins-Thompson, K. Czuba, M. Duggan, L. Hiyakumoto, N. Hu, Y. Huang, J. Ko, L. Lita, S. Murtagh, V. Pedro, and D. Svoboda. The JAVELIN Question-Answering System. In Proceedings of TREC 2002. NIST, special publication. 128-137.
K. Collins-Thompson, R. Nickolov (2002). A clustering-based algorithm for automatic document separation. Proceedings of the SIGIR 2002 Workshop on Information Retrieval and OCR, Tampere, Finland. (pdf)
K. Collins-Thompson, C. Schweizer and S. T. Dumais (2001). Improved string matching under noisy channel conditions. Proceedings of CIKM 2001. Atlanta, USA. 357-364 (pdf)
Last updated on February 14, 2008.