Kevyn Collins-Thompson

Graduate Student
School of Computer Science (LTI)

Office: 3612B Newell-Simon Hall
Phone: +1-412-268-7296
Fax: +1-412-268-6298
Email: kct+web AT cs . cmu . edu

Mailing Address:
Carnegie Mellon University
5000 Forbes Avenue
4502 Newell Simon Hall, LTI
Pittsburgh, PA 15213-8213

I've moved to Microsoft Research. Click here for my new MSR homepage.
The page below is out-of-date and no longer being maintained.

My primary research interests involve the application of machine learning to information organization and retrieval problems. I'm also interested in text mining, statistical language modeling, natural language processing, and computer-assisted language learning. My advisor is Jamie Callan.

In my thesis work (available here), I begin by developing methods for quantifying risk at key points in the retrieval process, estimating the variance in retrieval model parameters using efficient resampling. Then, by exploiting connections with portfolio theory in computational finance, I develop new models, algorithms, and evaluation methods that use this variance information to account for risk. For example, I show how the reliability of current query expansion algorithms can be greatly improved by formulating query expansion as a Markowitz-type constrained optimization problem that accounts for both risk and reward objectives. (A summary is available in my NIPS 2008 paper.) I also discuss applications of such risk optimization frameworks to other areas of information retrieval.

Another research interest is improving search utility by incorporating user profiles into retrieval models, especially for educational tasks. My recent work also includes using simple statistical language models to predict the reading difficulty of web pages and other non-traditional documents, algorithms for novelty detection, and open-domain question answering.

Formerly, I was a member of the ePaper group at Microsoft Research in Redmond, Washington. My work there addressed automatic classification, segmentation, and searching of document images. Earlier projects include the design and implementation of a component-based IR system, techniques for compression, storage, and display of large multimedia collections, and a prototype software architecture for building very fast, cache-intensive servers on SMP machines.

What's New

G. Frishkoff, C. Perfetti, K. Collins-Thompson. (To appear.) "Lexical Quality in the Brain: ERP evidence for robust word learning from context". Developmental Neuropsychology, 2009.

2008

M. Heilman, K. Collins-Thompson, J. Callan, M. Eskenazi, A. Juffs, L. Wilson. (To appear.) "Personalization of Reading Passages Improves Vocabulary Acquisition". International Journal of Artificial Intelligence in Education, 2008.
K. Collins-Thompson. "Estimating robust query models with convex optimization". Advances in Neural Information Processing Systems (NIPS), 2008. (pdf)
G. Frishkoff, K. Collins-Thompson, C. Perfetti, J. Callan. (2008) Measuring incremental changes in word knowledge: Experimental validation and implications for learning and assessment. Behavior Research Methods, Vol. 40, No. 4. pp. 907-925.
I've successfully defended my thesis (Aug. 27, 2008) and will be moving to Microsoft Research (Redmond) starting in September, working as a Researcher in the Adaptive Systems and Interaction Group.
M. Heilman, K. Collins-Thompson and M. Eskenazi. "An analysis of statistical models and features for reading difficulty prediction." ACL 2008 BEA Workshop on Innovative Use of NLP for Building Educational Applications. Columbus, Ohio.

2007

K. Collins-Thompson and J. Callan. "Estimation and use of uncertainty in pseudo-relevance feedback." Proceedings of the Thirtieth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2007), Amsterdam. (pdf)
K. Collins-Thompson and J. Callan. "Automatic and human scoring of word definition responses." Proceedings of the NAACL-HLT 2007 Conference. Rochester, U.S.A. pp. 476-483. (pdf)
K. Collins-Thompson. Optimization methods for query model estimation: Applying portfolio theory to mitigate risk in information retrieval. CMU DIR Group Technical Report 2007-09-03. (abstract)
M. Heilman, K. Collins-Thompson, J. Callan and M. Eskenazi. "Combining lexical and grammatical features to improve readability measures for first and second language texts." Proceedings of the NAACL-HLT 2007 Conference. Rochester, U.S.A. pp. 460-467. (pdf)

2006

M. Heilman, K. Collins-Thompson, J. Callan, and M. Eskenazi. Classroom success of an Intelligent Tutoring System for lexical practice and reading comprehension. Proceedings of Interspeech 2006. Pittsburgh, U.S.A. (abstract)
A. Juffs, L. Wilson, M. Eskenazi, J. Callan, J. Brown, K. Collins-Thompson, M. Heilman, T. Pelletreau, and J. Sanders. (2006) "Robust learning of vocabulary: investigating the relationship between learner behaviour and the acquisition of vocabulary" (poster). The 40th Annual TESOL Convention and Exhibit (TESOL 2006).

2005

K. Collins-Thompson and J. Callan. Query expansion using random walk models. Proceedings of the Fourteenth International Conference on Information and Knowledge Management (CIKM'05). ACM. Bremen, Germany. (pdf)
K. Collins-Thompson, J. Callan. Predicting reading difficulty with statistical language models. Journal of the American Society for Information Science and Technology. Vol. 56, No. 13, 1448-1462.
K. Collins-Thompson, P. Ogilvie and J. Callan. Initial results with structured queries and language models on half a terabyte of text. Proceedings of TREC 2004, National Institute of Standards and Technology, special publication. (pdf)

2004

K. Collins-Thompson and J. Callan. A language modeling approach to predicting reading difficulty. Proceedings of HLT / NAACL 2004, Boston, USA, May 2004. (pdf)
K. Collins-Thompson and J. Callan. Information retrieval for language tutoring: an overview of the REAP project (poster description), Proceedings of SIGIR 2004, Sheffield, UK. July 2004. (pdf)
K. Collins-Thompson, E. Terra, J. Callan, and C. Clarke. The effect of document retrieval quality on factoid question-answering performance (poster description), Proceedings of SIGIR 2004, Sheffield, UK. July 2004. (pdf)
J. Zhang, A. Toth, K. Collins-Thompson, and A. Black. Prominence prediction for super-sentential prosodic modeling based on a new database, ISCA Synthesis Workshop, Pittsburgh, USA, June 2004.
E. Nyberg, T. Mitamura, J. Callan, J. Carbonell, R. Frederking, K. Collins-Thompson, L. Hiyakumoto, Y. Huang, C. Huttenhower, S. Judy, J. Ko, A. Kupsc, L. V. Lita, V. Pedro, D. Svoboda, and B. Van Durme. (2004.) "The JAVELIN question-answering system at TREC 2003: A multi-strategy approach with dynamic planning." Proceedings of the 2003 Text REtrieval Conference (TREC 2003). National Institute of Standards and Technology, special publication. (pdf)
U.S. Patent 6,735,335. M. Liu, K. Collins-Thompson, D. Lawton. Method and apparatus for discriminating between documents in batch scanned document files. May 2004.
U.S. Patent 6,687,697. K. Collins-Thompson, C. Schweizer. System and method for improved string matching under noisy channel conditions. Feb. 2004.

2003

K. Collins-Thompson, P. Ogilvie, Y. Zhang, and J. Callan. Information filtering, novelty detection, and named-page finding. In Proceedings of the 2002 Text REtrieval Conference (TREC 2002). National Institute of Standards and Technology, special publication. pp. 107-118. (pdf)
E. Nyberg, T. Mitamura, J. Carbonell, J. Callan, K. Collins-Thompson, K. Czuba, M. Duggan, L. Hiyakumoto, N. Hu, Y. Huang, J. Ko, L. Lita, S. Murtagh, V. Pedro, D. Svoboda. The JAVELIN Question-Answering System. In Proceedings of TREC 2002. NIST, special publication. pp. 128-137.

2002

K. Collins-Thompson, R. Nickolov (2002). A clustering-based algorithm for automatic document separation. Proceedings of the SIGIR 2002 Workshop on Information Retrieval and OCR, Tampere, Finland. (pdf)

2001

K. Collins-Thompson, C. Schweizer and S. T. Dumais (2001). Improved string matching under noisy channel conditions. Proceedings of CIKM 2001. Atlanta, USA. pp. 357-364. (pdf)

Other Activities

Adjunct Faculty, University of Washington, Information School. Fall 2006.
Program Committee, CIKM 2008; SIGIR 2008; SIGIR 2005, 2006 Workshop on Stylistic Analysis of Text
Reviewer, ACM Transactions on Information Systems
Reviewer, IEEE Transactions on Knowledge and Data Engineering
Reviewer, SIGIR 2007, SIGIR 2005, ICML 2005, WWW 2006

DBLP Bibliography

Last updated on February 14, 2008.