Home | Research | Publications | CV

Research

 

My general area of interest is the application of statistical machine learning techniques to real-world problems. Currently, my research focuses on advancing the state of the art in Question Answering (QA), the task of retrieving accurate answers to natural language questions (e.g. "Who invented the computer?") from information sources.

My early work led to a flexible and extensible QA architecture that supports the integration of multiple search and answer generation strategies and has served as a test bed for the development of new QA algorithms. I am the primary author of the Ephyra QA system, which has been evaluated in the Text REtrieval Conference (TREC), an annual workshop organized by NIST that has been the main evaluation forum for English QA research. Ephyra has been released as open source software to the QA community and is now used by researchers all over the world. The open source release, OpenEphyra, lowers the barrier to entry for QA research and facilitates evaluations and comparisons of different algorithms by providing a common platform for experiments. The current system combines a statistical pattern learning and matching approach with answer-type-based extraction techniques and a semantic extractor based on semantic role labeling. Please take a look at the Ephyra website for more information about this project, or visit the SourceForge project site to download the latest release.

Currently, I am working on a statistical approach for automatically expanding document collections with related information from large, unstructured sources. The expansion improves the collections' coverage of relevant knowledge and adds paraphrases of information that is already present in the documents. A QA system that uses the expanded text collections as knowledge sources benefits from more relevant search results and additional supporting evidence for identifying correct answers. The source expansion approach provides a principled way of building large, local stores of relevant information. Source expansion also has applications in other natural language processing tasks beyond QA, such as machine reading, where extended source material can facilitate automatic knowledge extraction.

Over the past three years, I have been collaborating with the DeepQA group at IBM Research on Watson, an open-domain question answering system that defeated the best human players on the Jeopardy! TV show. My contributions to Watson are described in this Science Newsmaker Interview.

 