Open-Domain Textual Question Answering
Sanda Harabagiu and
Dan Moldovan
Department of Computer Science and Engineering,
Southern Methodist University
Brief Description
Question Answering (QA) is a fast-growing area of research and commercial
interest. The QA problem is to find answers to open-domain
questions by searching a large collection of documents. Unlike
Internet search engines, which return lists of documents, QA systems
provide short, relevant answers to questions.
The recent explosion of information available on the World
Wide Web makes question answering a compelling
framework for finding information that closely matches user needs. The
success of QA services such as AskJeeves is evidence of the
popularity of this technique. Because both questions and
answers are expressed in natural language, QA methodologies must
deal with language ambiguities and incorporate natural language
processing (NLP) techniques. Several current NLP-based technologies
provide a framework that approximates the complex problem of answering
questions from large collections of texts.
Ideal QA systems should have good dialog understanding, rich
knowledge bases and high-quality text mining methods. They will also
incorporate common-sense reasoning methods and use good approximations
of world knowledge. Until these more advanced tools are available, QA
can be approximated with NLP enhancements of information retrieval (IR)
and information extraction (IE) techniques.
The tutorial presents recent results in QA research and system
implementations.
Detailed Outline
- Introduction
  - Problem definition
  - Examples of questions and answers
  - QA taxonomies
- QA system architectures
  - Survey of the most important system architecture features in
    TREC-8 QA (20 systems) and TREC-9 QA (28 systems)
  - Presentation of a generic QA system architecture
- Basic QA
  - Question processing
  - Document retrieval
  - Answer extraction
  - Answer ranking
  - Accuracy performance
- Advanced QA
  - Keyword selection
  - Paragraph indexing
  - Logic prover for answer extraction
  - Answer correctness
  - An introduction to answer fusion from several documents
  - Interactive QA through dialog
  - Time performance
- Open issues in QA
  - Brief survey of current research issues in QA, such as
    multilinguality, context, and knowledge acquisition for ontology
    construction, that will be incorporated into future QA systems
- Concluding remarks
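As a rough illustration of the four Basic QA stages listed in the outline
(question processing, document retrieval, answer extraction, answer
ranking), the following is a minimal sketch in Python. It is not the
architecture presented in the tutorial or used by any TREC system. The
function names, the stopword list, the answer types and the regular
expressions are all hypothetical and chosen only to keep the example
small and self-contained.

```python
import re

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "of", "in", "is", "was", "did",
             "who", "when", "where", "what", "to", "by", "on"}

# Hypothetical surface patterns for two expected answer types.
PATTERNS = {
    "DATE": r"\b(?:1[0-9]{3}|20[0-9]{2})\b",           # four-digit years
    "PERSON": r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b",     # capitalized word sequences
}


def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def process_question(question):
    """Question processing: guess the expected answer type and select keywords."""
    words = tokenize(question)
    first = words[0] if words else ""
    answer_type = {"who": "PERSON", "when": "DATE"}.get(first, "OTHER")
    keywords = [w for w in words if w not in STOPWORDS]
    return answer_type, keywords


def retrieve(paragraphs, keywords, top_n=2):
    """Document retrieval: score paragraphs by distinct keywords matched."""
    scored = []
    for paragraph in paragraphs:
        tokens = set(tokenize(paragraph))
        score = sum(1 for k in set(keywords) if k in tokens)
        if score > 0:
            scored.append((score, paragraph))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


def extract_candidates(paragraph, answer_type):
    """Answer extraction: find strings matching the expected answer type."""
    pattern = PATTERNS.get(answer_type)
    return re.findall(pattern, paragraph) if pattern else []


def answer(question, paragraphs):
    """Answer ranking: order candidates by the score of their source paragraph."""
    answer_type, keywords = process_question(question)
    ranked = []
    for score, paragraph in retrieve(paragraphs, keywords):
        for candidate in extract_candidates(paragraph, answer_type):
            # Skip candidates that only repeat words from the question.
            if all(w in keywords for w in tokenize(candidate)):
                continue
            ranked.append((score, candidate))
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return [candidate for _, candidate in ranked]


if __name__ == "__main__":
    collection = [
        "NIST introduced a QA track at TREC in 1999.",
        "The weather was mild.",
    ]
    print(answer("When did NIST introduce a QA track?", collection))
```

Real systems replace each stage with far richer components (for example
parsing, named entity recognition and paragraph indexing); the sketch only
shows how the stages feed one another.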
Motivation
Research in open-domain Question Answering generates
considerable interest from both the NLP community and the end users of
this technology. In 1999, the National Institute of Standards and
Technology (NIST) introduced, for the first time, a QA track as part of
the established Text REtrieval Conference (TREC) competition. That year
there were 20 participants in the QA competition, and in 2000 the number
increased to 28. The participants include university research groups,
national research laboratories, and small and large companies. The
interest in QA is worldwide, as evidenced by the international
participation in the TREC QA track.
Open-domain QA is a complex application that encompasses many aspects
of NLP and AI. Current state-of-the-art QA systems can produce
answers only to simple questions. However, the complexity of QA
systems increases from year to year, and this increase is paralleled
by sustained QA research activity.