Meaning-Based Retrieval for Human Language Technologies
Matthew Bilotti
As the amount of information at our fingertips grows seemingly without bound, so
too grows the demand for a class of Human Language Technologies (HLT)
applications that facilitate searching, browsing and navigation within large
information spaces. These applications are often built around keyword-based
text retrieval systems, which have long been the de facto standard for searching
for relevant information in a large collection of documents.
Sometimes keywords are not sufficient to represent the concept being searched
for. Suppose a MEDLINE researcher, wanting to investigate the effect of heparin
on the clotting cascade, asks the question, "What does heparin inhibit?" of a QA
system built upon a keyword-based text retrieval system. The question is
converted into the keyword query 'heparin inhibit,' and the following two
documents are retrieved:
"Platelets contain several factors that inhibit heparin."
"Heparin inhibits thrombin via antithrombin III."
The application presents "platelets" and "thrombin" to the user as answers, but
only "thrombin" is correct. Why is the wrong answer returned?
What the user wanted were instances of 'inhibit' events where 'heparin' is the
agent. What the user actually received were documents containing the keywords
'inhibit' and 'heparin,' a superset of what the user wanted. The system cannot
distinguish between the two because it relies on a keyword-based representation
of document meaning that is too weak to represent the agentive relationship
between 'heparin' and 'inhibit.'
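
The distinction can be illustrated with a minimal Python sketch. It is not the MBR implementation: the Event and Document classes, the hand-written role annotations, and the prefix-based keyword matching are all assumptions made for illustration. It only shows why a keyword match returns both example sentences while a match on the agent role returns one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical semantic-role annotation: a predicate with its agent and patient."""
    predicate: str
    agent: str
    patient: str


@dataclass(frozen=True)
class Document:
    text: str
    events: tuple  # hand-written annotations, assumed for this example only


DOCS = [
    Document(
        "Platelets contain several factors that inhibit heparin.",
        (Event("contain", "platelets", "factors"),
         Event("inhibit", "factors", "heparin")),
    ),
    Document(
        "Heparin inhibits thrombin via antithrombin III.",
        (Event("inhibit", "heparin", "thrombin"),),
    ),
]


def keyword_match(doc, keywords):
    """True if every keyword is a prefix of some token (a crude stand-in for stemming)."""
    tokens = [tok.strip('.,"').lower() for tok in doc.text.split()]
    return all(any(tok.startswith(kw) for tok in tokens) for kw in keywords)


def agent_match(doc, predicate, agent):
    """True if the document contains an event with the given predicate and agent."""
    return any(e.predicate == predicate and e.agent == agent for e in doc.events)


# Keyword query 'heparin inhibit' matches both sentences.
keyword_hits = [d.text for d in DOCS if keyword_match(d, ["heparin", "inhibit"])]

# Requiring 'heparin' to be the agent of 'inhibit' matches only the second.
agent_hits = [d.text for d in DOCS if agent_match(d, "inhibit", "heparin")]

print(keyword_hits)
print(agent_hits)
```

In this toy setting the role-aware match filters out the sentence in which heparin is the thing being inhibited, which is the behavior the keyword representation cannot express.
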
If we are to improve text retrieval support for HLT applications, we must
support indexing and retrieval on the linguistic and semantic content of
interest to HLT applications. I propose a novel approach to text retrieval
problems called Meaning-Based Retrieval (MBR) in which text meaning is modeled
by instances of Meaning Types from a domain-specific Meaning Type System (MTS),
which can contain linguistic and semantic annotations in addition to keywords.
Newly available Information Retrieval technology allows for the indexing and
retrieval of hierarchical annotations on text [3, 4]. MBR maps an arbitrary MTS
onto this technology by representing linguistic and semantic content as
annotations, and allows indexing and retrieval of text at the meaning level.
In this talk, I motivate and describe the MBR algorithm, then discuss
preliminary results from applying the MBR prototype to an experiment over the
Center for Nonproliferation Studies corpus. I compare MBR with an MTS that
supports semantic role annotations against MBR with a keywords-only MTS, using
standard precision and recall metrics.
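
For reference, the standard set-based definitions of precision and recall can be sketched in Python as follows. The document identifiers and relevance judgments below are hypothetical, taken from the two-sentence heparin example rather than from the experiment, so the numbers are illustrative and are not results from the talk.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved and a set of relevant items.

    precision = |retrieved & relevant| / |retrieved|
    recall    = |retrieved & relevant| / |relevant|
    An empty denominator yields 0.0 by convention here.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical judgments: only doc2 ("Heparin inhibits thrombin ...") is relevant.
relevant = {"doc2"}

# A keyword query retrieves both example sentences.
keyword_scores = precision_recall({"doc1", "doc2"}, relevant)

# A query constrained by semantic role retrieves only the relevant one.
role_scores = precision_recall({"doc2"}, relevant)

print(keyword_scores)
print(role_scores)
```
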
[1] MEDLINE. http://www.nlm.nih.gov/pubs/factsheets/medline.html
[2] Medical Subject Headings (MeSH). http://www.nlm.nih.gov/mesh/
[3] Indri Retrieval System. http://www.lemurproject.org
[4] Ogilvie and Callan. Hierarchical Language Models for Retrieval of XML
Components. In Proc. INEX 2004.