This definition of the lexicon acquisition problem differs from that
given by other authors, including
Riloff and Jones (1999), Siskind (1996), Manning (1993), Brent, and
others, as further discussed in Section 7.
Our definition of the problem makes some assumptions about the
training input. First, by
making f a function instead of a relation, the definition assumes
that the meaning for each phrase in a sentence appears only once in the
representation of that sentence (the single-use assumption).
Second, by making f one-to-one, it assumes exclusivity: that
each vertex in a sentence's representation is due to only one phrase in
the sentence. Third, it assumes that a phrase's meaning is a
connected subgraph of a sentence's representation, not a more
distributed representation (the connectedness assumption). While
the first assumption may not hold for some representation languages,
it does not present a problem in the domains we have considered. The
second and third assumptions are perhaps less problematic with respect
to general language use.
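As a concrete illustration, the exclusivity and connectedness assumptions can be checked mechanically on a toy sentence representation. The graph encoding and vertex names below are invented for illustration, not taken from the paper's corpus; this is a minimal sketch in Python, treating the meaning representation as an undirected graph and an assignment as a map from phrases to the vertices they cover.

```python
from collections import deque

def satisfies_assumptions(edges, assignment):
    """Check exclusivity and connectedness for a phrase-to-vertex assignment.

    edges: set of frozenset vertex pairs (the sentence's meaning graph).
    assignment: dict mapping each phrase to the set of vertices it covers
                (a word mapping to no meaning gets an empty set).
    """
    # Exclusivity: each vertex is due to at most one phrase.
    seen = set()
    for vs in assignment.values():
        if seen & vs:
            return False
        seen |= vs
    # Connectedness: each phrase's meaning is a connected subgraph.
    for vs in assignment.values():
        if not vs:
            continue  # unused word: maps to no part of the representation
        start = next(iter(vs))
        reached, frontier = {start}, deque([start])
        while frontier:
            u = frontier.popleft()
            for e in edges:
                if u in e:
                    (w,) = e - {u}
                    if w in vs and w not in reached:
                        reached.add(w)
                        frontier.append(w)
        if reached != vs:
            return False
    return True

# Toy representation of "the man ate the pasta":
# ingest linked to person (agent) and food (patient).
edges = {frozenset({"ingest", "person"}), frozenset({"ingest", "food"})}
good = {"the man": {"person"}, "ate": {"ingest"}, "the pasta": {"food"}}
bad = {"the man": {"person"}, "ate": {"ingest", "person"}}  # exclusivity fails
```

Here `satisfies_assumptions(edges, good)` holds, while `bad` is rejected because the vertex person would be due to two different phrases.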
Our definition also assumes compositionality: that the meaning
of a sentence is derived from the meanings of the phrases it contains,
together, perhaps, with some ``connecting'' information specific to
the representation at hand, but is not derived from external sources
such as noise. In other words, all the vertices of a sentence's
representation are included within the meaning of some word or phrase
in that sentence. This assumption is similar to the linking rules of
Jackendoff (1990), and has been used in previous work on
grammar and language acquisition.
While there is some debate in the linguistics community about the
ability of compositional techniques to handle all phenomena
(Fillmore, 1988; Goldberg, 1995), making this assumption simplifies
the learning process and works reasonably well for the domains of interest here.
Also, since we allow multi-word
phrases in the lexicon (e.g., (``kick the bucket'',
die(_))), one objection to compositionality can be addressed.
This definition also allows training input in which:
1. Words and phrases have multiple meanings; that is,
homonymy might occur in the lexicon.
2. Several phrases map to the same meaning;
that is, synonymy might occur in the lexicon.
3. Some words in a sentence do not map to any meanings,
leaving them unused in the assignment of words to meanings.
4. Phrases of contiguous words map to
parts of a sentence's meaning representation.
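A lexicon meeting these conditions can be pictured as a mapping from phrases to sets of meanings. The entries below are invented for illustration: homonymy shows up as a phrase with more than one meaning, synonymy as two phrases sharing a meaning.

```python
# Hypothetical toy lexicon: phrase -> set of meaning fragments.
lexicon = {
    "ate": {"ingest(_,_)", "consumed(_,_)"},  # homonymy: two meanings
    "devoured": {"ingest(_,_)"},              # synonym of one sense of "ate"
    "pasta": {"food(pasta)"},
}

def homonymous_phrases(lex):
    """Phrases with more than one meaning."""
    return {p for p, ms in lex.items() if len(ms) > 1}

def synonymous_phrases(lex):
    """Phrases that share at least one meaning with another phrase."""
    out = set()
    for p, ms in lex.items():
        for q, ns in lex.items():
            if p != q and ms & ns:
                out.update({p, q})
    return out
```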
Of particular note is lexical ambiguity (1 above).
Note that we could have also derived an ambiguous lexicon
from our sample corpus. In this lexicon,
``ate'' is an ambiguous word. The earlier example
minimizes ambiguity, resulting in an alternative, more intuitively
pleasing lexicon. While our problem definition first minimizes the
number of entries in the lexicon, our learning algorithm will also
exploit a preference for minimizing ambiguity.
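The preference order just described, lexicon size first and then ambiguity, can be expressed as a lexicographic sort key. This sketch, with invented candidate lexicons and entries counted as phrase-meaning pairs (an assumption of this illustration), shows only the ordering, not the actual search our algorithm performs:

```python
def lexicon_cost(lex):
    """Lexicographic key: total phrase-meaning entries first,
    then ambiguity (meanings beyond the first per phrase)."""
    entries = sum(len(ms) for ms in lex.values())
    ambiguity = sum(len(ms) - 1 for ms in lex.values())
    return (entries, ambiguity)

# Two hypothetical candidates with the same number of entries:
ambiguous = {"ate": {"ingest(_, pasta)", "ingest(_, cheese)"}}
unambiguous = {"ate": {"ingest(_,_)"}, "pasta": {"pasta"}}
best = min([ambiguous, unambiguous], key=lexicon_cost)
```

Both candidates have two entries, so the tie is broken in favor of the unambiguous lexicon.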
Also note that our definition allows training input in which sentences
themselves are ambiguous (paired with more than one meaning), since a
given sentence in S (a multiset) might appear multiple times, each
time paired with a different meaning. In
fact, the training data that we consider in Section 5
does have some ambiguous sentences.
Our definition of the lexicon acquisition problem does not fit cleanly
into the traditional definition of learning for classification. Each
training example contains a sentence and its semantic parse, and we
are trying to extract semantic information about some of the phrases
in that sentence. So each example potentially contains information
about multiple target concepts (phrases), and we are trying to pick
out the relevant ``features,'' or vertices of the representation,
corresponding to the correct meaning of each phrase. Of course, our
assumptions of single-use, exclusivity, connectedness, and
compositionality impose additional constraints. In addition to this
``multiple examples in one'' learning scenario, we do not have access
to negative examples, nor can we derive any implicit negatives,
because of the possibility of ambiguous and synonymous phrases.
In some ways the problem is related to clustering, which is also
capable of learning multiple, potentially non-disjoint categories.
However, it is not clear how a
clustering system could be made to learn the phrase-meaning
mappings needed for parsing.
Finally, current systems that learn multiple concepts commonly use
examples for other concepts as negative examples of the concept
currently being learned. The implicit assumption made by doing this
is that concepts are disjoint, an unwarranted assumption in the presence of
ambiguity and synonymy.
Cindi Thompson 2003-01-02