Our Solution: WOLFIE

Next: Candidate Generation Phase Up: The WOLFIE Algorithm and Previous: Solving the Lexicon Acquisition

Our Solution: WOLFIE

The above analysis leads us to believe that the Lexicon Acquisition Problem is computationally intractable. Therefore, we can not perform an efficient search for the best lexicon. Nor can we use a standard induction algorithm. Therefore, we have implemented WOLFIE⁷, outlined in Figure 6, which finds an approximate solution to the Lexicon Acquisition Problem. Our approach is to generate a set of candidate lexicon entries, from which the final learned lexicon is derived by greedily choosing the ``best'' lexicon item at each point, in the hopes of finding a final (minimal) covering lexicon. We do not actually learn interpretation functions, so do not guarantee that we will find a covering lexicon.⁸ Even if we were to search for interpretation functions, using a greedy search would also not guarantee covering the input, and of course it also does not guarantee that a minimal lexicon is found. However, we will later present experimental results demonstrating that our greedy approach performs well.

**Figure 6:** WOLFIE Algorithm Overview
$\begin{figure}\hrulefill \begin{small} \begin{tabbing} nn\=nn\=nn\=nn\=nn\=nn... ...phrase, meaning) pairs. \end{tabbing} \end{small} \hrulefill \end{figure}$

WOLFIE first derives an initial set of candidate meanings for each phrase. The algorithm for generating candidates, LICS, attempts to find a ``maximally common'' meaning for each phrase, which biases toward both finding a small lexicon by covering many vertices of a tree at once, and finding a lexicon that actually does cover the input. Second, WOLFIE chooses final lexicon entries from this candidate set, one at a time, updating the candidate set as it goes, taking into account our assumptions of single-use, connectedness, and exclusivity. The basic scheme for choosing entries from the candidate set is to maximize the prediction of meanings given phrases, but also to find general meanings. This adds a tension between LICS, which cover many vertices, and generality, which biases towards fewer vertices. However, generality, like LICS, helps lead to a small lexicon since a general meaning will more likely apply widely across a corpus. Let us explain the algorithm in further detail by way of an example, using Spanish instead of English to illustrate the difficulty somewhat more clearly. Consider the following corpus:

1.: ¿ Cuál es el capital del estado con la población más grande?
answer(C, (capital(S,C), largest(P, (state(S), population(S,P))))).
2.: ¿ Cuál es la punta más alta del estado con la area más grande?
answer(P, (high_point(S,P), largest(A, (state(S), area(S,A))))).
3.: ¿ En que estado se encuentra Texarkana?
answer(S, (state(S), eq(C,cityid(texarkana,_)), loc(C,S))).
4.: ¿ Qué capital es la más grande?
answer(A, largest(A, capital(A))).
5.: ¿ Qué es la area de los estados unitos?
answer(A, (area(C,A), eq(C,countryid(usa)))).
6.: ¿ Cuál es la población de un estado que bordean a Utah?
answer(P, (population(S,P), state(S), next_to(S,M), eq(M,stateid(utah)))).
7.: ¿ Qué es la punta más alta del estado con la capital Madison?
answer(C, (high_point(B,C), loc(C,B), state(B), capital(B,A), eq(A,cityid(madison,_)))).

The sentence representations here are slightly different than the tree representations given in the problem definition, with the main difference being the addition of existentially quantified variables shared between some leaves of a representation tree. As mentioned in Section 2.1, the representations are Prolog queries to a database. Given such a query, we can create a tree that conforms to our formalism, but with this addition of quantified variables. An example is shown in Figure 7 for the

**Figure 7:** Tree with Variables
$\includegraphics[height=3in,angle=-90]{answer.eps}$

representation of the third sentence. Each vertex is a predicate name and its arity, in the Prolog style, e.g., state/1, with quantified variables at some of the leaves. For each outgoing edge (n,m) of a vertex n, the edge is labeled with the argument position filled by the subtree rooted by m. If there is not an edge labeled with a given argument position, the argument is a free variable. Each vertex labeled with a variable (which can occur only at leaves) is an existentially quantified variable whose scope is the entire tree (or query). The learned lexicon, however, does not need to maintain the identity between variables across distinct lexical entries. Another representation difference is that we will strip the answer predicate from the input to our learner,⁹ thus allowing a forest of directed trees as input rather than a single tree. The definition of the problem easily extends such that the root of each tree in the forest must be in the domain of some interpretation function. Evaluation of our system using this representation is given in Section 5.1; evaluation using a representation without variables or forests is presented in Section 5.2. We previously [Thompson1995] presented results demonstrating learning representations of a different form, that of a case-role representation [Fillmore1968] augmented with Conceptual Dependency [Schank1975] information. This last representation conforms directly to our problem definition. Now, continuing with the example of solving the Lexicon Acquisition Problem for this corpus, let us also assume for simplification, although not required, that sentences are stripped of phrases that we know have empty meanings (e.g., ``qué'', ``es'', ``con'', and ``la''). We will similarly assume that it is known that some phrases refer directly to given database constants (e.g., location names), and remove those phrases and their meaning from the training input.

Next: Candidate Generation Phase Up: The WOLFIE Algorithm and Previous: Solving the Lexicon Acquisition

Cindi Thompson
2003-01-02