Next: Artificial Data Up: A Database Query Application Previous: A Larger Corpus

LICS versus Fracturing

One component of the algorithm not yet evaluated explicitly is the candidate generation method. As mentioned in Section 4.1, instead of LICS we could generate the candidate meanings for a phrase from fractures of the representations of the sentences in which that phrase appears. We implemented this approach and compared it to the previously described method, which uses the largest isomorphic connected subgraphs (LICS) of sampled pairs of representations as candidate meanings. To make the comparison fairer, we also sampled representations for fracturing, using the same number of source representations as the number of pairs sampled for LICS. The accuracy of CHILL when using the resulting learned lexicons as background knowledge is shown in Figure 13. Using fracturing (fractWOLFIE) shows little or no advantage; none of the differences between the two systems are statistically significant.
  
Figure 13: Fracturing vs. LICS: Accuracy (image: frac-acc0.ps)

In addition, the number of initial candidate lexicon entries from which to choose is much larger for fracturing than for our LICS method, as shown in Figure 14. This is true even though we sampled the same number of representations for fracturing as pairs for LICS, because an arbitrary representation has more fractures than an arbitrary pair of representations has LICS.
  
Figure 14: Fracturing vs. LICS: Number of Candidates (image: frac-ncand.ps)
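The gap in candidate counts can be illustrated with a toy sketch. It is not WOLFIE's implementation and rests on several assumptions: representations are modeled as labeled trees written as nested tuples, a fracture is taken to be any connected subtree, and common subgraphs are found by matching children position by position. The function names and the two example representations are hypothetical.

```python
from itertools import product

# Assumption: a representation is a labeled tree, written as a nested tuple
# (label, child_1, ..., child_k). This is a simplification for illustration.

def nodes(tree):
    """Yield every node (as the subtree rooted there) of `tree`."""
    yield tree
    for child in tree[1:]:
        yield from nodes(child)

def size(tree):
    """Number of nodes in `tree`."""
    return 1 + sum(size(child) for child in tree[1:])

def rooted_subtrees(tree):
    """All connected subtrees that contain the root of `tree`."""
    label, children = tree[0], tree[1:]
    # Each child is either dropped (None) or replaced by one of its own
    # rooted subtrees.
    options = [[None] + rooted_subtrees(child) for child in children]
    return [(label,) + tuple(c for c in choice if c is not None)
            for choice in product(*options)]

def fractures(tree):
    """Assumed definition: every connected subtree of `tree`."""
    return {sub for node in nodes(tree) for sub in rooted_subtrees(node)}

def common_rooted(a, b):
    """Largest common subtree rooted at both `a` and `b`, matching children
    by position; None if the root labels differ."""
    if a[0] != b[0]:
        return None
    kids = [common_rooted(x, y) for x, y in zip(a[1:], b[1:])]
    return (a[0],) + tuple(k for k in kids if k is not None)

def lics(a, b):
    """Largest common connected subtrees of the pair (a simplified LICS)."""
    commons = set()
    for x in nodes(a):
        for y in nodes(b):
            c = common_rooted(x, y)
            if c is not None:
                commons.add(c)
    if not commons:
        return set()
    best = max(size(c) for c in commons)
    return {c for c in commons if size(c) == best}

# Two hypothetical toy representations.
r1 = ("answer", ("capital",), ("loc", ("state",)))
r2 = ("answer", ("population",), ("loc", ("state",)))

print(len(fractures(r1)), "fractures of r1")
print(len(lics(r1, r2)), "LICS candidate(s) for the pair (r1, r2)")
```

On these two four-node trees, fracturing the first representation alone produces 10 candidates, while the pair contributes a single largest common subgraph. Under the assumed definitions, the number of fractures grows quickly with the size and branching of a representation, whereas each sampled pair contributes only its largest common subgraphs.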

Finally, WOLFIE's learning time is greater when using fracturing than when using LICS, as shown in Figure 15, which reports CPU time in seconds.
  
Figure 15: Fracturing vs. LICS: Learning Time (image: fract-time.ps)

In summary, these results show the utility of LICS as a method for generating candidates: a more thorough method does not result in better performance, and it also results in longer learning times. One could claim that we are handicapping fracturing, since we fracture only a sample of the representations. Fracturing all of them might indeed help accuracy, but the learning time and the number of candidates would likely suffer even further. In a domain with larger representations, the differences in learning time would be even more dramatic.
Cindi Thompson
2003-01-02