Next: Conclusion Up: Learning Concept Hierarchies from Previous: Discussion


Related Work

In this section, we discuss work related to the automatic acquisition of taxonomies. The two main paradigms for learning taxonomic relations in the literature are, on the one hand, clustering approaches based on the distributional hypothesis [32] and, on the other hand, approaches based on matching lexico-syntactic patterns that convey a certain relation in a corpus. One of the first works on clustering terms was that of [34], in which nouns are grouped into classes according to the extent to which they appear in similar verb frames. In particular, he uses the verbs for which the nouns appear as subjects or objects as contextual attributes. He also introduces the notion of reciprocal similarity, which is equivalent to our mutual similarity. [46] also present a top-down clustering approach to build an unlabeled hierarchy of nouns. They present an entropy-based evaluation of their approach, but also show results on a linguistic decision task: deciding which of two verbs $v$ and $v'$ is more likely to take a given noun $n$ as object. Grefenstette has also addressed the automatic construction of thesauri [30]. He presents results on various domains and also compares window-based and syntactic approaches, finding that the results depend on the frequency of the words in question. In particular, he shows that syntactic approaches are better for frequent words, while window-based approaches are preferable for rare words [31]. The work of [24] is also based on the distributional hypothesis; they present an iterative bottom-up approach that clusters nouns appearing in similar contexts. In each step, they cluster the two most similar extents of some argument position of two verbs. Interestingly, in this way they yield not only a concept hierarchy but also ontologically generalized subcategorization frames for verbs.
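The verb-frame clustering idea above can be sketched in a few lines. This is a minimal illustration, not the cited authors' exact setup: the (noun, verb) co-occurrence data is invented, and cosine similarity is assumed as the distributional measure. Mutual (reciprocal) similarity then holds when each noun is the other's nearest neighbour.

```python
from collections import Counter
from math import sqrt

# Toy (noun, verb) pairs standing in for subject/object relations
# extracted from a corpus; the data is purely illustrative.
pairs = [
    ("car", "drive"), ("car", "park"), ("car", "buy"),
    ("truck", "drive"), ("truck", "park"), ("truck", "load"),
    ("apple", "eat"), ("apple", "buy"), ("pear", "eat"),
]

# Represent each noun as a vector of verb-context counts.
vectors = {}
for noun, verb in pairs:
    vectors.setdefault(noun, Counter())[verb] += 1

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

def most_similar(noun):
    return max((n for n in vectors if n != noun),
               key=lambda n: cosine(vectors[noun], vectors[n]))

# Mutual (reciprocal) similarity: each noun is the other's
# nearest neighbour in the distributional space.
def mutually_similar(n1, n2):
    return most_similar(n1) == n2 and most_similar(n2) == n1

print(mutually_similar("car", "truck"))  # True: they share verb contexts
```

An agglomerative clusterer would repeatedly merge such mutually similar pairs to build the hierarchy bottom-up.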
Their method is semi-automatic in that it involves users in the validation of the clusters created in each step. The authors report their system's cluster accuracy as a function of the percentage of the corpus used. [8] also uses clustering methods to derive an unlabeled hierarchy of nouns, using data about conjunctions of nouns and appositions collected from the Wall Street Journal corpus. Interestingly, in a second step she also labels the abstract concepts of the hierarchy by considering the Hearst patterns (see below) in which the children of the concept in question appear as hyponyms. The most frequent hypernym is then chosen to label the concept. In a further step she compresses the resulting ontological tree by eliminating internal nodes without a label. The final tree is then evaluated by presenting a random selection of clusters and the corresponding hypernyms to three human judges for validation. [5] present an interesting framework and a corresponding workbench - Mo'K - that allows users to design conceptual clustering methods to assist them in an ontology-building task. In particular, they use bottom-up clustering and compare different similarity measures as well as different pruning parameters. In earlier work, we used collocation statistics to learn relations between terms with a modification of the association-rule extraction algorithm [41]. However, these relations were not inherently taxonomic, so the work described in this paper cannot be directly compared to it. [39] examined different supervised collocation-based techniques to find the appropriate hypernym for an unknown term, reaching an accuracy of around 15% using a combination of a tree-ascending algorithm and $k$-Nearest-Neighbors, with the Skew Divergence as similarity measure. These results are likewise not comparable to the task at hand. Recently, [50] have presented an application of clustering techniques in the biomedical domain.
They evaluate their clusters by comparing them directly to the UMLS thesaurus. Their results are very low (3-17% precision, depending on the corpus and clustering technique) and comparable to the results we obtained when comparing our clusters directly with our gold standards; those results are not reported in this paper, however. Furthermore, there is quite a lot of work on using linguistic patterns to discover certain ontological relations in text. Hearst's seminal approach aimed at discovering taxonomic relations from electronic dictionaries [33]. The precision of the learned isa-relations is $61/106$ (57.55%) when measured against WordNet as gold standard. Hearst's idea has been reapplied by different researchers, with slight variations in the patterns used [36], in very specific domains [2], to acquire knowledge for anaphora resolution [48], or to discover other kinds of semantic relations such as part-of relations [10] or causation relations [27]. The approaches of Hearst and others are characterized by (relatively) high precision, in the sense that the quality of the learned relations is very high. However, they suffer from very low recall, which is due to the fact that the patterns occur only rarely. As a possible solution to this problem, the approach of [17,18] combines Hearst patterns matched in a corpus and on the Web with explicit information derived from other resources and heuristics, yielding better results on the task of learning superconcept relations than considering only one source of evidence. In general, to overcome such data-sparseness problems, researchers are increasingly resorting to the WWW, as for example [44]. In their approach, Hearst patterns are searched for on the WWW using the Google API in order to acquire background knowledge for anaphora resolution. [1] download related texts from the Web to enrich a given ontology.
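To illustrate the pattern-based paradigm just discussed, the following sketch matches two Hearst-style patterns with simplified regular expressions. The sentences and the exact regexes are illustrative assumptions; real implementations operate over part-of-speech-tagged noun phrases rather than bare words.

```python
import re

# Toy sentences containing Hearst-style constructions.
text = ("Vehicles such as cars and trucks require insurance. "
        "Fruits, especially apples, are rich in fibre.")

# Two classic lexico-syntactic patterns; group 1 is the hypernym,
# the remaining groups are hyponyms.
patterns = [
    re.compile(r"(\w+) such as (\w+)(?: and (\w+))?"),
    re.compile(r"(\w+), especially (\w+)"),
]

isa = set()
for pat in patterns:
    for m in pat.finditer(text):
        hypernym = m.group(1)
        for hyponym in m.groups()[1:]:
            if hyponym:
                isa.add((hyponym, hypernym))

print(sorted(isa))
# [('apples', 'Fruits'), ('cars', 'Vehicles'), ('trucks', 'Vehicles')]
```

The precision of such matches is typically high, but, as noted above, recall suffers because these constructions are rare in any single corpus, which motivates matching them on the Web instead.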
[13] as well as [16] have used the Google API to match Hearst-like patterns on the Web in order to (i) find the best concept for an unknown instance and (ii) find the appropriate superconcept for a certain concept in a given ontology [20]. [60] present the OntoLearn system, which discovers (i) the domain concepts relevant for a certain domain, i.e. the relevant terminology, (ii) named entities, (iii) 'vertical' (is-a or taxonomic) relations, as well as (iv) certain relations between concepts based on specific syntactic relations. In their approach, a 'vertical' relation is established between a term $t_1$ and a term $t_2$, i.e. is-a($t_1$,$t_2$), if $t_2$ can be obtained from $t_1$ by stripping off the latter's prenominal modifiers, such as adjectives or modifying nouns. Thus, a 'vertical' relation is established, for example, between the term international credit card and the term credit card, i.e. is-a(international credit card,credit card). In a further paper [61], the main focus is on the task of word sense disambiguation, i.e. finding the correct sense of a word with respect to a general ontology or lexical database. In particular, they present a novel algorithm called SSI that relies on the structure of the general ontology for this purpose. Furthermore, they include an explanation component for users, consisting of a gloss-generation component that generates definitions for terms found relevant in a certain domain. [52] describe an interesting approach to automatically derive a hierarchy by considering the documents a certain term appears in as its context. In particular, they present a document-based definition of subsumption according to which a term $t_1$ is more specific than a term $t_2$ if $t_2$ also appears in all the documents in which $t_1$ appears. Formal Concept Analysis (FCA) can be applied to many tasks within Natural Language Processing.
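The document-based subsumption criterion above reduces to a set-inclusion test over an inverted index. A minimal sketch, with an entirely made-up term-to-documents index:

```python
# Toy inverted index: term -> set of ids of documents containing it.
# The data is invented purely to illustrate the subsumption test.
docs = {
    "credit card":               {1, 2, 3, 4},
    "international credit card": {2, 4},
    "payment":                   {1, 3, 5},
}

# t1 is more specific than t2 if every document containing t1
# also contains t2 (and the two terms differ).
def subsumes(t2, t1):
    return t1 != t2 and docs[t1] <= docs[t2]

print(subsumes("credit card", "international credit card"))  # True
print(subsumes("payment", "credit card"))                    # False
```

In practice the strict inclusion test is usually relaxed to a threshold (e.g. $t_2$ occurs in at least 80% of the documents containing $t_1$) to tolerate noise.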
[49], for example, mentions several possible applications of FCA in analyzing linguistic structures, lexical semantics, and lexical tuning. [56] and [47] apply FCA to obtain more concise lexical inheritance hierarchies with regard to morphological features such as number, gender, etc. [3] apply FCA to the task of learning subcategorization frames from corpora. However, to our knowledge it has not been applied before to the acquisition of domain concept hierarchies, as in the approach presented in this paper.
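For completeness, the core FCA operation underlying such applications can be sketched on a toy formal context of terms (objects) and verb contexts (attributes). The context data and naive enumeration below are illustrative assumptions; practical implementations use dedicated lattice-construction algorithms.

```python
from itertools import combinations

# Tiny formal context: objects (terms) x attributes (verb contexts).
# Illustrative data only.
context = {
    "hotel":     {"bookable", "rentable"},
    "apartment": {"rentable"},
    "car":       {"driveable", "rentable"},
}
attributes = set().union(*context.values())

def extent(attrs):
    """Objects sharing all attributes in attrs."""
    return frozenset(o for o, a in context.items() if attrs <= a)

def intent(objs):
    """Attributes shared by all objects in objs."""
    shared = set(attributes)
    for o in objs:
        shared &= context[o]
    return frozenset(shared)

# A formal concept is a pair (extent, intent) closed under both
# derivation maps; enumerate attribute subsets and close each one.
concepts = set()
for r in range(len(attributes) + 1):
    for attrs in combinations(sorted(attributes), r):
        e = extent(set(attrs))
        concepts.add((e, intent(e)))

for e, i in sorted(concepts, key=lambda c: -len(c[0])):
    print(set(e) or "{}", "->", set(i) or "{}")
```

Ordering these concepts by extent inclusion yields the concept lattice from which a concept hierarchy can be read off, with larger extents (e.g. everything 'rentable') subsuming smaller ones.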
Philipp Cimiano 2005-08-04