Up: Learning Concept Hierarchies from
In this section, we discuss work related to the automatic acquisition of
taxonomies. The main paradigms for learning taxonomic relations in the
literature are, on the one hand, clustering approaches based on the
distributional hypothesis and, on the other hand, approaches based on
matching lexico-syntactic patterns in a corpus that convey a certain
relation.
In one of the first works on clustering terms, nouns are grouped into
classes according to the extent to which they appear in similar verb frames.
In particular, the verbs for which the nouns appear as subjects or objects
are used as contextual attributes. This work also introduces the notion of
reciprocal similarity, which is equivalent to our mutual similarity.
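The idea of reciprocal (mutual) similarity can be illustrated with a minimal sketch. It reads the notion as "two nouns are each other's most similar noun", which is an assumption about the definition. It also uses cosine similarity over verb-frame counts, whereas the work discussed above may use a different similarity measure. The function names, the feature labels such as `drink:obj`, and the toy counts are all hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def most_similar(noun, vectors):
    """Return the noun (other than `noun`) with the highest cosine similarity."""
    others = [n for n in vectors if n != noun]
    return max(others, key=lambda n: cosine(vectors[noun], vectors[n]))


def reciprocal_pairs(vectors):
    """Pairs of nouns that are each other's most similar noun."""
    pairs = set()
    for noun in vectors:
        candidate = most_similar(noun, vectors)
        if most_similar(candidate, vectors) == noun:
            pairs.add(tuple(sorted((noun, candidate))))
    return pairs


# Toy verb-frame counts (hypothetical): how often each noun occurs
# as the object of a given verb.
vectors = {
    "beer": Counter({"drink:obj": 4, "brew:obj": 2}),
    "wine": Counter({"drink:obj": 3, "pour:obj": 1}),
    "car": Counter({"drive:obj": 5, "park:obj": 2}),
    "truck": Counter({"drive:obj": 2, "park:obj": 1, "pour:obj": 1}),
}

print(reciprocal_pairs(vectors))
```

On this toy data, beer and wine form one reciprocal pair and car and truck another, since within each pair both nouns rank the other as their nearest neighbour.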
A top-down clustering approach for building an unlabeled hierarchy of nouns
has also been presented. Its authors report an entropy-based evaluation of
their approach and also show results on a linguistic decision task, namely
which of two given verbs is more likely to take a given noun as object.
Grefenstette has also addressed the automatic construction of thesauri. He
presents results on various domains. He also compares window-based and
syntactic approaches, finding that the results depend on the frequency of
the words in question. In particular, he shows that for frequent words the
syntax-based approaches are better, while for rare words the window-based
approaches are preferable.
Other work also based on the distributional hypothesis presents an
iterative bottom-up clustering approach for nouns appearing in similar
contexts. In each step, the two most similar extents of some argument
position of two verbs are clustered. Interestingly, this yields not only
a concept hierarchy but also ontologically generalized subcategorization
frames for verbs. The method is semi-automatic in that it involves users
in the validation of the clusters created in each step. The authors present
the results of their system in terms of cluster accuracy as a function of
the percentage of the corpus used. A further approach also uses clustering
methods to derive an unlabeled hierarchy of nouns, using data about
conjunctions of nouns and appositions collected from the Wall Street Journal
corpus. Interestingly, in a second step the author also labels the abstract
concepts of the hierarchy by considering the Hearst patterns (see below) in
which the children of the concept in question appear as hyponyms. The
most frequent hypernym is then chosen to label the concept. In a further
step, she also compresses the resulting ontological tree by eliminating
internal nodes without a label. The final tree is then evaluated by
presenting a random selection of clusters and the corresponding hypernyms
to three human judges for validation.
An interesting framework and a corresponding workbench - Mo'K - have also
been presented, allowing users to design conceptual clustering methods to
assist them in an ontology building task. In particular, bottom-up
clustering is used and different similarity measures, among other settings,
are compared.
In earlier work we used collocation statistics to learn relations between
terms using a modification of the association rules extraction algorithm.
However, these relations were not inherently taxonomic, so the work
described in this paper cannot be directly compared to it. Other work
examined different supervised techniques based on collocations to find the
appropriate hypernym for an unknown term, reaching an accuracy of around
15% using a combination of a tree ascending algorithm and
k-Nearest-Neighbors as well as the Skew Divergence as similarity measure.
These results are not comparable to the task at hand either. Recently, an
application of clustering techniques in the biomedical domain has been
presented. The clusters are evaluated by directly comparing them to the
UMLS thesaurus. The results are very low (3-17% precision depending on the
corpus and clustering technique) and comparable to the results we obtained
when comparing our clusters directly with our gold standards, which are not
reported in this paper.
Furthermore, there is a considerable body of work on the use of linguistic
patterns to discover certain ontological relations from text. Hearst's
seminal approach aimed at discovering taxonomic relations from electronic
dictionaries. The precision of the isa-relations learned is 57.55%
when measured against WordNet as gold standard.
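To make the pattern-based idea concrete, the following minimal sketch matches one well-known Hearst pattern, "X such as A, B and C", with a regular expression. It is a simplification under stated assumptions: it works on single-word terms rather than parsed noun phrases, covers only this one pattern, and does no lemmatization. The names `SEP`, `SUCH_AS` and `hearst_such_as` are hypothetical.

```python
import re

# Separators between enumerated hyponyms; ", and " must be tried before ", ".
SEP = r"(?:, and |, or |, | and | or )"

# Simplified "X such as A, B and C" pattern over single-word terms.
# A real system would match noun phrases, not bare words.
SUCH_AS = re.compile(r"(\w+) such as (\w+(?:" + SEP + r"\w+)*)")


def hearst_such_as(text):
    """Return (hyponym, hypernym) pairs suggested by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(text):
        hypernym = match.group(1)
        for hyponym in re.split(SEP, match.group(2)):
            pairs.append((hyponym, hypernym))
    return pairs


example = "He likes fruits such as apples, pears and plums."
print(hearst_such_as(example))
```

Each extracted pair is a candidate isa-relation. Comparing such candidates against a resource like WordNet is what yields a precision figure of the kind reported above.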
Hearst's idea has been reapplied by different researchers, either with
slight variations in the patterns used, in very specific domains, to
acquire knowledge for anaphora resolution, or to discover other kinds of
semantic relations such as part-of relations or causation relations.
The approaches of Hearst and others are characterized by a (relatively)
high precision, in the sense that the quality of the learned relations is
very high. However, these approaches suffer from a very low recall because
the patterns are very rare. As a possible solution to this problem, the
approach of [17,18] combines Hearst patterns matched in a corpus and on the
Web with explicit information derived from other resources and heuristics,
yielding better results on the task of learning superconcept relations than
any single source of evidence.
In general, to overcome such data sparseness problems, researchers are
increasingly resorting to the WWW. In one such approach, Hearst patterns
are searched for on the WWW using the Google API in order to acquire
background knowledge for anaphora resolution. Others download related texts
from the Web to enrich a given ontology. The Google API has also been used
to match Hearst-like patterns on the Web in order to find (i) the best
concept for an unknown instance and (ii) the appropriate superconcept for
a certain concept in a given ontology.
The OntoLearn system discovers (i) the concepts relevant for a certain
domain, i.e. the relevant terminology, (ii) named entities, (iii) 'vertical'
(is-a or taxonomic) relations as well as (iv) certain relations between
concepts based on specific syntactic relations. In this approach a
'vertical' relation is established between a term t1 and a term t2, i.e.
is-a(t1,t2), if t2 can be obtained from t1 by stripping off the latter's
prenominal modifiers such as adjectives or modifying nouns. Thus,
a 'vertical' relation is for example established between the term
international credit card and the term credit card, i.e.
is-a(international credit card,credit card). In a further paper, the main
focus is on the task of word sense disambiguation, i.e. of finding the
correct sense of a word with respect to a general ontology or lexical
database. In particular, the authors present a novel algorithm called SSI
which relies on the structure of the general ontology for this purpose.
Furthermore, they include an explanation component for users consisting of
a gloss generation component which generates definitions for terms that
were found relevant in a certain domain.
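The 'vertical' relation described above, obtained by stripping prenominal modifiers, can be sketched in a few lines. This is an illustration under assumptions, not the OntoLearn implementation: terms are split on whitespace, the shorter term must itself occur in the term list, and each term is linked only to its longest matching suffix. The function name `vertical_relations` is hypothetical.

```python
def vertical_relations(terms):
    """Derive is-a(term, head) pairs by stripping leading modifiers.

    A term is linked to the longest proper suffix (in words) that is
    itself a known term; both restrictions are assumptions of this sketch.
    """
    known = {tuple(term.split()) for term in terms}
    relations = set()
    for words in known:
        # Strip one leading modifier at a time: longest suffix first.
        for start in range(1, len(words)):
            head = words[start:]
            if head in known:
                relations.add((" ".join(words), " ".join(head)))
                break
    return relations


terms = ["international credit card", "credit card", "card", "bank"]
print(vertical_relations(terms))
```

For the example from the text, this yields is-a(international credit card, credit card), and additionally is-a(credit card, card) because "card" is also in the toy term list.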
An interesting approach automatically derives a hierarchy by considering
the documents a certain term appears in as context. In particular, its
authors present a document-based definition of subsumption according to
which a certain term t1 is more special than a term t2 if t2 also appears
in all the documents in which t1 appears.
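The document-based notion of subsumption described above amounts to a subset test on document sets, as the following minimal sketch shows. It assumes whitespace tokenization and single-word terms. It also treats terms with identical document sets as unrelated, which is a choice of this sketch and not something stated in the text. The names `index_terms` and `subsumption_pairs` are hypothetical.

```python
from collections import defaultdict


def index_terms(documents):
    """Map each term to the set of ids of the documents it appears in."""
    docs_by_term = defaultdict(set)
    for doc_id, text in documents.items():
        for term in set(text.lower().split()):
            docs_by_term[term].add(doc_id)
    return docs_by_term


def subsumption_pairs(docs_by_term):
    """Return (special, general) pairs where the general term appears in
    all documents of the special term (proper subset, by assumption)."""
    pairs = set()
    for special, special_docs in docs_by_term.items():
        for general, general_docs in docs_by_term.items():
            if special_docs < general_docs:
                pairs.add((special, general))
    return pairs


# Toy document collection (hypothetical).
documents = {
    1: "hotel room",
    2: "hotel pool",
    3: "hotel room balcony",
}

print(subsumption_pairs(index_terms(documents)))
```

On this toy collection, "hotel" appears wherever "room", "pool" and "balcony" appear, so it ends up as the most general term, and "balcony" is in turn more special than "room".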
Formal Concept Analysis (FCA) can be applied to many tasks within
Natural Language Processing. Several possible applications of FCA have been
mentioned, for example in analyzing linguistic structures, lexical
semantics and lexical tuning. FCA has been applied to yield more concise
lexical inheritance hierarchies with regard to morphological features such
as number, gender etc. It has also been applied to the task of learning
subcategorization frames from corpora.
However, to our knowledge it has not been applied before to the acquisition
of domain concept hierarchies such as in the approach presented in this
paper.