Results

As mentioned above, we evaluate our approach on two domains: tourism and finance. The ontology for the tourism domain is the reference ontology of the comparison study presented by [40], which was modeled by an experienced ontology engineer. The finance ontology is essentially the one developed within the GETESS project [57]; it was designed for analyzing German texts on the Web, but English labels are also available for many of its concepts. Moreover, we manually added English labels for those concepts whose German label has an English counterpart, with the result that most of the concepts (95%) ended up with an English label.⁸ The tourism domain ontology consists of 293 concepts, while the finance domain ontology is larger, with a total of 1223 concepts.⁹ Table 2 summarizes some facts about the concept hierarchies of the ontologies: the total number of concepts, the total number of leaf concepts, the average and maximal length of the paths from a leaf to the root node, and the average and maximal number of children of a concept (not counting leaf concepts).

Table 2: Ontology statistics

	Tourism	Finance
No. Concepts	293	1223
No. Leaves	236	861
Avg. Depth	3.99	4.57
Max. Depth	6	13
Max. Children	21	33
Avg. Children	5.26	3.5
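
To make the quantities in Table 2 concrete, the following is a minimal sketch of how such statistics could be computed for a concept hierarchy. It is not the code used in the evaluation. It assumes that the hierarchy is a tree given as a child-to-parent mapping, that depth is counted in edges from a leaf to the root, and that the function name `hierarchy_stats` and the toy concepts are purely illustrative; the actual ontologies may allow several parents per concept and may count depth differently.

```python
from statistics import mean


def hierarchy_stats(parent):
    """Compute Table-2-style statistics for a tree-shaped concept hierarchy.

    `parent` maps every concept to its parent; the root maps to None.
    Assumption: each concept has exactly one parent (no multiple inheritance).
    """
    # Invert the child -> parent mapping into parent -> list of children.
    children = {concept: [] for concept in parent}
    for concept, p in parent.items():
        if p is not None:
            children[p].append(concept)

    leaves = [c for c, kids in children.items() if not kids]

    def depth(concept):
        # Number of edges on the path from `concept` up to the root.
        d = 0
        while parent[concept] is not None:
            concept = parent[concept]
            d += 1
        return d

    leaf_depths = [depth(c) for c in leaves]
    # Child counts are taken over non-leaf concepts only, as in Table 2.
    fanouts = [len(kids) for kids in children.values() if kids]

    return {
        "concepts": len(parent),
        "leaves": len(leaves),
        "avg_depth": mean(leaf_depths),
        "max_depth": max(leaf_depths),
        "avg_children": mean(fanouts) if fanouts else 0.0,
        "max_children": max(fanouts) if fanouts else 0,
    }


# Hypothetical toy hierarchy, not taken from either ontology.
toy = {
    "root": None,
    "accommodation": "root",
    "activity": "root",
    "hotel": "accommodation",
    "campsite": "accommodation",
    "city_hotel": "hotel",
    "hiking": "activity",
}
stats = hierarchy_stats(toy)
```

In the toy hierarchy, the leaves are `city_hotel`, `campsite` and `hiking`, and the child counts are averaged over the four concepts that have at least one child.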

As the domain-specific text collection for the tourism domain, we use texts acquired from the above-mentioned web sites, i.e. from http://www.lonelyplanet.com and http://www.all-in-all.de. Furthermore, we also use a general corpus, the British National Corpus¹⁰. Altogether, the corpus comprises over 118 million tokens. For the finance domain we use Reuters news from 1987 with over 185 million tokens¹¹.


Philipp Cimiano 2005-08-04