As mentioned above, we evaluate our approach on two domains: tourism and finance.
The ontology for the tourism domain is the reference ontology of the
comparison study presented in [40], which was modeled by an experienced
ontology engineer. The finance ontology is essentially the one
developed within the GETESS project [57]. It was designed for
analyzing German texts on the Web, although English labels
are available for many of its concepts. In addition, we manually added
English labels for those concepts whose German label has an English
counterpart, so that most of the concepts (95%) ultimately have an English label as well.⁸ The tourism ontology consists of 293 concepts, whereas
the finance ontology is considerably larger, with a total of 1223 concepts.⁹ Table 2 summarizes some properties of the
concept hierarchies of both ontologies: the total number of concepts,
the number of leaf concepts, the average and maximal length of the paths from a leaf to the root node, and the average and maximal number of children per concept (counting only non-leaf concepts).
Table 2: Ontology statistics

                   Tourism   Finance
No. Concepts           293      1223
No. Leaves             236       861
Avg. Depth            3.99      4.57
Max. Depth               6        13
Max. Children           21        33
Avg. Children         5.26       3.5
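The statistics in Table 2 can be computed directly from a concept hierarchy. The following is a minimal sketch, not the procedure used to produce the table. It assumes that the hierarchy is given as a hypothetical mapping `parents` from each concept to its direct superconcepts, with the root mapping to an empty list. It also assumes that depth is measured in edges, that every distinct leaf-to-root path counts separately when a concept has several parents, and that the hierarchy contains at least one edge. The concept names in the toy example are invented for illustration.

```python
from collections import defaultdict
from statistics import mean


def hierarchy_stats(parents):
    """Compute Table-2-style statistics for a concept hierarchy.

    `parents` maps every concept to the list of its direct superconcepts;
    the root maps to an empty list (assumed input format).
    """
    # Invert the parent relation to obtain the children of each concept.
    children = defaultdict(list)
    for concept, sups in parents.items():
        for sup in sups:
            children[sup].append(concept)

    # Leaves are concepts without any children.
    leaves = [c for c in parents if not children[c]]

    def path_lengths(concept):
        # Lengths (in edges) of all paths from `concept` up to the root.
        # With multiple inheritance, each distinct path is counted once.
        if not parents[concept]:
            return [0]
        return [1 + n for sup in parents[concept] for n in path_lengths(sup)]

    depths = [d for leaf in leaves for d in path_lengths(leaf)]
    # Number of children per concept, counting only non-leaf concepts.
    fan_out = [len(children[c]) for c in parents if children[c]]

    return {
        "concepts": len(parents),
        "leaves": len(leaves),
        "avg_depth": mean(depths),
        "max_depth": max(depths),
        "max_children": max(fan_out),
        "avg_children": mean(fan_out),
    }


# Hypothetical toy hierarchy (not taken from either ontology).
toy = {
    "root": [],
    "accommodation": ["root"],
    "hotel": ["accommodation"],
    "campsite": ["accommodation"],
    "event": ["root"],
}
stats = hierarchy_stats(toy)
print(stats)
```

Enumerating all paths explicitly is simple but can grow quickly in a hierarchy with heavy multiple inheritance. For ontologies of the size reported above, such as 1223 concepts, it should remain unproblematic.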
As the domain-specific text collection for the tourism domain, we use texts acquired
from the web sites mentioned above, i.e. http://www.lonelyplanet.com
and http://www.all-in-all.de.
In addition, we use a general corpus, the British National Corpus.¹⁰
Altogether, the corpus comprises over 118 million tokens. For the finance domain,
we use Reuters news articles from 1987, comprising over 185 million tokens.¹¹