Smoothing


We applied the smoothing method described in Section 4 to both datasets to find out to what extent clustering terms improves the results of the FCA-based approach. As the information measure in this experiment we use the conditional probability, since it performs reasonably well, as shown in Section 6.2. We used the following similarity measures: the cosine measure, the Jaccard coefficient, the L1 norm, and the Jensen-Shannon and Skew divergences (compare [37]). Table 6 shows the impact of this smoothing technique in terms of the number of object/attribute pairs added to the dataset. The Skew divergence is excluded because it did not yield any mutually similar terms. Smoothing by mutual similarity based on the cosine measure produces the most previously unseen object/attribute pairs, followed by the Jaccard coefficient, the L1 norm and the Jensen-Shannon divergence, in this order. Table 7 shows the resulting F-measures for the different similarity measures, and the tables in Appendix A list the mutually similar terms for the different domains and similarity measures. The results show that our smoothing technique actually yields worse results than the baseline on both domains and for all similarity measures used.
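To make the procedure concrete, the following is a minimal sketch of smoothing by mutual similarity, not the implementation used in the experiments. Since Section 4 is not reproduced here, it assumes that two terms are mutually similar when each is the other's most similar term, and that smoothing gives every term in such a pair the attributes of its partner. All function names, the toy terms and their weights are hypothetical; the Skew divergence is left out for brevity.

```python
import math

def cosine(p, q):
    """Cosine similarity between two sparse attribute vectors (dicts)."""
    dot = sum(p[a] * q[a] for a in p.keys() & q.keys())
    norm = (math.sqrt(sum(v * v for v in p.values()))
            * math.sqrt(sum(v * v for v in q.values())))
    return dot / norm if norm else 0.0

def jaccard(p, q):
    """Jaccard coefficient on the sets of attributes present in each vector."""
    a, b = set(p), set(q)
    return len(a & b) / len(a | b) if a | b else 0.0

def l1_distance(p, q):
    """L1 norm of the difference; a distance, so smaller means more similar."""
    return sum(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in p.keys() | q.keys())

def jensen_shannon(p, q):
    """Jensen-Shannon divergence; a distance, so smaller means more similar."""
    m = {a: 0.5 * (p.get(a, 0.0) + q.get(a, 0.0)) for a in p.keys() | q.keys()}
    def kl(r):
        return sum(v * math.log(v / m[a]) for a, v in r.items() if v > 0)
    return 0.5 * kl(p) + 0.5 * kl(q)

def mutually_similar(vectors, score):
    """Pairs of terms that are each other's best match under `score`
    (assumed definition; higher score means more similar)."""
    nearest = {
        t: max((u for u in vectors if u != t),
               key=lambda u: score(vectors[t], vectors[u]))
        for t in vectors
    }
    return {frozenset((t, u)) for t, u in nearest.items() if nearest[u] == t}

def smooth(pairs, clusters):
    """Give every term in a cluster all attributes seen with any cluster member."""
    attrs = {}
    for term, attr in pairs:
        attrs.setdefault(term, set()).add(attr)
    smoothed = set(pairs)
    for cluster in clusters:
        shared = set().union(*(attrs.get(t, set()) for t in cluster))
        smoothed |= {(t, a) for t in cluster for a in shared}
    return smoothed

# Hypothetical toy data: term -> {attribute: conditional probability}.
vectors = {
    "hotel": {"book": 0.5, "rent": 0.5},
    "apartment": {"book": 0.5, "rent": 0.25, "furnish": 0.25},
    "car": {"rent": 0.2, "drive": 0.8},
    "bus": {"drive": 1.0},
    "bike": {"ride": 0.5, "drive": 0.5},
}
pairs = {(t, a) for t, attrs in vectors.items() for a in attrs}
clusters = mutually_similar(vectors, cosine)
smoothed = smooth(pairs, clusters)
new_pairs = smoothed - pairs
print(sorted(new_pairs))
```

For the two distances, the sign has to be flipped before ranking, for example `mutually_similar(vectors, lambda p, q: -l1_distance(p, q))`. In the toy data, `bike` stays unclustered under the cosine measure: its nearest neighbour is `bus`, but `bus` is closer to `car`, so the relation is not mutual.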

Table 6: Impact of Smoothing Technique in terms of new object/attribute pairs

Domain	Baseline	Jaccard	Cosine	L1	Jensen-Shannon
Tourism	525912	531041 (+5129)	534709 (+8797)	530695 (+4783)	528892 (+2980)
Finance	577607	599691 (+22084)	634954 (+57347)	584821 (+7214)	583526 (+5919)
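The increments in Table 6 are easier to compare once they are expressed relative to the baseline. The short sketch below recomputes the added pairs from the totals in the table and ranks the measures per domain; the variable names are illustrative only.

```python
# Totals copied from Table 6 (number of object/attribute pairs).
baseline = {"Tourism": 525912, "Finance": 577607}
totals = {
    "Tourism": {"Jaccard": 531041, "Cosine": 534709, "L1": 530695, "JS": 528892},
    "Finance": {"Jaccard": 599691, "Cosine": 634954, "L1": 584821, "JS": 583526},
}

# Pairs added by smoothing = smoothed total minus baseline.
added = {d: {m: n - baseline[d] for m, n in row.items()} for d, row in totals.items()}

# Measures ordered by how many new pairs they introduce, most first.
ranking = {d: sorted(row, key=row.get, reverse=True) for d, row in added.items()}

for d in totals:
    for m in ranking[d]:
        share = 100 * added[d][m] / baseline[d]
        print(f"{d:8s} {m:8s} +{added[d][m]:6d} ({share:.2f}% of baseline)")
```

The ranking comes out as cosine, Jaccard, L1, Jensen-Shannon for both domains, matching the order stated in the text.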

Table 7: Results of Smoothing in terms of F-Measure F'

Domain	Baseline	Jaccard	Cosine	L1	Jensen-Shannon
Tourism	44.69%	39.54%	41.81%	41.59%	42.35%
Finance	38.85%	38.63%	36.69%	38.48%	38.66%
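As a quick check of the claim that smoothing lowers the F-measure in every configuration, the differences to the baseline can be computed directly from Table 7. This is only an illustrative sketch; the dictionary layout and names are assumptions.

```python
# F-measure F' in percent, copied from Table 7.
f_measure = {
    "Tourism": {"Baseline": 44.69, "Jaccard": 39.54, "Cosine": 41.81, "L1": 41.59, "JS": 42.35},
    "Finance": {"Baseline": 38.85, "Jaccard": 38.63, "Cosine": 36.69, "L1": 38.48, "JS": 38.66},
}

# Change in percentage points relative to the unsmoothed baseline.
delta = {
    d: {m: round(v - row["Baseline"], 2) for m, v in row.items() if m != "Baseline"}
    for d, row in f_measure.items()
}

# True only if every smoothed configuration scores below its baseline.
all_worse = all(v < 0 for row in delta.values() for v in row.values())

for d, row in delta.items():
    print(d, row)
print("smoothing worse everywhere:", all_worse)
```

The largest drop is for the Jaccard coefficient on the tourism domain (5.15 points), the smallest for the Jensen-Shannon divergence on the finance domain (0.19 points).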

Philipp Cimiano 2005-08-04