Information Measures

As anticipated in Section 4, we also analyze the effect of the different information measures. Table 5 presents the best results for each clustering approach and information measure. For the FCA-based approach, the PMI and Resnik measures yield slightly worse results on the tourism dataset and only marginally better results on the finance dataset. Compared to the FCA-based approach, the other clustering approaches are much more sensitive to the information measure used: their results drop substantially when the Conditional measure is replaced by PMI or Resnik. Overall, the Conditional information measure seems a reasonable choice.
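To make the three measures concrete, the following is a minimal sketch of how such weights could be computed from co-occurrence counts. It assumes the usual formulations: the Conditional measure as P(n|c), PMI as log(P(n|c)/P(n)), and a Resnik-style measure as the selectional preference strength of the context multiplied by P(n|c). The definitions in Section 4 may differ in detail, and the counts, the context names, and the function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy counts of (noun, verb-argument context) co-occurrences.
pairs = Counter({
    ("hotel", "book_obj"): 4,
    ("flight", "book_obj"): 4,
    ("hotel", "rent_obj"): 1,
    ("car", "rent_obj"): 3,
})

total = sum(pairs.values())
noun_freq = Counter()
ctx_freq = Counter()
for (noun, ctx), freq in pairs.items():
    noun_freq[noun] += freq
    ctx_freq[ctx] += freq


def conditional(noun, ctx):
    """Conditional measure, assumed to be P(noun | ctx)."""
    return pairs[(noun, ctx)] / ctx_freq[ctx]


def pmi(noun, ctx):
    """Pointwise mutual information, assumed to be log(P(noun|ctx) / P(noun)).

    Only defined here for pairs that actually co-occur.
    """
    return math.log(conditional(noun, ctx) / (noun_freq[noun] / total))


def resnik(noun, ctx):
    """Resnik-style measure (assumed form): selectional preference
    strength of the context, multiplied by P(noun | ctx)."""
    strength = sum(
        conditional(other, ctx) * pmi(other, ctx)
        for other in noun_freq
        if pairs[(other, ctx)] > 0  # skip nouns never seen in this context
    )
    return strength * conditional(noun, ctx)


for measure in (conditional, pmi, resnik):
    print(measure.__name__, round(measure("hotel", "book_obj"), 4))
```

In an experiment of the kind reported here, such a weight would typically be compared against a threshold to decide which noun-context pairs are kept as input to the clustering step; that thresholding is not shown in this sketch.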

Table 5: Comparison of results for different information measures in terms of F'

                        Conditional   PMI       Resnik
FCA
  Tourism               44.69%        44.51%    43.31%
  Finance               38.85%        38.96%    38.87%
Complete Linkage
  Tourism               36.85%        27.56%    23.52%
  Finance               33.35%        22.29%    22.96%
Average Linkage
  Tourism               36.55%        26.90%    23.93%
  Finance               32.92%        23.78%    23.26%
Single Linkage
  Tourism               38.57%        30.73%    28.63%
  Finance               32.15%        25.47%    23.46%
Bi-Section-KMeans
  Tourism               36.42%        27.32%    29.33%
  Finance               32.77%        26.52%    24.00%
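
The sensitivity claim can be checked with simple arithmetic on the values of Table 5. The sketch below computes, for each approach and dataset, the spread between the best and the worst F' across the three information measures. The spread is only an illustrative statistic, not one used in the evaluation itself, and the group names FCA and Bi-Section-KMeans for the first and last row groups are taken as assumptions.

```python
# F' values from Table 5, in percent, ordered as (Conditional, PMI, Resnik).
results = {
    "FCA": {"Tourism": (44.69, 44.51, 43.31), "Finance": (38.85, 38.96, 38.87)},
    "Complete Linkage": {"Tourism": (36.85, 27.56, 23.52), "Finance": (33.35, 22.29, 22.96)},
    "Average Linkage": {"Tourism": (36.55, 26.90, 23.93), "Finance": (32.92, 23.78, 23.26)},
    "Single Linkage": {"Tourism": (38.57, 30.73, 28.63), "Finance": (32.15, 25.47, 23.46)},
    "Bi-Section-KMeans": {"Tourism": (36.42, 27.32, 29.33), "Finance": (32.77, 26.52, 24.00)},
}


def spread(scores):
    """Difference between best and worst F' across measures, in percentage points."""
    return round(max(scores) - min(scores), 2)


spreads = {
    approach: {dataset: spread(scores) for dataset, scores in by_dataset.items()}
    for approach, by_dataset in results.items()
}

for approach, by_dataset in spreads.items():
    print(f"{approach:18s} Tourism: {by_dataset['Tourism']:5.2f}  Finance: {by_dataset['Finance']:5.2f}")
```

For the first group the spread stays below 1.5 percentage points on both datasets, whereas for every other group it exceeds 8 points, which is consistent with the observation that the other clustering approaches are much more sensitive to the information measure.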

