Co-clustering

From ScribbleWiki: Analysis of Social Media


A short summary of the following paper.

Inderjit S. Dhillon, Subramanyam Mallela, Dharmendra S. Modha: Information-theoretic co-clustering, KDD2003

In co-clustering, we are given the contingency table of a bipartite graph along with the desired numbers of row clusters and column clusters; the goal is to find the row clusters and column clusters simultaneously. As the name implies, it is two-sided clustering, which differs from traditional one-sided clustering.

In co-clustering, we treat the rows and columns as two discrete random variables, X and Y, respectively; the contingency table (after proper normalization) is treated as the joint distribution p(X,Y). Simultaneously finding row clusters and column clusters is then equivalent to finding a mapping function C_X from X to \hat{X}, where \hat{X} indexes the row clusters, and a mapping function C_Y from Y to \hat{Y}, where \hat{Y} indexes the column clusters. Notice that both \hat{X} and \hat{Y} are themselves random variables, since they are functions of the random variables X and Y, respectively. So we can compute all desired distributions from the mapping functions (e.g., the joint distribution of \hat{X} and \hat{Y}, the conditional distribution of X given \hat{X}, and so on). To find a good co-clustering, the authors propose maximizing the mutual information between \hat{X} and \hat{Y}. This formulation is equivalent to finding an approximate joint distribution q(X,Y) that minimizes the KL divergence between p(X,Y) and q(X,Y).
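Concretely, the paper shows that the loss in mutual information incurred by a co-clustering equals the KL divergence from p to a particular low-parameter approximation q (a sketch of the identity, in the paper's notation):

```latex
I(X;Y) - I(\hat{X};\hat{Y})
  = D_{\mathrm{KL}}\!\left( p(X,Y) \,\|\, q(X,Y) \right),
\qquad
q(x,y) = p(\hat{x},\hat{y})\, p(x \mid \hat{x})\, p(y \mid \hat{y}),
```

where \hat{x} = C_X(x) and \hat{y} = C_Y(y). Since I(X;Y) is fixed by the data, maximizing I(\hat{X};\hat{Y}) and minimizing this KL divergence are the same problem.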

The problem they define is NP-hard. The authors propose a greedy algorithm that is guaranteed to reach a local minimum. The algorithm alternates between updating the row clusters (with the column clusters fixed) and updating the column clusters (with the row clusters fixed), until a local minimum is reached.
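The alternating procedure can be sketched as follows. This is a minimal NumPy sketch, not the authors' implementation: the function names, the random initialization, and the fixed iteration count are our own assumptions (the paper iterates until the objective stops decreasing). Each row is reassigned to the row cluster whose "prototype" distribution q(Y | x̂) is closest in KL divergence to the row's own conditional p(Y | x); the column step is the same update on the transposed problem.

```python
import numpy as np

EPS = 1e-12  # smoothing to avoid log(0) / division by zero

def _row_step(p, rx, cy, k, l):
    """One greedy row pass: x -> argmin_a KL(p(Y|x) || q(Y|a)), columns fixed."""
    m, n = p.shape
    R = np.equal.outer(np.arange(k), rx).astype(float)   # (k, m) row-cluster indicators
    C = np.equal.outer(np.arange(l), cy).astype(float)   # (l, n) col-cluster indicators
    p_hat = R @ p @ C.T                                  # p(xhat, yhat)
    p_y = p.sum(axis=0)
    py_yhat = (C * p_y) / np.maximum(p_hat.sum(0)[:, None], EPS)   # p(y | yhat)
    pyhat_xhat = p_hat / np.maximum(p_hat.sum(1)[:, None], EPS)    # p(yhat | xhat)
    q = pyhat_xhat @ py_yhat                             # q(y | xhat), shape (k, n)
    py_x = p / np.maximum(p.sum(1)[:, None], EPS)        # p(y | x)
    # argmin of KL(p(Y|x) || q(Y|a)) over a == argmin of the cross-entropy,
    # since the entropy of p(Y|x) does not depend on a
    return (-py_x @ np.log(np.maximum(q, EPS)).T).argmin(axis=1)

def co_cluster(p, k, l, n_iters=10, rx=None, cy=None, seed=0):
    """Alternate row and column updates; p is an (m, n) joint distribution."""
    rng = np.random.default_rng(seed)
    m, n = p.shape
    rx = rng.integers(0, k, m) if rx is None else np.asarray(rx).copy()
    cy = rng.integers(0, l, n) if cy is None else np.asarray(cy).copy()
    for _ in range(n_iters):
        rx = _row_step(p, rx, cy, k, l)       # update row clusters, columns fixed
        cy = _row_step(p.T, cy, rx, l, k)     # symmetric column update on p.T
    return rx, cy
```

Each step can only decrease the objective, which is why the alternation converges to a local minimum rather than the global one.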

Due to the high dimensionality, sparsity, and noise in the data, the authors find that this two-sided clustering usually performs better than one-sided clustering. Another advantage of co-clustering is interpretability: with two-sided clustering, not only do we know how to find a cluster (e.g., CS documents), but we also know why the items form a cluster (e.g., these documents all talk about 'software', 'database', 'machine learning', etc., terms which themselves form a term cluster).
