MATLAB code for methods and experiments described in:

A Very Fast Method for Clustering Big Text Datasets,
Frank Lin and William W. Cohen. ECAI 2010, Lisbon, Portugal. 

Noteworthy functions and scripts:

pic_kernel.m
- A function that runs Power Iteration Clustering (PIC) with the specified similarity kernel

x_randcats_list.m
- A function that generates a random list of datasets, where each dataset has clusters of roughly equal size

x_cluster_pairs.m
- A function that runs clustering experiments using a list of category pairs (or tuples)

x_cluster_ecai2010_all.m
- The script that runs all clustering experiments described in the paper
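
For orientation, here is a minimal sketch of the idea behind running PIC (Power Iteration Clustering) with a similarity kernel. It assumes a cosine-similarity kernel applied implicitly to a document-term matrix F, so the full n-by-n similarity matrix is never built. The function name pic_cosine_sketch, its arguments, and the stopping rule details are hypothetical and chosen for illustration only; they do not reflect the actual interface or implementation of pic_kernel.m.

```matlab
function v = pic_cosine_sketch(F, maxit, tol)
% PIC_COSINE_SKETCH  Illustrative one-dimensional PIC embedding (hypothetical).
%   F     : n-by-m nonnegative (possibly sparse) document-term matrix,
%           one row per document
%   maxit : maximum number of power iterations
%   tol   : threshold on the change in per-element deltas ("acceleration")
%   v     : n-by-1 embedding; cluster it afterwards, e.g. with k-means

    n = size(F, 1);

    % Inverse row norms, so that A = N*F*F'*N holds cosine similarities.
    rownorm = sqrt(full(sum(F.^2, 2)));
    rownorm(rownorm == 0) = 1;              % guard against empty documents
    ninv = 1 ./ rownorm;

    % Implicit product A*x using only matrix-vector products with F.
    Ax = @(x) ninv .* (F * (F' * (ninv .* x)));

    d = Ax(ones(n, 1));                     % row sums of A (degrees)
    d(d == 0) = 1;                          % avoid division by zero

    v = d / sum(d);                         % degree-based starting vector
    delta_prev = inf(n, 1);

    for t = 1:maxit
        v_new = Ax(v) ./ d;                 % one step with W = D^{-1} * A
        v_new = v_new / sum(abs(v_new));    % L1 normalization
        delta = abs(v_new - v);
        v = v_new;
        % Stop early once the deltas themselves have stopped changing.
        if max(abs(delta - delta_prev)) < tol
            break;
        end
        delta_prev = delta;
    end
end
```

A possible call on a toy matrix with two groups of documents is v = pic_cosine_sketch(sparse([1 1 0 0; 1 2 0 0; 0 0 1 1; 0 0 2 1]), 50, 1e-5). The resulting vector v would then be passed to a one-dimensional clustering step such as k-means to obtain cluster labels.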

Contact: Frank Lin (frank@cs.cmu.edu)