textminer.clustering
Class kMeans

java.lang.Object
  |
  +--textminer.clustering.AbstractClusterer
        |
        +--textminer.clustering.kMeans
All Implemented Interfaces:
InterOutcomes

public final class kMeans
extends AbstractClusterer

The kMeans class is an implementation of k-means clustering.

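k-means clustering is an algorithm for partitioning (or clustering) N data points into k disjoint subsets s_{j}, each containing N_{j} data points, so as to minimize the sum-of-squares error J = \sum_{j=1}^{k} \sum_{n \in s_{j}} |x_{n} - \mu_{j}|^2, where x_{n} is the vector representing the n-th data point and \mu_{j} is the centroid of the data points in s_{j}. It consists of a simple re-estimation procedure. The data points are first assigned at random to the k subsets, and then:

1. the centroid of each subset is computed;
2. every data point is reassigned to the subset whose centroid is closest to it.

These two steps are alternated until a stopping criterion is met, typically when there is no further change in the assignment of the data points.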
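
As a rough illustration of the alternating procedure described above, the following is a minimal, self-contained Java sketch. It is not the source of the kMeans class: the class name KMeansSketch, the method cluster, the seeding of the centroids with the first k points (instead of a random initial assignment), and the sample data are all assumptions made for this example.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of the textminer API.
class KMeansSketch {

    /** Returns, for each point, the index of the cluster it ends up in. */
    static int[] cluster(double[][] points, int k, int maxIterations) {
        int n = points.length;
        if (k < 1 || k > n) {
            throw new IllegalArgumentException("k must be between 1 and the number of points");
        }
        int dim = points[0].length;

        // Assumption: seed the centroids with the first k points so the run is deterministic.
        double[][] centroids = new double[k][];
        for (int j = 0; j < k; j++) {
            centroids[j] = points[j].clone();
        }

        int[] assignment = new int[n];
        Arrays.fill(assignment, -1); // -1 marks "not yet assigned"

        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: move every point to its nearest centroid.
            boolean changed = false;
            for (int i = 0; i < n; i++) {
                int best = 0;
                double bestDist = Double.MAX_VALUE;
                for (int j = 0; j < k; j++) {
                    double dist = squaredDistance(points[i], centroids[j]);
                    if (dist < bestDist) {
                        bestDist = dist;
                        best = j;
                    }
                }
                if (assignment[i] != best) {
                    assignment[i] = best;
                    changed = true;
                }
            }
            // Stopping criterion: no point changed its cluster.
            if (!changed) {
                break;
            }

            // Update step: recompute each centroid as the mean of its points.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < n; i++) {
                counts[assignment[i]]++;
                for (int d = 0; d < dim; d++) {
                    sums[assignment[i]][d] += points[i][d];
                }
            }
            for (int j = 0; j < k; j++) {
                if (counts[j] == 0) {
                    continue; // an empty cluster keeps its previous centroid
                }
                for (int d = 0; d < dim; d++) {
                    centroids[j][d] = sums[j][d] / counts[j];
                }
            }
        }
        return assignment;
    }

    /** Squared Euclidean distance between two vectors of equal length. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }

    public static void main(String[] args) {
        double[][] points = {
            {1.0, 1.0}, {9.0, 9.0}, {1.5, 2.0}, {8.0, 8.5}, {1.0, 0.5}, {9.5, 8.0}
        };
        int[] assignment = cluster(points, 2, 100);
        System.out.println(Arrays.toString(assignment)); // prints [0, 1, 0, 1, 0, 1]
    }
}
```

The iteration cap in the sketch plays the same role as the max_iterations argument of init: it bounds the run even if the assignments never stabilize. How the kMeans class itself seeds its clusters or handles empty clusters is not stated in this documentation.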
For more information, see:

Since:
0.1
Version:
TextMiner 1.1
Author:
Young-Woo Seo (ywseo@cs.cmu.edu)

Field Summary
 
Fields inherited from class textminer.clustering.AbstractClusterer
data_repository, dataset_dir, docvec_repository, num_instances, result_repository, task_alias, task_name, vectorindex
 
Fields inherited from interface textminer.core.InterOutcomes
ext_condensed_index_file, ext_corpus_stat_file, ext_dvec_file, ext_fsmethod_file, ext_index_file, ext_judgment_file, ext_lexicon_file, ext_matrix_file, ext_model_file, ext_output_file, ext_result_file, ext_termdic_file, ext_vec_index_file
 
Constructor Summary
kMeans(SubtaskClustering subtasks)
          Constructs a kMeans clusterer from the given subtask specification
 
Method Summary
 void doClustering()
          Perform k-means clustering
 void init(int k_size, int size_of_vector, int max_iterations)
          Initialize the clusterer with the number of clusters, the vector size, and the iteration limit
 void make_results(java.lang.String filename)
          Write the clustering results to the specified file
 void print_clusters()
          Print the clustering results
 
Methods inherited from class textminer.clustering.AbstractClusterer
init_clusterer, log
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

kMeans

public kMeans(SubtaskClustering subtasks)
Constructs a kMeans clusterer from the given subtask specification

Parameters:
subtasks - specification for kMeans clustering
Method Detail

init

public void init(int k_size,
                 int size_of_vector,
                 int max_iterations)
Initialize the clusterer with the number of clusters, the vector size, and the iteration limit

Parameters:
k_size - desired number of clusters
size_of_vector - number of components in a document vector
max_iterations - maximum number of iterations

doClustering

public void doClustering()
Perform k-means clustering

Specified by:
doClustering in class AbstractClusterer

print_clusters

public void print_clusters()
Print the clustering results


make_results

public void make_results(java.lang.String filename)
Write the clustering results to the specified file