The cff algorithm runs in two steps. In the first step, a density
estimator selects the high density points in the data. A point is
considered a high density point if the number of datapoints contained in
a hypersphere centered on the point exceeds a threshold. In the second
step, the high density points are clustered using single linkage
clustering. However, the clustering stops once the distance connecting
two points exceeds a user supplied parameter called epsilon.

Note that when running the code, you will have to press a key at several
stages. The graphics display the original dataset, then the high density
datapoints, then the final clusters.

##########################
#  Settable Parameters   #
##########################

The settable parameters include the following:

1. The data file parameter is the file containing the data to be
   clustered. The data attributes in this file can be either comma
   separated values or space separated values. Binary datasets are also
   accepted.

2. The -radius argument specifies the radius of the hypersphere used in
   the density estimation step. Recall that the number of datapoints
   within this hypersphere determines whether a datapoint is high density
   or not.

3. The -threshold parameter determines how many datapoints need to be
   within the hypersphere in step one before a datapoint is considered
   high density. For example, if the threshold value is 10 and there are
   11 datapoints enclosed within the hypersphere centered on the point
   under consideration (call this point P), then P is considered a high
   density point. However, this executable expects the threshold value to
   be normalized to a value between 0 and 1. The normalized value is the
   number of datapoints required to be enclosed in the hypersphere
   divided by the total number of datapoints.
   For instance, if the number of datapoints required to be inside the
   hypersphere is 10, and there are 100 datapoints in the entire data
   set, then the threshold command line value is 0.1.

4. The -epsilon parameter is used in step two to specify the maximum link
   length between two datapoints. If the distance between two datapoints
   exceeds this epsilon value, then they are not connected together.

Other parameters that may be of interest:

min_elements_in_cluster
   Specifies the minimum number of elements that need to be connected
   together before this group is considered a cluster.

outputfile
   The name of the file to save the output to. The output can take
   several forms. See the -outputmode parameter.

outputmode
   There are 2 values for the output mode:
   1. outputmode 0 (Datapoint cluster membership)
      This mode outputs the data point coordinates and the cluster index.
   2. outputmode 1 (Cluster sizes)
      This mode outputs the cluster index followed by the number of
      elements in the cluster.

seed
   This sets the random seed for the algorithm.
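
The two steps and the role of each parameter can be illustrated with a
short Python sketch. This is not the code of the executable, only a
minimal illustration under stated assumptions: the function names are
hypothetical, the hypersphere count includes the point itself, and a
distance exactly equal to the radius or to epsilon counts as inside or
connected. Cutting single linkage clustering at epsilon is the same as
taking the connected components of the graph that links every pair of
points at distance epsilon or less, which is what the second function
computes.

```python
import math
from collections import Counter


def high_density_points(points, radius, threshold):
    """Step 1 (sketch): keep points whose hypersphere holds enough datapoints.

    `threshold` is the normalized value between 0 and 1, as on the command line.
    """
    required = threshold * len(points)  # undo the normalization
    selected = []
    for p in points:
        # Assumption: the count includes p itself and the boundary of the sphere.
        count = sum(1 for q in points if math.dist(p, q) <= radius)
        if count > required:  # the count must exceed the threshold
            selected.append(p)
    return selected


def single_linkage(points, epsilon, min_elements_in_cluster=1):
    """Step 2 (sketch): link points no farther apart than epsilon.

    Returns (point, cluster_index) pairs, similar in spirit to outputmode 0.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= epsilon:
                parent[find(i)] = find(j)  # merge the two groups

    roots = [find(i) for i in range(len(points))]
    sizes = Counter(roots)
    labels = {}
    result = []
    for p, r in zip(points, roots):
        if sizes[r] < min_elements_in_cluster:
            continue  # group too small to be reported as a cluster
        result.append((p, labels.setdefault(r, len(labels))))
    return result


# Toy data: two tight groups of three points and one isolated point.
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
        (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
        (10.0, 10.0)]

# Require more than 2 of the 7 datapoints inside the hypersphere.
dense = high_density_points(data, radius=0.5, threshold=2 / len(data))
clusters = single_linkage(dense, epsilon=0.5)
for point, index in clusters:
    print(point, index)
```

In this toy run the isolated point is dropped in step one because only
one datapoint (itself) lies within its hypersphere, and the remaining
six points form two clusters in step two. Both loops compare every pair
of points, which is fine for illustration but slow on large datasets.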