./npt npt in <datafilename> [options]

where datafilename is the name of an ASCII file in which 
each datapoint is given on a separate line, and each line
is a series of numbers separated by spaces or commas. 

Optionally, the first row of the datafile may be a set of
non-numeric strings corresponding to attribute names.

So, for example, .csv (comma separated values) files are fine.
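For example, a small three-column data file with the optional header
row might look like this (the column names and values are purely
illustrative):

```
x,y,z
1.00,2.57,0.33
4.21,0.19,3.30
0.74,1.92,2.25
```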

Options are....

ARGV              num_rows    <int> default 999999999 
     If num_rows is less than the number of points in the dataset, npt
     will use a random sample of the dataset containing "num_rows" points.
     Useful if you're doing experiments versus dataset size.
    
ARGV      all_equal_metric   <bool> default     TRUE 
     Do we use the plain, obvious distance metric? TRUE means
     "don't autoscale the dimensions"; FALSE means "rescale everything
     into a unit cube".

ARGV                matcher <matchspec>  default     0.05 
     Specify the n-point predicate to use. The syntax is described
     below.

ARGV        thresh_ntuples <double> default        0
     Skip any n-tuple of kdnodes for which the maximum possible
     number of matching tuples is less than thresh_ntuples. 
     This argument is IGNORED if autofind is set to TRUE.

ARGV                     n    <int> default        2 
     The "n" in "n-point"

ARGV              autofind   <bool> default    FALSE 
     If set to TRUE, make repeated attempts at solving with successively
     smaller thresh_ntuples values, until you find one for which the
     maximum possible error is within fraction "errfrac" of the true
     count.

ARGV               errfrac <double> default     0.05 
     Only relevant if autofind is TRUE. This is "epsilon" in the paper.
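For example (the data filename is hypothetical), an autofind run that
demands the approximate count be within 1% of the true count:

```
./npt npt in galaxies.csv n 3 autofind TRUE errfrac 0.01
```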

ARGV               verbose <double> default        1 
     Set to 0 unless you want an animation (which will slow things down).

ARGV                  rmin    <int> default       20 
     The leaf-list size in the kd-tree. Its general effect is unclear,
     but "the smaller the better, subject to fitting in main memory" is NOT
     the way to go. I suggest not playing with this; 20 is usually fine.

ARGV         min_rel_width <double> default   0.0001 
     kdnodes smaller than this are not split no matter how many points.
     I suggest not playing with this.

ARGV        rdraw <bool> default FALSE
     If set to TRUE, pops up a window and shows a very fast, flickery
     animation of progress during the search.

ARGV               winsize    <int> default      512 
       Pixels along one edge of the popup window (i.e. the window size).

ARGV               binfile    <filename>   default NULL

    (The following courtesy of Nick Konidaris)
    A binfile is a /single line/ that looks like this:
    <field1> <field2> <field3> ... <fieldN>

    Where 
        <fieldi> is a parameter that you would pass to the matcher
        switch.  An example binfile looks like:

        1,2 2,3 3,4 5,6 6,10 10,100

    npt will then iterate through each of these fields in turn, finding
    matches for each.

    AWM Notes: If people start using binfiles a lot, it would be worth making
               some pretty simple algorithmic improvements (described in
               the Gray and Moore paper) to make this go much faster
               by searching all bins at once instead of one at a time.
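For example (filenames hypothetical), a run that sweeps all the bins
in a binfile like the one above:

```
./npt npt in data.csv n 2 binfile mybins.txt
```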

ARGV            rdata <string> default random.csv
     If you want to do an n-point count between two datasets (e.g. data
     and random) then this is the argument with which you can specify the
     random dataset.

ARGV             use_permute <bool> default TRUE
     You should not need to use this.  By setting it to FALSE you can get
     the same behavior that used to appear in the "compound" predicate cases:
     a set of points is counted multiple times if it matches the 
     template in multiple orderings, and individual points can appear in the 
     same tuple multiple times.
     In its default (TRUE) setting, each set of points is counted only 
     once, and a set must consist of unique points.

        *** HORRIBLE WARNING FOR SETTING use_permute TO FALSE ***

          Okay, there's something ugly going on. For efficiency, if
          the code ever sees that the same dataset is used in all components
          of the format, it uses symmetry to avoid wasting time doing
          redundant counting (e.g. in 2-point it doesn't count the same
          pair twice).

          Of course, there's no such symmetry with, say, DR-type
          searching.

         The warning is: MAKE SURE YOU TAKE THIS INTO ACCOUNT IF YOU
         EVER TRY TO DO COMPARISONS OF A DD vs DR vs RR COUNT.


ARGV                format <string> default   (n d's, i.e. "ddd...d" )
     If you are doing 2-point, use format dd for data vs data
                                   format dr for data vs random
                                   format rr for random vs random

     If you are doing 3-point, use (for example) format ddr 
                                    for data vs data vs random, etc.

     Note: the d's must be at the beginning and the r's must be at the end
     of the string.
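For example (filenames hypothetical), a 2-point data-vs-random count,
with the random dataset supplied via the rdata switch:

```
./npt npt in data.csv rdata random.csv n 2 format dr
```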

ARGV                nweights <int> default 0
     The last nweights columns in the data set are assumed to be weightings
     on the data.  If this argument is non-zero, all npoint operations will 
     return both the counts of matching n-tuples and the weighted counts.
     Each tuple matching the template is counted with a weight equal to 
     the product of the weights of the points in the tuple.
     Note: the approximation methods still use the plain counts to guide the
           coarseness of the approximation.
     Caveat: The weighted npoint computation currently only works for n <= 3.
             It can be extended for higher n if necessary.
             Implementation note for future reference:  Some possibilities
             for improving this (ranging from extending the current hack to
             modifying the algorithm) are:
             1. more "corrected counting computation" and caching for higher n.
             2. force recursion until there are no sets of identical leaves
                greater than size 3.
             3. Eliminate current use of upper bounds at every step.
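To illustrate (the filename is hypothetical): with nweights 1, the last
column of the dataset holds per-point weights, and each matching pair
(p_i, p_j) contributes w_i * w_j to the weighted count reported alongside
the plain count.

```
./npt npt in weighted_data.csv n 2 nweights 1
```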

