## exptSeq Data Generator

`exptSeq -t {int,double} <n> <filename>`

This generator creates a sequence of `n` values, with repeats, drawn from
an exponential distribution, and outputs them in the **sequence file
format**.

In particular, it first generates `n` possible values `v_1`,
`v_2`, ..., `v_n` uniformly at
random from a given range (depending on the type), and then draws each
element of the sequence from among them, picking the `i`-th value with
probability `1/(i ln n)`. The purpose of the distribution is to test
code on inputs with a varying number of duplicates, and with some
values highly duplicated (e.g., approximately a `1/(ln n)` fraction of
the elements will have value `v_1`).
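The sampling step described above can be sketched in Python. This is an illustrative sketch only, not the generator's actual source: the function name `expt_indices` and the `seed` parameter are hypothetical. One assumption is worth flagging: the weights `1/(i ln n)` sum to `H_n / ln n` (where `H_n` is the `n`-th harmonic number), which is slightly above 1, so the sketch normalizes them. Whether and how the real generator normalizes is not stated here.

```python
import math
import random
from collections import Counter

def expt_indices(n, seed=0):
    """Draw n indices from 1..n, where index i has weight 1/(i ln n).

    Hypothetical helper for illustration; requires n >= 2 so that ln n > 0.
    """
    rng = random.Random(seed)
    ln_n = math.log(n)
    # Weights as given in the text. random.choices normalizes them,
    # which is an assumption of this sketch.
    weights = [1.0 / (i * ln_n) for i in range(1, n + 1)]
    return rng.choices(range(1, n + 1), weights=weights, k=n)

n = 100_000
counts = Counter(expt_indices(n))
frac_first = counts[1] / n

print(f"fraction of elements equal to v_1: {frac_first:.4f}")
print(f"1 / ln n:                          {1 / math.log(n):.4f}")
print(f"distinct indices drawn:            {len(counts)}")
```

With normalized weights the expected fraction for `v_1` is `1/H_n`, which is a little below `1/(ln n)` and approaches it as `n` grows, consistent with the "approximately" in the description.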

The generator supports both double-precision floating-point values and
integers. The integer version selects the `n` possible values
uniformly at random from 0 up to the maximum possible value for a
two's-complement 32-bit integer (2,147,483,647). The double-precision
version selects the `n` possible values uniformly at random in
the range [0, 1].
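As a rough illustration of the two value ranges, the candidate values could be produced as follows. The function `candidate_values` and its `kind` and `seed` parameters are hypothetical names chosen for this sketch and do not come from the generator itself. Two details are assumptions: the integer upper bound is treated as inclusive, and Python's `random.random()` returns values in the half-open interval [0, 1), so 1.0 itself is never produced.

```python
import random

INT32_MAX = 2**31 - 1  # 2,147,483,647

def candidate_values(n, kind, seed=0):
    """Return n candidate values chosen uniformly at random.

    kind == "int":    integers in 0..INT32_MAX (inclusive, by assumption)
    kind == "double": floats in [0, 1) (half-open, a property of random.random)
    """
    rng = random.Random(seed)
    if kind == "int":
        return [rng.randint(0, INT32_MAX) for _ in range(n)]
    if kind == "double":
        return [rng.random() for _ in range(n)]
    raise ValueError(f"unsupported type: {kind!r}")

ints = candidate_values(5, "int")
doubles = candidate_values(5, "double")
print(ints)
print(doubles)
```

The `kind` argument mirrors the `-t {int,double}` option shown in the usage line at the top of this page.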

last modified 15:18, 05 Jun 2012

This project has been funded by the following sources:

Intel Labs Academic Research Office for the Parallel Algorithms for Non-Numeric Computing Program,

National Science Foundation, and

IBM Research.