Problem Based Benchmark Suite

Dictionary (DICT):

This benchmark measures the performance of batch inserting, deleting and searching elements with a dictionary data structure. Each element consists of a key and possibly auxiliary data and the implementation must be based on the following three user supplied functions:

  • A hash function that maps each key to an unsigned integer,
  • a comparison function on the keys defining a total order, and
  • a comparison function on the auxiliary data defining a partial order.
  • The comparison function on the auxiliary data is used to decide which element to insert when multiple elements have the same key--a maximal value with respect to the partial order must be inserted. The specific benchmark batch inserts a sequence of values, then batch deletes another sequence of values, and finally batch searches a sequence of values. All three sequences can have repeated keys. The code must not take advantage of the specific key and auxiliary types beyond the hash and comparison function.

    Input and Output File Formats

    The input and output should be in the sequence file format. For testing the input sequence needs to be partitioned into three equal length parts, the first used for insertion, the second for deletions (after the insertions), and the third for searches (after the insertions and deletions). If the input length is not divisible by 3, then the number of insertions and deletions need to be of length ⌊n/3⌋ and the remaining elements are searches. Note that if the elements consist of both a key and auxiliary data, then the auxiliary data for the deletions and for the searches is ignored. The output sequence consists of the results of all successful searches, including both the key and the auxiliary data. The output sequence should be in the same order as the searches in the input sequence and any repeated successful searches should be included at each occurrence.

    Default Input Distributions

    Each distribution should be run for n=10,000,000. The weights used for average time are given in parentheses.
    last modified 15:18, 05 Jun 2012

    This project has been funded by the following sources:
    Intel Labs Academic Research Office for the Parallel Algorithms for Non-Numeric Computing Program,
    National Science Foundation, and
    IBM Research.