VersaBench, June 07 2004
http://cag.csail.mit.edu/versabench
versabench@cag.csail.mit.edu
MIT Computer Science and Artificial Intelligence Laboratory

The VersaBench  suite is intended  to facilitate research  in flexible
architectures. We provide each benchmark  as a C program, although you
may rewrite  the application  in any  other language  (e.g., Fortran,
StreamIt, Verilog, Assembly).  We will post different implementations
of  each  benchmark  as  they become  available.  We  will  also  post
VersaBench  results   for  different  architectures   as  they  become
available.

The VersaBench  suite consists  of fifteen benchmarks,  organized into
five  categories:  desktop  integer, desktop  floating-point,  server,
embedded-streaming, and  embedded-bit-processing.

Please observe the following guidelines when reporting results:

- Benchmarks may be compiled with the best available compiler.

- Benchmarks  may  be  rewritten  in  any  language  (e.g.,  C,  Java,
StreamIt,  Brook,  Verilog)  provided  the  new code  adheres  to  the
original  algorithms. For  example, a  recoding  of bmm  must use  the
blocked  matrix  multiply  algorithm.   The  benchmarks  may  even  be
hand-coded in assembly to suit a particular architecture.

- The Versatility  may be computed using wall  clock times (preferred)
or number of cycles, as long as the method used is clearly specified.
If using a cycle counter, you will find two timing markers
	/*** VERSABENCH START ***/ and
	/*** VERSABENCH END ***/
that respectively indicate where  cycle counting should begin and end.
In the  case of the  SERVER benchmarks, report  the total time  to run
twenty-four (24) instances of the benchmark.

- Real architectures are preferred, but simulators may be used.

- Although the  modeling of real  I/O is encouraged, we  recognize the
difficulty  in  doing  so  in  a  prototype  environment.  We  suggest
initializing a  region of  external DRAM with  I/O data,  and flushing
caches   so   they  are   not   primed   prior   to  the   measurement
process. Simulation  environments often  ignore the cost  of system
calls, treating  them as magical instructions  that atomically update
memory without  polluting the caches. Alternatively,  a deionizer may
be used  to idealize  I/O. We  went to great  lengths to  minimize the
effects of I/O in the VersaBench suite.
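
The timing  and I/O guidelines above  can be sketched in  C as follows.
This is a minimal sketch under stated assumptions, not part of the
suite: run_instance is a hypothetical stand-in for a benchmark kernel,
evict_caches is a best-effort, non-portable way to avoid primed caches
(its 64-byte stride and the buffer size passed to it are assumptions
about the target), and clock_gettime assumes a POSIX system. Only the
two marker comments and the count of 24 instances come from the
guidelines.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

#define SERVER_INSTANCES 24 /* from the SERVER guideline above */

/* Hypothetical stand-in for one benchmark run over preloaded input. */
static unsigned long run_instance(const unsigned char *in, size_t n)
{
    unsigned long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += in[i];
    return sum;
}

/* Best-effort cache eviction (assumption): touch one byte per assumed
 * 64-byte line of a scratch buffer that the caller sizes larger than
 * the caches. Returns the number of lines touched, 0 on failure. */
static unsigned long evict_caches(size_t bytes)
{
    volatile unsigned char *junk = malloc(bytes);
    unsigned long touched = 0;
    if (junk == NULL)
        return 0;
    for (size_t i = 0; i < bytes; i += 64) {
        junk[i] = (unsigned char)i;
        touched++;
    }
    free((void *)junk);
    return touched;
}

/* Wall clock time in seconds from a monotonic POSIX clock. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Time `instances` back-to-back runs and return the total elapsed
 * seconds. The input is assumed to be in memory already, so no I/O
 * happens between the two markers. */
static double time_instances(const unsigned char *in, size_t n,
                             int instances, unsigned long *checksum)
{
    unsigned long sum = 0;
    double start, end;

    assert(instances > 0);
    /*** VERSABENCH START ***/
    start = now_seconds();
    for (int i = 0; i < instances; i++)
        sum += run_instance(in, n);
    end = now_seconds();
    /*** VERSABENCH END ***/

    *checksum = sum; /* keeps the work from being optimized away */
    return end - start;
}
```

A caller would fill the input buffer before the timed region, call
evict_caches with a size larger than the last-level cache, and then
call time_instances with SERVER_INSTANCES for a SERVER benchmark, or
with 1 otherwise.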

For some  benchmarks we provide  multiple inputs; please use  the one
designated as the reference input (ref) for timing measurements.

The  evaluation  process affords  a  lot  of  flexibility in  how  the
benchmarks may be coded and executed. However, when reporting results,
the  details  of the  methodology  that  is  adopted must  be  clearly
described.  Some common  parameters (following  the  guidelines above)
include:

- whether a simulator is used,
- the  language and  compiler used  in the  implementation (and  whether
any hand-coding is done),
- whether wall clock times or cycle counts are measured,
- the clock speeds that are assumed for the architecture,
- whether I/O is accurately simulated or the I/O costs are ignored,
- and the speeds  assumed for caches and external  memory, and whether
the external memory is faithfully modeled.
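
One way to keep these parameters together with a result is a small
record, sketched below in C. The struct, its field names, and the
output format are hypothetical and are not a required submission
format. The helper cycles_to_seconds shows why the assumed clock speed
must be reported when cycles are measured: elapsed time is the cycle
count divided by the clock frequency.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical record of the methodology parameters listed above. */
struct methodology {
    int used_simulator;       /* 1 if a simulator was used */
    const char *language;     /* implementation language */
    const char *compiler;     /* compiler name and version */
    int hand_coded;           /* 1 if any hand-coding was done */
    int wall_clock;           /* 1 = wall clock, 0 = cycle counts */
    double clock_hz;          /* assumed clock speed in Hz */
    int io_simulated;         /* 1 if I/O is accurately simulated */
    const char *memory_notes; /* cache and external memory assumptions */
};

/* Convert a cycle count to seconds at the assumed clock speed. */
static double cycles_to_seconds(unsigned long long cycles, double clock_hz)
{
    assert(clock_hz > 0.0);
    return (double)cycles / clock_hz;
}

/* Write a plain-text summary into buf; returns snprintf's result. */
static int format_report(const struct methodology *m, char *buf, size_t len)
{
    return snprintf(buf, len,
                    "simulator: %s\n"
                    "language: %s\n"
                    "compiler: %s\n"
                    "hand-coded: %s\n"
                    "timing: %s\n"
                    "clock (Hz): %.0f\n"
                    "I/O: %s\n"
                    "memory: %s\n",
                    m->used_simulator ? "yes" : "no",
                    m->language,
                    m->compiler,
                    m->hand_coded ? "yes" : "no",
                    m->wall_clock ? "wall clock" : "cycles",
                    m->clock_hz,
                    m->io_simulated ? "simulated" : "ignored",
                    m->memory_notes);
}
```

The resulting text can accompany the timing numbers so that readers can
tell exactly how they were obtained.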

For questions,  technical support, or to report  results, please write
to versabench@cag.csail.mit.edu.
