Compression speed depends on both the data format and the machine on which the compression runs. The relationship between application performance and host-machine parameters is a research topic in its own right and lies outside the scope of this paper. In our experiments we use the same machine for all compressions and ensure that our application is the only workload, so compression speed can be treated as a function of the compression algorithm. Compression speed is also affected by the compression buffer size; we eliminate this factor by always using the same buffer size of 16 KB.
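As a minimal sketch of this setup (in Python with zlib; the function name, compression level, and timing code are ours for illustration and not part of the paper's tooling), compression can be driven through a fixed 16 KB buffer while the elapsed time is recorded:

```python
import time
import zlib

BUFFER_SIZE = 16 * 1024  # fixed 16 KB compression buffer, as in our experiments

def compress_chunked(data: bytes, level: int = 6):
    """Compress `data` in fixed-size chunks; return (compressed bytes, elapsed seconds)."""
    compressor = zlib.compressobj(level)
    out = bytearray()
    start = time.perf_counter()
    for offset in range(0, len(data), BUFFER_SIZE):
        out.extend(compressor.compress(data[offset:offset + BUFFER_SIZE]))
    out.extend(compressor.flush())
    elapsed = time.perf_counter() - start
    return bytes(out), elapsed
```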
We measure compression speed and compression ratio as follows: we take files of the same data type, compress them with gzip, and record the measured values. By data type we mean the type of data file, for example binary code, PostScript, plain text, or JPEG. Table 1 presents the empirical results obtained with this method. For each data type, we randomly download 100-110 files from the Internet, and for each file we repeat the same compression procedure six times. The standard deviations in Table 1 are quite small, which suggests that file type is a reasonable way to differentiate data for a general-purpose compression algorithm such as gzip. In our experiment, the source data is a large tar archive of a collection of binary files, and we simply use the values from Table 1, indexed by data type (file type).
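A rough sketch of this measurement procedure is given below (Python; the function names, the MB/s unit for speed, and the use of gzip.compress are our assumptions about how such a harness might look, not the paper's actual scripts). It compresses each file six times, records ratio and speed per run, and aggregates the mean and standard deviation across the sampled files of one type:

```python
import gzip
import statistics
import time
from pathlib import Path

REPEATS = 6  # each file is compressed six times, as in the Table 1 experiments

def measure_file(path: Path, level: int = 6):
    """Return (mean compression ratio, mean speed in MB/s) for one file over REPEATS runs."""
    data = path.read_bytes()
    ratios, speeds = [], []
    for _ in range(REPEATS):
        start = time.perf_counter()
        compressed = gzip.compress(data, compresslevel=level)
        elapsed = time.perf_counter() - start
        ratios.append(len(data) / len(compressed))
        speeds.append(len(data) / elapsed / 1e6)
    return statistics.mean(ratios), statistics.mean(speeds)

def measure_type(files):
    """Aggregate ratio statistics over all sampled files of one data type."""
    ratios = [measure_file(p)[0] for p in files]
    return {"mean_ratio": statistics.mean(ratios),
            "stdev_ratio": statistics.stdev(ratios)}
```

With roughly 100 files per type, the per-type mean and standard deviation computed this way correspond to the entries reported in Table 1.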