Hybrid Implementation of Error Diffusion Dithering

An 8-bit RGB image at various levels of dithering per color channel (from left: no dithering, 2 levels of dithering).



Many image filtering operations provide ample parallelism, but progressive non-linear processing of images is among the hardest to parallelize because of its long, sequential, and non-linear data dependencies. A typical example of such an operation is error diffusion dithering, exemplified by the Floyd-Steinberg algorithm. In this paper, we present its parallelization on multicore CPUs using a block-based approach and on the GPU using a pixel-based approach. We also present a hybrid approach in which the CPU and the GPU operate in parallel during the computation. High Performance Computing has traditionally been associated with high-end CPUs and GPUs. Our focus is on everyday computers such as laptops and desktops, where significant compute power is available on the GPU as well as on the CPU. Our implementation can dither an 8K × 8K image on an off-the-shelf laptop with an Nvidia 8600M GPU in about 400 milliseconds, whereas the sequential implementation on its CPU took about 4 seconds.
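To make the data dependency concrete, here is a minimal sequential sketch of standard Floyd-Steinberg error diffusion on one channel. It is an illustration only, not the paper's CPU or GPU code; the function name `floyd_steinberg`, the list-of-rows image layout, and the `levels` parameter are assumptions made for this example.

```python
def floyd_steinberg(image, levels=2):
    """Dither one channel (list of rows, values 0..255) to `levels` output levels.

    Illustrative sketch only; names and data layout are assumptions.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    # Work on a float copy so the accumulated error is not truncated.
    work = [[float(v) for v in row] for row in image]
    out = [[0] * width for _ in range(height)]
    step = 255.0 / (levels - 1)  # spacing between output levels

    for y in range(height):          # rows, top to bottom
        for x in range(width):       # pixels, left to right
            old = work[y][x]
            # Quantize to the nearest allowed level, clamped to the valid range.
            q = min(max(round(old / step), 0), levels - 1)
            new = q * step
            out[y][x] = int(round(new))
            err = old - new
            # Push the quantization error onto pixels not yet visited,
            # using the standard Floyd-Steinberg weights 7/16, 3/16, 5/16, 1/16.
            if x + 1 < width:
                work[y][x + 1] += err * 7 / 16
            if y + 1 < height:
                if x > 0:
                    work[y + 1][x - 1] += err * 3 / 16
                work[y + 1][x] += err * 5 / 16
                if x + 1 < width:
                    work[y + 1][x + 1] += err * 1 / 16
    return out


if __name__ == "__main__":
    gray = [[128] * 8 for _ in range(8)]
    for row in floyd_steinberg(gray):
        print("".join("#" if v else "." for v in row))
```

In this sketch every pixel receives error from its left neighbour and from three neighbours in the row above, so a pixel cannot be quantized until all of those have been processed. That chain of dependencies is what makes a naive per-pixel parallelization incorrect.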

Accepted in HiPC 2011

Aditya Deshpande, Ishan Misra, and P J Narayanan. Hybrid Implementation of Error Diffusion Dithering, Proceedings of International Conference on High Performance Computing (HiPC), 2011. [PDF] [BibTeX]
