15-853: Algorithms in the Real World (Guy Blelloch)
Entropy and Conditional Entropy
Huffman and Shannon-Fano Coding
Applications of Probability Coding
PPM (Prediction by partial matching)
The Lempel-Ziv Algorithms
LZ77, LZSS and GZIP
LZ78, LZW, Unix compress, and the GIF format
Other Lossless Algorithms
Scalar and vector quantization
JPEG and MPEG
Introduction to Compression
). A draft of the data compression chapter I'm writing for an eventual book.
Recommended Text Books
Data Compression: The Complete Reference
. Springer Verlag, 1998.
Goes through a wide variety of topics and a huge number of specific "real world" algorithms. The theory is not as strong as Sayood's book (below), and the algorithms are sometimes not described in enough depth to implement them, but the number of algorithms covered is impressive, including Burrows-Wheeler, ABC, and about a dozen variants of Lempel-Ziv.
Introduction to Data Compression, Second Edition
Morgan Kaufmann, 2000.
Looks at both Theoretical and practical aspects of data compression. Discusses a reasonably wide range of lossless and lossy compression methods, including fractals, wavelets, and subband coding. The coverage of the most recent best algorithms for text compression is not as good as Salomon's book (above). For example it does not cover PPM, Burrows-Wheeler, ACB, and some of the variants of LZ77 and LZ78 (e.g. gzip). Source code for many of the algorithms is supposed to be
, but the connection timed out last time I tried.
Timothy C. Bell, John G. Cleary, and Ian H. Witten.
Prentice Hall, 1990.
Darrel Hankersson, Greg A. Harris, and Peter D. Johnson Jr..
Introduction to Information Theory and Data Compression
. CRC Press, 1997.
Jerry Gibson, Toby Berger, Tom Lookabaugh, Rich Baker and David Lindbergh.
Digital Compression for Multimedia: Principles & Standards
. Morgan Kaufmann, 1998.
Gilbert Held and Thomas R. Marshall.
Data and Image Compression: Tools and Techniques
Wiley 1996 (4th ed.)
This book is quite basic and does not cover many important topics. On the other hand, it includes source code and a detailed description of most of the basic algorithms.
The Data Compression Book
M&T Books, 1995.
James A. Storer (Editor).
Image and Text Compression.
This a collection of articles on a broad set of topics.
Ian H. Witten, Alistair Moffat, and Timothy C. Bell.
Managing Gigabytes: Compressing and Indexing Documents and Images (Second Edition).
Van Nostrand Reinhold, 1999.
Archive Comparison Test (ACT)
is an excellent collection of up-to-date comparisons of many compression algorithms with both compression ratios, and run times. The overall winning implementations tend to be based on the the
Burrows-Wheeler block sorting algorithm
), variants of PPM (e.g. BOA), or
of various compression codes for text and black and white images. Shows progression of the state-of-the-art over the years. The data is somewhat out of date (e.g. the best bpc for the Calgary Corpus is now around 2).
Further Readings and Links
Overviews and Surveys
Introduction to data compression
by Peter Gutmann.
Overview of Huffman, Shannon-Fano, Arithmetic, LZ78, LZW, and LZ77 coding techniques.
by Debra Lelewer and Daniel S. Hirshberg.
This is a more detained overview. Originally appeared in
19,3 (1987) 261-297.
Image Compression - from DCT to Wavelets : A Review
by Subhasis Saha.
This has little technical content, but gives a broad background of the various techniques.
. This is surely the best source of information on compression available on the web.
The Data Compression Library
. A nice reference on various topics on compression with many pointers to textbooks, course pages and other links on the topics.
(comprehensive, but badly organized).
Bookmarks on Source Coding/Data Compression
Entropy and Information Theory
Entropy on the World Wide Web
LZ78 and LZW
LZW and GIF explained
by Steve Blackstock. A nice 200 line description of Lempel-Ziv Welch compression along with the particular implementation used in the GIF standard.
Terry A. Welch. A Technique for High Performance Data Compression. IEEE Computer, Vol. 17, No. 6, 1984, pp. 8-19.
The GIF format, which uses LZW, is described in
GIF87(5) - GIF 87
GIF89a(5) - GIF 89a
The GIF Licensing Controversy.
The GIF format is patented by ComputServe and the LZW algorithm used by the format is patented by Unisys.
LZ77 (sliding window compression)
E. Fiala and D. Greene. Data compression with finite windows (
A.D. Wyner and J. Ziv. The sliding-window Lempel-Ziv algorithm is asymptotically optimal. (
The PPM algorithm.
Implementing the PPM data compression scheme
has pointers to several papers on PPM.
The Burrows-Wheeler algorithm
Data Compression with the Burrows-Wheeler Transform
by Mark Nelson. A nice quick overview.
The Burrows-Wheeler block sorting algorithm
off of the compression faq.
, an implementation that uses Burrows-Wheeler.
JPEG and MPEG.
The JPEG2000 page
. This is the new JPEG standard, which uses Wavelet transforms instead of the Cosine transform.
JPEG 2000 Still Image Coding System: An Overview
. An good paper on the topic.
. A nice description of the JPEG and MPEG compression algorithms.
FAQS from faqs.org.
MPEG Starting Points and FAQs
MPEG Audio Layer-3
for MPEG Audio
Audio Compression References
lossless audio compression
Introduction to fractal compression
off of the compression faq.
What is the state of fractal compression?
to various resources organized by Yuval Fisher. He also has a
on the topic.
Amara's Wavelet Page
. This page is comprehensive including an online
Introduction to Wavelets
A company that sells products based on wavelet compression and has a nice short overview of Wavelet theory off of their home page.
Another company that sells products based on wavelet compression, with some nice demos. Also includes a
(published at the PICS conference) describing the techniques used at a high level.
A nice example
of the degradation of images as compression is increased.
collection of papers
on wavelets from MathSoft.
the Algorithms in the Real World page.