CS294-3: Algorithms in the Real World (Guy Blelloch, Fall 97)

next up
Topic 1: Compression

[ Topics | Scribe Notes | Readings | Text Books | Links ]


Topic Outline

  • Introduction
  • Entropy and basics of information theory
  • Lossless compression
  • Huffman and Shannon-Fano Coding
  • Arithmetic Coding
  • LZ78 and LZW (GIF, Unix Compress), and LZ77 (gzip, zip lrj, zoo)
  • DMC (Dynamic Markov Coding) and PPM (Prediction by Partial Matching)
  • Context coding (JBIG) and residual compression (lossles JPEG)
  • Lossy compression
  • Scalar and vector quantization
  • Transform coding
  • Block Cosine (JPEG and MPEG)
  • Wavelets
  • Fractal coding
  • Model-based coding
  • Scribe Notes

  • Compression 1 (Ben Zhao)
  • Compression 2 (Gabriel Moy)
  • Compression 3 (draft) (Ben Liblit)
  • Readings

  • Introduction to data compression by Peter Gutmann. Overview of Huffman, Shannon-Fano, Arithmetic, LZ78, LZW, and LZ77 coding techniques.
  • LZW explained by Steve Blackstock. A nice 200 line description of Lempel-Ziv Welch compression along with the particular implementation used in the GIF standard.
  • Image Compression Techniques.
    This has little technical content, but gives a broad background of the various techniques.
  • Video Compression. This has more technical content, but basically only describes JPEG and MPEG.
  • Benchmarks

  • The Archive Comparison Test (ACT) is an exelent collection of up-to-date comparisons of many compression algorithms with both compression ratios, and run times. The overall winning implementations tend to be based on the the Burrows-Wheeler block sorting algorithm (e.g. BZIP), variants of PPM (e.g. BOA), or ACB.
  • A comparison of various compression codes for text and black and white images. Shows progression of the state-of-the-art over the years. The data is somewhat out of date (e.g. the best bpc for the Calgary Corpus is now around 2).
  • Recommended Text Books

    My favorites
  • David Salomon. Data Compression: The Complete Reference. Springer Verlag, 1998.
    Goes through a wide variety of topics and specific "real world" algorithms. The theory is not as strong as Sayood's book (below), and the algorithms are sometimes not described in enough depth to implement them.
  • Khalid Sayood. Introduction to Data Compression. Morgan Kaufmann, 1996.
    Looks at both Theoretical and practical aspects of data compression. Discusses a wide range of lossless and lossy compression methods, including fractals, wavelets, and subband coding.
  • Others (Alphabetical order)
  • Timothy C. Bell, John G. Cleary, and Ian H. Witten. Text compression. Prentice Hall, 1990.
  • Darrel Hankersson, Greg A. Harris, and Peter D. Johnson Jr.. Introduction to Information Theory and Data Compression. CRC Press, 1997.
  • Jerry Gibson, Toby Berger, Tom Lookabaugh, Rich Baker and David Lindbergh. Digital Compression for Multimedia: Principles & Standards. Morgan Kaufmann, 1998.
  • Gilbert Held and Thomas R. Marshall. Data and Image Compression: Tools and Techniques. Wiley 1996 (4th ed.)
    This book is quite basic and does not cover many important topics. On the other hand, it includes source code and a detailed description of most of the basic algorithms.
  • Mark Nelson. The Data Compression Book. M&T Books, 1995.
  • James A. Storer (Editor). Image and Text Compression. Kluwer, 1992.
    This a collection of articles on a broad set of topics.
  • Ian H. Witten, Alistair Moffat, and Timothy C. Bell. Managing Gigabytes: Compressing and Indexing Documents and Images. Van Nostrand Reinhold, 1994.
  • Further Readings and Links

  • General links
  • comp.compression faq. This is surely the best source of information on compression available on the web.
  • Compression Pointers (comprehensive, but badly organized).
  • An online overview of lossless compression techniques.
  • Data compression links.
  • Yet more links
  • Entropy and Information Theory
  • Entropy on the World Wide Web
  • LZ78 and LZW
  • Terry A. Welch. A Technique for High Performance Data Compression. IEEE Computer, Vol. 17, No. 6, 1984, pp. 8-19.
  • The GIF format, which uses LZW, is described in GIF87(5) - GIF 87 and GIF89a(5) - GIF 89a standards.
  • The GIF Licensing Controversy. The GIF format is patented by ComputServe and the LZW algorithm used by the format is patented by Unisys.
  • LZ77 (sliding window compression)
  • E. Fiala and D. Greene. Data compression with finite windows (abstract).
  • A.D. Wyner and J. Ziv. The sliding-window Lempel-Ziv algorithm is asymptotically optimal. (abstract).
  • The PPM algorithm.
  • A. Moffat. Implementing the PPM data compression scheme. (abstract).
  • Bill Teahan's home page has pointers to several papers on PPM.
  • The Burrows-Wheeler algorithm
  • The Burrows-Wheeler block sorting algorithm off of the compression faq.
  • BZIP, an implementation that uses Burrows-Wheeler.
  • JPEG and MPEG.
  • JPEG FAQ
  • MPEG Starting Points and FAQs.
  • A nice MPEG FAQ.
  • Audio Compression
  • MPEG Audio Layer-3 overview, FAQ, and general resources
  • Another FAQ for MPEG Audio
  • Audio Compression References
  • Information on lossless audio compression.
  • Fractal compression
  • Introduction to fractal compression off of the compression faq.
  • What is the state of fractal compression?.
  • Links to various resources organized by Yuval Fisher. He also has a book on the topic.
  • Wavelet compression
  • Amara's Wavelet Page. This page is comprehensive including an online Introduction to Wavelets.
  • Summus. A company that sells products based on wavelet compression and has a nice short overview of Wavelet theory off of their home page.
  • A nice example of the degragation of images as compression is increased.
  • Some cute wavelet applets
  • An excellent collection of papers on wavelets from MathSoft.

  • Back to the Algorithms in the Real World topics page.
    Guy Blelloch, guyb@cs.berkeley.edu.