Searching for the Limits of Local Error Correction

The Key to Better Algorithms Is Making the Math Work

By Charlotte Hu

Aaron Aupperlee

Senior Director of Media Relations, School of Computer Science
aaupperlee@cmu.edu
412-268-9068

SCS researchers have developed ways to improve error-correction algorithms by breaking apart the math behind them.

Information can be finicky, especially if it has to travel. Whether you're making a phone call over a wireless network, playing music from a CD, or saving a document to a hard drive, when you transform or transmit information from one location to another, it has to go through many channels.

For a wireless call, for example, an information signal in the form of bits — some sequence of zeroes and ones that records everything you say and correlates it with natural language —bounces around your walls and hopefully ends up at your router. Naturally, throughout its journey, it can incur some errors or corruptions from electronic noise or other environmental disturbances. Some zeroes in the sequence could show up as ones, and the original message could become garbled.

Because errors are fairly unavoidable in computing and communications, scientists have long studied strategies to detect and correct them. At Carnegie Mellon University, Pravesh Kothari, formerly an assistant professor in the School of Computer Science and now an assistant professor at Princeton University and adjunct faculty at CMU, worked with Peter Manohar, a Ph.D. student in CMU's Computer Science Department, to develop ways to improve these error-correction algorithms by breaking apart the math behind them. Their most recent paper, "An Exponential Lower Bound for Linear 3-Query Locally Correctable Codes," adds to a growing body of mathematical proofs aimed at fine-tuning error-correction algorithms. Their work could also be used in cryptography, which allows individuals to securely share private information.

Research into error-correcting code stretches back decades. In 1948, MIT professor Claude Shannon came up with a simple trick to reliably communicate through unreliable channels, which improved information transmission rates. The trick? Adding redundancy to the information. In other words, send a bit sequence with cleverly chosen additions so the original message can be pieced together even if corruption occurs.

Error correction takes different forms for different applications. And while these algorithms may vary, the underlying principle is the same. Computation happens on an encoded input — some redundant version of the actual input — to allow for error correction. CD-ROMs contain error-correcting codes, which is why discs can only hold around 700 megabytes of data even though their true capacity is 1,000 megabytes. This discrepancy allows the code to account for excess, redundant information that will allow the CD to play even if scratches on the surface destroy some bits.

The most basic version of error correction is to send each bit multiple times. Let's say you wanted to send 011. You would change the sequence to 000111111 so each block of three bits in this input encodes a single bit that you intended to send. If an error crept into one of the blocks and the message you received is 010111111, you would look at what the majority of the bits in each block say, and use that to suss out the error.

But this is an inefficient scheme, because to correct even a single error you have to blow up the size of the sequence you're sending by three times. Computer scientists — including Kothari and Manohar — have sought ways to make this process more efficient. The challenge is devising a code with the highest level of accuracy and the lowest amount of added redundancy. That boils down to a math problem.

Shannon's original proposal in 1948 suggested that to have a fixed error-correction rate — correcting 1% of errors, for example — the size of the message bits needed to increase by a small multiplicative factor. So the size of the message plus the redundancy scales linearly as message size increases. But to put this idea into an error-correcting code, you need two pieces: an encoding algorithm and a decoding algorithm. The first takes some amount of message bits and stretches them by adding some redundancy. When a message word is run through the encoder, it turns into a code word, which is slightly longer than the message word because it captures some of the redundancy. After the message arrives at its destination, the decoding algorithm takes the possibly corrupted code words and turns them back into the message bits that were originally sent. Reed-Solomon codes, which are used on CDs, apply this basic structure using an algebraic function with some polynomials.

A local decoding algorithm — the type of decoding algorithm Manohar and Kothari focused on in their work — has a special property that allows it to query or examine random bits in the corrupted code word to retrieve the original message bit they code for. In a 3-query locally decodable algorithm, you would examine three random bits in the code word.

"We're not always going to recover the first bit correctly, because you could look at a triple that is corrupted," Manohar said. "But there's some good chance, like a two-thirds chance, that we get it right."

Locally correctable algorithms derive from a similar setup as locally decodable algorithms. They are noted to be stronger than local decoding algorithms because they have to recover a larger chunk of information.

"In local decoding I only want to reliably compute any given bit of the message," Kothari said. "But in local correction, I want to reliably compute any given bit of the code word."

Here's where the math gets interesting. A scheme called the Hadamard code can be used for both local decoding and local correction. It only needs two queries to locally correct, but it exponentially stretches the message bits. Researchers in 2007 and 2009 worked out a way to build a 3-query code of subexponential length — a vastly more efficient encoding than the Hadamard code.

"By making the decoder only a tiny bit more complex, it's making our codes quite a bit more efficient," Kothari said.

But the efficiencies discovered by altering the Hadamard code only translate to 3-query decodable algorithms, not 3-query correctable algorithms. There are no space-saving efficiencies known for the 3-query correctable algorithms.

The new paper that Kothari and Manohar penned mathematically proves that such an improvement is in fact impossible.

"For the first time, we have a provable gap between local decodability and local correctability," Kothari said. "We have a confirmation, a mathematical proof that local correction is indeed a stronger requirement than local decodability. And we have a concrete bound that suggests that the Reed-Muller codes, which are closely related to the Reed-Solomon codes, are the best."

The specific functions that Kothari and Manohar studied are mostly used directly in complexity theory, which calculates limits on the speed of computations. Indirectly, the principles behind 3-query locally decodable and correctable algorithms can be applied to a variety of cryptographic problems such as private information retrieval, which allows the user to download information from a database without the owner of the database knowing what they're looking at; and secret sharing, distributing sensitive, high-stakes information piecemeal across multiple individuals.

Manohar and Kothari will present their work at the Association for Computing Machinery's Symposium on Theory of Computing this June in Vancouver.