Ever since the birth of coding theory almost 60 years ago,
researchers have been pursuing the elusive goal of constructing
the "best codes," whose encoding introduces the minimum possible
redundancy for the level of noise they can correct. In this
article, we survey recent progress in *list decoding* that
has led to *efficient* error-correction schemes with an
*optimal* amount of redundancy, even against *worst-case
errors* caused by a potentially malicious channel. To correct
a proportion ρ (say, 20%) of worst-case errors, these codes
only need close to a proportion ρ of redundant symbols;
information-theoretically, the redundancy cannot possibly be any
lower. This new method holds the promise of correcting a
factor of two more errors compared to the conventional algorithms
currently in use in diverse everyday applications.

Coping with corrupt data is an essential part of our modern-day lives, even if most of the time we are blissfully unaware of it. When we watch television, the TV has to deal with signals that get distorted during transmission. While talking on our cellphones, the phone has to work with audio signals that get corrupted in the atmosphere, though we certainly become aware of this when the connection is poor. When we surf the Internet, the TCP/IP protocol has to account for packets that get lost or garbled while being routed. When we watch movies on DVDs, the player has to overcome loss of data due to scratches. Even when we buy our groceries, the scanner has to deal with distorted barcodes on packages.

The key ingredient in coping with errors in these and many
other applications is an *error-correcting code* or just
*code* for brevity. The idea behind codes is conceptually
simple: add redundancy to the information so that even if the
resulting data gets corrupted, e.g., packets get corrupted during
routing or the DVD gets some scratches, the original information
can still be recovered.

Ideally, we would like to add a small amount of redundancy and
at the same time be able to correct many errors. As one might
expect, these are conflicting goals and striking the right
balance is where things get interesting. For example, consider
the code where every information bit is repeated, say, 100 times
(this is known as the repetition code). Intuitively, this code
should do well. In particular, the following is a natural
error-recovery procedure or a *decoding algorithm*: for
every consecutive 100 bits of the data, identify whether the
majority of the bits is 0 or 1, and output the corresponding bit.
Unless we happen to be unlucky, this decoding algorithm can
recover from quite a few errors. The downside is that every 100
bits of data contain only *one* bit of information —
imagine how large a DVD would need to be in order to store a
movie with such redundancy. On the other extreme is the parity
code, which appends the parity of the information bits at the end
of the message. This code uses the minimum amount of redundancy
possible but has poor error-recovery capabilities. Indeed, even
if just two bits get flipped, the corruption can go undetected. For example,
0001 gets encoded as 00011 under the parity code. If the first
two bits get corrupted and we receive 11011, we would
misinterpret the original message to be 1101. Imagine Clark Gable
saying at the end of your small parity encoded DVD for *Gone
with the Wind*, "Frankly, my dear, I don't give a ham!"
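To make the two extremes concrete, here is a minimal Python sketch of both codes. It is our own illustration (the function names and the choice of repetition factor are ours, not from the literature):

```python
def repetition_encode(bits, r=100):
    """Repeat each information bit r times; the rate is only 1/r."""
    return [b for b in bits for _ in range(r)]

def repetition_decode(received, r=100):
    """Majority vote within each block of r received bits."""
    return [int(sum(received[i:i + r]) > r // 2)
            for i in range(0, len(received), r)]

def parity_encode(bits):
    """Append a single parity bit: minimal redundancy, weak protection."""
    return bits + [sum(bits) % 2]

# The repetition code survives quite a few flips but wastes space:
assert repetition_decode(repetition_encode([1, 0], r=5), r=5) == [1, 0]

# The parity code misses an even number of flips: 0001 encodes to 00011,
# and flipping the first two bits gives 11011, the valid encoding of
# 1101 (the article's "Frankly, my dear" scenario).
assert parity_encode([0, 0, 0, 1]) == [0, 0, 0, 1, 1]
assert parity_encode([1, 1, 0, 1]) == [1, 1, 0, 1, 1]
```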

To capture this inherent tension between the redundancy and
the error tolerance of codes, let us define codes and some key
parameters formally. A code *C* is given by an
*encoding* map of the form *C*:
Σ^{k}→ Σ^{n} (for
integers *k < n*) which encodes a sequence of *k*
symbols from Σ into a larger sequence of *n* symbols.
Given a *message m* ∈ Σ^{k},
*C*(*m*) is known as the corresponding *codeword.*
The parameters *k, n*, and Σ are called the
*dimension, block length*, and *alphabet* of *C*,
respectively. We will often use the ratio *R = k/n*, which
is called the *rate* of *C*. Note that *R* exactly
captures the amount of information contained per symbol of a
codeword. *The Hamming distance* between two strings in
Σ^{n} is the number of coordinates where
they differ. The *minimum distance* of a code is defined to
be the smallest Hamming distance between two distinct
codewords.
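These definitions translate directly into code; the following small sketch (the helper names are ours) computes the minimum distance of the 5-bit parity code described above:

```python
from itertools import combinations

def hamming_distance(u, v):
    """Number of coordinates where two equal-length words differ."""
    return sum(a != b for a, b in zip(u, v))

def minimum_distance(codewords):
    """Smallest Hamming distance between two distinct codewords."""
    return min(hamming_distance(u, v) for u, v in combinations(codewords, 2))

# The parity code has k = 4, n = 5, so rate R = 4/5; its minimum
# distance is 2, which is why a single flip is detectable but two
# flips can silently turn one codeword into another.
parity_code = [msg + [sum(msg) % 2]
               for msg in ([int(b) for b in f"{x:04b}"] for x in range(16))]
assert minimum_distance(parity_code) == 2
```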

Thus, our question of interest can be now re-stated as
follows: given a code *C* of rate *R*, what is the
maximum fraction of errors (which we will henceforth refer to as
ρ) that can be tolerated by *C*? Now as every codeword
has *k* symbols of information, it is intuitive that in the
worst case at least *k* symbols of a codeword should be
uncorrupted to have any hope of recovering the original
information. In other words, we can only have ρ≤ (*n -
k*)/*n* = 1 - *R*, irrespective of the computational
power of the decoder.

The main focus of this article is the following question:

*Can we construct a code C of rate R that can be efficiently
decoded from close to a* 1 − *R fraction of errors?*

Quite surprisingly, we will show the answer to be yes. Thus,
the above simple information-theoretic limit can in fact be
approached. In particular, for small rates, we can recover from
situations where almost all of the data can be corrupted. For
example, we will be able to recover even if Clark Gable were to
spout "alhfksa, hy meap xH don'z hive b hayn!" There are in fact
two parts to the above question. First, the code should be such
that the identity of the message is uniquely (or near uniquely)
pinned down based on the noisy version of its encoding. Second,
we need an *efficient* procedure to recover the message
based on the corrupted codeword, with runtime bounded by a
hopefully small polynomial in the block length *n*.
Brute-force approaches such as searching through all codewords
require time exponential in *n*, and are computationally
prohibitive. The main message of this article is that a variant
of the widely deployed Reed-Solomon codes can in fact be
error-corrected *efficiently*, even as the fraction of
errors approaches the absolute 1 - *R* limit.

We stress that in the above question, the noise model is a
*worst-case* one, where the channel is modeled as an
adversary/jammer that can corrupt the codeword in an arbitrary
manner, subject only to a limit on the total number of errors. No
assumptions are made on how the errors are distributed. The
worst-case model was put forth in Hamming's influential
paper.^{12} An alternate approach,
pioneered by Shannon in his seminal
work,^{16} is to model the noisy channel
as a stochastic process whose behavior is governed by a precisely
known probability law. A simple example is the binary symmetric
channel (BSC_{ρ}) where each transmitted bit
gets flipped with probability ρ, independently of all other
bits. For every such channel, Shannon exactly characterized the
largest rate at which reliable communication is possible.
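For instance, the BSC_{ρ} is a two-line simulation (again our own illustrative snippet):

```python
import random

def bsc(bits, rho, seed=None):
    """Binary symmetric channel: flip each bit independently w.p. rho."""
    rng = random.Random(seed)
    return [b ^ (rng.random() < rho) for b in bits]

# A long all-zeros word comes out with roughly a rho fraction of ones.
assert 0.15 < sum(bsc([0] * 10_000, rho=0.2, seed=1)) / 10_000 < 0.25
```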

We note that obtaining *algorithmic* results is more
difficult in the worst-case noise model. In fact, traditionally it
was believed that codes designed for worst-case noise faced a
limit of a (1 - *R*)/2 fraction of errors, which is a factor of 2
off from the information-theoretic limit of a (1 - *R*)
fraction of errors. In attempting to correct *e* >
(*n* - *k*)/2 errors, we face a problem: there may be
more than one codeword within Hamming distance *e* from the
received word. In any code mapping *k* symbols to *n*
symbols, there must be two codewords at distance (*n* -
*k* + 1) or less. If one of these codewords is transmitted
and gets distorted by errors to halfway between the two
codewords, unambiguous recovery of the original message becomes
infeasible. This suggests that beyond a fraction (1 - *R*)/2
of worst-case errors, the original message is unrecoverable. This
was indeed the conventional wisdom in coding theory till the
1990s.

A natural strategy, in the event of multiple close-by
codewords, would be to allow the decoder to output a list of
codewords. This model is called *list decoding*. It was
introduced in the late 1950s by Elias^{2}
and Wozencraft,^{19} and revived with an
algorithmic focus for a cryptographic application by Goldreich
and Levin.^{4} Further, it turns out that
the bad case above is rather pathological: for typical patterns of
*e* errors, for *e* much bigger than (*n* -
*k*)/2, the original codeword will be the only codeword
within distance *e*. Thus, list decoding, in addition to
handling the bad error patterns that foil "unique" decoding, also
allows us to uniquely recover the transmitted codeword for
*most* error patterns.

It is interesting to note that even though the list decoding
problem has a long history in the coding theory
world,^{a} a large share of the
algorithmic progress in list decoding has happened in the
theoretical computer science community. One of the reasons for
this happy coincidence is that list decoding has found numerous
applications in cryptography and computational complexity theory
(e.g., see the discussion on randomness extractors in Section
5).

In particular, in the last decade, the subject of list
decoding has witnessed a lot of activity, culminating in
algorithms that correct close to the information-theoretically
optimal 1 - *R* fraction of errors with rate *R*. The
purpose of this article is to discuss this recent body of results
which deliver the full promise of codes against worst-case
errors. We begin in Section 2 by describing a popular family of
codes and a few decoding algorithms for it.

In 1960, Irving Reed and Gustave Solomon described a
construction of error-correcting codes, which are called
Reed-Solomon codes after them, based on polynomials over finite
fields.^{b} Almost 50 years after their
invention, Reed-Solomon codes (henceforth, RS codes) remain
ubiquitous today in diverse applications ranging from magnetic
recording to UPS bar codes to satellite communications. To
describe the simple and elegant idea behind RS codes, imagine
Alice wishes to communicate a pair of numbers (*a*,
*b*) to Bob. We can think of (*a*, *b*) as
specifying a line in the plane (with *X*, *Y* axes)
with equation *Y* = *aX* + *b*. Clearly, to
specify the line, it suffices to communicate just two points on
the line. To guard against errors, Alice can oversample this line
and send more points to Bob, so that even if a few points are
distorted by errors, the collection of points resembles the
original line more closely than any other line (Figure
1). Thus, Bob can hope to recover the correct line, and in
particular (*a*, *b*).

To encode longer messages consisting of *k* symbols via
an RS code, one thinks of these as the coefficients of a
polynomial *f*(*X*) of degree *k* - 1 in a natural
way, and encodes the message as *n* > *k* points
from the curve *Y* - *f*(*X*) = 0. Equivalently,
the encoding consists of the values of the polynomial *f(X)*
at *n* different points. Formally, if **F** is a finite field
with at least *n* elements, and *S* =
(α_{1}, α_{2}, ..., α_{n}) is a
sequence of *n distinct* elements of **F**, the RS encoding
RS_{**F**,S,n,k}(*m*) of a message *m* =
(*m*_{0}, *m*_{1}, ..., *m*_{k−1}) is given by

RS_{**F**,S,n,k}(*m*) = (*f*(α_{1}), *f*(α_{2}), ..., *f*(α_{n})),

where *f(X)* = *m*_{0} +
*m*_{1}*X* + ... +
*m*_{k−1}*X*^{k−1}. We
stress here that the choice of *S* is up to the code
designer; we will exploit this feature in Section 3.2.
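In code, RS encoding is just polynomial evaluation. Here is a minimal Python sketch over a prime field GF(*p*) (our own illustration; deployed RS codes typically work over fields of characteristic 2):

```python
def rs_encode(msg, alphas, p):
    """Reed-Solomon encoding over GF(p): evaluate the message polynomial
    f(X) = m_0 + m_1*X + ... + m_{k-1}*X^{k-1} at n distinct points."""
    assert len(set(alphas)) == len(alphas), "evaluation points must be distinct"
    def f(x):
        acc = 0
        for c in reversed(msg):          # Horner's rule, high degree first
            acc = (acc * x + c) % p
        return acc
    return [f(a) for a in alphas]

# Encode k = 2 symbols (a line Y = aX + b, as in Figure 1) into n = 7
# symbols over GF(11); any 2 uncorrupted points determine the line.
codeword = rs_encode([3, 2], alphas=list(range(7)), p=11)   # f(X) = 3 + 2X
assert codeword == [(3 + 2 * x) % 11 for x in range(7)]
```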

The following basic algebraic fact will be crucial:

*A nonzero polynomial p(X) of degree D over a field* **F** *can
have at most D distinct roots, i.e., at most D elements* α ∈ **F**
*can satisfy p*(α) = 0.

This fact implies that the encodings of two distinct messages
*m* and m′ by RS_{F,S,n,k} must differ
in more than *n - k* locations.^{c}
The minimum distance of the RS code is thus at least *n - k
+* 1. It is in fact equal to *n - k +* 1: e.g., consider
encodings of the messages corresponding to the zero polynomial
and the polynomial ∏_{i=1}^{k−1}(*X* − α_{i}), whose
evaluations agree in exactly the *k* − 1 positions
α_{1}, ..., α_{k−1}. A minimum distance of *(n - k*
+* 1) is the best possible for a code of dimension *k*,
making RS codes optimal in this regard.

**2.1. Correcting errors in RS codewords**

Why is the above large distance useful? If at most *(n -
k)*/2 errors corrupt the evaluations of a polynomial
*f*(*X*), then the encoding of *f(X)* remains a
better fit to the corrupted data, in terms of agreements, than the
encoding of any other polynomial *g(X)* of degree less than
*k*. Thus, one can hope to recover *f(X)* and the
correct message even in the presence of *(n - k)*/2 errors.
However, it is not immediate how to isolate the errors and
recover *f(X)* *efficiently.* We do not know the
locations of the errors, and trying all possibilities will need
exponential time.

Back in 1960, even before polynomial running time was
formalized as the notion underlying efficient algorithms,
Peterson^{15} described a polynomial
time algorithm to solve the above problem. We now describe the
high-level idea behind a different algorithm, due to Welch and
Berlekamp,^{18} following the elegant
description in Gemmell and Sudan.^{3}

Assume that the encoding
(*f*(α_{1}), ..., *f*(α_{n})) of a polynomial *f(X)* of degree
less than *k* was transmitted, and that we receive a corrupted
version (*y*_{1}, ..., *y*_{n}) in which the set *E* of error
locations, i.e., the positions *i* with *y*_{i} ≠ *f*(α_{i}), has
size at most *(n – k)/*2.

**Error Location via Bivariate Interpolation:** The key is
thus a clever method to locate the set *E* of error
locations quickly. To motivate this, let us cast the problem
geometrically as an equivalent *noisy curve fitting*
problem. We are given *n* points
(α_{i}, *y*_{i}), *i* =
1, 2, ..., *n*, in the "plane" **F** × **F**.
The goal is to find the unique curve with equation *Y –
f(X) =* 0 where degree(*f*) < *k* that passes
through all but *e* ≤ *(n – k)/2* locations
*i*, namely those *in E.* If there was no noise,
fitting a curve through all *n* points is easy—it is
just polynomial interpolation. We know *Y – f(X)*
passes through *n – e* points, so we can get a curve
that passes through all the points by fitting vertical lines
through the error points along with the curve *Y – f(X)
=* 0; see Figure 2. Algebraically, if we
define

*P(X, Y)* = (*Y* − *f(X)*) · ∏_{i∈E}(*X* − α_{i}),   (1)

then the curve *P(X, Y) =* 0 passes through all the
points, i.e., *P*(α_{i}, *y*_{i}) = 0 for *i* = 1, 2, ..., *n*.

Given *P(X, Y)*, one can find its factors (which can be
done efficiently) and thus read off the message polynomial
*f(X)* from the *Y – f(X)* factor. But how do we
find *P(X, Y)?* Finding *P(X, Y)* in its factored form
(1) is begging the question, but note that we can also write
*P(X, Y)* in the form *P(X, Y)* = *D*_{1}(*X*)*Y*
− *N*_{1}(*X*), where degree(*D*_{1})
≤ *e* ≤ *(n – k)/*2 and
degree(*N*_{1}) < *e* + *k* ≤ *(n + k)/*2.

Knowing such a polynomial exists, we can try to find a nonzero
bivariate polynomial *Q(X, Y)* = *D*_{2}(*X*)*Y*
− *N*_{2}(*X*) satisfying:

- degree(*D*_{2}) ≤ *(n - k)/*2 and degree(*N*_{2}) < (*n + k*)/2
- *Q*(α_{i}, *y*_{i}) = 0 for *i =* 1, 2, ..., *n*

This can be done by setting up a system of linear equations
over **F** with unknowns being the coefficients of
*D*_{2}*(X)* and
*N*_{2}*(X)*, and *n* linear constraints
*Q*(α_{i}, *y*_{i}) = 0. Since *P(X, Y)* itself satisfies these
constraints, a nonzero solution is guaranteed to exist, and one
can be found in polynomial time, e.g., by Gaussian elimination.

One can prove, using elementary algebra, that when the number
of errors *e* ≤*(n – k)/*2, *any*
interpolated *Q(X, Y)* satisfying the above two conditions
*must* have *P(X, Y)* as a factor, and be of the form
*P(X, Y) A(X)* for some nonzero (possibly constant)
polynomial *A(X).* The intuitive reason is that since there
are so few errors in the data compared to the curve *Y –
f(X)* = 0, the *curve P(X,Y)* = 0 is the most economical
way to fit all the data points. The formal proof proceeds by
considering the polynomial *R(X)* ^{def}= *Q(X,
f(X))* and arguing it must be identically zero, since (i) it
has at least *(n + k)/*2 roots (namely all the
α_{i}'s for which
*f*(α_{i}) = *y*_{i}), and (ii) its degree is less than
*(n + k)/*2. Since *Q(X, f(X))* ≡ 0, the polynomial *Y* −
*f(X)* divides *Q(X, Y)*, and *f(X)* can be read off from the
factorization of *Q*.
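To make this concrete, here is a self-contained Python sketch of the Welch–Berlekamp decoder over a prime field GF(*p*) (our own illustration, reusing rs_encode from the earlier sketch): it sets up the linear system for *Q(X, Y)* = *D(X)Y* − *N(X)*, finds a nonzero solution by Gaussian elimination, and divides *N* by *D* to recover *f*.

```python
def welch_berlekamp(points, k, p):
    """Recover f(X) with deg(f) < k from n points (alpha_i, y_i) over
    GF(p), p prime, at most (n - k) // 2 of which are in error."""
    e = (len(points) - k) // 2
    # One homogeneous constraint per point:
    #   sum_j D_j x^j y  -  sum_j N_j x^j  =  0   (mod p).
    rows = []
    for x, y in points:
        xp = [pow(x, j, p) for j in range(e + k)]
        rows.append([xp[j] * y % p for j in range(e + 1)] +
                    [-xp[j] % p for j in range(e + k)])
    sol = _kernel_vector(rows, p)
    D, N = sol[:e + 1], sol[e + 1:]
    f, rem = _poly_divmod(N, D, p)
    assert not any(rem), "more than (n - k) / 2 errors"
    return (f + [0] * k)[:k]          # low-order-first coefficients of f

def _kernel_vector(rows, p):
    """Some nonzero v with rows . v = 0 (mod p), via Gauss-Jordan."""
    m, ncols = [r[:] for r in rows], len(rows[0])
    pivots, r = {}, 0
    for c in range(ncols):
        piv = next((i for i in range(r, len(m)) if m[i][c]), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        inv = pow(m[r][c], p - 2, p)              # inverse mod prime p
        m[r] = [v * inv % p for v in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c]:
                fac = m[i][c]
                m[i] = [(vi - fac * vr) % p for vi, vr in zip(m[i], m[r])]
        pivots[c], r = r, r + 1
    free = next((c for c in range(ncols) if c not in pivots), None)
    assert free is not None, "no nonzero solution: too many errors"
    v = [0] * ncols
    v[free] = 1
    for c, row in pivots.items():
        v[c] = -m[row][free] % p
    return v

def _poly_divmod(num, den, p):
    """Polynomial division num / den over GF(p) (low-order-first lists)."""
    num, den = num[:], den[:]
    while den and den[-1] == 0:
        den.pop()                                  # strip leading zeros
    quot = [0] * max(len(num) - len(den) + 1, 1)
    inv = pow(den[-1], p - 2, p)
    for i in range(len(num) - len(den), -1, -1):
        quot[i] = num[i + len(den) - 1] * inv % p
        for j, d in enumerate(den):
            num[i + j] = (num[i + j] - quot[i] * d) % p
    return quot, num                               # num is now the remainder

# Round trip: encode f(X) = 3 + 2X over GF(11) at 7 points, corrupt
# e = (7 - 2) // 2 = 2 of them, and decode.
pts = list(zip(range(7), rs_encode([3, 2], list(range(7)), 11)))
pts[1], pts[4] = (1, 9), (4, 6)
assert welch_berlekamp(pts, k=2, p=11) == [3, 2]
```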

**2.2. List decoding Reed–Solomon codes**

We now turn to list decoding of Reed–Solomon codes. The
setup is as before: given *n* points
(α_{i}, *y*_{i}), find all polynomials *f(X)* of degree less
than *k* whose RS encoding differs from the received word in at
most *e* positions, where *e* may now exceed *(n – k)/*2.

Before delving into the algorithms, we pause to consider how
large a number of errors *e* one can target to correct.
Clearly, we need the guarantee there will be only a few (say, at
most polynomially many in *n*) solution polynomials
*f(X)*, or else there is no hope for the decoder to
output all of them in polynomial time. Using the fact that
encodings of any two polynomials differ in more than *(n
– k)* locations, it can be shown (via the so-called
"Johnson bound") that for *e* ≤ *n* − √(*nk*), the number of
solutions (called the *list size*) is guaranteed to be polynomially
small. Whether one can prove a polynomial list size bound for
certain RS codes for even larger *e* remains a key open
question.

We now describe the idea underlying Sudan's breakthrough
algorithm for list decoding RS
codes.^{17} Recall that we want to solve
a noisy curve fitting problem, and output all curves *Y –
f(X) =* 0 with deg(*f*) < *k* that pass
through *n – e* or more of the *n* points
(α_{i}, *y*_{i}).

For the Berlekamp–Welch algorithm, arguing that *Y
– f(X)* was a factor of *Q(X, Y)* followed
from the very special structure of *Q(X, Y)*. In the list
decoding case, Sudan exploited special properties of
intersections of curves of the form *Y* − *f(X)*
with any interpolated bivariate polynomial *Q(X, Y)* with
appropriate degree constraints. Informally, Sudan's idea is that
given the strong degree constraint on *Q(X, Y)*, every curve
*Y – f(X) =* 0 with deg(*f*) <
*k* that picks up at least *n – e* of the points must
be "used" by the interpolated curve in meeting the requirement to
pass through all *n* points. As an example, in
Figure 3, the goal is to find all lines (i.e., we
have *k =* 2) that pass through all but *e =* 9 of the
*n =* 14 input points (there are two such lines, marked in
the figure *as L*_{1}(*X, Y)* and
*L*_{2}*(X, Y)).* There are enough degrees of
freedom in the equation of a degree 4 curve so that one can fit a
degree 4 curve through any set of 14 points. The figure
illustrates one such curve, which is the product of the two lines
with an "ellipse" *E(X, Y).* (Note that the total degree of
*Q(X, Y)* is 4.) Further, we see that the two relevant lines
pop out as factors. This is not a coincidence, and every degree 4
curve passing through the 14 points must have these two lines as
factors. The reason: if a line is not a factor, then it can
intersect a degree 4 curve in at most 4 points. Since each of
these lines intersects any interpolated curve in at least 5
points, it must be a factor.

More formally, the "degree" measure of the interpolated
polynomial *Q(X, Y)* will be the (1, *k –*
1)-degree, which is defined as the maximum of *i* +
(*k* – 1)*j* over all monomials
*X*^{i}*Y*^{j} that appear in *Q(X, Y)* with a nonzero
coefficient.
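The interpolation step succeeds whenever the number of monomials of (1, *k* − 1)-degree at most *D* exceeds the number of points. A quick mechanical check of the Figure 3 numbers (a small sketch of our own):

```python
def num_monomials(D, k):
    """Count monomials X^i Y^j with (1, k-1)-weighted degree
    i + (k - 1)*j at most D."""
    return sum(D - (k - 1) * j + 1 for j in range(D // (k - 1) + 1))

# Figure 3 setting: k = 2 (lines), n = 14 points, degree D = 4.
# 15 unknown coefficients > 14 homogeneous constraints, so a nonzero
# degree 4 curve Q(X, Y) through all 14 points always exists; any line
# through >= 5 of the points then meets Q = 0 too often (5 > 4) not to
# be a factor.
assert num_monomials(4, 2) == 15
```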

For low rates, this algorithm enables recovery even in
settings when noise overwhelms correct data, and close to 100% of
the symbols may be in error. This feature sets the stage for
several powerful applications in cryptography and complexity
theory. However, the algorithm does not give any improvement over
the (1 – *R*)/2 error fraction corrected by
traditional algorithms for rates > 1/3, and also falls short
of the radius suggested by the combinatorial
bounds.

We now turn to the improved algorithm, due to Guruswami and
Sudan,^{10} correcting a fraction 1 − √*R*
of errors. The key new idea is to
insist that the interpolated polynomial *Q(X, Y)* has
*multiple* zeroes at each of the *n* points. To explain
this, we attempt a high-level geometric description. Consider the
example in Figure 4 with *n =* 10 points,
the goal being to output all lines that pass through at least
*n – e =* 4 points. This example cannot be solved by
Sudan's algorithm. Indeed, since there are five solution lines,
if they are all factors of some interpolated curve, the curve
must have degree at least 5. However, there is no guarantee that
an arbitrary degree 5 curve through the points must have every
line passing through 4 of the points as a factor (the line has to
pass through 6 points to guarantee this). Let *C** be the
degree 5 curve that is the product of the five solution lines. As
mentioned above, if we interpolate a degree 5 curve through the
10 points, we will in general *not* get *C** as the
solution. However, notice a special property of
*C**—it passes through each point *twice;* a
priori there is no reason to expect that an interpolated curve
will have this property. The Guruswami–Sudan idea is to
*insist* that the interpolation stage produces a degree 5
curve with zero of multiplicity at least 2 at each point (i.e.,
the curve intersects each point twice). One can then argue that
each of the five lines must be a factor of the curve. In fact,
this will be the case for degree up to 7. This is because the
number of intersections of each of these lines with the curve,
counting multiplicities, is at least 4 × 2 = 8 which is
greater than the degree of the curve. Finally, one can always fit
a degree 7 curve passing through any 10 points twice (again by a
counting argument). So by insisting on multiplicities in the
interpolation step, one can solve this example.
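The dimension count behind this example can be checked mechanically. In this sketch (our own illustration), a zero of multiplicity *w* at a point imposes one linear constraint per monomial of total degree below *w*:

```python
from math import comb

def coeffs_total_degree(D):
    """Number of coefficients of a bivariate polynomial of total degree D."""
    return comb(D + 2, 2)

def constraints(n_points, w):
    """A zero of multiplicity w at a point forces the coefficients of all
    monomials of total degree < w (there are w*(w+1)/2) to vanish."""
    return n_points * comb(w + 1, 2)

# Figure 4 setting: 10 points, multiplicity w = 2, curve degree 7.
# 36 coefficients > 30 constraints, so the interpolation succeeds; a
# line through 4 points meets the curve 4 * 2 = 8 > 7 times counted
# with multiplicity, so it must be a factor.
assert coeffs_total_degree(7) == 36 and constraints(10, 2) == 30
```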

In general, the interpolation stage of the
Guruswami–Sudan list decoder finds a polynomial *Q(X,
Y)* that has a zero of multiplicity *w*, for some suitable
integer *w*, at each point (α_{i}, *y*_{i}).^{d}

**Soft Decoding:** We now comment on a further crucial
benefit of the multiplicities idea which is relevant to potential
practical applications of list decoding. The multiplicities can
be used to encode the relative importance of different codeword
positions, using a higher multiplicity for symbols whose value we
are more confident about, and a lower multiplicity for the less
reliable symbols that have lower confidence estimates. In
practice, such confidence estimates (called "soft inputs") are
available in abundance at the input to the decoder (e.g., from
demodulation of the analog signal). This has led to a promising
*soft-decision* decoder for RS codes with good coding gains
in practice,^{13} which was adopted in
the Moonbounce program to improve communication between Ham radio
operators who bounce radio signals off the moon to make long
distance contacts.

We now discuss a variant of RS codes called *folded
Reed–Solomon codes* (henceforth folded RS codes), which
will let us approach the optimal error-correction radius of a
fraction 1 – *R* of errors. The codewords in the
folded RS code will be in one-to-one correspondence with RS
codewords. We begin with an informal description. Consider the RS
codeword corresponding to the polynomial (*X*)
that is evaluated at the points *x*_{0},
*x*_{1}, ..., *x _{n}*

We would like to stress a subtle point here: in the
worst-case error model, the "atomic" unit of error is an alphabet
character. This was used crucially in the example above to rule
out an error pattern that was admissible for the smaller
alphabet. For the reader who might be worried that this
constitutes "cheating," e.g., what if one collapses the entire RS
codeword into one large symbol, we offer two counter-points.
First, since we will only use a *constant* folding
parameter, the increase in alphabet size from that of RS codes is
modest. Second, in Section 4, we will see how to convert folded
RS codes into codes over alphabets whose size *does not depend
at all* on the block length, while still maintaining similar
error-correction properties.

We now formally define the folded RS code. Let the nonzero
elements of the field **F** be generated by γ,
i.e., every nonzero element is γ^{i} for
some 0 ≤ *i* ≤ |**F**| − 2, and let *n* ≤ |**F**| − 1 be a
multiple of *m*. The folded RS code with folding parameter *m*
is obtained from the RS code with evaluation points (1, γ,
γ^{2}, ..., γ^{n−1}) by bundling consecutive *m*-tuples of
codeword symbols: for 0 ≤ *j* < *n*/*m*, the *j*th folded symbol
is (*f*(γ^{jm}), *f*(γ^{jm+1}), ..., *f*(γ^{jm+m−1})).

The block length of these codes is *N* =
*n*/*m*. The rate of the code remains
*k*/*n*, since the folding operation does not introduce
any further redundancy.
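The folding operation itself is easy to express in code; a minimal sketch (ours), matching the *m* = 4 picture of Figure 6:

```python
def fold(codeword, m=4):
    """Fold an RS codeword into blocks of m consecutive symbols; each
    block becomes one symbol over the larger alphabet F^m (Figure 6)."""
    assert len(codeword) % m == 0
    return [tuple(codeword[i:i + m]) for i in range(0, len(codeword), m)]

# n = 8 RS symbols become N = n/m = 2 folded symbols; the rate k/n is
# unchanged since no symbols are added or removed.
assert fold([3, 5, 7, 9, 0, 2, 4, 6]) == [(3, 5, 7, 9), (0, 2, 4, 6)]
```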

The folding operation does restrict the error patterns that one needs to worry about. But how can one actually exploit this in a decoding algorithm and manage to correct a larger fraction of errors compared to the unfolded RS codes? We turn to this question next.

**3.1. Multivariate decoding**

Recall the two step Guruswami-Sudan (GS) algorithm. First, we
interpolate a bivariate polynomial *Q*(*X*, *Y*)
through the points (α_{i}, *y*_{i}), with multiplicities. Then,
in the root-finding step, we output all polynomials *f(X)* of
degree less than *k* such that *Y* − *f(X)* is a factor of
*Q*(*X*, *Y*).

Let us now return to the problem of list decoding the folded RS
code with *m =* 4. Given the received word whose *i*th
symbol (for 0 ≤ *i < N*) is the 4-tuple
(*y*_{i,0}, *y*_{i,1}, *y*_{i,2}, *y*_{i,3}), one can view it as
a received word for an *interleaved* RS code of order 4: four
parallel RS received words, one for each position within the
4-tuples, that share the same error locations.

Given the setup above, the first thing to explore is whether
one can generalize the GS algorithm to the setting of interleaved
RS codes. To see one such generalization, note that RS codes are
interleaved RS codes of order 1. The GS algorithm interpolated a
nonzero bivariate polynomial *Q(X, Y)* in this case. Thus,
for an interleaved RS code of order 4, a natural attempt would be
to compute a nonzero 5-variate polynomial *Q(X, Y, Z, U,
W)*, where (as before) *Y* is a placeholder for
*f(X)* and (this part is the generalization) *Z, U*, and
*W* are placeholders for *f*_{1}(*X*), *f*_{2}(*X*), and
*f*_{3}(*X*), the polynomials underlying the other three
interleaved RS codewords.

The generalization above (and similar ideas) were tried out in
a few works, but could not decode beyond the 1 − √*R* radius.
Finally, 7 years after the GS algorithm was published, Parvaresh
and Vardy^{14} had an ingenious idea:
force the polynomials *f*_{1}(*X*),
*f*_{2}(*X*), and
*f*_{3}(*X*) to be related to
*f*(*X*). In particular, they only look at the
subcode of the interleaved RS code where
*f*_{j}(*X*) ≡ *f*(*X*)^{d^{j}} (mod *E*(*X*)) for an
appropriate integer *d* and irreducible polynomial
*E*(*X*).^{e}

Finally, let us return to the problem of list decoding folded
RS codes with *m* = 4. In folded RS codes also, we
have the property that *f*_{j}(*X*) is related to *f*(*X*): one
can take *f*_{j}(*X*) = *f*(γ^{j}*X*), whose evaluations already
sit inside the folded RS codeword, and this relation can be shown
to support the same style of decoding. Thus, the folded RS code
with folding parameter 4 is a special case of the PV code of
order 4.

**3.2. The final piece**

The improvement over Parvaresh-Vardy comes from comparing
apples to oranges. In particular, till now we have seen that the
folded RS code with folding parameter 4 is a special case of the
PV code of order 4. Instead let us compare the folded RS code
with a PV code of *smaller* order, say 2. It turns out that
the folded RS codes with folding parameter 4 are compressed forms
of certain specific PV codes of order 2, and we will exploit this
observation. In particular, as in Figure 7,
compare the folded RS codeword with the PV codeword of order 2 in
which the polynomial *f*(*X*) is evaluated at the
points {1, γ, ..., γ^{n−1}} \ {γ^{3},
γ^{7}, ..., γ^{n−1}}. Every folded RS symbol
(*f*(γ^{4i}), ..., *f*(γ^{4i+3})) determines three consecutive
PV symbols, namely (*f*(γ^{4i}), *f*(γ^{4i+1})),
(*f*(γ^{4i+1}), *f*(γ^{4i+2})), and (*f*(γ^{4i+2}),
*f*(γ^{4i+3})), so the folded RS codeword carries the same
information in two-thirds of the space, which translates into a
better rate for the same error-correction radius.

Thus, our list decoding algorithm for folded RS codes with folding
parameter *m* can be modularly defined as follows: unfold
the received word for the appropriate PV code of order
*s* ≤ *m* and then run the Parvaresh–Vardy list decoder
on this unfolded received word. It turns out that this list
decoder can correct such a folded RS code of rate *R* from up to
roughly a 1 − (*mR*/(*m* − *s* + 1))^{s/(s+1)} fraction of
errors. By picking *m* to be
(somewhat) larger than *s* and picking *s* to be
sufficiently large (in terms of 1/ε), we can conclude the
following result.

Theorem 1. *For every ε >* 0 *and* 0 <
*R <* 1, *there is a family of folded RS codes that
have rate at least R and which can be list decoded up to a
fraction* 1 - *R - ε of errors in time*
(*N*/ε^{2})^{O(1/ε)}, *where N is the block length of the
code.*

We note that the time complexity has an undesirable dependence
on *ε*, with 1/*ε* in the exponent.
Improving this bound remains a challenging open question.
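To get a feel for the gap this result closes, the three trade-offs plotted in Figure 5 can be tabulated directly (a small illustration of our own):

```python
def unique_radius(R):   return (1 - R) / 2      # traditional unique decoding
def gs_radius(R):       return 1 - R ** 0.5     # Guruswami-Sudan list decoding
def capacity_radius(R): return 1 - R            # folded RS codes (Theorem 1)

# At rate R = 1/4: unique decoding stops at 37.5% errors, the GS
# decoder reaches 50%, and folded RS codes approach 75%.
for radius in (unique_radius, gs_radius, capacity_radius):
    print(f"{radius.__name__}: {radius(0.25):.3f}")
```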

So far we have discussed codes over large alphabets. For
example, folded RS codes of rate *R* that can be list
decoded from a 1 − *R* − ε
fraction of errors need an alphabet size of roughly
*N*^{O(1/ε^{2})}, where *N* is the block length; the alphabet
thus grows with the length of the code. We now discuss how to
obtain similar trade-offs over much smaller alphabets.
We start with perhaps the most natural small alphabet: {0, 1}.
For codes defined over this alphabet (also called *binary*
codes), it turns out that to list decode from a ρ fraction of
errors the best possible rate is 1 −
*H*(ρ), where *H*(*x*) = −
*x* log_{2} *x* − (1 − *x*)
log_{2}(1 − *x*) is the binary entropy function. Two
remarks are in order. First, the rate 1 − *H*(ρ)
is much smaller than the rate of 1 − ρ that folded RS
codes can achieve. (It turns out that to attain a rate of 1
− ρ − ε, the alphabet size needs to be at
least 2^{1/ε}; more on this later in the
section.) Second, as shown in Shannon's seminal
paper,^{16} the quantity 1 −
*H*(ρ) is *exactly* the same as the best
possible rate (aka "capacity") that can be achieved on the binary
symmetric channel BSC_{ρ}. Thus, list decoding can bridge the
traditionally perceived gap between the Shannon stochastic model
and the Hamming worst-case model.
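To get a feel for these numbers, here is a quick computation (our own illustration) of the two capacities:

```python
from math import log2

def H(x):
    """Binary entropy function."""
    return 0.0 if x in (0, 1) else -x * log2(x) - (1 - x) * log2(1 - x)

# Best achievable rate for list decoding a rho fraction of errors:
# 1 - H(rho) for binary codes vs. 1 - rho over large alphabets.
for rho in (0.1, 0.2, 0.3):
    print(f"rho = {rho}: binary 1 - H(rho) = {1 - H(rho):.3f}, "
          f"large alphabet 1 - rho = {1 - rho:.3f}")
```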

We "transfer" our result for folded RS codes to a result for
binary codes via a natural method for composing together codes
called *code concatenation*, proposed by Forney over 40
years ago. Thanks to the powerful algorithms for decoding folded
RS codes, we can use this approach to achieve a certain trade-off
called the *Zyablov bound* between rate and fraction of
errors corrected.^{9} In a subsequent
work, using another generalization of code concatenation, we
improved the trade-off to the *Blokh-Zyablov
bound*.^{8} Figure 8
plots these two trade-offs along with the best possible trade-off
(list-decoding capacity). There is a large gap between the
list-decoding capacity of binary codes and the best bound known
to be achievable with efficient algorithms. Closing this gap
remains a central and extremely challenging open question.
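The concatenation operation itself is simple to express in code. The following is a toy sketch (ours) of the operation only, not of the specific constructions achieving the Zyablov or Blokh-Zyablov bounds; the inner code here is an arbitrary stand-in:

```python
def concatenate(outer_encode, inner_encode, msg):
    """Code concatenation: encode with an outer code over a large
    alphabet, then encode each outer symbol with a small inner code."""
    return [bit for sym in outer_encode(msg) for bit in inner_encode(sym)]

# Toy instantiation: the RS code over GF(11) from Section 2 outside,
# and (as a stand-in inner code) a 3-fold repetition of each symbol's
# 4-bit binary representation inside.
inner = lambda s: [int(b) for b in f"{s:04b}"] * 3
bits = concatenate(lambda m: rs_encode(m, list(range(7)), 11), inner, [3, 2])
assert len(bits) == 7 * 12
```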

We now briefly mention how we resolve the large alphabet issue
that was raised in Section 3. When the folding parameter of the
folded RS code is a constant (as in Theorem 1), the number of
bits needed to represent a symbol from the alphabet is no larger
than roughly the logarithm of the block length of the folded RS
code. This is small enough to use the idea of code concatenation
mentioned above to reduce the alphabet. In order to maintain the
optimal trade-off between rate and fraction of errors list
decoded, we need to combine concatenation with an approach to
redistribute symbols using *expander
graphs*.^{6} This leads to codes of
rate *R* that can be list decoded from a fraction 1 −
*R* − ε of errors over an alphabet of
size 2^{(1/ε)^{4}}, which is close
to the lower bound of 2^{1/ε} mentioned
earlier.

First, we mention some related work that appeared subsequent
to the initial publication of our result. Further work on
extending the results in this article to the framework of
*algebraic-geometric codes* has been done in
Guruswami^{5} and Guruswami and
Patthak.^{7} A surprising application of
the ideas in the Parvaresh-Vardy list decoder is the construction
of *randomness extractors* by Guruswami, Umans, and
Vadhan.^{11} Randomness extractors
convert input from a weakly random source into an almost
perfectly random string, and have been intensively studied in
theoretical computer science for over 15 years. This recent
extractor is almost optimal in all parameters, while having a
simple, self-contained description and proof.

Even though the work presented in this article makes good progress in our theoretical understanding of list decoding, putting these ideas into practice requires further innovation. We conclude by posing two practical challenges.

The first challenge is specific to making the list decoding algorithms for folded RS codes more practical. Recall that the algorithm involved an interpolation step and a "root-finding" step. There are fast heuristic approaches for the latter step that could be used in practice. The interpolation step, however, seems too inefficient for practical purposes due to the large size of the linear systems that need to be solved. It would be very useful to have more efficient algorithms for this step. We note that such improvements for the Guruswami–Sudan algorithm have been obtained.

The second challenge is more general. Codes have found numerous practical applications in domains such as communication and data storage. Despite its promise and the recent advances, list decoding has not yet found widespread use in practical systems (though as mentioned earlier, the Moonbounce program does use the multiplicities based list decoder). One possible reason could be that the previous list decoding algorithms do not provide much gain for the high rate regime over traditional unique decoding algorithms. However, this is no longer a concern—we now have algorithms that obtain much better theoretical bounds in this regime. Further, folded RS codes are very similar to RS codes that are ubiquitous in practice. Hopefully in the near future, list decoding will be used more widely in practical systems.

The research described here was supported in part by NSF award CCF-0343672 and fellowships from the Sloan and Packard Foundations. We thank Ronitt Rubinfeld for several valuable comments on an earlier draft of this paper.

1. Berlekamp, E. Factoring polynomials over
large finite fields. *Math. Comput. 24* (1970),
713–735.

2. Elias, P. List decoding for noisy
channels. *Technical Report 335*, MIT Research Lab of
Electronics, 1957.

3. Gemmell, P., Sudan, M. Highly resilient
correctors for multivariate polynomials. *Information
Processing Lett. 43*, 4 (1992), 169–174.

4. Goldreich, O., Levin, L. A hard-core
predicate for all one-way functions. In *Proceedings of the
21st ACM Symposium on Theory of Computing*, 1989,
25–32.

5. Guruswami, V. Artin automorphisms, cyclotomic function fields, and folded list-decodable codes, 2008. Manuscript.

6. Guruswami, V., Indyk, P. Linear-time
encodable/decodable codes with near-optimal rate. *IEEE Trans.
Information Theory 51*, 10 (2005), 3393–3400.

7. Guruswami, V., Patthak, A. Correlated
algebraic-geometric codes: Improved list decoding over bounded
alphabets. *Math. Comput. 77*, 261 (Jan. 2008),
447–473.

8. Guruswami, V., Rudra, A. Better binary
list-decodable codes via multilevel concatenation. In
*Proceedings of the 11th International Workshop on
Randomization and Computation*, 2007, 554–568.

9. Guruswami, V., Rudra, A. Explicit codes
achieving list decoding capacity: Error-correction up to the
Singleton bound. *IEEE Trans. Information Theory 54*, 1
(2008), 135–150. Preliminary version in *STOC'06*.

10. Guruswami, V., Sudan, M. Improved
decoding of Reed–Solomon and algebraic-geometric codes.
*IEEE Trans. Information Theory*, *45* (1999),
1757–1767.

11. Guruswami, V., Umans, C., Vadhan, S.P.
Unbalanced expanders and randomness extractors from
Parvaresh–Vardy codes. In *Proceedings of 22nd IEEE
Conference on Computational Complexity*, 2007,
96–108.

12. Hamming, R.W. Error detecting and error
correcting codes. *Bell System Technical J. 29* (Apr. 1950),
147–160.

13. Koetter, R., Vardy, A. Algebraic
soft-decision decoding of Reed–Solomon codes. *IEEE
Trans. Information Theory 49*, 11 (2003), 2809–2825.

14. Parvaresh, F., Vardy, A. Correcting
errors beyond the Guruswami–Sudan radius in polynomial
time. In *Proceedings of the 46th IEEE Symposium on Foundations
of Computer Science*, 2005, 285–294.

15. Peterson, W.W. Encoding and
error-correction procedures for Bose–Chaudhuri codes.
*IEEE Trans. Information Theory*, *6* (1960),
459–470.

16. Shannon, C.E. A mathematical theory of
communication. *Bell System Technical J. 27* (1948),
379–423, 623–656.

17. Sudan, M. Decoding of Reed–Solomon
codes beyond the error-correction bound. *J. Complexity*,
*13*, 1 (1997), 180–193.

18. Welch, L.R., Berlekamp, E.R. Error
correction of algebraic block codes. *US Patent Number
4,633,470*, December 1986.

19. Wozencraft, J.M. List Decoding.
*Quarterly Progress Report, Research Laboratory of Electronics,
MIT*, 48 (1958), 90–95.

a. The problem was introduced about 50 years ago and the main combinatorial limitations of list decoding were established in the 1970s and 1980s.

b. For this article, readers not conversant
with finite fields can think of a finite field as {0, 1, ...,
*p*−1} for a prime *p*, with addition and
multiplication operations defined modulo *p*.

c. If not, the difference
RS_{F,S,n,k}(m) − RS_{F,S,n,k}(m′),
which corresponds to the evaluation of the nonzero polynomial
∑_{i=0}^{k−1}(*m*_{i} − *m*′_{i})*X*^{i} of degree at most
*k* − 1, would be zero in at least *k* positions, contradicting
the fact above.

d. We skip formalizing the notion of
multiple zeroes in this description, but this follows along
standard lines and we refer the interested reader to Guruswami
and Sudan^{10} for the details.

e. This is the same as looking at an appropriate subset of messages.

The original version of this paper was published in *IEEE
Transactions on Information Theory*, 2008.

DOI: http://doi.acm.org/10.1145/1467247.1467269

Figure 1. Oversampling from a line *Y* = *aX*
+ *b* to tolerate errors, which occur at *X* = 3 and
5.

Figure 2. An RS codeword (polynomial *f*(*X*)
evaluated on points *α*_{1},
*α*_{2},...,
*α*_{11}); its corruption by two errors (at
locations *α*_{2} and
*α*_{5}); and illustration of the curve
fitting through the noisy data using correct curve and
"error-locator lines."

Figure 3. Illustration of list decoding of an RS code that evaluates lines over the points −7, −5, −4,..., 4, 5, 6, 7. The two lines are recovered as factors of a degree 4 interpolated curve through all the points.

Figure 4. Illustration of Guruswami–Sudan algorithm for list decoding RS codes. The lines are recovered as factors of a degree 5 curve that passes through each point twice.

Figure 5. Rate vs. error-correction radius for RS codes. The optimal trade-off is also plotted, as is Parvaresh and Vardy's improvement over RS codes.

Figure 6. Folding of the Reed–Solomon code with
parameter *m* = 4. Each column represents an alphabet
character.

Figure 7. The correspondence between a folded RS code (with *m* = 4 and *x*_{i} = γ^{i}) and the associated PV code of order 2.
Figure 8. Error-correction radius ρ of our
algorithms for binary codes plotted against the rate *R*.
The best possible trade-off, i.e., list-decoding capacity, is
ρ = *H*^{−1}(1 – *R*), and is
also plotted.
