Repairing Reed-Solomon Codes
September 16, 2015

A fundamental fact about polynomial interpolation is that k evaluations of a degree-(k-1) polynomial f are sufficient to determine f. This is also necessary in a strong sense: given k-1 evaluations, we learn nothing about the value of f on any k'th point. In this paper, we study a variant of the polynomial interpolation problem. Instead of querying entire evaluations of $f$ (which are elements of a large field F), we are allowed to query partial evaluations; that is, each evaluation delivers a few elements from a small subfield of F rather than a single element from F. We show that in this model, one can do significantly better than in the traditional setting, in terms of the amount of information required to determine the missing evaluation. More precisely, we show that only O(k) bits are necessary to recover a missing evaluation. In contrast, the traditional method of looking at k evaluations requires Omega(klog(k)) bits. We also show that our result is optimal for linear recovery schemes, even up to the leading constants.

Our motivation comes from the use of Reed-Solomon (RS) codes for distributed storage systems, in particular for the exact repair problem. The traditional use of RS codes in this setting is analogous to the traditional interpolation problem. Each node in a system stores an evaluation of f, and if one node fails we can recover it by reading k other nodes. However, each node is free to send less information, leading to the modified problem above. The quickly-developing field of regenerating codes has yielded several codes which take advantage of this freedom. However, these codes are not RS codes, and RS codes are still often used in practice; it was a question of (Dimakis et al., 2011) how well RS codes could perform in this setting. Our results imply that RS codes can also take advantage of this freedom to download partial symbols. In some parameter regimes---those with small levels of sub-packetization---our scheme for RS codes outperforms all known regenerating codes. Even with a high degree of sub-packetization, our methods give non-trivial schemes, and we give an improved repair scheme for a specific (14,10)-RS code used in the Facebook Hadoop Analytics cluster.

This is joint work with Venkat Guruswami.