Convertible Codes: Efficient Conversion of Coded Data in Distributed Storage
February 17, 2021 (Zoom - See email or contact organizers for link)

Abstract: Erasure codes are typically used in large-scale distributed storage systems to provide reliability against failures. In this setting, a set of k blocks to be stored is encoded using an [n, k] code to generate n blocks that are then stored on different nodes. Recent work shows that the failure rate of storage devices vary significantly over time, and that changing the rate of the code (via a change in the parameters n and k) in response to such variations provides significant reduction in storage space requirement. However, the overhead on CPU, disk IO, and network bandwidth of realizing such a change in the code rate on already encoded data for traditional codes is prohibitively high. Motivated by this application, we present a new framework to formalize the notion of code conversion, i.e., the process of converting data encoded with an [n^I, k^I] code into data encoded with an [n^F, k^F] code while maintaining desired decodability properties, such as the maximum-distance-separability (MDS) property. We also introduce convertible codes, a new class of code pairs that allow for code conversions in a resource-efficient manner. We derive the fundamental limits on the cost of code conversion and propose constructions which minimize that cost. Our results thus show that it is indeed possible to achieve code conversions with significantly less resources than the default approach of re-encoding.