We study codes that are list-decodable under insertions and deletions ("insdel codes"). Specifically, we consider the setting where, given a codeword \(x\) of length \(n\) over some finite alphabet \(\Sigma\) of size \(q\), \(\delta\cdot n\) codeword symbols may be adversarially deleted and \(\gamma\cdot n\) symbols may be adversarially inserted to yield a corrupted word \(w\). A code is said to be list-decodable if there is an (efficient) algorithm that, given \(w\), reports a small list of codewords that includes the original codeword \(x\). Given \(\delta\) and \(\gamma\), we study the rate \(R\) for which there exist a constant alphabet size \(q\) and list size \(L\) such that there are codes of rate \(R\) correcting a \(\delta\)-fraction of deletions and a \(\gamma\)-fraction of insertions while reporting lists of size at most \(L\).
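As a toy illustration of the channel model above (not taken from the paper), the following sketch applies roughly \(\delta n\) deletions and \(\gamma n\) insertions to a codeword. The function name is ours, and positions are chosen at random for simplicity, whereas the model above lets an adversary pick them worst-case; note that \(\gamma\) may exceed 1.

```python
import random

def insdel_corrupt(codeword, delta, gamma, alphabet, rng):
    """Apply round(delta*n) deletions, then round(gamma*n) insertions,
    at random positions (an adversary would choose them worst-case)."""
    n = len(codeword)
    word = list(codeword)
    for _ in range(round(delta * n)):           # deletions shrink the word
        del word[rng.randrange(len(word))]
    for _ in range(round(gamma * n)):           # insertions grow it again
        word.insert(rng.randrange(len(word) + 1), rng.choice(alphabet))
    return word

# Example: n = 10, delta = 0.2, gamma = 0.5 gives a corrupted word of
# length 10 - 2 + 5 = 13, regardless of which positions are chosen.
rng = random.Random(0)
w = insdel_corrupt("0110100110", 0.2, 0.5, "01", rng)
```

The corrupted word's length is always \(n - \delta n + \gamma n\); what makes decoding hard is that the surviving symbols' positions are unknown.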

Using the concept of *synchronization strings*, introduced by the first two authors [Proc. STOC 2017], we show some surprising results. We show that for every \(0\leq \delta < 1\), every \(0 \leq \gamma < \infty\), and every \(\epsilon > 0\), there exist codes of rate \(1 - \delta - \epsilon\) with constant alphabet size (so \(q = O_{\delta,\gamma,\epsilon}(1)\)) and sub-polynomial list sizes. Furthermore, our codes are accompanied by efficient (polynomial-time) decoding algorithms. We stress that the fraction of insertions can be arbitrarily large (more than 100%), and the rate is independent of this parameter. We also prove several tight bounds on the parameters of list-decodable insdel codes. In particular, we show that the alphabet size of insdel codes needs to be exponentially large in \(\epsilon^{-1}\), where \(\epsilon\) is the gap to capacity above. Our result applies even to settings where the unique-decoding capacity equals the list-decoding capacity; in such settings it shows that the alphabet size needs to be exponentially large in the gap to capacity. This is in sharp contrast to the Hamming error model, where an alphabet size polynomial in \(\epsilon^{-1}\) suffices for unique decoding. This lower bound also shows that the exponential dependence of the alphabet size on \(\epsilon^{-1}\) in previous constructions of insdel codes is actually necessary!

Our result sheds light on the remarkable asymmetry between the impact of insertions and deletions from the point of view of error correction: whereas deletions cost in the rate of the code, insertion costs are borne by the adversary and not the code! Our results also highlight the dominance of the insertion-deletion model over the Hamming model: a Hamming error is equivalent to one deletion and one insertion (at the same location). Thus the effect of a \(\delta\)-fraction of Hamming errors can be simulated by a \(\delta\)-fraction of deletions and a \(\delta\)-fraction of insertions; but insdel codes can deal with many more insertions without loss in rate (though at the price of a larger alphabet).
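The equivalence above can be checked directly in a minimal sketch (the function names are ours, purely for illustration): substituting a symbol at position \(i\) yields exactly the same word as deleting the symbol at position \(i\) and then inserting the new symbol there.

```python
def hamming_error(word, i, new_symbol):
    """Substitute the symbol at position i (one Hamming error)."""
    return word[:i] + [new_symbol] + word[i + 1:]

def delete_then_insert(word, i, new_symbol):
    """Delete the symbol at position i, then insert new_symbol at the
    same position (one deletion followed by one insertion)."""
    deleted = word[:i] + word[i + 1:]                 # one deletion
    return deleted[:i] + [new_symbol] + deleted[i:]   # one insertion

codeword = list("abcde")
# Both corruptions yield the same corrupted word.
assert hamming_error(codeword, 2, "X") == delete_then_insert(codeword, 2, "X")
```

So any \(\delta\)-fraction Hamming adversary is a special case of a \((\delta, \delta)\) insdel adversary, which is what makes the rate \(1 - \delta - \epsilon\) achievable here, with insertions unbounded, a strictly stronger guarantee.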