 I am a final year PhD student at Carnegie Mellon University's Computer Science Department and a member of Algorithms and Complexity group particiapating in Algorithms, Combinatorics, and Optimization (ACO) program. I am very fortunate to be advised by Bernhard Haeupler. During the summer of 2019, I worked as a research intern at Microsoft DNA Data Storage group under the supervision of Sergey Yekhanin.

I am broadly interested in algorithmic and combinatorial aspects of theoretical computer science. I have been recently working on communication problems concerning synchronization errors.

## Publications

### Conference Papers

Optimally Resilient Codes for List-Decoding from Insertions and Deletions
with Venkatesan Guruswami and Bernhard Haeupler
Proceedings of the Annual Symposium on Theory of Computing (STOC), 2020.
[ arXiv ] [ AbstractHide Abstract ]

We give a complete answer to the following basic question: "What is the maximal fraction of deletions or insertions tolerable by q-ary list-decodable codes with non-vanishing information rate?"
This question has been open even for binary codes, including the restriction to the binary insertion-only setting, where the best known results was that a $$\gamma\leq=0.707$$ fraction of insertions is tolerable by some binary code family.
For any desired $$\varepsilon>0$$, we construct a family of binary codes of positive rate which can be efficiently list-decoded from any combination of $$gamma$$ fraction of insertions and $$\delta$$ fraction of deletions as long as $$\gamma+2\delta\leq 1-\varepsilon$$. On the other hand, for any $$\gamma, \delta$$ with $$\gamma+2\delta=1$$ list-decoding is impossible. Our result thus precisely characterizes the feasibility region of binary list-decodable codes for insertions and deletions.
We further generalize our result to codes over any finite alphabet of size $$q$$. Surprisingly, our work reveals that the feasibility region for $$q>2$$ is not the natural generalization of the binary bound above. We provide tight upper and lower bounds that precisely pin down the feasibility region, which turns out to have a $$(q-1$$)-piece-wise linear boundary whose $$q$$ corner-points lie on a quadratic curve.
The main technical work in our results is proving the existence of code families of sufficiently large size with good list-decoding properties for any combination of $$\delta, \gamma$$ within the claimed feasibility region. We achieve this via an intricate analysis of codes introduced by [Bukh, Ma; SIAM J. Discrete Math; 2014]. Finally we give a simple yet powerful concatenation scheme for list-decodable insertion-deletion codes which transforms any such (non-efficient) code family (with vanishing information rate) into an efficiently decodable code family with constant rate.

Near-Linear Time Insertion-Deletion Codes and (1+$$\varepsilon$$)-Approximating Edit Distance via Indexing
with Bernhard Haeupler and Aviad Rubinstein
Proceedings of the Annual Symposium on Theory of Computing (STOC), 2019.
[ arXiv ] [ AbstractHide Abstract ]

We introduce fast-decodable indexing schemes for edit distance which can be used to speed up edit distance computations to near-linear time if one of the strings is indexed by an indexing string $$I$$. In particular, for every length $$n$$ and every $$\varepsilon >0$$, one can in near linear time construct a string $$I \in \Sigma'^n$$ with $$|\Sigma'| = O_{\varepsilon}(1)$$, such that, indexing any string $$S \in \Sigma^n$$, symbol-by-symbol, with $$I$$ results in a string $$S' \in \Sigma''^n$$ where $$\Sigma'' = \Sigma \times \Sigma'$$ for which edit distance computations are easy, i.e., one can compute a $$(1+\varepsilon)$$-approximation of the edit distance between $$S'$$ and any other string in $$O(n \text{poly }(\log n))$$ time. Our indexing schemes can be used to improve the decoding complexity of state-of-the-art error correcting codes for insertion and deletions. In particular, they lead to near-linear time decoding algorithms for the insertion-deletion codes of [Haeupler, Shahrasbi; STOC '17] and list-decodable insertion-deletion codes of [Haeupler, Shahrasbi, Sudan; ICALP `18]. Interestingly, the latter codes are a crucial ingredient in the construction of fast-decodable indexing schemes.

Synchronization Strings: Efficient and Fast Deterministic Constructions over Small Alphabets
with Kuan Cheng, Bernhard Haeupler, Xin Li and Ke Wu
ACM-SIAM Symposium on Discrete Algorithms (SODA), 2019.
[ arXiv ] [ AbstractHide Abstract ]

Synchronization strings are recently introduced by Haeupler and Shahrasbi [STOC 2017] in the study of codes for correcting insertion and deletion errors (insdel codes). A synchronization string is an encoding of the indices of the symbols in a string, and together with an appropriate decoding algorithm it can transform insertion and deletion errors into standard symbol erasures and corruptions. This reduces the problem of constructing insdel codes to the problem of constructing standard error correcting codes, which is much better understood. Besides this, synchronization strings are also useful in other applications such as synchronization sequences and interactive coding schemes. For all such applications, synchronization strings are desired to be over alphabets that are as small as possible, since a larger alphabet size corresponds to more redundant information added.
Haeupler and Shahrasbi [STOC 2017] showed that for any parameter $$\varepsilon>0$$, synchronization strings of arbitrary length exist over an alphabet whose size depends only on $$\varepsilon$$. Specifically, Haeupler and Shahrasbi [STOC 2017] obtained an alphabet size of $$O(\varepsilon^{-4})$$, which left an open question on where the minimal size of such alphabets lies between $$\Omega(\varepsilon^{-1})$$ and $$O(\varepsilon^{-4})$$. In this work, we partially bridge this gap by providing an improved lower bound of $$\Omega\left(\varepsilon^{-3/2}\right)$$, and an improved upper bound of $$O\left(\varepsilon^{-2}\right)$$. We also provide fast explicit constructions of synchronization strings over small alphabets.
Further, along the lines of previous work on similar combinatorial objects, we study the extremal question of the smallest possible alphabet size over which synchronization strings can exist for some constant $$\varepsilon < 1$$. We show that one can construct $$\varepsilon$$-synchronization strings over alphabets of size four while no such string exists over binary alphabets. This reduces the extremal question to whether synchronization strings exist over ternary alphabets.

Synchronization Strings: Channel Simulations and Interactive Coding for Insertions and Deletions
with Bernhard Haeupler and Ellen Vitercik
International Colloquium on Automata, Languages, and Programming (ICALP), 2018.
[ arXiv ] [ AbstractHide Abstract ]

We present many new results related to reliable (interactive) communication over insertion-deletion channels. Synchronization errors, such as insertions and deletions, strictly generalize the usual symbol corruption errors and are much harder to protect against.
We show how to hide the complications of synchronization errors in many applications by introducing very general channel simulations which efficiently transform an insertion-deletion channel into a regular symbol corruption channel with an error rate larger by a constant factor and a slightly smaller alphabet. We generalize synchronization string based methods which were recently introduced as a tool to design essentially optimal error correcting codes for insertion-deletion channels. Our channel simulations depend on the fact that, at the cost of increasing the error rate by a constant factor, synchronization strings can be decoded in a streaming manner that preserves linearity of time. We also provide a lower bound showing that this constant factor cannot be improved to $$1+\epsilon$$, in contrast to what is achievable for error correcting codes. Our channel simulations drastically generalize the applicability of synchronization strings.
We provide new interactive coding schemes which simulate any interactive two-party protocol over an insertion-deletion channel. Our results improve over the interactive coding scheme of Braverman et al. which achieves a small constant rate and requires exponential time computations. We provide the first computationally efficient interactive coding schemes for synchronization errors, the first coding scheme with a rate approaching one for small noise rates, and also improve over the maximal tolerable error rate. We also show tight connections between synchronization strings and edit-distance tree codes which allow us to transfer results from tree codes directly to edit-distance tree codes.

Synchronization Strings: List Decoding for Insertions and Deletions
with Bernhard Haeupler and Madhu Sudan
International Colloquium on Automata, Languages, and Programming (ICALP), 2018.
[ arXiv ] [ AbstractHide Abstract ]

We study codes that are list-decodable under insertions and deletions ("insdel codes"). Specifically, we consider the setting where, given a codeword $$x$$ of length $$n$$ over some finite alphabet $$\Sigma$$ of size $$q$$, $$\delta\cdot n$$ codeword symbols may be adversarially deleted and $$\gamma\cdot n$$ symbols may be adversarially inserted to yield a corrupted word $$w$$. A code is said to be list-decodable if there is an (efficient) algorithm that, given $$w$$, reports a small list of codewords that include the original codeword $$x$$. Given $$\delta$$ and $$\gamma$$ we study what is the rate $$R$$ for which there exists a constant $$q$$ and list size $$L$$ such that there exist codes of rate $$R$$ correcting $$\delta$$-fraction insertions and $$\gamma$$-fraction deletions while reporting lists of size at most $$L$$.
Using the concept of synchronization strings, introduced by the first two authors [Proc. STOC 2017], we show some surprising results. We show that for every $$0\leq \delta < 1$$, every $$0 \leq \gamma < \infty$$ and every $$\epsilon > 0$$ there exists codes of rate $$1 - \delta - \epsilon$$ and constant alphabet (so $$q = O_{\delta,\gamma,\epsilon}(1)$$) and sub-polynomial list sizes. Furthermore our codes are accompanied by efficient (polynomial time) decoding algorithms. We stress that the fraction of insertions can be arbitrarily large (more than 100%), and the rate is independent of this parameter. We also prove several tight bounds on the parameters of list-decodable insdel codes. In particular we show that the alphabet size of insdel codes needs to be exponentially large in $$\epsilon^{-1}$$, where $$\epsilon$$ is the gap to capacity above. Our result even applies to settings where the unique-decoding capacity equals the list-decoding capacity and when it does so, it shows that the alphabet size needs to be exponentially large in the gap to capacity. This is sharp contrast to the Hamming error model where alphabet size polynomial in $$\epsilon^{-1}$$ suffices for unique decoding. This lower bound also shows that the exponential dependence on the alphabet size in previous works that constructed insdel codes is actually necessary!
Our result sheds light on the remarkable asymmetry between the impact of insertions and deletions from the point of view of error-correction: Whereas deletions cost in the rate of the code, insertion costs are borne by the adversary and not the code! Our results also highlight the dominance of the model of insertions and deletions over the Hamming model: A Hamming error is equal to one insertion and one deletion (at the same location). Thus the effect of $$\delta$$-fraction Hamming errors can be simulated by $$\delta$$-fraction of deletions and $$\delta$$-fraction of insertions — but insdel codes can deal with much more insertions without loss in rate (though at the price of higher alphabet size).

Synchronization Strings: Explicit Constructions, Local Decoding, and Applications
with Bernhard Haeupler
Proceedings of the Annual Symposium on Theory of Computing (STOC), 2018.
[ arXiv ] [ AbstractHide Abstract ]

This paper gives new results for synchronization strings, a powerful combinatorial object that allows to efficiently deal with insertions and deletions in various communication problems:

• We give a deterministic, linear time synchronization string construction, improving over an $$O(n^5)$$ time randomized construction. Independently of this work, a deterministic $$O(n \log^2 \log n)$$ time construction proposed by Cheng, Li, and Wu.
• We give a deterministic construction of an infinite synchronization string which outputs the first $$n$$ symbols in $$O(n)$$ time. Previously it was not known whether such a string was computable.
• Both synchronization string constructions are highly explicit, i.e., the $$i^{th}$$ symbol can be deterministically computed in $$O(\log i)$$ time.
• This paper also introduces a generalized notion we call long-distance synchronization strings. Such strings allow for local and very fast decoding. In particular, only $$O(\log^3 n)$$ time and access to logarithmically many symbols is required to decode any index.
The paper also provides several applications for these improved synchronization strings:
• For any $$\delta < 1$$ and $$\epsilon > 0$$ we provide an insdel error correcting block code with rate $$1 - \delta - \epsilon$$ which can correct any $$O(\delta)$$ fraction of insertion and deletion errors in $$O(n \log^3 n)$$ time. This near linear computational efficiency is surprising given that we do not even know how to compute the (edit) distance between the decoding input and output in sub-quadratic time.
• We show that local decodability implies that error correcting codes constructed with long-distance synchronization strings can not only efficiently recover from $$\delta$$ fraction of insdel errors but, similar to [Schulman, Zuckerman; TransInf'99], also from any $$O(\delta / \log n)$$ fraction of block transpositions and block replications. These block corruptions allow arbitrarily long substrings to be swapped or replicated anywhere.
• We show that highly explicitness and local decoding allow for infinite channel simulations with exponentially smaller memory and decoding time requirements. These simulations can then be used to give the first near linear time interactive coding scheme for insdel errors, similar to the result of [Brakerski, Naor; SODA'13] for Hamming errors.

Synchronization Strings: Codes for Insertions and Deletions Approaching the Singleton Bound
with Bernhard Haeupler
Proceedings of the Annual Symposium on Theory of Computing (STOC), 2017.
[ arXiv ] [ AbstractHide Abstract ]

We introduce synchronization strings as a novel way of efficiently dealing with synchronization errors, i.e., insertions and deletions. Synchronization errors are strictly more general and much harder to deal with than commonly considered half-errors, i.e., symbol corruptions and erasures. For every $$\epsilon >0$$, synchronization strings allow to index a sequence with an $$\epsilon^{-O(1)}$$ size alphabet such that one can efficiently transform $$k$$ synchronization errors into $$(1+\epsilon)k$$ half-errors. This powerful new technique has many applications. In this paper, we focus on designing insdel codes, i.e., error correcting block codes (ECCs) for insertion deletion channels.
While ECCs for both half-errors and synchronization errors have been intensely studied, the later has largely resisted progress. Indeed, it took until 1999 for the first insdel codes with constant rate, constant distance, and constant alphabet size to be constructed by Schulman and Zuckerman. Insdel codes for asymptotically large or small noise rates were given in 2016 by Guruswami et al. but these codes are still polynomially far from the optimal rate-distance tradeoff. This makes the understanding of insdel codes up to this work equivalent to what was known for regular ECCs after Forney introduced concatenated codes in his doctoral thesis 50 years ago.
A direct application of our synchronization strings based indexing method gives a simple black-box construction which transforms any ECC into an equally efficient insdel code with a slightly larger alphabet size. This instantly transfers much of the highly developed understanding for regular ECCs over large constant alphabets into the realm of insdel codes. Most notably, we obtain efficient insdel codes which get arbitrarily close to the optimal rate-distance tradeoff given by the Singleton bound for the complete noise spectrum.

Critical Graphs in Index Coding
with Mehrdad Tahmasbi and Amin Gohari
Proceedings of IEEE International Symposium on Information Theory (ISIT), 2014.
[ arXiv ] [ AbstractHide Abstract ]

In this paper we define critical graphs as minimal graphs that support a given set of rates for the index coding problem, and study them for both the one-shot and asymptotic setups. For the case of equal rates, we find the critical graph with minimum number of edges for both one-shot and asymptotic cases. For the general case of possibly distinct rates, we show that for one-shot and asymptotic linear index coding, as well as asymptotic non-linear index coding, each critical graph is a union of disjoint strongly connected subgraphs (USCS). On the other hand, we identify a non-USCS critical graph for a one-shot non-linear index coding problem. Next, we identify a few graph structures that are critical. We also generalize some of our results to the groupcast problem. In addition, we show that the capacity region of the index coding is additive for union of disjoint graphs.

### Journal Papers

Critical Graphs in Index Coding
with Mehrdad Tahmasbi and Amin Gohari
IEEE Journal on Selected Areas in Communications (JSAC), 2015.
[ arXiv ] [ AbstractHide Abstract ]

In this paper we define critical graphs as minimal graphs that support a given set of rates for the index coding problem, and study them for both the one-shot and asymptotic setups. For the case of equal rates, we find the critical graph with minimum number of edges for both one-shot and asymptotic cases. For the general case of possibly distinct rates, we show that for one-shot and asymptotic linear index coding, as well as asymptotic non-linear index coding, each critical graph is a union of disjoint strongly connected subgraphs (USCS). On the other hand, we identify a non-USCS critical graph for a one-shot non-linear index coding problem. Next, we identify a few graph structures that are critical. We also generalize some of our results to the groupcast problem. In addition, we show that the capacity region of the index coding is additive for union of disjoint graphs.

## Teaching Assistance

#### Carnegie Mellon University

Algorithms and Advanced Data Structures (15-350): F17

#### Sharif University of Technology

Design and Analysis of Algorithms (22-981/40-384): F13, S14
Introduction to Cryptography (22-813): F13
Introduction to Probability and Statistics (25-732): F12

## Contact

School of Computer Science
Carnegie Mellon University
Pittsburgh PA 15213-3891
Office: 9003 Gates Hillman Center
Email: LASTNAME [at] cs.cmu.edu