# Theory

Consider a large distributed storage system that needs to store $k$ data items using $n$ servers. We study how to encode information so that a large number $t$ of read requests can be performed in parallel while the rate remains constant (and ideally approaches one). This problem is equivalent to the design of multiset Batch Codes introduced by Ishai, Kushilevitz, Ostrovsky and Sahai.

We give families of multiset batch codes with asymptotically optimal rates of the form $1-1/poly\left(t\right)$ and a number of servers $n$ scaling polynomially in the number of read requests $t$. An advantage of our batch code constructions over most previously known multiset batch codes is explicit and deterministic decoding algorithms and asymptotically optimal fault tolerance. Our constructions rely on dense bipartite graphs with no small cycles. We modify prior graph constructions of dense, high-girth graphs to obtain our batch code results. We achieve close to optimal tradeoffs between the parameters for bipartite graph based batch codes.

* This is joint work with Zhao Song, Alex Dimakis and Anna Gal. *

In this talk, I will present a new algorithm for solving linear programs. Given a linear program with n variables, m > n constraints, and bit complexity L, our algorithm runs in Õ(sqrt(n) L) iterations each consisting of solving Õ(1) linear systems and additional nearly linear time computation. Our method improves upon the convergence rate of previous state-of-the-art linear programming methods which required solving either Õ(sqrt(m)L) linear systems [R88] or consisted of Õ((mn)^(1/4)) steps of more expensive linear algebra [VA93].

Interestingly, our algorithm not only nearly matches the convergence rate of the universal barrier of Nesterov and Nemirovskii [NN94], but in the special case of the linear programming formulation of various flow problems our methods converge at a rate faster than that predicted by any self-concordant barrier. In particular, we achieve a running time of Õ(|E| sqrt(|V|) log^2 U) for solving the maximum flow problem on a directed graph with |E| edges, |V| vertices, and capacity ratio U, thereby improving upon the previous fastest running time for solving this problem when |E| > Ω(|V|^(1+epsilon)) for any constant epsilon.

**Faculty Host:** Gary Miller

We develop a new approach for online network design and obtain improved competitive ratios for several problems. Our approach gives natural deterministic algorithms and simple analyses. At the heart of our work is a novel application of embeddings into hierarchically well-separated trees (HSTs) to the analysis of online network design algorithms --- we charge the cost of the algorithm to the cost of the optimal solution on any HST embedding of the terminals. This analysis technique is widely applicable to many problems and gives a unified framework for online network design.

In a sense, our work brings together two of the main approaches to online network design. The first uses greedy-like algorithms and analyzes them using dual-fitting. The second uses tree embeddings and results in randomized *O(log n)*-competitive algorithms, where n is the total number of vertices in the graph. Our approach uses deterministic greedy-like algorithms but analyzes them via HST embeddings of the terminals. Our proofs are simpler as we do not need to carefully construct dual solutions and we get* O(log k)* competitive ratios, where *k* is the number of terminals.

Our approach yields deterministic *O(log k)*-competitive online algorithms for the following problems.

- Steiner network with edge duplication. Previously, only a randomized *O(log n)*-competitive algorithm was known.

- Rent-or-buy. Previously, only deterministic *O(log^2 k)*-competitive and randomized *O(log k)*-competitive algorithms by Awerbuch, Azar and Bartal (2004) were known.

- Connected facility location. Previously, only a randomized *O(log^2 k)*-competitive algorithm by San Felice, Williamson and Lee (2014) was known.

- Prize-collecting Steiner forest. We match the competitive ratio first achieved by Qian and Williamson (2011) and give a simpler analysis.

Our competitive ratios are optimal up to constant factors. These results will appear in SODA 2015 and an arxiv version can be found at http://arxiv.org/abs/1410.4240

About the Speaker

Given a k-SAT instance with the promise that there is an assignment satisfying at least t out of k literals in each clause, can one efficiently find a satisfying assignment (setting at least one literal to true in every clause)? The NP-hardness of 3-SAT implies that this problem is NP-hard when t <= k/3, and extensions of some 2-SAT algorithms give efficient solutions when t >= k/2. We prove that for t < k/2, the problem is NP-hard. Thus, satisfiability becomes hard when the promised density of true literals falls below 1/2. One might thus say that the transition from easy to hard in 2-SAT vs. 3-SAT takes place just after two and not just before three.

The talk will sketch most of the proof, which is based on the fact that the only functions passing a natural dictatorship test are "juntas" depending on few variables. We will also briefly mention the general principle based on the lack of certain "polymorphisms" that underlies hardness of constraint satisfaction.

A strengthening of the k-SAT result shows that given a (2t+1)-uniform hypergraph admitting a balanced 2-coloring (at most t+1 vertices of each color in every hyperedge), it is NP-hard to even find a 2-coloring that avoids a monochromatic edge. This shows extreme hardness of discrepancy minimization for systems of bounded-size sets. [Subsequent work with Euiwoong Lee in fact rules out coloring with any constant number of colors for the case of 2t-uniform hypergraphs with discrepancy 2.]

*This is joint work with Per Austrin and Johan Håstad.*

In the unique decoding problem, Alice wants to talk to Bob over a noisy channel, and Bob's job is to figure out (exactly) what Alice meant to say. List-decoding is a relaxation of this model, where Bob is allowed to return a short list of messages, with the guarantee that Alice's true message is in the list somewhere. In addition to helping Alice and Bob communicate, list-decoding has applications throughout complexity theory. In this talk, we'll investigate the list-decodability of some random ensembles of codes. The (very informal) punchline is that any "reasonable" distribution on codes with enough randomness is nearly optimally list-decodable with high probability. In the talk we'll make this punchline slightly more precise; as corollaries, we'll establish the list-decodability of random linear codes for high error rates, and the existence of Reed-Solomon codes that are nearly optimally list-decodable. Our arguments apply more generally to random modifications (folding, puncturing, etc) of "good enough" codes. * This is joint work with Atri Rudra.*

About the Speaker

A fundamental result of Karger [SODA94] shows that for any k-edge-connected graph with n nodes, sampling each edge independently with probability p=Omega(log n/k) results in a graph that has edge connectivity Omega(kp), w.h.p. We show novel techniques that generalize this result and prove that:

- For any k-edge-connected
*hypergraph*with n nodes, sampling each edge independently with probability p=Omega(log n/k) results in a hypergraph that has edge connectivity Omega(kp), w.h.p. - For any k-
*vertex*-connected graph, sampling each*edge*independently with probability p=Omega(log n/k) results in a graph that has*vertex connectivity*Omega(kp), w.h.p. - For any k-
*vertex*-connected graph, sampling each*vertex*independently with probability p=Omega(sqrt{log n/k}) results in a graph that has*vertex connectivity*Omega(kp^2), w.h.p.

These bounds are all optimal up to constant factors.

A fundamental result of Karger [SODA94] shows that for any k-edge-connected graph with n nodes, sampling each edge independently with probability p=Omega(log n/k) results in a graph that has edge connectivity Omega(kp), w.h.p. We show novel techniques that generalize this result and prove that:

- For any k-edge-connected
*hypergraph*with n nodes, sampling each edge independently with probability p=Omega(log n/k) results in a hypergraph that has edge connectivity Omega(kp), w.h.p. - For any k-
*vertex*-connected graph, sampling each*edge*independently with probability p=Omega(log n/k) results in a graph that has*vertex connectivity*Omega(kp), w.h.p. - For any k-
*vertex*-connected graph, sampling each*vertex*independently with probability p=Omega(sqrt{log n/k}) results in a graph that has*vertex connectivity*Omega(kp^2), w.h.p.

These bounds are all optimal up to constant factors.

In recent years, the area of property testing of probability distributions has produced a wide variety of interesting results. Here one is given sample access to an unknown distribution and asked whether it satisfies some property, e.g. is it the uniform distribution, does it have small entropy, and so forth. In this work, we study the natural quantum analogue of this problem, in which one is given many copies of an unknown mixed state (the quantum version of a probability distribution) and asked whether it satisfies a given property. We focus on properties which depend only on the mixed state's spectrum (hence the name Quantum Spectrum Testing).

Our paper gives several optimal algorithms for testing specific properties of mixed states, e.g. the property of being the maximally mixed state. This can be viewed as the quantum analogue of a result of Paninski. In addition, we show a nearly-tight lower bound for a certain algorithm of Keyl-Werner which learns an unknown mixed state's spectrum.

Our main tool is Schur-Weyl duality, which leads us to studying a particular algorithm called weak Schur sampling. This algorithm is based on the representation theory of the symmetric group, and to analyze it we use various tools in this area, including Kerov's algebra of observables and a limiting theorem of Biane.

No background in quantum computing or representation theory is assumed.

*This is joint work with Ryan O'Donnell. *

Motivated by a radically new peer review system currently under evaluation by the National Science Foundation, we study peer review systems in which proposals are reviewed by PIs who have submitted proposals themselves. An *(m,k)*-selection mechanism asks each PI to review m proposals, and uses these reviews to select (at most) k proposals. We are interested in impartial mechanisms, which guarantee that the ratings given by a PI to others' proposals do not affect the likelihood of the PI's own proposal being selected. We design two impartial mechanisms that – under different assumptions – select a *k*-subset of proposals that is essentially as highly rated as the one selected by the non-impartial (abstract version of) the NSF pilot mechanism, even when the latter mechanism has the "unfair" advantage of eliciting honest reviews.

*Joint work with David Kurokawa, Omer Lev, and Ariel Procaccia.*

We study the problem of learning halfspaces in the malicious noise model of Valiant. In this model, an adversary can corrupt an *η* fraction of both the label part and the feature part of an example. We design a polynomial-time algorithm for learning halfspaces in ℜ^{d} under the uniform distribution with near information-theoretic optimal noise tolerance. Our results also imply the first active learning algorithm for learning halfspaces that can handle malicious noise.

*Joint work with Nina Balcan and Phil Long. *