# CSD

A learning algorithm is agnostic if it does not presume a perfect model of how input data produce output data. Such algorithms are difficult to design, even for the basic task of classifying data as well as the best linear separator. This has led to a persistent rift between practice and theory: popular algorithms, such as SVMs and logistic regression, are susceptible to noise, while provably agnostic algorithms rely on brute-force sampling (which uses too much time) or fitting polynomials (which uses too much data).

We recently introduced a new classification algorithm, KG, which is both practical and agnostic. It achieves promising experimental performance on both natural and artificial problems. We seek to deepen our theoretical understanding of the algorithm and expand its practical applications. The main question we shall answer is: when is KG provably fast? It eventually converges to the correct solution for a wide variety of input distributions. However, these intersect with a litany of hardness results, so restricting the input distribution seems necessary. Based on experimental evidence and the mechanics of the algorithm, we conjecture that the algorithm runs in polynomial time when the inputs are normally distributed. If so, the algorithm would solve a notorious problem in computer science: learning logarithmically-sparse parities with noise. This would resolve a variety of challenges in learning theory, such as learning DNFs (posed by Valiant in 1984) and learning log-juntas (the subject of a prize offered by Blum in 2003). As exciting as this possibility seems, it does not contradict known hardness results, nor does it upset the consensus on related problems in cryptography or complexity theory. We propose to gain more experimental and theoretical evidence for this possibility.
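The sparse-parity-with-noise problem mentioned above is easy to state concretely. The sketch below (function names and parameters are illustrative, not from the proposal) generates labeled examples from a k-sparse parity with classification noise and recovers the support by exhaustive search over small subsets — the brute-force baseline whose n^O(k) running time is exactly what a fast agnostic learner would avoid:

```python
import itertools
import random

def sample_noisy_parity(n, support, noise, m, rng):
    """Draw m labeled examples (x, y): y is the XOR of x over `support`,
    flipped with probability `noise`."""
    data = []
    for _ in range(m):
        x = [rng.randint(0, 1) for _ in range(n)]
        y = sum(x[i] for i in support) % 2
        if rng.random() < noise:
            y ^= 1
        data.append((x, y))
    return data

def brute_force_parity(data, n, k):
    """Exhaustive search over all supports of size <= k: the n^O(k)-time
    baseline that a fast agnostic learner would avoid."""
    best, best_agree = None, -1
    for size in range(1, k + 1):
        for support in itertools.combinations(range(n), size):
            agree = sum((sum(x[i] for i in support) % 2) == y for x, y in data)
            if agree > best_agree:
                best, best_agree = support, agree
    return best

rng = random.Random(0)
data = sample_noisy_parity(10, (1, 4, 7), noise=0.1, m=400, rng=rng)
recovered = brute_force_parity(data, 10, 3)
```

With 400 samples and 10% noise, the true support agrees with roughly 90% of the labels while every wrong support agrees with about half, so the search recovers the planted parity with high probability.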

In practice, many classification tasks involve multiple classes. When the number of classes is large, we do not believe fast agnostic classification is possible. We posit stronger lower bounds for classification with a growing number of classes, bounds that depend on P != NP rather than on weaker conjectures about refuting random constraint satisfaction problems.

**Thesis Committee:**

Geoff Gordon (Chair)

Avrim Blum

Nina Balcan

Varun Kanade (University of Oxford)

Raw bit errors are common in NAND flash memory and will increase in the future. These errors reduce flash reliability and limit the lifetime of a flash memory device. This proposal aims to improve flash reliability with a multitude of low-cost architectural techniques. Our thesis statement is: NAND flash memory reliability can be improved at low cost and with low performance overhead by deploying various architectural techniques that are aware of higher-level application behavior and underlying flash device characteristics.

Our proposed approach is to understand flash error characteristics and workload behavior through characterization, and to design smart flash controller algorithms that use this understanding to improve flash reliability. We propose to investigate four directions. (1) Our preliminary work proposes a new technique that improves flash reliability by 12.9 times by managing flash retention differently for write-hot and write-cold data. (2) We propose to characterize and model flash errors on new flash chips. (3) We propose to develop a technique that constructs a flash error model online and improves flash lifetime by exploiting it. (4) We propose to understand the flash self-healing effect and develop new techniques that exploit it. We hope that these four directions will allow us to achieve higher flash reliability at low cost.

**Thesis Committee:**

Onur Mutlu (Chair)

Phillip B. Gibbons

James C. Hoe

Yu Cai (SK Hynix)

Erich F. Haratsch (Seagate Technology)

We show dynamic algorithms for graph sparsification problems that take polylogarithmic time per edge insertion or deletion. Our three main results are as follows.

First, we give a fully dynamic algorithm for maintaining a (1 \pm \epsilon)-spectral sparsifier with amortized update time poly(\log{n}, \epsilon^{-1}).

Second, we give a fully dynamic algorithm for maintaining a (1 \pm \epsilon)-cut sparsifier with worst-case update time poly(\log{n}, \epsilon^{-1}).

Third, we apply our dynamic sparsifier algorithm to obtain a fully dynamic algorithm for maintaining a (1 + \epsilon)-approximate minimum cut in an unweighted, undirected, bipartite graph with amortized update time poly(\log{n}, \epsilon^{-1}).
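The object being maintained is the one the classical static construction of Spielman and Srivastava builds in one shot: sample each edge with probability proportional to its leverage score (weight times effective resistance) and reweight. A minimal static sketch (constants and parameter choices are illustrative, not those of the dynamic algorithms):

```python
import numpy as np

def laplacian(n, edges, weights):
    """Weighted graph Laplacian of an n-vertex graph."""
    L = np.zeros((n, n))
    for (u, v), w in zip(edges, weights):
        L[u, u] += w; L[v, v] += w
        L[u, v] -= w; L[v, u] -= w
    return L

def spectral_sparsify(n, edges, weights, eps, rng):
    """One-shot effective-resistance sampling: keep a reweighted random
    subset of edges whose Laplacian (1 +- eps)-approximates the original
    quadratic form with high probability."""
    Lpinv = np.linalg.pinv(laplacian(n, edges, weights))
    scores = []
    for (u, v), w in zip(edges, weights):
        chi = np.zeros(n)
        chi[u], chi[v] = 1.0, -1.0
        scores.append(w * (chi @ Lpinv @ chi))  # leverage score w * R_eff(u, v)
    probs = np.array(scores) / np.sum(scores)
    q = int(9 * n * np.log(n) / eps ** 2)  # sample count; constant illustrative
    new_w = np.zeros(len(edges))
    for e in rng.choice(len(edges), size=q, p=probs):
        new_w[e] += weights[e] / (q * probs[e])
    return [(edges[e], new_w[e]) for e in np.flatnonzero(new_w)]

# Demo: sparsify the complete graph K8 and compare quadratic forms.
rng = np.random.default_rng(0)
n = 8
edges = [(i, j) for i in range(n) for j in range(i + 1, n)]
weights = [1.0] * len(edges)
sparse = spectral_sparsify(n, edges, weights, eps=0.5, rng=rng)
x = np.arange(n) - (n - 1) / 2.0
ratio = (x @ laplacian(n, [e for e, _ in sparse], [w for _, w in sparse]) @ x) \
        / (x @ laplacian(n, edges, weights) @ x)
```

The dynamic results above maintain such a sparsifier as edges arrive and depart, without recomputing effective resistances from scratch.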

*Joint work with Ittai Abraham, David Durfee, Ioannis Koutis, and Sebastian Krinninger.*


—

Richard Peng is an assistant professor in the School of Computer Science at the Georgia Institute of Technology. His research interests are in the design, analysis, and implementation of efficient algorithms. He is involved in the Algorithms and Randomness Center and the Algorithms, Combinatorics, and Optimization program.

Prior to coming to Georgia Tech, he received his PhD in Computer Science at CMU, and was an Instructor in Applied Mathematics at MIT for two years. His thesis, Algorithm Design Using Spectral Graph Theory, won the 2012/2013 CMU SCS Dissertation Award.

The k-center problem is a canonical and long-studied facility location and clustering problem with many applications in both its symmetric and asymmetric forms. Both versions of the problem have tight approximation factors on worst-case instances: a 2-approximation for symmetric k-center and an O(log* k)-approximation for the asymmetric version.

In this work, we go beyond the worst case and provide strong positive results for both the asymmetric and symmetric k-center problems under a very natural input stability (promise) condition called α-perturbation resilience (Bilu & Linial 2012), which states that the optimal solution does not change under any α-factor perturbation of the input distances. We show that under 2-perturbation resilience, the exact solution of the asymmetric k-center problem can be found in polynomial time. To our knowledge, this is the first problem that is hard to approximate to any constant factor in the worst case, yet can be solved optimally in polynomial time under perturbation resilience for a constant value of α. Furthermore, we prove our result is tight by showing that symmetric k-center under (2 − ε)-perturbation resilience is hard unless NP = RP. This is the first tight result for any problem under perturbation resilience, i.e., the first time the exact value of α at which the problem switches from being NP-hard to efficiently computable has been found.

Our results illustrate a surprising relationship between symmetric and asymmetric k-center instances under perturbation resilience: unlike the approximation ratio, where symmetric k-center is easily approximated to a factor of 2 but asymmetric k-center cannot be approximated to any constant factor, both versions can be solved optimally under resilience to 2-perturbations.
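For contrast with the perturbation-resilience results, the tight worst-case guarantee for symmetric k-center mentioned above comes from Gonzalez's farthest-point heuristic; a minimal sketch (names and demo data are illustrative):

```python
def greedy_k_center(dist, k):
    """Gonzalez's farthest-point heuristic: pick an arbitrary first center,
    then repeatedly add the point farthest from the current centers.
    Guarantees radius <= 2 * OPT for symmetric (metric) distances."""
    n = len(dist)
    centers = [0]
    d_near = list(dist[0])  # distance of each point to its nearest center
    while len(centers) < k:
        far = max(range(n), key=lambda i: d_near[i])
        centers.append(far)
        d_near = [min(d_near[i], dist[far][i]) for i in range(n)]
    return centers, max(d_near)

# Demo: six points on a line at 0, 1, 2, 10, 11, 12 with k = 2.
pts = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in pts] for a in pts]
centers, radius = greedy_k_center(dist, 2)
```

On this instance the heuristic returns radius 2 while the optimum (centers at 1 and 11) is 1, illustrating the factor-2 gap; under asymmetric distances no such constant-factor argument survives.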

*Joint work with Nina Balcan and Nika Haghtalab.*

*Presented in Partial Fulfillment of the CSD Speaking Skills Requirement.*

A wide range of machine learning problems, including astronomical inference about galaxy clusters, natural image scene classification, parametric statistical inference, and predictions of public opinion, can be successfully tackled by learning a function on (samples from) distributions. One of the most effective classes of techniques for doing so is kernel methods. Applying these methods to large datasets, though, can be computationally challenging: typical methods take between quadratic and cubic time and storage in the number of input samples.

In this talk, we present methods for approximate embeddings into Euclidean spaces that approximate various classes of kernels between distributions, which allow for learning with time and space linear in the number of inputs. We first present an improved analysis of the workhorse tool in this area, random Fourier features à la Rahimi and Recht: we show that of the two variants of this approach in common usage, one is strictly better. Then, after showing how to use these methods for learning on distributions via the maximum mean discrepancy, we give a new class of distribution embeddings that allows machine learning on large datasets using the total variation, Jensen-Shannon, and Hellinger distances, expanding the set of modeling options available in such problems. Theoretical and empirical results will be given throughout.
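The two variants of random Fourier features in common usage can be sketched concretely for the Gaussian kernel k(x, y) = exp(-||x − y||² / 2σ²) (parameters below are illustrative; both feature maps are unbiased estimators of the kernel, and the talk's claim is that one variant is strictly better):

```python
import numpy as np

def rff_cos(X, W, b):
    """Variant 1: z(x) = sqrt(2/D) * cos(W x + b), with b ~ Unif[0, 2*pi)."""
    D = W.shape[0]
    return np.sqrt(2.0 / D) * np.cos(X @ W.T + b)

def rff_pair(X, W):
    """Variant 2: z(x) = sqrt(1/D) * [cos(W x), sin(W x)] (2D features)."""
    D = W.shape[0]
    P = X @ W.T
    return np.sqrt(1.0 / D) * np.hstack([np.cos(P), np.sin(P)])

# Demo: inner products of either feature map estimate the Gaussian kernel.
rng = np.random.default_rng(0)
d, D, sigma = 3, 2000, 1.0
W = rng.normal(scale=1.0 / sigma, size=(D, d))  # rows drawn from the spectral measure
b = rng.uniform(0.0, 2.0 * np.pi, size=D)
x, y = rng.normal(size=(1, d)), rng.normal(size=(1, d))
exact = np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))
approx_cos = (rff_cos(x, W, b) @ rff_cos(y, W, b).T).item()
approx_pair = (rff_pair(x, W) @ rff_pair(y, W).T).item()
```

With D = 2000 features both estimates land within a few percent of the exact kernel value; the improved analysis concerns how the estimators' variances compare as functions of ||x − y||.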

*Based on joint work with Junier Oliva, Barnabas Poczos, and Jeff Schneider.*

*Presented in Partial Fulfillment of the CSD Speaking Skills Requirement.*

Homotopy type theory is a new field of mathematics based on the recently-discovered correspondence between Martin-Löf’s constructive type theory and abstract homotopy theory. We have a powerful interplay between these disciplines: we can use geometric intuition to formulate new concepts in type theory and, conversely, use type-theoretic machinery to verify and often simplify existing mathematical proofs. Higher inductive types form a crucial part of this new system since they allow us to represent mathematical objects from a variety of domains, such as spheres, tori, pushouts, and quotients, in the type theory.

In this thesis we formulate and investigate a class of higher inductive types we call W-quotients, which generalize Martin-Löf’s well-founded trees to a higher-dimensional setting. We show that a propositional variant of W-quotients, whose computational behavior is determined up to a higher path, is characterized by the universal property of being a homotopy-initial algebra. As a corollary we derive that W-quotients in the strict form are homotopy-initial. A number of other interesting higher inductive types (such as spheres, suspensions, and type quotients) arise as special cases of W-quotients, and their characterization as homotopy-initial algebras thus follows as a corollary of our main result.

We investigate further higher inductive types (arbitrary truncations, set quotients, and groupoid quotients) and establish analogous characterizations in terms of the universal property of homotopy-initiality. Furthermore, we show that set and groupoid quotients can be recovered from W-quotients and truncations.
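For readers unfamiliar with the base construction: ordinary W-types, which W-quotients generalize, can be written down directly as an inductive type (sketched here in Lean 4 syntax; the path constructors of W-quotients have no counterpart in plain inductive definitions):

```lean
-- Martin-Löf's W-type of well-founded trees: a node has a shape `a : A`
-- and `B a`-many subtrees. W-quotients extend this schema with path
-- constructors, which ordinary inductive types cannot express.
inductive W (A : Type) (B : A → Type) : Type where
  | sup : (a : A) → (B a → W A B) → W A B

-- Example: the natural numbers as a W-type over a two-element shape type.
-- The shape `false` has no subtrees (zero); `true` has exactly one (successor).
def NatArity : Bool → Type
  | false => Empty
  | true  => Unit
```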

**Thesis Committee:**

Frank Pfenning (Co-Chair)

Steve Awodey (Co-Chair)

Robert Harper

Thierry Coquand (University of Gothenburg)

Nicola Gambino (University of Leeds)

We consider the task of visual zero-shot learning, in which a system must learn to recognize concepts omitted from the training set. While most prior work makes use of linguistic cues to do this, we do so by using a *pictorial language* representation of the training set, implicitly learned by a CNN, to generalize to new classes. We first demonstrate the robustness of pictorial language classifiers (PLCs) by applying them in a weakly supervised manner: labeling unlabeled concepts for visual classes present in the training data. Specifically, we show that a PLC built on top of a CNN trained for ImageNet classification can localize humans in Graz-02 and determine the pose of birds in PASCAL-VOC without extra labeled data or additional training. We then apply PLCs in a truly zero-shot manner, demonstrating that pictorial languages are expressive enough to detect a set of visual classes in MSCOCO that never appear in the ImageNet training set.

**Thesis Committee:**

Deva Ramanan

Kayvon Fatahalian

Although compression has been widely used for decades to reduce file sizes (thereby conserving storage capacity and network bandwidth when transferring files), there has been little to no use of compression within modern memory hierarchies. Why not? Especially as programs become increasingly data-intensive, the capacity and bandwidth within the memory hierarchy (including caches, main memory, and their associated interconnects) are becoming increasingly important bottlenecks. If data compression could be applied successfully to the memory hierarchy, it could potentially relieve pressure on these bottlenecks by increasing effective capacity, increasing effective bandwidth, and even reducing energy consumption.

In this thesis, I describe a new, practical approach to integrating data compression within the memory hierarchy, including on-chip caches, main memory, and both on-chip and off-chip interconnects. This new approach is fast, simple, and effective in saving storage space. A key insight in our approach is that access time (including decompression latency) is critical in modern memory hierarchies. By combining inexpensive hardware support with modest OS support, our holistic approach to compression achieves substantial improvements in performance and energy efficiency across the memory hierarchy. In addition to exploring compression-related issues and enabling practical solutions in modern CPU systems, we discover new problems in realizing hardware-based compression for GPU-based systems and develop new solutions to them.
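To give a flavor of why low-latency hardware compression is plausible, consider a base+delta encoding of a cache line (a simplified illustration, not the exact scheme of the thesis): values in a line are often numerically close, so one base plus narrow deltas shrinks the line, and decompression is a single addition per word.

```python
def bd_compress(words, delta_bytes=1):
    """Base+delta encoding: keep the first word as a base and store each
    word as a small signed delta from it. Returns (base, deltas) if every
    delta fits in `delta_bytes` bytes, else None (line stored raw)."""
    base = words[0]
    lo = -(1 << (8 * delta_bytes - 1))
    hi = (1 << (8 * delta_bytes - 1)) - 1
    deltas = [w - base for w in words]
    return (base, deltas) if all(lo <= d <= hi for d in deltas) else None

def bd_decompress(base, deltas):
    """Decompression is one addition per word -- low latency by design."""
    return [base + d for d in deltas]

# Demo: a "cache line" of eight 8-byte pointers into the same region
# compresses from 64 bytes to 8 (base) + 8 (one-byte deltas) = 16 bytes.
line = [0x7F2A00001000 + off for off in (0, 8, 16, 24, 64, 72, 80, 88)]
packed = bd_compress(line, delta_bytes=1)
```

The hardware analogue performs the delta checks and additions in parallel across the line, which is what keeps the scheme on the critical access path without hurting latency.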

**Thesis Committee:**

Todd C. Mowry (Co-Chair)

Onur Mutlu (Co-Chair)

Kayvon Fatahalian

David A. Wood (University of Wisconsin-Madison)

Douglas C. Burger (Microsoft)

Michael A. Kozuch (Intel)

How can computers help ordinary people make collective decisions about real-life dilemmas, like which restaurant to go to with friends, or even how to divide an inheritance? I will present an optimization-driven approach that draws on ideas from AI, theoretical computer science, and economic theory, and illustrate it through my research in computational social choice and computational fair division. In both areas, I will make a special effort to demonstrate how fundamental theoretical questions underlie the design and implementation of deployed services that are already used by tens of thousands of people (spliddit.org), as well as upcoming services (robovote.org).
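As a minimal, concrete instance of computational social choice (illustrative only; the deployed services named above use more sophisticated optimization-based methods), here is a positional scoring rule aggregating friends' restaurant rankings:

```python
def borda_winner(rankings):
    """Borda count over m alternatives: position i in a ranking earns
    m - 1 - i points; the alternative with the highest total wins."""
    m = len(rankings[0])
    scores = {}
    for ranking in rankings:
        for pos, alt in enumerate(ranking):
            scores[alt] = scores.get(alt, 0) + (m - 1 - pos)
    winner = max(sorted(scores), key=lambda a: scores[a])  # ties: alphabetical
    return winner, scores

# Demo: three friends rank three restaurants.
rankings = [["thai", "sushi", "pizza"],
            ["sushi", "thai", "pizza"],
            ["pizza", "thai", "sushi"]]
winner, scores = borda_winner(rankings)
```

Even this toy rule raises the kinds of questions the talk addresses: which scoring rule best recovers a "ground truth" ranking from noisy votes, and how should ties and incentives be handled in a deployed system?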

**Thesis Committee:**

Ariel D. Procaccia (Chair)

Nina Balcan

Avrim Blum

Tuomas Sandholm

Vincent Conitzer (Duke University)

Error-correcting codes were originally developed in the context of reliable delivery of data over a noisy communication channel and continue to be widely used in communication and storage systems. Over time, error-correcting codes have also been shown to have several exciting connections to areas in theoretical computer science. Recently, there have been several advances, including new constructions of efficient codes as well as coding for different settings. This thesis proposal explores several new directions in modern coding theory. To this end, we:

- Provide a theoretical analysis of polar codes, a breakthrough by Arikan in 2008. We show that polar codes over prime alphabets are the first explicit construction of efficient codes to provably achieve Shannon capacity for symmetric channels with polynomially fast convergence. We introduce interesting new techniques involving entropy sumset inequalities, an entropic analogue of the sumset inequalities studied in additive combinatorics.
- Consider the recent problem of coding for two-party interactive communication, in which two parties wish to execute a protocol over a noisy interactive channel. Specifically, we provide an explicit interactive coding scheme for oblivious adversarial errors and bridge the gap between channel capacities for interactive communication and one-way communication. We also consider a related question involving one-way communication with partial feedback.
- Study the problem of list decodability for codes. We resolve an open question about the list decodability of random linear codes and show surprising connections to the field of compressed sensing, in which we provide improved bounds on the number of frequency samples needed for exact reconstruction of sparse signals (improving upon work of Candès-Tao and Rudelson-Vershynin).
- Study locally-testable codes and affine invariance in codes. Specifically, we investigate the limitations posed by local testability, which has served as an important notion in the study of probabilistically checkable proofs (PCPs) and hardness of approximation.
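For readers new to the area, the classical Hamming(7,4) code illustrates the basic mechanics that the constructions above vastly generalize: a syndrome computed from a parity-check matrix pinpoints and corrects any single bit flip (the matrices below are the standard systematic form, shown for illustration):

```python
import numpy as np

# Systematic generator and parity-check matrices for Hamming(7,4) over GF(2).
G = np.array([[1, 0, 0, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])
H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])

def encode(msg):
    """Map 4 message bits to a 7-bit codeword."""
    return (np.array(msg) @ G) % 2

def decode(word):
    """Correct up to one flipped bit: a nonzero syndrome equals the column
    of H at the error position. Returns the 4 message bits."""
    word = np.array(word).copy()
    syndrome = (H @ word) % 2
    if syndrome.any():
        err = next(i for i in range(7) if np.array_equal(H[:, i], syndrome))
        word[err] ^= 1
    return [int(bit) for bit in word[:4]]

# Demo: encode, flip one bit in transit, decode.
msg = [1, 0, 1, 1]
noisy = encode(msg)
noisy[2] ^= 1
```

Polar codes, interactive coding schemes, and list-decodable codes all push far beyond this single-error regime, but the encode/syndrome/decode structure is the common starting point.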

**Thesis Committee:**

Venkatesan Guruswami (Chair)

Bernhard Haeupler

Gary Miller

Emmanuel Abbe (Princeton University)