# CSD

Solving optimization problems on large distributed systems has been studied in great depth owing to its usefulness in internet protocols. Problems such as Minimum Spanning Tree and Min-Cut in this setting have a long history of continuously improving algorithms and lower bounds. A model of particular interest is the one in which all nodes communicate in synchronous rounds using small, O(log n)-sized, messages. However, development appeared to reach a stalemate when matching upper and lower bounds of Θ̃(√n + diameter) rounds were ascertained. Nevertheless, real-world graphs do not exhibit the properties underlying the strong lower bound and can typically be handled more efficiently. I will introduce the Shortcut framework, which can be used to tackle this question. The framework allows us to easily reason about and design otherwise fairly complex distributed algorithms. As an important consequence, it was used to construct the first sub-√n-round protocols for several important graph classes, such as planar and bounded-treewidth graphs.

—

Goran Zuzic is a second-year PhD student in CSD at CMU. He is advised by Bernhard Haeupler.

Fairly dividing goods among several (possibly strategic) parties is a common problem in both economics and computer science. For instance, a government may wish to divide tracts of land among various organizations, or a cloud computing environment may wish to divide computing time among several users. Often, such situations are dealt with via monetary transfers, such as by auctioning off the desired goods; however, we are concerned with the case where such transfers are not allowed.

In this proposal we expound upon our work in both the divisible and the indivisible settings. We start with an examination of our work, and future problems to tackle, in classical envy-free cake-cutting, as well as the often-ignored game-theoretic aspects of this area. We then move on to the indivisible setting and present our results on the MMS guarantee, along with two new properties we coin PMMS and EFX. Improving the guaranteeable approximations is the main open problem regarding these properties. We end with a discussion of our applications of fair-division techniques to real-world problems: classroom allocation among schools, and the peer review process.

**Thesis Committee:**

Ariel Procaccia (Chair)

Manuel Blum

Tuomas Sandholm

Ioannis Caragiannis (University of Patras)

Wearable cognitive assistance applications can provide guidance for many facets of a user's daily life. This thesis targets the enabling of a new genre of such applications that require both heavy computation and very low response time on inputs from mobile devices. The core contribution of this thesis is the design, implementation, and evaluation of Gabriel, an application framework that simplifies the creation of and experimentation with this new genre of applications. An essential capability of this framework is to use cloudlets for computation offloading to achieve low latency. By implementing several prototype applications on top of Gabriel, the thesis evaluates the system performance of Gabriel under various system conditions. It also shows how Gabriel is capable of exploiting coarse grain parallelism on cloudlets to improve system performance, and conserving energy on mobile devices based on user context.

**Thesis Committee:**

Mahadev Satyanarayanan (Chair)

Daniel P. Siewiorek

Martial Hebert

Padmanabhan Pillai (Intel)

Unsupervised learning is widely recognized as one of the most important challenges facing machine learning nowadays. However, unlike supervised learning, our current theoretical understanding of those tasks, and in particular of clustering, is very rudimentary. Although hundreds of clustering papers are published every year, there is hardly any work reasoning about clustering independently of any particular algorithm, objective function, or generative data model. My talk will focus on such clustering research. I will discuss two aspects in which theory could play a significant role in guiding the use of clustering tools. The first is model selection: how should a user pick an appropriate clustering tool for a given clustering problem, and how should the parameters of such an algorithmic tool be tuned? In contrast with other common computational tasks, in clustering, different algorithms often yield drastically different outcomes. Therefore, the choice of a clustering algorithm may play a crucial role in the usefulness of the output clustering. However, there currently exists no methodical guidance for selecting a clustering tool for a given task.

I will describe some recent proposals aiming to address this crucial lacuna. The second aspect of clustering that I will address is the complexity of computing a cost-minimizing clustering (given some clustering objective function). Once a clustering model (or objective) has been picked, the task becomes an optimization problem. While most clustering objective optimization problems are computationally infeasible, they are carried out routinely in practice. This theory-practice gap has attracted significant research attention recently. I will describe some of the theoretical attempts to address this gap and discuss how close they bring us to a satisfactory understanding of the computational resources needed for achieving good clustering solutions.

—

Shai Ben-David is a professor in the School of Computer Science at the University of Waterloo. His research interests span a wide spectrum of topics in the foundations of computer science and its applications, with a particular emphasis on statistical and computational machine learning.

*Sponsored by Yahoo! Labs and the Simons Foundation.*

For many important machine learning problems, the rapid growth of data and the ever-increasing model complexity (which often manifests itself in a large number of model parameters) mean that no single machine can solve them fast enough. Distributed optimization and inference are therefore becoming increasingly inevitable for solving large-scale machine learning problems in both academia and industry. Obtaining an efficient distributed implementation of an algorithm, however, is far from trivial. Both intensive computational workloads and large volumes of data communication demand careful design of distributed computing systems and distributed machine learning algorithms.

In this thesis, we focus on the co-design of distributed computing systems and distributed optimization algorithms that are specialized for large machine learning problems. We propose two distributed computing frameworks: a parameter server framework which features efficient data communication, and MXNet, a multi-language library aiming to simplify the development of deep neural network algorithms. In less than two years, we have witnessed the wide adoption of the proposed systems. We believe that as we continue to develop these systems, they will enable more people to take advantage of the power of distributed computing to design efficient machine learning applications that solve large-scale computational problems. Leveraging the two computing platforms, we examine a number of distributed optimization problems in machine learning. We present new methods to accelerate the training process, such as data partitioning with better locality properties, communication-friendly optimization methods, and more compact statistical models. We implement the new algorithms on the two systems and test them on large-scale real data sets. We successfully demonstrate that careful co-design of computing systems and learning algorithms can greatly accelerate large-scale distributed machine learning.

**Thesis Committee:**

David G. Andersen (Co-Chair)

Alexander J. Smola (Co-Chair)

Barnabás Póczos

Ruslan Salakhutdinov

Jeffrey Dean (Google, Inc.)

When building datasets for supervised machine learning problems, data is often labeled manually by human annotators. In domains like medical imaging, acquiring labels can be prohibitively expensive. Both active learning and crowdsourcing have emerged as ways to frugally label datasets. In active learning, there has been recent interest in algorithms that exploit the data's structure to direct querying. When learning from crowds, one must balance the accuracy and cost of different teachers when gathering labels; weak teachers are assumed to be most accurate when labeling samples from label-homogeneous regions of space.

In this thesis, we explore how the data's structure can be leveraged for both of these techniques. The sequential probability ratio test (SPRT) provides the backbone for our contributions. Using the SPRT, we provide a cluster-based active learning algorithm to find a small, homogeneous partitioning of the data. We also use the SPRT to measure the confidence of a weak teacher's label by analyzing its estimates on neighboring labels. The optimality of the SPRT allows the algorithms to inherently minimize the average number of queries required before their termination.
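As a minimal illustration of the SPRT machinery underlying both contributions, the sketch below decides whether a stream of binary labels comes from a mostly-0 or mostly-1 region. It is a hedged sketch of the classical Wald test, not the thesis's implementation; the hypothesized label rates `p0`, `p1` and error tolerances `alpha`, `beta` are illustrative parameters.

```python
import math

def sprt_bernoulli(samples, p0=0.1, p1=0.9, alpha=0.05, beta=0.05):
    """Sequential probability ratio test between H0: Pr[label=1] = p0 and
    H1: Pr[label=1] = p1 over a stream of binary labels.
    Returns (decision, number_of_queries_consumed)."""
    lower = math.log(beta / (1 - alpha))   # accept H0 at or below this
    upper = math.log((1 - beta) / alpha)   # accept H1 at or above this
    llr = 0.0                              # running log-likelihood ratio
    for n, x in enumerate(samples, start=1):
        # Update for one Bernoulli observation.
        llr += math.log((p1 if x else 1 - p1) / (p0 if x else 1 - p0))
        if llr <= lower:
            return "H0", n
        if llr >= upper:
            return "H1", n
    return "undecided", len(samples)
```

With these illustrative parameters, each observed label shifts the log-likelihood ratio by log 9 ≈ 2.20, so two consecutive identical labels already cross the log 19 ≈ 2.94 threshold: homogeneous regions are resolved after very few queries, which is exactly the query-frugality the optimality of the SPRT buys.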

**Thesis Committee:**

Christopher J. Langmead

Carl Kingsford

**Faculty Host:** Gary Miller

We present fully dynamic algorithms for graph sparsification problems. The algorithms allow both edge insertions and edge deletions. For cut sparsification, the algorithm takes poly-logarithmic amortized update time.

*Joint work with I. Abraham, D. Durfee, S. Krinninger, R. Peng*

**Faculty Host:** Gary Miller

Motivated by the significantly higher cost of writing than reading in emerging memory technologies, we consider parallel algorithm design under such asymmetric read-write costs, with the goal of reducing the number of writes while preserving work-efficiency and low span. We present a nested-parallel model of computation that combines (i) small per-task stack-allocated memories with symmetric read-write costs and (ii) an unbounded heap-allocated shared memory with asymmetric read-write costs, and show how the costs in the model map efficiently onto a more concrete machine model under a work-stealing scheduler. We use the new model to design reduced-write, work-efficient, low-span parallel algorithms for a number of fundamental problems such as reduce, list contraction, tree contraction, breadth-first search, ordered filter, and planar convex hull. For the latter two problems, our algorithms are output-sensitive in that the work and number of writes decrease with the output size. We also present a reduced-write, low-span minimum spanning tree algorithm that is nearly work-efficient (off by the inverse Ackermann function). Our algorithms reveal several interesting techniques for significantly reducing shared memory writes in parallel algorithms without asymptotically increasing the number of shared memory reads.
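To make the asymmetric-cost accounting concrete, here is a back-of-the-envelope sketch of why blocking reduces write cost in such a model. The write-cost parameter `w`, the block size, and the cost formulas are hypothetical illustrations of the two-level model (cheap stack-allocated temporaries vs. expensive shared-memory writes), not the paper's algorithms.

```python
def reduce_costs(n, w, block_size):
    """Compare the cost of summing n values when a shared-memory write
    costs w and a read costs 1 (illustrative asymmetric-cost accounting)."""
    # Naive: flush every running partial sum back to shared memory,
    # paying the expensive write cost n times.
    naive = n * 1 + n * w
    # Blocked: accumulate each block in a small stack-allocated temporary
    # (symmetric, cheap), then write one partial sum per block.
    blocked = n * 1 + (n // block_size) * w
    return naive, blocked
```

For example, with n = 1024 values, write cost w = 100, and blocks of 32, the naive scheme pays 103,424 units while the blocked scheme pays 4,224: the shared-memory write term drops by the block-size factor while the read term is unchanged, which is the flavor of trade-off the reduced-write algorithms above exploit.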

—

Charlie McGuffey is a second-year PhD student working with Phil Gibbons.

In Massive Open Online Courses (MOOCs), TA resources are limited. To cope with this, MOOCs use peer assessments to grade assignments and then use TA resources to check over some of these assignments. We present a model that captures the challenge of allocating the limited TA resources. We discuss what makes this problem difficult, and then present an algorithm that computes an approximate solution.

*Presented in Partial Fulfillment of the CSD Speaking Skills Requirement.*