Due to the rapid growth of data and the ever increasing model complexity, which often manifests itself in the large number of model parameters, today, many important machine learning problems cannot be efficiently solved by a single machine. Distributed optimization  and inference is becoming more and more inevitable for solving large scale machine learning  problems in both academia and industry. However, obtaining an efficient distributed implementation of an algorithm, is far from trivial. Both intensive computational workloads and the volume of data communication demand careful design of distributed computation systems  and distributed machine learning algorithms. In this thesis, we focus on the co-design of distributed computing systems and distributed optimization algorithms that are specialized for large machine learning problems.<p>

In the first part, we propose two distributed computing frameworks: Parameter Server, a distributed machine learning framework that features efficient data communication between the machines; MXNet, a multi-language library that aims to simplify the development of deep neural network algorithms. We have witnessed the wide adoption of the two proposed systems in the past two years. They have  nabled and will continue to enable more people to harness the power of distributed  computing to design efficient large-scale machine learning applications.<p>

In the second part, we examine a number of distributed optimization problems in machine learning, leveraging the two computing platforms. We present new methods to accelerate the training process, such as data partitioning with better locality properties, communication friendly optimization methods, and more compact statistical models. We implement the new algorithms on the two systems and test on large scale real data sets. We successfully demonstrate that careful co-design of computing systems and learning algorithms can greatly accelerate large scale distributed machine learning.

Thesis Committee:
David G. Andersen (Co-Chair)
Alexander J. Smola (Co-Chair)
Ruslan Salakhutdinov
Barnabas Poczos
Jeffrey Dean (Google, Inc.)

We consider the following basic problem: given an n-variate degree-d homogeneous polynomial f with real coefficients, compute a unit vector x∈ℝn that maximizes ∣f(x)∣. Besides its fundamental nature, this problem arises in many diverse contexts ranging from tensor and operator norms to graph expansion to quantum information theory. The homogeneous degree 2 case is efficiently solvable as it corresponds to computing the spectral norm of an associated matrix, but the higher degree case is NP-hard. In this work, we give multiplicative approximation algorithms for this problem. Our algorithms leverage the tractability of the degree 2 case, and output the best solution among a carefully constructed set of quadratic polynomials. They offer a trade-off between the approximation ratio and running time, which is governed by the number of quadratic problems we search over. Specifically, in nO(q) time, we get an approximation factor of Od((n/q)d/2-1) for arbitrary polynomials, and Od((n/q)d/4-1/2) for polynomials with non-negative coefficients. The approximation guarantees are with respect to the optimum of the level-q SoS SDP relaxation of the problem. We also consider the case when f is random with independent ±1 coefficients, and prove that w.h.p the level-q SoS solution gives a certificate within factor O~d((n/q)d/4-1/2) of the optimum. We complement our algorithmic results with some polynomially large integrality gaps for d-levels of the SoS relaxation. To obtain our results, we develop general techniques which help analyze the approximation obtained by higher levels of the SoS hierarchy. We believe these techniques will also be useful in understanding polynomial optimization for other constrained settings.

Vijay Bhattiprolu is a third-year PhD student in CSD, advised by Venkatesan Guruswami and Gary Miller.

Joint work with Mrinalkanti Ghosh, Venkatesan Guruswami, Euiwoong Lee and Madhur Tulsiani.

Supported in part to the generous support of Yahoo! Labs and the Simons Foundation.

There are many scenarios where dividing control of a networked system into multiple independent control planes is natural and desirable. For example, inter-domain routing should not require a single centralized control plane, but rather utilize independent control planes, allowing each autonomous system to define policies for their own network. This division of control propelled Internet adoption through inter-domain protocols like BGP. However, control plane division often comes at a cost: as no single entity has full information of all resources and policies, correctness, performance, and fault tolerance may suffer. BGP, for example, has been shown to provide non-optimal routes as well as convergence issues during failures. This is further complicated as multiple independent control planes make multiple independent decisions which may have conflicting resource allocation.

Clearly, multiple independent control planes can work in harmony, as we’ve seen many examples in the wild and in research, such as multiple BGP instances, OSPF areas, and OSPF fibbing. However, such examples have ad hoc designs. To our knowledge there has been no comprehensive look into characterizing the types of divisions that define multiple control planes, as well as examples of how independent control planes communicate to achieve their overall goals despite these divisions. We argue that there are three fundamental ways to divide control planes: 1) spatially (are data plane resources shared, overlapping, or disjoint?), 2) temporally (do the control planes operate at different timescales?), and 3) administratively (can the control planes share policy information?). Using these three divisions as the axes of a design space, the amount of variety of across different problems and solutions quickly become apparent.

In this work we examine three points within the design space that are not covered by existing split control planes in practice or recent research, to examine how each axis affects cooperation. We look at 1) delivering live video over the wide-area in CDNs, 2) optimizing content delivery across multiple CDNs using a content broker, 3) and rearchitecting the datacenter network stack to take advantage of reconfigurable network topologies. Due to their differing points in the design space, we find a different mechanism aids in cooperation for each. Namely, prioritization and isolation, an ad exchange, and cross-layer optimization.

Thesis Committee:
Srini Seshan (Chair)
Peter Steenkiste
Vyas Sekar
Bruce Maggs (Duke University/Akamai Technologies Inc.)

Copy of Thesis Summary

Ordered sets (and maps when data is associated with each key) are one of the most important and useful data types. The set-set functions union, intersection and difference are particularly useful in certain applications. Brown and Tarjan first described an algorithm for these functions, based on 2-3 trees, that meet the optimal O(mlog(n/m + 1)) time bounds in the comparison model (n and m n are the input sizes). Later Adams showed very elegant algorithms for the functions, and others, based on weight-balanced trees. They only require a single function that is specific to the balancing scheme--a function that joins two balanced trees--and hence can be applied to other balancing schemes. Furthermore the algorithms are naturally parallel. However, in the twenty-four years since, no one has shown that the algorithms, sequential or parallel are asymptotically work optimal.

In this paper we show that Adams' algorithms are both work efficient and highly parallel (polylog span) across four different balancing schemes--AVL trees, red-black trees, weight balanced trees and O(mlog(n/m + 1)) treaps. To do this we use careful, but simple, algorithms for JOIN that maintain certain invariants, and our proof is (mostly) generic across the schemes. To understand how the algorithms perform in practice we have also implemented them (all code except JOIN is generic across the balancing schemes). Interestingly the implementations on all four balancing schemes and three set functions perform similarly in time and speedup (more than 45x on 64 cores). We also compare the performance of our implementation to other existing libraries and algorithms.

Joint work with Guy Blelloch and Daniel Ferizovic

Presented in Partial Fulfillment of the CSD Speaking Skills Requirement.

This thesis research presents content delivery optimization techniques in the eXpressive Internet Architecture (XIA). First, we propose an intradomain CID routing protocol that achieves performance objectives of reducing request latency for ISP’s customers by routing content requests off-path br />to nearby caches. This is beneficial for ISP since it helps them attract more customer through offering content delivery with lower latency. The challenges for the CID routing design are that 1) scalability of the protocol to a large number of CIDs in the domain and 2) resistance to cache churn.

To address the first challenge, we propose 1) scoping the advertisements and 2) advertise the delta of locally cached CIDs. We evaluate the scalability of the protocol by comparing it against traditional link-state and distance vector routing protocol. The metrics we used for comparison are: 1) size and the number of advertisements messages, 2) efficiency of route computations, and 3) space consumptions. For churn resistance, we show that advertising content that would stay in cache longer during the event of high cache churn can reduce the number of failed requests in a domain. We compare this benefit against the performance objective of CID routing and show that advertising popular content can be a good strategy to reduce the number of invalid CID routes during high cache churn.

The second part of the thesis discusses an interdomain cache management scheme. We discuss a cache sharing model that allow regional peering ISPs to collaboratively share their cache space to reduce cost of interdomain data exchange, and the tradeoffs between performance, availability, and bandwidth consumption. Our results show that replicating the top few popular content locally in the domain achieves a good balance between these criteria for coordinating ISPs. Finally, we present a cache sharing algorithm that distributes the request load evenly among coordinating domains without incurring explicit message exchange. We evaluate this approach against the oracle that would require coordination overhead but approximate the optimal solution very closely. The result shows that the caching decision output by our approach differs little from the oracle in terms of the cache hit rate for each domain.

Thesis Committee: Srinivasan Seshan, Peter Steenkiste

Copy of Thesis Document

We believe that it is essential for robots that coexist with humans to be able to interact with their users in a seamless way.  This thesis advocates the use of language as a rich and natural interface for human-robot interaction.  Previous work on language-based human-robot interaction has extensively focused on enabling robots to understand spatial language in the context of users giving step-by-step directions to robots.  We assume a  mobile  service  robot,  like  our  CoBot,  is equipped  with  a  map  of  its environment and is able to autonomously navigate to a desired goal position.  This thesis will address the problem of user-to-robot interaction, but is going to assume users provide a high-level specification of the task (e.g.,  "Take  the package  to  the  small-size  lab  and  then  bring  me  tea") rather than step-by-step navigational instructions. 

Furthermore the thesis will focus on a novel robot-to-user interaction where the robot is able to adapt to different users, to answer user queries about its state (present, past or future), and to proactively take information-providing actions (i.e., reporting on the outcome of a task after finishing its execution).  Summing up, this thesis will contribute a novel language-based bidirectional interaction approach for mobile service robot:  from users to the robot and vice-versa.  We will evaluate the work in real and extensive real-data constructed simulated environments.

Thesis Committee:
Manuela Veloso (Chair)
Jaime Carbonell
Stephanie Rosenthal
Xiao-Ping Chen (University of Science and Technology of China)

Copy of Thesis Summary 

Recent progresses on a number of combinatorial and numerical problemsvbenefited from combining ideas and techniques from both fields to design faster and more powerful algorithms. A prime example is the field of spectral graph theory, which involves the interplay between combinatorial graph algorithms with numerical linear algebra, and this led to the first nearly linear time solvers for symmetric and diagonally dominant (SDD) linear systems.

In this thesis proposal we focus on a number of combinatorial building blocks of spectral graph theory as well as their applications in solving SDD linear systems. We give new (and often parallel) algorithms for low diameter decompositions, low stretch tree embeddings, graph spanners, and combinatorial sparsifiers. We propose to improve our techniques developed in solving the above problems to data sensitive metrics, in particular the nearest neighbor metric or equivalently the edge squared metric, when vertices are samples from some probability distribution. We
also propose to investigate spectral methods for partitioning probability distributions in Euclidean domains from random samples.

Thesis Committee:
Gary Miller (Chair)
Bernhard Haeupler
Daniel Sleator
Noel J. Walkington
Ioannis Koutis (University of Puerto Rico)

Copy of Proposal Document


Subscribe to CSD