# CSD

Human experts as autonomous agents in a referral network must decide whether to accept a task or refer to a more appropriate expert, and if so to whom. In order for the referral network to improve over time, the experts must learn to estimate the topical expertise of other experts. This thesis extends concepts from Multi-agent Reinforcement Learning and Active Learning to referral networks.

Among a wide array of algorithms evaluated, Distributed Interval Estimation Learning (DIEL), based on Interval Estimation Learning, proved more promising for learning appropriate referral choices than Greedy, Q-learning, and UCB methods. DIEL's rapid performance gain in the early phase of learning makes it a practically viable algorithm, including when multiple referral hops are allowed. In addition to a synthetic data set, we compare the performance of several top-performing referral algorithms on a referral network of high-performance Stochastic Local Search (SLS) SAT solvers and demonstrate that the referral learning algorithms can learn appropriate referral choices in the real task of solving satisfiability problems, where expertise does not obey any known parameterized distribution. Apart from evaluating overall network performance, we conduct a robustness analysis across the learning algorithms, with an emphasis on capacity constraints, evolving networks (i.e., with known experts dropping off and new experts of unknown performance entering), and expertise drift --- situations that often arise in real-world scenarios but are largely ignored in the Active Learning literature.

In an augmented learning setting, where experts may report their top skills to their colleagues, we propose three algorithms: proactive-DIEL, proactive-Q-Learning, and proactive-ε-Greedy. All three exhibit robustness to noisy self-skill estimates and are resilient to evolving networks and strategic misreporting.
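The core idea behind Interval Estimation action selection can be sketched simply: for each colleague, an expert keeps running reward statistics and refers to the colleague whose confidence interval on mean reward has the highest upper bound, so uncertain colleagues get explored while reliably strong ones get exploited. A minimal illustrative sketch, not the thesis's exact DIEL implementation (the class and parameter names here are invented for illustration):

```python
import math

class IntervalEstimate:
    """Running reward statistics for one referral target (Welford's method)."""
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, reward):
        self.n += 1
        delta = reward - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (reward - self.mean)

    def upper_bound(self, z=2.0):
        # Upper end of a confidence interval on the mean reward.
        if self.n < 2:
            return float("inf")  # unexplored targets look maximally promising
        var = self.m2 / (self.n - 1)
        return self.mean + z * math.sqrt(var / self.n)

def choose_referral(estimates):
    """Refer to the colleague with the highest interval upper bound."""
    return max(estimates, key=lambda e: estimates[e].upper_bound())
```

Because an unexplored colleague's interval is unbounded, new entrants to an evolving network are tried quickly, which is one intuition behind DIEL's rapid early-phase gains.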

**Thesis Committee:**

Jaime Carbonell (Chair)

Manuel Blum

Manuela Veloso

Victor Lesser (University of Massachusetts, Amherst)


Planning is a problem in Computer Science with many applications: it can be used to automate robotic tasks or to beat human players in games, for example. However, it has proved very difficult to devise programs that plan well, even approximately. Here, we tackle the problem by a different route: we ask whether we can improve our ability to plan approximately by using quantum mechanics to model classical systems.

The idea of planning using quantum systems may have some merit. Quantum systems make different assumptions than classical systems, so to plan in a quantum system we have to build a new model. By building a new model and planning in it, we increase the scope of problems we can plan on. There is also hope that planning in quantum models might be simpler than in classical models, especially when we use a compact basis to plan approximately.

We formulate a new quantum model for planning, give planning algorithms for it, and show experimental results, assuming only elementary knowledge of linear algebra. To formulate the model, we first review discrete quantum mechanics to build an understanding of how quantum systems work. We then derive the QuaMDP planning model and show one way to plan in it. We also show one way to generate QuaMDP models from a dynamical system, given its potential energy function. Finally, we test how well our algorithms can plan approximately on this new model, to find its inherent weaknesses and strengths.

**Thesis Committee:**

Geoff Gordon

Gary Miller

Programmers, like pen-and-paper mathematicians, often seek to define new syntactic sugar that lowers the syntactic cost of common idioms. Unfortunately, the most direct mechanisms of syntactic control available to programmers do not support modular reasoning about determinism (i.e. separately defined forms can conflict syntactically with one another), and they obscure the type and binding structure of the program. We argue therefore that these mechanisms are unreasonable for programming "in the large". Typed term-rewriting macro systems (e.g. as available in Scala) are more reasonable, but afford programmers limited syntactic control.

This thesis formally describes typed syntax macros (TSMs), which give library providers programmatic control over both the parsing and expansion of expressions and patterns of generalized literal form at a specified type or parameterized family of types. Expansion is strictly hygienic, meaning that 1) expansions can refer to external bindings only via spliced terms or supplied module parameters; and 2) expansions do not reveal internal bindings to spliced terms or to the remainder of the program. We argue that this mechanism occupies a "sweet spot" in the design space, in that it captures many common syntactic idioms while avoiding the problem of syntactic conflicts by construction and supplying clients with clear abstract reasoning principles.

**Thesis Committee:**

Jonathan Aldrich (Chair)

Robert Harper

Karl Crary

Eric Van Wyk (University of Minnesota)

I will explain the cheat sheet technique in query complexity, and show how to use it to get a power 2.5 separation between classical and quantum computation (beating the quadratic speedup of Grover search). I'll then discuss the connections between query and communication complexity, and present recent applications of the cheat sheet technique in the communication complexity setting.

—

Shalev Ben-David is a PhD student in his last year at MIT.

*Supported in part by the Simons Foundation*

The spectra of signed matrices have played a fundamental role in social sciences, graph theory and control theory. They have been key to understanding balance in social networks, to counting perfect matchings in bipartite graphs, and to analyzing robust stability of dynamic systems involving uncertainties. More recently, the results of Marcus et al. have shown that an efficient algorithm to find a signing of a given adjacency matrix that minimizes the largest eigenvalue could immediately lead to efficient construction of Ramanujan expanders.

Motivated by these applications, this talk investigates natural spectral properties of signed matrices and addresses the computational problems of identifying signings with these spectral properties. There are three main results: (a) NP-completeness of three signing-related problems, with (negative) implications for efficiently constructing expander graphs; (b) a complete characterization of graphs whose signed adjacency matrices are all singular, which implies a polynomial-time algorithm to verify whether a given matrix has an invertible signing; and (c) a polynomial-time algorithm to find a minimum increase in the support of a given symmetric matrix so that it has an invertible signing.
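To make the flavor of these questions concrete, the naive way to decide whether a graph has an invertible signing is to try every sign pattern on its edges, which takes time exponential in the number of edges; result (b) is what replaces this brute force with a polynomial-time check. A small illustrative sketch (function names are invented here):

```python
from itertools import product

def det(m):
    """Integer determinant by cofactor expansion (fine for tiny matrices)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total

def has_invertible_signing(n, edges):
    """Brute force over all 2^|E| symmetric signings of the adjacency matrix.

    Illustrative only: the talk's characterization yields a polynomial-time
    verification instead of this exponential search.
    """
    edges = list(edges)
    for signs in product((1, -1), repeat=len(edges)):
        a = [[0] * n for _ in range(n)]
        for s, (u, v) in zip(signs, edges):
            a[u][v] = s
            a[v][u] = s  # symmetric signing
        if det(a) != 0:
            return True
    return False
```

For instance, every signing of the path on three vertices is singular, while any signing of a triangle has determinant ±2 and is therefore invertible.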

—


Mechanized reasoning has proved effective in avoiding serious mistakes in software and hardware, and yet remains unpopular in the practice of mathematics. My thesis aims to make mechanization easier so that more mathematicians can benefit from this technology. In particular, I experimented with higher-dimensional types, an extension of ordinary types with a hierarchy of stacked relations, and managed to mechanize many important results from classical homotopy theory in the proof assistant Agda. My work thus suggests that higher-dimensional types may help mechanize mathematical concepts.

**Thesis Committee:**

Robert Harper (Chair)

Frank Pfenning

Jeremy Avigad

Andrej Bauer (Univerza v Ljubljani)

Ulrik Buchholtz (Technische Universität Darmstadt)

Daniel R. Licata (Wesleyan University)

In this talk, we consider the problem of estimating the mean and covariance of a distribution from iid samples in *R^n* in the presence of an *eta* fraction of malicious noise; this is in contrast to much recent work, where the noise itself is assumed to come from a distribution of known type. We give polynomial-time algorithms to compute estimates for the mean, the covariance, and the operator norm of the covariance matrix, and show that the dimensional dependence of the error is optimal up to an *O(\sqrt{\log n})* factor. This gives polynomial-time solutions to some of the questions studied in robust statistics. As one application, it immediately enables agnostic SVD.

*This is a joint work with Kevin Lai and Santosh Vempala.*
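A toy illustration of the failure mode that motivates this line of work: a single adversarially placed sample can move the empirical mean arbitrarily far, while even a simple robust baseline such as the coordinate-wise median resists it. The baseline below is not the talk's algorithm — its error grows with the dimension, whereas the talk's estimators are near-optimal — but it shows what "robust to an *eta* fraction of malicious noise" means:

```python
import statistics

def mean_vec(points):
    """Empirical mean: optimal without noise, ruined by one malicious sample."""
    d, n = len(points[0]), len(points)
    return [sum(p[i] for p in points) / n for i in range(d)]

def coordinatewise_median(points):
    """Simple robust baseline: the median in each coordinate tolerates a small
    fraction of arbitrary outliers, at the cost of dimension-dependent error."""
    d = len(points[0])
    return [statistics.median(p[i] for p in points) for i in range(d)]
```

With four samples near the origin and one corrupted sample at (10^6, 10^6), the empirical mean is dragged to roughly (2·10^5, 2·10^5) while the coordinate-wise median stays at the origin.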

Due to the rapid growth of data and ever-increasing model complexity, which often manifests itself in a large number of model parameters, many important machine learning problems today cannot be solved efficiently on a single machine. Distributed optimization and inference are becoming indispensable for solving large-scale machine learning problems in both academia and industry. However, obtaining an efficient distributed implementation of an algorithm is far from trivial. Both intensive computational workloads and the volume of data communication demand careful design of distributed computation systems and distributed machine learning algorithms. In this thesis, we focus on the co-design of distributed computing systems and distributed optimization algorithms specialized for large machine learning problems.

In the first part, we propose two distributed computing frameworks: Parameter Server, a distributed machine learning framework featuring efficient data communication between machines, and MXNet, a multi-language library that aims to simplify the development of deep neural network algorithms. We have witnessed wide adoption of both systems over the past two years. They have enabled, and will continue to enable, more people to harness the power of distributed computing to build efficient large-scale machine learning applications.

In the second part, we examine a number of distributed optimization problems in machine learning, leveraging the two computing platforms. We present new methods to accelerate the training process, such as data partitioning with better locality properties, communication-friendly optimization methods, and more compact statistical models. We implement the new algorithms on the two systems and test them on large-scale real data sets. We demonstrate that careful co-design of computing systems and learning algorithms can greatly accelerate large-scale distributed machine learning.
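The push/pull interface at the heart of the parameter-server abstraction can be sketched in a few lines. This single-process version is illustrative only — the names, the SGD update, and the toy linear model are invented here, and the real Parameter Server distributes servers and workers over the network with flexible consistency models:

```python
import threading

class ParameterServer:
    """Minimal single-process sketch of the push/pull parameter-server pattern."""
    def __init__(self, dim, lr=0.1):
        self.weights = [0.0] * dim
        self.lr = lr
        self.lock = threading.Lock()

    def push(self, grad):
        # Workers push gradients; the server applies an SGD update.
        with self.lock:
            for i, g in enumerate(grad):
                self.weights[i] -= self.lr * g

    def pull(self):
        # Workers pull current parameters before computing gradients.
        with self.lock:
            return list(self.weights)

def worker(ps, shard):
    """Toy worker: SGD on a 1-D linear model y = w*x over its data shard."""
    for x, y in shard:
        w = ps.pull()[0]
        grad = [2 * (w * x - y) * x]  # gradient of squared error
        ps.push(grad)
```

In the real system, many such workers run concurrently on disjoint data shards, and the push/pull calls become network messages whose consistency requirements can be relaxed to trade freshness for throughput.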

**Thesis Committee:**

David G. Andersen (Co-Chair)

Alexander J. Smola (Co-Chair)

Ruslan Salakhutdinov

Barnabas Poczos

Jeffrey Dean (Google, Inc.)