# SSS

This talk will describe MXNet that is a new deep learning framework developed by collaborators from over 10 institutes. It is designed for both flexiblity and optimized performance, with easy to use interfaces in 7 programming languages including Python, Scala and R. We will discuss the technologies to scale out the framework to distributed clouds, in which we can achieve over 110x speedup by using 128 GPUs without slowing down the convergence rate, and also memory optimizations to fit into embedded *systems like mobile phones.*

*In Partial Fulfillment of the Speaking Requirement*

We study the amortized number of combinatorial changes (edge insertions and removals) needed to update the graph structure of the Voronoi diagram *VD(S)* (and several variants thereof) of a set $S$ of *n* sites in the plane as sites are added to the set. To that effect, we define a general update operation for planar graphs that can be used to model the incremental construction of several variants of Voronoi diagrams as well as the incremental construction of an intersection of halfspaces in 3 Dimensional space. We show that the amortized number of edge insertions and removals needed to add a new site to the Voronoi diagram is *O(\sqrt{n})*. A matching \*Omega(\sqrt{n})* combinatorial lower bound is shown, even in the case where the graph representing the Voronoi diagram is a tree. This contrasts with the* O(\log{n})* upper bound of Aronov et al.~(2006) for farthest-point Voronoi diagrams in the special case where the points are inserted in clockwise order along their convex hull.

We then present a semi-dynamic data structure that maintains the Voronoi diagram of a set *S* of *n* sites in convex position. This data structure supports the insertion of a new site *p* (and hence the addition of its Voronoi cell) and finds the asymptotically minimal number *K* of edge insertions and removals needed to obtain the diagram of* S \cup \{p\}* from the diagram of *S*, in time *O(K\,\mathrm{polylog}\ n) *worst case, which is *O(\sqrt{n}\;\mathrm{polylog}\ n)* amortized by the aforementioned combinatorial result.

The most distinctive feature of this data structure is that the graph of the Voronoi diagram is maintained explicitly at all times and can be retrieved and traversed in the natural way; this contrasts with other structures supporting nearest neighbor queries in that they do not maintain the Voronoi diagram after each insertion. This structure supports general search operations on the current Voronoi diagram, which can, for example, be used to perform point location queries in the cells of the current Voronoi diagram in *O(\log n)* time, or to determine whether two given sites are neighbors in the Delaunay triangulation.

*Presented in Partial Fulfillment of the CSD Speaking Skills Requirement. *

In many real-world situations a decision maker may make decisions across many separate reinforcement learning tasks in parallel, yet there has been very little work on concurrent RL. Building on the efficient exploration RL literature, we introduce two new concurrent RL algorithms and bound their sample complexity. We show that under some mild conditions, both when the agent is known to be acting in many copies of the same MDP, and when they are not the same but are taken from a finite set, we can gain linear improvements in the sample complexity over not sharing information. This is quite exciting as a linear speedup is the most one might hope to gain. Our preliminary experiments confirm this result and show empirical benefits.

*Joint work with Emma Brunskill.*

*Presented in Partial Fulfillment of the CSD Speaking Skills Requirement. *

Linear classifiers are convenient to train, but cannot cope with noisy or complex data. Nonlinear kernel classifiers are powerful, but are prone to overfitting and are cumbersome to train. A new kind of classifier addresses the limitations of both of its forebears, as evidenced by its surprising statistical and algebraic properties. Our classifier underlies a new learning algorithm of notable practical and theoretical interest.

*Joint work with Geoff Gordon. *

*Presented in Partial Fulfillment of the CSD Speaking Skills Requirement. *

Machine learning (ML) methods are used to analyze data which are collected from various sources. As the problem size grows, cluster computing technology has been widely adopted for solving ML problems. There are two driving factors behind this trend: Big-data: computation and storage capacity of a single machine becomes not-enough to process data; Big-model: the number of ML parameters to learn becomes large to the extent that a single machine can not finish learning in a reasonable amount of time. In this talk, we focus on big model problem. A natural solution is to turn to parallelizing parameter updates in a cluster. However, naive parallelization of ML algorithms often hurts the effectiveness of parameter updates due to the dependency structure among model parameters and a subset of model parameters are often bottlenecks to the completion of ML algorithms due to the uneven convergence rate.

In this talk, we will present Scheduled Model-Parallel approach for addressing these challenges of parallelizing big model problems efficiently, and a distributed framework called STRADS that facilitates development and deployment of scheduled model-parallel ML applications in distributed systems. I will first talk about scheduled model-parallel approach with two specific scheduling schemes: 1) model parameter dependency checking to avoid updating conflicting parameters concurrently; 2) parameter prioritization to give more update chances to the parameters far from its convergence point. To realize scheduled model-parallel in a distributed system, we implement a prototype framework called STRADS. STRADS improves updates executed per second by pipelining iterations and overlapping update computation with network communication for parameter synchronization. With scheduled model-parallel and STRADS, we improve convergence per update and improved updates per second. As a result, we substantially improves convergence per second and achieve faster ML execution time. For benchmark, we implement various ML algorithms such as MF, LDA, Lasso, Logistic Regression in the form of scheduled model-parallel on top of STRADS.

*Presented in Partial Fulfillment of the CSD Speaking Skills Requirement. *

As robots move out of labs and into the real world, it is critical that humans will be able to specify task requirements in an intuitive and flexible way. In this talk, we will see how to enable untrained people to instruct robots via dialog.

In the first part of the talk I am going to present a joint probabilistic model over speech, the resulting semantic parse and the mapping from each element of the parse to a physical entity in the building. This model allows the robot to understand spoken commands while being flexible to the ways that untrained people interact with it, being robust to speech to text errors, and learning referring expressions for physical locations in a map.

The second part of the talk focuses on complex commands, a natural language sentence consisting of sensing-based conditionals, conjunctions, and disjunctions. In this second part I am going to present a template based algorithm able to recover the correct structure of a complex command, two dialogue models that enable the user to confirm or correct the extracted command structure and how the structure extracted can be used for planning and execution by the service robot.

* Joint work with Manuela Veloso. In partial fulfillment of the speaking requirement.*

The study of linguistic universals, the proliferation of treebanks in different languages, and the move toward unified linguistic representations together suggest that annotated data from many languages might be used together to build linguistic analyzers capable of processing input in any language. We present a single dependency parsing model learned from a multilingual collection of treebanks. Sharing of statistical strength across languages requires lexical embeddings in a shared vector space, for which we present a new estimation method. Our model also makes use of typological side information to allow generalization to new, treebank-free languages whose words can be embedded in the shared space.

*Joint work with George Mulcaire, Miguel Ballesteros, Chris Dyer and Noah A. Smith*

Many big data applications are performed on data sets that are constantly changing, such as calculating the PageRank of every page on the internet. Rather than rerunning the entire calculation repeatedly to keep the results up to date, which can be expensive, incremental computation allows you to update the output of previous calculations with minimal effort. ThomasDB is a distributed, fault tolerant system designed to make it is easy to make many suitable big data calculations incremental. ThomasDB accomplishes this through a technique known as self-adjusting computation, which tracks relationships between data and code in a structure called a dynamic dependency graph, allowing ThomasDB to identify which portions of the execution are affected by changes to the data set and to only re-execute those parts.

*Joint work with Andy Pavlo and Umut Acar.*

*In partial fulfillment of the speaking requirement.*

The halting problem is undecidable, in general. But do there exist simple loop programs where termination is actually decidable? In this talk, I would define a class of programs called linear loop programs and will talk about the termination of such programs. I will motivate the use of petri nets, a special class of transition systems which can perhaps solve the problem. At the very least, I will establish some interesting connections between transition systems and linear algebra.

*This is joint work with Prof. Supratik Chakraborty (IIT Bombay), Prof. Akshay S (IIT Bombay) and Pallerla Sai Sandeep Reddy (IIT Bombay). Since this is an ongoing work, I would really love to hear about ideas, extensions or more insight into the problem. *

In this talk, I will introduce how to incorporate the external word correlation knowledge to improve the coherence of topic modeling. Existing topic models assume words are generated independently and lack the mechanism to utilize the rich similarity relationships among words to learn coherent topics. To solve this problem, we build a Markov Random Field (MRF) regularized Latent Dirichlet Allocation (LDA) model, which defines a MRF on the latent topic layer of LDA to encourage words labeled as similar to share the same topic label. Under our model, the topic assignment of each word is not independent, but rather affected by the topic labels of its correlated words. Similar words have better chance to be put into the same topic due to the regularization of MRF, hence the coherence of topics can be boosted. In addition, our model can accommodate the subtlety that whether two words are similar depends on which topic they appear in, which allows word with multiple senses to be put into different topics properly. We derive a variational inference method to infer the posterior probabilities and learn model parameters and present techniques to deal with the hardto-compute partition function in MRF.

*In Partial Fulfillment of the Speaking Requirement*