
SSS Abstracts
Fall 2015


Sorting with Asymmetric Read and Write Costs

Friday, September 11th, 2015 from 12-1 pm in GHC 6501.

Presented by Yan Gu, CSD

Emerging memory technologies have a significant gap between the cost, both in time and in energy, of writing to memory versus reading from memory. In this talk, we present models and algorithms that account for this difference, with a focus on two models and associated sorting algorithms. First, we consider the RAM model with asymmetric write cost, and show that sorting can be performed with O(n) writes and O(n log n) reads. Next, we consider a variant of the External Memory (EM) model that charges w > 1 for writing a block of size B to the secondary memory, and present a multi-way mergesort that asymptotically reduces the number of writes over the original algorithms and performs roughly w block reads for every block write. Lastly, we will briefly introduce some other results from related research.
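
To make the asymmetry concrete, here is a minimal Python sketch, not the talk's algorithm, that sorts with exactly n writes to the expensive memory and O(n log n) reads, under the simplifying assumption that an index array may live in cheap symmetric memory:

    def sort_with_few_writes(data, out):
        """Sort `data` into `out`, writing each slot of `out` exactly once.

        Reads of `data`: O(n log n), via the comparison sort over indices.
        Writes to `out`: exactly n, one per element.
        """
        # The permutation is kept in cheap memory; only writes to `out` are costly.
        order = sorted(range(len(data)), key=lambda i: data[i])
        for dst, src in enumerate(order):
            out[dst] = data[src]  # one expensive write per element

    data = [5, 2, 9, 1]
    out = [None] * len(data)
    sort_with_few_writes(data, out)
    print(out)  # [1, 2, 5, 9]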

Joint work with Guy Blelloch, Jeremy Fineman, Phil Gibbons, and Julian Shun.


Termination of Initialized Linear Loop Programs

Friday, September 18th, 2015 from 12-1 pm in GHC 4303.

Presented by Ankush Das, CSD

The halting problem is undecidable in general. But do there exist simple loop programs for which termination is actually decidable? In this talk, I will define a class of programs called linear loop programs and discuss the termination of such programs. I will motivate the use of Petri nets, a special class of transition systems which can perhaps solve the problem. At the very least, I will establish some interesting connections between transition systems and linear algebra.
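
As a hypothetical example of the class of programs in question, consider a loop whose guard is a linear inequality and whose body is a linear update; whether it terminates depends on the update matrix and the initial state. A small simulation sketch, illustrative only:

    import numpy as np

    # Hypothetical linear loop program:
    #   while (x - y > 0):  (x, y) := (3x - 2y, 2x - y)
    A = np.array([[3.0, -2.0],
                  [2.0, -1.0]])

    def guard_holds_after(x0, steps=1000):
        """Simulate the loop; return True if the guard still holds after `steps`."""
        x = np.array(x0, dtype=float)
        for _ in range(steps):
            if x[0] - x[1] <= 0:    # guard x - y > 0 violated: loop terminates
                return False
            x = A @ x               # linear update
        return True

    print(guard_holds_after([2.0, 1.0]))  # True: this run never terminates

Here the update preserves the value of x - y, so the guard is invariant and the loop never terminates from this initial state; the interesting question is when such behavior can be decided statically rather than by simulation.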

This is joint work with Prof. Supratik Chakraborty (IIT Bombay), Prof. Akshay S (IIT Bombay) and Pallerla Sai Sandeep Reddy (IIT Bombay). Since this is ongoing work, I would really love to hear about ideas, extensions or more insight into the problem.


Annotating and Automatically Tagging Constructions of Causation

Monday, September 21st, 2015 from 12-1 pm in GHC 6501.

Presented by Jesse Dunietz, CSD

Causal relationships are everywhere. They underpin our very understanding of our world, and we often want to know what causes, enables, or prevents some phenomenon (e.g., medical symptoms, political events, or interpersonal actions). It is unsurprising, then, that causation is also ubiquitous in language. Recognizing these relations would thus be invaluable for many kinds of natural language semantics applications.

Identifying and interpreting these relationships, however, is not easy for traditional computational approaches to semantic analysis. Causal relations take a wide variety of linguistic realizations, ranging from individual words (including verbs, conjunctions, adjectives, and prepositions) to much more complex patterns, some of which involve words and syntactic relationships from multiple parts of the sentence.

In this talk, I will present an approach to annotating and automatically labeling instances of causal language. We base our approach on the emerging linguistic paradigm known as Construction Grammar (CxG). CxG places form/function pairings called constructions at the heart of both syntax and semantics, allowing the semantics to be anchored to an enormous variety of forms. Drawing on these principles, I will first present a scheme for annotating causal constructions in text. I will then describe two supervised methods, both combining automatically induced rules with statistical classifiers, for automatically recognizing these constructions and identifying their arguments.
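
The rules-plus-classifier architecture can be caricatured in a few lines; the pattern list and stub classifier below are hypothetical, not the actual system, which induces its rules automatically:

    import re

    # Hypothetical patterns for a few causal connectives.
    PATTERNS = [
        (r'\bbecause\b', 'because'),
        (r'\bdue to\b', 'due_to'),
        (r'\bprevent(s|ed|ing)?\b', 'prevent'),
    ]

    def propose_candidates(sentence):
        """Rule stage: propose spans that may signal a causal construction."""
        return [(name, m.span()) for pattern, name in PATTERNS
                for m in re.finditer(pattern, sentence, re.I)]

    def is_causal(sentence, name, span):
        """Classifier stage (stub): a real system scores each candidate with
        statistical features of its context; here every candidate is kept."""
        return True

    sentence = "The outage happened because the generator failed."
    print([c for c in propose_candidates(sentence) if is_causal(sentence, *c)])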

This talk is based on joint work with Jaime Carbonell and Lori Levin.

Presented in partial fulfillment of the CSD Speaking Skills Requirement.


Scheduling Black-box Mutational Fuzzing

Friday, October 2nd, 2015 from 12-1 pm in GHC 6501.

Presented by Samantha Gottlieb, CSD

Black-box mutational fuzzing is a simple yet effective technique to find bugs in software. Given a set of program-seed pairs, we ask how to schedule the fuzzing of these pairs in order to maximize the number of unique bugs found at any point in time. We develop an analytic framework using a mathematical model of black-box mutational fuzzing and use it to evaluate 26 existing and new randomized online scheduling algorithms. Our experiments show that one of our new scheduling algorithms outperforms the multi-armed bandit algorithm in the current version of the CERT Basic Fuzzing Framework by finding 1.5x more unique bugs in the same amount of time.
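
For intuition, here is a hedged sketch of a bandit-style online scheduler over program-seed pairs; it is an illustration of the problem setting, not one of the paper's 26 algorithms:

    import random

    def schedule(pairs, fuzz_epoch, total_time, epsilon=0.1):
        """pairs: hashable program-seed pairs.
        fuzz_epoch(pair): run one fuzzing epoch, return (new_unique_bugs, seconds > 0).
        Greedily fuzz the pair with the best observed rate of unique bugs per
        second, exploring a random pair with probability epsilon."""
        stats = {p: {'bugs': 0, 'time': 1e-9} for p in pairs}
        elapsed, found = 0.0, 0
        while elapsed < total_time:
            if random.random() < epsilon:
                pair = random.choice(pairs)   # explore
            else:                             # exploit best observed bug rate
                pair = max(pairs, key=lambda p: stats[p]['bugs'] / stats[p]['time'])
            bugs, secs = fuzz_epoch(pair)
            stats[pair]['bugs'] += bugs
            stats[pair]['time'] += secs
            elapsed += secs
            found += bugs
        return found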

Joint work with Maverick Woo, Sang Kil Cha, and David Brumley.

Presented in partial fulfillment of the CSD Speaking Skills Requirement.


Towards a language-universal syntactic parser

Monday, October 5th, 2015 from 12-1 pm in GHC 6501.

Presented by Waleed Ammar, LTI

The study of linguistic universals, the proliferation of treebanks in different languages, and the move toward unified linguistic representations together suggest that annotated data from many languages might be used together to build linguistic analyzers capable of processing input in any language. We present a single dependency parsing model learned from a multilingual collection of treebanks. Sharing of statistical strength across languages requires lexical embeddings in a shared vector space, for which we present a new estimation method. Our model also makes use of typological side information to allow generalization to new, treebank-free languages whose words can be embedded in the shared space.
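
For intuition, a toy sketch of the kind of input representation such a parser might use; all names, vectors, and features below are hypothetical. Each token is embedded in a shared cross-lingual space and concatenated with typological features of its language, so a language with no treebank can still be parsed:

    import numpy as np

    EMB_DIM = 4
    shared_embeddings = {                 # hypothetical shared-space vectors
        'dog':  np.array([0.90, 0.10, 0.00, 0.20]),
        'Hund': np.array([0.88, 0.12, 0.05, 0.18]),  # German 'dog', nearby vector
    }
    typology = {                          # hypothetical typological features
        'en': np.array([1.0, 0.0]),
        'de': np.array([0.5, 0.0]),
    }

    def token_input(word, lang):
        """Parser input: shared word embedding + the language's typology vector."""
        emb = shared_embeddings.get(word, np.zeros(EMB_DIM))
        return np.concatenate([emb, typology[lang]])

    print(token_input('Hund', 'de'))      # usable even with no German treebank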

Joint work with George Mulcaire, Miguel Ballesteros, Chris Dyer, and Noah A. Smith.


ThomasDB: Automatic Incremental Computation for Big Data

Friday, October 9th, 2015 from 12-1 pm in GHC 6501.

Presented by Thomas Tauber-Marshall, CSD

Many big data applications are performed on data sets that are constantly changing, such as calculating the PageRank of every page on the internet. Rather than rerunning the entire calculation repeatedly to keep the results up to date, which can be expensive, incremental computation allows you to update the output of previous calculations with minimal effort. ThomasDB is a distributed, fault tolerant system designed to make it easy to make many suitable big data calculations incremental. ThomasDB accomplishes this through a technique known as self-adjusting computation, which tracks relationships between data and code in a structure called a dynamic dependency graph, allowing ThomasDB to identify which portions of the execution are affected by changes to the data set and to re-execute only those parts.
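
The core bookkeeping can be illustrated with a toy, single-machine dynamic dependency graph; this is only a cartoon of self-adjusting computation, not ThomasDB's distributed, fault-tolerant implementation:

    class DDG:
        def __init__(self):
            self.inputs, self.nodes, self.readers = {}, {}, {}

        def put(self, key, value):
            """Change an input and re-execute only the computations that read it."""
            self.inputs[key] = value
            for node in self.readers.get(key, set()):
                self.nodes[node]['value'] = self._run(node)

        def compute(self, name, fn, deps):
            """Register a computation and record its dependencies."""
            self.nodes[name] = {'fn': fn, 'deps': deps}
            for d in deps:
                self.readers.setdefault(d, set()).add(name)
            self.nodes[name]['value'] = self._run(name)

        def _run(self, name):
            node = self.nodes[name]
            return node['fn'](*(self.inputs[d] for d in node['deps']))

    ddg = DDG()
    ddg.put('a', 2); ddg.put('b', 3)
    ddg.compute('sum', lambda a, b: a + b, ['a', 'b'])
    ddg.put('a', 10)                      # only 'sum' is re-executed
    print(ddg.nodes['sum']['value'])      # 13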

Joint work with Andy Pavlo and Umut Acar. In partial fulfillment of the speaking requirement.


Simple And Complex Natural Language Commands For A Mobile Service Robot

Friday, October 16th, 2015 from 12-1 pm in GHC 6501.

Presented by Vittorio Perera, CSD

As robots move out of labs and into the real world, it is critical that humans be able to specify task requirements in an intuitive and flexible way. In this talk, we will see how to enable untrained people to instruct robots via dialog.

In the first part of the talk I am going to present a joint probabilistic model over speech, the resulting semantic parse, and the mapping from each element of the parse to a physical entity in the building. This model allows the robot to understand spoken commands while being flexible to the ways that untrained people interact with it, robust to speech-to-text errors, and able to learn referring expressions for physical locations in a map.

The second part of the talk focuses on complex commands: natural language sentences consisting of sensing-based conditionals, conjunctions, and disjunctions. In this second part I am going to present a template-based algorithm able to recover the correct structure of a complex command, two dialogue models that enable the user to confirm or correct the extracted command structure, and how the extracted structure can be used for planning and execution by the service robot.
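
To illustrate the template idea, here is a minimal sketch with one hypothetical template; the actual algorithm and dialogue models are more general:

    import re

    # Hypothetical template: "if <condition> then <action> [otherwise <action>]".
    TEMPLATE = re.compile(
        r'^if (?P<cond>.+?) then (?P<then>.+?)(?: otherwise (?P<else>.+))?$', re.I)

    def parse_command(utterance):
        m = TEMPLATE.match(utterance.strip())
        if not m:
            return {'action': utterance}      # simple command, no structure
        return {'condition': m.group('cond'),
                'then': m.group('then'),
                'else': m.group('else')}      # 'else' may be None

    print(parse_command("if the door is open then go to the lab otherwise wait here"))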

Joint work with Manuela Veloso. In partial fulfillment of the speaking requirement.


Building Effective Query Classifiers: A Case Study in Self-harm Intent Detection

Friday, November 20th, 2015 from 12-1 pm in GHC 6501.

Presented by Ashiqur Rahman Khudabukhsh, CSD

Query-based triggers play a crucial role in modern search systems, e.g., in deciding when to display direct answers on result pages. We address a common scenario in designing such triggers for real-world settings where positives are rare and search providers possess only a small seed set of positive examples from which to learn query classification models. We choose the critical domain of self-harm intent detection to demonstrate how such small seed sets can be expanded to create meaningful training data with a sizable fraction of positive examples. Our results show that with our method, substantially more positive queries can be found than with plain random sampling. Additionally, we explore the effectiveness of traditional active learning approaches on classification performance and find that maximum uncertainty performs best among the techniques we considered.
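
For reference, maximum-uncertainty sampling, the strategy reported to work best here, can be sketched in a few lines, assuming a scikit-learn-style classifier; this is an illustration, not the study's implementation:

    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    def uncertainty_round(labeled_texts, labels, pool, budget=5):
        """Return the `budget` pool queries the classifier is least sure about."""
        vec = TfidfVectorizer().fit(labeled_texts + pool)
        clf = LogisticRegression().fit(vec.transform(labeled_texts), labels)
        probs = clf.predict_proba(vec.transform(pool))[:, 1]
        uncertainty = -np.abs(probs - 0.5)    # highest when prob is near 0.5
        picks = np.argsort(uncertainty)[-budget:]
        return [pool[i] for i in picks]

    # Toy usage with hypothetical queries:
    seed_texts = ["how to hurt myself", "weather today", "self harm methods", "buy shoes"]
    seed_labels = [1, 0, 1, 0]
    pool = ["ways to harm myself", "cheap flights", "is it normal to feel this low"]
    print(uncertainty_round(seed_texts, seed_labels, pool, budget=2))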

This work was performed at Microsoft Research under the mentorship of Paul N. Bennett and Ryen W. White. In partial fulfillment of the speaking requirement.


STRADS: Parallelizing ML over Model Parameters

Monday, November 30th, 2015 from 12-1 pm in GHC 8102.

Presented by Jin Kyu Kim, CSD

Machine learning (ML) methods are used to analyze data collected from various sources. As problem sizes grow, cluster computing technology has been widely adopted for solving ML problems. There are two driving factors behind this trend. Big data: the computation and storage capacity of a single machine becomes insufficient to process the data. Big models: the number of ML parameters to learn becomes so large that a single machine cannot finish learning in a reasonable amount of time. In this talk, we focus on the big-model problem. A natural solution is to parallelize parameter updates across a cluster. However, naive parallelization of ML algorithms often hurts the effectiveness of parameter updates due to the dependency structure among model parameters, and a subset of model parameters often becomes a bottleneck to the completion of ML algorithms due to uneven convergence rates.

In this talk, we will present the Scheduled Model Parallel (SMP) approach for addressing these challenges of parallelizing big-model problems efficiently, and a distributed framework called STRADS that facilitates the development and deployment of scheduled model-parallel ML applications in distributed systems. I will first talk about two SMP scheduling schemes: 1) model parameter dependency checking, to avoid updating conflicting parameters concurrently; and 2) parameter prioritization, to give more update chances to the parameters farthest from their convergence points. To realize SMP, we implemented STRADS as a prototype framework. STRADS improves the number of updates executed per second by pipelining iterations and overlapping update computation with the network communication needed for parameter synchronization. With SMP and STRADS, we improve both convergence per update and updates per second; as a result, we substantially improve convergence per second and achieve faster ML execution times. As benchmarks, we implement various ML algorithms, such as matrix factorization (MF), LDA, Lasso, and logistic regression, as SMP applications on top of STRADS.
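
The two scheduling ideas can be caricatured in a few lines of Python; this is an illustration only, since STRADS implements them in a distributed, pipelined system:

    def schedule_round(params, conflicts, delta, k):
        """params: parameter ids; conflicts(i, j) -> True if i and j must not be
        updated together; delta[i]: recent change magnitude, a proxy for distance
        from convergence. Returns up to k non-conflicting, high-priority params."""
        chosen = []
        for p in sorted(params, key=lambda i: delta[i], reverse=True):  # prioritize
            if all(not conflicts(p, q) for q in chosen):    # dependency check
                chosen.append(p)
                if len(chosen) == k:
                    break
        return chosen

    # Example: parameters 0 and 1 conflict, so only one is picked per round.
    delta = {0: 0.9, 1: 0.8, 2: 0.01}
    print(schedule_round([0, 1, 2], lambda i, j: {i, j} == {0, 1}, delta, k=2))  # [0, 2]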

In partial fulfillment of the CSD Speaking Skills Requirement.


New classifiers for noisy data

Friday, December 4th, 2015 from 12-1 pm in GHC 4303.

Presented by Shiva Kaul, CSD

Linear classifiers are convenient to train, but cannot cope with noisy or complex data. Nonlinear kernel classifiers are powerful, but are prone to overfitting and are cumbersome to train. A new kind of classifier addresses the limitations of both of its forebears, as evidenced by its surprising statistical and algebraic properties. Our classifier underlies a new learning algorithm of notable practical and theoretical interest.

Joint work with Geoff Gordon.

In partial fulfillment of the CSD Speaking Skills Requirement.


Concurrent PAC RL

Monday, December 7th, 2015 from 12-1 pm in GHC 6501.

Presented by Zhaohan (Daniel) Guo, CSD

In many real-world situations a decision maker may make decisions across many separate reinforcement learning tasks in parallel, yet there has been very little work on concurrent RL. Building on the efficient-exploration RL literature, we introduce two new concurrent RL algorithms and bound their sample complexity. We show that under some mild conditions, both when the agent is known to be acting in many copies of the same MDP, and when the tasks are not identical but are drawn from a finite set, we can gain linear improvements in sample complexity over not sharing information. This is quite exciting, as a linear speedup is the most one might hope to gain. Our preliminary experiments confirm this result and show empirical benefits.
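
The source of the linear speedup can be illustrated with a toy model in which concurrent agents pool their transition counts for a shared MDP; this is a sketch of the intuition, not the talk's algorithms:

    from collections import defaultdict

    class SharedModel:
        """Transition counts pooled across all concurrent agents."""
        def __init__(self):
            self.counts = defaultdict(lambda: defaultdict(int))

        def observe(self, state, action, next_state):
            self.counts[(state, action)][next_state] += 1   # any agent may write

        def transition_probs(self, state, action):
            nexts = self.counts[(state, action)]
            total = sum(nexts.values())
            return {s: c / total for s, c in nexts.items()}

    model = SharedModel()
    for agent in range(4):                    # four agents acting in the same MDP
        model.observe('s0', 'a', 's1')
    model.observe('s0', 'a', 's2')
    print(model.transition_probs('s0', 'a'))  # {'s1': 0.8, 's2': 0.2}

With k agents contributing to the same counts, the model reaches a given accuracy roughly k times sooner than a single agent would.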

Joint work with Emma Brunskill.

In partial fulfillment of the speaking requirement.


Web contact: sss+www@cs