 SSS Abstracts 
Fall 2019

LDPC Codes Achieve List-Decoding Capacity

Friday, October 25th, 2019 from 12-1 pm in GHC 6501.

Presented by Nicolas Resch, CSD

We show that Gallager's ensemble of Low-Density Parity-Check (LDPC) codes achieves list-decoding capacity. These are the first graph-based codes shown to have this property. Previously, the only codes known to achieve list-decoding capacity were completely random codes, random linear codes, and codes constructed by algebraic (rather than combinatorial) techniques. This result opens up a potential avenue towards truly linear-time list-decodable codes that achieve list-decoding capacity.

Our result on list decoding follows from a much more general result: any local property satisfied with high probability by a random linear code is also satisfied with high probability by a random LDPC code from Gallager's distribution. Local properties are properties characterized by the exclusion of small sets of codewords, and include list-decoding, list-recovery and average-radius list-decoding. Along the way, we give a characterization of sets of codewords that are likely to appear in a random linear code, which may be of independent interest.
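As a concrete instance of a local property, list-decodability can be phrased as the exclusion of dense clusters of codewords. The following is the standard definition (the notation here is ours, not the speaker's):

```latex
% A code C \subseteq \mathbb{F}_q^n is (\rho, L)-list-decodable if every
% Hamming ball of radius \rho n contains at most L codewords:
\forall z \in \mathbb{F}_q^n :\quad
  \bigl|\{\, c \in C : d_H(c, z) \le \rho n \,\}\bigr| \le L .
```

Equivalently, C must exclude every "bad" set of L+1 codewords that fit inside a common ball of radius ρn — a condition determined by small sets of codewords, which is what makes it a local property in the sense described above.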


Processing Massive Static and Evolving Graphs

Monday, October 28th, 2019 from 12-1 pm in GHC 6501.

Presented by Laxman Dhulipala, CSD

Today, the largest publicly available graph dataset is the Hyperlink Web graph, with over 3.5 billion vertices and 128 billion edges. Although this graph has been publicly available for several years now, most experimental work in the literature reports results on much smaller graphs, or resorts to external or distributed memory to process the Hyperlink graph. A natural question, then, is whether we can efficiently solve a broad class of problems on this graph in memory. Given that existing systems capable of processing this graph only support static processing, another interesting question is whether we can efficiently represent and update this graph in memory.

In the first part of this talk I will present theoretically efficient implementations of parallel graph algorithms for a set of 20 important graph problems. Our codes, which we have publicly open-sourced as part of the Graph Based Benchmark Suite (GBBS), solve these problems on the Hyperlink Web graph using a single commodity multicore machine with a terabyte of RAM, with most of the codes running in under a minute on this graph. The running times of our implementations outperform existing state-of-the-art implementations on the largest real-world graphs, and for many of the problems that we consider, this is the first time they have been solved on graphs at this scale.
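Many of these algorithms follow a frontier-based pattern: process the set of active vertices, generate the next frontier, repeat. A minimal sequential sketch of this pattern for BFS is below (function and variable names are ours; GBBS parallelizes the per-frontier work with theoretically efficient primitives):

```python
def bfs_levels(adj: dict[int, list[int]], source: int) -> dict[int, int]:
    """Frontier-based BFS: map each reachable vertex to its distance
    from the source, processing one frontier (level) at a time."""
    level = {source: 0}
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        for u in frontier:
            for v in adj.get(u, []):
                if v not in level:          # visit each vertex once
                    level[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier
    return level
```

In a parallel setting, the inner loops over the frontier and over each vertex's neighbors are the units of parallelism, and the visited check becomes an atomic compare-and-swap.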

In the second part of this talk I will present Aspen, a framework for processing streaming graphs, which are graphs that evolve over time via a stream of updates (batches of edge insertions/deletions). The key idea is to represent the adjacency information using a purely-functional compressed tree data structure called a C-tree, which can be efficiently updated. Aspen achieves millisecond latency for processing small batches, and achieves throughputs of hundreds of millions of updates per second for very large batches, while providing strict serializability for both queries and updates. Using Aspen, we are able to concurrently update and run analytics on the Hyperlink Web graph using a single commodity multicore server with a terabyte of RAM.
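The key idea behind purely-functional updates is path copying: an insertion copies only the nodes on the search path and shares every untouched subtree with the old version, so old snapshots remain queryable while updates proceed. A minimal sketch of this idea on a plain binary search tree follows (names are ours; the actual C-tree additionally chunks and compresses adjacency data):

```python
from typing import NamedTuple, Optional

class Node(NamedTuple):
    key: int
    left: Optional["Node"]
    right: Optional["Node"]

def insert(root: Optional[Node], key: int) -> Node:
    """Return a new tree containing `key`. Only the nodes on the search
    path are copied; all other subtrees are shared with `root`."""
    if root is None:
        return Node(key, None, None)
    if key < root.key:
        return root._replace(left=insert(root.left, key))
    if key > root.key:
        return root._replace(right=insert(root.right, key))
    return root  # key already present

def contains(root: Optional[Node], key: int) -> bool:
    while root is not None:
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False
```

Because an update never mutates existing nodes, a long-running analytics query can keep reading its snapshot while new batches of edge updates produce fresh versions — the property that lets Aspen serve queries and updates concurrently.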

Based on joint work with Guy Blelloch and Julian Shun.


C2DN: How to Code on the Edge for Content Delivery Networks

Monday, November 18th, 2019 from 12-1 pm in GHC 6501.

Presented by Jason Yang, CSD

Content Delivery Networks (CDNs) cache and serve much of the world's online content to users around the globe. CDNs cache content in a replicated fashion to alleviate the impact of individual server unavailabilities, which hurt users, CDNs, and content providers alike. Erasure codes are a resource-efficient alternative to replication. In this talk, we show how to introduce erasure coding into CDNs and build the Coded CDN (C2DN). Compared to a state-of-the-art CDN, C2DN reduces midgress (a major CDN operating cost) by up to 30% and provides better user latency, while incurring only a slight increase in resource usage.
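The resource-efficiency argument for erasure coding can be seen in its simplest form: with k data chunks plus one XOR parity chunk, any single lost chunk can be rebuilt, at a storage overhead of 1/k instead of the 100% overhead of keeping a full replica. A toy sketch (our own illustration, not C2DN's actual coding scheme, which must also handle CDN-specific issues like unequal object popularity):

```python
def xor_parity(chunks: list[bytes]) -> bytes:
    """Bytewise XOR of k equal-length chunks: the parity chunk of a
    simple k-of-(k+1) erasure code."""
    parity = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, b in enumerate(chunk):
            parity[i] ^= b
    return bytes(parity)

def recover(survivors: list[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing chunk: XORing the parity with the
    surviving chunks cancels them out, leaving the lost chunk."""
    return xor_parity(survivors + [parity])
```

Production systems use Reed-Solomon-style codes to tolerate multiple simultaneous failures, but the accounting is the same: far less redundant data than replication for the same level of availability.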


RADIO: Robust Behavioral Anomaly Detection for IoT Devices in Enterprise Networks

Monday, November 25th, 2019 from 12-1 pm in GHC 6501.

Presented by Tianlong Yu, CSD

IoT devices deployed inside enterprise networks (e.g., routers, storage appliances, cameras) are an emerging security threat for enterprises. However, existing enterprise security mechanisms are often of limited use against such threats. For example, the IDS and IPS systems deployed in enterprise networks rely heavily on attack signatures, but zero-day vulnerabilities are common for IoT devices. Fortunately, we observe that, unlike general-purpose computing devices, an IoT device exhibits a limited set of normal behaviors (e.g., a camera zooms in, streams video, and records audio). Based on this insight, we revisit behavioral anomaly detection at the network layer. Designing such a system is challenging on two fronts. First, we need a behavior model tailored to enterprise IoT devices that abstracts the key characteristics of device behaviors (e.g., the commands or arguments used) from network traffic. Second, in practical enterprise settings, the network traces available for learning normal behavior models are unlabeled and potentially polluted. We address these challenges in designing RADIO, a practical and robust behavioral anomaly detection system for enterprise IoT devices. We design a novel learning mechanism that builds benign behavior models (finite state machines) for IoT devices from unlabeled and potentially polluted network traces. We show that our approach achieves low false positives and false negatives and is robust to pollution (false positive rate < 1% and false negative rate < 0.01% when 15% of the network traffic is polluted).
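The core idea of a finite-state-machine behavior model can be sketched in a few lines: record which command transitions appear in training traces, then flag any trace that uses a transition never observed. This toy version assumes clean training data (names are ours); RADIO's actual contribution is learning such models robustly from unlabeled, potentially polluted traces:

```python
def learn_fsm(traces: list[list[str]]) -> set[tuple[str, str]]:
    """Collect the set of observed command transitions from training
    traces. A real system must also tolerate polluted training data."""
    transitions: set[tuple[str, str]] = set()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            transitions.add((a, b))
    return transitions

def is_anomalous(trace: list[str], transitions: set[tuple[str, str]]) -> bool:
    """Flag a trace that takes any transition absent from the model."""
    return any((a, b) not in transitions for a, b in zip(trace, trace[1:]))
```

This works for IoT devices precisely because of the observation above: their benign behavior space is small enough that the learned transition set stays tight, which would not hold for general-purpose computers.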


Parity Models: Erasure-Coded Resilience for Prediction Serving Systems

Friday, December 6th, 2019 from 12-1 pm in GHC 6501.

Presented by Jack Kosaian, CSD

Machine learning models are becoming the primary workhorses for many applications. Services deploy models through prediction serving systems that take in queries and return predictions by performing inference on models. Prediction serving systems are commonly run on many machines in cluster settings, and thus are prone to slowdowns and failures that inflate tail latency. Erasure coding is a popular technique for imparting resource-efficient resilience against data unavailability in storage and communication systems. However, existing approaches for imparting erasure-coded resilience to distributed computation apply only to a severely limited class of functions, precluding their use for many serving workloads, such as neural network inference.

We introduce parity models, a new approach for enabling erasure-coded resilience in prediction serving systems. A parity model is a neural network trained to transform erasure-coded queries into a form that enables a decoder to reconstruct slow or failed predictions. We implement parity models in ParM, a prediction serving system that makes use of erasure-coded resilience. ParM encodes multiple queries into a "parity query", performs inference over parity queries using parity models, and decodes approximations of unavailable predictions by using the output of a parity model. We showcase the applicability of parity models to image classification, speech recognition, and object localization tasks. Parity models enable accurate reconstructions of unavailable predictions and reduce tail latency in the presence of resource contention. These results highlight the potential of parity models to open new doors in imparting resource-efficient resilience to prediction serving systems.
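The encode/decode arithmetic is easiest to see with a linear stand-in for the model, where an additive code commutes with inference exactly. The sketch below is our own illustration under that assumption; the point of parity models is that for real (nonlinear) neural networks, a *trained* parity model approximates the role the linear map plays here:

```python
def model(x: list[float]) -> float:
    """Stand-in inference function. Linear, so summing queries and
    summing predictions coincide. (Hypothetical: real serving models
    are nonlinear, which is why parity models must be trained.)"""
    return sum(2.0 * v for v in x)

def encode(queries: list[list[float]]) -> list[float]:
    """Parity query = elementwise sum of the k queries."""
    return [sum(vals) for vals in zip(*queries)]

def decode(parity_pred: float, available_preds: list[float]) -> float:
    """Reconstruct the unavailable prediction from the parity
    prediction by subtracting the predictions that did arrive."""
    return parity_pred - sum(available_preds)

queries = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
preds = [model(q) for q in queries]
parity_pred = model(encode(queries))   # inference on the parity query
# Suppose preds[1] is slow or failed; recover it from the others:
recovered = decode(parity_pred, [preds[0], preds[2]])
```

With k queries per parity query, one extra inference covers the unavailability of any single prediction in the group, rather than duplicating every query.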


A Domain Semantics for Higher-Order Recursive Processes

Monday, December 9th, 2019 from 12-1 pm in GHC 6115.

Presented by Ryan Kavanagh, CSD

The polarized SILL programming language uniformly integrates functional programming and session-based concurrency. It supports recursion, asynchronous and synchronous communication, and higher-order programs that communicate processes. In this talk, I will motivate and present a denotational semantics for a fragment of polarized SILL. I will explain why domain theory is an ideal setting for this semantics. Session types will be interpreted as domains of bidirectional communications. We use polarity and domain theory to split these into pairs of unidirectional communications in a natural way. Processes will be interpreted as continuous functions on domains of unidirectional communications.


Web contact: sss+www@cs