|  |  SSS Abstracts  |  | 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Learning Gene Networks Underlying Clinical Phenotypes Using SNP Perturbations
Friday, November 2nd, 2018 from 12-1 pm in GHC 6501.
Presented by Calvin McCarter, MLD
Recent technologies are generating an abundance of genome sequence data and molecular and clinical phenotype data, providing an opportunity to understand the genetic architecture and molecular mechanisms underlying diseases. Previous approaches have largely focused on the co-localization of single-nucleotide polymorphisms (SNPs) associated with clinical and expression traits, each identified from genome-wide association studies and expression quantitative trait locus (eQTL) mapping, and thus have provided only limited capabilities for uncovering the molecular mechanisms behind the SNPs influencing clinical phenotypes. Here we aim to extract rich information on the functional role of trait-perturbing SNPs that goes far beyond this simple co-localization. We introduce a computational framework called Perturb-Net for learning the gene network that modulates the influence of SNPs on phenotypes, using SNPs as naturally occurring perturbation of a biological system. Perturb-Net uses a probabilistic graphical model to directly model both the cascade of perturbation from SNPs to the gene network to the phenotype network and the network at each layer of molecular and clinical phenotypes. Perturb-Net learns the entire model by solving a single optimization problem with an extremely fast algorithm that can analyze human genome-wide data within a few hours. In our analysis of asthma data, for a locus that was previously implicated in asthma susceptibility but for which little is known about the molecular mechanism underlying the association, Perturb-Net revealed the gene network modules that mediate the influence of the SNP on asthma phenotypes. Many genes in this network module were well supported in the literature as asthma-related.
Exploratory Stage Lighting Design using Visual Objectives
Friday, November 9th, 2018 from 12-1 pm in NSH 3002.
Presented by Evan Shimizu, CSD
Lighting is a critical element of theater. A lighting designer is responsible for drawing the audience's attention to a specific part of the stage, setting time of day, creating a mood, and conveying emotions. Designers often begin the lighting design process by collecting reference visual imagery that captures different aspects of their artistic intent. Then, they experiment with various lighting options to determine which ideas work best on stage. However, modern stages contain tens to hundreds of lights, and setting each light source's parameters individually to realize an idea is both tedious and requires expert skill. In this talk, I present an exploratory lighting design tool based on feedback from professional designers. The system extracts abstract visual objectives from reference imagery and applies them to target regions of the stage. This system can rapidly generate plausible design candidates that embody the visual objectives through a Gibbs sampling method, and present them as a design gallery for rapid exploration and iterative refinement. Evaluations demonstrate that the resulting system allows lighting designers of all skill levels to quickly create and communicate complex designs, even for scenes containing many color-changing lights.
Based on joint work with Kayvon Fatahalian, Sylvain Paris (Adobe Research), Matt Fisher (Adobe Research), and Ersin Yumer (Adobe Research).
This talk is in Partial Fulfillment of the Speaking Requirement.
How AI Has Learned to Play Atari Games
Monday, November 12th, 2018 from 12-1 pm in GHC 6501.
Presented by Steven Osman, CSD
Advances in deep learning have enabled many new applications over the past few years including solutions to such challenges as image classification, machine translation and speech recognition.   A particular form of deep reinforcement learning, deep Q-learning has allowed an AI to learn to play Atari games through trial and error.  Reinforcement learning is a form of machine learning in which, rather than being given a series of examples to learn from (an approach known as supervised learning), a model is trained by repeatedly refining the model in an attempt to maximize a reward.  This reward could be positive, such as a point given each time a task is successfully completed, or negative, such as points taken away for failure or for consuming too many resources.  Q-learning is an approach to teaching an agent to solve a problem by picking the best action given its current state, for example, deciding what move to make given a chess board in a particular configuration.
This talk will explain the first end to end approach to learning to play Atari games which used an algorithm called Deep Q-learning with Experience Replay (Mnih et a. 2013).  First, a brief introduction to Q-learning and deep reinforcement learning will be provided, and then an explanation of how they were combined with a form of memory (the experience replay buffer) to solve the complex task of playing games.  This algorithm learns to play seven Atari 2600 games with no knowledge of the internals of the games themselves.  The input is just the pixels on the screen and the current score, and the output is what move to make at each moment in time.
In Partial Fulfillment of the Speaking Requirement.
Workload Analysis and Caching Strategies for Search Advertising Systems
Friday, November 16th, 2018 from 12-1 pm in GHC 8102.
Presented by Conglong Li, CSD
To maximize profit and connect users to relevant products and services, search advertising systems use sophisticated machine learning algorithms to estimate the revenue expectations of thousands of matching ad listings per query. These machine learning computations constitute a substantial part of the operating cost, e.g., 10% to 30% of the total gross revenues. It is desirable to cache and reuse previous computation results to reduce this cost, but caching introduces approximation which comes with potential revenue loss. To maximize cost savings while minimizing the overall revenue impact, an intelligent refresh policy is required to decide when to refresh the cached computation results.
Our workload analysis and cache simulations show that caching is preferable yet challenging for search advertising systems. In this talk, I present two different caching designs: one uses heuristics and one uses fast machine learning algorithm to build the caching strategies. Evaluation shows that a traditional cache design reduces the total computation cost the most, but also introduces huge negative revenue and net profit impact. On the other hand, our proposed cache designs minimize the negative revenue impact while keeping the amount of cost savings, and improves the total net profit.
Based on joint work with David G. Andersen, Qiang Fu (Microsoft), Sameh Elnikety (Microsoft Research), and Yuxiong He (Microsoft Research).
In Partial Fulfillment of the Speaking Requirement
The Brain of Databases: Forecasting, Modeling, and Planning for Self-Driving Database Management Systems
Friday, November 30th, 2018 from 12-1 pm in GHC 6501.
Presented by Lin Ma, CSD
In the last two decades, both researchers and vendors have built advisory tools to assist database administrators (DBAs) in various aspects of system tuning and physical design. Most of this previous work, however, is incomplete because they still require humans to make the final decisions about any changes to the database and are reactionary measures that fix problems after they occur. What is needed for a truly “self-driving” database management system (DBMS) is a new architecture that is designed for autonomous operation. This is different than earlier attempts because all aspects of the system are controlled by an integrated planning component that not only optimizes the system for the current workload, but also predicts future workload trends so that the system can prepare itself accordingly. With this, the DBMS can support all of the previous tuning techniques without requiring a human to determine the right way and proper time to deploy them. It also enables new optimizations that are important for modern high-performance DBMSs, but which are not possible today because the complexity of managing these systems has surpassed the abilities of human experts.
In this talk, I will present our roadmap towards developing a self-driving DBMS. It has three main components: workload forecasting, action modeling, and planning. I will also present our solution to the first component: a forecasting framework called QueryBot 5000 that allows a DBMS to predict the expected arrival rate of queries in the future based on historical data. It provides multiple horizons (short- vs. long-term) with different aggregation intervals. We implemented our forecasting technique in an external controller for PostgreSQL and MySQL and demonstrate their effectiveness in selecting indexes. Finally, I will present our on-going project in this journey and our long-term vision.
Based on joint work with Dana Van Aken, Ahmed Hefny, Gustavo Mezerhane, Andrew Pavlo, and Geoffrey J. Gordon.
In Partial Fulfillment of the Speaking Requirement
Bandwidth-efficient Live Video Analytics for Drones via Edge Computing
Monday, December 3rd, 2018 from 12-1 pm in GHC 6501.
Presented by Junjue Wang, CSD
Real-time video analytics on small autonomous drones poses several difficult challenges at the intersection of wireless bandwidth, processing capacity, energy consumption, result accuracy, and timeliness of results. In response to these challenges, we describe four strategies to build an adaptive computer vision pipeline for search tasks in domains such as search-and-rescue, surveillance, and wildlife conservation. Our experimental results show that a judicious combination of drone-based processing and edge-based processing can save substantial wireless bandwidth and thus improve scalability, without compromising result accuracy or result latency.
In this talk, I will first introduce our vision of enabling live video analytics on drones and provide a motivating usage case in the search and rescue context. I will then present our experiments estimating how much processing capabilities a small drone can carry and show that onboard processing alone cannot satisfy the computational need and offloading the computation to an edge server is needed. After system design challenges are identified, I will talk about four proposed optimizations to address these challenges in detail.
This talk is based on a joint work with Ziqiang Feng, Zhuo Chen, Shilpa George, Mihir Bala, Padmanabhan Pillai, Shao-Wen Yang, and Mahadev Satyanarayanan.
Presented in Partial Fulfillment of the CSD Speaking Skills Requirement
SuRF: Practical Range Query Filtering with Fast Succinct Tries
Monday, December 10th, 2018 from 12-1 pm in GHC 6501.
Presented by Huanchen Zhang, CSD
In this talk, I will present our Succinct Range Filter (SuRF), a fast and compact data structure for approximate membership tests. Unlike traditional Bloom filters, SuRF supports both single-key lookups and common range queries: open-range queries, closed-range queries, and range counts. SuRF is based on a new data structure called the Fast Succinct Trie (FST) that matches the point and range query performance of state-of-the-art order-preserving indexes, while consuming only 10 bits per trie node. The false positive rates in SuRF for both point and range queries are tunable to satisfy different application needs. We evaluate SuRF in RocksDB as a replacement for its Bloom filters to reduce I/O by filtering requests before they access on-disk data structures. Our experiments on a 100 GB dataset show that replacing RocksDB's Bloom filters with SuRFs speeds up open-seek (without upper-bound) and closed-seek (with upper-bound) queries by up to 1.5x and 5x with a modest cost on the worst-case (all-missing) point query throughput due to slightly higher false positive rate.
In Partial Fulfillment of the Speaking Requirement
Web contact: sss+www@cs