|  |  SSS Abstracts  |  | 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Light-weight In-situ Analysis with Frugal Resource Usage
Friday, November 10th, 2017 from 12-1 pm in GHC 6501.
Presented by Qing Zheng, CSD
In this talk Qing presents Parallel Logging DB (PLDB), a new in-situ analysis technique for indexing data within DeltaFS. With its design as a scalable, server-less file system for HPC platforms, DeltaFS scales file system metadata performance with application scale. The new PLDB is a novel extension to the DeltaFS data plane, enabling in-situ indexing of massive amounts of data written to a single DeltaFS directory simultaneously, and in an arbitrarily large number of files. PLDB achieves this through a compaction-free indexing mechanism for reordering and indexing data, and a log-structured storage layout to pack small writes into large log objects, all while ensuring compute node resources are used frugally. We demonstrate the efficiency of our PLDB through VPIC, a widely-used simulation code developed at Los Alamos National Lab that scales to trillions of particles. With DeltaFS, we modify VPIC to create a file under a special directory for each particle to receive writes of that particle's output data. Dynamically indexing the directory's underlying storage using PLDB allows us to achieve a 5,000x speedup in VPIC particle trajectory queries, which require reading all data for a single particle. This speedup increases with simulation scale, while the overhead is fixed at 3% of available memory and 8% of final storage.
In Partial Fulfillment of the Speaking Requirement
Plan improvement by reasoning about unmodeled degrees of freedom of the world
Friday, November 17th, 2017 from 12-1 pm in GHC 6501.
Presented by Rui Silva, CSD
Planning under uncertainty assumes a model that specifies the probabilistic effects of actions in terms of changes of the state. Given such model, planning proceeds to determine a policy that defines the choice of action at each state that maximizes a reward function. In this work, we realize that the world may have degrees of freedom not necessarily captured in the given model, and that the planning solution based thereon may be sub-optimal when compared to those possible when the additional degrees of freedom are considered. We introduce and formalize the problem of planning while considering feasible modifications to the model of the world that would lead to improved policies. We present an approach that models changes in the world as modifications to the probability transition function, and show that the problem of computing the most rewarding feasible modifications can be reduced to a constrained optimization problem. We then contribute a gradient-based algorithm for solving this optimization problem efficiently. We foresee several possible applications of this work, including some in the field of human-robot interaction.
In Partial Fulfillment of the Speaking Requirement
Reference-Aware Language Models
Monday, November 20th, 2017 from 12-1 pm in GHC 6501.
Presented by Zichao Yang, CSD
We propose a general class of language models that treat reference as an explicit stochastic latent variable. This architecture allows models to create mentions of entities and their attributes by accessing external databases (required by, e.g., dialogue generation and recipe generation) and internal state (required by, e.g. language models which are aware of coreference). This facilitates the incorporation of information that can be accessed in predictable locations in databases or discourse context, even when the targets of the reference may be rare words. Experiments on three tasks shows our model variants based on deterministic attention.
In Partial Fulfillment of the Speaking Requirement
ML for ML: Learning Cost Semantics by Experiment
Monday, November 27th, 2017 from 12-1 pm in GHC 6501.
Presented by Ankush Das, CSD
It is an open problem in static resource bound analysis to connect high-level resource bounds (order of complexity) with the actual execution time and memory usage of compiled machine code. We propose to use machine learning to derive a cost model for a high-level source language that approximates the execution cost of compiled programs, including garbage collection, on a specific hardware platform. The technique has been implemented for a subset of OCaml using Inrias OCaml compiler on an Intel x86-64 and ARM 64-bit v8-A platform with execution time and heap allocations being the considered resources. In this talk, I will define cost semantics, motivate and describe our machine learning approach and finally present the evaluation of our technique on several benchmarks.
Joint work with Jan Hoffmann
In Partial Fulfillment of the Speaking Requirement
Automatic Database Management System Tuning Through Large-scale Machine Learning
Friday, December 1st, 2017 from 12-1 pm in GHC 6501.
Presented by Dana Van Aken, CSD
Database management systems (DBMSs) are the most important component of any data-intensive application. They can handle large amounts of data and complex workloads. But they're difficult to manage because they have hundreds of configuration ``knobs'' that control factors such as the amount of memory to use for caches and how often to write data to storage. Organizations often hire experts to help with tuning activities, but experts are prohibitively expensive for many.
In this talk, I will present OtterTune, a new tool that can automatically find good settings for a DBMS's configuration knobs. OtterTune differs from other DBMS configuration tools because it leverages knowledge gained from tuning previous DBMS deployments to tune new ones. Our evaluation shows that OtterTune recommends configurations that are as good as or better than ones generated by existing tools or a human expert.
In Partial Fulfillment of the Speaking Requirement
Improving DNA read mapping with error-resilient seeds
Monday, December 4th, 2017 from 12-1 pm in GHC 6501.
Presented by Hongyi Xin, CSD
DNA read mapping is one of the fundamental problem in bioinformatics. The general goal is to map billions of short pieces of DNA (reads) to a high quality reference genome, while tolerating errors including genomic variations between individuals and the sequencer imprecision. The seed-and-extend strategy, a general mapping heuristic, efficiently maps erroneous DNA fragments to the reference by breaking the read into multiple non-overlapping segments, called ``seeds'', and assume at least one of the seed is error free. By using seed to index into the reference genome, seed-and-extend mappers drastically reduce the search space hence improves the run time. However, a fundamental trade-off of seed-and-extend mappers is the balance between seed lengths and seed count. Greater seed length further reduces the search space therefore improve the performance. However, greater seed space also renders fewer seeds hence make the mapping process more error prune.
In this talk, I will present an on-going work, VEST (Versatile Error-resilient Seed Technique). VEST aims to adapt long seeds in order to make them error-resilient, hence achieves both high error tolerance as well as high mapping speed.
In Partial Fulfillment of the Speaking Requirement
Belief-Aware Cyber Physical Systems
Monday, December 11th, 2017 from 12-1 pm in GHC 6501.
Presented by Joao Martins, CSD
Formal verification methods are making headway into required safety verification policies for transportation. However, they often fail to account for the fact that controllers, whether digital or human, have imperfect knowledge of the world. Without addressing this shortcoming, we risk never bridging the gap between theoretical and practical safety.
In this talk, we discuss the sometimes fatal consequences of these shortcomings, and describe a new logic with belief as a first-class citizen, allowing us to model, think and verify belief-aware cyber physical systems.
In Partial Fulfillment of the Speaking Requirement
Web contact: sss+www@cs