go back to the main page

SSS Abstracts
Spring 2002

go back to the main page

On the (Im)possibility of Obfuscating Programs

Friday, February 22^nd, 2002 from 12-1 pm in NSH 1507.

Presented by Ke Yang,

Informally, an obfuscator is a compiler that takes an input program P and produces a new program which has the same functionality, yet is "un-readable" in some sense. If an obfuscator exists, it would imply a large set of cryptographical applications and computational complexity results.

In this talk, we investigate a rigorous theoretical definition of obfuscators. We will demonstrate the applications of the obfuscators, if they exists. Finally, we shall discuss why obfuscators don't exist.

This talk is in partial fulfillment of the speaking requirement.

An Optimal Algorithm for the List Update Problem

Friday, March 1^st, 2002 from 12-1 pm in NSH 1507.

Presented by Shuchi Chawla,

The List Update problem involves maintaining an unordered list of elements, where the goal is to minimize the time taken for queries such as "Does x belong to the list?". Queries are presented by a malicious adversary and the algorithm does not know which queries it is going to get in the future. The performance of an algorithm is evaluated with respect to an optimal algorithm (in hindsight) that knows future queries. We will discuss a well known algorithm for this problem and the concept of competitive ratio for evaluating such an algorithm. We will then present an algorithm that is (1+\epsilon) optimal for this problem for any \epsilon>0. This algorithm is based on Experts Analysis, which is a well studied problem in Machine Learning Theory. This is joint work with Avrim Blum and Adam Kalai.

This talk is in partial fulfillment of the speaking requirement.

Detecting Misunderstandings in the CMU Communicator Spoken Dialog System

Friday, March 8^th, 2002 from 12-1 pm in NSH 1507.

Presented by Dan Bohus,

Although it is the main drive behind the emergence of complex, interactive spoken-language interfaces, speech recognition technology also represents one of the most severe limiting factors in the successful development of these systems. We believe this problem can be addressed by the development of an efficient monitoring and recovery process: after all, human-human conversations are also affected by misunderstandings, but in most cases we manage to reestablish a mutual ground by employing various confirmation and clarification techniques.

In this talk, I will focus on the first part of this problem, that of constructing an efficient detector for misunderstandings in human-machine conversations. After casting the problem as a classification task, a set of relevant features from different levels in the dialog system are identified and several classification techniques are compared in an effort to determine the most appropriate one in this setting. The experiments, performed in the context of the CMU Communicator (a spoken dialog system operating in the air-travel planning domain) confirm the validity of this approach, and indicate that significant reductions in error rate can be obtained over a heuristic rule previously used to detect misunderstandings.

This talk is in partial fulfillment of the speaking requirement.

Probabilistic Verification of Discrete Event Systems

Friday, March 15^th, 2002 from 12-1 pm in NSH 1507.

Presented by Hakan Younes,

We present an algorithm for verifying properties of discrete event systems. The dynamics of such systems can be very complex, which makes them hard to analyze using symbolic methods. We resort to methods based on Monte Carlo simulation and acceptance sampling. We are interested in verifying probabilistic real-time properties, e.g. "the probability of system failure within 60 minutes is at most 0.05", and we can express such properties using continuous stochastic logic (CSL). Because our approach is simulation based, we cannot guarantee that the verification algorithm always produces the correct answer. There are two types of errors: false positives and false negatives. The key result is that we can guarantee bounds on the probability of either type of error. The error bounds can be set arbitrarily low, at the cost of computation time. By using sequential acceptance sampling, we minimize the expected number of samples needed to verify a property. While our attention is focused on discrete event systems that can be modeled as generalized semi-Markov processes and simulated using discrete event simulation, the algorithm can be used to verify properties of any type of system that we have a simulator for.

This talk is in partial fulfillment of the speaking requirement.

Learning Templates for Efficient Planning

Friday, March 22^nd, 2002 from 12-1 pm in WeH 4615A.

Presented by Elly Winner,

General-purpose planning can solve problems in a variety of domains but can be quite inefficient. Domain-specific planners are more efficient but are difficult to create. I will discuss template-based planning, a novel paradigm for automatically generating and using domain-specific planning programs, or /templates/. I will present an algorithm for learning templates automatically from example plans and explain how templates are used to solve planning problems. I will also present results that show that the automatically-learned templates compactly represent domain-specific planning experience. Furthermore, the templates situationally generalize the given example plans, thus allowing them to solve problems that have not previously been encountered efficiently.

This talk is in partial fulfillment of the speaking requirement.

Hiasynth: Human-In-Action Synthesis

Friday, March 29^th, 2002 from 12-1 pm in NSH 1507.

Presented by Thomas Kang,

Modeling humans has always been a topic of great interest in computer graphics. Yet it is still difficult to generate a photorealistic model of a human that can be posed and moved according to the user's wishes. Hiasynth is a new image warping-based technique which incorporates a rudimentary knowledge of the underlying geometry to direct and constrain the warping process. The result is a posable photorealistic model of a human that is relatively easy to create and control.

This talk is in partial fulfillment of the speaking requirement.

Modular Verification of Multithreaded Software

Friday, April 12^th, 2002 from 12-1 pm in NSH 1507.

Presented by Sanjit Seshia,

Multithreading is widely used in present-day programs, to improve performance as well as for program structuring. Automated correctness checking of multithreaded programs, however, remains a hard problem.

In this talk, I will describe a method that exploits modularity to build a scalable, automated verifier for multithreaded programs. Our technique performs modular verification of each procedure called by a thread using specifications of other procedures and of other threads. The technique leverages off existing sequential program verification methods. A checker has been implemented for Java on top of the Extended Static Checker for Java (ESC/Java). I will report on our experience in using the checker on real-world multithreaded Java programs.

This work is joint with Cormac Flanagan, Stephen Freund, and Shaz Qadeer.

This talk is in partial fulfillment of the speaking requirement.

Detection and Analysis of Routing Loops in the Internet

Friday, April 19^th, 2002 from 12-1 pm in NSH 1507.

Presented by Urs Hengartner,

Routing loops are caused by inconsistencies in the routing state among a set of routers. They occur even in perfectly engineered packet networks and have a detrimental effect on performance. They impact end-to-end performance through increased packet loss or delay for packets caught in the loop and through increased link utilization and corresponding delay and jitter for packets that traverse the link but are not caught in the loop.

In this talk, I will present two contributions. First, I will describe an algorithm for detecting routing loops on network links. I will show the results of applying the algorithm to packet traces collected from both a tier-1 Internet backbone network and a regional Internet exchange. The detected loops are verified using information from routing protocols. Second, I will present an analysis of routing loops and their impact on network performance. I will discuss various parameters of routing loops, such as their frequency and duration.

This is joint work with Richard Mortier, Sue Moon, and Christophe Diot.

This talk is in partial fulfillment of the speaking requirement.

Correlation Clustering

Friday, April 26^th, 2002 from 12-1 pm in NSH 1507.

Presented by Nikhil Bansal,

We consider a form of clustering, that we call correlation clustering. Suppose we are given a complete graph on n vertices (items), where each edge (u,v) is labelled either + or - depending on whether the items u and v are similar or not. The goal is to produce a clustering of items that agrees as much as possible with the given information. That is, we want to minimize the number of mistakes: the number of - edges inside clusters plus the number of + edges between clusters.

In this talk we will consider various notions of the problem and give approximation algorithms for them.

This is joint work with Avrim Blum and Shuchi Chawla.

This talk is in partial fulfillment of the speaking requirement.

What is a Module System?

Friday, May 3^rd, 2002 from 12-1 pm in NSH 1507.

Presented by Derek Dreyer,

Module systems, in some form or another, are ubiquitous in modern programming languages. At the very least, they provide a form of namespace management. Typically they also provide a way to divide a program into smaller components in such a way that the only information any one component needs to know about another is expressed in a well-defined interface. This facility is critical to the development and maintenance of large software systems.

But what is a module system? Can we describe the key concepts formally? In this talk I will attempt to answer that question using the methodology of type theory. I will show that features as basic as namespace management in C and as complex as type sharing and generativity in ML can be formally understood by starting with the simply typed lambda calculus and building up from there. No prior knowledge of type theory is assumed, aside from some familiarity with the lambda calculus.

This talk is in partial fulfillment of the speaking requirement.

Improving Index Performance through Prefetching

Friday, May 10^th, 2002 from 12-1 pm in NSH 1507.

Presented by Shimin Chen,

In recognition of the importance of CPU cache hierarchies in database performance, recent studies have revisited core database algorithms to reduce the number of cache misses. But because modern processors support prefetching and other mechanisms to overlap cache misses with computation and other misses, we observe that it is not the total number of cache misses that dictates performance, but rather the total amount of exposed miss latency. Therefore an algorithm that utilizes prefetching can potentially outperform an algorithm with fewer cache misses.

In this talk, I will present our work on Prefetching B+-Trees (pB+-Trees), which use cache prefetching to accelerate two important operations on B+-Tree indices: searches and range scans. I will begin by describing B+-Trees with one-line nodes, which minimizes the number of cache misses for searches. Then I will focus on our two prefetching techniques: wider nodes prefetching for searches, and jump-pointer array prefetching for range scans. Finally I will show our experimental results: compared to B+-Trees with one-line nodes, pB+-Trees achieve a factor of 1.2-1.5 speedups on searches and updates, and a six-fold speedup on range scans of 1000+ keys.

This talk is in partial fulfillment of the speaking requirement.

My cache or yours? Making storage more exclusive

Friday, June 7^th, 2002 from 12-1 pm in NSH 1507.

Presented by Ted Wong,

Modern high-end disk arrays often have several gigabytes of cache RAM. Unfortunately, most array caches use management policies which duplicate the same data blocks at both the client and array levels of the cache hierarchy: they are \emph{inclusive}. Thus, the aggregate cache behaves as if it was only as big as the larger of the client and array caches, instead of as large as the sum of the two. Inclusiveness is wasteful: cache RAM is expensive.

We explore the benefits of a simple scheme to achieve \emph{exclusive caching}, in which a data block is cached at either a client or the disk array, but not both. Exclusiveness helps to create the effect of a single, large unified cache. We introduce a "demote" operation to transfer data ejected from the client to the array, and explore its effectiveness with simulation studies. We quantify the benefits and overheads of demotions across both synthetic and real-life workloads. The results show that we can obtain useful---sometimes substantial---speedups.

During our investigation, we also developed some new cache-insertion algorithms that show promise for multi-client systems, and report on some of their properties.

Web contact: sss+www@cs