go back to the main page

SSS Abstracts
Spring 2000

go back to the main page

Potential Parallelism: A New Way of Looking at Parallel Machines

Friday, January 21^st, 2000 from 12-1 pm in WeH 4623.

Presented by Chris Colohan,

For many computer users, there is a constant need: the need for speed. Unfortunately, computers are not always able to deliver the needed performance. One method that system designers use for improving performance is exploiting parallelism, running multiple tasks concurrently. This has allowed regular scientific applications to be run on very large parallel machines, resulting in tremendous speedups. Unfortunately, many (perhaps most) applications are not easy to parallelize by hand, and are also resistant to automatic parallelization.

In this talk I will discuss the topic of potential parallelization, a method of relaxing the constraints on the programmer and compiler when trying to parallelize a program. With these relaxed constraints, much more parallelism can be exposed, and impressive speedups can be realized on programs previously thought to contain no parallelism. This talk will discuss the system support for this technique, changes in the programming model and compiler support.

This talk is a high level overview of ongoing work in the STAMPede research project. It is meant to appeal to a general audience, and does not focus solely on presenting our latest results.

Towards Automated Verification of Safety Architectures

Friday, February 25^th, 2000 from 12-1 pm in WeH 4623.

Presented by Carsten Schürmann,

It is very unlikely that software will ever be proven totally correct, safe, and secure. For example, in order to trust that the execution of a binary is memory safe, we have to trust the correctness of the compiler implementation. But in general, we can neither expect any real compiler to be free of bugs nor can we expect to prove its implementation correct. On the other hand, when compiled code is augmented with safety proofs, the execution of the binary will be safe; in this case instead of trusting the compiler we must trust the safety architecture (e.g. proof-carrying code or typed assembly language) which is typically based on logics or type systems.

In my talk I will present a meta-logical framework and its implementation in the Twelf system. Twelf is a tool that supports the design of deductive systems in general and safety architectures in particular. In these domains its automated reasoning power far exceeds that of any other theorem prover. Specifically, Twelf has been successfully employed to derive various properties of logics and type systems directly relevant to safety architectures, such as the consistency of logics and the admissibility of new inference rules that shorten the size of safety proofs. Other results include automatic proofs of the Church-Rosser theorem, cut-elimination, and type preservation and progress of various operational semantics. Using one particular example, I will motivate the design of Twelf.

Natural Programming: An HCI Approach to Programming Language Design

Friday, March 10^th, 2000 from 12-1 pm in WeH 4623.

Presented by John Pane,

Programming is a very difficult activity, especially for beginners, but perhaps it does not have to be so hard. Over the past thirty years many researchers have studied programming and discovered important usability issues. Meanwhile, the field of Human-Computer Interaction (HCI) has assembled a set of techniques and principles for improving the usability of computer systems. It is time to apply this work to influence the designs of new languages, by combining the specific knowledge we have about programming with the general HCI techniques. We can catalog and interpret the prior work, perform new studies where questions remain unanswered, and focus on usability throughout the design, to produce programming systems that are easier to learn and use.

This talk will summarize my research, which focuses on usability in the design of a new programming system for children. I will begin with a review of some of the more important issues from prior research on beginner programmers. Some open questions led us to conduct a pair of studies examining how children and other non-programmers naturally express problem solutions. These studies exposed some of the ways that current programming languages make people express their solutions unnaturally. I will then present a preliminary system design that addresses many of these issues. The features of this design elevate the importance of query specification, a well-known area of difficulty for beginners. A third study characterized some of the problems users have with boolean expressions and tested some alternatives for query specification. Our tabular query design performed significantly better than boolean expressions, and will be incorporated into my system. I am now finalizing the system design and implementing it, and will perform additional studies with children to test its usability.

X means --- extending K-means to estimate the number of clusters

Friday, March 31^st, 2000 from 12-1 pm in WeH 4623.

Presented by Dan Pelleg,

Despite its popularity for general clustering, K-means suffers three major shortcomings; it scales poorly computationally, the number of clusters K has to be supplied by the user, and the search is prone to local minima. We propose solutions for the first two problems, and a partial remedy for the third. Building on prior work for algorithmic acceleration that is not based on approximation, we introduce a new algorithm that efficiently, simultaneously, searches the space of cluster locations and number of clusters to optimize the Bayesian Information Criterion (BIC) or the Akaike Information Criterion (AIC) measure. The innovations include two new ways of exploiting cached sufficient statistics and a new very efficient test that in one K-means sweep selects the most promising subset of classes for refinement. This gives rise to a fast, statistically founded algorithm that outputs both the number of classes and their parameters. Experiments show this technique reveals the true number of classes in the underlying distribution, and that it is much faster than repeatedly using accelerated K-means for different values of K.

Joint work with Andrew Moore.

An Overview of Probabilistic Path Planning Algorithms

Friday, April 7^th, 2000 from 12-1 pm in WeH 8220.

Presented by Mark Moll,

Snake-like robots have great potential to assist humans at complicated tasks. Reconnaissance of earthquake areas, inspection of car engines, minimally invasive surgery are three possible domains. One great challenge is to plan the motions of such robots. The very property that makes snake-like robots attractive, namely their high degrees of freedom, makes it very difficult to plan their motions. In the past years several practical randomized algorithms for path planning in many dimensions have been developed. These algorithms sample the configuration space and try to build a roadmap of this space. A roadmap is a 1-dimensional structure that has the same structure as the obstacle-free part of the configuration space. Once a roadmap has been built, we can search it to find paths between arbitrary configurations.

The naive roadmap algorithm samples the configuration space uniformly and connects samples that are "close". This algorithm often fails to find a path if there is little clearance between obstacles. Subsequent algorithms have been proposed that use non-uniform sampling techniques in order to maximize the probability of finding narrow corridors in configuration space.

In my talk I will give an overview of probabilistic path planning and explain a few probabilistic roadmap methods. I will also give an example of an algorithm that takes into account kinodynamic constraints: constraints on the velocity and acceleration of the robot. Hovercraft and satellites with thrusters are examples of this type of robot.

This talk should be accessible to a general CS audience.

Integrating a Command Shell Into a Web Browser

Friday, April 21^st, 2000 from 12-1 pm in WeH 4623.

Presented by Rob Miller,

The transition from command-line interfaces to graphical interfaces has resulted in programs that are easier to learn and use, but harder to automate and reuse. Another transition is now underway, this time to HTML interfaces hosted by a web browser. To help users automate HTML interfaces, I propose the "browser-shell," a web browser that integrates a command interpreter into the browser's Location box. The browser-shell's command language is designed for extracting and manipulating HTML and text, and commands can also invoke local programs. Command input is drawn from the current browser page, and command output is displayed as a new page. The browser-shell brings to web browsing many of the strengths of the Unix shell, including scripting web services and making pipelines of web services and local programs.

The browser-shell also allows legacy command-line programs to be wrapped in an HTML/CGI interface that is graphical but still scriptable. Even without wrappers, the browser-shell offers a new shell interaction model, different from the conventional typescript model used by xterm or Emacs, which may improve usability in some respects.

The talk will include a demonstration of a prototype browser-shell, with a discussion of different implementation alternatives and their tradeoffs.

Note that this talk is in partial fulfillment of the speaking requirement.

PicturePiper: Using a re-configurable pipeline to find images on the WWW

Friday, April 28^th, 2000 from 12-1 pm in WeH 4623.

Presented by Adam Fass,

PicturePiper is an interactive browser that allows users to find images related to a topic of interest. The user enters a text query, and PicturePiper contacts an existing web search engine to find matching web pages. Once these pages are found, PicturePiper scans them for images, downloads the images, and shows them to the user.

Since the resulting set of images can be large and eclectic, PicturePiper allows the user to explore the images by clustering them, discarding un-interesting clusters, and re-clustering the resulting set. As the user carries out these operations, additional images are simultaneously downloaded and added to the appropriate cluster, possibly resulting in updates to the user's display.

In order to download images and perform the interactive clustering operations simultaneously, PicturePiper uses a reconfigurable pipeline. Operations such as clustering add additional stages to the pipeline, while the entire pipeline continues to download additional images and assign them to the appropriate clusters.

PicturePiper was built to explore the use of a re-configurable pipeline architecture in browsing a large collection of images. Two design decisions that worked out well were: 1) the use of a single common data type for passing information between pipeline stages, and 2) having the client application poll the pipeline buffers for new results, rather than having the pipeline send explicit update notifications.

Note that this talk is in partial fulfillment of the speaking requirement.

End System Multicast: A New Architecture for Group Communication over the Internet

Friday, May 5^th, 2000 from 12-1 pm in WeH 4623.

Presented by Sanjay Rao,

Multi-party applications like audio/video conferencing, internet games and distance learning are becoming increasingly popular. Common to these applications is the requirement that data be transmitted efficiently from a sender to multiple recipients. So far, researchers have concentrated on an architecture called IP Multicast for meeting this objective. IP Multicast requires significant changes to routers and existing Internet infrastructure. However, ten years after its initial proposal, IP Multicast is still plagued with concerns pertaining to scalability, network management and deployment.

In this talk, I will present our alternative architecture, that we call End System Multicast, where end systems implement all multicast related functionality including membership management and packet replication. This shifting of multicast support from routers to end systems has the potential to address most problems associated with IP Multicast. However, a key concern is the performance penalty associated with such a model. In particular, End System Multicast introduces duplicate packets on physical links and incurs larger end-to-end delay than IP Multicast. We have studied this question in the context of the Narada protocol. In Narada, end systems self-organize into an overlay structure using a fully distributed protocol. In addition, Narada attempts to optimize the efficiency of the overlay based on end-to-end measurements.

I will present details of Narada and results from simulation and Internet experiments. Preliminary results are encouraging. In most experiments, the delay and bandwidth penalty are low. We believe the potential benefits of repartitioning multicast functionality between end systems and routers significantly outweigh the performance penalty incurred.

This talk is based on joint work with Yang-hua Chu and Hui Zhang.

Note that this talk is in partial fulfillment of the speaking requirement.

Using the Naive Bayes Classifier to Obtain Reasonable Class Posteriors

Friday, May 12^th, 2000 from 12-1 pm in WeH 4623.

Presented by Paul Bennett,

The Naive Bayes classifier has shown itself to be a top competitor in many text domains. In other cases, it is used because its simplicity and its computational efficiency make it an attractive alternative. However, researchers have often found when it is used for learning the actual posterior distribution, performance has been poor or average. Many people have specifically noted the fact that Naive Bayes tends to generate a bimodal posterior with probabilities clustered either arbitrarily close to 0 or arbitrarily close to 1. Of course these estimates would be fine if the Naive Bayes classifier were always correct, but, as it is not, it tends to produce uncallibrated probability estimates. We give a theoretical justification of why this bimodal distribution is extremely likely to occur when using the Naive Bayes classifier, and given this justification, show how one can use the classifier to obtain improved probability estimates.

Note that this talk is in partial fulfillment of the speaking requirement.

Web contact: sss+www@cs