Guy E. Blelloch's Home Page


My research has largely been in the interaction of Algorithms and Programming Languages, much of it in the area of parallel computing.
Here are some of the more recent topics I have worked on with my students and collaborators.


  • Benchmarking Parallel Algorithms. We are developing a benchmark suite of parallel algorithmic problems for testing machines and programming paradigms. The ideas is that they will be defined in terms of the interface instead of the code so they can be compared across machine types and programming paradigms.
  • Cache efficient parallel algorithms. We have been developing cost models, algorithms, and schedulers for taking advantage of locality on parallel cache hierarchies without worrying about details of the machine.
  • Synchronization and concurrent algorithms. We are looking into the idea of adding transactions on individual memory blocks to the memory system and how this can be used to simplify many concurrent algorithms. We have also looked into the idea "room-synchronizations".
  • Thread Scheduling. We are developing scheduling dynamic parallel languages and studying how they affect time, memory and cache performance.
  • Data structures and algorithms

  • Self-adjusting Computation. The idea is to keep track of dependencies while executing an algorithm so that you can go back, change history, and propagate the change to the output.
  • Computational Biology: algorithms for phylogenetic tree construction. We have been looking at algorithms that generate trees with maximum-parsimony.
  • Uniquely Represented Data Structures. Data structures that only depend on their contents and not the order of insertion or deletion have applications in security and concurrency.
  • and here are other things I've worked on:
  • Algorithms for Multiprocessor Garbage Collection. The goal here is to bound the wait time for a GC while also bounding the memory required.
  • The NESL Language
  • Provably Efficient Language Implementations
  • Compact Data Structures. We look at how to represent graphs, meshes, indices and sets with a number of bits that are asymptotically optimal and access/update times that are also asymptotically optimal.
  • Parallel Algorithms including work on Delaunay Triangulation, Treaps, and Sorting.
  • Other: Pipelining with Futures, Multibank Memory Systems.

  • Cache Efficient Parallel Algorithms

    We have been developing high-level algorithmic models for accounting for locality on parallel machines, developing algorithms which are efficient under these models, developing schedulers for the models, and proving bounds on runtime based on the models. This includes work reported in the following papers:

    Synchronization Primitives and Concurrent Algorithms

    Self Adjusting Computation

    It is often the case that by slightly changing the input to an algorithm, the execution path will only change slightly. The idea of self-adjusting computation is to let the algorithm "adjust" to such small changes by only updating the part of the computation that is required to be updated. The idea is implemented by maintaining a dependence graph of the computation and using this graph to propagate changes. The propagation will both update the output and update the graph itself (so that it is ready for another change).

    We are interested both in techniques for maintaining the graph, and in proving bounds for how much needs to be updated for different algorithms.

    Compact Data Structures for Graphs, Indices, Meshes and Sets

    We have looked at developing data structures that are compressed to within a constant factor of optimal while allowing access within a constant factor of optimal.

    Uniquely Represented Data Structures

    A data structure is uniquely represented if its layout in memory depends only on the contents (as seen by the interface). In particular it cannot depend on the ordering that objects were added or deleted. Such data structures have applications to security (to make sure that no information is revealed about the order of operations) and concurrency (to make sure the data structure ends up in the same state, independent of how operations end up being interleaved).

    Multiprocessor Garbage Collection

    Thread Scheduling

    Many of today's high level parallel languages support dynamic, fine-grained parallelism. The language implementation typically requires an efficient scheduling algorithm to assign computations to processors at runtime. However, in an attempt to expose a sufficient degree of parallelism to keep all processors busy, schedulers often create many more parallel threads than necessary, leading to excessive memory usage. Further, the order in which the threads are scheduled can greatly affect the total size of the live data at any instance during the parallel execution, and unless the threads are scheduled carefully, the parallel execution of a program may require much more memory than its serial execution.

    Our research involves finding theoretically space- and time-efficient scheduling algorithms for multithreaded programs, including programs with nested parallelism and synchronization variables. In addition to proving the space and time bounds of the resulting parallel schedules, we demonstrate that they are efficient in practice. We have implemented a runtime system that uses our algorithm to schedule threads in computations with nested parallelism. The following papers describe our results.

    Parallel Algorithms

    We have studied and experimented with parallel algorithms for a variety of problems, including sorting, Delaunay-triangulation, graph-connectivity, N-body simulations and set operations. The goal of our work has been to look at key problems and see what the very fastest algorithm we could develop for the problem is. Our work has consisted of both looking at theoretical and practical consequences. We have implemented all the algorithms we describe. In many cases our algorithms are the fastest available (at least at the time they were implemented). I've also written the chapter in the CRC Press Computer Science and Engineering Handbook (with Bruce Maggs). Our work includes the following.

    The NESL Programming Language

    In the early 90s we designed a parallel programming language called NESL (NESted-parallel Language) and implemented it on a wide variety of parallel machines. The goal has been to experiment with ideas of nested parallelism, and with language-based cost models. We have coded up a large number of algorithms in NESL, and in most cases the code is very much more concise than in other parallel programming languages. Many of these algorithms are available off of the NESL algorithm library page, and demos of some of them are available off of the algorithm animation page. An online tutorial and interpreter are also available. The Chapter on Parallel Algorithms in the CRC Computer Science and Engineering Handbook uses pseudocode based on NESL.

    Perhaps the most important feature of NESL is the model it supplies for analyzing costs. In particular it defines costs in terms of work (total number of operations) and depth (longest sequence of dependences, or critical path) and defines rules for composing these costs across expression. In general when making calls sequentially the sum is taken of both the work and depth, and when making calls in parallel the maximum is taken over the work and depth. This idea of using work and depth as cost measures has been adopted by other languages, such as the CILK programming language.

    In addition to the NESL home page the following papers describe our work on NESL.

    Provably-Efficient Language Implementations

    The idea of a provably-efficient language implementations is to
    1. Formally define an abstract cost model from which the users of the language can analyze their code.
    2. Specify a mapping from these costs to running times on a concrete machine model.
    3. Describe an implementation and prove that this mapping is achieved with the implementation.
    This work was originally motivated by the need to model time in the parallel programming language NESL, but we have applied it to other parallel programming models. We formalize the cost model using a profiling semantics, which is an operational semantics extended with cost measures. The following papers describe our results.

    Computational Biology: Maximum-Parsimony Phylogeny

    Other Research