Many of today's high level parallel languages support dynamic, fine-grained parallelism. The language implementation typically requires an efficient scheduling algorithm to assign computations to processors at runtime. However, in an attempt to expose a sufficient degree of parallelism to keep all processors busy, schedulers often create many more parallel threads than necessary, leading to excessive memory usage. Further, the order in which the threads are scheduled can greatly affect the total size of the live data at any instance during the parallel execution, and unless the threads are scheduled carefully, the parallel execution of a program may require much more memory than its serial execution.
This research involves finding theoretically space- and time-efficient scheduling algorithms for multithreaded programs, including programs with nested parallelism and synchronization variables. In addition to proving the space and time bounds of the resulting parallel schedules, we demonstrate that they are efficient in practice. We have implemented a runtime system that uses our algorithm to schedule threads in computations with nested parallelism.
Recently, we added this space-efficient scheduling technique to the native Pthreads library on Solaris, and studied the performance of 7 dynamic or irregular parallel benchmarks written using Pthreads We show how simple, yet provably good modifications to the library can result in significantly improved space and time performance. (click here for the paper). With the modified Pthreads library, each of the parallel benchmarks performs as well as its coarse-grained, hand-partitioned counterpart. These results demonstrate that, provided we use a good scheduler, the rich functionality and standard API of Pthreads can be combined with the advantages of dynamic, lightweight threads to result in high performance.