Subset Barrier Synchronization on a Private-Memory Parallel System Anja Feldmann Thomas Gross David O'Hallaron Thomas M. Stricker School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 Abstract A global barrier synchronizes all processors in a parallel system. This paper investigates algorithms that allow disjoint subsets of processors to synchronize independently and in parallel. The user model of a subset barrier is straight forward; a processor that parti- cipates in a subset barrier needs to know only the name of the barrier and the number of participating processors. This paper identifies two general communication models for private-memory parallel systems: the bounded buffer broadcast model and the anonymous destination message passing model and presents algorithms for bar- rier synchronization in the terms of these models. The models are detailed enough to allow meaningful cost estimates for their primitives, yet independent of a specific architecture and can be supported efficiently by a modern private-memory parallel system. The anonymous destination message passing model is the most at- tractive. The time complexity to synchronize over a uni-directional ring of N processors is O(log N) for common cases, and O(sqrt(N)) in the worst case. The algorithms have been implemented on iWarp, a private-memory parallel system and are now in daily use. The paper concludes with timing measurements obtained on a 64-node system. Note: Reprint from the proceedings of the ACM Symposium on Parallel Algorithms and Architectures, SPAA92 June 29 - July 1, 1992, San Diego, California.