Decoupling Synchronization and Data Transfer in Message Passing Systems of Parallel Computers

T. Stricker 1), J. Stichnoth 1), D. O'Hallaron, S. Hinrichs 1) and T. Gross 1),2)

(1) School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
(2) Institut für Computer Systeme, ETH Zürich, CH 8092 Zürich, Switzerland

Abstract

Synchronization is an important issue for the design of a scalable parallel computer, and some systems include special hardware support for control messages or barriers. The cost of synchronization has a high impact on the design of the message passing (communication) services. In this paper, we investigate three different communication libraries that are tailored toward the synchronization services available: (1) a version of generic send-receive message passing (PVM), which relies on traditional flow control and buffering to synchronize the data transfers; (2) message passing with pulling, i.e. a message is transferred only when the recipient is ready and requests it (as, e.g., used in NX for large messages); and (3) the decoupled direct deposit message passing that uses separate, global synchronization to ensure that nodes send messages only when the message data can be deposited directly into the final destination in the memory of the remote recipient. Measurements of these three styles on a Cray T3D demonstrate the benefits of the decoupled message passing with direct deposit. The performance advantage of this style is made possible by (1) preemptive synchronization to avoid unnecessary copies of the data, (2) high-speed barrier synchronization, and (3) improved congestion control in the network. The designers of the communication system of future parallel computers are therefore strongly encouraged to provide good synchronization facilities in addition to high throughput data transfers to support high performance message passing.
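
As a rough illustration of style (3), the following Python sketch models nodes as threads that share one address space. It is only a toy model under stated assumptions: the names `run_direct_deposit` and `num_nodes`, and the use of `threading.Barrier` to stand in for a global barrier, are hypothetical and do not reproduce the Cray T3D libraries measured in the paper. The point it shows is the separation of concerns: a barrier first establishes that every destination buffer is ready, the data is then written straight into its final location without intermediate buffering, and a second barrier establishes that all deposits are complete.

```python
import threading


def run_direct_deposit(num_nodes=4):
    """Toy model of decoupled direct deposit (all names are hypothetical).

    Each thread plays the role of one node. dest[p][s] is the final
    destination slot at node p for the message sent by node s.
    """
    dest = [[None] * num_nodes for _ in range(num_nodes)]
    # Stands in for a fast global barrier; an assumption of this sketch.
    barrier = threading.Barrier(num_nodes)

    def node(rank):
        # Synchronization, decoupled from the transfer: after this barrier
        # every node knows that all destination buffers exist and are free.
        barrier.wait()

        # Data transfer: deposit directly into the final location at each
        # remote node. No staging buffer, no per-message handshake.
        for peer in range(num_nodes):
            dest[peer][rank] = f"data from {rank} to {peer}"

        # Second synchronization: all deposits are complete, so each node
        # may now safely read its own destination buffer.
        barrier.wait()

    threads = [threading.Thread(target=node, args=(r,)) for r in range(num_nodes)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return dest


if __name__ == "__main__":
    result = run_direct_deposit(4)
    for rank, row in enumerate(result):
        print(rank, row)
```

By contrast, a buffered send-receive style would, in this model, place each message in an intermediate queue and copy it again when the receiver posts its receive; that extra copy is what the separate synchronization step is meant to avoid.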