iWarp: An Integrated Solution to High-Speed Parallel Computing Shekhar Borkar, Robert Cohn, George Cox, Sha Gleason, Thomas Gross, H. T. Kung, Monica Lam, Brian Moore, Craig Peterson, John Pieper, Linda Rankin, P. S. Tseng, Jim Sutton, John Urbanski, and Jon Webb Department of Computer Science Intel Corporation, JF1-60 Carnegie Mellon University 5200 N.E. Elam Young Pkwy Pittsburgh, Pennsylvania 15213 Hillsboro, Oregon 97124 Abstract iWarp is a system architecture for high speed signal, image and scientific computing. The heart of an iWarp system is the iWarp component: a single chip processor that requires only the addition of memory chips to form a complete system building block, called the iWarp cell. Each iWarp component contains both a powerful computation engine (20 MFLOPS) and a high throughput (320 MBytes/sec), low latency (100-150 ns) communication engine for interfacing with other iWarp cells. Because of its strong computation and com- munication capabilities, the iWarp component is a versatile building block for various high performance parallel systems. These systems range from special purpose systolic arrays to general purpose distributed memory computers. They are able to support both fine-grain parallel and coarse-grain dis- tributed computation models simultaneously in the same sys- tem. An iWarp system can include a large number of cells; the initial iWarp demonstration system consists of an 8x8 torus of iWarp cells, delivering more than 1.2 GFLOPS. It can be expanded to include up to 1,024 cells. This paper describes the iWarp architecture and how it supports various communication models and system configurations. Note: reprint from proceedings of Supercomputing '88, Kissimmee, Florida, November 14-18, 1988