CASHMERe Home Page

Coherence Algorithms for SHared MEmory aRchitectures


The CASHMERe Project


Overview

CASHMERe stands for "Coherence Algorithms for SHared MEmory aRchitectures" and is an ongoing effort to provide efficient, scalable shared memory with minimal hardware support. It is well accepted today that commercial workstations offer the best price/performance ratio and that shared memory provides the most desirable programming paradigm for parallel computing. Unfortunately, shared memory emulations on networks of workstations provide acceptable performance for only a limited class of applications. CASHMERe attempts to bridge the performance gap between shared memory emulations on networks of workstations and tightly coupled, cache-coherent multiprocessors while using minimal hardware support.

In the context of CASHMERe we have discovered that NCC-NUMA (Non-Cache-Coherent Non-Uniform Memory Access) machines can greatly improve the performance of DSM systems, bringing it close to that of fully hardware-coherent multiprocessors. The basic property of NCC-NUMA systems is the ability to access remote memory directly; such a capability is offered by a variety of network interfaces, including DEC's Memory Channel, HP's Hamlyn, and the Princeton Shrimp. Given current technology, the additional hardware cost of NCC-NUMA systems over pure message-passing systems is minimal. Based on this fact and our performance results, we believe that NCC-NUMA machines lie near the knee of the price-performance curve.

The Department of Computer Science at the University of Rochester is building a 32-processor Cashmere prototype. A significant part of the funding comes in the form of an equipment grant from Digital Equipment Corporation. The prototype consists of eight 4-processor DEC 2100 4/233 multiprocessors on a Memory Channel network. The Memory Channel plugs into any PCI bus. It provides a memory-mapped network interface with which processors can read and write remote locations without kernel intervention or inter-processor interrupts. End-to-end bandwidth is currently about 40 MB/s; remote write latency is about 3.5 µs. The next hardware generation is expected to increase bandwidth by approximately one order of magnitude and to cut latency by half. Cashmere augments the functionality of the Memory Channel by providing cache coherence in software.
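To give a feel for what the quoted figures imply, the following is a rough back-of-envelope sketch rather than a measured result. It assumes a simple linear cost model (fixed latency plus size divided by bandwidth), takes 1 MB as 10^6 bytes, and uses a hypothetical 8192-byte transfer chosen purely for illustration; the function and constant names are not part of Cashmere.

```python
# Back-of-envelope cost model for a remote write over the Memory Channel.
# Assumption: time = fixed latency + size / bandwidth. Real behavior will
# differ (contention, packetization, PCI overheads are all ignored here).


def transfer_time_us(num_bytes, latency_us, bandwidth_mb_per_s):
    """Estimate the time, in microseconds, to write num_bytes remotely.

    With 1 MB = 10**6 bytes, bytes / (MB/s) is already in microseconds,
    because the factor of 10**6 in the bandwidth cancels the factor of
    10**6 that converts seconds to microseconds.
    """
    return latency_us + num_bytes / bandwidth_mb_per_s


# Figures quoted in the text for the current hardware.
CURRENT_GEN = {"latency_us": 3.5, "bandwidth_mb_per_s": 40.0}

# Projection from the text: roughly 10x the bandwidth, half the latency.
NEXT_GEN = {"latency_us": 3.5 / 2, "bandwidth_mb_per_s": 40.0 * 10}

# Hypothetical transfer size, used only for illustration.
TRANSFER_BYTES = 8192

if __name__ == "__main__":
    for name, params in (("current", CURRENT_GEN), ("next generation", NEXT_GEN)):
        t = transfer_time_us(TRANSFER_BYTES, **params)
        print(f"{name}: about {t:.1f} us for {TRANSFER_BYTES} bytes")
```

Under these assumptions, large transfers are dominated by the bandwidth term and small ones by latency, which is why the projected next generation helps bulk transfers far more than single-word writes.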

Implementation of Cashmere

Slides from the Workshop on Scalable Shared Memory Multiprocessors, Boston, MA, October 1996.

CASHMERe People

The people behind CASHMERe are Michael L. Scott, Wei Li, Sandhya Dwarkadas, Leonidas Kontothanassis, Galen Hunt, Maged Michael, Robert Stets, Nikolaos Hardavellas, Sotirios Ioannidis, Wagner Meira, Alexandros Poulos, Michal Cierniak, Srinivasan Parthasarathy, and Mohammed Zaki.

CASHMERe papers

For comments and/or requests send mail to kthanasi@crl.dec.com or scott@cs.rochester.edu.
URCS Home Page