Date: Wednesday, 20-Nov-96 20:04:38 GMT
Server: NCSA/1.3
MIME-version: 1.0
Content-type: text/html
Last-modified: Thursday, 07-Nov-96 19:41:15 GMT
Content-length: 7148
CASHMERe Home Page
Coherence Algorithms for SHared MEmory aRchitectures
The CASHMERe Project
CASHMERe stands for "Coherence Algorithms for SHared MEmory
aRchitectures" and is an ongoing effort to provide efficient,
scalable, shared memory with minimal hardware support. It is well
accepted today that commercial workstations offer the best
price/performance ratio and that shared memory provides the most
desirable programming paradigm for parallel computing. Unfortunately
shared memory emulations on networks of workstations provide
acceptable performance for only a limited class of applications.
CASHMERe attempts to bridge the performance gap between shared memory
emulations on networks of workstations and tightly-coupled
cache-coherent multiprocessors while using minimal hardware support.
In the context of CASHMERe we have discovered that NCC-NUMA (Non Cache
Coherent Non Uniform Memory Access) machines can greatly improve the
performance of DSM systems, and
approach that of fully hardware
coherent multiprocessors. The basic property of NCC-NUMA systems
is the ability to access remote memory directly; such a capability is
offered by a variety of network interfaces including DEC's Memory
Channel, HP's Hamlyn, and the Princeton Shrimp. Given current
technology the additional hardware cost of NCC-NUMA systems over pure
message passing systems is minimal. Based on this fact and our
performance results we believe that NCC-NUMA machines lie near the knee
of the price-performance curve.
The department of Computer
Science at the University of
Rochester is building a 32 processor
Cashmere prototype. Significant part of the funding comes in the
form of an equipment grant from Digital
Equipment Corporation. The prototype consists of eight
4-processor DEC
2100 4/233 multiprocessors on a Memory
Channel network. The Memory Channel plugs into any PCI bus. It
provides a memory-mapped network interface with which processors can
read and write remote locations without kernel intervention or
inter-processor interrupts. End-to-end bandwidth is currently about
40MB/sec; remote write latency is about 3.5us. The next hardware
generation is expected to increase bandwidth by approximately one
order of magnitude, and cut latency by half. Cashmere augments the
functionality of the Memory Channel by providing cache coherence in
software.
Slides from the
Workshop on Scalable Shared Memory Multiprocessors,
Boston, MA, October 1996.
The people behind CASHMERe are
Michael L. Scott,
Wei Li,
Sandhya Dwarkadas,
Leonidas Kontothanassis,
Galen Hunt,
Maged Michael,
Robert Stets.
Nikolaos Hardavellas,
Sotirios Ioannidis,
Wagner Meira,
Alexandros Poulos,
Michal Cierniak,
Srinivasan Parthasarathy,
and
Mohammed Zaki.
-
G. C. Hunt and M. L. Scott.
``Using Peer Support to Reduce Fault-Tolerant Overhead in
Distributed Shared Memories''.
TR 626, Computer Science Department, University of Rochester, June 1996.
- L. I. Kontothanassis and M. L. Scott.
``Efficient Shared Memory with Minimal Hardware Support''. In
Computer Architecture News, September 1995.
- L. I. Kontothanassis and M. L. Scott.
``Using Memory-Mapped Network Interfaces to Improve the Performance of
Distributed Shared Memory''. In Proc., 2nd HPCA, San Jose, CA,
February 1996.
- L. I. Kontothanassis, M. L. Scott, and R. Bianchini. ``Lazy Release Consistency for
Hardware-Coherent Multiprocessors''. In Proc., SUPERCOMPUTING '95,
San Diego, CA, December 1995.
- L. I. Kontothanassis and M. L. Scott.
``Software Cache Coherence for Current and Future
Architectures''. In Special JPDC Issue on Scalable Shared Memory,
November 1995, V29, N2, pp 179-195.
- L. I. Kontothanassis and M. L. Scott.
``Software Cache Coherence for Large Scale Multiprocessors''. In
Proc., 1st HPCA, Raleigh, NC, January 1995.
- M. Marchetti, L. I. Kontothanassis, R. Bianchini, and
M. L. Scott.
``Using Simple Page Placement Policies to Reduce the Cost of Cache
Fills in Coherent Shared-Memory Systems''. In Proc., IPPS '95,
Santa Barbara, CA, April 1995.
- M. Cierniak and Wei Li.
``Unifying Data and Control Transformations for Distributed
Shared-Memory Machines''. In Proc., SIGPLAN '95 PLDI, La Jolla,
CA, June 1995. Also available as TR 542.
For comments and/or requests send mail to
kthanasi@crl.dec.com
or
scott@cs.rochester.edu.
URCS Home Page