Date: Wed, 15 Jan 1997 00:15:50 GMT Server: NCSA/1.5.1 Last-modified: Tue, 13 Aug 1996 21:27:09 GMT Content-type: text/html Content-length: 2255
My research addresses the problem of fault-tolerance in distributed memory parallel computers or multicomputers. While multicomputers have a number of attractive features, one difficulty in these systems is that of computing in the presence of faults. This problem is particularly acute in massively parallel architectures employing very large numbers of nodes.
The objective of this research is to develop practical algorithms for fault-tolerant unicast and multicast routing in wormhole-routed multicomputers. In addition, we seek to obtain tight theoretical bounds on the number of faults that can be tolerated with a given set of resources. In particular, my students and I are developing efficient fault-tolerant adaptive routing algorithms for the mesh and torus topologies. Several technical reports and papers describing recent results are available in postscript.
A number of Harvey Mudd students have taken an active role in this research project. They are Eli Brandt ('95), Tom Hehre ('96), Kevin Watkins ('97), Andrew Hutchings ('98), and Mark Reyes ('98).
Tom Hehre and Kevin Watkins developed the Mesh Architecture Routing Simulator (MARS). MARS is an event-driven simulation package for studying wormhole-routed meshes. MARS is implemented in C++ and has an X-windows-based graphical user interface that allows visualization of network traffic and other network characteristics. MARS allows easy definition of router architectures, routing algorithms, and selection policies. Virtual channels are implemented with demand-based multiplexing and the number of internal channels and other router characteristics are user-definable. A number of router and channel latency parameters are also user-definable in order to facilitate fair comparisons between algorithms based on different routing models. An example of the MARS interface is shown below.