Distributed Computing on Nectar

Network-based multicomputers differ substantially from traditional multicomputers such as Paragon and NCube. While traditional multicomputers are homogeneous and typically used in dedicated mode, network-based multicomputers are often heterogeneous, and both the network and the nodes are shared with other users. The distributed computing research focuses on building tools that help users deal with these issues, which are specific to network-based multicomputers.

The tools developed in the context of the Nectar project fall into two classes. First, users need monitoring tools so that they can trace both the system and their application. Since network-based multicomputers can be very dynamic, such tools are essential to understanding the behavior of applications. An example of such a tool is Bee, a configurable monitoring tool for distributed programs. The second class of tools consists of dynamic load balancers that move work at runtime so that the available computational resources are used efficiently. In addition, project members have worked on automatic checkpointing tools that allow recovery from node failures.

Both the application experience and the tool development are reported in more detail in a number of papers.