Experience with Resource Management Services on an Opportunistic Cluster

Jim Pruyne and Miron Livny

Department of Computer Sciences
University of Wisconsin--Madison
The Condor Project
{pruyne, miron}@cs.wisc.edu


Since release 3.3, PVM has provided an interface, which we co-developed with the PVM research group, that allows Resource Management (RM) services to be provided by user-supplied tasks rather than within the PVM daemons. Using this interface we have developed the Condor Application Resource Management Interface (CARMI), which extends the set of RM services provided by PVM to help applications adapt to changes in resource availability at run time. In an opportunistic cluster, where resources are privately owned, resources may become idle and therefore available to CARMI applications, or they may be reclaimed by their owners and must then be vacated. To make it easier to write master-workers applications that run in this environment, we have developed a Work Distributor (WoDi). We focus on the master-workers paradigm because it is widely used and lends itself well to adapting to changes in resources. WoDi gathers resources, starts worker processes for the application, and ensures that every work step is computed, even when resources are reclaimed by their owners. WoDi also monitors the characteristics of the work steps and uses this information to assign work steps to worker processes intelligently and to determine the number of resources that can be used efficiently. We have used CARMI and WoDi to run a few applications on our department's pool of 200 workstations. One of these applications is a first-principles materials science program developed at Oak Ridge National Laboratory. By running this application numerous times, we have been able to see the value of WoDi's decision-making abilities and have learned more about the characteristics of the machines available in our pool. We are also using WoDi to distribute compilation steps generated by a parallel "make" facility, and to run a parallelized version of the POVRay ray-tracing software.
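
The guarantee that every work step is computed even when a machine is reclaimed can be illustrated with a small simulation. This is only a minimal sketch under stated assumptions: it is not WoDi's interface and it makes no PVM or CARMI calls. The function name run_steps, the worker names, and the reclaim_events input are all hypothetical and exist only for illustration. The sketch shows a master that returns an interrupted step to the pending queue and drops the reclaimed worker.

```python
from collections import deque


def run_steps(steps, workers, reclaim_events):
    """Simulate master-workers distribution with requeue on reclaim.

    steps: iterable of work-step identifiers.
    workers: names of the initially available workers (hypothetical).
    reclaim_events: set of (worker, step) pairs. When the master assigns
        `step` to `worker`, the owner reclaims that machine before the
        step finishes (hypothetical input used to drive the simulation).
    Returns a dict mapping each step to the worker that completed it.
    """
    pending = deque(steps)
    idle = deque(workers)
    completed = {}
    while pending:
        if not idle:
            raise RuntimeError("no workers left; steps remain pending")
        step = pending.popleft()
        worker = idle.popleft()
        if (worker, step) in reclaim_events:
            # The machine was reclaimed: the worker is gone for good and
            # the unfinished step goes back on the queue.
            pending.append(step)
            continue
        completed[step] = worker
        idle.append(worker)  # the worker is free for another step
    return completed


# Example: worker "w1" is reclaimed while computing step "a".
result = run_steps(["a", "b", "c"], ["w1", "w2"], {("w1", "a")})
```

In this example the step "a" is requeued and later completed by the remaining worker, so all three steps finish. A real work distributor would also acquire newly idle machines and use measured step characteristics when assigning work, which this sketch leaves out.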
Last modified: Tue Jun 27 11:19:04 1995 by James Pruyne