A New Tracing Facility for PVM 3.4

James Arthur Kohl and G. A. Geist

1995 PVM User's Group Meeting

ABSTRACT

One of the more bothersome aspects of developing a parallel program is monitoring its behavior for debugging and performance tuning. This problem is typically alleviated by capturing a trace history at run time, which can be used both to monitor the execution as it proceeds and to analyze it post-mortem. The current release of PVM, Version 3.3, includes built-in tracing instrumentation for this purpose. However, this tracing facility is in many ways inefficient and inflexible. Inefficient tracing can cause critical problems when debugging a parallel application, because the intrusion it adds perturbs program timing and can thereby hide or introduce race conditions. Lack of flexibility in a tracing system can make maintenance and upgrading difficult or even impossible.

The upcoming release of PVM, Version 3.4, will contain a new and improved tracing facility which provides more flexible and efficient access to run-time program information. This new tracing system will support a buffering mechanism which will allow trace events to be collected locally and then sent in fewer, larger, more efficient messages rather than many smaller ones, thereby reducing intrusion. A more flexible trace event definition scheme will also be introduced, based more directly on the SDDF (Self-Defining Data Format, Reed et al.) trace syntax. This new scheme expedites the collection and analysis of execution histories by custom user tools and XPVM, and allows for integration of custom user trace events and system upgrades. The new tracing instrumentation will still be built into the PVM library to avoid re-compilation, and will additionally support on-the-fly adjustments to each task's trace event mask, to interactively control the level of tracing detail.
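
The buffering and trace-mask ideas described above can be illustrated with a small, language-neutral sketch. This is not the PVM 3.4 implementation or its API: the `TraceBuffer` class, its `record` and `flush` methods, the event names, and the batch size of 4 are all hypothetical, chosen only to show how local collection reduces the number of messages and how a mask can be adjusted while the program runs.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple

Event = Tuple[str, float]  # (event kind, timestamp); hypothetical layout


@dataclass
class TraceBuffer:
    """Collects trace events locally and hands them off in batches."""
    send: Callable[[List[Event]], None]          # stand-in for one trace message
    capacity: int = 4                            # events per batch (assumed value)
    mask: Set[str] = field(default_factory=set)  # event kinds currently traced
    pending: List[Event] = field(default_factory=list)

    def record(self, kind: str, timestamp: float) -> None:
        # Events whose kind is not in the mask are dropped immediately.
        if kind not in self.mask:
            return
        self.pending.append((kind, timestamp))
        if len(self.pending) >= self.capacity:
            self.flush()

    def flush(self) -> None:
        # Hand off everything collected so far as a single batch.
        if self.pending:
            self.send(self.pending)
            self.pending = []


messages: List[List[Event]] = []
buf = TraceBuffer(send=messages.append, mask={"send", "recv"})
for i in range(10):
    buf.record("send", float(i))
buf.record("barrier", 10.0)   # dropped: "barrier" is not in the mask yet
buf.mask.add("barrier")       # on-the-fly mask adjustment
buf.record("barrier", 11.0)   # now recorded
buf.flush()                   # push out the partial final batch
print(len(messages))          # batches sent for the 11 recorded events
```

In this sketch the eleven recorded events leave the task in three batches instead of eleven separate messages, which is the intrusion reduction the buffering mechanism aims for. A larger batch size would lower the message count further, at the cost of delaying when events become visible to a monitoring tool.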

Along with this new tracing facility, XPVM will be updated to provide better access to the new tracing functionality. Several new views will be implemented to utilize the additional tracing information now possible, and user-defined events will be included in existing views. The XPVM system has also been optimized to provide better real-time monitoring capabilities.

A PostScript copy of the presentation slides is available.

Please direct all correspondence to:
James Arthur Kohl, Ph.D.
Oak Ridge National Laboratory
Computer Science and Mathematics Division
Mathematical Sciences Section
Computer Science Group
P.O. Box 2008, Bldg 6012
Oak Ridge, TN 37831-6367
kohl@msr.epm.ornl.gov