ECO: Efficient Collective Operations
Beta release 0.1b

Bruce Lowekamp and Adam Beguelin
School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213

(C) 1996 All Rights Reserved

NOTICE:

Permission to use, copy, modify, and distribute this software and its
documentation for any purpose and without fee is hereby granted provided
that the above copyright notice appear in all copies and that both the
copyright notice and this permission notice appear in supporting
documentation.

Neither Carnegie Mellon University nor the Authors make any
representations about the suitability of this software for any purpose.
This software is provided ``as is'' without express or implied warranty.

This research is sponsored in part by the Department of Defense Advanced
Research Projects Agency and the National Science Foundation.

--------------
About ECO

Because message passing systems such as PVM give the programmer an
abstract view of the system, it is difficult for the programmer to use
resources, especially network resources, efficiently. This is
particularly apparent in collective operations, where otherwise
efficient algorithms can perform extremely poorly due to network
bottlenecks. ECO helps to solve this problem by providing programs which
analyze the network and a runtime library of routines which use this
information about the network to provide more efficient implementations
of common collective operations.

Presently, ECO optimizes communication properly on slow networks such as
Ethernet. Current research is extending these techniques to
high-performance networks, but this code is not available at this time.

ECO runs on top of PVM and is implemented in C. C++ and FORTRAN versions
may be forthcoming.

--------------
Compiling ECO

- make sure PVM_ROOT is set correctly
- edit eco/include/eco.h and change the definition of
  DEFAULT_NETWORK_DATA_FILENAME to a file in your environment. This file
  must be readable on any processor that you spawn a job from.
- By default, the ECO executables will be placed in
  $(HOME)/pvm3/bin/$(PVM_ARCH). If you want them to be placed elsewhere,
  edit the definition of PVM_BIN in the top-level Makefile.
- define an ECO_ROOT environment variable with the name of the directory
  in which ECO is installed.
- return to the root eco directory and type

      make

  This should compile the library, the test program, the network
  analysis program, and the network partitioning programs.

--------------
Using ECO

The first step in using ECO is to characterize the network connecting
the machines you are using. To do this, add all of the machines to your
PVM virtual machine. Make sure you have the network characterization
program compiled for all architectures in your virtual machine. Then run
the characterization routine. After a brief delay, it should output a
list of network latencies. Feed this list to the partitioning program
and save the result in the file you specified as
DEFAULT_NETWORK_DATA_FILENAME during installation.

example:

    eco_netanal | tee latencies
    eco_partition < latencies | tee my_machines

This would work if my_machines were the file specified in eco.h.

Once this file is saved, this process does not need to be repeated
unless additional machines become available for computation or machines
are moved.

WARNING: ECO currently uses a brute force approach to optimize the
communication patterns at run time. This works well up to about 8
subnets. If ECO decides that there are more than 8 subnets, you are
likely to have long delays during startup. If you know the proximity of
machines in your network, it is probably better to edit the partitioning
results by hand to reduce the number of subnets to 7 or fewer.

-----------
Writing programs for ECO

For full details on ECO's functions, please read the file doc/ECO_API.
An example of a simple ECO program is in doc/minimal.c. Hopefully this
code is fairly self-explanatory.
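
To give a rough feel for why the brute force search mentioned in the
WARNING above slows down quickly as subnets are added, the sketch below
counts candidates under one simple assumption: that the search looks at
every ordering of the subnets. This README does not say what ECO
actually enumerates, so both that assumption and the helper name
subnet_orderings are hypothetical and are not part of ECO.

```c
#include <assert.h>

/*
 * Hypothetical helper, NOT part of ECO.
 * Returns the number of distinct orderings of `subnets` items
 * (subnets factorial). This only illustrates how fast a brute force
 * search space can grow; ECO's real search may count something else.
 */
unsigned long long subnet_orderings(int subnets)
{
    /* 21! no longer fits in 64 bits, so refuse anything larger than 20. */
    assert(subnets >= 0 && subnets <= 20);

    unsigned long long count = 1;
    for (int i = 2; i <= subnets; i++)
        count *= (unsigned long long)i;
    return count;
}
```

Under that assumption, 7 subnets give 5040 candidates, 8 give 40320,
and 10 give 3628800. Growth of this kind would be consistent with the
advice to merge subnets by hand until 7 or fewer remain.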
-----------
NOTES

ECO currently can only be compiled with gcc.

----------

The improvement that ECO achieves in performance will depend on the type
of interconnection network(s) used, the algorithm being compared
against, and similar factors. Here are some things to look at to improve
performance:

- verify the network topology
    ECO's automatic topology determination is far from infallible. It
    will usually get the correct results, but sometimes mistakes happen.

- use multivector operations
    Using the multivector forms of the operations can greatly improve
    performance by combining several operations into one.

- check the degree of the tree used on each subnet
    ECO defaults to using a binary tree on each subnet. Further work is
    planned to adapt the pattern used on each subnet, but in the
    meantime, you may have better luck with other degrees depending on
    your network and machines.

---------

If you are interested in comparing the results of a few trivial
operations implemented both by PVM and ECO, the pvm_bench and eco_bench
executables can be compiled by running

    make b

from the root directory. These programs perform a number of broadcast
and reduction operations and report how long the operations took.

---------

eco_test is a program designed to test the basic operation of ECO. It
runs on all available machines and prints a message as it finishes each
test. If any errors are found, they will be reported. Please forward any
such error messages, along with the network data file and a copy of the
output of "ps a" issued at the pvm console prompt showing where each
task was spawned, to eco-help@cs.cmu.edu.

----------------------
Comments and Questions

Because this is a Beta release, we are very interested in receiving any
comments or questions you might have about this software. Please direct
any mail, including bug reports, to eco-help@cs.cmu.edu.
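
----------

As a rough illustration of the tree-degree note in the performance list
above, the sketch below computes how many levels a tree of a given
degree needs in order to reach every machine on a subnet. A higher
degree gives a shallower tree, but each node then has more children to
send to, so the best choice depends on your network and machines. The
function name tree_depth is hypothetical and is not part of the ECO API;
it models only the shape of the tree, not ECO's actual communication
schedule.

```c
#include <assert.h>

/*
 * Hypothetical helper, NOT part of ECO.
 * Returns the number of levels below the root that a tree of the given
 * degree needs in order to hold `machines` nodes, root included, when
 * every level is filled before the next one is started.
 */
int tree_depth(int machines, int degree)
{
    assert(machines >= 1 && degree >= 1);

    int depth = 0;
    long long reached = 1;      /* nodes covered so far (just the root) */
    long long level_width = 1;  /* nodes on the deepest level so far */

    while (reached < machines) {
        level_width *= degree;  /* each node on the level gains `degree` children */
        reached += level_width;
        depth++;
    }
    return depth;
}
```

For 16 machines, tree_depth(16, 2) is 4, tree_depth(16, 4) is 2, and
tree_depth(16, 15) is 1. Which of these performs best in practice has to
be measured, for example with the eco_bench program described above.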