Khalil Amiri, David Petrou, Greg Ganger, and Garth Gibson
Carnegie Mellon University
Effectively utilizing cluster resources is a difficult problem for distributed applications. Since remote communication is much more expensive than local communication, the performance of these applications is sensitive to the distribution of their functions across the network. As a result, using cluster resources effectively requires not only load balancing, but also proper partitioning of functionality among nodes. While software engineering techniques (e.g., modularity and object orientation) have given us the ability to partition applications into a set of interacting objects, we do not yet have solid techniques for determining where in the cluster each of these functions should run, and deployed systems continue to rely on complex manual decisions made by programmers and system administrators.
Optimal function placement in a cluster is difficult because the right answer is usually, ``it depends.'' Specifically, it depends on a variety of cluster characteristics (e.g., communication bandwidth between nodes, relative processor speeds among nodes) and workload characteristics (e.g., bytes moved among functions, instructions executed by each function). Some are basic hardware characteristics that only change when something fails or is upgraded, and thus are relatively constant for a given system. Other characteristics cannot be determined until application invocation time because they depend on input parameters. Worst of all, many change at run-time due to phase changes in application behavior or competition between concurrent applications over shared resources. Hence, any ``one system fits all'' solution will cause suboptimal, and in some cases disastrous, performance.
Approach
In this project, we focus on an important class of applications for which clusters are very appealing: data-intensive applications that selectively filter, mine, sort, or otherwise manipulate large data sets. Such applications benefit from the ability to spread their data-parallel computations across source/sink servers, exploiting the servers' computational resources and reducing the required network bandwidth. Effective function partitioning for these data-intensive applications will become even more important as processing power becomes ubiquitous, reaching devices and network-attached appliances.
We observe that these data-intensive applications have characteristics that simplify the tasks involved with dynamic function placement. Specifically, these applications all process and move significant amounts of data, enabling a monitoring system to quickly learn about the most important inter-object communication patterns and per-object resource requirements. This information allows the run-time system to rapidly identify functions that should be moved to reduce communication overheads or resource contention.
We designed and implemented a prototype system, called Abacus, which automates the placement of the objects in data-intensive applications and filesystems between clients and servers. Abacus consists of a programming model and a run-time system. The Abacus programming model encourages the programmer to compose data-intensive applications from small, functionally independent components or objects. These mobile objects provide explicit methods which checkpoint and restore their state during migration. Figure 1 represents a sketch of objects in Abacus.
The Abacus run-time system consists of (i) a migration and location-transparent invocation component; and (ii) a resource monitoring and management component, as shown in Figure 2. The first component is responsible for the creation of location-transparent references to mobile objects, for the redirection of method invocations in the face of object migrations, and for enacting object migrations.
The resource monitoring and management component uses the notifications to collect statistics about bytes moved between objects and about the resources used by active objects (e.g., amount of memory allocated, and number of instructions executed per byte processed). Moreover, this component monitors the availability of resources throughout the cluster (e.g., node load and available bandwidth on network links). An analytic model is used to predict the performance benefit of moving to an alternative placement. The model also takes into account the cost of migration - the time needed to acquiesce and checkpoint an object, transfer its state to the target node, and restore it on that node. Using this analytic model, the component arrives at the placement with the best net benefit.
![]() |
![]() |
|
|
|
Preliminary results
Figure 3 below depicts the architecture of an
object-based filesystem that we built on top of Abacus. Figure 4 depicts possible placements for RAID
objects in our environment.
![]() |
![]() |
| Figure 3. This figure shows the architecture of an object-based distributed filesystem built for Abacus. Files are bound to an object stack. | Figure 4. RAID example. This figure depicts different placements for the RAID object as the client-server network bandwidth varies. |
Our environment consists of two networks, a switched 100Mbps Ethernet, which we refer to as the SAN (server-area network) and a shared 10Mbps segment, which we refer to as the LAN (local-area network). All four storage servers are directly connected to the SAN. Four of the eight clients, are connected to the SAN (called SAN clients), and the other four clients reside on the LAN (the LAN clients). The LAN is bridged to the SAN via a 10Mbps link. Clients and servers are standard PCs running RedHat Linux 5.2 on 300MHz Pentium II processors.
Figure 5 through Figure 8 below show the performance of applications when function is statically placed at the client or at the server, and when it is dynamically placed by Abacus after being started on the client.
|
|
|
|
|
|
|
|
|
Figure 8. Concurrent filter benchmark. This figure plots the cumulative number of blocks searched by two filters versus elapsed time. In this experiment, the server's memory is artificially constrained so that only one filter can execute at the server. After the more selective Filter 2 begins running, Abacus's online placement algorithm correctly chooses to move the less selective Filter 1 back to the client in order to allow Filter 2 to be executed at the server. |
Our initial experiments demonstrate that Abacus can effectively adapt
to variations in network topology, application cache access pattern,
application data reduction (filter selectivity), contention over
shared data, significant changes in application behavior at run-time,
as well as dynamic competition from concurrent applications over
shared server resources. Our preliminary results are quite promising;
Abacus often improved application response time over 6X. Under all
these experiments, Abacus selects the best placement for each
function, "correcting" placement when function was initially started
on the "wrong" node. Under more complex scenarios, Abacus outperforms
experiments in which function was statically placed at invocation
time, converging to within 70% of the maximum achievable performance.
Furthermore, Abacus adapts placement without knowledge of the
semantics implemented by the objects. The adaptation is based only on
black-box monitoring of the object and the bytes moved between
objects.
Summary
Effectively utilizing cluster resources is a difficult problem for
distributed applications. Since remote communication is much more
expensive than local communication, the performance of these
applications is sensitive to the distribution of their functions
across the network. Optimal function placement in a cluster is
difficult because it depends on several characteristics, many
of which vary at run-time.
Black-box monitoring of application objects, in combination with an
intuitive object-based mildly-constrained programming model can
provide sufficient support for automatic partititioning of filesystem
and application function in data-intensive cluster computing. We
believe that this automatic approach will help reduce the exorbitant
management costs of clusters.
More info
For more information on Abacus: