Darwin: Resource Management for Application-Aware Networks

Peter Steenkiste, Allan Fisher, Hui Zhang

School of Computer Science, Carnegie Mellon University

This research is motivated jointly by a vision of future communication applications that involve multi-party calls moving massive amounts of time-critical data, and by a vision of a communication marketplace in which many entities may add value in a chain between bitway supplier and end user. In this overview, we describe this vision, lay out key resource management problems that must be solved in order to realize it, and describe our plans for the design, implementation and experimental evaluation of a coordinated set of resource management mechanisms that will supply the needed functionality with high efficiency.


Motivation

Sophisticated multi-party applications will use many traffic streams with very different characteristics and will be network-aware so they can perform well on a variety of networks. At the same time, we see the emergence of an electronic service industry that is eager to deliver a wide variety of services to end users. Services will range from low-level "bearer" services that transport bit streams over the network infrastructure to value-added services such as video conferencing, computing services, and data mining. Complex applications will support cooperation among multiple parties by combining video conferencing with access to large amounts of archived data, real-time data streams, and distributed computing tasks.

Supporting this service model and this emerging class of complex services requires innovation in a number of areas. First, the requirements on how the network should handle traffic streams will be very diverse, both in terms of the ability to share resources between cooperating traffic streams and the quality of service for individual streams. Second, conditions in the network and at the endpoints will change continuously, and mechanisms are needed that allow the network, services, and applications to adjust quickly. Third, in many cases, applications and services have advance knowledge of changes in resource requirements, and mechanisms are needed to make use of this information to optimize performance. Fourth, we have to develop systematic methods for balancing the constraints and priorities of services competing for network resources. Finally, mechanisms have to be put in place so that services can be provided in a robust and secure manner. We outline a set of resource management mechanisms in support of such "application-aware" networks.

Network model

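We view network entities as playing one of three roles, as is illustrated in the figure above: applications, value-added services, and bearers that transport bit streams over the network infrastructure.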

Let us illustrate the model using a video conferencing scenario. The application provides information on participants, connectivity, limits on simultaneous activity on the connections, etc. If the conference is dynamic, the information has to include a characterization of the dynamic behavior. The video conferencing service will combine the application information with a requirements specification specific to video conferencing. This results in a session specification that is used to contract for bearer resources and computation capability to satisfy the session requirements. The service mesh thus constructed provides an abstracted view of the underlying bearer resources, in which only those links and switching points at the periphery of the bearers' meshes are visible; this is illustrated in the figure below. The service layer will also provide the bearer layer with information on how the traffic belonging to the new application should be handled. These instructions will be translated into specific switch schedules, buffer allocations, etc. by the bearer.
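The value of the application's "limits on simultaneous activity" can be seen with some simple arithmetic. The Python sketch below is illustrative only: the function name, its parameters, and the assumption that every video stream has the same fixed rate are hypothetical, not part of the Darwin design. It compares the downlink bandwidth one participant needs when a reservation is made separately for every potential sender with the bandwidth needed when the session specification caps the number of simultaneous senders.

```python
# Hypothetical session parameters for a video conference; none of these
# values come from the Darwin design itself.
def downlink_demand(participants: int, max_senders: int, rate_mbps: float):
    """Peak receive bandwidth one participant's access link must carry.

    per_connection: reserve separately for every potential sender.
    session_level:  use the application's limit on simultaneous senders,
                    so a participant never receives more than max_senders streams.
    """
    potential_senders = participants - 1          # everyone except the receiver
    per_connection = potential_senders * rate_mbps
    session_level = min(max_senders, potential_senders) * rate_mbps
    return per_connection, session_level


per_conn, shared = downlink_demand(participants=8, max_senders=2, rate_mbps=1.5)
print(f"per-connection: {per_conn} Mb/s, session-level: {shared} Mb/s")
# prints "per-connection: 10.5 Mb/s, session-level: 3.0 Mb/s"
```

The gap grows with the number of participants, which is one reason to attach resources to the session rather than to each connection.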

Throughout the video conference, resource allocations will be adjusted. These adjustments can be triggered by the application (e.g. a receiver changes the video source it wants to see), by the service layer (e.g. reallocating resources in the service mesh), or by the bearer layer (e.g. changes in available bandwidth). The adjustments will vary widely: the simplest require only local changes on one or a small number of nodes, while others, like rerouting, require a global view. Ideally, many changes can be accommodated through local adjustments, since this results in more responsive and efficient service.

A useful view of this functionality is provided by the "active network" concept under discussion among a group of ARPA-supported researchers. In this view, the network conveys objects that may carry methods to be invoked at different points, and may "program" switching and processing entities in application-specific ways. Our research fits well into this model. The resource allocation capabilities we envision are needed to allow the network to host the customized computations envisioned for active networks. At the same time, active network mechanisms will be used in support of the customization of resource management to meet application needs. We call this type of network "application-aware" since it directly cooperates with applications and services to maximize network responsiveness to application requirements. In the following section, we elucidate the research issues that we plan to pursue in developing the underpinnings of such an "application-aware" network.

Responsive application-oriented networking

At the core of managing an application-aware network is a complex resource allocation problem. An important insight is that the resource allocation problem can be broken up along three dimensions: resource allocation in space, when the virtual application mesh is established; resource allocation decisions at different time scales, at which different amounts of information are available with different degrees of accuracy; and resource allocation decisions by different organizational entities. We plan to address these problems using three mechanisms: a virtual application mesh that represents the network resources allocated to the application, a switch node architecture that supports resource allocation on different time scales, and a resource management framework that allows the systematic representation of the goals and constraints of different service organizations. We describe these mechanisms in more detail below.

Allocating a virtual application mesh

A virtual application mesh represents the resources allocated for the application, e.g. bandwidth on links, "capacity" on switch nodes, and virtual end-point resources. Virtual meshes differ from traditional routes in two fundamental ways. First, resources are associated with an application session, not with individual connections. Second, while switches traditionally make resource allocation decisions fairly independently, the switch nodes in a virtual mesh coordinate resource allocation both at startup and at runtime. These features make it possible to optimize resource allocation using global goals and constraints (e.g. limiting total resource utilization) instead of local ones (e.g. fairness on a per-link and per-connection basis). The virtual application mesh also supports application awareness: the application has considerable flexibility over how resources inside the virtual mesh are allocated initially and reallocated later, for example in response to changes in network conditions or in application requirements. These changes must be handled quickly and efficiently. How the virtual application mesh is laid out will affect the cost of these adjustments, so the initial mesh allocation problem has a planning component. Allocating a virtual application mesh is also considerably more complex than traditional point-to-point or point-to-multipoint routing: it requires allocating more diverse resources, and it requires allocating resources for a large number of application streams simultaneously.
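A minimal sketch of session-level allocation, assuming the mesh is reduced to a per-link bandwidth budget reserved for the whole session (switch capacity and end-point resources are ignored). The class and method names are hypothetical. Because streams draw on the session's pool rather than holding their own reservations, a receiver switching video sources, as in the conferencing scenario above, becomes a local release-and-admit inside the mesh.

```python
from typing import Dict, List, Optional, Tuple

Link = Tuple[str, str]  # (from_node, to_node)


class VirtualMesh:
    """Hypothetical session-level resource pool.

    Bandwidth is reserved per link for the whole application session;
    individual streams draw on that pool, so admitting, resizing, or
    releasing a stream is a local check with no new request to the bearer.
    """

    def __init__(self, link_budget: Dict[Link, float]):
        self.budget = dict(link_budget)                       # session's reservation per link
        self.streams: Dict[str, Tuple[List[Link], float]] = {}

    def used(self, link: Link) -> float:
        return sum(bw for path, bw in self.streams.values() if link in path)

    def admit(self, stream: str, path: List[Link], bw: float) -> bool:
        """Add or resize a stream if every link on its path has room left."""
        # Remove the stream's current allocation so a resize is not double-counted.
        old: Optional[Tuple[List[Link], float]] = self.streams.pop(stream, None)
        fits = all(
            link in self.budget and self.used(link) + bw <= self.budget[link]
            for link in path
        )
        if fits:
            self.streams[stream] = (path, bw)
        elif old is not None:
            self.streams[stream] = old                        # keep the previous allocation
        return fits

    def release(self, stream: str) -> None:
        self.streams.pop(stream, None)


mesh = VirtualMesh({("A", "S"): 10.0, ("B", "S"): 10.0, ("S", "R"): 6.0})
print(mesh.admit("video-A", [("A", "S"), ("S", "R")], 5.0))  # True
print(mesh.admit("video-B", [("B", "S"), ("S", "R")], 5.0))  # False: S->R is full
mesh.release("video-A")                                       # receiver switches source
print(mesh.admit("video-B", [("B", "S"), ("S", "R")], 5.0))  # True
```

Admission here is checked only against the session budget; growing that budget by negotiating with the bearer, which needs the global view discussed earlier, is not modeled.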

Switch point management

Service-related activities in networks take place on a wide range of time scales. However, resource allocation decisions are typically made at only a few of them, e.g. per-packet scheduling inside the network combined with a transport protocol executing on the endpoints. Responsive application-aware networking will require coordinated resource allocation at many time scales. Coordination can take the form of exchanging information about current status and about present and future requirements, coordinating resource allocation activities, and specifying (using a language or program) actions to be taken under certain conditions. These interactions will pay off in several areas. Examples include application- and service-specific dynamic sharing of resources between streams under changing network conditions; the implementation of service models that sit between best-effort and guaranteed services, for example using application-provided hints; application- and service-specific flow control; and the extension of the time horizon of schedulers, for example by considering application frame boundaries during scheduling.
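One way a switch can look beyond individual packets is to respect application frame boundaries when it has to drop traffic. The sketch below is a buffer manager, not a full scheduler: it applies frame-level discard, in the spirit of the early and partial packet discard schemes used for AAL5 frames over ATM, to a packet queue. The frame-length hint, the class names, and the capacity are assumptions for illustration.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional, Set


@dataclass
class Packet:
    frame_id: int
    seq: int        # position within the application frame; 0 = first packet
    frame_len: int  # hypothetical application-provided hint: packets in the frame


class FrameAwareQueue:
    """Buffer that admits or discards whole application frames.

    A frame that cannot be delivered completely is useless to the receiver,
    so once any packet of a frame is dropped, the rest of it is dropped too.
    """

    def __init__(self, capacity: int):
        self.capacity = capacity              # buffer size in packets
        self.queue: Deque[Packet] = deque()
        self.doomed: Set[int] = set()         # frames already partly discarded (never pruned here)

    def enqueue(self, p: Packet) -> bool:
        if p.frame_id in self.doomed:
            return False                      # tail of a damaged frame
        if p.seq == 0 and len(self.queue) + p.frame_len > self.capacity:
            self.doomed.add(p.frame_id)       # early discard: the frame will not fit
            return False
        if len(self.queue) >= self.capacity:
            self.doomed.add(p.frame_id)       # hint was too optimistic: drop the rest
            return False
        self.queue.append(p)
        return True

    def dequeue(self) -> Optional[Packet]:
        return self.queue.popleft() if self.queue else None


q = FrameAwareQueue(capacity=5)
accepted = [q.enqueue(Packet(f, s, 3)) for f in (1, 2) for s in range(3)]
print(accepted)  # [True, True, True, False, False, False]
```

Frame 2 is refused as a whole instead of having a few of its packets queued and the rest lost, so the buffer space goes to frames the receiver can actually use.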

Critical to the implementation of this network model is the ability to safely inject application and service state and code into the network. This enables application- and service-specific actions on switching points without the need for expensive and time-consuming interactions with endpoints. Interactions between switching points and the application running on the endpoints will of course still be required, but they can be done in a service-specific rather than a generic fashion. The application and service "presence" in the network can range from simple parameters, through specifications in more complex languages, to actual code. The full spectrum of mechanisms will be needed in a fully application-oriented network. Which mechanism is used for a specific action will depend not only on the degree of flexibility needed, but also on practical considerations. For example, invoking service code as part of cell-level switching is impractical.
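The range from "simple parameters" to "actual code" can be illustrated with a switch point that accepts either kind of policy for a session and applies it locally when capacity changes, with no round trip to the endpoints. This is a sketch under stated assumptions: the names (SwitchPoint, by_priority, proportional) are hypothetical, a parameter policy is just a priority order over the session's streams, injected code is modeled as a Python callable, and the only safeguard is a runtime check of the proposed allocation before it is used.

```python
from typing import Callable, Dict, List, Tuple, Union

Allocation = Dict[str, float]                         # stream name -> Mb/s
Handler = Callable[[float, Allocation], Allocation]   # injected "code"
Policy = Union[List[str], Handler]                    # parameters or code


def by_priority(capacity: float, requested: Allocation, order: List[str]) -> Allocation:
    """Parameter-style policy: serve streams in the given priority order."""
    granted, remaining = {}, capacity
    for name in order:
        granted[name] = min(requested[name], remaining)
        remaining -= granted[name]
    return granted


def proportional(capacity: float, requested: Allocation) -> Allocation:
    """Generic fallback: scale every stream down by the same factor."""
    total = sum(requested.values())
    scale = min(1.0, capacity / total) if total > 0 else 0.0
    return {name: bw * scale for name, bw in requested.items()}


class SwitchPoint:
    def __init__(self) -> None:
        self.sessions: Dict[str, Tuple[Policy, Allocation]] = {}

    def install(self, session: str, policy: Policy, requested: Allocation) -> None:
        self.sessions[session] = (policy, dict(requested))

    def on_capacity_change(self, session: str, capacity: float) -> Allocation:
        """Adapt locally to a new capacity for this session."""
        policy, requested = self.sessions[session]
        if callable(policy):
            proposal = policy(capacity, dict(requested))
        else:
            proposal = by_priority(capacity, requested, policy)
        # Runtime check on application-supplied input: fall back to the
        # generic policy if the proposal is malformed or over budget.
        safe = (
            set(proposal) == set(requested)
            and all(v >= 0 for v in proposal.values())
            and sum(proposal.values()) <= capacity + 1e-9
        )
        return proposal if safe else proportional(capacity, requested)


sp = SwitchPoint()
requested = {"audio": 1.0, "video": 6.0, "slides": 2.0}
sp.install("conf-params", ["audio", "video", "slides"], requested)
sp.install("conf-code", lambda cap, req: {k: 2 * v for k, v in req.items()}, requested)
print(sp.on_capacity_change("conf-params", 5.0))
# {'audio': 1.0, 'video': 4.0, 'slides': 0.0}
fallback = sp.on_capacity_change("conf-code", 5.0)  # over budget, so proportional is used
```

The deliberately misbehaving handler in "conf-code" asks for twice the requested rates; the check rejects it and the session is scaled down instead, so faulty injected code cannot overcommit the link.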

Resource management framework

While traditional networks allocate resources on a per-packet or per-connection basis, resource allocation in application-aware networks is subject to constraints and goals of a variety of entities including link owners, service providers, and applications. We briefly describe a hierarchical resource management framework that allows the systematic integration of these goals and constraints.

The resource allocation policy of a communication link is represented by a directed acyclic graph with a single root representing the link and leaf nodes representing individual traffic streams. Intermediate nodes represent organizational entities. Each node gets resources from its parents and specifies how its resources are distributed to its children. Examples of policies include fair-sharing at different granularities, reservation, and strict priority. This graph is a language that can be used by different entities to specify how traffic streams or collections of traffic streams should share bandwidth. By combining subgraphs, the resource management policies set by different entities (link, service providers, applications) can be represented simultaneously. Tools are used to translate a graph into a schedule that can be used by switch nodes, and incremental changes in the graph translate into incremental changes in the schedule.
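The following Python sketch shows how such a graph could be turned into per-stream rates. It makes several simplifying assumptions: the graph is a tree (each node has a single parent), only two policies are modeled ("fair" for weighted max-min sharing and "priority" for strict priority; reservations are omitted), and the output is a set of rates rather than an actual switch schedule. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    name: str                      # assumed unique across the graph
    policy: str = "fair"           # how this node splits its share: "fair" or "priority"
    weight: float = 1.0            # this node's weight under a "fair" parent
    demand: float = 0.0            # leaves only: the traffic stream's offered load
    children: List["Node"] = field(default_factory=list)

    def total_demand(self) -> float:
        if not self.children:
            return self.demand
        return sum(c.total_demand() for c in self.children)


def fair_split(capacity: float, children: List[Node]) -> Dict[str, float]:
    """Weighted max-min fair sharing (water-filling), capped at each child's demand."""
    share = {c.name: 0.0 for c in children}
    active = [c for c in children if c.total_demand() > 0]
    remaining = capacity
    while active and remaining > 1e-9:
        total_w = sum(c.weight for c in active)
        done = [c for c in active
                if c.total_demand() - share[c.name] <= remaining * c.weight / total_w]
        if not done:                      # nobody can be satisfied: split what is left
            for c in active:
                share[c.name] += remaining * c.weight / total_w
            break
        for c in done:                    # satisfy small demands, redistribute the rest
            need = c.total_demand() - share[c.name]
            share[c.name] += need
            remaining -= need
            active.remove(c)
    return share


def strict_priority(capacity: float, children: List[Node]) -> Dict[str, float]:
    """Children are served in list order, highest priority first."""
    share, remaining = {}, capacity
    for c in children:
        share[c.name] = min(c.total_demand(), remaining)
        remaining -= share[c.name]
    return share


def allocate(node: Node, capacity: float, out: Dict[str, float]) -> Dict[str, float]:
    """Walk the graph top-down, turning the policies into per-stream rates."""
    if not node.children:
        out[node.name] = capacity
        return out
    split = fair_split if node.policy == "fair" else strict_priority
    shares = split(capacity, node.children)
    for child in node.children:
        allocate(child, shares[child.name], out)
    return out


link = Node("link", "fair", children=[
    Node("providerA", "priority", weight=3, children=[
        Node("audio", demand=10.0), Node("video", demand=80.0)]),
    Node("providerB", "fair", weight=1, children=[
        Node("data1", demand=50.0), Node("data2", demand=5.0)]),
])
print(allocate(link, 100.0, {}))
# {'audio': 10.0, 'video': 65.0, 'data1': 20.0, 'data2': 5.0}
```

Here provider A's three-to-one weight gives it 75 of the link's 100 units, which it hands out by strict priority; provider B shares its 25 units fairly, so data2's small demand is met in full and data1 gets the remainder. Each subtree is evaluated independently, which is what lets policies set by the link owner, providers, and applications be combined.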

Robustness and security

Our approach lends itself naturally to handling failure recovery in an integrated fashion throughout the resource allocation process. One of the "changes" that can be considered during the creation of the virtual mesh is the failure of nodes and links: both the topology of the virtual application mesh and the instructions given to switching points can implicitly and explicitly prepare for a quick response to failures. The degree of robustness can be application and service specific.

Security shows up in a number of areas. First, the network has to verify that application input (parameters, specifications, and programs) can be acted upon safely, since incorrect input can endanger the operation of the network. This requires a combination of language, compilation, and runtime techniques; for this aspect of security we plan to rely mostly on existing technology. Second, the network has to guarantee that only authorized entities can modify the network state. We plan to address this using existing authentication methods. Note that the virtual application mesh can form the basis for providing security: as part of the creation of the mesh, relationships of trust can be established between the nodes in the mesh, which can speed up security checking during execution.
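One simple way to realize trust relationships established at mesh creation, assumed here rather than taken from the design, is a symmetric session key shared by the mesh's nodes. Control messages can then be checked with a cheap HMAC (HMAC-SHA256 from Python's standard hmac module) instead of a full authentication exchange. Key distribution is not shown, and MeshNode is a hypothetical name. With a single group key any mesh member can produce valid tags, so this sketch only distinguishes mesh members from outsiders.

```python
import hashlib
import hmac
import secrets


class MeshNode:
    """Hypothetical mesh member that authenticates control messages with a
    symmetric key agreed when the virtual mesh was created."""

    def __init__(self, name: str, session_key: bytes):
        self.name = name
        self.key = session_key

    def sign(self, message: bytes) -> bytes:
        return hmac.new(self.key, message, hashlib.sha256).digest()

    def verify(self, message: bytes, tag: bytes) -> bool:
        expected = hmac.new(self.key, message, hashlib.sha256).digest()
        return hmac.compare_digest(expected, tag)   # constant-time comparison


# At mesh creation (after whatever full authentication the network requires),
# the session's nodes are given one shared key.
key = secrets.token_bytes(32)
switch_a, switch_b = MeshNode("A", key), MeshNode("B", key)

msg = b"session=conf1;stream=video;rate=4.0"
tag = switch_a.sign(msg)
print(switch_b.verify(msg, tag))                          # True
print(switch_b.verify(b"session=conf1;rate=40.0", tag))   # False: altered message

outsider = MeshNode("X", secrets.token_bytes(32))
forged = b"session=conf1;stream=video;rate=40.0"
print(switch_b.verify(forged, outsider.sign(forged)))     # False: not a mesh member
```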

Experimental approach

Carnegie Mellon University is developing a comprehensive suite of resource management mechanisms in support of such "application-aware" networks. We will support resource allocation along three dimensions: resource allocation in the "space" consisting of the physical network infrastructure and attached processing and storage resources; decision making on different time scales, ranging from application startup to packet and cell scheduling; and resource allocation by different organizational entities sharing the infrastructure. In all three dimensions, the mechanisms we develop will provide for extensive tailoring to application requirements. The resource management techniques will be evaluated in a testbed driven by increasingly aggressive applications and services. Testbed development will take place in three steps. Initially, we will use the existing Credit Net ATM network for quick experiments that will guide the design of interfaces and protocols. The second step will be a local-area version of our network architecture and resource management software. Finally, we plan to perform a wide-area evaluation, hopefully in cooperation with groups working on related research topics.

Last modified by prs@cs.cmu.edu in October 1996.