Darwin: Resource Management for Application-Aware Networks
Peter Steenkiste, Allan Fisher, Hui Zhang
School of Computer Science, Carnegie Mellon University
This research is motivated jointly by a vision of future communication
applications that involve multi-party calls moving massive amounts of
time-critical data, as well as a vision of a communication marketplace in
which many entities may add value in a chain between bitway supplier and end
user. In this overview, we describe this vision, lay out key resource
management problems that must be solved in order to realize it, and
describe our plans for the design, implementation, and experimental
evaluation of a coordinated set of resource management mechanisms that will
supply the needed functionality with high efficiency.
Motivation
Sophisticated multi-party applications will use many traffic streams with
very different characteristics and will be network-aware so they can perform
well on a variety of networks. At the same time, we see the emergence of an
electronic service industry that is eager to deliver a wide variety of
services to end-users. Services will range from low-level "bearer" services
that transport bit streams over the network infrastructure to value-added
services such as video conferencing, computing services, and data mining.
Complex applications will support cooperation among multiple parties by
combining video conferencing with access to large amounts of archived data,
real time data streams, and distributed computing tasks.
Supporting this service model and this emerging class of complex services requires innovation in a
number of areas. First, the requirements on how the network should handle traffic streams will be
very diverse, both in terms of the ability to share resources between cooperating traffic streams and
the quality of service for individual streams. Second, conditions in the network and at the
endpoints will change continuously, and mechanisms are needed that allow the network, services
and application to adjust quickly. Third, in many cases, applications and services have advance
knowledge of changes in resource requirements, and mechanisms are needed to make use of this
information to optimize performance. Fourth, we have to develop systematic methods for
balancing the constraints and priorities of services competing for network resources. Finally,
mechanisms have to be put in place so that services can be provided in a robust and secure manner.
We outline a set of resource
management mechanisms in support of such "application-aware" networks.
Network model
We view network entities as playing one of three roles, as is illustrated in the figure above:
- a bearer provides links and switching points that move bits among endpoints according to
certain simple agreements about resource allocation.
- a service provider packages bearer functions and its own computing resources to provide
services ranging from simple services such as CBR ATM connections to complex value added
services such as multimedia conferencing services that adapt media rates and formats among
heterogeneous endpoints, or distributed computing and storage capabilities. Services can be
hierarchical, i.e. a complex service might be built on top of simpler services. We will
distinguish between services that require handling the data and those that don't. The latter can
be handled directly in the switching points, while the former might be implemented on service
nodes. Service nodes are outside of the core network and are represented in our model by "virtual
endpoints".
- an application runs on a set of computational endpoints and operates by invoking services in
the network. Applications can be very simple (e.g. dialing a number on a phone to get
directory assistance) or more complex (e.g. a distributed computing application that uses a
collective communication service).
Let us illustrate the model using a video conferencing scenario. The
application provides information on participants, connectivity, limits on
simultaneous activity on the connections, etc. If the conference is
dynamic, the information has to include a characterization of the dynamic
behavior. The video conferencing service will combine the application
information with a requirements specification specific to video
conferencing. This results in a session specification that is used to
contract for bearer resources and computation capability to satisfy the
session requirements. The service mesh thus constructed provides an
abstracted view of the underlying bearer resources, in which only those
links and switching points at the periphery of the bearers' meshes are
visible; this is illustrated in the figure below. The service layer will
also provide the bearer layer with information on how the traffic belonging
to the new application should be handled. These instructions will be
translated into specific switch schedules, buffer allocations, etc. by the
bearer.
Throughout the video conference, resource allocation will be adjusted. These adjustments can be
triggered by the application (e.g. a receiver changes the video source it wants to see), by the
service layer (e.g. reallocating resources in the service mesh) or by the bearer layer (e.g. changes
in available bandwidth). The adjustments will be very different: the simplest changes will require
only local changes on one or a small number of nodes, while others, like rerouting, will require a
global view. Ideally, many changes can be accommodated through local adjustments, since this
will result in more responsive and efficient service.
A useful view of this functionality is provided by the "active network" concept under discussion
among a group of ARPA-supported researchers. In this view, the network conveys objects that
may carry methods to be invoked at different points, and may "program" switching and processing
entities in application-specific ways. Our research fits well into this model. The resource
allocation capabilities we envision are needed to allow the network to host the customized
computations envisioned for active networks. At the same time, active network mechanisms will
be used in support of the customization of resource management to meet application needs. We call
this type of network "application-aware" since it directly cooperates with applications and services
to maximize network responsiveness to application requirements. In the following section, we
elucidate the research issues that we plan to pursue in developing the underpinnings of such an
"application-aware" network.
Responsive application-oriented networking
At the core of managing an application-aware network is a complex resource
allocation problem. An important insight is that the resource allocation
problem can be broken up along three dimensions: resource allocation in
space when the virtual application mesh is established; resource allocation
decisions at different time scales, when different amounts of information
are available with different accuracy; and resource allocation decisions by
different organizational entities. We plan to address these problems using
three mechanisms: a virtual application mesh that represents the network resources
allocated for the application, a switch node architecture that supports
resource allocation on different time scales, and a resource management
framework that allows the systematic representation of the goals and
constraints of different service organizations. We
describe these mechanisms in more detail below.
Allocating a virtual application mesh
A virtual application mesh represents the resources
allocated for the application, e.g. bandwidth on links, "capacity" on switch
nodes, and virtual end-point resources.
Virtual meshes differ from traditional routes in two fundamental ways. First, resources are
associated with an application session, and not with individual connections. Second, while
switches traditionally make resource allocation decisions fairly independently, switch nodes
coordinate resource allocation both at startup and runtime. These features make it possible to
optimize resource allocation using global goals and constraints (e.g. limit total resource utilization)
instead of local ones (e.g. fairness on a per-link and per-connection
basis).
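As an illustration only, the difference between per-connection and per-session allocation can be sketched in Python. The stream records, link names, and the `max_active` limit on simultaneous activity are assumptions introduced for this example; they are not part of the Darwin design.

```python
from collections import defaultdict

def per_connection_demand(streams):
    """Traditional view: reserve every stream's peak rate on every link it crosses."""
    demand = defaultdict(float)
    for s in streams:
        for link in s["path"]:
            demand[link] += s["rate"]
    return dict(demand)

def session_demand(streams, max_active):
    """Session view: on each link, reserve only for the max_active largest
    streams, since the application promises that no more are active at once."""
    rates_by_link = defaultdict(list)
    for s in streams:
        for link in s["path"]:
            rates_by_link[link].append(s["rate"])
    return {
        link: sum(sorted(rates, reverse=True)[:max_active])
        for link, rates in rates_by_link.items()
    }

# Hypothetical conference: three senders (rates in Mbit/s) share the link "sw-hub".
conference = [
    {"rate": 2.0, "path": ["a-sw", "sw-hub"]},
    {"rate": 2.0, "path": ["b-sw", "sw-hub"]},
    {"rate": 2.0, "path": ["c-sw", "sw-hub"]},
]

print(per_connection_demand(conference))
print(session_demand(conference, max_active=1))
```

With one active speaker at a time, the shared link needs a reservation for a single stream rather than three, which is the kind of saving a session-level view makes possible.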
The virtual application mesh also supports application awareness:
the application has considerable flexibility over how resources inside the
virtual mesh are allocated initially and reallocated,
for example in response to changes in the conditions in the network or in
the application requirements. These changes must be handled quickly and efficiently. How the
virtual application mesh is laid out will impact the cost of these adjustments, so the initial mesh
allocation problem has a planning component.
However, allocating a virtual application mesh is considerably more complex
than traditional point-to-point or point-to-multipoint routing. It requires
allocating more diverse resources, and allocating them simultaneously for a
large number of application streams.
Switch point management
Service-related activities in networks take place on a wide range of time scales. However, resource
allocation decisions are typically made at only a few of them, e.g. per-packet scheduling inside the network combined
with a transport protocol executing on the endpoint. Responsive application-aware networking will
require coordinated resource allocation at many time scales. Coordination
can take the form of exchanging information regarding status and present and future requirements,
coordination of resource allocation activities, and the specification (using a language or program)
of actions to be taken under certain conditions. These interactions will pay off in several areas.
Examples include application- and service-specific dynamic sharing of resources between streams
under changing network conditions; the implementation of service models that sit in between
best-effort and guaranteed services, for example using application-provided hints; application- and
service-specific flow control; and the extension of the time horizon of schedulers, for example by
considering application frame boundaries during scheduling.
Critical to the implementation of this network model is the ability to safely inject application and
service state and code into the network. This enables application- and service-specific actions on
switching points without the need for expensive and time-consuming interactions with endpoints.
Interactions between switching points and the application running on the endpoints will of course
still be required, but they can be done in a service-specific instead of a generic fashion. The
application and service "presence" in the network can range from simple parameters, through
specifications using more complex languages, to actual code. The full spectrum of mechanisms
will be needed in a fully application-oriented network. Which mechanism is used for a specific
action will not only depend on what degree of flexibility is needed, but also on practical
considerations. For example, including service code for cell-level switching is impractical.
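To make the middle of this spectrum concrete, the following Python sketch shows a switching point that holds service-installed condition/action rules and applies them locally when available bandwidth changes, without contacting the endpoints. The `Rule` and `SwitchPoint` classes, the rates, and the threshold are all hypothetical, and the sketch omits the safety checks that injected code would need.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Rule:
    """Hypothetical condition/action pair installed by a service."""
    condition: Callable[[float], bool]          # test on available bandwidth (Mbit/s)
    action: Callable[[Dict[str, float]], None]  # adjusts per-stream rates in place

class SwitchPoint:
    """Hypothetical switching point holding per-session state and rules."""

    def __init__(self, rates: Dict[str, float]) -> None:
        self.rates = dict(rates)
        self.rules: List[Rule] = []

    def install(self, rule: Rule) -> None:
        self.rules.append(rule)

    def on_bandwidth_change(self, available: float) -> Dict[str, float]:
        # Apply every rule whose condition holds; no endpoint round trip needed.
        for rule in self.rules:
            if rule.condition(available):
                rule.action(self.rates)
        return dict(self.rates)

def degrade_video(rates: Dict[str, float]) -> None:
    """Service-specific action: cap video at 1.0 Mbit/s, leave audio untouched."""
    rates["video"] = min(rates["video"], 1.0)

switch = SwitchPoint({"video": 4.0, "audio": 0.1})
switch.install(Rule(condition=lambda bw: bw < 3.0, action=degrade_video))
print(switch.on_bandwidth_change(5.0))  # condition false, rates unchanged
print(switch.on_bandwidth_change(2.0))  # condition true, video is capped
```

A rule table like this sits between simple parameters and arbitrary code: it is more flexible than a fixed rate, yet restricted enough that a network could plausibly check it before installing it.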
Resource management framework
While traditional networks allocate resources on a per-packet or per-connection basis, resource
allocation in application-aware networks is subject to constraints and goals of a variety of entities
including link owners, service providers, and applications. We briefly describe a hierarchical resource
management framework that allows the systematic integration of these goals and constraints.
The resource allocation policy of a communication link is represented by a directed acyclic graph
with a single root representing the link and leaf nodes representing individual traffic streams.
Intermediate nodes represent organizational entities. Each node gets resources from its parents and
specifies how its resources are distributed to its children. Examples of policies include fair-sharing
at different granularities, reservation, and strict priority. This graph is a language that can be used
by different entities to specify how traffic streams or collections of traffic streams should share
bandwidth. By combining subgraphs, the resource management policies set by different entities
(link, service providers, applications) can be represented simultaneously. Tools are used to
translate a graph into a schedule that can be used by switch nodes, and incremental changes in the
graph translate into incremental changes in the schedule.
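A minimal Python sketch of such a hierarchy follows. It makes several simplifying assumptions that are not taken from the framework itself: the graph is restricted to a tree, only two policies are modeled (reservation and weighted sharing of the remainder), and the result is a static bandwidth split rather than a switch schedule. The `ShareNode` and `allocate` names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class ShareNode:
    """Hypothetical node: the link (root), an organization, or a traffic stream (leaf)."""
    name: str
    weight: float = 1.0    # relative share of the parent's unreserved bandwidth
    reserved: float = 0.0  # bandwidth set aside before weighted sharing
    children: List["ShareNode"] = field(default_factory=list)

def allocate(node: ShareNode, bandwidth: float,
             out: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Push bandwidth down the hierarchy and return the allocation of each leaf."""
    if out is None:
        out = {}
    if not node.children:
        out[node.name] = bandwidth
        return out
    reserved_total = sum(c.reserved for c in node.children)
    if reserved_total > bandwidth:
        raise ValueError(f"reservations under {node.name!r} exceed its bandwidth")
    spare = bandwidth - reserved_total
    total_weight = sum(c.weight for c in node.children)
    for child in node.children:
        shared = spare * child.weight / total_weight if total_weight else 0.0
        allocate(child, child.reserved + shared, out)
    return out

# Hypothetical 100 Mbit/s link shared 3:1 by two providers; provider A reserves
# 30 Mbit/s for video and gives whatever is left to data.
link = ShareNode("link", children=[
    ShareNode("provider_a", weight=3.0, children=[
        ShareNode("video", weight=0.0, reserved=30.0),
        ShareNode("data", weight=1.0),
    ]),
    ShareNode("provider_b", weight=1.0),
])

print(allocate(link, 100.0))
```

Because each subtree is evaluated only from the bandwidth handed to it by its parent, a change inside one provider's subtree leaves the allocations of the other subtrees untouched, which mirrors the incremental-update property described above.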
Robustness and security
Our approach lends itself naturally to dealing with failure recovery in an integrated fashion
throughout the resource allocation process. One of the "changes" that can be considered during the
creation of the virtual mesh is the failure of nodes and links, and both the topology of the virtual
application mesh and the instructions to switching points can implicitly and explicitly prepare for a
quick response to failures. The degree of robustness can be application- and service-specific.
The issue of security shows up in a number of areas. First, the network has to verify that
application input (parameters, specification and programs) can be acted upon safely since incorrect
input can endanger the operation of the network. This requires a combination of language,
compilation and runtime techniques, and we plan to use mostly existing technology for this
security aspect. Second, the network has to guarantee that only authorized entities can modify the
network state. We plan to address this using existing authentication
methods. Note that the virtual application mesh can form the basis for
providing security. As part of the creation of the mesh,
relationships of trust can be established between the nodes in the mesh.
This can speed up security checking during execution.
Experimental approach
Carnegie Mellon University is developing a comprehensive suite of resource
management mechanisms in support of such "application-aware" networks. We
will support resource allocation along three dimensions: resource
allocation in the "space" consisting of the physical network infrastructure
and attached processing and storage resources; decision making on different
time scales, ranging from application startup to packet and cell
scheduling; and resource allocation by different organizational entities
sharing the infrastructure. In all three dimensions, the mechanisms we
develop will provide for extensive tailoring to application requirements.
The resource management techniques will be evaluated in a testbed
driven by increasingly aggressive applications and services. Testbed development will take
place in three steps. Initially, we will use the existing Credit Net ATM network for quick
experiments that will guide the design of interfaces and protocols. A
second step will be a local area version of our network architecture and resource management
software. Finally, we plan to perform a wide area evaluation, hopefully in cooperation with
groups working on related research topics.
Last modified by
prs@cs.cmu.edu in October 1996.