Internet Engineering Task Force                      I. Stoica, H. Zhang
INTERNET DRAFT                                                       CMU
Expires August, 1999                                          S. Shenker
                                                                    ICSI
                                                             R. Yavatkar
                                                                   Intel
                                                  D. Stephens, A. Malis
                                                                  Ascend
                                                               Y. Bernet
                                                               Microsoft
                                                                 Z. Wang
                                                                  Lucent
                                                                F. Baker
                                                                   Cisco
                                                           J. Wroclawski
                                                                     MIT
                                                      C. Song, R. Wilder
                                                                     MCI
                                                          February, 1999

             Per Hop Behaviors Based on Dynamic Packet States

Status of this Memo

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other groups
may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Abstract

This document proposes a family of Per Hop Behaviors (PHBs) based on
Dynamic Packet State (DPS) in the context of the differentiated services
architecture. With these PHBs, distributed algorithms can be devised to
implement services with flexibility, utilization, and assurance levels
similar to those that can be provided with per flow mechanisms.

With Dynamic Packet State, each packet carries in its header, in
addition to the PHB codepoint, some PHB-specific state. The state is
initialized by the ingress node. Interior nodes process each incoming
packet based on the state carried in the packet's header, updating both
their internal state and the state in the packet's header before
forwarding it to the next hop.
By using DPS to coordinate the actions of edge and interior nodes along
the path traversed by a flow, distributed algorithms can be designed to
approximate the behavior of a broad class of "stateful" networks using
networks in which interior nodes do not maintain per flow state.

We give several examples of services that can be implemented by PHBs
based on DPS. These include weighted fair share service, penalty box
service, guaranteed service (with mathematically proven worst case
bounds), and distributed bandwidth broker service. We also discuss
several possible solutions for encoding Dynamic Packet State that have
the minimum incompatibility with IPv4. For more details see
http://www.cs.cmu.edu/~istoica/DPS

1. Introduction

While the diffserv architecture [Blake98] is highly scalable, the
services it can provide have lower flexibility, utilization, or
assurance levels than services provided by architectures that employ per
flow management at every node. As an example, it can be shown that in
order for the premium service to achieve service assurance comparable to
the guaranteed service, even with a relatively large queueing delay
bound, the fraction of bandwidth that can be allocated to premium
service traffic has to be very low. In [StoZha99], an example was given
to show that even when the fraction of bandwidth allocated to premium
service is limited to less than 10%, the worst case queueing delay bound
can be as high as 240 ms. The example assumes a DS domain with a
diameter of 15 hops, in which each link has a speed of 1 Gbps and is
traversed by 1,500 flows with 1,500 byte packets. It is debatable
whether these numbers should be of significant concern.
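As a rough sanity check on these numbers, the per-hop worst case can be
approximated by the time to drain one maximum-size packet from every
flow sharing the link. The following back-of-envelope sketch (not the
exact analysis of [StoZha99]) gives the same order of magnitude as the
240 ms bound quoted above:

```python
# Order-of-magnitude check of the example above: in the worst case a
# packet may find one maximum-size packet of every other flow queued
# ahead of it at each hop.  A rough sketch, not the exact analysis.
flows = 1500            # flows traversing each link
pkt_bytes = 1500        # maximum packet size in bytes
link_bps = 1e9          # 1 Gbps link speed
hops = 15               # diameter of the DS domain

per_hop = flows * pkt_bytes * 8 / link_bps    # seconds per hop
end_to_end = hops * per_hop                   # accumulated worst case

print(per_hop, end_to_end)   # 18 ms per hop, 270 ms end to end
```

The 270 ms obtained here is of the same order as the 240 ms bound
derived by the exact analysis.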
For example, the low utilization by the premium traffic may be
acceptable if the majority of traffic will be best effort, either
because the best effort service is "good enough" for the majority of
applications, or because the price difference between premium traffic
and best effort traffic is too high to justify the performance
difference between them. Alternatively, if the guaranteed nature of
service assurance is not needed, i.e., statistical service assurance is
sufficient for premium service, we can then achieve higher network
utilization. Providing meaningful statistical service is still an open
research problem. A discussion of these topics is beyond the scope of
this document. Furthermore, as pointed out in [Bernet98], there is
usually a tradeoff between the complexity of the QoS mechanism and the
efficiency of the resource usage. While intserv-style guaranteed service
can achieve high resource utilization, premium service requires a much
simpler mechanism.

This document proposes a new set of PHBs based on Dynamic Packet State.
These PHBs have relatively low complexity (they do not require per flow
state), but can be used to implement distributed algorithms that provide
services with flexibility, utilization, and assurance levels similar to
those that can be achieved with per flow mechanisms. DS domains
implemented with this type of PHB should interoperate with DS domains
implemented with other PHBs.

With Dynamic Packet State, each packet carries in its header, in
addition to the PHB codepoint, some PHB-specific state. The state is
initialized by an ingress node when the packet enters the DS domain.
Interior nodes process each incoming packet based on the state carried
in the packet's header, updating both their internal state and the state
in the packet's header before forwarding it.
By using DPS to coordinate the actions of edge and interior nodes along
the path traversed by a flow, distributed algorithms can be designed to
approximate the behavior of a broad class of "stateful" networks.
Consequently, introducing PHBs based on DPS will significantly increase
the flexibility and capabilities of the services that can be built
within the diffserv architecture. In particular, we will show that a
variety of QoS services can be implemented by PHBs based on DPS. These
include weighted fair share service, penalty box service, guaranteed
service, and distributed admission control service.

In this document, we use "flow" to refer to a subset of packets that
traverse the same path inside a DS domain between two edge nodes. Thus,
with the highest level of traffic aggregation, a flow consists of all
packets between the same pair of ingress and egress nodes.

This document is organized as follows. Section 2 gives a general
description of PHBs based on DPS. Section 3 presents several services
that can be implemented with PHBs based on DPS. Section 4 discusses
alternative techniques for storing state in the packet's header.
Section 5 briefly discusses some issues related to the specification of
DPS PHBs, such as codepoints, tunneling behavior, and interaction with
other PHBs. Section 6 discusses security issues.

2. Description of Per-Hop Behaviors Based on Dynamic Packet State

Unlike common PHB codepoints [Blake98, Heinanen99, Jacobson98], a PHB
codepoint based on DPS has extra state associated with it. This state is
initialized by ingress nodes and carried by packets inside the DS
domain. The semantics of the state are PHB dependent. The values taken
by the state can be flow, path, or packet dependent.
The state carried by packets can be used by interior nodes for a variety
of purposes, such as packet scheduling, updating the local node's state,
or extending the codepoint space.

When a PHB based on DPS is defined, in addition to the guidelines given
in [Blake98], the following items MUST be specified:

o state semantic - the meaning of the information carried by the packets

o state placement - where the information is stored in the packet's
header

o encoding format - how the information is encoded in packets

For example, consider a PHB that implements the Core-Stateless Fair
Queueing algorithm, which is described in Section 3.1. In this case, the
state carried by a packet is an estimate of the current rate of the flow
to which the packet belongs. The state can be placed either (a) between
the layer two and layer three headers, (b) in an IP option, or (c) in
the IP header (see Section 4). Finally, the rate can be encoded by using
a floating point like format, as described in Section 4.1.1.

In addition, the following requirement, called the transparency
requirement, must be satisfied:

o All changes performed at ingress nodes or within the DS domain on
packets' headers (possibly for the purpose of storing the state) must be
undone by egress nodes

Any document defining a PHB based on DPS must specify a default
placement of the state in the packet header and a default bit encoding
format. However, to increase flexibility, it is acceptable for a
document to define alternate state placements and encoding formats. Any
router that claims to be compatible with a particular PHB based on DPS
must support at least the default placement and the default bit encoding
format.

3. Examples of Services that can be Implemented by PHBs Based on DPS

To illustrate the power and the flexibility of PHBs based on DPS, we
give three examples.
In the first, we address the congestion control problem by approximating
the functionality of a "reference" network in which each node performs
fair queueing. In the second example, we describe a scheduling algorithm
that provides per flow QoS guarantees. In the third example, we present
a distributed per-flow reservation mechanism. It is worth noting that
the last two techniques, when used together, can implement the intserv
guaranteed service semantics in a diffserv environment.

3.1. Support for Congestion Control

A central tenet of the Internet architecture is that congestion control
is achieved mainly through end-host algorithms. However, starting with
Nagle [Nagle87], many researchers have observed that such end-to-end
congestion control solutions are greatly improved when nodes have
mechanisms that allocate bandwidth in a fair manner. Fair bandwidth
allocation protects well-behaved flows from ill-behaved ones, and allows
a diverse set of end-to-end congestion control policies to co-exist in
the network [Demers89]. In the past, fair allocations were typically
achieved by using per-flow queueing mechanisms -- such as Fair Queueing
[Demers89, Parekh92] and its many variants [BenZha96, Golestani94,
ShrVar95] -- or per-flow dropping mechanisms such as Flow Random Early
Drop (FRED) [Lin97]. However, it has recently been shown in [Stoica98]
that by having packets carry additional state it is possible to
approximate the functionality of a "reference" network in which each
node implements fair queueing with a network in which interior nodes do
not maintain per flow state. The next section briefly discusses this
algorithm, called Core-Stateless Fair Queueing. The algorithm details
are given in [Stoica98]. Source files for the ns-2 simulator as well as
an interactive simulator are available on line at [CSFQ].

3.1.1. Core-Stateless Fair Queueing (CSFQ)

The architecture proposed in [Stoica98] can be easily implemented by
using a PHB based on DPS. In this case, the state associated with each
packet represents the current estimated rate of the microflow to which
the packet belongs. Ingress nodes compute this rate and insert it in the
packet header. Interior nodes use FIFO queueing and keep no per-flow
state. They employ a probabilistic dropping algorithm that uses the
packet state along with the node's own measurement of the aggregate
traffic. As an interior node forwards a packet, it updates the packet's
state if needed (e.g., if the flow rate changes due to dropping).

To give the intuition behind the CSFQ algorithm, consider a node with
output link speed C, where flows are modelled as continuous streams of
bits. We assume that each flow i's arrival rate r_i(t) is known
precisely. Max-min fair bandwidth allocations are characterized by the
fact that all flows that are bottlenecked (i.e., have bits dropped) by
this node have the same output rate. We call this rate the fair share
rate of the node; let alpha(t) be the fair share rate at time t. In
general, if max-min bandwidth allocations are achieved, each flow i
receives service at a rate given by min(r_i(t), alpha(t)). Let A(t)
denote the total arrival rate: A(t) = r_1(t) + r_2(t) + ... + r_n(t).
If A(t) > C then the fair share alpha(t) is the unique solution to

C = min(r_1(t), alpha(t)) + min(r_2(t), alpha(t)) + ...
    + min(r_n(t), alpha(t))                                        (1)

If A(t) <= C then no bits are dropped and, by convention, we set
alpha(t) = max(r_1(t), r_2(t), ..., r_n(t)). If r_i(t) <= alpha(t),
i.e., flow i sends no more than the node's fair share rate, all of its
traffic will be forwarded. If r_i(t) > alpha(t), then a fraction
(r_i(t) - alpha(t))/r_i(t) of its bits will be dropped, so it will have
an output rate of exactly alpha(t).
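To make the fluid-model computation concrete, the following sketch (an
illustrative rendering, not part of the draft; the helper name is ours)
solves Eq. (1) for alpha by bisection and derives the drop fraction of a
bottlenecked flow:

```python
def fair_share(rates, C):
    """Fair share rate alpha of a link with capacity C, per Eq. (1).
    If the link is not congested, return max(rates) by convention;
    otherwise bisect the monotone function sum(min(r_i, alpha))."""
    if sum(rates) <= C:
        return max(rates)
    lo, hi = 0.0, max(rates)
    for _ in range(200):
        mid = (lo + hi) / 2
        if sum(min(r, mid) for r in rates) < C:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Three flows at 1, 6 and 8 Mbps on a 10 Mbps link: the solution of
# Eq. (1) is alpha = 4.5 Mbps, since 1 + 4.5 + 4.5 = 10.
alpha = fair_share([1e6, 6e6, 8e6], 10e6)

# A flow sending at 8 Mbps loses the fraction (r - alpha)/r of its bits.
drop_fraction = (8e6 - alpha) / 8e6
```

The 1 Mbps flow is below the fair share and is fully served; the two
larger flows are both capped at alpha.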
This suggests a very simple probabilistic forwarding algorithm that
achieves fair allocation of bandwidth: each incoming bit of flow i is
dropped with the probability

P = max(0, 1 - alpha(t)/r_i(t))                                    (2)

When these dropping probabilities are used, the arrival rate of flow i
at the next hop is given by min(r_i(t), alpha(t)).

The above algorithm can be extended to a packet system. The packet
system still employs a drop-on-input scheme, except that now we drop
packets rather than bits. Because the rate estimation [Stoica98]
incorporates the packet size, the dropping probability is independent of
the packet size and depends only, as above, on the rate r_i(t) and the
fair share rate alpha(t). Pseudocode reflecting this algorithm is shown
in Figure 1.

Figure 1 - Algorithms performed by ingress and interior nodes in CSFQ.
---------------------------------------------------------------------
Ingress node
---------------------------------------------------------------------
on packet p arrival:
  i = get_flow(p);
  r = estimate_flow_rate(p, i);
  state(p) <- r;
---------------------------------------------------------------------
Interior node
---------------------------------------------------------------------
on packet p arrival:
  r <- state(p);
  alpha = estimate_fair_rate();
  /* compute dropping probability */
  P = max(1 - alpha/r, 0);
  if (unif_rand(0,1) < P)
    drop(p);
  else
    enqueue(p);
    /* update flow's rate if needed */
    state(p) <- min(alpha, r);
---------------------------------------------------------------------

Further, [Stoica98] proposes specific algorithms for estimating the
flow's rate and the link's fair rate. In addition, [Stoica98] also gives
a worst-case bound for the performance of this algorithm in an idealized
setting.

3.1.2. Weighted CSFQ

The CSFQ algorithm can be extended to support flows with different
weights.
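As an illustrative sketch of this weighted extension (the helper names
are ours, not the draft's): the packet label becomes the normalized rate
r_i/w_i, and the dropping rule of Eq. (2) is applied to the label
unchanged.

```python
def weighted_label(rate, weight):
    # In weighted CSFQ the packet label is the normalized rate r_i/w_i.
    return rate / weight

def drop_probability(label, alpha):
    # Same rule as Eq. (2), applied to the normalized label.
    return max(0.0, 1.0 - alpha / label)

# A weight-2 flow sending at 8 Mbps and a weight-1 flow sending at
# 4 Mbps carry the same label, so they see the same drop probability
# when the node's (normalized) fair rate is 2 Mbps.
p_heavy = drop_probability(weighted_label(8e6, 2), 2e6)
p_light = drop_probability(weighted_label(4e6, 1), 2e6)
```

The weight-2 flow thus ends up with twice the output rate of the
weight-1 flow, as intended.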
This makes it possible to provide per flow service differentiation
without maintaining per flow or per path state at interior nodes. Let
w_i denote the weight of flow i. Returning to our fluid model, the
meaning of these weights is that a fair allocation is one in which all
bottlenecked flows have the same value for r_i/w_i. The only other major
change to the algorithm from Figure 1 is that the label is now r_i/w_i,
instead of simply r_i.

3.1.3. CSFQ with Penalty Box

Fair queueing can also play a beneficial role in addressing the
unfriendly flow problem. In the literature, one approach to address this
problem is RED with a penalty box [FloFal97]. This approach relies on
the ability to accurately identify unfriendly flows with relatively
lightweight router mechanisms. A flow is classified as unfriendly if its
dynamic behavior is "more aggressive" than the expected behavior of a
TCP flow. Unfortunately, in practice it is very difficult to implement
such a test accurately, since a router does not know all the parameters
that influence TCP behavior (e.g., the round-trip time). In addition, it
is an open problem whether this approach can work satisfactorily in the
presence of different end-to-end adaptation schemes [Stoica98].

In contrast, with fair queueing it is relatively straightforward to
identify unfriendly flows. One way is to simply compare the arrival rate
of the flow to the link's fair rate. In CSFQ, this reduces to comparing
the estimated rate carried by the incoming packet to the output link's
fair rate estimated by the node. In addition, as noted in [Stoica98],
with CSFQ it is possible to devise algorithms to control these
unfriendly flows without maintaining any per flow state.

3.2. Per Flow QoS Scheduling

In this section, we show that by using a PHB based on DPS, it is
possible to provide per flow guarantees.
More precisely, our goal is to replicate the functionality of a network
in which each node implements the Delay-Jitter-Controlled Virtual Clock
(Jitter-VC) algorithm [ZhaFer94]. In the remainder of this section, we
first describe the implementation of Jitter-VC using per flow state, and
then present our algorithm, called Core-Jitter-VC (CJVC), which uses a
PHB based on DPS.

3.2.1. Jitter Virtual Clock (Jitter-VC)

Jitter-VC is a non-work-conserving version of the well-known Virtual
Clock algorithm [Zhang90]. It uses a combination of a delay-jitter
rate-controller [Verma91, ZhaFer94] and a Virtual Clock scheduler. The
algorithm works as follows: each packet is assigned an eligibility time
and a deadline upon its arrival. The packet is held in the
rate-controller until its eligibility time expires. The scheduler orders
the transmission of eligible packets according to their deadlines.
Jitter-VC guarantees that all packets will meet their deadlines within
the transmission time of a packet of the maximum length.

For the k-th packet of a flow, its eligibility time e(j; k) and deadline
d(j; k) at the j-th node on its path are computed as follows (for
simplicity of notation we eliminate the flow's index):

e(j; 1) = a(j; 1), j >= 1
e(j; k) = max(a(j; k) + g(j-1; k), d(j; k-1)), j >= 1; k > 1       (3)

d(j; k) = e(j; k) + l(k)/r, j, k >= 1                              (4)

where l(k) is the length of the packet, r is the reserved rate for the
flow, a(j; k) is the packet's arrival time at the j-th node traversed by
the packet, and g(j-1; k), written into the packet header by the
previous node, is the amount of time the packet was ahead of schedule at
the previous node, i.e., the difference between the packet's deadline
and its actual departure time at the (j-1)-th node. The purpose of
having g(j-1; k) is to compensate at node j for the variation of delay
due to load fluctuation at the previous node j-1.
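As an illustrative sketch (our Python rendering, not normative text),
the recurrences (3)-(4) at a single node can be computed as follows:

```python
def jitter_vc_node(arrivals, aheads, lengths, r):
    """Eligibility times and deadlines at one node, per Eq. (3)-(4).
    arrivals[k] is a(j; k); aheads[k] is g(j-1; k), the time packet k
    was ahead of schedule at the previous node (0 at the first hop);
    lengths[k] is l(k) in bits; r is the reserved rate in bits/s."""
    e, d = [], []
    for k in range(len(arrivals)):
        if k == 0:
            ek = arrivals[0]
        else:
            ek = max(arrivals[k] + aheads[k], d[k - 1])
        e.append(ek)
        d.append(ek + lengths[k] / r)
    return e, d

# Three back-to-back 8000-bit packets at a 1 Mbps reserved rate: each
# packet needs 8 ms of service, so deadlines are spaced 8 ms apart even
# though the packets arrive only 1 ms apart.
e, d = jitter_vc_node([0.0, 0.001, 0.002], [0.0, 0.0, 0.0],
                      [8000] * 3, 1e6)
```

The rate-controller holds each packet until its eligibility time, so the
flow is smoothed to its reserved rate regardless of its arrival burst.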
It has been shown that if a flow's long term arrival rate is no greater
than its reserved rate, a network of Virtual Clock servers can provide
the same delay guarantee to the flow as a network of WFQ servers
[FigPas95, Goyal95, StiVer96]. In addition, it has been shown that a
network of Jitter-VC servers can provide the same delay guarantees as a
network of Virtual Clock servers [Georgiadis95]. Therefore, a network of
Jitter-VC servers can provide the same guaranteed service as a network
of WFQ servers.

3.2.2. Core-Jitter-VC (CJVC)

In [StoZha99], a variant of the Jitter-VC algorithm, called
Core-Jitter-VC (CJVC), is proposed. CJVC can be implemented with a PHB
based on DPS. In addition, it can be shown that a network of CJVC
servers can provide the same guaranteed service as a network of
Jitter-VC servers.

Figure 2 - CJVC algorithm at ingress and interior nodes
---------------------------------------------------------------------
Ingress node
---------------------------------------------------------------------
on packet p arrival:
  i = get_flow(p);
  if (first_packet_of_flow(i))
    e[i] = crt_time;
    delta[i] = 0;
  else
    delta[i] = l(p)/r[i] + delta[i] - l[i]/r[i]
               + max(crt_time - d[i], 0) / (h - 1);
    e[i] = max(crt_time, d[i]);
  l[i] = length(p);
  d[i] = e[i] + l[i]/r[i];
on packet p transmission:
  label(p) <- (r[i], d[i] - crt_time, delta[i]);
---------------------------------------------------------------------
Interior node
---------------------------------------------------------------------
on packet p arrival:
  (r, g, delta) <- state(p);
  e = crt_time + g + delta;   /* see Eq. (3) */
  d = e + l(p)/r;
on packet p transmission:
  label(p) <- (r, d - crt_time, delta);
---------------------------------------------------------------------

As suggested by Eq.
(3) and (4), the Jitter-VC algorithm needs two state variables for each
flow: r, which is the reserved rate of the flow, and d(j; k), which is
the deadline of the flow's last packet served by node j. While it is
straightforward to eliminate r by putting it in the packet header, it is
not trivial to eliminate d(j; k). The difference between r and d(j; k)
is that while all nodes along the path keep the same r value for the
flow, d(j; k) is a dynamic value that is computed iteratively at each
node. In fact, the eligibility time and the deadline of the k-th packet
depend on the deadline of the previous packet of the same flow, i.e.,
d(j; k-1).

A naive implementation based on DPS would be to pre-compute the
eligibility times and the deadlines of the packet at all nodes along its
path and insert all of them in the header. This would eliminate the need
for the interior nodes to maintain d(j; k). The main disadvantage of
this approach is that the amount of information carried by the packet
increases with the number of hops along the path. The challenge then is
to design algorithms that compute d(j; k) for all nodes while requiring
a minimum amount of state in the packet header.

Notice that in Eq. (3), the reason for node j to maintain d(j; k) is
that it will be used to compute the deadline and the eligibility time of
the next packet. Since it is only used in a "max" operation, we can
eliminate the need for d(j; k) if we can ensure that the other term in
the "max" is never less than d(j; k). The key idea is then to use a
slack variable associated with each packet, denoted delta(k), such that
for every interior node j along the path, the following holds:

a(j; k) + g(j-1; k) + delta(k) >= d(j; k-1), j > 1                 (5)

With Ineq. (5), the computation of the eligibility time is reduced from
Eq.
(3) to

e(j; k) = a(j; k) + g(j-1; k) + delta(k)                           (6)

Therefore, by using one additional variable, delta(k), we eliminate the
need to maintain the deadline of the previous packet, d(j; k-1), for
each flow at interior nodes. The computation of delta(k) is given in
[StoZha99]. Figure 2 shows the computation of the scheduling parameters
e(j; k) and d(j; k) by a CJVC server. The number of hops h is computed
when admission control is performed.

3.3. Distributed Admission Control

The previous two examples focus on data path mechanisms and services. In
this section, we show that PHBs based on DPS can also implement control
plane services such as distributed admission control.

Admission control is a central component in providing quantitatively
defined QoS services. The main job of the admission control test is to
ensure that the network resources are not over-committed. In particular,
it has to ensure that the sum of the reservation rates of all flows that
traverse any link in the network is no larger than the link capacity C.
A new reservation request is granted if it passes the admission test at
each hop along its path.

There are two main approaches to implementing admission control.
Traditional reservation-based networks adopt a distributed architecture
in which each node is responsible for its local resources, and in which
nodes are assumed to maintain per flow state. To support the dynamic
creation and deletion of fine grained flows, a large quantity of dynamic
per flow state needs to be maintained in a distributed fashion. Complex
signaling protocols (e.g., RSVP and ATM UNI) are used to maintain the
consistency of this per flow state. A second approach is to use a
centralized bandwidth broker that maintains the topology as well as the
state of all nodes in the network.
In this case, admission control can be implemented by the broker,
eliminating the need to maintain distributed state. Such a centralized
approach is more appropriate for an environment where most flows are
long lived, and set-up and tear-down events are rare. However, to
support fine grained and dynamic flows, there may be a need for a
distributed broker architecture, in which the broker database is
replicated or partitioned. Such an architecture eliminates the need for
a signaling protocol, but requires another protocol to maintain the
consistency of the different broker databases. Unfortunately, since it
is impossible to achieve perfect consistency, this may create race
conditions and/or resource fragmentation.

A third approach is to use a simplified provisioning model that is not
aware of the details of the network topology, but instead admits a new
flow if there is sufficient bandwidth available for the flow's packets
to travel anywhere in the network with adequate QoS. This simplified
model may be based on static provisioning and service level agreements,
or on a simple dynamic bandwidth broker. In any case, the tradeoff made
in return for the simplicity is that the admission control decision must
be more conservative, and a much smaller level of QoS-controlled service
can be supported.

In the following, we show that by using a PHB based on DPS, it is
possible to implement distributed admission control for guaranteed
services in a DS domain. In our scheme, for each reservation, the
ingress node generates a request message that is forwarded along the
path. In turn, each interior node decides whether or not to accept the
request. When the message reaches the egress node, it is returned to the
ingress node, which makes the final decision. We do not make any
reliability assumptions about the request messages.
In addition, the algorithm does not require reservation termination
messages. In the following we describe the per-hop admission control. In
[StoZha99] we describe how this scheme can be integrated with RSVP to
achieve end-to-end delay guarantees.

3.3.1. Per-Hop Admission Control

The solution is based on two main ideas. First, we always maintain a
conservative upper bound on the aggregate reservation R, denoted
R_bound, which we use for making admission control decisions. R_bound is
updated with a simple rule: R_bound = R_bound + r whenever a request for
a rate r is received and R_bound + r <= C. It should be noted that in
order to maintain the invariant that R_bound is an upper bound on R,
this algorithm does not need to detect duplicate request messages,
generated either due to retransmission in case of packet loss or retry
in case of partial reservation failures.

Of course, the obvious problem with this algorithm is that R_bound will
diverge from R. In the limit, when R_bound reaches the link capacity C,
no new requests can be accepted even though there might be available
capacity. To address this problem, we introduce a separate algorithm,
based on DPS, that periodically estimates the aggregate reserved rate.
Based on this estimate, we compute a second upper bound for R, and then
use it to re-calibrate the upper bound R_bound. An important aspect of
the estimation algorithm is that it can actually be shown that the
discrepancy between this upper bound and the actual reserved rate R is
bounded. The re-calibration process then reduces to choosing the minimum
of the two upper bounds.

3.3.2. Estimation Algorithm for the Aggregate Reservation

To estimate the aggregate reservation, denoted R_est, we again use DPS.
In this case, the state of each packet consists of the amount of bits
the flow is entitled to send during the interval from the time when the
previous packet was transmitted up to the time when the current packet
is transmitted.
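The ingress stamping just described can be sketched as follows (an
illustrative rendering; the class name and the zero-valued first stamp
are our assumptions, not part of the draft):

```python
class IngressStamper:
    """Per-flow DPS stamping at the ingress: each packet carries the
    number of bits the flow was entitled to send since the previous
    packet, i.e. r * (now - prev_time)."""
    def __init__(self, reserved_rate):
        self.r = reserved_rate       # reserved rate in bits/s
        self.prev_time = None

    def stamp(self, now):
        if self.prev_time is None:
            b = 0.0                  # first packet: no previous gap
        else:
            b = self.r * (now - self.prev_time)
        self.prev_time = now
        return b

# A 1 Mbps reservation sending packets 10 ms apart is entitled to
# 10,000 bits per inter-packet gap, regardless of actual packet sizes.
s = IngressStamper(1e6)
s.stamp(0.0)
b = s.stamp(0.010)
```

Summed over a window, these stamps recover the flow's reserved rate,
which is what the interior-node estimation below relies on.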
Note that unlike the previous two examples, in this case the state
carried by the packet does not affect the packet's processing by
interior nodes. This state is used solely to compute each node's
aggregate reservation. Furthermore, this state is not changed as the
packet is forwarded.

The estimation performed by interior nodes is based on the following
simple observation: the sum of the state values of the packets of all
flows received during an interval (a, a + T_W] is a good approximation
of the total number of bits that all flows are entitled to send during
this interval. Dividing this sum by T_W, we obtain the aggregate
reservation rate. This is basically the rate estimation algorithm,
though we need to account for several estimation errors. In particular,
we need to account for the fact that not all reservations continue for
the entire duration of the interval (a, a + T_W].

We divide time into intervals of length T_W. Let (u1, u2] be such an
interval, where u2 = u1 + T_W. Let T_I be the maximum inter-departure
time between two consecutive packets in the same flow and T_J be the
maximum jitter of a flow, both of which are much smaller than T_W.
Further, each interior node maintains a global variable Ra which is
initialized to zero at the beginning of each interval (u1, u2], and is
updated to Ra + r every time a request for a reservation r is received
and the admission test is passed, i.e., R_bound + r <= C. Let Ra(t)
denote the value of this variable at time t. Since interior nodes do not
differentiate between original and duplicate requests, Ra(t) is an upper
bound on the sum of all reservations accepted during the interval
(u1, t]. (For simplicity, here we assume that a flow which is granted a
reservation during the interval (u1, u2] becomes active no later than
u2.)
Then, it can be shown that the aggregate reservation at time u2, R(u2),
is bounded by

R(u2) < R_est(u2)/(1 - f) + Ra(u2),                                (7)

where f = (T_I + T_J)/T_W. Finally, this bound is used to re-calibrate
the upper bound of the aggregate reservation, R_bound, as follows:

R_bound(u2) = min(R_bound(u2), R_est(u2)/(1 - f) + Ra(u2)).        (8)

Figure 3 shows the pseudocode of the admission decision and of the
aggregate reservation estimation algorithm at ingress and interior
nodes.

We make several observations. First, the estimation algorithm uses only
the information in the current interval. This makes the algorithm robust
with respect to the loss and duplication of signaling packets, since
their effects are "forgotten" after one time interval. As an example, if
a node processes both the original and a duplicate of the same
reservation request during the interval (u1, u2], R_bound will be
updated twice for the same flow. However, this erroneous update will not
be reflected in the computation of R_est(u3), since that computation is
based only on the state values received during (u2, u3]. As a
consequence, it can be shown that the admission control algorithm can
asymptotically reach a link utilization of C(1 - f)/(1 + f) [StoZha99].

A possible optimization of the admission control algorithm is to add
reservation termination messages. This would reduce the discrepancy
between the upper bound R_bound and the aggregate reservation R.
However, in order to guarantee that R_bound remains an upper bound for
R, we need to ensure that a termination message is sent at most once,
i.e., there are no retransmissions if the message is lost. In practice,
this property can be enforced by the edge nodes, which maintain per flow
state.
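One admission window of the interior-node algorithm can be sketched as
follows (our illustrative rendering of the rules above; the numbers are
made up):

```python
def admission_test(R_bound, r, C):
    # The conservative rule of the per-hop admission control: accept
    # and inflate the bound whenever the request still fits.  Duplicate
    # requests only make R_bound more conservative, never smaller
    # than the true aggregate reservation R.
    if R_bound + r <= C:
        return True, R_bound + r
    return False, R_bound

def recalibrate(R_bound, state_values, T_W, T_I, T_J, Ra):
    # Eq. (7)-(8): estimate the aggregate reservation from the DPS
    # state values seen in the last window and take the minimum of the
    # two upper bounds.
    R_est = sum(state_values) / T_W
    f = (T_I + T_J) / T_W
    return min(R_bound, R_est / (1 - f) + Ra)

# A duplicated 4 Mbps request inflates R_bound to 8 Mbps on a 10 Mbps
# link, even though the real aggregate reservation R is only 4 Mbps...
ok, R_bound = admission_test(0.0, 4e6, 10e6)
ok, R_bound = admission_test(R_bound, 4e6, 10e6)   # the duplicate

# ...but one estimation window (4 Mbit of entitlements over T_W = 1 s,
# f = 0.1, no new reservations) pulls the bound back toward R.
R_bound = recalibrate(R_bound, [4e6], 1.0, 0.05, 0.05, 0.0)
```

After re-calibration the bound is 4e6/0.9, roughly 4.44 Mbps: still a
safe upper bound on R, but no longer inflated by the duplicate.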
To ensure that the maximum inter-departure time is no larger than T_I, the ingress node may need to send a dummy packet when no data packet arrives for a flow during an interval T_I. This can be achieved by having the ingress node maintain a timer for each flow. An optimization would be to aggregate all "micro-flows" between each pair of ingress and egress nodes into one flow, compute the state value based on the aggregate reservation rate, and insert a dummy packet only if there is no data packet for the aggregate flow during an interval.

Figure 3 - Admission control and rate estimation algorithm.

---------------------------------------------------------------------
 Ingress node
---------------------------------------------------------------------
 on packet p departure:
   i = get_flow(p);
   state(p) <- r[i] * (crt_time - prev_time[i]);
   prev_time[i] = crt_time;
---------------------------------------------------------------------
 Interior node
---------------------------------------------------------------------
  Reservation Estimation          | Admission Control
---------------------------------------------------------------------
 on packet p arrival:             | on reservation request r:
   b <- state(p);                 |   /* admission ctrl. test */
   L = L + b;                     |   if (R_bound + r <= C)
                                  |     Ra = Ra + r;
 on time-out T_W:                 |     R_bound = R_bound + r;
   /* estimated reservation */    |     accept(r);
   R_est = L / T_W;               |   else
   R_bound = min(R_bound,         |     deny(r);
                 R_est/(1-f)+Ra); |
   Ra = 0;                        |
---------------------------------------------------------------------

4. Carrying State in Packets

There are at least three ways to encode state in the packet header: (1) introduce a new IP option and insert the option at the ingress router, (2) introduce a new header between layer 2 and layer 3, and (3) use the existing IP header.
Inserting an IP option, while having the potential to provide a large space for encoding state, would require all routers within a DS domain to process IP options, thus complicating packet processing. In addition, adding an IP option may cause packet fragmentation, which further reduces performance. Introducing a new header between layer 2 and layer 3 would require that solutions be devised for different layer 2 technologies. In the context of MPLS [Rosen98, Rosen99] networks, the state can be encoded in a special label. One way to do this is to use a particular encoding of the experimental use field to indicate that a nested label on the label stack carries the PHB-specific state information rather than an ordinary label. In this case, the label on the top of the stack would indicate the label-switched path, and the inner label the PHB-specific state. This would require a small addition to the MPLS architecture to allow two labels to be pushed or popped in unison and treated as a single entity on the label stack.

4.1. Encoding State within an IP Header

In this section, we discuss the third option: storing the additional state in the IP header. The biggest problem with using the IP header is finding enough space to insert the extra information. The main challenge is to remain fully compatible with current standards and protocols. In particular, we want the network domain to be transparent to end-to-end protocols, i.e., the egress node should restore the fields changed by ingress and interior nodes to their original values. In this respect, we observe that the ip_off field in the IPv4 header, which supports packet fragmentation/reassembly, is rarely used. For example, by analyzing traces of over 1.7 million packets on an OC-3 link [nlanr], we found that less than 0.22% of all packets were fragments. In addition, there are a relatively small number of distinct fragment sizes.
Therefore, it is possible to use a fraction of the ip_off field to encode the fragment sizes and the remaining bits to encode DPS information. The idea can be implemented as follows. When a packet arrives at an ingress node, the node checks whether the packet is a fragment or needs to be fragmented. If neither is true, all 13 bits of the ip_off field in the packet header are used to encode DPS values. If the packet is a fragment, the fragment size is recoded into a more efficient representation and the rest of the bits are used to encode the DPS information. The fragment size field is restored at the egress node. In the above, we make the implicit assumption that a packet can be fragmented only by ingress nodes, and not by interior nodes. This is consistent with the diffserv view that the forwarding behavior of a network's components is engineered to be compatible throughout a domain. In summary, this gives us up to 13 bits in the current IPv4 header to encode the PHB-specific state.

4.2. Examples of State Encoding

The state encoding is PHB dependent. In this section, we give examples of encoding the state for the services described in Section 3.

4.2.1. Encoding a Flow's Rate

Recall that in CSFQ, the PHB state represents an estimate of the current rate of the flow to which the packet belongs. One possible way to represent the rate estimate is to restrict it to only a small number of possible values. For example, if we limit it to 128 values, only seven bits are needed to represent the rate. While this can be a reasonable solution in practice, in the following we propose a more sophisticated representation that allows us to express a larger range of values. Let r denote the rate estimate. To represent r, we use an m bit mantissa and an n bit exponent. Since r >= 0, it is possible to gain an extra bit for the mantissa.
For this we consider two cases: (a) if r >= 2^m, we represent r as the closest value of the form u * 2^v, where 2^m <= u < 2^(m+1). Then, since the most significant bit of u's (m+1)-bit representation is always 1, we can omit it. As an example, assume m = 3, n = 4, and r = 19 = 10011. Then 19 is represented as 18 = u * 2^v, where u = 9 = 1001 and v = 1. By omitting the first bit in the representation of u, the mantissa will store 001, while the exponent will be 1. (b) On the other hand, if r < 2^m, the mantissa will contain r, while the exponent will be 2^n - 1. For example, for m = 3, n = 4, and r = 6 = 110, the mantissa is 110, while the exponent is 1111.

Converting from one format to the other can be efficiently implemented. Figure 4 shows the conversion code in C. For simplicity, here we assume that integers are truncated rather than rounded when converted to the floating point representation.

Figure 4 - The C code for converting between integer and floating point formats. m represents the number of bits used by the mantissa; n represents the number of bits in the exponent.

---------------------------------------------------------------------
 intToFP(int val, int *mantissa, int *exponent)
 {
   int nbits = get_num_bits(val);

   if (nbits <= m) {
     *mantissa = val;
     *exponent = (1 << n) - 1;
   } else {
     *exponent = nbits - m - 1;
     *mantissa = (val >> *exponent) - (1 << m);
   }
 }

 FPToInt(int mantissa, int exponent)
 {
   int tmp;

   if (exponent == ((1 << n) - 1))
     return mantissa;
   tmp = mantissa | (1 << m);
   return (tmp << exponent);
 }
---------------------------------------------------------------------

By using m bits for the mantissa and n bits for the exponent, we can represent any integer in the range [0..(2^(m+1)-1) * 2^(2^n - 1)] with a relative error bounded by (-1/2^(m+1), 1/2^(m+1)).
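The routines of Figure 4 can be made self-contained by fixing m = 3, n = 4 and supplying a possible get_num_bits; as in the figure, values are truncated rather than rounded.

```c
#include <assert.h>

enum { m = 3, n = 4 };    /* mantissa and exponent widths in bits */

/* Number of bits needed to represent val (0 for val == 0).
 * This is one possible implementation; Figure 4 leaves it open. */
static int get_num_bits(int val)
{
    int nbits = 0;
    while (val > 0) {
        nbits++;
        val >>= 1;
    }
    return nbits;
}

void intToFP(int val, int *mantissa, int *exponent)
{
    int nbits = get_num_bits(val);

    if (nbits <= m) {              /* case (b): val < 2^m */
        *mantissa = val;
        *exponent = (1 << n) - 1;
    } else {                       /* case (a): val >= 2^m */
        *exponent = nbits - m - 1;
        *mantissa = (val >> *exponent) - (1 << m);
    }
}

int FPToInt(int mantissa, int exponent)
{
    if (exponent == ((1 << n) - 1))
        return mantissa;
    return (mantissa | (1 << m)) << exponent;
}
```

With these definitions, 19 encodes as mantissa 001, exponent 1 and decodes back to 18, while 6 encodes exactly, matching the examples above.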
For example, with 7 bits, by allocating 3 for the mantissa and 4 for the exponent, we can represent any integer in the range [1..15*2^15] with a relative error within (-6.25%, 6.25%). The worst case relative error occurs when u = 8, i.e., the smallest possible mantissa. For example, the number r = 271 = 100001111 is encoded as u = 1000, v = 5, with a relative error of (8*2^5 - 271)/271 = -0.0554 = -5.54%. Similarly, r = 273 = 100010001 is encoded as u = 1001, v = 5, with a relative error of (9*2^5 - 273)/273 = 0.0549 = 5.49%.

4.2.2. Encoding Scheduling Parameters

As shown in Figure 2, implementing CJVC requires encoding three parameters: (1) the rate r, or equivalently l/r, (2) the slack delay delta, and (3) g. Since the deadline determines the delay guarantees, we use a representation that trades eligible time accuracy for deadline accuracy. To store the scheduling information we use three fields. The first field encodes l/r + delta using a floating point like representation. Since g is never larger than l/r, we encode it as a fraction of l/r + delta. This allows us to save space without compromising the accuracy. More precisely, in [StoZha99] we show that by using three bits for the mantissa and four bits for the exponent to represent l/r + delta, and only three bits to represent g, we can compute the packet's deadline, i.e., d = crt_time + g + l/r + delta, with a relative error bounded by (-13%, 13%). The third field stores l/r, also as a fraction of l/r + delta. The eligible time is then computed as e = d - l/r. By allocating another three bits for this field, we can store all scheduling parameters in 13 bits. In [StoZha99] we give a slightly different encoding that uses two extra bits in the DS field to store the reservation state.

4.2.3.
Encoding Reservation State

As shown in Figure 2, when estimating the aggregate reservation, the PHB state represents the number of bits that a flow is entitled to send during the interval from the time when the previous packet of the flow was transmitted until the time the current packet is transmitted. This number can simply be encoded as an integer b. To reduce the range, a possibility is to store b/l instead of b, where l is the length of the packet.

4.3. Encoding Multiple Values

Since the space in the packet's header is a scarce resource, encoding multiple values is particularly challenging. In this section we discuss two general methods that help to alleviate this difficulty. In the first method, exemplified in Section 4.2.2, the idea is to leverage additional knowledge about the state semantics to achieve efficient encoding. In particular, one value can be stored as a function of other values. For example, if a value is known to be always greater than the other values, the larger value can be represented in floating point format, while the other values may be represented as fractions of this value (see Section 4.2.2).

The idea of the second method is to have different packets within a flow carry different state formats. This method is appropriate for PHBs that do not require all packets of a flow to carry the same state. For example, in estimating the aggregate reservation (see Section 3.3.2), there is no need for every packet to carry the number of bits the flow is entitled to send between the current time and the time when the previous packet was transmitted. The only requirement is that the distance between any two consecutive packets that carry such values be no larger than T_I. Other packets in between can carry different information. Similarly, if we encode the IP fragment size in the packet's state, the packet has to carry this value only if it is actually a fragment. When the packet is not a fragment, it can carry other state instead.
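The second method can be illustrated with a small sketch. The layout below is purely hypothetical, not one of this document's encodings: of the 13 bits recovered from ip_off, one bit serves as a format flag selecting whether the remaining 12 bits carry a recoded fragment size or DPS state.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 13-bit layout illustrating format multiplexing:
 * bit 12 is a format flag; when set, bits 0-11 carry a recoded
 * fragment size, otherwise they carry DPS state.  Illustrative
 * only, not part of this specification. */

#define FMT_FLAG   (1u << 12)
#define VALUE_MASK 0x0FFFu

/* Ingress: pack either format into the 13 available bits. */
uint16_t pack(int is_fragment, uint16_t value)
{
    uint16_t f = is_fragment ? FMT_FLAG : 0;
    return (uint16_t)(f | (value & VALUE_MASK));
}

/* Interior/egress: inspect the format flag, then the value. */
int is_fragment_format(uint16_t field)
{
    return (field & FMT_FLAG) != 0;
}

uint16_t unpack_value(uint16_t field)
{
    return (uint16_t)(field & VALUE_MASK);
}
```

A node that needs a given state type (here, the fragment size) simply skips packets whose flag indicates the other format, relying on the spacing requirement discussed above.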
On the other hand, note that in CSFQ it is mandatory that every packet carry the rate estimate, as this is used to compute the dropping probability for that packet.

5. Specification Issues

This section briefly describes some issues related to drafting specifications for PHBs based on DPS.

5.1. Recommended Codepoints

At this time it is appropriate to use values drawn from the 16 codepoints [Nichols98] reserved for local and experimental use (xxxx11) to encode PHBs based on DPS.

5.2. Interaction with Other PHBs

The interaction of DPS PHBs with other PHBs obviously depends on the PHB semantics. It should be noted that the presence of other PHBs in a node may affect the computation and update of DPS state, as well as the actual forwarding behavior experienced by the packet.

5.3. Tunneling

When packets with PHBs based on DPS are tunneled, the end-nodes must make sure that (1) the tunnel is marked with a PHB that does not violate the original PHB semantics, and (2) the PHB-specific state is correctly updated at the end of the tunnel. This requirement might be met by using a tunnel PHB that records and updates packet state, and then copying the state from the encapsulating packet to the inner packet at the tunnel endpoint. Alternatively, the behavior of the tunnel might be measured or precomputed in a way that allows the encapsulated packet's DPS state to be updated at the decapsulation point without requiring the tunnel to support DPS behavior.

6. Security Considerations

The space allocated for the PHB state in the packet header must be compatible with IPsec. In this context, we note that using the fragment offset to carry PHB state does not affect IPsec's end-to-end security, since the fragment offset is not used for cryptographic calculations [Kent98].
Thus, as is the case with the DS field [Nichols98], IPsec does not provide any defense against malicious modification of the PHB state. This leaves the door open for denial of service attacks. For example, in CSFQ, overestimating a flow's rate may cause a disproportionate share of the bandwidth to be allocated to that flow inside the DS domain. In the example in Section 3.3, underestimation of the aggregate reservation can lead to resource over-provisioning, i.e., admitting more reservations than the capacity allows.

One way to expose denial of service attacks is by auditing. In this context, we note that associating state with PHBs makes it easier to perform efficient auditing at interior nodes. For example, in CSFQ, such an attack can be detected by simply measuring a flow's rate and then comparing it against the estimated rate carried in the flow's packets.

7. Conclusions

In this document we propose an extension of the diffserv architecture by defining a new set of PHBs that are based on Dynamic Packet State. By using DPS to coordinate actions of edge and interior nodes along the path traversed by a flow, distributed algorithms can be designed to approximate the behavior of a broad class of "stateful" networks within the diffserv architecture. Such an extension will significantly increase the flexibility and capabilities of the services that can be provided by diffserv.

8. References

[CSFQ] Core-Stateless Fair Queueing. URL: http://www.cs.cmu.edu/~istoica/csfq.

[nlanr] NLANR: Network Traffic Packet Header Traces. URL: http://moat.nlanr.net/Traces/.

[BenZha96] J.C.R. Bennett and H. Zhang. WF2Q: Worst-case fair weighted fair queueing. In Proceedings of IEEE INFOCOM'96, pages 120-128, San Francisco, CA, March 1996.

[Bernet98] Y. Bernet, R. Yavatkar, P. Ford, F. Baker, L. Zhang, K. Nichols and M. Speer. A Framework for Use of RSVP with Diffserv Networks, Internet Draft, draft-ietf-diffserv-rsvp-01.txt, November 1998.

[Blake98] S. Blake, D. Black, M.
Carlson, E. Davies, Z. Wang, and W. Weiss. An Architecture for Differentiated Services, Internet Draft, draft-ietf-diffserv-arch-02.txt, October 1998.

[Demers89] A. Demers, S. Keshav, and S. Shenker. Analysis and simulation of a fair queueing algorithm. In Journal of Internetworking Research and Experience, pages 3-26, October 1990. Also in Proceedings of ACM SIGCOMM'89, pages 3-12.

[FigPas95] N. Figueira and J. Pasquale. An upper bound on delay for the Virtual Clock service discipline. IEEE/ACM Transactions on Networking, 3(4), April 1995.

[FloFal97] S. Floyd and K. Fall. Router mechanisms to support end-to-end congestion control, LBL Technical Report, February 1997.

[Georgiadis95] L. Georgiadis, R. Guerin, and V. Peris. Efficient network QoS provisioning based on per node traffic shaping. In Proceedings of IEEE INFOCOM'96, San Francisco, CA, March 1996.

[Golestani94] S. Golestani. A self-clocked fair queueing scheme for broadband applications. In Proceedings of IEEE INFOCOM'94, pages 636-646, Toronto, Canada, June 1994.

[Goyal95] P. Goyal, S. Lam, and H. Vin. Determining end-to-end delay bounds in heterogeneous networks. In Proceedings of the 5th International Workshop on Network and Operating System Support for Digital Audio and Video, pages 287-298, Durham, New Hampshire, April 1995.

[Jacobson98] V. Jacobson, K. Poduri and K. Nichols. An Expedited Forwarding PHB, Internet Draft, draft-ietf-diffserv-phb-ef-01.txt, November 1998.

[Heinanen99] J. Heinanen, F. Baker, W. Weiss, and J. Wroclawski. Assured Forwarding PHB Group, Internet Draft, draft-ietf-diffserv-af-04.txt, January 1999.

[Kent98] S. Kent and R. Atkinson. IP Authentication Header, RFC 2402, November 1998.

[Lin97] D. Lin and R. Morris. Dynamics of random early detection. In Proceedings of ACM SIGCOMM'97, pages 127-137, Cannes, France, October 1997.
[Nagle87] J. Nagle. On packet switches with infinite storage. IEEE Transactions on Communications, 35(4):435-438, April 1987.

[Nichols98] K. Nichols, S. Blake, F. Baker, and D. L. Black. Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers, Internet Draft, draft-ietf-diffserv-header-04.txt, October 1998.

[ns2] UCB/LBNL/VINT Network Simulator - ns (version 2).

[Parekh92] A. Parekh and R. Gallager. A generalized processor sharing approach to flow control - the single node case. In Proceedings of IEEE INFOCOM'92, 1992.

[Rosen98] E. C. Rosen, Y. Rekhter, D. Tappan, D. Farinacci, G. Fedorkow, T. Li, and A. Conta. MPLS Label Stack Encoding, Internet Draft, draft-ietf-mpls-label-encaps-03.txt, September 1998.

[Rosen99] E. C. Rosen, A. Viswanathan, and R. Callon. Multiprotocol Label Switching Architecture, Internet Draft, draft-ietf-mpls-arch-04.txt, February 1999.

[ShrVar95] M. Shreedhar and G. Varghese. Efficient fair queueing using deficit round robin. In Proceedings of ACM SIGCOMM'95, pages 231-243, Boston, MA, September 1995.

[StiVer96] D. Stiliadis and A. Varma. Latency-rate servers: A general model for analysis of traffic scheduling algorithms. In Proceedings of IEEE INFOCOM'96, pages 111-119, San Francisco, CA, March 1996.

[Stoica98] I. Stoica, S. Shenker, and H. Zhang. Core-Stateless Fair Queueing: Achieving Approximately Fair Bandwidth Allocations in High Speed Networks. In Proceedings of ACM SIGCOMM'98, pages 118-130, Vancouver, September 1998.

[StoZha99] I. Stoica and H. Zhang. Providing Guaranteed Services Without Per Flow Management. URL: http://www.cs.cmu.edu/~istoica/DPS/, February 1999.

[Verma91] D. Verma, H. Zhang, and D. Ferrari. Guaranteeing delay jitter bounds in packet switching networks. In Proceedings of Tricomm'91, pages 35-46, Chapel Hill, North Carolina, April 1991.

[ZhaFer94] H. Zhang and D. Ferrari.
Rate-controlled service disciplines. Journal of High Speed Networks, 3(4):389-412, 1994.

[Zhang90] L. Zhang. Virtual Clock: A new traffic control algorithm for packet switching networks. In Proceedings of ACM SIGCOMM'90, pages 19-29, Philadelphia, PA, September 1990.

9. Authors' Addresses

Ion Stoica
Carnegie Mellon University
5000 Forbes Ave
Pittsburgh, PA 15213
Phone: (412) 268-2993
Email: istoica@cs.cmu.edu

Hui Zhang
Carnegie Mellon University
5000 Forbes Ave
Pittsburgh, PA 15213
Phone: (412) 268-8945
Email: hzhang@cs.cmu.edu

Scott Shenker
ICSI
1947 Center Street, Suite 600
Berkeley, CA 94704-1198
Phone: (510) 642-4274 x300
Email: shenker@icsi.berkeley.edu

Raj Yavatkar
Intel Corporation, JF3-206
2111 NE 25th Avenue
Hillsboro, OR 97124
Phone: (503) 264-9077
Email: raj.yavatkar@intel.com

Donpaul Stephens
Ascend Communications, Inc.
10250 Valley View Road, Suite 113
Minneapolis, MN 55344
Phone: (612) 996-6840
Email: donpaul@ascend.com

Andrew G. Malis
Ascend Communications, Inc.
1 Robbins Road
Westford, MA 01886
Phone: (978) 952-7414
Email: amalis@ascend.com

Yoram Bernet
Microsoft
One Microsoft Way
Redmond, WA 98052
Phone: (425) 936-9568
Email: yoramb@microsoft.com

Zheng Wang
Lucent Technologies
101 Crawfords Corner Road, Room 4C-516
Holmdel, NJ 07733, USA
Phone: (732) 949-1042
Email: zhwang@dnrc.bell-labs.com

Fred Baker
Cisco Systems
519 Lado Drive
Santa Barbara, CA 93111
Phone: (408) 526-4257
Email: fred@cisco.com

John Wroclawski
MIT Lab for Computer Science
545 Technology Sq.
Cambridge, MA 02139
Phone: (617) 253-7885
Email: jtw@lcs.mit.edu

Chuck Song
MCI Internet Engineering
2100 Reston Parkway
Phone: (703) 715-7082
Email: csong@mci.net

Rick Wilder
MCI Internet Engineering
2100 Reston Parkway
Phone: (703) 715-7114
Email: wilder@mci.net