Home Research Home

Congested Bottlenecks and Network Design
Aditya Akella, Srini Seshan, Anees Shaikh Shuchi Chawla and Arvind Kannan.

This work consists of may sub-parts, a few of which have reached completion. These sub-parts are described in some detail below. The core aim of this work is to characterize congestion in the Wide-Area Internet in terms of where it occurs and how it grows with the network size:

1. Where are the hot-spots in the Internet and what are their characteristics?
A commonly held belief about the Internet is that poor network performance arises primarily from constraints at the edges of the network. These narrow-band access links (e.g., dial-up, DSL, etc.) limit the ability of applications to tap into the plentiful bandwidth and negligible queuing available in the interior of the network. As access technology evolves, enterprises and end-users, given enough resources, can increase the capacity of their Internet connections by upgrading their access links. The positive impact on overall performance may be insignificant, however, if other parts of the network subsequently become new performance bottlenecks. Ultimately, upgrades at the edges of the network may simply shift existing bottlenecks and hot-spots to other parts of the Internet. In this study, we consider the likely location and characteristics of future bottleneck links in the Internet. Such information could prove very useful to carrier ISPs as they evolve their networks, and also to customers considering their connectivity options.

Our objective is to investigate the characteristics of these non-access bottleneck links. These are links within or between carrier ISP networks that could potentially constrain the bandwidth available to long-lived TCP flows. Using a large set of network measurements, we seek to discover and classify such links according to their location in the Internet hierarchy and their estimated available capacity. By focusing on interior links, we try to avoid considering access links near the source and destination (i.e., first-mile and last-mile hops), as these are usually obvious bottlenecks in the current Internet.

Our work complements and extends the large body of work on measuring and characterizing the Internet, which continues to draw considerable attention. In contrast with these past efforts, our focus is on identifying and characterizing potential bottleneck links through the measurements of a wide variety of Internet paths. In our view, this work makes two primary contributions, as described below.

Methodology for measuring non-access Internet bottleneck links: Our main challenge in characterizing Internet bottlenecks is to measure paths that are representative of typical routes in the Internet, while avoiding biases due to a narrow view of the network from probe sites, or probes which themselves are poorly connected. Our results are based on measurements from 26 geographically diverse probe sites located primarily in the U.S., each with very high speed access to the Internet. We measure paths from these sites to a carefully chosen set of destinations, including paths to all Tier-1 ISPs, as well as paths to a fraction of Tier-2, Tier-3, and Tier-4 ISPs, resulting in 2028 paths in total. In addition, we identify and measure 466 paths passing through public Internet exchange points in order to explore the common belief that public exchanges are a major source of congestion in the Internet

A second challenge lies in actually measuring the bottleneck link and reporting its available bandwidth and location. Due to the need for administrative privileges, or control at both ends of the path, we were unable to leverage any of the existing tools to measure the available bandwidth. Hence, we developed a tool, BFind, which measures available capacity using a bandwidth probing technique motivated by TCP's behavior, and operates in a single-ended mode without requiring superuser access.

Classification of bottleneck links: We apply our measurement methodology to determine empirically the locations, estimated available bandwidth, and delay of non-access bottleneck links. In classifying these links, we draw extensively on recent work on characterizing AS relationships. Our results show that nearly half of the paths we measured have a non-access bottleneck link with available capacity less than 50Mbps. Moreover, the percentage of observed paths with bottlenecks grows as we consider paths to lower-tier destinations. Surprisingly, the bottlenecks identified are roughly equally split between intra-ISP links and peering links between ISPs. Also, we find that low-latency links, both within and between ISPs have a significant probability of constraining available bandwidth. Of the paths through public exchanges that had a bottleneck link, the constrained link appeared at the exchange point itself.

We believe that our observations provide valuable insights into the location and nature of performance bottlenecks in the Internet, and in some cases negate (or provide support for) commonly held notions. In addition, we hope that our work could prove instrumental in improving the performance of future network protocols in terms of which bottlenecks to avoid (and how to avoid them), and in providing guidance for which links within carrier ISPs to upgrade in order to eliminate hot-spots.

2. How does the worst congestion in the Internet grow?
The Internet grows in size every day. As time progresses, more end-hosts are added to the edge of the network. Correspondingly, to accommodate these new end-hosts, ISPs add more routers and links. History has shown that the addition of these links maintains the power-law properties of the Internet topology. The addition of new end-hosts places a greater load on the network as a whole. Fortunately, the improvement of network technology, operates over the same time period. We expect the network links at the edge and core of the network to improve by a similar performance factor as the growth of traffic over time, since they both typically follow similar Moore's Law-like technology trends.

Unfortunately, due to the topology of the network and behavior of Internet routing, the increase in load may be different on different links. As a result, it may be necessary for the speed of some hot-spot links in the network to improve much more quickly than others. If this is true, then these parts of the network are likely to eventually become bottlenecks and the network has poor scaling properties. In such a situation, we would either need to adjust the routing behavior, remove the power-law nature of the topology or accept that end-to-end network performance of the network will not improve as rapidly as individual links. If, on the other hand, the worst congestion scales well with the network size then we can expect the network to continue to operate as it does now.

We use a combination of analysis and simulation to address this issue of how the maximum congestion in the Internet scales with the network size. In our analysis, we employ simple combinatorial/probabilistic arguments to give bounds on the maximum congestion in a model of network evolution based on Preferential Connectivity and a simple model of traffic in which a unit amount of flow between every pair of nodes is routed along the shortest path between them. We complement these analytical results with a large set of detailed simulations for computing the congestion on the links in the network, based both on real and on synthetically generated AS-level topologies. Through our simulations, we also investigate the impact of several key factors on the worst congestion in the network. These include: (1) the routing algorithm employed by the nodes in the network (BGP policy-driven routing vs. shortest-path routing) (2) sophisticated models of communication, modeling realistic traffic demands and factoring in the popularity of a few nodes in the network over others and (3) Alternate degree distributions, e.g., an exponential distribution and power law trees evolving from Preferential Connectivity.

The key contribution of our work is to show that maximum congestion scale poorly with the growing size of the Internet graph. Specifically, the maximum congestion for shortest path routing is at least as bad as $n^{1+\Omega(1)}$, with the exponent depending on the exponent of the power-law degree distribution of the graph. Our simulations show that policy routing in the AS graph results in roughly the same maximum congestion as shortest path routing, but certainly not worse. When more realistic, non-uniform traffic models are considered, the congestion scaling properties of power-law graphs worse substantially. We also show that in terms of the maximum congestion, power law trees are considerably worse than power law graphs. In contrast, graphs with exponential degree distribution have very good congestion properties. Another key contribution of our work is the discussion of simple guidelines on adding redundant edges between nodes that result in a dramatic improve in the congestion scaling properties of the Internet graph.

Related Publications and Talks