%%%%%%%%	Article style document.		%%%%%%%%

%\documentstyle[fullpage,times,psfig,11pt]{cmu-art}
\documentstyle[fullpage,times,psfig,11pt]{article}
%\documentstyle[titlepage,times,11pt,picture]{article}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Get the support for EPSF in subdirectory
% Will typeset incorporated pictures on any unix system
% Will typeset and preview incorporated picture on macintoshes (Textures)
%      if custom EPSF.sty is installed. Available from tomstr@cs.cmu.edu 
%\input epsf.sty

%\def\mac{mac}
%\ifx\sys\mac\def\figpath{:fig:}
%  \else\def\figpath{./fig/}
%\fi
%\def\figpath{../graph/}
\def\figpath{./}

%%%%%%%%	Margin declarations.		%%%%%%%%

\oddsidemargin  0in
\evensidemargin 0in
\topmargin 0in
\textwidth 6.5in
\textheight 8.75in

%%%%%%%%	Macro definitions.		%%%%%%%%

% allows multiple line comments enclosed by brackets.
\newcommand{\comment}[1]{}

% define un-numbered footnote for all kinds of disclaimers and support messages
\makeatletter
\long\def\unmarkedfootnote#1{{\long\def\@makefntext##1{##1}\footnotetext{#1}}}
\makeatother

% allows multiple line comments enclosed by brackets.
\newcommand{\ignore}[1]{}

% allows multiple line comments enclosed by brackets (more visible).
\newcommand{\COMMENT}[1]{}

% allows multiple line comments enclosed by brackets.  These are to be
% omitted from the current version of the document but should be included
% in the final version.
\newcommand{\OMIT}[1]{}

% changes output to double spaced.
\newcommand{\doublespace}{\baselineskip 22pt}

% changes output to single spaced.
\newcommand{\singlespace}{\baselineskip 16pt}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%			DOCUMENT STARTS HERE				%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\begin{document}
% set line spacing to \singlespace or \doublespace.
\singlespace

% choose the output format for the bibliography.
% @use(bibliography = /afs/cs/usr/prs/bib/nectar.bib)
% @use(bibliography = /afs/cs/usr/prs/bib/network.bib)

%\bibliographystyle{cmu-bib}
%%%% to use\bibliographystyle{plain}
%\bibliographystyle{alpha}

%%%%%%%%		TITLE PAGE		%%%%%%%%

\title{A High-Bandwidth Network Service}
%\author{
%Qingming Ma\\
%School of Computer Science\\  
%Carnegie Mellon University\\
%Pittsburgh, PA 15213\\
%qma@cs.cmu.edu\\
%\\
%(412) 268-8079
%}
\date{}
\maketitle
\unmarkedfootnote{
This research was sponsored by the Defense Advanced Research Projects Agency
(DOD) under contract number MDA972-90-C-0035, in part by the National
Science Foundation and the Defense Advanced Research Projects Agency under
Cooperative Agreement NCR-8919038 with the Corporation for National Research
Initiatives.
}

\section{Introduction}

	Applications are growing larger and more diverse. Examples include
	scientific computation, the World-Wide Web, and distributed multimedia.
	In scientific computation, data sets as large as a few gigabytes need
	to be processed in a few seconds. Web browsers retrieve increasingly
	large images, pictures, and animation files ranging from a few kilobytes
	to a few megabytes. Low communication latency is essential to keep up
	with end systems' processing capacity and, therefore, to improve overall
	application performance.

	Recent advances in broadband networking and switch-based networking
	technology, including ATM and switched LANs (e.g., switched Fibre Channel,
	switched Ethernet, and switched FDDI), have greatly increased the network
	bandwidth available to applications.

	It is well known that the ``best-effort'' service class does not
	work well for traffic classes that require QoS guarantees. For this reason,
	new service models that support real-time traffic have been intensively
	studied in recent years, including guaranteed and predictive service
	classes that support admission control and bandwidth reservation.

	Since most existing applications do not require firm QoS guarantees,
	achieving high performance for these applications is more fundamental,
	but it has not received enough attention.

	Data traffic in most existing applications falls into one of the
	following two traffic classes.
	\begin{itemize}
		\item {\bf Low-latency traffic.}
		Applications can use partial data as soon as it arrives. The 
		end-to-end delay is the main concern, and packets should be
		sent to their destination as soon as possible. Applications 
		include telnet, rlogin, and RPC.

		\item {\bf High-bandwidth traffic.}
		Applications wait until all requested data arrives. Once traffic
		starts to flow, it is bursty and persistent. The total elapsed
		time for sending the whole data set (i.e., the throughput) rather
		than the delay of individual packets is the main concern.
		Data should be sent at as high a rate as possible. Applications
		include ftp, Web browsers, and I/O-intensive scientific computation.
	\end{itemize}

	With no firm QoS requirements, the network resource is dynamically shared
	by many applications. As the number of messages and the message sizes
	grow large, it is important to allocate the network resource appropriately
	so as to make efficient use of it and satisfy increasing application
	demands.

	Resource allocation is typically done at two levels. At the higher level,
	routing units direct traffic along less congested paths to achieve high
	bandwidth and to balance the network load. At the lower level, congestion
	control units make it possible for the network resource to be maximally
	utilized without jamming the network.

	Following earlier ARPANET experience, data traffic is served by a single
	``best-effort'' delivery service. In this service, the network allocates
	bandwidth among all the instantaneous users as best it can, but without
	any commitment regarding rate, delay, or bandwidth. The following
	is a brief summary of how the current best-effort service works.

	\begin{itemize}
	\item	{\bf Window-based congestion control.} Rate adjustment is performed
		by end systems. A source gradually increases its sending rate until
		it receives an indication of congestion (implicitly, through
		experienced packet delay) and then slows down. As queue lengths
		grow and shrink, the impact on users is a reduction in throughput.

	\item 	{\bf Shortest-path routing.} Average packet delay is used as the
		link metric. The hypothesis is that short delay is a good indication
		of less congestion and low link utilization and, therefore, of high
		available bandwidth. However, it has been observed recently that,
		due to TCP's dynamic adjustment of the sending rate, the average
		packet delay does not change much whether the network load is
		high or low. Low end-to-end delay does not reflect high
		available bandwidth.
	\end{itemize}

%	In summary, the 
%	to ask is: does this single best-effort service class work well for all
%	data traffic of various size in high-speed network?

	Hence, the current best-effort service minimizes individual packet delay,
	which is good for low-latency traffic. The obviously different service
	requirements of high-bandwidth traffic lead us to believe that the
	best-effort service is unlikely to work well for high-bandwidth traffic,
	and that it is desirable to separate high-bandwidth traffic from
	low-latency traffic. Thus, a new service class that supports
	high-bandwidth traffic is needed.

	To support high-bandwidth traffic, a routing and congestion control
	mechanism is needed. Recent work on rate- and credit-based congestion
	control with max-min fairness is particularly suitable for congestion
	control of high-bandwidth traffic. It improves on window-based congestion
	control in two ways. First, delay estimates are no longer used by the
	traffic source to decide whether to increase or decrease its sending
	rate. Instead, explicit rate information, congestion indications, or
	credits are fed back to the traffic source. Packets are unlikely to
	be dropped if the source obeys the rate specified by the network.
	Hence, the traffic can move smoothly at a stable rate, except during
	transient periods when other flows enter or leave the network. Second,
	max-min fairness is enforced: any active connection inside the network
	gets a fair share. These two improvements help high-bandwidth
	traffic achieve high sustained bandwidth.

	However, routing of high-bandwidth traffic has been largely ignored.
	It is routing that can balance the network load and select paths with
	the highest potential sustained bandwidth. A good routing decision is even
	more important for a virtual-circuit network like ATM, where all packets
	travel along the same path selected at the beginning and dynamic rerouting
	is expensive.

	In this paper, we focus on routing high-bandwidth traffic. We show
	how routing cooperates with congestion control to make sure that traffic
	is routed so as to achieve a high sustained rate. The issues studied
	include routing algorithms and routing information. Multi-path routing
	has proved effective in achieving even higher bandwidth. We study issues
	related to multi-path routing in networks with max-min fair share. A key
	issue to be solved is how to use multiple paths without grabbing bandwidth
	from other sessions that use only a single path. We introduce a
	prioritized multi-level max-min fairness model, in which the multiple
	paths are assigned different priorities. The main advantage of this new
	multi-path routing is that no traffic using only a single path is
	affected and secondary paths use only unused bandwidth.

%	An easy-to-use service interface is important for high-level applications.
%	However, it is more crucial, before getting into the design details of a 
%	service interface, to obtain insights about what technique will help 
%	improve the throughput for high-bandwidth traffic as well as network 
%	utlization, and what commitments the network and applications need to make.


%	traffic starts entering the network,
%	the network does its best through scheduling and buffer management to 
%	make sure traffic flows as smooth as possible without suffering 
%	interference of other traffic or attack of malicious users. so that 

%	and monitors traffic rate so that the network is neither congested nor 
%	under-utilized. This needs the cooperation 
%	between applications and the network. To enable users to achieve high 
%	sustain	rate, the network should provide sufficient feedback so that 
%	applications can adjust traffic rate to minimize packet loss while 
%	making efficiently use of the available bandwidth. 


\section{Related work}

\subsection{Service models}

	Work on introducing new non-real-time data service models:

	David Clark: extending the Internet by adding features that permit
	allocating different service levels to different users. Pay more
	to get more.

		- hard to apply to public networks

	Scott Shenker: ASAP vs AMAP

		- no detailed work

	ATM Forum: ABR and UBR

\subsection{Congestion control and TCP performance}

	Improving and enhancing TCP performance has been the main way to
	achieve higher throughput and lower delay.

	Van Jacobson: slow start, fast retransmit, and the window scale option
	for large bandwidth-delay products

	Larry Peterson: TCP Vegas

\subsection{Routing}

	Early routing work for the ARPANET

	Dynamic and adaptive re-routing:

        Dynamic or adaptive re-routing is a way to rebalance the network load,
	make better use of network resources, and increase network
	throughput.

        However, it causes problems in practice. The earlier ARPANET
        experience suggests that doing adaptive routing frequently can cause
        the well-known oscillation problem, because each switch makes routing
        decisions independently. Also, packets for the same source and sink
        pair can be delivered out of order, which may greatly affect the TCP
        window size.

        In a virtual-circuit network, unless the routing decision is made
        by a centralized routing server, a similar oscillation problem can
        still occur. Moreover, the signalling overhead of tearing down old
        connections and setting up new ones cannot be neglected, even though
        data sets are large in high-bandwidth traffic.

        In the studies by Rudin and Mueller and by Grandjean, it was shown that
        for circuit-switched networks, dynamic routing can improve network 
        throughput or delay, but only over a limited range of network 
        utilizations, and can worsen performance at very high utilizations.

	alternative path when the network is congested
		
	multi-path routing

	Other misc: burst reservation 

	Routing for different network models (different goals)
        metric used for routing: hops, residual bandwidth, ...

    Routing algorithms

	Two main classes of routing algorithms have been studied in the
	literature. The first class, generally called shortest-path
	algorithms, provides a least-cost path between a given source-destination
	pair, where the ``cost'' of a path is typically defined as the linear
	sum of the costs of each hop in the path. Routing algorithms used in
	most existing networks fall into this category, although the cost of a
	hop is defined differently from network to network. The second class of
	routing algorithms is optimized to minimize the average network-wide
	delay. The use of this performance metric leads to multi-path routing.


\section{Network model}

%        nature of sharing: max min fair, reserved, ...

	The network model has a great impact on the design of a new service class.
	In this paper, we are interested in networks that are shared by many
	users. No user is discriminated against in favor of others.
%	Users are selfish and non-coorperative in the sense that they all try to 
%	better their own performance. But they do obey certain rules set by the 
%	network.

\subsection{ATM network}
	A packet-switched network uses either a connectionless datagram service
	(e.g., the Internet) or virtual circuits (e.g., ATM). We use an
	abstraction of an ATM network as our network model, since ATM networking
	is new and many unsolved issues appear to be challenging. Most
	importantly, high-bandwidth traffic is likely to be more pervasive in
	ATM networks.

	The ATM Forum defined Available Bit Rate (ABR), a service class for data
	traffic that does not require firm bandwidth or delay guarantees. The
	core of this service is a congestion control mechanism with two features.
	One feature is that explicit congestion information (e.g., an explicit
	rate or a congestion bit) is fed back to the traffic source periodically.
	The other is that all existing ABR connections are treated fairly under
	max-min fairness. Rate-based congestion control has been adopted by the
	ATM Forum. We refer the reader to \cite{} for details.

	We note that the study in this paper is applicable to any network that 
	is connection-oriented and supports a congestion control mechanism 
	based on max-min fairness.

\subsection{Max-min fairness}
	Intuitively, within max-min fair share, all existing connections in 
	the network are treated fairly in the sense that the available link
	bandwidth is evenly spilt among all connections who share the link.
	If some connections can not make full use their fair share because 
	they are bottlenecked by some other links, the unused portion is 
	split fairly among those who can use more bandwidth. 

	Consider a network $\langle V, E\rangle$, a set $S$ of connections, and
	a bandwidth function $B: E\rightarrow R$. Let $S_{e}$ be the set
	of connections in $S$ that use link $e$; $B(e)$ is the total bandwidth
	available to all connections in $S_{e}$. The maximal rate $r_{s}$
	that a connection $s$ can achieve under max-min fair share is called
	its max-min rate. Once a connection gets its max-min rate, it is said
	to be saturated. A link $e$ is said to become a bottleneck if all
	connections that use the link are saturated.

	A simple centralized algorithm to calculate the max-min rate is to
	gradually increase the rate of connections and remove bottleneck links
	and saturated connections from $S$ until all connections are saturated.
%	
%	Let ${\sf us-}S$ be the set of unsaturated connections and 
%	${\sf us-}S_{e}$ the subset of unsaturated connections that use link 
%	$e$. Initially ${\sf us-}S$ is set to $S$ 

	The algorithm loops until all connections are saturated. In each
	iteration, it increases the rate of all unsaturated connections so that
	one or more links become bottleneck links and, therefore, more
	connections become saturated.

	\begin{enumerate}
	\item
	For every connection $s$, $r_{s}$ is initially set to $0$ and $s$ is
	marked unsaturated. Links that are not used by any connection in $S$
	are marked saturated immediately.

	\item
	Find a link $e$ with the minimal incremental rate, defined by
	\[
	{\sf inc}_{e} =
	\frac{B(e) - \sum_{s\in S_{e}} r_{s}}{{\rm number\ of\ unsaturated\
	connections\ using\ link\ }e}
	\]

	\item
	Add this incremental rate to all unsaturated connections; the link $e$
	and possibly other links will become new bottlenecks. Remove all new
	bottlenecks and mark all connections that use these bottlenecks as
	saturated.
	\end{enumerate}
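The three steps above can be sketched in a few lines of Python. The link capacities and connection paths in this example are hypothetical, and ties in the minimal increment are resolved by exact floating-point comparison, which is adequate for an illustration.

```python
def max_min_rates(links, connections):
    """Centralized max-min rate calculation.

    links: dict mapping link name -> capacity B(e).
    connections: dict mapping connection name -> list of links it uses.
    Returns a dict mapping connection name -> max-min rate.
    """
    rate = {s: 0.0 for s in connections}
    unsaturated = set(connections)
    while unsaturated:
        # Step 2: for each link, the equal increment that would exhaust it.
        inc = {}
        for e, cap in links.items():
            users = [s for s in unsaturated if e in connections[s]]
            if users:
                used = sum(rate[s] for s in connections if e in connections[s])
                inc[e] = (cap - used) / len(users)
        if not inc:
            break  # remaining connections cross no shared link
        delta = min(inc.values())
        # Step 3: raise all unsaturated connections by the minimal increment.
        for s in unsaturated:
            rate[s] += delta
        # Links that become bottlenecks saturate every connection using them.
        bottlenecks = {e for e, v in inc.items() if v == delta}
        unsaturated = {s for s in unsaturated
                       if not bottlenecks & set(connections[s])}
    return rate
```

For instance, with links {\tt e1} (capacity 1) shared by connections A and B, and {\tt e2} (capacity 2) shared by B and C, link {\tt e1} bottlenecks A and B at rate 0.5 each, and C then receives the remaining 1.5 on {\tt e2}.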

\section{General Framework}

	High-bandwidth traffic sends a large volume of data. In I/O-intensive
	scientific applications, processing units may need to process from
	a few megabytes to gigabytes of data stored in remote file servers. A
	Web browser may also from time to time retrieve large files of a few
	megabytes. For these applications, user satisfaction is measured by
	the elapsed time, rather than the end-to-end packet delay, to transfer
	most or all of the data. The number of bytes to be sent is
	typically known a priori or can be estimated.

%	To support high-bandwidth traffic, traffic should be routed along a path
%	with is to slelect paths with
%	potential high sustain rate and control traffic appropriately to 
%	avoid packet loss while maximizing link utilization. Namely, a congestion
%	control and a routing mehanism are needed. We briefly discuss congestion
%	control in the rest of this section and address routing component in 
%	the following section.

	Assume that the network enforces max-min fairness among all existing data
	connections. The first design choice to be made is whether high-bandwidth
	traffic should be totally isolated from low-latency traffic at the
	congestion control level. In other words, should all data flows be
	treated equally by a single-level congestion control, or should they be
	treated differently by a two-level max-min fair share, one level for
	high-bandwidth traffic and the other for low-latency traffic?

	In a public network, it is hard to set up a policy that divides the
	available bandwidth between high-bandwidth and low-latency traffic, since
	there is no reason to discriminate one user from others. Even within one
	application, control messages typically need to be sent as soon as
	possible while data messages are sent at as high a rate as possible. Any
	policy that divides bandwidth between high-bandwidth and low-latency
	traffic without taking applications and actual link load into
	consideration may turn out to be awkward. Moreover, such an isolation
	would certainly introduce extra complexity into the max-min fair share
	calculation and reduce the sharing of the network resource.

	We believe that treating high-bandwidth and low-latency traffic equally
	inside the network is important for a public network shared among many
	users. We also believe that improving the dynamic sharing of the network
	resource between high-bandwidth and low-latency traffic will increase
	network resource utilization. Hence, we will use one level of congestion
	control for both traffic classes.

	{\em To add routing for low-latency traffic: 1) the measured delay is
	largely determined by the number of hops and the physical distance.
	2) Queueing delay can be very small if traffic is shaped at the traffic
	source. 3) permanent VCs in ATM networks. 4) Generic routing, since the
	volume of low-latency traffic is high. Hence use the path with the
	minimal number of hops. If there is more than one such path, pick one
	with the maximal max-min rate.}
	
\section{High-bandwidth routing}
	Routing data traffic in a network whose underlying congestion control is
	based on max-min fairness is new. The performance goal for high-bandwidth
	routing is to maximize the average throughput of all connections.

	Of the two main aspects of a routing problem, the first is routing
	algorithms: how to select routes to achieve high performance. The second
	is a routing protocol: who makes the routing decision, what
	routing information is needed, and how the information is distributed.

	In our proposed routing paradigm, link-state routing is advocated.
	The traffic source makes routing decisions on a per-request basis.
	Routing and congestion control are combined in such a way that the
	routing decision is made according to link congestion conditions.

\subsection{Routing algorithms}

	A good routing algorithm will keep the network load well balanced so as
	to achieve high application throughput. In a virtual-circuit network,
	dynamic rerouting can be expensive and the resources available along a
	path are not reserved. It is important to select a path that is not only
	good at the time the routing decision is made but also good when the
	network load changes.

	From an application's point of view, it is the bandwidth of a path,
	rather than its length (the number of hops), that is the main concern,
	since the number of bytes to be sent is huge and packets flow through
	the network in a pipelined fashion. It is the obtainable bandwidth
	of a path that determines the elapsed time of the whole data transfer.
	
	What is the obtainable bandwidth of a path? In a network with max-min
	fair share, after a path is selected and the connection is set up, the
	obtainable bandwidth of this path is the max-min rate of the connection,
	i.e., the minimal max-min fair share over all links in the path. This
	obtainable bandwidth is rather different from the residual bandwidth
	used in traditional routing studies. For example, a link may be saturated
	by just one or two large bursty flows and be left with zero residual
	bandwidth; nevertheless, a new connection would be able to obtain one
	half or one third of the link bandwidth. On the other hand, a link with
	thousands of connections may use only a portion of the link bandwidth
	and thus show a much higher residual bandwidth, yet a new connection may
	only be able to get a rate as high as that residual bandwidth.

	Thus, in the best interest of an application, a path with the maximum
	max-min rate is preferred. We call such a path a widest path. However,
	due to the sharing nature of max-min fairness, a connection has to share
	the bandwidth along its path not only with the existing connections but
	also with future incoming connections. The bandwidth found at routing
	time may decrease or increase, depending on the future traffic load.
	In short, a path with the highest sustained rate is what we want.

	The difficult part is that future traffic is unpredictable. The best
	we can do at routing time is to select a path in the hope that it will
	suffer less bandwidth loss to future new connections. Assume that the
	network load is evenly distributed. The chance for an existing connection
	to maintain a high sustained rate depends on two things: the path length
	(the number of hops) and the path rate. A path with more hops has a
	higher chance of losing bandwidth and a lower chance of gaining
	bandwidth. This suggests selecting a path with as few hops as possible;
	but doing so can cause low utilization of the network resource and less
	satisfaction for applications. A path with a higher rate will complete
	the whole transfer more quickly and therefore suffers less from future
	increases in traffic load. This suggests selecting a path with as high
	a rate as possible; but doing so can lead to very long paths.

	The main task of routing algorithms is to balance the rate against the
	length when selecting a path. The two extreme cases are paths with the
	minimal number of hops and paths with the maximal obtainable max-min
	rate.

	{\bf Widest-shortest path.} A path is a shortest path if it has the
	minimal number of hops. Among all paths with the same number of hops,
	one with the maximum max-min rate is called a widest-shortest path. A
	path with the fewest hops will have the least chance of losing its
	bandwidth to future connections, but it may lead to low application
	throughput and low utilization of the network resources.
	
	{\bf Shortest-widest path.} A path is a widest path if it has the maximum
	max-min rate among all possible paths. If there is more than one path
	with the same width, one with the fewest hops is a
	shortest-widest path. A shortest-widest path gains the highest bandwidth
	at the beginning of a connection, but has a higher chance of losing it.
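The two selection rules can be sketched as follows, under the simplifying assumption that all candidate paths can be enumerated (practical only for small topologies; a production implementation would use a modified shortest-path computation). The adjacency-dictionary graph representation and the link rates are hypothetical.

```python
def simple_paths(graph, src, dst, path=None):
    """Enumerate all simple paths from src to dst.

    graph: dict u -> {v: max-min rate of link (u, v)}.
    """
    path = (path or []) + [src]
    if src == dst:
        yield path
        return
    for v in graph.get(src, {}):
        if v not in path:
            yield from simple_paths(graph, v, dst, path)

def width(graph, path):
    """Width of a path = minimal max-min rate over its links."""
    return min(graph[u][v] for u, v in zip(path, path[1:]))

def widest_shortest(graph, src, dst):
    # Fewest hops first; among those, the largest width.
    return min(simple_paths(graph, src, dst),
               key=lambda p: (len(p) - 1, -width(graph, p)))

def shortest_widest(graph, src, dst):
    # Largest width first; among those, the fewest hops.
    return max(simple_paths(graph, src, dst),
               key=lambda p: (width(graph, p), -(len(p) - 1)))
```

On a graph where a 2-hop path has width 1 and a 3-hop path has width 5, the first rule picks the 2-hop path and the second picks the 3-hop path, illustrating the trade-off discussed above.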

	To balance the path selection, we introduce a family of shortest-path
	algorithms based on polynomial link costs. Let $r_{1}, \cdots, r_{k}$ be
	the max-min rates of the hops of a path $P$ with $k$ hops. The distance
	of $P$ with power $n$ is defined as
	\[
		{\rm dist}(P, n) = \sum_{i = 1}^{k} \frac{1}{r_{i}^{n}}
	\]
	The distance ${\rm dist}(P, n)$ is dominated by the minimal rate of
	a link on the path when $n$ is sufficiently large, and the algorithm
	approaches the widest-shortest path algorithm. On the other hand,
	the number of summands plays a more significant role as $n$ approaches
	$0$. When $n = 0$, a shortest path is a path with the minimal number
	of hops. By adjusting $n$ for a specific network topology and traffic
	load, we may be able to select an $n$ that is best for the network
	environment.
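The limiting behavior of the exponent $n$ can be illustrated directly from the definition; the hop rates below are hypothetical values chosen to show the two regimes.

```python
def dist(rates, n):
    """dist(P, n) = sum over hops i of 1 / r_i ** n,
    where rates lists the max-min rates r_i of the hops of path P."""
    return sum(1.0 / r ** n for r in rates)

# With n = 0 every hop contributes 1, so dist simply counts hops and the
# minimum-hop path wins. As n grows, the smallest rate dominates the sum,
# so a longer path made of fast links eventually beats a short slow one.
short_slow = [1.0]            # 1 hop at rate 1
long_fast = [4.0, 4.0, 4.0]   # 3 hops at rate 4
```

Here ${\rm dist}(short\_slow, 0) = 1 < 3 = {\rm dist}(long\_fast, 0)$, but with $n = 2$ the long fast path has the smaller distance, matching the transition toward widest-path behavior described above.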

	{\bf Shortest-${\rm dist}(P, 1)$ path.} When $n$ equals $1$, the
	polynomial distance would be the bit transmission delay from the
	traffic source to the destination if the connection obtained the rate
	$r_{i}$ at the $i$-th hop. But it is not, since the max-min rate is
	$\min(r_{i})$. Even if we think of the distance as the end-to-end
	transmission delay, it is still very different from the measured
	end-to-end packet delay used in the current best-effort service. First,
	the measured delay is the average packet delay, including propagation,
	processing, queueing, and transmission delay. In a high-speed network,
	the sum of propagation delay and processing delay, which is determined
	by the number of hops and the physical distance between the source and
	destination, can be very significant. Second, the bit transmission delay
	at a hop is for this particular connection, not an average over all
	connections passing through the hop.

\subsection{Routing information} 
	Routing information refers to the network state that is needed by routing
	units to select paths among possible candidates. State information
	used in existing routing protocols includes topology (for reachability
	routing \cite{EGP}), hop counts (e.g., in RIP \cite{hedrick}), the
	number of packets waiting in a link queue (used in the earlier ARPANET
	routing \cite{Bertsekas:Gallager}), and the average latency experienced
	by packets over a fixed time period (used in the revised ARPANET routing
	\cite{mcquillan}). Routing information is updated periodically.

	In our case, we want to select paths that will have a high sustained
	rate. In theory, to calculate the rate for a new connection, we need
	to know the network topology and the paths of all existing connections
	so as to mimic the max-min fairness algorithm. In practice, this may be
	impractical and unnecessary, since the amount of routing information can
	be very large when thousands of connections exist in the network.

	We propose an approximate way to calculate the rate of a new connection,
	under which each switch only needs to send a vector of counters over
	discrete bandwidth levels. For example, assume that the raw bandwidth
	of a link is $155$~Mb/s. For each output port of a switch, we keep a
	vector whose entries are the numbers of connections with a certain
	amount of assigned bandwidth.
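One possible encoding of such a counter vector is sketched below. The discretization levels are hypothetical, and the equal-share estimate is only a rough stand-in for the approximate rate calculation, not the exact max-min computation.

```python
# Hypothetical discrete bandwidth levels (Mb/s) for a 155 Mb/s link.
LEVELS = [1, 5, 10, 25, 50, 100, 155]

def advertise(assigned_rates):
    """Build the per-port vector: entry i counts the connections whose
    assigned rate falls at or below LEVELS[i] (and above LEVELS[i-1])."""
    vector = [0] * len(LEVELS)
    for r in assigned_rates:
        for i, level in enumerate(LEVELS):
            if r <= level:
                vector[i] += 1
                break
    return vector

def estimate_new_rate(capacity, vector):
    """Crude estimate of the rate a new connection could obtain:
    an equal share among the new connection and the counted ones."""
    n = sum(vector) + 1
    return capacity / n
```

The point of the vector is its size: it is bounded by the number of levels, independent of how many thousands of connections traverse the port.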


\subsection{Location of routing decision} 
Routing decisions can be made either by switch nodes on a hop-by-hop basis or by
the traffic source once for a complete route. Under hop-by-hop routing, each
switch node makes an independent decision to determine the next hop. In source
routing, the traffic source selects paths, and intermediate switch nodes
forward data packets according to the paths selected.

Hop-by-hop routing is more suitable for generic path selection. Source routing
is particularly flexible for selecting paths with special requirements.
	In high-bandwidth routing, paths should be selected on a per-flow basis,
	since two flows with the same source and destination may be routed
	differently if other paths with higher bandwidth are available. It is
	also computationally less expensive than hop-by-hop routing, since no
	switch node needs to run special routing algorithms for all
	high-bandwidth connections. An added advantage of source routing is
	that, in a supercomputing environment, a large volume of data may be
	striped over multiple file servers, with processing units reading data
	from these servers simultaneously. Source routing makes it much easier
	to coordinate multiple flows and to achieve even better performance.

\subsection{Distance-vector versus link-state protocols} 
Existing routing protocols can be classified into two categories: distance-vector
protocols (e.g., BGP \cite{BGP}, IDRP \cite{IDRP}, and RIP \cite{hedrick}) and
link-state protocols (e.g., IDPR \cite{RFC1478}, OSPF \cite{ospfv2}, and IS-IS
\cite{isis,perlmanBook}), according to the way routes are calculated, which in
turn determines how routing information is distributed.

Distance-vector protocols select the best next hop and distribute routing
information only to their direct neighbor nodes. Link-state protocols, on the
other hand, compute the whole path to the destination and flood routing
information to all nodes in the network. There have been many studies of
both distance-vector and link-state protocols and comparisons of the two
types of protocols.

Since the traffic source makes the routing decision, link-state routing is the
more natural choice.

\subsection{Multi-path routing}
It often happens in a network that a single path does not provide enough bandwidth
for an end system, while unused bandwidth is available on other paths.
Multi-path routing is a technique for obtaining more bandwidth for a hungry
source-destination pair. Algorithmically, multi-path routing can mimic the
network flow model to minimize the average packet delay inside the network.
However, it has not been adopted in any real network. One problem is that
packets can arrive out of order due to differences in path length, and may
then have to be dropped and retransmitted. Also, routing algorithms based on
flow optimization are much more complicated to implement than shortest-path
algorithms.

\subsubsection{Prioritizing max-min fair share}
To use multiple paths in a network with max-min fair share, we must first
regulate bandwidth sharing so that no session that uses only a single path is
affected. Without regulation, a session using multiple paths may grab bandwidth
from others. Encouraging the use of disjoint paths seems to work, but such a
session can again take multiple shares compared with others that use only one
path, and disjoint paths may not always exist. Our sharing semantics is that
multiple paths are only allowed to take advantage of unused network bandwidth.

We propose a multi-level prioritized max-min fair share model. In this model,
connections are prioritized, and those with the same priority sit in the same
max-min plane, sharing the bandwidth left over from connections with higher
priorities. Unused bandwidth is left for lower-priority connections.
The multi-level prioritized max-min fair share enjoys the following merits.

\begin{itemize}
\item Unused bandwidth is made available to applications that are hungry for
	bandwidth. Connections with lower priority never affect connections
	with higher priority.

\item As traffic comes and goes, network resources are shared better by
	dynamically shifting the bandwidth allocated to the different levels
	of max-min planes. There is no need to set up policies regarding
	how to partition the bandwidth among the levels.

\item The max-min rates can be calculated by simply repeating, for
	each priority class, any algorithm used for single-level max-min
	rate calculation, going from the highest level to the lowest on the
	residual network.

\item The rate-based congestion control adopted by the ATM Forum can easily
	be extended to multiple levels. The only support needed is that a
	priority is assigned to each connection, each switch maintains one
	list of connections per priority, and rates are calculated from the
	highest priority to the lowest.
\end{itemize}
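The per-class calculation described above can be sketched as follows. This is a
minimal illustration, not our simulator code: it assumes each connection is given
as a list of link identifiers (its path), uses the classic water-filling
procedure for the single-level max-min rates, and the function names
(\verb|max_min_level|, \verb|prioritized_max_min|) are ours.

```python
from collections import defaultdict

def max_min_level(caps, conns):
    """Single-level max-min water-filling: assign rates to `conns`
    (conn id -> path, a list of link ids) against residual capacities `caps`."""
    rate = {}
    active = set(conns)
    caps = dict(caps)                   # local residual copy, consumed below
    while active:
        users = defaultdict(list)       # link -> still-active connections on it
        for c in active:
            for l in conns[c]:
                users[l].append(c)
        # bottleneck link: the smallest equal share cap / number of users
        l_min = min(users, key=lambda l: caps[l] / len(users[l]))
        share = caps[l_min] / len(users[l_min])
        for c in users[l_min]:          # freeze every connection crossing it
            rate[c] = share
            for l in conns[c]:
                caps[l] -= share
            active.discard(c)
    return rate

def prioritized_max_min(caps, conns, prio):
    """Multi-level: repeat the single-level algorithm per priority class,
    highest priority (lowest number) first, on the residual network."""
    residual = dict(caps)
    rates = {}
    for p in sorted(set(prio.values())):
        level = {c: path for c, path in conns.items() if prio[c] == p}
        level_rates = max_min_level(residual, level)
        for c, r in level_rates.items():
            for l in conns[c]:
                residual[l] -= r        # leave the leftover for lower planes
        rates.update(level_rates)
    return rates
```

With all connections in one class this reduces to ordinary max-min fair share; a
lower-priority connection only ever sees the bandwidth the higher planes left
unused.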
	
\subsubsection{Multi-path routing}
With the one-level max-min fair share model extended to prioritized multi-level
sharing, multi-path routing works as follows.
\begin{itemize}
	\item
	Starting from the max-min fair share plane with the highest priority,
	a path is selected using any single-path routing algorithm.
	\item
	Update the bandwidth information by taking the newly selected path
	into account, obtaining the residual network for the next max-min
	fair share plane.
	\item
	Repeat the above two steps to select a path at the next priority until
	no more paths are needed or no path can be found.
	\item Set up a connection for each selected path. The end system
	divides the data over the multiple paths according to their
	available bandwidth. The amount of data sent through each path is
	dynamically adjusted when a path finishes sending its share.
	The source informs the receiver of changes in the data layout through
	control packets. The receiver integrates the data from the different
	paths and hands it over to the higher-level application.
\end{itemize}

In summary, in our proposed multi-path routing scheme, the multiple paths are
assigned different priorities, and a lower-priority path yields to
higher-priority paths. Data layout and integration are done by the end systems.

\section{Simulator Design}

\subsection{Simulator}
	We designed and implemented a connection-level event-driven routing
	simulator. Our goal is to evaluate various routing algorithms and
	the multi-path routing scheme in a network with congestion control
	based on max-min fair share. The simulator mimics connection
	setup and teardown and the max-min fairness calculation. An incoming
	request gives the number of bytes to be sent. Paths are found and
	connections are set up. The rates of the connections are dynamically
	adjusted according to the traffic load. After all data has been sent,
	connections are torn down.
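The core of such a connection-level simulation can be illustrated with the
following sketch. It is deliberately simplified relative to our simulator: it
models a single shared link (so max-min fairness degenerates to an equal split
among active connections), and the function name \verb|simulate| and the event
bookkeeping are illustrative only. Rates are readjusted on every connection
setup and teardown, and a connection completes when its byte count is exhausted.

```python
def simulate(requests, capacity):
    """Connection-level event-driven sketch on one shared link.  `requests`
    is a list of (arrival_time, nbytes); returns the finish time of each.
    Active connections share `capacity` equally; the next event is whichever
    of the next arrival or next completion comes first."""
    arrivals = sorted(range(len(requests)), key=lambda i: requests[i][0])
    remaining = {}                       # active connection -> bytes left
    finish = [0.0] * len(requests)
    now, nxt = 0.0, 0
    while nxt < len(arrivals) or remaining:
        rate = capacity / len(remaining) if remaining else 0.0
        t_arr = requests[arrivals[nxt]][0] if nxt < len(arrivals) else float('inf')
        t_done = min((now + b / rate for b in remaining.values()),
                     default=float('inf')) if remaining else float('inf')
        t = min(t_arr, t_done)
        for c in remaining:              # drain bytes at the current fair rate
            remaining[c] -= rate * (t - now)
        now = t
        if t == t_arr:                   # event: connection setup
            i = arrivals[nxt]
            nxt += 1
            remaining[i] = float(requests[i][1])
        for c in [c for c, b in remaining.items() if b <= 1e-9]:
            finish[c] = now              # event: teardown of completed conns
            del remaining[c]
    return finish
```

Note that completion times are re-derived from the current rates at every event
rather than scheduled in advance, since each setup or teardown changes the fair
share of every active connection.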
	
	{\bf Performance measure.} In contrast to the traditional ``routed
	bandwidth'' measure in a network with a residual bandwidth model, we use
	the average throughput of all sessions as our performance measure. Higher
	average throughput means greater user satisfaction and higher utilization
	of the network resources. We also report the variance of the bandwidth
	obtained by different sessions.

	{\bf Topology.} The network topology chosen has a big impact on
	simulation results. For this reason, we pick the three topologies shown
	in figure ... Two of them have 8 switch nodes and one has 16 switch
	nodes. The two 8-node topologies differ in their symmetry and
	connectivity. The 16-node topology lets us find out how the performance
	measures change when the network grows larger.
	We assume that a fixed number of host nodes is attached to each switch.
	This allows us to explore the network performance by partitioning the
	host nodes into clients and servers so as to generate more realistic,
	unbalanced traffic loads.

	{\bf Traffic load.} There are a few dimensions to the traffic load used
	in our simulation.
	\begin{itemize}
	\item Traffic classes. Two traffic classes, high-bandwidth and low-latency,
	coexist in the network. High-bandwidth traffic sends a few megabytes to
	a few gigabytes of data, while low-latency traffic typically sends a few
	kilobytes of data. The exact partition between high-bandwidth and
	low-latency traffic depends on the application. In our simulation, we
	assume that any request below one megabyte is sent as low-latency
	traffic and the rest as high-bandwidth traffic. We also show how
	sensitive the simulated network performance is to this cut-off between
	high-bandwidth and low-latency traffic.

	We also assume that, for high-bandwidth traffic, the traffic source can
	make full use of the bandwidth assigned by the network. For low-latency
	traffic, the traffic source suggests a peak rate (PCR) and can make full
	use of any assigned bandwidth that is no higher than the peak rate. Data
	are assumed to flow out of the traffic source smoothly.

	\item Request arrival rate. 

	\item Ratio of bytes in high-bandwidth and low-latency traffic.
	\end{itemize}

\subsection{Simulation parameters}

	= routing algorithms: shortest-widest, widest-shortest path, hybrid

	= routing information distribution interval

	= client-server vs. uniform host nodes

	= message size distribution: uniform vs long-tail

	= number of paths

	= ratio of high-bandwidth over low-latency traffic
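For reference, the two main candidate algorithms can be sketched as follows.
This is a minimal illustration over an adjacency-map graph, assuming link state
gives the available bandwidth of every link; the function names and graph
representation are ours. Widest-shortest minimizes hop count and breaks ties by
bottleneck bandwidth; shortest-widest maximizes bottleneck bandwidth first and
then takes the fewest hops (here via a widest phase followed by a BFS restricted
to sufficiently wide links).

```python
import heapq
from collections import deque

def widest_shortest(adj, src, dst):
    """Fewest hops first, ties broken by largest bottleneck bandwidth.
    adj[u] is a dict {v: available_bandwidth}."""
    INF = float('inf')
    best = {src: (0, -INF)}             # node -> (hops, -bottleneck_width)
    heap = [(0, -INF, src, [src])]
    while heap:
        hops, nw, u, path = heapq.heappop(heap)
        if (hops, nw) > best[u]:
            continue                    # stale heap entry
        if u == dst:
            return path, -nw
        for v, bw in adj[u].items():
            label = (hops + 1, max(nw, -bw))   # new width = min(width, bw)
            if label < best.get(v, (INF, 0)):
                best[v] = label
                heapq.heappush(heap, (*label, v, path + [v]))
    return None

def shortest_widest(adj, src, dst):
    """Largest bottleneck bandwidth first, then fewest hops among the
    widest paths."""
    width = {src: float('inf')}         # phase 1: widest bottleneck per node
    heap = [(-width[src], src)]
    while heap:
        nw, u = heapq.heappop(heap)
        if -nw < width[u]:
            continue
        for v, bw in adj[u].items():
            w = min(-nw, bw)
            if w > width.get(v, 0):
                width[v] = w
                heapq.heappush(heap, (-w, v))
    if dst not in width:
        return None
    W = width[dst]                      # phase 2: BFS over links with bw >= W
    q, prev = deque([src]), {src: None}
    while q:
        u = q.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1], W
        for v, bw in adj[u].items():
            if bw >= W and v not in prev:
                prev[v] = u
                q.append(v)
    return None
```

On a network with a short narrow path and a longer wide path, the two algorithms
pick different routes, which is exactly the trade-off between low-latency and
high-bandwidth traffic examined in the next section.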

\section{Simulation results}

\subsection{Algorithms}

	widest-shortest path for low-latency traffic;
	widest-shortest, shortest-widest, and hybrid for high-bandwidth traffic

	change routing information distribution interval,
	change topology (both size and connectivity),
	change traffic load and the load ratio of high-bandwidth over
		low-latency traffic,
	change all-client mode to client-server mode

\subsection{Multi-path routing}

	high connectivity

	a fixed algorithm from the previous subsection

	change network load and load ratio,
	change topology (increasing connectivity),
	change all-client mode to client-server mode
	change number of paths to use


\subsection{Summary}

\section{Conclusions}

\subsection{Dynamic rerouting}

Pro: better resource allocation.
When should a source reroute?
Cons: oscillation (if multiple flows reroute at the same time); overhead in
      practice.



	fixed algorithm, single path
	change the condition when rerouting should happen
	change topology (connectivity)


Separate high bandwidth traffic from less critical traffic
        + motivation
Assuming fair sharing
        + motivation
Also evaluating multipath
        + motivation

widest-shortest versus shortest-widest
        influence of topology+traffic on results
        cut off between the two traffic classes
        effect of using fair sharing
multi-path
        influence of topology+traffic on results
        effect of "fairness" algorithm
        effect of using fair sharing

- Algorithms
shortest-widest versus widest-shortest
        sources of overhead
hybrid
multi-path
        sources of overhead

	adopted by PNNI, and can be used to find a path on a per-connection basis.

	\begin{itemize}
		\item {\bf Variety} Applications require quality-of-service
			guarantees, e.g., digital audio and video, interactive
			learning, and video conferencing.

		\item {\bf Size} The amount of data being transferred is
			becoming increasingly large due to the growing
			application size and complexity. For example, in
			scientific computation (e.g., ...), the data sets being
			processed can be as large as a few gigabytes in a few
			seconds. The size of files (including images,
			pictures, and animation files) retrieved through FTP
			and Web browsers is also growing, from a few hundred
			kilobytes to megabytes.
	\end{itemize}


		- without adding much complexity to the existing service models	
		- switch-based
		- virtual circuits
		- high link bandwidth



\end{document}