Clean Slate Architectures for Network Management


While the Internet Protocol (IP) has been a runaway success, today's IP networks are difficult to manage well. We take a clean slate approach to redesigning different aspects of network control and management, guided by the following three principles:

Network-level objectives: Running a robust data network depends on satisfying objectives for performance, reliability, and policy that can (and should) be expressed as goals for the entire network, separately from the low-level network elements.

Network-wide views: Timely, accurate, network-wide views of topology, traffic, and events are crucial for running a robust network.

Direct control: The decision logic should provide network operators with a direct interface to configure network elements; this logic should not be implicitly or explicitly hardwired in protocols distributed among switches. 

These design principles have been embodied in three research initiatives:
  1. An architecture for centralizing network decision logic
  2. The theory and practice of interconnecting multiple routing instances
  3. The design of new flow monitoring solutions 

The 4-D Architecture

[Figure: Layers of the 4D architecture]

Despite the early design goal of minimizing state in network elements, tremendous amounts of state are distributed across routers and management platforms in today's IP networks. We believe that the many loosely coordinated actors that create and manipulate this distributed state introduce substantial complexity, making both backbone and enterprise networks increasingly fragile and difficult to manage. The 4D architecture decomposes the functions of network control into four planes:
  1. A decision plane responsible for creating a network configuration (e.g., computing the forwarding information base, or FIB, for each router in the network)
  2. A dissemination plane that gathers information about network state (e.g., link up/down events) for the decision plane, and distributes the decision plane's output to routers
  3. A discovery plane that enables devices to discover their directly connected neighbors
  4. A data plane that forwards network traffic
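As a concrete (and much simplified) illustration of the decision plane's role, the sketch below computes a FIB for every router from a network-wide view of link state using shortest paths. The graph representation and function names are our own illustrative choices, not part of the 4D implementation:

```python
import heapq

def compute_fibs(links):
    """Decision-plane sketch: build a FIB (destination -> next hop) for
    every router from a network-wide view of (router, router, cost) links."""
    # Build an undirected adjacency list from the link view.
    graph = {}
    for a, b, cost in links:
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))

    fibs = {}
    for src in graph:
        # Dijkstra from src, remembering the first hop on each shortest path.
        dist = {src: 0}
        first_hop = {}
        heap = [(0, src, None)]
        while heap:
            d, node, hop = heapq.heappop(heap)
            if d > dist.get(node, float("inf")):
                continue  # stale entry
            if hop is not None:
                first_hop[node] = hop
            for nbr, cost in graph[node]:
                nd = d + cost
                if nd < dist.get(nbr, float("inf")):
                    dist[nbr] = nd
                    # The first hop is the neighbor itself when leaving src.
                    heapq.heappush(heap, (nd, nbr, nbr if node == src else hop))
        fibs[src] = first_hop
    return fibs
```

For a triangle A-B (cost 1), B-C (cost 1), A-C (cost 5), router A's FIB sends traffic for both B and C toward B; the decision plane would push each such table down to its router via the dissemination plane.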


Theory and Practice of Interconnecting Multiple Routing Instances  

Today, a large body of research exists on the correctness of routing protocols. However, analytical frameworks for studying routing dynamics have mostly focused on a single routing protocol instance at a time. In reality, the Internet is composed not of one protocol instance (e.g., BGP) but of a multitude of instances that need to interact; for example, routes must be exchanged between BGP and OSPF. The interactions between these protocol instances are governed by the routing glue component, yet despite its wide usage and essential role, there has been no formal investigation into how safe that usage is. We develop analytical models to rigorously analyze the interactions between multiple routing protocol instances and their network-wide impact. We show that making individual routing protocols safe is not sufficient to ensure the correctness of Internet routing: the routing glue plays an equally important part, and its usage can result in a wide range of routing anomalies, including persistent forwarding loops and permanent route oscillations. This routing glue deserves further attention from the networking community.
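To illustrate the kind of anomaly the routing glue can create, the sketch below models route selection by administrative distance (AD) on a small hypothetical topology: destination d originates in instance Q at router C; A redistributes Q into P (giving B a P route via A); and B redistributes P back into Q with a low seed metric (giving A a Q route via B that beats A's route via C). The AD values, metrics, and topology are assumptions chosen to reproduce a persistent forwarding loop, not an example taken from this work:

```python
# Hypothetical administrative distances: instance P preferred over Q.
AD = {"P": 90, "Q": 120}

def best(routes):
    """Standard selection: lowest AD wins, ties broken by protocol metric."""
    return min(routes, key=lambda r: (AD[r["proto"]], r["metric"]))

# Candidate routes to destination "d" at each router after redistribution.
rib = {
    "A": [{"proto": "Q", "next_hop": "C", "metric": 1},
          {"proto": "Q", "next_hop": "B", "metric": 0}],   # B's redistributed route
    "B": [{"proto": "Q", "next_hop": "C", "metric": 1},
          {"proto": "P", "next_hop": "A", "metric": 1}],   # A's redistributed route
    "C": [{"proto": "Q", "next_hop": "d", "metric": 0}],   # origin of d
}

fib = {router: best(routes)["next_hop"] for router, routes in rib.items()}

def forwarding_path(fib, start, dest, max_hops=8):
    """Follow next hops from start; flag a loop if a router repeats."""
    path, node, seen = [start], start, {start}
    while node != dest and len(path) <= max_hops:
        node = fib[node]
        path.append(node)
        if node in seen:
            return path, True   # persistent forwarding loop
        seen.add(node)
    return path, False
```

Here A forwards to B (its redistributed Q route has the lower metric) while B forwards to A (the P route has the lower AD): each protocol behaves correctly in isolation, yet the glue produces a loop A -> B -> A for traffic to d.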


Technical Reports:

Rethinking Flow Monitoring: A Coordinated RISC Architecture for Network Flow Monitoring

[Figures: RISC vs. application-specific approaches; example of a network-wide RISC approach]

Flow monitoring supports several critical network management tasks, such as traffic engineering, accounting, anomaly detection, identifying and understanding end-user applications, understanding traffic structure at various granularities, detecting worms, scans, and botnet activity, and forensic analysis. These tasks require high-fidelity estimates of the traffic metrics relevant to each application. Moreover, the set of network management and security applications is a moving target: new applications arise as the nature of both normal and anomalous traffic patterns changes over time. We make the case for a "RISC" approach to flow monitoring, which employs simple collection primitives on each monitoring device and manages them in an intelligent, network-wide fashion to ensure that the collected data supports computation of the metrics of interest to various applications. A RISC architecture dramatically reduces the implementation complexity of monitoring elements; enables router vendors and researchers to focus their energies on efficiently implementing a small number of primitives; and allows late binding to the traffic metrics that matter, insulating router implementations from the changing needs of flow monitoring applications.
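One way to picture a simple primitive under network-wide coordination is hash-based flow sampling with disjoint hash ranges assigned by a central manager, so that the routers on a path share the monitoring load without duplication. The sketch below is our own simplification under that assumption, not the project's actual design:

```python
import hashlib

def flow_hash(flow):
    """Map a flow key (e.g., a 5-tuple) to a stable point in [0, 1)."""
    digest = hashlib.sha1(repr(flow).encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def assign_ranges(routers, fractions):
    """Network-wide manager: carve [0, 1) into disjoint slices, one per
    router on a path, sized by each router's sampling budget.
    Assumes the fractions sum to 1."""
    ranges, start = {}, 0.0
    for router, frac in zip(routers, fractions):
        ranges[router] = (start, start + frac)
        start += frac
    return ranges

def should_record(router, ranges, flow):
    """Router-side primitive: record the flow iff its hash lands in
    this router's assigned slice."""
    lo, hi = ranges[router]
    return lo <= flow_hash(flow) < hi
```

Because every router hashes the same flow key to the same point and the slices are disjoint, each flow is recorded by exactly one router on its path; the routers stay dumb and generic, while all application-specific intelligence lives in how the manager sizes and places the slices.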


Rethinking NetFlow