Peer-to-Peer Content Distribution

Overview

Content distribution on the Internet uses many different service architectures, ranging from centralized client-server to fully distributed. The recent widespread use of peer-to-peer applications such as SETI@home, Napster, and Gnutella indicates that there are many potential benefits to fully distributed peer-to-peer systems. Peer-to-peer content distribution provides more resilience and higher availability through wide-scale replication of content at large numbers of peers.

We are involved in several ongoing projects that study different flavors of peer-to-peer content distribution. We study the use of a simple yet powerful observation, called interest-based locality, to provide scalable, high-performance content lookup and retrieval in peer-to-peer systems. We are also studying how to scale Gnutella, a popular file-sharing application. Finally, in the CoopNet project, we are exploring how selective use of peer-to-peer communication can enhance existing client-server systems.



I. Interest-Based Locality in Peer-to-Peer Content Distribution Systems

Current work on peer-to-peer content location has focused on designing scalable algorithms. However, in a heterogeneous environment such as the Internet, performance is an equally important consideration. We study techniques to enhance the performance of peer-to-peer systems. In particular, we exploit a simple yet powerful property of peer-to-peer content location called interest-based locality: if a peer has a particular piece of content that we are interested in, it is very likely to have other pieces of content that we are interested in as well. Peers that share similar interests can therefore benefit from direct cooperation. We propose a technique called interest-based shortcuts that links peers with similar interests closer together. Peers run a fully distributed algorithm to incrementally construct their own sets of shortcuts, without any global state or global communication. In addition, shortcuts are modular and can be implemented as a performance enhancement layer on top of any existing peer-to-peer content location system. As a result, shortcuts yield higher lookup performance without sacrificing scalability.
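The idea can be illustrated with a minimal Python sketch. It is not the project's implementation: the Peer class, the make_flood_lookup helper, the shortcut limit, and the "most recently useful" eviction rule are all assumptions made for illustration. A peer first asks its shortcuts for the content; only on a miss does it fall back to the underlying content location system, and the peer that answered becomes a new shortcut.

```python
from collections import OrderedDict


class Peer:
    """A peer that layers interest-based shortcuts over a fallback lookup."""

    def __init__(self, name, content, max_shortcuts=10):
        self.name = name
        self.content = set(content)
        self.max_shortcuts = max_shortcuts  # assumed bound on local state
        # Shortcut list, ordered from least to most recently useful.
        self.shortcuts = OrderedDict()

    def has(self, item):
        return item in self.content

    def lookup(self, item, fallback):
        """Return (peer, how) where how is 'shortcut', 'fallback' or 'miss'."""
        # Step 1: ask shortcuts first, most recently useful first.
        for peer in reversed(list(self.shortcuts.values())):
            if peer.has(item):
                self._remember(peer)
                return peer, "shortcut"
        # Step 2: fall back to the underlying location system
        # (flooding, a DHT, ...), here passed in as a function.
        peer = fallback(item)
        if peer is not None:
            self._remember(peer)  # a peer that had our content is a new shortcut
            return peer, "fallback"
        return None, "miss"

    def _remember(self, peer):
        # Assumed policy: keep the most recently useful peers, evict the oldest.
        self.shortcuts.pop(peer.name, None)
        self.shortcuts[peer.name] = peer
        while len(self.shortcuts) > self.max_shortcuts:
            self.shortcuts.popitem(last=False)


def make_flood_lookup(peers):
    """Hypothetical stand-in for the underlying system: scan every peer."""
    def flood(item):
        for p in peers:
            if p.has(item):
                return p
        return None
    return flood


if __name__ == "__main__":
    a = Peer("a", [])
    b = Peer("b", ["song1", "song2"])
    c = Peer("c", ["paper1"])
    flood = make_flood_lookup([b, c])
    print(a.lookup("song1", flood)[1])  # first request needs the fallback
    print(a.lookup("song2", flood)[1])  # second request is served by a shortcut
```

Each peer updates only its own shortcut list, so the sketch needs no global state, and the fallback function can be swapped for any content location system.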

In addition to improving content location performance, interest-based shortcuts can be used as a primitive for a rich class of higher-level services. Two examples of such services are keyword or string-matching search and performance-based content retrieval. Our SIGCOMM 2001 poster studies how performance-based content retrieval can be implemented using interest-based shortcuts. The goal of such a service is to retrieve content from the peer with the best performance. Most peer-to-peer systems assume short-lived interactions on the order of single requests. Shortcuts, however, provide an opportunity for a longer-term relationship between peers. Given this relationship, peers can afford to test shortcuts carefully and select the best ones. In addition, the amount of state that peers need to allocate for interest-based shortcuts is small and bounded, so peers can store performance history for all of their shortcuts. Peers can even actively probe shortcuts when needed.
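As a rough sketch of how stored performance history could drive retrieval, the Python class below keeps one smoothed throughput estimate per shortcut and picks the best candidate. The class name ShortcutHistory, the use of throughput as the metric, and the exponentially weighted moving average with weight alpha are assumptions for illustration; the text above does not specify them.

```python
class ShortcutHistory:
    """Per-shortcut performance history with a smoothed throughput estimate."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha  # assumed weight given to the newest sample
        self.rate = {}      # peer name -> smoothed throughput in bytes/second

    def record(self, peer, bytes_transferred, seconds):
        """Fold one observed transfer (or active probe) into the history."""
        sample = bytes_transferred / seconds
        old = self.rate.get(peer)
        if old is None:
            self.rate[peer] = sample
        else:
            self.rate[peer] = self.alpha * sample + (1 - self.alpha) * old

    def best(self, candidates):
        """Return the candidate with the highest estimate, or None if unknown."""
        known = [p for p in candidates if p in self.rate]
        if not known:
            return None
        return max(known, key=lambda p: self.rate[p])


if __name__ == "__main__":
    history = ShortcutHistory()
    history.record("b", 1000, 1.0)   # 1000 bytes/s
    history.record("c", 4000, 2.0)   # 2000 bytes/s
    print(history.best(["b", "c"]))  # c
```

Because the shortcut list is bounded, this dictionary stays small; a peer with no history for any candidate could probe them first and then call record.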

Publications

II. Characteristics of Gnutella Queries

The surge in popularity of peer-to-peer applications has led to a pressing need for a scalable, high-performance content location protocol. Gnutella, a peer-to-peer file-sharing protocol, broadcasts queries to locate content and thus suffers from an overwhelming amount of query and reply traffic. We study the characteristics of queries on Gnutella and their implications for scaling. We find that the popularity of search strings follows a Zipf-like distribution. Taking advantage of this distribution by caching a small number of query results significantly decreases the amount of traffic seen on the network. We evaluate the effectiveness of caching and find that caching at a single Gnutella node can reduce traffic by up to a factor of 3.7 while using only a few megabytes of memory. As more nodes implement caching, traffic is reduced further. Caching is a short-term solution for improving the scalability of Gnutella.
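The caching idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the cache evaluated in the study: the QueryCache class, the least-recently-used eviction policy, the lowercase normalization of search strings, and the forward callback that stands in for broadcasting a query are all hypothetical. A real cache would also need to expire entries, since results become stale as peers leave.

```python
from collections import OrderedDict


class QueryCache:
    """Small bounded cache of query results at a single node."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # normalized query -> cached results
        self.hits = 0
        self.misses = 0

    def handle(self, query, forward):
        """Answer from the cache if possible; otherwise forward and cache."""
        key = query.strip().lower()  # assumed normalization of search strings
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        results = forward(key)  # stands in for broadcasting the query
        self.entries[key] = results
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return results


if __name__ == "__main__":
    forwarded = []

    def forward(q):
        forwarded.append(q)
        return [q + ".mp3"]

    cache = QueryCache(capacity=2)
    for q in ["beatles", "Beatles ", "madonna", "beatles", "u2", "madonna"]:
        cache.handle(q, forward)
    print(cache.hits, cache.misses)  # 2 4
```

With a Zipf-like query popularity, a few popular search strings account for a large share of requests, which is why even a small capacity can absorb much of the traffic.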

Publications

III. Cooperative Networking (CoopNet)

In CoopNet, we seek to improve the performance of client-server systems through selective use of peer-to-peer communications. We focus on the Web flash crowd problem and show that client cooperation offers an effective solution. We evaluate CoopNet using traces gathered at the MSNBC website during the flash crowds that occurred on September 11, 2001. This is joint work with the Systems and Networking Group at Microsoft Research. For more information, please visit the project website.

Publications
  • Distributing Streaming Media Content Using Cooperative Networking, Venkata N. Padmanabhan, Helen J. Wang, Philip A. Chou, and Kunwadee Sripanidkulchai. NOSSDAV '02. Paper (pdf).

  • The Case for Cooperative Networking, Venkata N. Padmanabhan and Kunwadee Sripanidkulchai. IPTPS '02. Paper (pdf) and presentation (PowerPointShow | pdf | ps.gz).


Kay Sripanidkulchai