Project Camel at Carnegie Mellon

Camel: A Distributed and Scalable Content Discovery System


Project Overview

A Content Discovery System (CDS) is a distributed system that enables the discovery of content. A node in a CDS can publish and provide content, issue queries looking for content, store content or content metadata published by other nodes, and resolve other nodes' queries. A wide spectrum of distributed applications either are CDSs themselves or use a CDS as a major component. Examples include service discovery services, peer-to-peer (P2P) object sharing systems, sensor networks, and publish-subscribe (pub/sub) systems.
The primary task of a CDS is to efficiently locate the set of content items that match a client's query. Existing CDSs struggle to achieve both rich functionality and scalability. At one end, some scale to the Internet level but offer limited functionality: they support only exact content-name lookup [Chord, CAN, Pastry, Tapestry] or searches over strictly hierarchical content names [DNS], or they consider only static content, e.g., search engines [Google]. At the other end, some offer general search over both static and dynamic content, but their search mechanisms do not scale [Gnutella, KaZaA].

In this project, we have designed Camel, a distributed and scalable CDS that overcomes these difficulties and enables powerful content discovery on the Internet. Camel uses a Distributed Hash Table (DHT) as its overlay network substrate and has the following properties:

Scalability.
Camel achieves scalability by using Rendezvous Points (RPs), which avoids system-wide message flooding at both content registration and query time.
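To make the rendezvous-point idea concrete, here is a minimal Python sketch under stated assumptions: a content name is a dictionary of attribute-value (AV) pairs, each AV-pair is hashed onto a toy consistent-hashing ring (a stand-in for the DHT, not Chord's actual routing), and the node owning that hash acts as the pair's RP. A registration goes only to the RPs of its own AV-pairs, and a query goes to the RP of just one of its AV-pairs; because every matching name must contain that pair, that single RP already holds every candidate. The names `ToyDHT`, `rp_of`, `register`, and `query` are hypothetical and are not Camel's API.

```python
import bisect
import hashlib

ID_BITS = 32  # assumed toy identifier space; Chord itself uses SHA-1 (160-bit) IDs


def dht_key(text: str) -> int:
    """Hash a string into the toy identifier space."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** ID_BITS)


class ToyDHT:
    """Consistent-hashing ring: each key is owned by its successor node."""

    def __init__(self, node_names):
        self.ring = sorted((dht_key(n), n) for n in node_names)
        self.ids = [node_id for node_id, _ in self.ring]
        self.store = {n: [] for n in node_names}  # per-node registration database

    def successor(self, key: int) -> str:
        i = bisect.bisect_left(self.ids, key) % len(self.ids)  # wrap around the ring
        return self.ring[i][1]


def rp_of(dht: ToyDHT, attr: str, value) -> str:
    """Hypothetical rule: the RP of an AV-pair is the node owning the pair's hash."""
    return dht.successor(dht_key(f"{attr}={value}"))


def register(dht: ToyDHT, name: dict) -> set:
    """Store the full name at the RP of each of its AV-pairs (once per node)."""
    rps = {rp_of(dht, a, v) for a, v in name.items()}
    for rp in rps:
        dht.store[rp].append(name)
    return rps


def query(dht: ToyDHT, q: dict) -> list:
    """Visit the RP of one AV-pair in the query and do subset matching there."""
    attr, value = next(iter(q.items()))
    rp = rp_of(dht, attr, value)
    return [n for n in dht.store[rp] if q.items() <= n.items()]


dht = ToyDHT([f"node{i}" for i in range(8)])
register(dht, {"type": "music", "genre": "jazz", "format": "mp3"})
register(dht, {"type": "music", "genre": "rock", "format": "mp3"})
print(query(dht, {"genre": "jazz", "format": "mp3"}))
# prints [{'type': 'music', 'genre': 'jazz', 'format': 'mp3'}]
```

In this sketch a registration costs one DHT lookup per AV-pair in the name, and a query costs a single DHT lookup (O(log N) hops in Chord) plus local matching; neither step floods the network. The sketch ignores the hot spot that a popular AV-pair creates at its RP, which is the problem the load balancing mechanism addresses.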

Load Balancing.
Camel uses a novel mechanism based on Load Balancing Matrices (LBMs) to dynamically balance both registration and query load in a fully distributed fashion, sustaining throughput even under extremely skewed load such as flash crowds.
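This page does not spell out how an LBM is organized, so the following Python sketch rests on an assumption: a single RP is expanded into a matrix whose columns (partitions) split the registrations and whose rows (replicas) split the queries. A registration is stored in one partition and copied to every replica of that partition; a query goes to one replica row and must visit every partition. The class name, the hypothetical `cell_key` naming of cells, and random partition/replica selection are illustrative choices rather than Camel's exact protocol, and the sketch omits how the matrix grows or shrinks at run time.

```python
import hashlib
import random


def cell_key(avpair: str, p: int, r: int) -> str:
    """Hypothetical DHT key for cell (p, r) of one AV-pair's matrix."""
    return hashlib.sha1(f"{avpair}#{p}#{r}".encode("utf-8")).hexdigest()


class LoadBalancingMatrix:
    """Toy P x R load balancing matrix for a single rendezvous point.

    Columns (partitions) split registrations; rows (replicas) split queries.
    Each cell stands in for the DHT node owning cell_key(avpair, p, r).
    """

    def __init__(self, avpair: str, partitions: int, replicas: int, seed: int = 0):
        self.P, self.R = partitions, replicas
        self.rng = random.Random(seed)
        cells = [(p, r) for p in range(partitions) for r in range(replicas)]
        self.node_keys = {c: cell_key(avpair, *c) for c in cells}
        self.data = {c: [] for c in cells}
        self.reg_load = {c: 0 for c in cells}
        self.query_load = {c: 0 for c in cells}

    def register(self, name: dict) -> None:
        # One partition stores the name; every replica of that partition keeps
        # a copy, so any single replica row still covers every name.
        p = self.rng.randrange(self.P)
        for r in range(self.R):
            self.data[(p, r)].append(name)
            self.reg_load[(p, r)] += 1

    def query(self, q: dict) -> list:
        # One replica row answers the query, but it must visit every partition.
        r = self.rng.randrange(self.R)
        hits = []
        for p in range(self.P):
            self.query_load[(p, r)] += 1
            hits.extend(n for n in self.data[(p, r)] if q.items() <= n.items())
        return hits


lbm = LoadBalancingMatrix("genre=jazz", partitions=4, replicas=3)
for i in range(600):
    lbm.register({"genre": "jazz", "id": i})
for _ in range(300):
    hits = lbm.query({"genre": "jazz"})  # always returns all 600 names
print(max(lbm.reg_load.values()), max(lbm.query_load.values()))
```

With 4 partitions and 3 replicas, each cell handles roughly a quarter of the registrations and a third of the queries that a single RP would otherwise absorb alone. The trade-off is that each registration now costs R messages and each query costs P messages.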

Rich searchability.
Camel uses a flexible attribute-value naming scheme and efficiently supports complex queries, such as subset-matching, range, and similarity queries.
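The three query types named above differ in what a match means. The following Python sketch shows only their semantics when evaluated locally over a handful of names, assuming names are dictionaries of AV-pairs with numeric values for the range and similarity attributes. How Camel distributes range and similarity queries over the DHT is the subject of the ICNP'04 and ICC'07 papers and is not reproduced here; the function names and the Euclidean distance used for similarity are hypothetical choices.

```python
import math


def subset_match(query: dict, name: dict) -> bool:
    """Every AV-pair in the query must also appear in the content name."""
    return all(attr in name and name[attr] == val for attr, val in query.items())


def range_match(name: dict, attr: str, low: float, high: float) -> bool:
    """The name's numeric value for `attr` must fall within [low, high]."""
    value = name.get(attr)
    return isinstance(value, (int, float)) and low <= value <= high


def similarity_search(names: list, attrs: tuple, target: tuple, k: int) -> list:
    """Return the k names nearest to `target` (Euclidean distance over `attrs`)."""
    candidates = [n for n in names if all(a in n for a in attrs)]

    def distance(n: dict) -> float:
        return math.sqrt(sum((n[a] - t) ** 2 for a, t in zip(attrs, target)))

    return sorted(candidates, key=distance)[:k]


# Hypothetical example names.
songs = [
    {"type": "music", "genre": "jazz", "tempo": 120, "year": 1959},
    {"type": "music", "genre": "rock", "tempo": 140, "year": 1971},
    {"type": "music", "genre": "jazz", "tempo": 90, "year": 1964},
]
jazz = [s for s in songs if subset_match({"type": "music", "genre": "jazz"}, s)]
sixties_seventies = [s for s in songs if range_match(s, "year", 1960, 1975)]
closest = similarity_search(songs, ("tempo",), (100,), k=1)
```

Subset matching is exact and can be answered at a single RP, whereas range and similarity queries involve values near, not equal to, the one named in the query, which is why they need the separate protocols described in the publications.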

Camel is designed as a generic software layer on which higher-level applications can be built. We have implemented Camel both in a simulator and as a real Internet deployment. As a proof of concept, we integrated Camel with a content-based music classification engine to build a distributed music information retrieval system.
Please refer to our publications for more technical details.

Members

Jun Gao (Grad. student)
Adam Kushner (Undergrad)
Peter Steenkiste (Faculty)
George Tzanetakis (Collaborator)

Publications

Efficient Support for Similarity Searches in DHT-based Peer-to-Peer Systems.
Jun Gao and Peter Steenkiste.
To appear in Proceedings of the 2007 IEEE International Conference on Communications (ICC'07), Glasgow, Scotland, June 2007.
FULL TEXT: (154KB)
A Distributed and Scalable Peer-to-Peer Content Discovery System Supporting Complex Queries.
Jun Gao.
Ph.D. Thesis, CMU Technical Report CMU-CS-04-170, Computer Science Department, Carnegie Mellon University, Oct. 2004.
FULL TEXT: (1.8MB)
An Adaptive Protocol for Efficient Support of Range Queries in DHT-based Systems.
Jun Gao and Peter Steenkiste.
In Proceedings of the 12th IEEE International Conference on Network Protocols (ICNP'04), pages 239-250, Berlin, Germany, Oct. 2004.
FULL TEXT: (277KB)
(A previous version of this paper is published as CMU Technical Report, CMU-CS-03-215, Dec. 2003.)
Design and Evaluation of a Distributed Scalable Content Discovery System.
Jun Gao and Peter Steenkiste.
IEEE Journal on Selected Areas in Communications (JSAC), 22(1):54-66, January 2004. Special Issue on Recent Advances in Service Overlay Networks.
FULL TEXT: (544KB)
A Scalable Peer-to-Peer System for Music Information Retrieval.
George Tzanetakis, Jun Gao, and Peter Steenkiste.
Computer Music Journal, 28(2):24-33, June 2004. The MIT Press.
FULL TEXT: (89KB; Available from MIT Press)
(A previous version appeared in Proceedings of the Fourth International Conference on Music Information Retrieval (ISMIR'03), pages 209-214, Baltimore, MD, October 2003.) FULL TEXT: (109KB)
Content-Based Retrieval of Music in Scalable Peer-to-Peer Networks.
Jun Gao, George Tzanetakis, and Peter Steenkiste.
In Proceedings of the 2003 IEEE International Conference on Multimedia and Expo (ICME'03), volume I, pages 309-312, Baltimore, MD, July 2003.
FULL TEXT: (90KB)
Rendezvous Points-Based Scalable Content Discovery with Load Balancing.
Jun Gao and Peter Steenkiste.
In Proceedings of the Fourth International Workshop on Networked Group Communication (NGC'02), pages 71-78, Boston, MA, Oct. 2002.
FULL TEXT: (360KB)
Distributed Scalable Content Discovery Based on Rendezvous Points.
Jun Gao.
Ph.D. Thesis Proposal, Computer Science Department, Carnegie Mellon University, May 20, 2002.
FULL TEXT: (340KB)

Presentations

Meeting of the Minds, May 2004. (Adam's poster; one graph shows registration load balancing in a PlanetLab experiment.)
CWBN Spring Review Talk, April 2004.
Invited Talk at IBM T.J. Watson, March 2004.
CMU Student Seminar Talk, March 2004. (Abstract)
ISMIR'03 talk (George).
ICME'03 poster.
NGC'02 talk. PowerPoint | PDF | Gzip'd PostScript (Best Student Presentation Award)
Proposal talk. PowerPoint | PDF | PostScript

Software

CDS Simulator. A comprehensive event-driven simulator that implements all of Camel's functionality.
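An event-driven simulator advances a virtual clock from one scheduled event to the next instead of ticking in fixed time steps. The Python sketch below shows that core loop in its generic form, using a priority queue ordered by event time; it is not the CDS Simulator's code, and the class and method names are hypothetical. The usage example forwards a single message over a few overlay hops with an assumed 50 ms per-hop latency.

```python
import heapq
import itertools


class EventSimulator:
    """Minimal discrete-event simulation loop (generic sketch)."""

    def __init__(self):
        self.now = 0.0
        self._queue = []
        self._seq = itertools.count()  # tie-breaker: equal-time events run FIFO

    def schedule(self, delay, handler, *args):
        """Queue handler(*args) to run `delay` time units from now."""
        heapq.heappush(self._queue, (self.now + delay, next(self._seq), handler, args))

    def run(self, until=float("inf")):
        """Pop events in time order, jumping the clock to each event's time."""
        while self._queue and self._queue[0][0] <= until:
            self.now, _, handler, args = heapq.heappop(self._queue)
            handler(*args)


log = []
sim = EventSimulator()


def deliver(msg, hops_left):
    # Record the arrival, then forward to the next hop after an assumed 50 ms.
    log.append((round(sim.now, 3), msg, hops_left))
    if hops_left > 0:
        sim.schedule(0.05, deliver, msg, hops_left - 1)


sim.schedule(0.0, deliver, "query genre=jazz", 3)
sim.run()
```

Because the clock jumps directly between events, simulated time costs nothing while no messages are in flight, which is what lets such simulators scale to large numbers of nodes.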

Real implementation. A separate library built on top of Chord that exports a simple API to applications using Camel. It currently runs on the PlanetLab testbed.

We will be releasing the simulator and our real implementation soon.

Please email any comments to Jun Gao.

Maintained by Jun Gao.

Last updated on July 1, 2004.