The goal of this thesis is to demonstrate a collaborative filtering system capable of supporting exploration, refinement, and group moderation on large-scale distributed information systems. Because such systems involve replication or non-local servers, the filtering system design must balance such issues as replication versus response time. Because of the large numbers of articles and users on such systems, data completeness must be weighed against storage requirements. Because of the variety of ways in which data can be grouped, the rigidity of data organization must be balanced against the ease of providing data to answer queries.
Usenet Net News is the natural target for a collaborative filtering system implementation for several reasons. First, it is an existing system with over 2.5 million potential users world-wide. Second, it meets all three criteria for an information system on which we expect collaborative filtering to work well: information is manipulated as articles, the articles are reasonably self-contained, and each article is read by many people. Third, almost all its vocal users say it suffers from a low signal-to-noise ratio and is in dire need of more filtering capabilities.
The drawback of implementing a collaborative filtering system for a pre-existing information system is that our system must fit within the constraints imposed by Usenet Net News. For example, Usenet users bring with them expectations of how Net News should work, and our system must not conflict with those expectations. Further, Usenet system administrators are more open to some types of software changes than to others.
The following list summarizes the key engineering requirements and constraints placed on our system, either by collaborative filtering in general or by the Usenet Net News domain in particular. Each of these points will be addressed in more detail in the following chapters.
[Low Overhead:] The resources required to transport and store filtering information should be a small fraction of the resources required by the information stream as a whole. This is difficult because we hope to have more people providing filtering information than provide the base information stream itself.
[Minimum Hassle:] Since our goal is to create a system that people will actually incorporate into their existing Net News system, our software must be designed for ease of integration. This is difficult given the large number of platforms on which Net News runs and the many configurations in which Net News systems come.
[Respect for Conventions:] Many vocal members of the Net News community hold strong opinions on which social conventions are acceptable. Our system should be consistent with those conventions.
[Streamlined:] The average user spends so little time reading most articles that any operation we expect a majority of users to perform must be exceedingly quick and consistent with the flow of the interface they already use. Further, the collaborative filtering system must respond very quickly to requests for information.