Future work

Working with our collaborative filtering system has highlighted several questions that need further effort to resolve. The key undecided issue is whether a system of the type we designed will actually help users filter their Net News. The final answer to this question will most likely require the installation and use of our system in a far larger setting than just PARC. We anticipate that the success or failure of the system will be affected by many things, including the social interactions between eager and lazy readers[13], the minute details of the user interfaces, and the ways in which filtered articles are presented to users. Still other unanswered questions important to the success of a collaborative filtering system are given below.

A major issue that follows directly from our work is deciding what types of information should be collected from users and distributed by the collaborative filtering system. While our initial impression was that having users evaluate articles on a scale from ``terrible'' through ``great'' would be the clearest and easiest task for them, several users reported that this is not so. They point out the difficulty of evaluating articles that might be very funny and interesting, but completely factually incorrect - are those ``good'' articles or ``terrible'' ones? They suggested that they would find it easier to make a series of binary decisions in a multi-dimensional space than to try to collapse their evaluation to a point on a more finely divided single-dimensional scale. Perhaps asking users to rate articles as ``correct'' or ``incorrect'' and as ``interesting'' or ``not interesting'' would be a better evaluation method. As noted in section , it would be easy to modify our vote servers and interface module to support such information.
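As a rough illustration of what such a multi-dimensional vote might look like, the following sketch records two binary judgments per article and tallies them separately. It is not the format used by our vote servers; the names `BinaryVote` and `tally`, the field names, and the example message-ids are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryVote:
    """One reader's evaluation of one article along two yes/no dimensions.

    Hypothetical record layout, not the thesis's actual vote format.
    """
    message_id: str
    correct: bool
    interesting: bool


def tally(votes):
    """Count, per article, the voters and the 'yes' answers on each dimension."""
    totals = defaultdict(lambda: {"correct": 0, "interesting": 0, "voters": 0})
    for vote in votes:
        entry = totals[vote.message_id]
        entry["voters"] += 1
        entry["correct"] += int(vote.correct)
        entry["interesting"] += int(vote.interesting)
    return dict(totals)
```

Keeping the dimensions separate means a funny but factually wrong article can be reported as ``interesting'' without also being counted as ``correct'', which is exactly the case users found hard to express on a single scale.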

Related to the question of how users evaluate articles is the question, ``Do users need to rate articles explicitly at all?'' We might assume that users would want to cast explicitly any vote identified with them, but users might not be so worried about the accuracy of a vote cast anonymously. For these anonymous votes, it should be possible to estimate a user's opinion of an article simply by noting how the user processes the article. An article skimmed quickly and passed over might rate a low positive vote. An article the user saved or printed could receive a high positive vote, and an article that caused the user to purge all articles on the same subject could safely be given a negative vote. Studying the behavior of users as they manipulate a news reader could provide interesting insight into how to extract user evaluations automatically from that behavior.
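A minimal sketch of such implicit voting is given below. The action names and the numeric vote values are assumptions made for illustration; we have not measured which behaviors actually predict a reader's opinion, and the rule of taking the strongest observed signal is only one of many possible choices.

```python
# Hypothetical mapping from news reader actions to implicit vote values.
# The action names and numbers are illustrative assumptions only.
ACTION_VOTES = {
    "skimmed": 1,          # skimmed quickly and passed over: low positive
    "saved": 3,            # saved the article: high positive
    "printed": 3,          # printed the article: high positive
    "killed_subject": -2,  # purged all articles on the subject: negative
}


def implicit_vote(actions):
    """Estimate an anonymous vote from the actions a reader took on an article.

    Returns the value with the largest magnitude among the recognized
    actions, or None if no recognized action was observed.
    """
    scores = [ACTION_VOTES[a] for a in actions if a in ACTION_VOTES]
    if not scores:
        return None
    return max(scores, key=abs)
```

For example, a reader who skims an article and then saves it would be recorded with the stronger ``saved'' signal rather than the weaker ``skimmed'' one.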

Another issue left unresolved by this thesis is the security of votes and vote files. For vote files to be truly useful, there must be better and simpler means of controlling who has the ability to access or create identified votes. Our approach of using ftp and file system permissions, while workable, is clearly not an ideal solution. To a large extent, vote files can be viewed as yet another network information resource, and so work on universal resource names (URNs) may solve this problem.

Finally, there is no reason that our techniques of collaborative filtering could not be applied to other types of distributed information resources. Since our vote system is freestanding, it should be relatively straightforward to use our framework to compile ratings on other systems such as Gopher or the World Wide Web (WWW). As an example of how our system could interact with WWW, we can imagine indexing votes by URL just as we currently index votes by message-id. We could group together the votes for all the links out of a Web page just as we currently group together the votes for all the articles in a newsgroup. Such a system would not only help users navigate the Web by providing information about the most popular Web pages and the most popular routes through the Web, but would also create within the vote server a map of the Web. If the votes were periodically expired, the map would keep itself up to date as Web pages appeared, disappeared, and were improved over time.
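The idea of indexing votes by URL, grouping them by the page that contains the link, and expiring old votes can be sketched as follows. This is only an in-memory illustration under stated assumptions, not a description of our vote server: the class `LinkVoteStore`, its method names, and the use of a fixed maximum age for expiry are all hypothetical.

```python
import time
from collections import defaultdict


class LinkVoteStore:
    """Votes indexed by URL and grouped by the page the link appears on.

    Hypothetical in-memory sketch; a real vote server would persist votes.
    """

    def __init__(self, max_age_seconds):
        self.max_age = max_age_seconds
        # page URL -> link URL -> list of (timestamp, vote value)
        self._votes = defaultdict(lambda: defaultdict(list))

    def cast(self, page_url, link_url, value, now=None):
        """Record a vote for a link found on a page."""
        now = time.time() if now is None else now
        self._votes[page_url][link_url].append((now, value))

    def expire(self, now=None):
        """Drop votes older than max_age, then drop links and pages left empty."""
        now = time.time() if now is None else now
        cutoff = now - self.max_age
        for page in list(self._votes):
            links = self._votes[page]
            for link in list(links):
                links[link] = [(t, v) for t, v in links[link] if t >= cutoff]
                if not links[link]:
                    del links[link]
            if not links:
                del self._votes[page]

    def ranked_links(self, page_url):
        """Return (link URL, total vote) pairs for a page, highest total first."""
        links = self._votes.get(page_url, {})
        totals = {link: sum(v for _, v in votes) for link, votes in links.items()}
        return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Because links and pages with no surviving votes are removed during expiry, the set of keys remaining in the store acts as the self-updating map of the Web described above.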

In closing, it seems clear that the rapidly increasing bandwidth and interconnectivity between computers have resulted in users being deluged with more information than they can possibly use. Yet, through techniques such as collaborative filtering, that same interconnectivity can help us find the gems in the haystack by putting humans in better contact with other humans.


David A. Maltz (dmaltz@cs.cmu.edu)