Trends in the data




Once analyzed, the collected data show the following trends, which we believe are relevant to users' ability to find interesting articles in the Net News system:

  1. Users read an average of 15 newsgroups each session. If we set aside sessions in which users read only 0 to 4 newsgroups as "quickie" sessions, our data on the number of newsgroups users read correlate well with data from a net-wide survey of Net News reading habits taken by Jolicoeur.[11] This correlation helps to establish that the PARC Net News community is similar to the net-wide community.

     

  2. Users often read none of the articles in a newsgroup they subscribe to. The histogram below shows the fraction of available articles in a newsgroup that users actually read. It shows that the vast majority of the time, users enter a newsgroup, view a list of the available articles, and then exit the newsgroup without reading any of them. Presumably the users subscribed to these groups because they thought the groups would contain useful information. This failure to read any of the articles indicates that either the information content of the group really was very low or, more likely, the user was unable to easily identify any of the articles as being of possible interest.

      
    Figure: Histogram of the fraction of available articles users actually read. The height of each bar represents the number of times users read the listed fraction of available articles.

  3. A scatter plot of the number of articles users read versus the number of articles available to be read shows a telling breakpoint between 200 and 300 articles (see the scatterplot below). If there are fewer than 200 articles available to be read in a newsgroup, users read some proportion of the available articles. In groups containing more than 200 available articles, however, few users read any articles at all. The common behavior is to simply skip all the articles rather than search for ones that might be interesting.

      
    Figure: Scatterplot of the number of articles available to be read in a newsgroup versus the number of articles users actually read.

  4. Far more people read Net News than post. Based on data gathered from 632 sites by the Network Measurement Project at the DEC Network Systems Laboratory, it is clear that for all groups, there are more "lurkers" than "posters."[21][20] This is true regardless of the number of people who read the group or the number who post articles to it. The table below shows some sample data for groups with the largest readership, the smallest readership, the greatest traffic, and several others.

      
    Table: Posting volume and estimated worldwide readership for several news groups. (Data from Reid, USENET Readership report for Aug 93)

  5. The time a user spends on each article, even after he or she has taken an action to display the full text of the article, is very short. The data taken so far show that users spend an average of 40 seconds reading an article.
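The quantities behind trends 2 and 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `visits` records, the function names, and the ten-bin layout are hypothetical and are not the data or tooling used in this study. Each record is assumed to hold the number of articles available and the number actually read during one visit to one newsgroup.

```python
from collections import Counter

# Hypothetical records: (articles_available, articles_read) for one visit
# by one user to one newsgroup. The numbers are made up for illustration.
visits = [
    (12, 3), (40, 0), (150, 20), (250, 0), (310, 0),
    (8, 8), (95, 0), (220, 1), (60, 9), (500, 0),
]

def fraction_histogram(visits, bins=10):
    """Count visits per bin of (articles read / articles available)."""
    counts = Counter()
    for available, read in visits:
        if available == 0:
            continue  # nothing to read, so the fraction is undefined
        fraction = read / available
        # Clamp so that a fraction of exactly 1.0 falls in the last bin.
        index = min(int(fraction * bins), bins - 1)
        counts[index] += 1
    return [counts[i] for i in range(bins)]

def share_reading_nothing(visits, low, high):
    """Share of visits with low <= available < high in which nothing was read."""
    selected = [read for available, read in visits if low <= available < high]
    if not selected:
        return None  # no visits fall in this range
    return sum(1 for read in selected if read == 0) / len(selected)

histogram = fraction_histogram(visits)
# Compare visits on either side of the assumed 200-article breakpoint.
below = share_reading_nothing(visits, 1, 200)
above = share_reading_nothing(visits, 200, float("inf"))
```

On real logs, a tall first bar in `histogram` would correspond to trend 2, and a markedly larger value of `above` than `below` would correspond to the breakpoint described in trend 3.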

We distill these observations into the following design requirements for the collaborative filtering system:

[Majority Focus] Help the majority of Usenet users. There is a huge number of silent readers: all groups have far more lurkers than active posters. These lurkers are the people we believe collaborative filtering can help the most, and also the people who least want to wade through all the traffic on Usenet. They are also the people least willing to announce their presence (i.e., they do not post), and most likely the ones who will least tolerate any extra overhead in their news reading.

[Streamline] Do not interrupt the existing flow. On average, users spend so little time reading most articles that any operation we expect a majority of users to perform must be exceedingly quick and consistent with the flow of the interface they use for reading articles. Voting for articles must be seamlessly integrated into the way the user processes Net News.

[Intelligible] The filter must behave in a simple way that is easy to understand. If users cannot easily understand how the system is trying to help them, the system will only get in the way of their tasks.

To help users find interesting articles, we need a way of cutting down the number of articles they must consider reading; otherwise they will tend not to read any. If we can accomplish that, we will probably help users meet their goal of increasing the number of newsgroups they can read. Our data suggest that even if our filtering is not very accurate at picking out the best articles, it may still help users find articles interesting to them by reducing the psychological burden of sifting through a huge number of available articles.
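As a rough illustration of cutting down the number of articles a user must consider, the sketch below keeps a newsgroup's article list under the observed breakpoint. It is not the filtering method developed in this work: the `votes` field, the `shortlist` function, and the default limit of 200 are assumptions chosen only to mirror the breakpoint reported above.

```python
def shortlist(articles, limit=200):
    """Return at most `limit` articles, highest vote count first.

    `articles` is a list of dicts with a hypothetical "votes" field.
    Python's sort is stable, so articles with equal votes keep their
    original relative order.
    """
    ranked = sorted(articles, key=lambda article: article["votes"], reverse=True)
    return ranked[:limit]

# Made-up example: 500 available articles with arbitrary vote counts.
articles = [{"id": i, "votes": i % 7} for i in range(500)]
short = shortlist(articles)
```

Even a crude ranking such as this one shortens the list a user faces, which is the effect the data suggest matters most.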








David A. Maltz (dmaltz@cs.cmu.edu)