Machine learning used by Personal WebWatcher

Dunja Mladenic

This paper describes the design of Personal WebWatcher, a personal browsing assistant that suggests interesting hyperlinks on the Web documents the user requests. Machine learning is used to generate a model of the user's interests. We consider two approaches that differ in the information included in the training examples: (1) information that was presented to the user, namely the part of the document text that contains a hyperlink, and (2) information that was not presented to the user, namely the content of the document the hyperlink points to. We compare two classification algorithms, k-Nearest Neighbor and Naive Bayes. Documents are represented as bags of words, and features are selected using information gain. Preliminary experiments show no significant difference between the two classifiers, and using only a small number of features gives almost the same results as using all features. In all experiments, the achieved classification accuracy is the same as or slightly higher than the default accuracy. Because the default accuracy is higher for approach (1) than for approach (2), approach (1) also yields higher classification accuracy.
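
To make the feature selection step concrete, the following is a minimal sketch of ranking bag-of-words features by information gain. It is an illustration under stated assumptions, not the system's actual implementation: each document is assumed to be a set of words (binary word presence), labels mark whether the user followed the hyperlink, and the function names `entropy`, `information_gain`, `select_features`, and `default_accuracy` are hypothetical. "Default accuracy" is taken here to mean the accuracy obtained by always predicting the most frequent class.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, word):
    """Reduction in class entropy from knowing whether `word` occurs.

    Assumption: each document is a set of words (binary presence).
    """
    with_word = [y for d, y in zip(docs, labels) if word in d]
    without_word = [y for d, y in zip(docs, labels) if word not in d]
    n = len(labels)
    conditional = (len(with_word) / n) * entropy(with_word) \
        + (len(without_word) / n) * entropy(without_word)
    return entropy(labels) - conditional


def select_features(docs, labels, k):
    """Return the k words with the highest information gain."""
    vocabulary = sorted(set().union(*docs))
    ranked = sorted(vocabulary,
                    key=lambda w: information_gain(docs, labels, w),
                    reverse=True)
    return ranked[:k]


def default_accuracy(labels):
    """Accuracy of always predicting the most frequent class (assumed definition)."""
    return Counter(labels).most_common(1)[0][1] / len(labels)


# Toy data: 1 = hyperlink was followed, 0 = it was not.
docs = [{"machine", "learning"}, {"learning", "agents"},
        {"football", "scores"}, {"weather", "football"}]
labels = [1, 1, 0, 0]
top_words = select_features(docs, labels, k=2)
```

On this toy data, "learning" and "football" each separate the two classes perfectly, so they are ranked first. Comparing a classifier's accuracy against `default_accuracy(labels)` shows how much it actually gains over always guessing the majority class.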