Conference Review Session

Modified Logistic Regression: An Approximation to SVM and Its Applications in Large-Scale Text Categorization
ICML-2003
Jian Zhang

Logistic Regression (LR) has been widely used in statistics for many years, and has recently received extensive study in the machine learning community due to its close relations to Support Vector Machines (SVM) and AdaBoost. In this paper, we use a modified version of LR to approximate the optimization of SVM by a sequence of unconstrained optimization problems. We prove that our approximation converges to SVM, and propose an iterative algorithm called "MLR-CG" which uses Conjugate Gradient as its inner loop. A multiclass version, "MMLR-CG", is also obtained after simple modifications. We compare MLR-CG with SVM_light over different text categorization collections, and show that our algorithm is much more efficient than SVM_light when the number of training examples is very large. Results of the multiclass version MMLR-CG are also reported.
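
The idea of approximating the SVM objective with a smooth, LR-like loss can be illustrated with a small sketch. The abstract does not give the formula, so the loss used here, (1/gamma) * ln(1 + exp(-gamma * (z - 1))), is an assumption: it is one smooth surrogate that approaches the SVM hinge loss max(0, 1 - z) as gamma grows. The function names and the grid of margins are hypothetical, and the Conjugate Gradient inner loop of MLR-CG is not reproduced.

```python
import math


def hinge_loss(z):
    """SVM hinge loss for a margin z = y * f(x)."""
    return max(0.0, 1.0 - z)


def modified_logistic_loss(z, gamma):
    """Assumed smooth surrogate: (1/gamma) * ln(1 + exp(-gamma * (z - 1))).

    Computed as a numerically stable softplus so large gamma does not overflow.
    """
    x = -gamma * (z - 1.0)
    softplus = max(x, 0.0) + math.log1p(math.exp(-abs(x)))
    return softplus / gamma


def max_gap(gamma, margins):
    """Largest absolute difference between the surrogate and the hinge loss."""
    return max(abs(modified_logistic_loss(z, gamma) - hinge_loss(z)) for z in margins)


# Hypothetical grid of margins from -2.0 to 3.0 in steps of 0.5.
MARGINS = [-2.0 + 0.5 * i for i in range(11)]

for gamma in (1.0, 10.0, 100.0):
    print(f"gamma={gamma:6.1f}  max gap to hinge loss = {max_gap(gamma, MARGINS):.4f}")
```

Under this assumed form, the gap is largest at z = 1, where it equals ln(2) / gamma, so increasing gamma over a sequence of unconstrained problems drives the surrogate toward the hinge loss.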

Exploration and Exploitation in Adaptive Filtering Based on Bayesian Active Learning
ICML-2003
Yi Zhang

In the task of adaptive information filtering, a system receives a stream of documents but delivers only those that match a person's information need. As the system filters, it also refines its knowledge about the user's information need based on relevance feedback from the user. Delivering a document thus has two effects: i) it satisfies the user's information need immediately, and ii) it helps the system better satisfy the user in the future by improving its model of the user's information need. The traditional approach to adaptive information filtering fails to recognize and model this second effect.

We propose Utility Divergence as the measure of model quality. Unlike the model quality measures used in most active learning methods, utility divergence is represented on the same scale as the filtering system's target utility function. Thus it is meaningful to combine the expected immediate utility with the model quality, and to quantitatively manage the trade-off between exploitation and exploration. The proposed algorithm is implemented for setting the filtering system's dissemination threshold, a major problem for adaptive filtering systems. Experimental results on TREC-9 and TREC-10 filtering data will be reported. We will also discuss the relationship between Utility Divergence and other active learning algorithms.

A Loss Function Analysis for Classification Methods in Text Categorization
ICML-2003
Li Fan

We present a formal analysis of popular text classification methods, focusing on their loss functions, whose minimization is essential to the optimization of those methods, and whose decomposition into the training-set loss and the model complexity enables cross-method comparisons on a common basis from an optimization point of view. Those methods include Support Vector Machines, Linear Regression, Logistic Regression, Neural Networks, Naive Bayes, K-Nearest Neighbor, Rocchio-style and Multi-class Prototype classifiers. Theoretical analysis (including our new derivations) is provided for each method, along with evaluation results for all the methods on the Reuters-21578 benchmark corpus. Using linear regression, neural networks and logistic regression methods as examples, we show that properly tuning the balance between the training-set loss and the complexity penalty has a significant impact on the performance of a classifier. In linear regression, in particular, the tuning of the complexity penalty yielded a result (measured using macro-averaged F1) that outperformed all text categorization methods ever evaluated on that benchmark corpus, including Support Vector Machines.
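
The decomposition into a training-set loss and a complexity penalty can be made concrete with a toy sketch. The abstract does not state the exact objective, so the form below is an assumption: one-dimensional regularized (ridge) linear regression with objective sum((y - w*x)^2) + lam * w^2. The function names, the weight lam, and the three data points are made up for illustration and are not taken from the paper or from Reuters-21578.

```python
def objective_parts(w, xs, ys, lam):
    """Return (training-set loss, complexity penalty) for weight w."""
    training_loss = sum((y - w * x) ** 2 for x, y in zip(xs, ys))
    complexity_penalty = lam * w ** 2
    return training_loss, complexity_penalty


def ridge_fit_1d(xs, ys, lam):
    """Closed-form minimizer of sum((y - w*x)^2) + lam * w^2 over a scalar w."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


# Made-up toy data lying exactly on the line y = x.
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

for lam in (0.0, 1.0, 14.0):
    w = ridge_fit_1d(xs, ys, lam)
    loss, penalty = objective_parts(w, xs, ys, lam)
    print(f"lam={lam:5.1f}  w={w:.3f}  training loss={loss:.3f}  penalty={penalty:.3f}")
```

Raising lam shrinks the fitted weight and trades a larger training-set loss for a smaller model, which is the balance the abstract describes tuning.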


Charles Rosenberg

Last modified: Thu Sep 18 22:03:35 EDT 2003