**Modified Logistic Regression: An Approximation to SVM and Its Applications in Large-Scale Text Categorization**

ICML-2003

Jian Zhang

Logistic Regression (LR) has been widely used in statistics for many years, and has recently received extensive study in the machine learning community due to its close relations to Support Vector Machines (SVM) and AdaBoost. In this paper, we use a modified version of LR to approximate the optimization of SVM by a sequence of unconstrained optimization problems. We prove that our approximation converges to SVM, and propose an iterative algorithm called "MLR-CG" which uses Conjugate Gradient as its inner loop. A multiclass version, "MMLR-CG", is obtained after simple modifications. We compare MLR-CG with SVM_light over different text categorization collections, and show that our algorithm is much more efficient than SVM_light when the number of training examples is very large. Results of the multiclass version MMLR-CG are also reported.
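The core idea — a smooth, logistic-style loss that approaches the SVM hinge loss — can be sketched as follows. This is an illustrative example of the general technique, not the paper's exact formulation: it uses the smoothed loss (1/γ)·log(1 + exp(γ(1 − z))) on the margin z, which converges pointwise to the hinge loss max(0, 1 − z) as γ → ∞.

```python
import math

def hinge(z):
    # SVM hinge loss on the margin z = y * (w . x)
    return max(0.0, 1.0 - z)

def smoothed_hinge(z, gamma):
    # Logistic-style smooth upper bound on the hinge loss:
    # (1/gamma) * log(1 + exp(gamma * (1 - z))) -> hinge(z) as gamma -> inf.
    t = gamma * (1.0 - z)
    # numerically stable evaluation of log(1 + exp(t))
    return (math.log1p(math.exp(-abs(t))) + max(t, 0.0)) / gamma

for z in (-1.0, 0.0, 0.5, 1.0, 2.0):
    print(z, hinge(z), smoothed_hinge(z, 100.0))
```

Because the smoothed loss is differentiable everywhere, the resulting unconstrained objective can be minimized with standard gradient-based methods such as Conjugate Gradient, which is what makes this style of approximation attractive at large scale.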

**Exploration and Exploitation in Adaptive Filtering Based on Bayesian Active Learning**

ICML-2003

Yi Zhang

In the task of adaptive information filtering, a system receives a stream of documents but delivers only those that match a person's information need. As the system filters, it also refines its knowledge about the user's information needs based on relevance feedback from the user. Delivering a document thus has two effects: i) it satisfies the user's information need immediately, and ii) it helps the system better satisfy the user in the future by improving its model of the user's information need. The traditional approach to adaptive information filtering fails to recognize and model this second effect.

We propose Utility Divergence as the measure of model quality. Unlike the model quality measures used in most active learning methods, utility divergence is represented on the same scale as the filtering system's target utility function. Thus it is meaningful to combine the expected immediate utility with the model quality, and to quantitatively manage the trade-off between exploitation and exploration. The proposed algorithm is implemented for setting the filtering system's dissemination threshold, a major problem for adaptive filtering systems. Experimental results on TREC-9 and TREC-10 filtering data will be reported. We will also discuss the relationship between Utility Divergence and other active learning algorithms.
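The exploitation/exploration trade-off described above can be sketched with a toy delivery rule. This is an illustrative example only, not the paper's Utility Divergence algorithm: the Beta posterior over relevance, the `credit`/`penalty` values, and the fixed `exploration_bonus` term are all assumptions made for the sketch.

```python
def expected_immediate_utility(alpha, beta, credit=2.0, penalty=1.0):
    # Beta(alpha, beta) posterior over the probability that a candidate
    # document is relevant; immediate utility = credit for a relevant
    # delivery minus penalty for a non-relevant one, in expectation.
    p_rel = alpha / (alpha + beta)
    return credit * p_rel - penalty * (1.0 - p_rel)

def deliver(alpha, beta, exploration_bonus=0.1):
    # Deliver when expected immediate utility plus a (hypothetical) bonus
    # for the feedback's future value to the model is positive. A purely
    # exploitative system would set the bonus to zero and ignore the
    # second effect described above.
    return expected_immediate_utility(alpha, beta) + exploration_bonus > 0.0
```

The point of measuring model quality on the utility scale, as the abstract argues, is that the second term can then be an actual expected utility gain rather than an ad hoc constant like the one used here.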

**A Loss Function Analysis for Classification Methods in Text Categorization**

ICML-2003

Li Fan

We present a formal analysis of popular text classification methods, focusing on their loss functions, whose minimization is essential to the optimization of those methods, and whose decomposition into the *training-set loss* and the *model complexity* enables cross-method comparisons on a common basis from an optimization point of view. Those methods include Support Vector Machines, Linear Regression, Logistic Regression, Neural Network, Naive Bayes, K-Nearest Neighbor, Rocchio-style and Multi-class Prototype classifiers. Theoretical analysis (including our new derivations) is provided for each method, along with evaluation results for all the methods on the Reuters-21578 benchmark corpus. Using linear regression, neural networks and logistic regression as examples, we show that properly tuning the balance between the training-set loss and the complexity penalty has a significant impact on the performance of a classifier. In linear regression, in particular, tuning the complexity penalty yielded a result (measured using macro-averaged F1) that outperformed all text categorization methods previously evaluated on that benchmark corpus, including Support Vector Machines.
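The loss decomposition the abstract describes — objective = training-set loss + complexity penalty — can be illustrated with the simplest case it mentions, regularized linear regression. This is a one-dimensional ridge-regression sketch for illustration; the variable names and toy data are not from the paper.

```python
def ridge_objective(w, xs, ys, lam):
    # objective(w) = training-set loss + lam * model complexity,
    # here squared error plus an L2 penalty on the weight.
    train_loss = sum((y - w * x) ** 2 for x, y in zip(xs, ys))
    complexity = lam * w ** 2
    return train_loss + complexity

def ridge_fit(xs, ys, lam):
    # Closed-form minimizer in 1-D: w = sum(x*y) / (sum(x^2) + lam).
    # lam = 0 recovers ordinary least squares; larger lam shrinks w
    # toward zero, trading training-set fit for lower complexity.
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
for lam in (0.0, 5.0, 14.0):
    print(lam, ridge_fit(xs, ys, lam))
```

Sweeping `lam` (the complexity penalty) and picking the value that maximizes held-out performance is the kind of tuning the abstract reports as decisive for linear regression.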

Charles Rosenberg Last modified: Thu Sep 18 22:03:35 EDT 2003