Performance Comparison with Popular Classifiers
Here we briefly introduce the popular methods we compared against. For each method, we used cross-validation to perform model selection; the curves in the graph above represent each method with the best parameters found.
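The per-method model selection can be sketched as a grid search over parameters scored by cross-validation. This is a minimal illustration in scikit-learn (standing in for the SVMLight and WEKA tools actually used); the data, classifier, and parameter grid are assumptions, not the paper's.

```python
# Illustrative model selection via cross-validation (not the paper's setup).
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the real feature matrix and labels.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Search a small grid of regularization strengths with 5-fold CV;
# the best setting is then used for the reported comparison.
search = GridSearchCV(LogisticRegression(max_iter=1000),
                      param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
                      cv=5)
search.fit(X, y)
best_C = search.best_params_["C"]
```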
One additional note: for all algorithms other than our method, missing values were filled with the mean of the corresponding attribute before training and testing, which actually reduces the difficulty of the prediction task.
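The mean-imputation step described above can be sketched as follows; the helper name and toy matrix are illustrative, with `NaN` marking a missing value.

```python
import numpy as np

# Fill each attribute's missing values (NaN) with that attribute's mean,
# computed over the observed entries only.
def mean_impute(X):
    X = X.astype(float).copy()
    col_means = np.nanmean(X, axis=0)        # per-column mean, ignoring NaN
    rows, cols = np.where(np.isnan(X))       # positions of missing entries
    X[rows, cols] = np.take(col_means, cols) # substitute the column mean
    return X

X = np.array([[1.0, np.nan],
              [3.0, 4.0],
              [np.nan, 8.0]])
X_filled = mean_impute(X)  # column means are 2.0 and 6.0
```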
We use the SVMLight tool, an implementation of Support Vector Machines (SVMs) in C. It uses a fast optimization algorithm whose working-set selection is based on steepest feasible descent, and it can handle several hundred thousand training examples.
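A minimal linear-SVM sketch, with scikit-learn's `LinearSVC` standing in for SVMLight (the data and the `C` value are illustrative, not the parameters found in our model selection):

```python
# Illustrative linear SVM; scikit-learn stands in for SVMLight here.
from sklearn.svm import LinearSVC
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# C trades off margin width against training error, as in SVMLight's -c flag.
clf = LinearSVC(C=1.0)
clf.fit(X, y)
acc = clf.score(X, y)
```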
The basic assumption made by naïve Bayes is feature independence. Yet naïve Bayes performs surprisingly well in many real-world domains, most of which have clear feature dependencies.
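A small Gaussian naïve-Bayes sketch makes the independence assumption concrete: each feature's likelihood is modeled separately per class and the per-feature likelihoods are simply multiplied. This uses scikit-learn as a stand-in for WEKA's naïve Bayes, with toy data.

```python
# Illustrative Gaussian naive Bayes: features are modeled as conditionally
# independent given the class.
import numpy as np
from sklearn.naive_bayes import GaussianNB

X = np.array([[1.0, 2.0], [1.2, 1.9],   # class 0 examples
              [3.0, 4.0], [3.1, 4.2]])  # class 1 examples
y = np.array([0, 0, 1, 1])

clf = GaussianNB().fit(X, y)
pred = clf.predict([[1.1, 2.0]])  # a point near the class-0 examples
```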
Logistic regression can be used to predict a dependent variable from a set of independents; to determine the percent of variance in the dependent variable explained by the independents; to rank the relative importance of the independents; to assess interaction effects; and to understand the impact of covariate control variables. It applies maximum likelihood estimation after transforming the dependent into a logit variable (the natural log of the odds of the dependent occurring or not), and in this way estimates the probability of a certain event occurring. Note that logistic regression models changes in the log odds of the dependent, not changes in the dependent itself as OLS regression does.
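The logit transform can be made concrete with a small numeric example; the coefficients below are illustrative, not fitted values from our experiments.

```python
import math

# Logistic regression models log-odds as a linear function of the predictors;
# the probability is recovered through the inverse logit (sigmoid).
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients: intercept b0, slope b1 for a single predictor x.
b0, b1 = -1.0, 0.8
x = 2.0

log_odds = b0 + b1 * x   # linear in x: -1.0 + 0.8 * 2.0 = 0.6
p = sigmoid(log_odds)    # probability of the event, about 0.646

# A one-unit increase in x adds b1 to the log-odds, i.e. multiplies the odds
# by exp(b1); unlike OLS, the change in p itself depends on where x is.
odds_ratio = math.exp(b1)
```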
Decision Tree J48 (C4.5, Quinlan) is an extension of the base algorithm ID3. It adds support for numerical (continuous) attributes and performs post-pruning after tree induction (e.g., based on test sets) to increase accuracy. C4.5 can also deal with incomplete information (missing attribute values).
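A small decision-tree sketch with pruning; scikit-learn's CART implementation stands in for J48/C4.5 here, and note the differences: C4.5 uses error-based post-pruning and handles missing values natively, whereas this sketch uses cost-complexity pruning (`ccp_alpha`) on complete data. The dataset is synthetic.

```python
# Illustrative pruned decision tree (CART standing in for J48/C4.5).
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=300, n_features=8, random_state=1)

# ccp_alpha > 0 prunes subtrees whose complexity is not paid for by
# impurity reduction, playing the role C4.5's post-pruning plays.
tree = DecisionTreeClassifier(ccp_alpha=0.01, random_state=1).fit(X, y)
depth = tree.get_depth()
```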
Freund et al. (1996) developed the widely used AdaBoost.M1, which weights instances depending on whether or not they were misclassified by previous trees. This directs attention to the instances that caused errors in previous iterations.
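The reweighting idea can be sketched with scikit-learn's `AdaBoostClassifier` (a stand-in for WEKA's AdaBoost on synthetic data): after each boosting round, misclassified instances receive higher weight, so the next base learner concentrates on them.

```python
# Illustrative AdaBoost run; the default base learner is a depth-1 tree
# (decision stump), reweighted toward previously misclassified instances.
from sklearn.ensemble import AdaBoostClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=300, n_features=8, random_state=2)

clf = AdaBoostClassifier(n_estimators=50, random_state=2)
clf.fit(X, y)
acc = clf.score(X, y)
```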
Note 1: In our experiments we use the machine learning toolkit WEKA [1]: its naïve Bayes classifier (-K), logistic regression classifier, J48 classifier, and AdaBoost classifier.
Note 2: When making the performance comparison, besides the gold-standard set and the features used, the size of the training set also matters. In our comparison, the training set contains ~15,000 instances.
[1] Ian H. Witten and Eibe Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Morgan Kaufmann, San Francisco, 2000.