Next: Previous Work: Imbalanced datasets Up: SMOTE: Synthetic Minority Over-sampling Previous: Introduction

Performance Measures

The performance of machine learning algorithms is typically evaluated by a confusion matrix, as illustrated in Figure 1 (for a two-class problem). The columns are the Predicted class and the rows are the Actual class. In the confusion matrix, TN is the number of negative examples correctly classified (True Negatives), FP is the number of negative examples incorrectly classified as positive (False Positives), FN is the number of positive examples incorrectly classified as negative (False Negatives), and TP is the number of positive examples correctly classified (True Positives).
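As a minimal sketch of how the four counts defined above can be tallied, the following Python snippet assumes labels are encoded as 1 for the positive class and 0 for the negative class. The function name `confusion_counts` and the sample labels are hypothetical and chosen only for illustration.

```python
def confusion_counts(actual, predicted):
    """Return (TN, FP, FN, TP) for binary labels (assumed: 1 = positive, 0 = negative)."""
    tn = fp = fn = tp = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1   # positive example correctly classified
        elif a == 1 and p == 0:
            fn += 1   # positive example classified as negative
        elif a == 0 and p == 1:
            fp += 1   # negative example classified as positive
        else:
            tn += 1   # negative example correctly classified
    return tn, fp, fn, tp


# Hypothetical labels: 7 negative and 3 positive examples.
actual    = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]
predicted = [0, 0, 1, 0, 0, 0, 1, 0, 1, 0]
print(confusion_counts(actual, predicted))  # (6, 1, 1, 2)
```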

Predictive accuracy is the performance measure generally associated with machine learning algorithms and is defined as $Accuracy = (TP+TN)/(TP+FP+TN+FN)$. In the context of balanced datasets and equal error costs, it is reasonable to use error rate as a performance metric, where the error rate is $1 - Accuracy$. In the presence of imbalanced datasets with unequal error costs, it is more appropriate to use the ROC curve or other similar techniques [19,23,1,13,24].
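The following sketch illustrates why accuracy can mislead on an imbalanced dataset. It applies the two definitions above to a hypothetical dataset of 990 negative and 10 positive examples, scored by a hypothetical classifier that labels everything negative. The function names and the counts are assumptions made for this example only.

```python
def accuracy(tn, fp, fn, tp):
    """Accuracy = (TP + TN) / (TP + FP + TN + FN)."""
    return (tp + tn) / (tp + fp + tn + fn)


def error_rate(tn, fp, fn, tp):
    """Error rate = 1 - Accuracy."""
    return 1.0 - accuracy(tn, fp, fn, tp)


# Hypothetical counts: every example is predicted negative.
tn, fp, fn, tp = 990, 0, 10, 0

print(f"accuracy   = {accuracy(tn, fp, fn, tp):.3f}")
print(f"error rate = {error_rate(tn, fp, fn, tp):.3f}")
print(f"positives found = {tp} of {tp + fn}")
```

Under these assumed counts the accuracy is 99% even though no positive example is detected, which is the situation where ROC-based measures are more informative.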

**Figure 1:** Confusion Matrix
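$\begin{array}{l|cc} & \text{Predicted Negative} & \text{Predicted Positive} \\ \hline \text{Actual Negative} & TN & FP \\ \text{Actual Positive} & FN & TP \end{array}$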

**Figure 2:** Illustration of sweeping out a ROC curve through under-sampling. Increased under-sampling of the majority (negative) class will move the performance from the lower left point to the upper right.

ROC curves can be thought of as representing the family of best decision boundaries for relative costs of TP and FP. On an ROC curve the X-axis represents $\%FP=FP/(TN+FP)$ and the Y-axis represents $\%TP=TP/(TP+FN)$. The ideal point on the ROC curve would be (0,100); that is, all positive examples are classified correctly and no negative examples are misclassified as positive. One way an ROC curve can be swept out is by manipulating the balance of training samples for each class in the training set. Figure 2 shows an illustration. The line y = x represents the scenario of randomly guessing the class. Area Under the ROC Curve (AUC) is a useful metric for classifier performance, as it is independent of the decision criterion selected and of prior probabilities. The AUC comparison can establish a dominance relationship between classifiers. If the ROC curves intersect, the total AUC is an average comparison between models [14]. However, for some specific cost and class distributions, the classifier having maximum AUC may in fact be suboptimal. Hence, we also compute the ROC convex hulls, since the points lying on the ROC convex hull are potentially optimal [25,1].
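The quantities in this paragraph can be sketched in a few lines of Python. The snippet below is an illustration under stated assumptions, not the procedure used in the experiments: the ROC points are hypothetical values standing in for classifiers trained at different under-sampling levels, AUC is estimated with the trapezoidal rule, and the convex hull is built with a standard monotone-chain scan. All function names are hypothetical.

```python
def roc_point(tn, fp, fn, tp):
    """Return (%FP, %TP) on a 0-100 scale from confusion-matrix counts."""
    return 100.0 * fp / (tn + fp), 100.0 * tp / (tp + fn)


def with_endpoints(points):
    """Sort the points by %FP and add the trivial corners (0,0) and (100,100)."""
    return sorted(set(points) | {(0.0, 0.0), (100.0, 100.0)})


def auc(points):
    """Trapezoidal area under the ROC points, normalized to the range [0, 1]."""
    pts = with_endpoints(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area / (100.0 * 100.0)


def cross(o, a, b):
    """Z-component of (a - o) x (b - o); negative means a clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def roc_convex_hull(points):
    """Upper convex hull of the ROC points, scanned from (0,0) to (100,100)."""
    hull = []
    for p in with_endpoints(points):
        # Drop the previous point if it lies on or below the chord to p.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


# Hypothetical (%FP, %TP) points, e.g. one per under-sampling level.
points = [(5.0, 40.0), (10.0, 60.0), (20.0, 65.0), (30.0, 85.0), (50.0, 90.0)]

hull = roc_convex_hull(points)
print("AUC of all points:", auc(points))
print("AUC of the hull:  ", auc(hull))
print("Hull points:      ", hull)
```

With these assumed points, (20, 65) falls below the segment joining (10, 60) and (30, 85), so it is excluded from the hull and cannot be optimal for any cost and class distribution. The hull's area is therefore slightly larger than the area under the raw points.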


Nitesh Chawla (CS)
6/2/2002