Next: Additional comparison to changing Up: Experiments Previous: ROC Creation

AUC Calculation

The Area Under the ROC curve (AUC) is calculated using a form of the trapezoid rule. The lower leftmost point of a given ROC curve is the classifier's performance on the raw data. The upper rightmost point is always (100%, 100%); if the curve does not naturally end at this point, the point is added. This is necessary so that the AUCs are compared over the same range of %FP.
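The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `roc_auc_trapezoid` and the input format (a list of `(%FP, %TP)` pairs) are assumptions made for the example. The sketch integrates from the leftmost supplied point and appends (100, 100) when it is absent; it does not add a (0, 0) anchor, since the text does not say one is used. With both axes in percent the area can reach 10000, which appears to be the scale of the values reported in the table.

```python
def roc_auc_trapezoid(points):
    """Area under a ROC curve by the trapezoid rule.

    points: iterable of (fp_percent, tp_percent) pairs (hypothetical format).
    The point (100, 100) is appended if the curve does not already end there.
    """
    pts = sorted(points)                 # order by %FP, then %TP
    if pts[-1] != (100.0, 100.0):
        pts.append((100.0, 100.0))       # force a common right-hand endpoint
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        # Each pair of adjacent points bounds one trapezoid.
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


if __name__ == "__main__":
    # Illustrative points only; these are not taken from the experiments.
    print(roc_auc_trapezoid([(10, 60), (50, 90)]))
```

Dividing the result by 10000 gives the AUC on the more common 0 to 1 scale.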

The AUCs listed in Table 3 show that, for every dataset, the combination of synthetic minority over-sampling and majority under-sampling improves over plain majority under-sampling at one or more SMOTE levels, with C4.5 as the base classifier. Thus, our SMOTE approach provides an improvement in correct classification of data in the underrepresented class. The same conclusion holds from an examination of the ROC convex hulls. Some entries are missing from the table because SMOTE was not applied at the same amounts to all datasets; the amount of SMOTE was less for less skewed datasets. We have also not included AUCs for Ripper or Naive Bayes.

In general, the ROC convex hull identifies SMOTE classifiers as potentially optimal compared with plain under-sampling or other treatments of misclassification costs. The exceptions are as follows: for the Pima dataset, Naive Bayes dominates over SMOTE-C4.5; for the Oil dataset, Under-Ripper dominates over SMOTE-Ripper. For the Can dataset, the SMOTE-classifier (classifier = C4.5 or Ripper) and Under-classifier ROC curves overlap in the ROC space. For all the other datasets, SMOTE-classifier has more potentially optimal classifiers than any other approach.
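To make the notion of "potentially optimal" concrete, the sketch below computes the upper convex hull of a set of ROC points; classifiers whose points lie on the hull are the potentially optimal ones, and points below it are dominated. This is an illustration under stated assumptions rather than the procedure used in the experiments: the names `roc_convex_hull` and `_cross` are hypothetical, the points are `(%FP, %TP)` pairs, and the trivial classifiers (0, 0) and (100, 100) are included by the usual convention.

```python
def _cross(o, a, b):
    """Z-component of (a - o) x (b - o); negative means a clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def roc_convex_hull(points):
    """Upper convex hull of ROC points given as (fp_percent, tp_percent).

    Assumption: the trivial classifiers (0, 0) and (100, 100) are added.
    Returns the hull vertices ordered by increasing %FP.
    """
    pts = sorted(set(points) | {(0.0, 0.0), (100.0, 100.0)})
    hull = []
    for p in pts:
        # Drop the last vertex while it lies on or below the segment
        # from the vertex before it to the new point p.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


if __name__ == "__main__":
    # Illustrative points only; (30, 65) falls below the hull.
    print(roc_convex_hull([(10, 60), (30, 65), (50, 90)]))
```

In this toy input, the point (30, 65) is dropped because it lies below the line joining (10, 60) and (50, 90), so no choice of class distribution or misclassification cost would make it the best operating point.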

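Table 3: AUCs [C4.5 as the base classifier]. The Under column is plain under-sampling; columns 50 to 500 give the amount of SMOTE applied (%). The best AUC in each row is marked with *; a dash marks an amount of SMOTE that was not applied to that dataset.

Dataset         Under    50       100      200      300      400      500
Pima            7242     -        7307*    -        -        -        -
Phoneme         8622     -        8644     8661*    -        -        -
Satimage        8900     -        8957     8979*    8963     8975     8960
Forest Cover    9807     -        9832     9834     9849*    9841     9842
Oil             8524     -        8523     8368     8161     8339     8537*
Mammography     9260     -        9250     9265     9311     9330*    9304
E-state         6811     -        6792     6828*    6784     6788     6779
Can             9535     9560*    9505     9505     9494     9472     9470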

Nitesh Chawla (CS)