Under-sampling and SMOTE Combination

The majority class is under-sampled by randomly removing samples from the majority class population until the minority class reaches a specified percentage of the majority class. This exposes the learner to varying degrees of under-sampling; at higher degrees, the minority class has a larger presence in the training set. In describing our experiments, under-sampling the majority class at 200% means that the modified dataset contains twice as many elements from the minority class as from the majority class. For example, if the minority class had 50 samples and the majority class had 200 samples, under-sampling the majority class at 200% would leave the majority class with 25 samples. By combining under-sampling with over-sampling, the learner's initial bias towards the negative (majority) class is reversed in favor of the positive (minority) class. Classifiers are learned on the dataset perturbed by "SMOTING" the minority class and under-sampling the majority class.
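
The under-sampling terminology above can be illustrated with a minimal Python sketch. It is not the implementation used in the experiments: the function name `undersample_majority`, its parameters, and the fixed random seed are assumptions made for illustration, and rounding the target size to the nearest integer is one possible choice among several.

```python
import random


def undersample_majority(majority, minority, percent, seed=0):
    """Randomly keep majority samples so that the minority class size
    is `percent`% of the retained majority class size.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Target majority size: len(minority) = (percent / 100) * target
    target = round(len(minority) * 100 / percent)
    # Under-sampling only removes samples, so never exceed the original size.
    target = min(target, len(majority))
    rng = random.Random(seed)
    # Sampling without replacement is equivalent to random removal.
    return rng.sample(majority, target)


# The example from the text: 50 minority samples, 200 majority samples.
majority = list(range(200))
minority = list(range(50))
kept = undersample_majority(majority, minority, percent=200)
print(len(kept))  # 25
```

With `percent=200` the retained majority class has 25 samples, half the size of the 50-sample minority class, which matches the example in the text. Over-sampling the minority class with SMOTE would then be applied separately to form the training set.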

Nitesh Chawla (CS)