Complete Cross-Validation

Rahul Sukthankar
Intel Research Pittsburgh & Robotics Institute, Carnegie Mellon

Abstract

Cross-validation is an established technique for estimating the accuracy of a classifier. It is normally performed either over a number of random test/train partitions of the data or as k-fold cross-validation. We present a technique for calculating the complete cross-validation for nearest-neighbor classifiers, i.e., averaging over all desired test/train partitions of the data. The technique extends to several common variants, including K-nearest-neighbor classifiers, stratified data partitioning, and arbitrary loss functions. We demonstrate, with complexity analysis and experimental timing results, that it runs in time comparable to k-fold cross-validation even though it in effect averages over an exponential number of trials. We show that complete cross-validation has the same bias as subsampling and k-fold cross-validation, with some reduction in variance. The algorithm therefore offers significant benefits in terms of both time and accuracy.
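To make the idea of averaging over all partitions concrete, here is a minimal sketch for the simplest case, a 1-nearest-neighbor classifier. It is an illustration of the counting argument, not necessarily the algorithm presented in the talk. It assumes Euclidean distance, no ties in distance, and a fixed training-set size k out of n points; the function names complete_cv_error_1nn and brute_force_cv_error_1nn are hypothetical. For a held-out point, its i-th closest other point is the nearest training neighbor in exactly C(n-1-i, k-1) of the C(n-1, k) possible training sets, so the average error can be accumulated without enumerating any partition.

```python
from itertools import combinations
from math import comb, dist


def complete_cv_error_1nn(points, labels, k):
    """Average 1-NN error over ALL splits with k training points (closed form)."""
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n")
    denom = comb(n - 1, k)  # training sets that exclude the held-out point
    total = 0.0
    for t in range(n):
        # Rank every other point by distance to the held-out point t.
        others = sorted((j for j in range(n) if j != t),
                        key=lambda j: dist(points[t], points[j]))
        for rank, j in enumerate(others, start=1):
            # Training sets where j is the nearest member: j is included, the
            # rank-1 closer points are excluded, and the remaining k-1 members
            # come from the n-1-rank points that are farther away.
            ways = comb(n - 1 - rank, k - 1)
            if ways == 0:
                break  # too few farther points left to fill a training set
            if labels[j] != labels[t]:
                total += ways / denom
    return total / n


def brute_force_cv_error_1nn(points, labels, k):
    """Same quantity by explicit enumeration; exponential, for checking only."""
    n = len(points)
    rates = []
    for train in combinations(range(n), k):
        held_out = [t for t in range(n) if t not in train]
        wrong = sum(
            labels[min(train, key=lambda j: dist(points[t], points[j]))] != labels[t]
            for t in held_out
        )
        rates.append(wrong / len(held_out))
    return sum(rates) / len(rates)
```

On a small data set the two functions should agree up to floating-point rounding, which is a convenient sanity check. The closed form needs one sort per point, whereas the brute-force version visits every one of the C(n, k) partitions. With k = n - 1 the closed form reduces to ordinary leave-one-out error.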

Most of the talk is based on:
M. Mullin, R. Sukthankar.
Complete Cross-Validation for Nearest Neighbor Classifiers
Proceedings of ICML, 2000.


Pradeep Ravikumar

Last modified: Thu Feb 5 16:16:30 EST 2004