Correcting Sample Selection Bias
This package contains a Matlab implementation of a kernel-based covariate shift correction method, as described in [GreEtAl08a] and [HuaEtAl07]. I strongly recommend reading the book chapter [GreEtAl08a] before running the code, since it contains important notes on parameter selection. See also these talk slides.
The code was written by Marcel Schmittfull.
Given sets of observations of training and test data, we consider the problem of re-weighting the training data so that its distribution more closely matches that of the test data. We achieve this by matching the covariate distributions of the training and test sets in a high-dimensional feature space (specifically, a reproducing kernel Hilbert space): the weights are chosen so that the weighted mean of the training points' feature maps lies as close as possible to the mean of the test points' feature maps. This approach does not require estimating the training or test distributions. Instead, the sample weights are obtained by solving a simple quadratic program. While our method is designed for the case of simple covariate shift (the input distribution changes between training and test, but the conditional distribution of labels given inputs does not), we have also found benefits for sample selection bias on the labels. Our correction procedure yields its greatest and most consistent advantages when the learning algorithm returns a classifier or regressor that is "simple" relative to the optimal decision boundary.
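For concreteness, the quadratic program can be written out as in the kernel mean matching formulation of [HuaEtAl07]; the symbols below (n_tr, n_te, Φ, K, κ, B, ε) are introduced here for illustration. With training points x_i^tr, test points x_j^te, a kernel k and its feature map Φ, the weights β minimize the squared RKHS distance between the reweighted training mean and the test mean. Expanding the norm and dropping terms that do not depend on β leaves a quadratic objective with box and sum constraints:

```latex
\Bigl\| \frac{1}{n_{tr}} \sum_{i=1}^{n_{tr}} \beta_i \Phi(x_i^{tr})
      - \frac{1}{n_{te}} \sum_{j=1}^{n_{te}} \Phi(x_j^{te}) \Bigr\|^2
  = \frac{1}{n_{tr}^2} \bigl( \beta^\top K \beta - 2\,\kappa^\top \beta \bigr) + \text{const}

\hat{\beta} = \arg\min_{\beta} \; \tfrac{1}{2}\, \beta^\top K \beta - \kappa^\top \beta
\quad \text{s.t.} \quad
0 \le \beta_i \le B, \qquad
\Bigl| \sum_{i=1}^{n_{tr}} \beta_i - n_{tr} \Bigr| \le n_{tr}\, \epsilon

K_{ij} = k(x_i^{tr}, x_j^{tr}), \qquad
\kappa_i = \frac{n_{tr}}{n_{te}} \sum_{j=1}^{n_{te}} k(x_i^{tr}, x_j^{te})
```

Here B caps the influence of any single training point and ε limits how far the average weight may drift from one; the book chapter [GreEtAl08a] discusses how to choose these together with the kernel.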
Code may be downloaded here.
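The downloadable package is written in Matlab. Purely as an illustration of the idea, and not a port of that code, the following Python sketch computes kernel-mean-matching weights for small data sets. Several choices are assumptions made here: a Gaussian kernel with width `sigma`, projected gradient descent in place of a dedicated QP solver, and the hypothetical names and illustrative defaults `kmm_weights`, `upper` (playing the role of B) and `eps`. It minimizes ½βᵀKβ − κᵀβ subject to 0 ≤ βᵢ ≤ B and |Σβᵢ − n_tr| ≤ n_tr·ε, where K is the training Gram matrix and κᵢ = (n_tr/n_te)·Σⱼ k(xᵢ^tr, xⱼ^te).

```python
import math


def gaussian_kernel(x, y, sigma):
    """Gaussian kernel exp(-||x - y||^2 / (2 sigma^2)) for equal-length sequences (assumed choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def project(v, upper, lo, hi):
    """Euclidean projection of v onto {b : 0 <= b_i <= upper, lo <= sum(b) <= hi}.

    The projection has the form b_i = clip(v_i - tau, 0, upper) for a scalar tau:
    tau = 0 if the plain box clip already meets the sum constraint, otherwise tau
    is found by bisection. Assumes the set is non-empty (lo <= len(v) * upper).
    """
    def clipped(tau):
        return [min(max(vi - tau, 0.0), upper) for vi in v]

    total = sum(clipped(0.0))
    if lo <= total <= hi:
        return clipped(0.0)
    target = hi if total > hi else lo
    # sum(clipped(tau)) is non-increasing in tau: len(v) * upper at a, 0 at b.
    a, b = min(v) - upper - 1.0, max(v) + 1.0
    for _ in range(100):
        mid = 0.5 * (a + b)
        if sum(clipped(mid)) > target:
            a = mid
        else:
            b = mid
    return clipped(0.5 * (a + b))


def kmm_weights(x_tr, x_te, sigma=1.0, upper=10.0, eps=0.1, iters=2000):
    """Hypothetical KMM-style sketch: minimize 0.5 b'Kb - kappa'b by projected gradient descent.

    The defaults for sigma, upper (B) and eps are illustrative, not recommendations.
    """
    n_tr, n_te = len(x_tr), len(x_te)
    K = [[gaussian_kernel(a, b, sigma) for b in x_tr] for a in x_tr]
    kappa = [n_tr / n_te * sum(gaussian_kernel(a, c, sigma) for c in x_te) for a in x_tr]
    lo, hi = n_tr * (1.0 - eps), n_tr * (1.0 + eps)
    # Step size 1/L, with L = max row sum of K (an upper bound on its largest eigenvalue).
    step = 1.0 / max(sum(row) for row in K)
    beta = project([1.0] * n_tr, upper, lo, hi)
    for _ in range(iters):
        grad = [sum(K[i][j] * beta[j] for j in range(n_tr)) - kappa[i] for i in range(n_tr)]
        beta = project([bi - step * gi for bi, gi in zip(beta, grad)], upper, lo, hi)
    return beta


if __name__ == "__main__":
    # Toy 1-D example: the test sample sits at the right end of the training range.
    x_tr = [[-2.0], [-1.0], [0.0], [1.0], [2.0]]
    x_te = [[1.0], [1.5], [2.0]]
    print([round(b, 3) for b in kmm_weights(x_tr, x_te, sigma=1.0, upper=10.0, eps=0.1)])
```

On this toy data the two training points nearest the test sample (x = 1 and x = 2) should receive essentially all of the weight, roughly 2.58 each, while the other training points are driven to zero. Projected gradient descent keeps the sketch dependency-free; for realistic sample sizes a proper QP solver is the more natural choice.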
[GreEtAl08a] Gretton, A., Smola, A., Huang, J., Schmittfull, M., Borgwardt, K., and Schoelkopf, B. Covariate Shift and Local Learning by Distribution Matching. In Dataset Shift in Machine Learning, MIT Press, Cambridge, MA, pp. 131–160.
[HuaEtAl07] Huang, J., Smola, A., Gretton, A., Borgwardt, K., and Schoelkopf, B. Correcting Sample Selection Bias by Unlabeled Data. NIPS 19, pp. 601–608, 2007.