Covariate shift is a prevalent setting for supervised learning in the wild, arising when the training and test data are drawn from different time periods, from different but related domains, or via different sampling strategies. This study addresses a transfer learning setting with covariate shift between a labeled source domain and an unlabeled target domain. The goal of transfer learning methods is to account for the difference between the two domains for the purpose of prediction in the target domain. Most existing methods for correcting covariate shift exploit density ratios of the features to reweight the source-domain data; when the features are high-dimensional, the estimated density ratios suffer from high estimation variance, leading to poor predictive performance under covariate shift. In this work, we investigate how covariate shift correction performance depends on the dimensionality of the features, and we propose a correction method that finds a low-dimensional representation of the features, taking into account which features are relevant to the target variable Y, and exploits the density ratio of this representation for importance reweighting. We discuss the factors that affect the performance of our method and demonstrate its capabilities on real-world applications.
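To make the reweighting idea concrete, the sketch below illustrates generic density-ratio importance reweighting (not the proposed method itself) on synthetic one-dimensional data. It uses the standard trick of estimating the ratio p_target(x)/p_source(x) with a probabilistic domain classifier; the data-generating functions, model choices, and hyperparameters are all illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, Ridge

rng = np.random.default_rng(0)

# Covariate shift: p(y|x) is identical in both domains, only p(x) differs.
Xs = rng.normal(0.0, 1.0, size=(500, 1))   # labeled source inputs
Xt = rng.normal(1.0, 1.0, size=(500, 1))   # unlabeled target inputs
true_fn = lambda X: np.sin(X).ravel()       # shared conditional (assumed)
ys = true_fn(Xs) + 0.1 * rng.normal(size=500)

# Estimate the density ratio via a domain classifier:
# w(x) = P(target | x) / P(source | x), proportional to p_t(x)/p_s(x).
X_all = np.vstack([Xs, Xt])
domain = np.concatenate([np.zeros(500), np.ones(500)])
clf = LogisticRegression().fit(X_all, domain)
p_target = clf.predict_proba(Xs)[:, 1]
w = p_target / (1.0 - p_target)

# Fit a source-domain regressor with and without importance weights.
weighted = Ridge().fit(Xs, ys, sample_weight=w)
unweighted = Ridge().fit(Xs, ys)

# Compare mean squared error on the target domain.
yt = true_fn(Xt)
err_weighted = np.mean((weighted.predict(Xt) - yt) ** 2)
err_unweighted = np.mean((unweighted.predict(Xt) - yt) ** 2)
```

In high dimensions the classifier's probability estimates become unreliable and the weights w develop heavy tails, which is exactly the variance problem that motivates reweighting a low-dimensional, Y-relevant representation instead of the raw features.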
This is joint work with Kun Zhang, Mingming Gong and Jaime Carbonell, and is currently in submission.
Presented in Partial Fulfillment of the CSD Speaking Skills Requirement.