I am also affiliated with the MPI for Biological Cybernetics as a research scientist.
KPC: software implementing nonlinear directed acyclic structure learning with weakly additive noise models (by Robert Tillman)
Consistent Nonparametric Tests of Independence, JMLR 2010
Hilbert Space Embeddings and Metrics on Probability Measures, JMLR 2010
Discussion of: Brownian distance covariance, Ann. Appl. Stat. 2009
Nonparametric Tree Graphical Models, AISTATS 2010
- Covariate Shift by Kernel Mean Matching
Assume we are given sets of observations of training and test data, where (unlike in the classical setting) the training and test distributions are allowed to differ. For learning purposes, we therefore face the problem of re-weighting the training data so that its distribution more closely matches that of the test data. We consider specifically the case where the training and test distributions differ only in the marginal distribution of the covariates: the conditional distribution of the outputs given the covariates is unchanged. We achieve covariate shift correction by matching covariate distributions between training and test sets in a high-dimensional feature space (specifically, a reproducing kernel Hilbert space). This approach does not require distribution estimation, making it suited to high dimensions and structured data, where distribution estimates may not be practical.
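The reweighting idea can be written out in one line. The notation below is an assumption introduced for illustration (a loss \ell, parameters \theta, densities p_tr and p_te, weights \beta); it presumes the conditional p(y|x) is shared and that p_te is supported wherever p_tr is.

```latex
\begin{align*}
\mathbb{E}_{(x,y)\sim p_{\mathrm{te}}}\bigl[\ell(x,y,\theta)\bigr]
  &= \mathbb{E}_{(x,y)\sim p_{\mathrm{tr}}}\bigl[\beta(x)\,\ell(x,y,\theta)\bigr],
  \qquad \beta(x) = \frac{p_{\mathrm{te}}(x)}{p_{\mathrm{tr}}(x)}, \\
\hat{\theta}
  &= \arg\min_{\theta}\ \frac{1}{n_{\mathrm{tr}}}\sum_{i=1}^{n_{\mathrm{tr}}}
     \beta(x_i)\,\ell(x_i,y_i,\theta).
\end{align*}
```

The first line holds because the joint densities factor as p(x)p(y|x) with a common conditional, so their ratio reduces to the ratio of covariate marginals. The second line is the corresponding weighted empirical risk on the training sample.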
We first describe the general setting of covariate shift correction, and the importance weighting approach. While direct density estimation provides an estimate of the importance weights, this has two potential disadvantages: it may not offer the best bias/variance tradeoff, and density estimation might be difficult on complex, high-dimensional domains (such as text). We then describe how distributions may be mapped to reproducing kernel Hilbert spaces (RKHS), and review distances between such mappings. We demonstrate a transfer learning algorithm that reweights the training points such that their RKHS mapping matches that of the (unlabeled) test points. The sample weights are obtained by a simple quadratic programming procedure. Our correction method yields its greatest and most consistent advantages when the learning algorithm returns a classifier/regressor that is "simpler" than the data might suggest. On the other hand, even an ideal sample reweighting may not be of practical benefit given a sufficiently powerful learning algorithm (if available).
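As a rough illustration of the quadratic program mentioned above, here is a minimal sketch of kernel mean matching. It is not the released software. The function names (`rbf_kernel`, `kmm_weights`), the Gaussian kernel, the bandwidth, the weight bound, the tolerance `eps`, and the toy 1-D data are all assumptions chosen for the example. It uses SciPy's general-purpose SLSQP solver rather than a dedicated QP solver.

```python
import numpy as np
from scipy.optimize import minimize


def rbf_kernel(a, b, sigma):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq = (a**2).sum(axis=1)[:, None] + (b**2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def kmm_weights(x_train, x_test, sigma=0.5, upper=10.0, eps=0.1):
    """Weights beta that bring the weighted training mean embedding close to
    the test mean embedding in the RKHS.

    Minimises 0.5 * beta' K beta - kappa' beta
    subject to 0 <= beta_i <= upper and |sum(beta) - n_tr| <= n_tr * eps.
    """
    n_tr, n_te = len(x_train), len(x_test)
    K = rbf_kernel(x_train, x_train, sigma)
    kappa = (n_tr / n_te) * rbf_kernel(x_train, x_test, sigma).sum(axis=1)

    def objective(beta):
        return 0.5 * beta @ K @ beta - kappa @ beta

    def gradient(beta):
        return K @ beta - kappa

    constraints = [
        {"type": "ineq", "fun": lambda b: n_tr * (1.0 + eps) - b.sum()},
        {"type": "ineq", "fun": lambda b: b.sum() - n_tr * (1.0 - eps)},
    ]
    result = minimize(objective, np.ones(n_tr), jac=gradient, method="SLSQP",
                      bounds=[(0.0, upper)] * n_tr, constraints=constraints)
    return result.x


# Toy data (illustrative only): the test covariates are shifted to the right.
rng = np.random.default_rng(0)
x_train = rng.normal(0.0, 1.0, size=(60, 1))
x_test = rng.normal(1.5, 0.5, size=(60, 1))
weights = kmm_weights(x_train, x_test)
```

The resulting `weights` can be passed as per-sample weights to any learner that accepts them. Note that only the covariates enter the computation: no test labels and no density estimates are needed.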
Talk slides and video
Software to implement covariate shift correction
Book chapter and NIPS paper
NIPS 2009 Workshop on Transfer Learning for Structured Data
- Introduction to Independent Component Analysis
Independent component analysis (ICA) is a technique for extracting underlying sources of information from linear mixtures of these sources, based only on the assumption that the sources are independent of each other. To illustrate the idea, we might be in a room containing several people (the sources) talking simultaneously, with microphones picking up multiple conversations at once (the mixtures), and we might wish to automatically recover the original separate conversations from these mixtures. More broadly, ICA is used in a very wide variety of applications, including signal extraction from EEG, image processing, bioinformatics, and economics. I will present an introduction to ICA that covers: principal component analysis (PCA) and how ICA differs from it; the maximum likelihood approach; the case where fixed nonlinearities are used as heuristics for source extraction; some more modern information-theoretic approaches; and a kernel-based method. I will also cover two optimization strategies, and provide a comparison of the various approaches on benchmark data, to reveal the strengths and failure modes of different ICA algorithms (with a focus on modern, recently published methods).
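To make the PCA/ICA contrast concrete, here is a small sketch on synthetic data. It uses a fixed tanh nonlinearity with a FastICA-style fixed-point update, which is one standard instance of the "fixed nonlinearity" family and is not the kernel-based method from the talk. The sources, the mixing matrix, and all function names are assumptions made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Two independent, non-Gaussian, unit-variance sources: uniform and Laplace.
sources = np.vstack([
    rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), n),
    rng.laplace(0.0, 1.0 / np.sqrt(2.0), n),
])
mixing = np.array([[1.0, 0.6], [0.4, 1.0]])  # hypothetical mixing matrix
mixtures = mixing @ sources


def whiten(x):
    """PCA whitening: centre, rotate to principal axes, rescale to unit variance."""
    x = x - x.mean(axis=1, keepdims=True)
    eigvals, eigvecs = np.linalg.eigh(x @ x.T / x.shape[1])
    return (eigvecs / np.sqrt(eigvals)).T @ x


def sym_decorrelate(w):
    """Return (W W^T)^(-1/2) W, which has orthonormal rows."""
    vals, vecs = np.linalg.eigh(w @ w.T)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T @ w


def fast_ica(z, n_iter=200, tol=1e-8, seed=0):
    """Symmetric fixed-point ICA on whitened data z, with g = tanh."""
    d, m = z.shape
    w = sym_decorrelate(np.random.default_rng(seed).normal(size=(d, d)))
    for _ in range(n_iter):
        g = np.tanh(w @ z)
        g_prime_mean = (1.0 - g**2).mean(axis=1)
        # Row-wise update: w_i <- E[z g(w_i' z)] - E[g'(w_i' z)] w_i
        w_new = sym_decorrelate(g @ z.T / m - g_prime_mean[:, None] * w)
        change = np.max(np.abs(np.abs((w_new * w).sum(axis=1)) - 1.0))
        w = w_new
        if change < tol:
            break
    return w @ z


def best_match(estimated, true):
    """For each true source, the largest |correlation| with any estimate."""
    d = true.shape[0]
    corr = np.corrcoef(np.vstack([estimated, true]))[:d, d:]
    return np.abs(corr).max(axis=0)


whitened = whiten(mixtures)      # PCA only: uncorrelated, but still mixed
recovered = fast_ica(whitened)   # ICA: an extra rotation towards independence

pca_match = best_match(whitened, sources)
ica_match = best_match(recovered, sources)
```

Comparing `pca_match` with `ica_match` shows the point of the comparison: whitening removes correlation but leaves an unknown rotation, and ICA resolves that rotation using non-Gaussianity. The sources are recovered only up to order and sign, which is why the comparison uses absolute correlations.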
Software for Fast Kernel ICA demo
MLD student research symposium