
Given a new learning problem, one of the first things you
need to do is figure out what features you want to use.
Alternatively, there has been substantial work on kernel functions,
which provide implicit feature spaces, but then how do you pick the
right kernel? In this talk I will survey some theoretical results
that can provide some help or at least guidance for these tasks. In
particular, I will talk about:
- Algorithms designed to handle large feature spaces when only a small
number of the features are expected to actually be useful (so you can
pile on many candidate features when you don't know much about the
domain).
- Kernel functions. Can theory provide some guidance in selecting or
designing a kernel function in terms of natural properties of your
domain?
- Combining the above. Can we use kernels to generate explicit
features?
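As one classic illustration of the first item (an assumption here; the abstract does not say which algorithms the talk covers), Littlestone's Winnow algorithm learns a disjunction of r relevant boolean features out of n while making O(r log n) mistakes, so adding many irrelevant features costs only logarithmically. The sketch below uses the multiplicative promote/demote variant with threshold n; the function names, alpha = 2, and the toy target are hypothetical choices for illustration.

```python
import random


def winnow_predict(w, theta, x):
    """Predict 1 iff the total weight of the active (x_i = 1) features reaches theta."""
    score = sum(wi for wi, xi in zip(w, x) if xi)
    return 1 if score >= theta else 0


def winnow_train(examples, n, alpha=2.0, max_epochs=100):
    """Winnow with multiplicative promotion/demotion (hypothetical helper).

    examples: list of (x, y), x a 0/1 list of length n, y in {0, 1}.
    Returns the weight vector and the fixed threshold theta = n.
    """
    w = [1.0] * n
    theta = float(n)
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in examples:
            if winnow_predict(w, theta, x) != y:
                mistakes += 1
                # False negative: multiply active weights by alpha.
                # False positive: divide active weights by alpha.
                factor = alpha if y == 1 else 1.0 / alpha
                for i, xi in enumerate(x):
                    if xi:
                        w[i] *= factor
        if mistakes == 0:  # no updates this pass: consistent with all examples
            break
    return w, theta


# Hypothetical toy target: y = x[3] OR x[17] OR x[42] among n = 100 features.
rng = random.Random(0)
n = 100
data = []
for _ in range(500):
    x = [1 if rng.random() < 0.1 else 0 for _ in range(n)]
    data.append((x, int(x[3] or x[17] or x[42])))

w, theta = winnow_train(data, n)
train_acc = sum(winnow_predict(w, theta, x) == y for x, y in data) / len(data)
```

Because the relevant weights are only ever promoted and each can double at most about log2(n) times, the total number of updates stays small even though 97 of the 100 features are noise.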
[In addition to the survey material, part of this talk will sneak in
some joint work with Nina Balcan.]
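For the question of using kernels to generate explicit features, one simple approach, assumed here for illustration and not necessarily the construction presented in the talk, is to sample d "landmark" points from unlabeled data and represent each example by its kernel values against those landmarks. Any linear learner can then run on the resulting d explicit coordinates. The RBF kernel, gamma = 1.0, and the helper names below are hypothetical choices.

```python
import math
import random


def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel exp(-gamma * ||x - z||^2); a hypothetical choice of K."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def sample_landmarks(unlabeled, d, seed=0):
    """Draw d distinct landmark points uniformly from the unlabeled pool."""
    return random.Random(seed).sample(unlabeled, d)


def landmark_features(points, landmarks, kernel=rbf_kernel):
    """Map each point x to the explicit vector (K(x, l_1), ..., K(x, l_d))."""
    return [[kernel(p, l) for l in landmarks] for p in points]


# Hypothetical usage: four 2-D points, two of which become landmarks.
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]
landmarks = sample_landmarks(pts, 2, seed=0)
features = landmark_features(pts, landmarks)
```

The explicit dimension d is chosen by the user rather than fixed by the kernel, which is what makes the mapping usable with feature-based learners such as those in the first item.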
Slides: Features, kernels, similarity functions
