On the geometry of output-code multi-class learning
March 30, 2016

One of the main tasks in machine learning is classification, where the computer learns to categorize input objects into a set of pre-defined classes. For example, identifying which of your friends appears in a photograph can naturally be modeled as a classification task. Even though many real-world tasks are multi-class problems with more than two classes, most of our theoretical understanding is restricted to the special case of binary classification, especially in modern settings where large amounts of unlabeled data are available. To leverage our understanding of the binary case, many learning algorithms approach multi-class problems by decomposing them into a collection of binary classification tasks, as illustrated in the sketch below.
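As a concrete illustration of this decomposition, the following sketch implements the plain one-vs-all reduction on synthetic data: one binary classifier is trained per class, and a point is assigned to the class whose classifier is most confident. The synthetic data and the choice of logistic regression as the binary learner are assumptions made only for this example, not part of the work presented in the talk.

# A minimal sketch of the one-vs-all reduction: a K-class problem is turned
# into K binary problems ("class k" vs. "everything else").  The data below
# is synthetic and logistic regression is just one possible binary learner.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
K, d, n = 3, 2, 300
centers = 4 * rng.normal(size=(K, d))          # one Gaussian blob per class
y = rng.integers(K, size=n)                    # true class labels
X = centers[y] + rng.normal(size=(n, d))       # noisy points around the blobs

# Train one binary classifier per class (class k vs. the rest).
clfs = [LogisticRegression().fit(X, (y == k).astype(int)) for k in range(K)]

# Predict by taking the class whose binary classifier is most confident.
scores = np.column_stack([clf.decision_function(X) for clf in clfs])
pred = scores.argmax(axis=1)
print("training accuracy:", (pred == y).mean())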

In this talk, I will present a new perspective on two such algorithms that are very common in practice: one-vs-all and (error-correcting) output code learning. We show that in the cases where these algorithms work well, they implicitly assume structure on how the classes are related. By making this implied structure explicit, we can design algorithms that recover the classes from limited labeled data. In particular, we provide results for commonly studied cases where the codewords are well separated: learning a linear one-vs-all classifier for data on the unit ball, and learning a linear error-correcting output code when the Hamming distance between the codewords is large (at least d+1 in a d-dimensional problem). We additionally consider the more challenging case where the codewords are not well separated but satisfy a boundary features condition. The standard output-code scheme that this setting builds on is sketched below for concreteness.
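For readers unfamiliar with the setup, the sketch below shows the standard (error-correcting) output-code scheme: every class is assigned a binary codeword, one binary classifier is trained per code bit, and a new point is decoded to the class whose codeword is closest in Hamming distance to the predicted bit string. The codeword matrix, the synthetic data, and the logistic-regression bit learners are illustrative assumptions; the recovery algorithms and separation conditions from the talk are not reproduced here.

# An illustrative sketch of error-correcting output-code classification:
# each class has a binary codeword, one classifier is learned per bit, and
# decoding picks the class with the nearest codeword in Hamming distance.
# The codeword matrix and binary learner are arbitrary choices for this demo.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
K, d, n = 4, 2, 400
centers = 5 * rng.normal(size=(K, d))          # one Gaussian blob per class
y = rng.integers(K, size=n)                    # true class labels
X = centers[y] + rng.normal(size=(n, d))       # noisy points around the blobs

# Codeword matrix: row k is the 6-bit codeword of class k; every pair of
# rows is at Hamming distance 4, so a single flipped bit can be corrected.
code = np.array([
    [0, 0, 0, 1, 1, 1],
    [0, 1, 1, 0, 0, 1],
    [1, 0, 1, 0, 1, 0],
    [1, 1, 0, 1, 0, 0],
])

# Train one binary classifier per bit of the code.
bit_clfs = [LogisticRegression().fit(X, code[y, b]) for b in range(code.shape[1])]

# Decode: predict each bit, then choose the class with the nearest codeword.
bits = np.column_stack([clf.predict(X) for clf in bit_clfs])
hamming = np.abs(bits[:, None, :] - code[None, :, :]).sum(axis=2)
pred = hamming.argmin(axis=1)
print("training accuracy:", (pred == y).mean())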

This is joint work with Nina Balcan and Yishay Mansour.

BIO:

Travis is a second-year PhD student in the Computer Science Department at Carnegie Mellon University, advised by Nina Balcan. His research interests are in learning theory and distributed learning.