Use of Hash in Machine Learning

Abstract

I will describe the Vowpal Wabbit online learning system (http://hunch.net/~vw), which enables learning on datasets up to perhaps a sparse terafeature. Many of the tricks used here are simply good implementations of well-known technology, but one of them is not yet so well known: using a hash function to define feature indices. This trick is worrisome on first impression, but it turns out to be phenomenally useful. I will describe the trick, as well as some analysis showing that the hash function acts as a sparsity-preserving projection.
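
To make the hashing idea concrete, here is a minimal sketch in Python. It is not Vowpal Wabbit's code: the hash function (CRC32 from the standard library), the table size NUM_BITS, and the names hash_index, hash_features, predict, and sgd_update are all assumptions chosen for illustration. The point it shows is that a feature name is mapped straight to a weight index by hashing, so no dictionary of feature names is ever stored, and an example with k nonzero features maps to at most k nonzero indices.

```python
import zlib

# Assumed table size for this sketch: 2**18 weight slots.
NUM_BITS = 18


def hash_index(name, num_bits=NUM_BITS):
    """Map a feature name to a weight index by hashing.

    CRC32 is a stand-in hash chosen for this sketch, not necessarily
    the function a real system would use.
    """
    return zlib.crc32(name.encode("utf-8")) & ((1 << num_bits) - 1)


def hash_features(features, num_bits=NUM_BITS):
    """Convert {feature name: value} into {index: value}.

    Features whose names collide share an index, and their values add.
    The result never has more entries than the input.
    """
    hashed = {}
    for name, value in features.items():
        index = hash_index(name, num_bits)
        hashed[index] = hashed.get(index, 0.0) + value
    return hashed


def predict(weights, hashed):
    """Linear prediction over the nonzero hashed features only."""
    return sum(weights[i] * v for i, v in hashed.items())


def sgd_update(weights, hashed, label, lr=0.1):
    """One online gradient step for squared loss, touching only active indices."""
    error = predict(weights, hashed) - label
    for i, v in hashed.items():
        weights[i] -= lr * error * v


if __name__ == "__main__":
    weights = [0.0] * (1 << NUM_BITS)
    example = hash_features({"word=hash": 1.0, "word=learning": 1.0, "length": 3.0})
    for _ in range(200):
        sgd_update(weights, example, label=1.0, lr=0.05)
    print(len(example), predict(weights, example))
```

In this sketch the memory cost is fixed by NUM_BITS regardless of how many distinct feature names appear, and each update costs time proportional to the number of nonzero features in the example. Collisions are tolerated rather than resolved, which is the part that looks worrisome at first.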

Venue, Date, and Time

Venue: Wean Hall 4623

Date: Friday, March 20, 2009

Time: 3pm