CMU Theory Lunch

Searching near and far. Similarity search in high dimensions.

February 17, 2016

Given a set S in some metric space (X,D) similarity search is an umbrella term for problems that relate to searching through S for elements, x in S, that fit some criteria defined by a query point q in X and the ''similarity'': D(q,x). Perhaps most famously the r-Near Neighbor problem (NN). For low dimensional spaces efficient space partitioning is possible ie. based on the voronoi diagram or k-d trees to allow for O(log n) query time, however for high dimensional spaces all such solutions degrade to O(n) query time. In a breakthrough paper from 1998 Indyk and Motwani introduced Locality Sensitive Hashing (LSH), yielding sublinear query time O(dn^(1/c^p)) for c-approximate NN in l_(p=1,2).

We introduce the Annulus Query problem, a similarity search problem asking for restrictions on D(q,x) from both sides. Combining LSH techniques with a novel algorithm for Approximate Furthest Neighbor we present a sublinear query time (O-tilde(dn^2/c^2)) data structure for the Approximate Annulus Query problem in l_2.

This talk will give a brief general introduction to the area of high dimensional similarity search based on locality sensitive hashing, and present our recent result for the Approximate Furthest Neighbor and Approximate Annulus Query problems.

BIO:

Johan is a PhD student in the Scalable Similarity Search project at the IT University of Copenhagen, under supervisor Rasmus Pagh. He is currently visiting CMU for a semester hosted by Anupam Gupta. His interest include space partitioning, hash functions and randomized algorithms. If you are working on anything remotely related he would love to hear about it, he can be found in GHC 7108.