PhD Research: As part of my PhD thesis research, I investigated the problem of searching large-scale document collections efficiently and effectively.
Search engine indexes for large document collections are often divided into multiple disjoint partitions ('shards') that are distributed across multiple computers and searched in parallel to provide rapid interactive search.
Typically, all index shards are searched for each query (exhaustive search).
My research proposes an alternative, 'selective search', that partitions collections into topical shards and searches only a few relevant shards for each query.
According to the 'cluster hypothesis' ('similar documents tend to be relevant to the same request'), a topical organization of the document collection concentrates the relevant documents for any given query into a few shards.
This organization enables selective search to ignore large portions of the collection without degrading search accuracy.
In summary, selective search is an efficient alternative to exhaustive search, the current de facto search paradigm.
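To make the idea concrete, here is a minimal Python sketch of the two steps described above: ranking topical shards for a query, then searching only the top-ranked ones. It is an illustration under stated assumptions, not the method from the thesis. The toy collection, the function names, the term-frequency shard score, and the term-count document score are all hypothetical stand-ins for a real clustering step, a real resource-selection algorithm, and a real retrieval model.

```python
from collections import Counter

# Hypothetical toy collection, assumed to be already partitioned into topical shards.
SHARDS = {
    "sports": ["football match score goal", "tennis match serve ace"],
    "cooking": ["bake bread flour yeast", "grill steak salt pepper"],
    "astronomy": ["telescope galaxy star orbit", "planet orbit moon star"],
}


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()


def shard_score(query_terms, docs):
    """Assumed heuristic: fraction of the shard's tokens that match query terms."""
    counts = Counter(t for d in docs for t in tokenize(d))
    total = sum(counts.values())
    return sum(counts[t] for t in query_terms) / total if total else 0.0


def select_shards(query, shards, k=1):
    """Rank shards by the heuristic score and keep the top k."""
    terms = tokenize(query)
    ranked = sorted(shards, key=lambda name: shard_score(terms, shards[name]), reverse=True)
    return ranked[:k]


def search_shard(query_terms, docs):
    """Score each document by how often the query terms occur in it."""
    results = []
    for d in docs:
        tokens = tokenize(d)
        score = sum(tokens.count(t) for t in query_terms)
        if score > 0:
            results.append((score, d))
    return results


def selective_search(query, shards, k=1):
    """Search only the k selected shards and merge their results."""
    terms = tokenize(query)
    hits = []
    for name in select_shards(query, shards, k):
        hits.extend(search_shard(terms, shards[name]))
    # Highest score first; ties broken alphabetically for a deterministic order.
    return [d for score, d in sorted(hits, key=lambda h: (-h[0], h[1]))]


def exhaustive_search(query, shards):
    """Baseline: search every shard."""
    return selective_search(query, shards, k=len(shards))


if __name__ == "__main__":
    print(select_shards("star orbit", SHARDS, k=1))
    print(selective_search("star orbit", SHARDS, k=1))
```

In this toy setup, every document matching the query "star orbit" sits in a single shard, so searching one shard out of three returns the same results as searching all of them. Real collections are far less clean, which is why the quality of the topical partitioning and of the shard ranking matters.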