
Reducing False Positives


One of the goals of our system is to provide reliable low-level match information and, in particular, to reduce the number of false positives. This is important for three reasons. First, the identification of low-level matches appears to be an interesting search paradigm for locating related documents. Second, it helps offset some of the limitations of fixed-length fingerprints. Third, it has important performance implications: in a database of millions of documents, false positives can significantly increase the cost of searching.

Nevin Heintze
Thu Oct 3 20:48:58 EDT 1996