SCS Faculty Candidate

  • Gates Hillman Centers
  • ASA Conference Room 6115
  • Research Fellow
  • Department of Biomedical Informatics
  • Harvard Medical School

From non-uniform data distributions to asymptotic speed and memory improvements: compressive genomics and HyperMinHash

Designing efficient algorithms requires that we understand the underlying distributions of the data. In the "compressive genomics" part of this talk, I show how we exploit the natural structure of biological data to speed up similarity search. We prove that by organizing the database to facilitate clustered search, our time-complexity scales with metric entropy (number of covering hyperspheres) if the fractal dimension of the dataset is low. This is the key insight behind our compressively accelerated versions of standard tools in genomics (CORA, 10-100x speedup for all-mapping of NGS reads), metagenomics (MICA, 3.5x speedup Diamond), and chemical informatics (Ammolite, 150x speedup SMSD).

However, understanding the underlying distributions is also crucial even when the distributions are completely artificial. In the "HyperMinHash" part of this talk, I discuss a method for reducing the space complexity from O(log n) to O(log log n) of the probabilistic MinHash sketch for finding Jaccard distance (a.k.a. similarity) between sets. We achieve this by taking advantage of the non-uniform distribution of minimum values from a collection of uniform random hashes. In ongoing work, we are applying these types of sketches to enable orders of magnitude faster Boolean queries on aggregate patient medical records.

Yun William Yu is a Research Fellow in the Department of Biomedical Informatics at Harvard Medical School, where he works on sketching and streaming algorithms for aggregate patient medical records. He received a BS in mathematics and a BA in chemistry from Indiana University, and completed an MRes in biomedical physical chemistry and an MPhil in mathematics at Imperial College London on a Marshall Scholarship. Supported by a Hertz Fellowship, he did his PhD in applied mathematics under Professor Bonnie Berger at the Massachusetts Institute of Technology.

Faculty Host: Carl Kingsford (CompBio)

For More Information, Please Contact: