CS Thesis Proposal

  • Computer Science Department
  • Carnegie Mellon University
Thesis Proposals

Data-Intensive System Designs for High Performance and Capacity

The key contributions of my thesis are algorithms and data structures for storing and processing a large amount of fine-grained data on modern hardware, and software architectures that combine and tune these technologies together to build data-intensive systems that achieve high performance and use memory efficiently. This thesis will describe SILT and MICA, which are key-value stores providing a hash table-like interface, as examples that demonstrate how these algorithms, data structures, and software architectures apply to data-intensive system designs. SILT, which is based upon Entropy-Coded Tries that index items sorted by the hash of their keys, requires only 0.7 bytes per item in memory, serving requests at flash drive speeds of tens to hundreds of thousands of items per second. MICA, which uses lossy and lossless hash indexes, circular logs, an client-assisted hardware-based request direction, handles 65.6 million remote operations per second per server node for items stored in memory. My proposed work will apply an additional set of algorithms and data structures to strengthen SILT and MICA's benefits and improve their robustness on diverse hardware.

Thesis Committee:
David G. Andersen (Chair)
Andrew Pavlo
Michael Kaminsky
Eddie Kohler (Harvard University)

Thesis Document




For More Information, Please Contact: