Parallel Data Laboratory Talk
- Remote Access - Zoom
- Virtual Presentation - ET
- NATHAN BECKMANN
- Assistant Professor
- Computer Science Department and Department of Electrical and Computer Engineering
- Carnegie Mellon University
Making Data Access Faster and Cheaper via Ubiquitous Flash Caching
- 12:00 pm → Welcome and intro — Greg Ganger, Professor and Director, Parallel Data Laboratory (PDL)
- 12:10 pm → Speaker Presentation
Caches are critical to achieving good performance in datacenter applications. However, as data sizes continue to grow, caches themselves have grown to the point where storing them in DRAM is very expensive. There is a huge opportunity to save cost and energy by shifting caches to media like flash, but doing so comes with a host of challenges. This talk will cover recent and ongoing work in the Parallel Data Lab at CMU that addresses these challenges to enable flash caching even on the most demanding workloads. I will describe the CacheLib library, developed at Facebook, which makes it easy to spin up new hybrid (i.e., DRAM + flash) caches and consolidates best practices in cache design. CacheLib is widely deployed at Facebook, and several years of production use have yielded important lessons. I will then discuss two ongoing projects that aim to make flash caches widely applicable. First, flash caches add a new dimension to cache design because of their limited write budget, i.e., the number of bytes that can be written without wearing out the device. We are designing new cache admission policies that "spend writes wisely" using a combination of algorithmic analysis and machine learning. Second, Kangaroo is a flash cache optimized for billions of tiny objects, an adversarial workload that requires either a prohibitive amount of DRAM or a prohibitive flash write rate on traditional flash-cache designs. Kangaroo combines existing cache designs in a new way to improve hit ratios and reduce cost, while limiting both DRAM usage and flash writes.
Nathan Beckmann is an assistant professor in the Computer Science Department and (by courtesy) the Electrical and Computer Engineering Department at Carnegie Mellon University. His work focuses on reducing data movement in computer systems, from datacenters to the Internet of Things, by keeping data closer to where it is needed. He earned his PhD from MIT in 2015 under the supervision of Daniel Sanchez. His awards include the George M. Sprowls Award for “outstanding PhD thesis in Computer Science at MIT”, the NSF CAREER Award, Google Faculty Research Awards in 2017 and 2019, and the Google Research Scholar Award in 2021.
Zoom Participation. See announcement.