15-849: Advanced Storage Systems: HPC Storage
Spring 2007

PAGE ALWAYS UNDER REVISION

OVERVIEW

High-performance computing (HPC) systems double every year in total computing speed and in total storage capacity and speed. Scaling a distributed file system to parallel access at terabytes per second and to hundreds of thousands of file creates per second in a single billion-entry directory is leading to ever more asynchronous (that is, complex) implementations. The system must also experience and recover from multiple failures per day without losing any data, and still give good interactive response time. More complex implementations, designed for systems so large that they do not yet exist, generally take longer to write, debug, stabilize and tune, and will have latent problems that are uncovered only at the end-user site. But the pace of technology and the dollars invested do not allow for longer development times or for unstable biggest-in-class installations. What is needed is a better understanding of the problems of large-scale distributed implementation, along with tools and processes to counter their instability.

In this course we will pick a state-of-the-art scaling problem -- huge and highly concurrent directories -- and develop and test multiple implementations using simulated supercomputers based on virtual machines. We will put the new directory implementations into PVFS and test them with virtual machine simulation. Then, time allowing, we will layer the emerging IETF NFSv4.1 "parallel NFS" (pNFS) implementation on top and test it at scale. Our goal is to develop directory implementations good enough to contribute back to PVFS, and to find and fix scale-related bugs in PVFS directories, perhaps in PVFS itself, and certainly in pNFS.

We will also spend classroom time on other tools and techniques for developing and testing large scale distributed file systems, with an eye to identifying promising research on this very hard problem.

NEWS

GENERAL INFORMATION

Meeting Time/Place: MW 3:00-4:30, WeH 4615a

Contacts

Instructor: 
Garth Gibson
office: WeH 8219
phone: x8-5890
office hours:
email: garth@cs.cmu.edu
Teaching Assistant:
NONE
Course Secretary: 
Angela Miller
office: WeH 8215
phone: x8-6645
hours: 
email: amiller@cs.cmu.edu

PREREQUISITES

Members of this class are expected to have taken an operating systems course equivalent to CMU's 15-410 and achieved a grade of A or better. This includes familiarity as a user with an interactive operating system (e.g., Unix) and a solid understanding of basic concepts in the design and implementation of operating systems. Students without the knowledge covered in 15-712 may also struggle.

849h is a graduate-level class, and thus operates differently from an undergraduate class. Particularly interested and well-prepared undergraduates may participate with the explicit permission of the instructor.

COMPONENTS

TOPICS PLANNED

PROJECT

This is a shared-project course: all class members work on parts of a single overall project. Each student will write an individual project report, and the class will write a group paper.

GRADING

CAVEAT

* Everything here is subject to change.


Last updated 24.01.2007 | ©2006 Carnegie Mellon University