NEWS
- Any important info will be listed here.
Last updated 24.01.2007
High-performance computing (HPC) scales at 100% per year in total computing speed and in total storage capacity and speed. Scaling the distributed file system to parallel access at terabytes per second, with hundreds of thousands of file creates per second in the same billion-entry directory, with multiple failures per day experienced and recovered without losing any data, and with good interactive response time, is leading to ever more asynchronous (that is, complex) implementations. More complex implementations designed for systems so large that they do not yet exist will generally take longer to write, debug, stabilize and tune, and will have latent problems uncovered only at the end-user site. But the pace of technology and the dollars invested do not allow for longer development times and unstable biggest-in-class installations. What is needed is a better understanding of the problems of large-scale distributed implementations, and tools and processes to counter their instability.
In this course we will pick a state-of-the-art scaling problem -- huge and highly concurrent directories -- and develop and test multiple implementations using simulated supercomputers based on virtual machines. We will put the new directory implementations into PVFS, test with virtual machine simulation, then, time allowing, layer the emerging IETF NFSv4.1 "parallel NFS" (pNFS) implementation on top and test it at scale. Our goal is to develop directory implementations that are good enough to contribute back to PVFS, and to find and fix scale-related bugs in PVFS directories, perhaps in PVFS itself, and certainly in pNFS.
We will also spend classroom time on other tools and techniques for developing and testing large-scale distributed file systems, with an eye to identifying promising research on this very hard problem.
Meeting Time/Place: MW 3:00-4:30, WeH 4615a
Instructor: Garth Gibson; office: WeH 8219; phone: x8-5890; office hours: (not listed); email: garth@cs.cmu.edu
Teaching Assistant: NONE
Course Secretary: Angela Miller; office: WeH 8215; phone: x8-6645; hours: (not listed); email: amiller@cs.cmu.edu
Members of this class are expected to have taken an operating systems course equivalent to CMU's 15-410 and achieved a grade of A or better. This includes familiarity as a user with an interactive operating system (e.g., Unix) and a solid understanding of basic concepts in the design and implementation of operating systems. Students without 15-712 knowledge may also struggle.
849h is a graduate-level class, and thus operates differently from an undergraduate class; particularly interested and prepared undergraduates can participate, with explicit permission of the instructor.
This is a shared project course. This means that all class members are working on parts of the overall project. There will be individual project reports and a group paper.