CAREER: Bridging Databases and Computer Architecture: Optimizing DBMS for Deep Memory Hierarchies

IIS-0133686


Principal Investigator

Anastassia Ailamaki
Computer Science Department

Carnegie Mellon University
5000 Forbes Avenue

Pittsburgh, PA 15213

Phone: 412-268-7848
Fax: 412-268-5574

natassa@cmu.edu
http://www.cs.cmu.edu/natassa

Keywords

Database system cache behavior

Cache buffer management

Memory hierarchy space management

Data placement on caches, memory, and disks

Data Locality

Project Summary

Database management systems currently serve as the supporting back-end for a large number of Internet applications and are the dominant commercial software running on high-end enterprise servers. As processor and memory speeds grow further apart, database researchers face an important problem: the performance bottleneck is shifting away from I/O, and the data transfer time between the processor and memory is becoming the real show-stopper. To alleviate the processor/memory performance gap, computer designers employ a hierarchy of cache memories in which each level trades capacity for faster access times. Caches keep the most recently used memory items close to the processor to avoid the long memory access latency. The key to high performance is to maximize cache utilization and to keep data that are likely to be referenced in the hierarchy. Yet previous research has mainly focused on workload characterization studies and optimization of isolated algorithms. The research component of this proposal seeks to bridge database research and computer architecture by making database systems cache-resident, i.e., by providing the performance illusion that the data is always present in the cache when the system needs it. We propose a systematic approach to eliminating unnecessary memory references, thereby optimizing database systems on modern processors with deep memory hierarchies. The approach incorporates workload characterization for both cache and I/O, as well as data placement and cache management techniques. The education component of this proposal seeks to bridge database system and computer architecture education, to raise awareness of design issues in modern database systems, and to equip students with the skills necessary to overcome these challenges.
The proposed educational activities include: a revamped undergraduate and graduate database curriculum, a weekly database seminar to develop students’ presentation skills and critical ability, and involvement in various projects to expose students to how databases are being used in the real world.

Publications and Products

[1]   A. Ailamaki, D.J. DeWitt, and M.D. Hill, "Data Page Layouts for Relational Databases on Deep Memory Hierarchies", The VLDB Journal, vol. 11 (3), (2002), p. 198.

[2]   M. Wang, A. Ailamaki, and C. Faloutsos, "Capturing the Spatio-Temporal Behavior of Real Traffic Data", the 22nd edition of the IFIP WG 7.3 International Symposium on Computer Modeling, Measurement and Evaluation (Performance 2002), (2002), p. 147.

[3]   M. Shao and A. Ailamaki, "DBMbench: Microbenchmarking Database Systems in a Small, yet Real World", in submission, 2003. Technical Report CMU-CS-03-161.

[4]   S. Papadomanolakis and A. Ailamaki, "AutoPart: Automating Schema Design for Large Scientific Databases Using Data Partitioning", in submission, 2003. Technical Report CMU-CS-03-159.

[5]   A. Ailamaki and J. Hellerstein, "Exposing Undergraduate Students to Database System Internals", SIGMOD Record 32(3), September 2003.

Project Impact

We expect this project to direct information system research towards an architecture- and platform-conscious mentality that will enable information management software to use the enormous power provided by the underlying computing platforms. In addition, bringing architecture knowledge into the field of databases will enable simpler, easier-to-handle benchmarks that computer architects can use to better tune their hardware to the needs of information systems. The impact is also high in the industrial world because the proposed techniques are easy to implement in large commercial systems: they require modifications to only a limited part of the code and minimize interference with the rest of the system. Finally, improving the performance of database systems on current and future computer architectures has a direct positive impact on the technology that fuels important Internet applications such as digital libraries, e-commerce, and reservation systems, used by millions of people every day.

Goals, Objectives and Targeted Activities

Research: Building on the PI's previous work on workload characterization and hardware behavior of modern database applications, in the second year of the grant we worked on (a) developing models to create realistic workloads, (b) optimizing query processing with cache performance in mind on modern uniprocessor and multiprocessor platforms, by designing new data placement methods for disk page layouts and by altering the database system architecture to improve locality, and (c) automatically designing the data placement on disk for large scientific databases. Observations are reported in the technical reports and papers listed above.

 

Education: I introduced two new graduate courses and redesigned the undergraduate database course (in collaboration with Joe Hellerstein at Berkeley; see publication [5] above) in the CMU teaching curriculum. I also founded and organize the Database Systems Seminar at the CMU School of Computer Science, as well as the database group web site (http://www.db.cs.cmu.edu/).

Area Background

The project investigates a new research area: the interaction between database software and the underlying computer architecture. Work on the project requires familiarity with database management system architecture, query processing algorithms, and processor and memory system microarchitecture.

Area References