CAREER: Bridging Databases and Computer Architecture: Optimizing DBMS for
Deep Memory Hierarchies
IIS-0133686
Principal Investigator
Anastassia Ailamaki
Computer Science Department
Carnegie Mellon University
5000 Forbes Avenue
Pittsburgh, PA 15213
Phone: 412-268-7848
Fax: 412-268-5574
natassa@cmu.edu
http://www.cs.cmu.edu/natassa
Keywords
Database system cache behavior
Cache buffer management
Memory hierarchy space management
Data placement on caches, memory, and disks
Data Locality
Project Summary
Database management systems currently serve
as the supporting back-end for a large number of Internet applications,
and are the dominant commercial software running on high-end enterprise servers. As
processor and memory speeds grow further apart, database researchers face an
important problem: the performance bottleneck is shifting away from I/O
performance, and the data transfer time between the processor and the memory is
becoming the real show-stopper. To alleviate the processor/memory performance
gap, computer designers employ a hierarchy of cache memories in which each
level closer to the processor gives up capacity in exchange for faster access.
Caches keep the most recently used memory items close to the processor to hide
the long memory access latency. The key to high performance is to maximize cache utilization
and to keep data that are likely to be referenced in the hierarchy. Yet,
previous research has mainly focused on workload characterization studies and
optimization of isolated algorithms. The research component of this proposal
seeks to bridge database research and computer architecture by making database
systems cache-resident, i.e., by providing the performance illusion that the data
is always present in the cache when the system needs it. We propose a
systematic approach to eliminate unnecessary memory references, thereby
optimizing database systems on modern processors with deep memory hierarchies.
The approach incorporates workload characterization for both cache and I/O, as
well as data placement and cache management techniques. The education component
of this proposal seeks to bridge database system and computer architecture
education, to raise awareness of design issues in modern database systems,
and to equip students with the skills needed to overcome these challenges.
The proposed educational activities include a revamped undergraduate and
graduate database curriculum, a weekly database seminar to develop students'
presentation skills and critical ability, and involvement in various projects
that expose students to how databases are used in the real world.
Publications and Products
[1] A. Ailamaki, D.J. DeWitt, and M.D. Hill, "Data Page Layouts for Relational
Databases on Deep Memory Hierarchies", The VLDB Journal, vol. 11 (3), (2002),
p. 198.
[2] M. Wang, A. Ailamaki, and C. Faloutsos, "Capturing the Spatio-Temporal
Behavior of Real Traffic Data", the 22nd edition of the IFIP WG 7.3
International Symposium on Computer Modeling, Measurement and Evaluation
(Performance 2002), (2002), p. 147.
[3] M. Shao and A. Ailamaki, "DBMbench: Microbenchmarking Database Systems in
a Small, yet Real World", in submission, 2003. Technical Report CMU-CS-03-161.
[4] S. Papadomanolakis and A. Ailamaki, "AutoPart: Automating Schema Design
for Large Scientific Databases Using Data Partitioning", in submission, 2003.
Technical Report CMU-CS-03-159.
[5] A. Ailamaki and J. Hellerstein, "Exposing Undergraduate Students to
Database System Internals", SIGMOD Record 32(3), September 2003.
Project Impact
We expect this project to direct
information system research towards an architecture- and platform-conscious
mentality that will enable information management software to use the enormous
power provided by the underlying computing platforms. In addition, bringing
architecture knowledge into the field of databases will enable simpler,
easier-to-handle benchmarks that computer architects will be able to use to
better tune their hardware to the needs of information systems. The
impact is also high in the industrial world because the proposed techniques are
easy to implement in large commercial systems: they require modifications to
only a limited part of the code and minimize interference with the rest of the
system. Finally, improving the performance of database systems on current and
future computer architectures has a direct positive impact on the technology
that fuels important Internet applications such as digital libraries,
e-commerce, and reservation systems, used by millions of people every day.
Goals, Objectives and Targeted Activities
Research: Building on the PI's
previous work on workload characterization and hardware behavior of modern
database applications, in the second year of the grant we worked on (a)
developing models to create realistic workloads, (b) optimizing query
processing with cache performance in mind on modern uniprocessor and
multiprocessor platforms, by designing new data placement methods for disk
page layouts and by altering the database system architecture to improve
locality, and (c) automatically designing the data placement on disk for large
scientific databases. Observations are reported in the technical reports and
papers listed above.
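To illustrate why data placement within a page matters for cache behavior,
the following is a minimal sketch, not the project's implementation. It counts
how many cache lines a scan of a single field touches when records are stored
one after another, versus when the values of that field are stored
contiguously. The record size, field width, field offset, and 64-byte cache
line are all hypothetical values chosen for the example.

```python
def cache_lines_touched(offsets, width, line_size=64):
    """Count the distinct cache lines covered by fields at the given byte offsets."""
    lines = set()
    for off in offsets:
        first = off // line_size
        last = (off + width - 1) // line_size
        lines.update(range(first, last + 1))
    return len(lines)


def row_layout_offsets(num_records, record_size, field_offset):
    """Byte offsets of one field when whole records are stored back to back."""
    return [i * record_size + field_offset for i in range(num_records)]


def grouped_layout_offsets(num_records, field_width, region_start=0):
    """Byte offsets of one field when all its values are stored contiguously."""
    return [region_start + i * field_width for i in range(num_records)]


# Hypothetical parameters, chosen only for illustration.
NUM_RECORDS = 100
RECORD_SIZE = 64   # bytes per record in the record-at-a-time layout
FIELD_WIDTH = 4    # bytes in the scanned field
FIELD_OFFSET = 8   # position of the field inside each record

row_lines = cache_lines_touched(
    row_layout_offsets(NUM_RECORDS, RECORD_SIZE, FIELD_OFFSET), FIELD_WIDTH
)
grouped_lines = cache_lines_touched(
    grouped_layout_offsets(NUM_RECORDS, FIELD_WIDTH), FIELD_WIDTH
)

print("record-at-a-time layout:", row_lines, "cache lines")
print("field-grouped layout:   ", grouped_lines, "cache lines")
```

Under these assumed sizes, the scan over the record-at-a-time layout touches
one cache line per record, while the field-grouped layout packs sixteen values
into each line. Real page layouts must also account for headers,
variable-length fields, and alignment, which this sketch ignores.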
Education: I introduced two new graduate courses and redesigned
the undergraduate database course (in collaboration with Joe Hellerstein at
Berkeley, see publication [5] above) in the CMU teaching curriculum. I also
founded and organize the Database Systems Seminar at the CMU School of Computer
Science, as well as the database group web site (http://www.db.cs.cmu.edu/).
Area Background
The project investigates a new research area: the
interaction between the database software and the underlying computer's
architecture. The work in the project requires familiarity with database
management system architecture, query processing algorithms, and processor and
memory system microarchitecture.
Area References
- A. Ailamaki, D. J. DeWitt, M. D. Hill, and D. A. Wood, "DBMSs on a Modern
Processor: Where Does Time Go?", Proceedings of the VLDB Conference,
September 1999.
- A. Ailamaki, D. J. DeWitt, M. D. Hill, and D. A. Wood, "Weaving Relations
for Cache Performance", Proceedings of the VLDB Conference, September 2001
(best paper award).
- Parthasarathy Ranganathan, Kourosh Gharachorloo, Sarita V. Adve, and Luiz
Andre Barroso, "Performance of Database Workloads on Shared-Memory Systems
with Out-of-Order Processors", Proceedings of ASPLOS, pages 307-318, 1998.
- Luiz Andre Barroso, Kourosh Gharachorloo, and Edouard Bugnion, "Memory
System Characterization of Commercial Workloads", Proceedings of the 25th
Annual International Symposium on Computer Architecture, pages 3-14,
June 1998.
- K. Keeton, D. A. Patterson, Y. Q. He, R. C. Raphael, and W. E. Baker,
"Performance Characterization of a Quad Pentium Pro SMP Using OLTP
Workloads", Proceedings of the 25th International Symposium on Computer
Architecture, Barcelona, Spain, June 1998.