The course material will be drawn both from "classic" papers and from the recent database literature. Copies of instructor's transparencies and notes, as well as copies of selected articles will be made available. We will cover a substantial number of the articles and material in the following required text:

  1. Readings in Database Systems, Third Edition - edited by Michael Stonebraker and Joe Hellerstein, Morgan Kaufmann Publisher, March 1998 Several of the papers in this book are available through the ACM digital library. The book will be on reserve in the library. Links to the papers that are available online will be provided in the course readings web page.
  2. If you have never taken a database course before, you should acquire a database textbook, for example Database System Concepts or Database Management Systems.These are introductory textbooks and provide overviews of the basic topics.
Other reference text includes:


Here is a list of papers and supplementary reading from textbooks that will be covered by the course. The main sources are the books suggested above, as well as the ACM digital library that is available to all CMU students. For the majority of the papers there are hyperlinks to the electronic version (PS or PDF), and the rest of them appear in Stonebraker/Hellerstein readings, 3rd edition (on reserve at the E&S library).

The list is divided in Sections.
Each Section contains a Background subsection with pointers to fundamental concepts/prerequisites for each section.
Items marked with an asterisk (*) are required reading.
The rest of the articles are recommended reading directly related to the basic material (e.g., more in-depth, followup or evaluation papers).

The Roots

Background: Entity-Relationship Model and Relational Model (from textbook), Stonebraker and Hellerstein, Introduction to the Section "The Roots", pages 1-4

  1. Astrahan, M. , et al., System R: Relational Approach to Database Management, ACM TODS, 1(2), 1976 (*)
  2. Chamberlin, D., et al., A History and Evaluation of System R, Communications of the ACM, 24(10), 1981 (*)
  3. Stonebraker, M., et al., (1976) The Design and Implementation of INGRES, ACM TODS, 1(3), 1976 (*)
  4. Stonebraker, M., (1980) Retrospection on a Database System, ACM TODS, 5(2), 1980

Concurrency Control

Background:Gray and Reuter, Sections 7.3 and 7.4

  1. Gray, J., et al., "Granularity of Locks and Degrees of Consistency in a Shared Database", IFIP Working Conference on Modelling of Database management Systems, AFIPS Press, 1976 (*)
  2. H. T. Kung and John T. Robinson, On Optimistic Methods for Concurrency Control, VLDB ACM TODS 6(2), June 1981, pp.213-226 (*)
  3. Philip L. Lehman, S. Bing Yao, Efficient Locking for Concurrent Operations on B-Trees, ACM TODS 6(4): 650-670, 1981 (*)
  4. Mohan, C., ARIES/KVL: A Key-Value Locking Method for Concurrency Control of Multiaction Transactions Operating on B-Tree Indexes, Proceedings of the 16th VLDB Conference, Brisbane, August 1990
  5. Srinivasan, V., and Carey, M., Performance of B-Tree Concurrency Control Algorithms, Proceedings of the ACM SIGMOD Conference, Denver, CO, June 1991.

Logging and Recovery

Background:Gray and Reuter, Chapters 9-11

  1. Franklin, M., et. al. Crash Recovery in Client-Server EXODUS, Proceedings of the ACM SIGMOD Conference, June 1992. pp. 165-175. Read Section 3 only for an overview of ARIES. (*)
  2. Mohan, C., et al., ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks Using Write-Ahead Logging, ACM TODS, 18(1), 1991 (*)
  3. Mohan, C., Repeating History Beyond ARIES, Proceedings of VLDB conference, 1999

Query Processing and Optimization

  1. Shapiro, L., Join Processing in Database Systems with Large Main Memories, ACM TODS, 11(3), September 1986 (*)
  2. DeWitt, D., and Naughton, J., "Dynamic Memory Hybrid Hash Join Algorithm", Handout (*)
  3. Selinger, P., et al., Access Path Selection in a Relational Database Management System, Proceedings of the ACM SIGMOD Conference, Boston, MA, 1979 (*)
  4. Ioannidis, Y., Kang, Y., Randomized Algorithms for Optimizing Large Join Queries, Proceedings of the ACM SIGMOD Conference, Atlantic City, NJ, May 1990
  5. Graefe, G., Dynamic query evaluation plans: Some course corrections?, IEEE Database Engineering 23(2), Jume 2000
  6. Chaudhuri, S., An Overview of Query Optimization in Relational Systems, Proceedings of the ACM PODS Conference,, Seattle, WA, 1998
  7. V. Poosala, P.J. Haas, Y. Ioannidis, and E.J. Shekita. Improved histograms for selectivity estimation of range predicates, Proceedings of the ACM SIGMOD Conference, 1996
  8. IEEE Database Engineering, Special issue on Query Processing in Commercial Database Systems, 16(4), Dec. 1993

Buffer Management and OS

Background: Gray and Reuter, Chapter 13, Ramakrishnan and Gehrke, Chapter V.16 and Section 3.3.2

  1. Chou, H., and DeWitt, D., An Evaluation of Buffer Management Strategies for Relational Database Systems, Proceedings of the 11th VLDB Conference, 1985 (*)
  2. O'Neil, E. , O'Neil, P., and Weikum, G., The LRU-K Replacement Algorithm for Database Disk Buffering, Proceedings of the ACM SIGMOD Conference, Washington, D.C., June 1993 (*)
  3. Stonebraker, M., Operating System Support for Database Management, Communications of the ACM, 24(7), 1981 (*)

Distributed Database Systems

  1. Mohan, Lindsay, and Obermark, Transaction Management in the R* Distributed Database Management System, TODS 11(4), 1986 (*)
  2. Gray, Helland, O'Neil, and Shasha, The Dangers of Replication and a Solution, Proceedings of the ACM SIGMOD Conference, 1996 (*)
  3. Stonebraker et. al., Mariposa: A Wide-Area Distributed Database, VLDB Journal 5, 1996 (*)
  4. Jeff Sidell, Paul M. Aoki, Adam Sah, Carl Staelin, Michael Stonebraker, Andrew Yu. Data Replication in Mariposa, In Proceedings of the ICDE, 1996

Spatial Applications

  1. A. Guttman, R-Trees: A Dynamic Index Structure for Spatial Searching, Proceedings of the ACM SIGMOD Conference, 1984 (*)
  2. H. Samet and W.G. Aref, Spatial Data Models and Query Processing, Modern Database Systems, 1995 (*)
  3. Y. Manolopoulos, E. Nardelli, A. Papadopoulos, and G. Proietti, QR-Tree: A Hybrid Spatial Data Structure, , 1996
  4. H. Samet, The Design and Analysis of Spatial Data Structures, Addison-Wesley, Reading, MA, 1990
  5. H. Samet, The Applications of Spatial Data Structures: Computer Graphics, Image Processing and GIS, Addison-Wesley, Reading, MA, 1990

Parallel Database Systems

  1. D. J. DeWitt and J. Gray, Parallel Database Systems: The Future of High Performance Database Processing, Communications of the ACM, June 1992 (*)
  2. D. J. DeWitt, S. Ghandeharizadeh, D. Schneider, H. Hsiao, A. Bricker, and R. Rasmussen, The GAMMA Database Machine Project, IEEE Transactions on Knowledge and Data Engineering, Vol. 2, No. 1, March 1990 (*)
  3. R. D. Sloan, A practical implementation of the data base machine-Teradata DBC/1012, Proceedings of the Twenty-Fifth Hawaii International Conference on System Sciences, January 1992.


Background: Common Sorting Algorithms (quicksort, binary sort, etc)
Ramakrishnan and Gehrke Chapter 11

  1. Nyberg, C., Barclay, T., Cvetanovic, Z., Gray, J., Lomet, D., AlphaSort: A RISC Machine Sort, Proceedings of the ACM SIGMOD Conference, May 1994 (*)
  2. Agarwal, R., A super scalar sort algorithm for RISC processors, Proceedings of the ACM SIGMOD Conference, June 1996
  3. Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, David E. Culler, Joseph M. Hellerstein and David A. Patterson, High-performance sorting on networks of workstations , Proceedings of the ACM SIGMOD Conference, May 1997

Benchmarking and Performance

Background:Gray's Benchmark handbook (on the web!), Chapters 1 and 3

  1. Anon et al., "A Measure of Transaction Processing Power" (in red book), Datamation, 31(7), 1985 (*)
  2. Eisenberg, A., and Melton, J., Standards in Practice, ACM SIGMOD Record, September 1998
  3. A. Ailamaki, D. J. DeWitt, M. D. Hill, and D. A. Wood, DBMSs on a Modern Processor: Where Does Time Go?, Proceedings of the VLDB Conference, September 1999 (*)
  4. Parthasarathy Ranganathan, Kourosh Gharachorloo, Sarita V. Adve, and Luiz Andre Barroso, of Database Workloads on Shared-Memory Systems with Out-of-Order Processors, Proceedings of ASPLOS, pages 307-318, 1998.
  5. Luiz Andre Barroso, Kourosh Gharachorloo, and Edouard Bugnion, Memory System Characterization of Commercial Workloads, Proceedings of the 25th Annual International Symposium on Computer Architecture, pages 3-14, June 1998.

Data Mining

  1. J. Gray, S. Chaudhuri, A. Bosworth, A. Layman, D. Reichart, M. Venkatrao, F. Pellow, and H. Pirakesh, Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals, Data Mining and Knowledge Discovery 1, 29 , 1997 (*)
  2. Agrawal, R. and R. Srikant, Fast Algorithms for Mining Association Rules, Proceedings of the VLDB Conference, 1994 (*)
  3. Rakesh Agrawal, Tomasz Imielinski and Arun Swami, Mining Association Rules Between Sets of Items in Large Databases, Proceedings of the ACM SIGMOD Conference, May 1993
  4. M. Mehta, R. Agrawal and J. Rissanen, SLIQ: A Fast Scalable Classifier for Data Mining, Proceedings of the Fifth International Conference on Extending Database Technology, Avignon, France, March 1996

Object-Oriented and Object-Relational Database Systems

Background:Atkinson, M., et al., The Object-Oriented Database System Manifesto (HTML version), First International Conference on Deductive and Object-Oriented Databases, Kyoto, Japan, 1989 (also appeared in Proceedings of ACM SIGMOD 1990)

  1. Lamb et al., The ObjectStore System, Communications of the ACM, 34(10), 1991 (*)
  2. Stonebraker, M., "Inclusion of New Types in Relational Database Systems", Proceedings of the IEEE Conference on Data Engineering, 1986 (*)