Course readings

Watch for updates! This file is NOT YET FINALIZED.

Books

The course material will be drawn both from "classic" papers and from the recent database literature. Copies of the instructor's transparencies and notes, as well as copies of selected articles, will be made available. We will cover a substantial number of the articles and material in the following required text:

  1. Readings in Database Systems, Fourth Edition - edited by Michael Stonebraker and Joe Hellerstein, Morgan Kaufmann Publisher, March 1998. Lecture notes for the contents in the book can be found here. Several of the papers in this book are available through the ACM digital library.
  2. If you have never taken a database course before, you should acquire a database textbook, for example Database System Concepts or Database Management Systems. These are introductory textbooks and provide overviews of the basic topics.
Other reference texts include:

Papers

Here is a list of papers and supplementary reading from textbooks that will be covered by the course. The main sources are the books suggested above, as well as the ACM digital library that is available to all CMU students. For the majority of the papers there are hyperlinks to the electronic version (PS or PDF), and the rest of them appear in Stonebraker/Hellerstein readings, 3rd edition (on reserve at the E&S library).

The list is divided into sections.
Each section contains a Background subsection with pointers to fundamental concepts and prerequisites for that section. R&G refers to the third edition of the book Database Management Systems by Ramakrishnan and Gehrke.
Items marked with an asterisk (*) are required reading.
The rest of the articles are recommended reading directly related to the basic material (e.g., more in-depth, followup or evaluation papers).

The Roots

Background: Entity-Relationship Model and Relational Model (from a textbook such as R&G chapters 2 and 3 or the 15-415 lectures), and Stonebraker and Hellerstein, Introduction to the Section "The Roots", pages 1-4

  1. Astrahan, M., et al., System R: Relational Approach to Database Management, ACM TODS, 1(2), 1976 (*)
  2. Chamberlin, D., et al., A History and Evaluation of System R, Communications of the ACM, 24(10), 1981
  3. Stonebraker, M., et al., The Design and Implementation of INGRES, ACM TODS, 1(3), 1976
  4. Stonebraker, M., Retrospection on a Database System, ACM TODS, 5(2), 1980

Concurrency Control

Background: Gray and Reuter, Sections 7.3 and 7.4, or the concurrency control chapter from any undergraduate database textbook (e.g., R&G chapters 16 and 17), or the corresponding 15-415 lectures (e.g., the lectures on transaction management). Also Gray and Reuter, Section 15.4 "B-Trees" (Recommendation: Read the whole Chapter 15)

  1. Gray, J., et al., "Granularity of Locks and Degrees of Consistency in a Shared Database", IFIP Working Conference on Modelling of Database Management Systems, AFIPS Press, 1976 (*)
  2. H. T. Kung and John T. Robinson, On Optimistic Methods for Concurrency Control, ACM TODS 6(2), June 1981, pp. 213-226 (*)
  3. Philip L. Lehman, S. Bing Yao, Efficient Locking for Concurrent Operations on B-Trees, ACM TODS 6(4): 650-670, 1981 (*)
  4. Rakesh Agrawal, Michael J. Carey, Miron Livny. Concurrency Control Performance Modeling: Alternatives and Implications. ACM Trans. Database Syst., 12(4), 1987, 609-654. (*)
  5. Mohan, C., ARIES/KVL: A Key-Value Locking Method for Concurrency Control of Multiaction Transactions Operating on B-Tree Indexes, Proceedings of the 16th VLDB Conference, Brisbane, August 1990
  6. Srinivasan, V., and Carey, M., Performance of B-Tree Concurrency Control Algorithms, Proceedings of the ACM SIGMOD Conference, Denver, CO, June 1991.

Logging and Recovery

Background: Gray and Reuter, chapters 9-11, or the recovery chapter from any undergraduate database textbook (such as R&G chapter 18), or the corresponding 15-415 lectures (e.g., the lectures on logging and recovery).

  1. Franklin, M., et al., Crash Recovery in Client-Server EXODUS, Proceedings of the ACM SIGMOD Conference, June 1992, pp. 165-175. Read Section 3 only for an overview of ARIES. (*)
  2. Mohan, C., et al., ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks Using Write-Ahead Logging, ACM TODS, 17(1), 1992 (*)
  3. Mohan, C., Repeating History Beyond ARIES, Proceedings of VLDB conference, 1999

Query Processing

Background: Chapters on query processing from a textbook (such as R&G chapters 12 and 14) or the corresponding 15-415 lectures (e.g., the lectures on relational operators).

  1. Shapiro, L., Join Processing in Database Systems with Large Main Memories, ACM TODS, 11(3), September 1986 (*)
  2. DeWitt, D., and Naughton, J., Dynamic Memory Hybrid Hash Join Algorithm, Handout (*)
  3. Ron Avnur, Joseph M. Hellerstein. Eddies: Continuously Adaptive Query Processing. Proc. SIGMOD Conference, 2000, 261-272. (*)
  4. T. Urhan & M. Franklin, XJoin: A Reactively-Scheduled Pipelined Join Operator, IEEE Data Engineering Bulletin, June 2000, pp. 27-33 (*)
  5. Graefe, G., Query Evaluation Techniques for Large Databases, ACM Computing Surveys, 25(2), June 1993 (Read only Sections 1 through 5)
  6. Graefe, G., Dynamic query evaluation plans: Some course corrections?, IEEE Database Engineering 23(2), June 2000
  7. Goetz Graefe, The Value of Merge-Join and Hash-Join in SQL Server, Proceedings of the VLDB Conference, September 1999
  8. IEEE Database Engineering, Special issue on Adaptive Query Processing, 23(2), June 2000
  9. IEEE Database Engineering, Special issue on Query Processing in Commercial Database Systems, 16(4), Dec. 1993

Query Optimization

Background: Gray and Reuter, Section 15.4 "B-Trees" (Recommendation: Read the whole Chapter 15), and a chapter on hash-based indexing such as R&G, Chapter 11. Also a chapter on optimization such as R&G, Chapter 15.

  1. Selinger, P., et al., Access Path Selection in a Relational Database Management System, Proceedings of the ACM SIGMOD Conference, Boston, MA, 1979 (*)
  2. Kabra, N., and DeWitt, D., Efficient Mid-query Reoptimization of Sub-optimal Query Execution Plans, Proceedings of the ACM SIGMOD Conference, Seattle, WA, 1998 (*)
  3. Stillger, M., Lohman, G., Markl, V., and Kandil, M., LEO - DB2's LEarning Optimizer, Proceedings of the 27th VLDB Conference, Roma, Italy, 2001 (*)
  4. Chaudhuri, S., An Overview of Query Optimization in Relational Systems, Proceedings of the ACM PODS Conference, Seattle, WA, 1998
  5. V. Poosala, P.J. Haas, Y. Ioannidis, and E.J. Shekita. Improved histograms for selectivity estimation of range predicates, Proceedings of the ACM SIGMOD Conference, 1996
  6. Ioannidis, Y., Kang, Y., Randomized Algorithms for Optimizing Large Join Queries, Proceedings of the ACM SIGMOD Conference, Atlantic City, NJ, May 1990

Buffer Management and OS

Background: Gray and Reuter, Chapter 13, and a chapter on data storage and files from a textbook such as R&G, Chapter 9

  1. Chou, H., and DeWitt, D., An Evaluation of Buffer Management Strategies for Relational Database Systems, Proceedings of the 11th VLDB Conference, 1985 (*)
  2. Megiddo, N., and Modha, D., ARC: A Self-Tuning, Low Overhead Replacement Cache, USENIX File & Storage Tech. Conf. (FAST), San Francisco, CA, March 2003 (*)
  3. O'Neil, E., O'Neil, P., and Weikum, G., The LRU-K Replacement Algorithm for Database Disk Buffering, Proceedings of the ACM SIGMOD Conference, Washington, D.C., June 1993
  4. Johnson, T. and Shasha, D., 2Q: A Low Overhead High Performance Buffer Management Replacement Algorithm, Proceedings of VLDB, 1994
  5. Stonebraker, M., Operating System Support for Database Management, Communications of the ACM, 24(7), 1981

Distributed Database Systems

  1. Williams, R. et al. R*: An Overview of the Architecture, Technical report RJ3325, IBM Research Lab, San Jose, CA, 1981 (*)
  2. Mohan, Lindsay, and Obermark, Transaction Management in the R* Distributed Database Management System, TODS 11(4), 1986 (*)
  3. Gray, Helland, O'Neil, and Shasha, The Dangers of Replication and a Solution, Proceedings of the ACM SIGMOD Conference, 1996 (*)
  4. Stonebraker et al., Mariposa: A Wide-Area Distributed Database, VLDB Journal 5, 1996
  5. Jeff Sidell, Paul M. Aoki, Adam Sah, Carl Staelin, Michael Stonebraker, Andrew Yu. Data Replication in Mariposa, In Proceedings of the ICDE, 1996

Spatial Access Methods

Background: Gray and Reuter, Section 15.4 "B-Trees" (Recommendation: Read the whole Chapter 15), and a chapter on hash-based indexing such as R&G, Chapter 11.

  1. A. Guttman, R-Trees: A Dynamic Index Structure for Spatial Searching, Proceedings of the ACM SIGMOD Conference, 1985 (*)
  2. H. Samet and W.G. Aref, Spatial Data Models and Query Processing, Modern Database Systems, 1995 (*)
  3. Y. Manolopoulos, E. Nardelli, A. Papadopoulos, and G. Proietti, QR-Tree: A Hybrid Spatial Data Structure, 1996
  4. H. Samet, The Design and Analysis of Spatial Data Structures, Addison-Wesley, Reading, MA, 1990
  5. H. Samet, The Applications of Spatial Data Structures: Computer Graphics, Image Processing and GIS, Addison-Wesley, Reading, MA, 1990

Parallel Database Systems

  1. D. J. DeWitt, S. Ghandeharizadeh, D. Schneider, H. Hsiao, A. Bricker, and R. Rasmussen, The GAMMA Database Machine Project, IEEE Transactions on Knowledge and Data Engineering, Vol. 2, No. 1, March 1990 (*)
  2. D. J. DeWitt and J. Gray, Parallel Database Systems: The Future of High Performance Database Processing, Communications of the ACM, June 1992 (*)
  3. Goetz Graefe. Encapsulation of Parallelism in the Volcano Query Processing System. Proc. SIGMOD Conference, 1990, 102-111. (*)
  4. R. D. Sloan, A practical implementation of the data base machine-Teradata DBC/1012, Proceedings of the Twenty-Fifth Hawaii International Conference on System Sciences, January 1992.

Sorting

Background: Common sorting algorithms (quicksort, binary sort, etc.), or a chapter on sorting from a textbook such as R&G, Chapter 13

  1. Nyberg, C., Barclay, T., Cvetanovic, Z., Gray, J., Lomet, D., AlphaSort: A RISC Machine Sort, Proceedings of the ACM SIGMOD Conference, May 1994 (*)
  2. Agarwal, R., A super scalar sort algorithm for RISC processors, Proceedings of the ACM SIGMOD Conference, June 1996
  3. Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, David E. Culler, Joseph M. Hellerstein and David A. Patterson, High-performance sorting on networks of workstations, Proceedings of the ACM SIGMOD Conference, May 1997

Benchmarking

Background: Gray's Benchmark Handbook (on the web!), Chapters 1 and 3

  1. Anon et al., "A Measure of Transaction Processing Power" (in red book), Datamation, 31(7), 1985 (*)
  2. Eisenberg, A., and Melton, J., Standards in Practice, ACM SIGMOD Record, September 1998

Data Mining

  1. Agrawal, R. and R. Srikant, Fast Algorithms for Mining Association Rules, Proceedings of the VLDB Conference, 1994 (*)
  2. Rakesh Agrawal, Tomasz Imielinski and Arun Swami, Mining Association Rules Between Sets of Items in Large Databases, Proceedings of the ACM SIGMOD Conference, May 1993
  3. J. Gray, S. Chaudhuri, A. Bosworth, A. Layman, D. Reichart, M. Venkatrao, F. Pellow, and H. Pirahesh, Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals, Data Mining and Knowledge Discovery 1, 29, 1997
  4. M. Mehta, R. Agrawal and J. Rissanen, SLIQ: A Fast Scalable Classifier for Data Mining, Proceedings of the Fifth International Conference on Extending Database Technology, Avignon, France, March 1996

Object-Oriented and Object-Relational Database Systems

Background: Atkinson, M., et al., The Object-Oriented Database System Manifesto (HTML version), First International Conference on Deductive and Object-Oriented Databases, Kyoto, Japan, 1989 (also appeared in Proceedings of ACM SIGMOD 1990)

  1. Lamb et al., The ObjectStore System, Communications of the ACM, 34(10), 1991 (*)
  2. Stonebraker, M. and Hellerstein, J., What Goes Around Comes Around, unpublished manuscript
  3. Stonebraker, M., "Inclusion of New Types in Relational Database Systems", Proceedings of the IEEE Conference on Data Engineering, 1986 (*)

Disks

  1. Gray, J., and Graefe, G., The Five Minute Rule Ten Years Later and Other Computer Storage Rules of Thumb, ACM SIGMOD Record, December 1997 (*)
  2. David A. Patterson, Garth A. Gibson, Randy H. Katz. A Case for Redundant Arrays of Inexpensive Disks (RAID). Proc. SIGMOD Conference, 1988, 109-116. (*)
  3. Ruemmler, C., and Wilkes, J., An Introduction to Disk Drive Modelling, IEEE Computer, 27(3), March 1994
  4. Gray, J., and Shenoy, P., Rules of Thumb in Data Engineering, Proceedings of the IEEE Conference on Data Engineering, 2000.

DBMS on new hardware

  1. A. Ailamaki, D. J. DeWitt, M. D. Hill, and D. A. Wood, DBMSs on a Modern Processor: Where Does Time Go?, Proceedings of the VLDB Conference, September 1999 (*)
  2. A. Ailamaki, D. J. DeWitt, M. D. Hill, and M. Skounakis, Weaving Relations for Cache Performance, Proceedings of the VLDB Conference, September 2001 (*)
  3. M. Shao, J. Schindler, S. W. Schlosser, A. Ailamaki, and G. R. Ganger, Clotho: Decoupling Memory Page Layout from Storage Organization, Proceedings of the 30th VLDB Conference, Toronto, Canada, August 2004 (*)
  4. K. Keeton, D. A. Patterson, Y. Q. He, R. C. Raphael, and W. E. Baker, Performance Characterization of a Quad Pentium Pro SMP Using OLTP Workloads, Proceedings of the 25th International Symposium on Computer Architecture, Barcelona, Spain, June 1998
  5. J. L. Lo, L. A. Barroso, S. J. Eggers, K. Gharachorloo, H. M. Levy, and S. S. Parekh. An analysis of database workload performance on simultaneous multithreaded processors, Proceedings of the 25th Annual International Symposium on Computer Architecture, pages 39-50, June 1998.
  6. Parthasarathy Ranganathan, Kourosh Gharachorloo, Sarita V. Adve, and Luiz Andre Barroso, Performance of Database Workloads on Shared-Memory Systems with Out-of-Order Processors, Proceedings of ASPLOS, pages 307-318, 1998.
  7. Luiz Andre Barroso, Kourosh Gharachorloo, and Edouard Bugnion, Memory System Characterization of Commercial Workloads, Proceedings of the 25th Annual International Symposium on Computer Architecture, pages 3-14, June 1998.

Stream DBs

  1. Motwani et al., Query Processing, Resource Management, and Approximation in a Data Stream Management System. CIDR 2003. (*)
  2. Abadi et al., Aurora: A New Model and Architecture for Data Stream Management. VLDB Journal, August 2003. (*)

Main-memory Database Systems

  1. Hector Garcia-Molina and Kenneth Salem, Main memory database systems: an overview, TKDE 4(6), 1992 (*)
  2. Philip Bohannon et al., The architecture of the Dali main-memory storage manager, Journal of Multimedia Tools and Applications, 1997 (*)

    More references are available in Mengzhi's last slide.

Self-tuning Database systems

  1. Chaudhuri, S. and Weikum, G. Rethinking Database System Architecture: Towards a Self-tuning RISC-style Database System, Proceedings of the VLDB conference, September 2000 (*)

Distributed and Mobile Database Systems

  1. Swarup Acharya, Rafael Alonso, Michael J. Franklin, and Stanley B. Zdonik, Broadcast Disks: Data Management for Asymmetric Communications Environments, Proceedings of SIGMOD Conference 1995 (*)
  2. Daniel Barbara, Mobile Computing and Databases - A Survey, TKDE 11(1), 108-117, 1999 (*)

Advanced Query Optimization

  1. Markl, V., and Lohman, G., Learning Table Access Cardinalities with LEO, Proceedings of the ACM SIGMOD Conference, Madison, WI, 2002
  2. Kabra, N., and DeWitt, D., Efficient Mid-query Reoptimization of Sub-optimal Query Execution Plans, Proceedings of the ACM SIGMOD Conference, Seattle, WA, 1998
  3. T. K. Sellis. Multiple Query Optimization. ACM Transactions on Database Systems, 13(1):23-52, March 1988 (*)
  4. N. N. Dalvi, S. K. Sanghai, P. Roy, and S. Sudarshan. Pipelining in Multi-Query Optimization. In Proc. PODS, 2001

Web Data Management

  1. Amiri et al., DBProxy: A dynamic data cache for Web applications, IEEE Conference on Data Engineering, 2003. (*)
  2. Labrinidis and Roussopoulos, WebView Materialization, Proceedings of the ACM SIGMOD Conference, May 2000. (*)

Semistructured Data and XML

Background:S. Abiteboul, P. Buneman, D. Suciu, Data on the Web: From Relations to Semistructured Data and XML, Morgan Kaufmann, 2000. Chapters/sections: 2, 3.1, 3.2.1, 3.2.2, 3.3.1, 4.1, 4.2

  1. D. Suciu, An Overview of Semistructured Data, SIGACT News, 29(4):28-38, Dec 1998 (*)
  2. J. Shanmugasundaram, G. He, K. Tufte, C. Zhang, D. DeWitt, J. Naughton, Relational Databases for Querying XML Documents: Limitations and Opportunities, Proceedings of the 25th VLDB, 302-314, 1999 (*)
  3. J. Shanmugasundaram, E. Shekita, R. Barr, M. J. Carey, B. G. Lindsay, H. Pirahesh, B. Reinwald, Efficiently Publishing Relational Data as XML Documents, Proceedings of the 26th VLDB, 65-67, 2000
  4. Angela Bonifati, Stefano Ceri, Comparative Analysis of Five XML Query Languages, SIGMOD Record, vol 28, p. 68-79, March, 2000

Automating Physical Database Design

  1. Agrawal S., Chaudhuri S., Narasayya V., Automated Selection of Materialized Views and Indexes for SQL Databases, VLDB, 2000. (*)
  2. Agrawal S., Chaudhuri S., Das A., Narasayya V., Automating Layout of Relational Databases, ICDE, 2003.
  3. Chaudhuri S., Gupta A.K., Narasayya V., Compressing SQL Workloads, SIGMOD, 2002.
  4. Rao J., Zhang C., Lohman G., Megiddo N., Automating Physical Database Design in a Parallel Database System, SIGMOD, 2002.

Sensor Databases and Adaptive Query Processing

  1. Joseph M. Hellerstein, Peter J. Haas, and Helen J. Wang, Online Aggregation, SIGMOD, 1997. (*)
  2. Tolga Urhan and Michael J. Franklin, XJoin: A Reactively-Scheduled Pipelined Join Operator, Special issue on Adaptive Query Processing, 23(2), June 2000 (*)
Supplemental reading (not required):
  1. Joseph M. Hellerstein, Michael J. Franklin, Sirish Chandrasekaran, Amol Deshpande, Kris Hildrum, Sam Madden, Vijayshankar Raman, Mehul A. Shah, Adaptive Query Processing: Technology in Evolution, Special issue on Adaptive Query Processing, 23(2), June 2000
  2. Luc Bouganim, Françoise Fabret, and C. Mohan, A Dynamic Query Processing Architecture for Data Integration Systems, Special issue on Adaptive Query Processing, 23(2), June 2000
  3. Zachary G. Ives, Alon Y. Levy, Daniel S. Weld, Daniela Florescu, Marc Friedman, Adaptive Query Processing for Internet Applications, Special issue on Adaptive Query Processing, 23(2), June 2000

Data Streams

  1. Abadi et al. (New England group), Aurora: A New Model and Architecture for Data Stream Management. VLDB Journal, August 2003. (*)
  2. Motwani et al. (Stanford group), Query Processing, Resource Management, and Approximation in a Data Stream Management System. CIDR 2003.
Supplemental reading (not required):
  1. Chandrasekaran et al. (Berkeley group), TelegraphCQ: Continuous Dataflow Processing for an Uncertain World. CIDR 2003.
  2. Golab and Ozsu. Issues in Data Stream Management. SIGMOD Record, June 2003.
  3. Chen et al., NiagaraCQ: A Scalable Continuous Query System for Internet Databases. SIGMOD 2000.
  1. Carney et al. Operator Scheduling in a Data Stream Manager. VLDB 2003. (*)
  2. Babcock, Babu, Datar and Motwani. Chain: Operator Scheduling for Memory Minimization in Data Stream Systems. SIGMOD 2003.
Supplemental reading (not required):
  1. Hammad, Franklin, Aref and Elmagarmid. Scheduling for Shared Window Joins over Data Streams. VLDB 2003.
Supplementary Texts