Carnegie Mellon University
15-826 Multimedia Databases and Data Mining
Spring 2006 - C. Faloutsos
Final exam study guide
Reminders:
- Exam duration: 3 hours, on Friday May 12, 8:30-11:30, Wean Hall 5403.
- All aids are allowed, EXCEPT laptops (because of their wireless
connections)
- The exam will be comprehensive, with more emphasis on the
material after the midterm
- Several of the links are internal to CMU.
- The reading list below is a slightly shortened version of the
original reading list.
Required text
Recommended text
- [HK] Jiawei Han and Micheline Kamber, Data Mining: Concepts
and Techniques, Morgan Kaufmann, 2000.
- [PTVF] William H. Press, Saul A. Teukolsky,
William T. Vetterling and Brian P. Flannery, Numerical Recipes in C,
Cambridge University Press, 1992, 2nd Edition. On-line evaluation copy.
- Undergraduate DB textbook, for those who took a DB class too long ago:
- Raghu Ramakrishnan, Johannes Gehrke, "Database Management
Systems," McGraw-Hill 2002 (3rd ed).
Foils:
In pdf
A. Multimedia Indexing
- Primary key access methods
- Secondary key and spatial access methods
- A. Guttman, R-Trees: A Dynamic Index Structure for Spatial
Searching, Proc. ACM SIGMOD, June 1984, pp. 47-57, Boston, Mass.
- J. Orenstein, Spatial Query Processing in an Object-Oriented
Database System, Proc. ACM SIGMOD, May 1986, pp. 326-336,
Washington D.C.
- Textbook, chapters 4 and 5.
- Fractals
- Ibrahim Kamel and Christos Faloutsos, Hilbert R-tree: An
improved R-tree using fractals, Proc. of VLDB Conference,
Santiago, Chile, Sept. 12-15, 1994, pp. 500-509.
- Christos Faloutsos and Ibrahim Kamel, Beyond
Uniformity and Independence: Analysis of R-trees Using the Concept of
Fractal Dimension, Proc. ACM SIGACT-SIGMOD-SIGART PODS, May
1994, pp. 4-13, Minneapolis, MN.
- Text and LSI
- Time sequences
- DSP and image databases
- Myron Flickner, Harpreet Sawhney, Wayne Niblack, Jon Ashley,
Qian Huang, Byron Dom, Monika Gorkani, Jim Hafner, Denis Lee, Dragutin
Petkovic, David Steele and Peter Yanker, Query by Image and Video
Content: the QBIC System, IEEE Computer 28, 9, Sep. 1995,
pp. 23-32. (hard copy - on reserve)
- Journal of Intelligent Inf. Systems, 3, 3/4, pp. 231-262, 1994.
An earlier, more technical version of the IEEE Computer '95 paper.
- FastMap: Textbook chapter 11; also in: C. Faloutsos and
K.I. Lin, FastMap: A Fast Algorithm for Indexing, Data-Mining and
Visualization of Traditional and Multimedia Datasets,
ACM SIGMOD 95, pp. 163-174.
- DFT/DCT: In PTVF ch. 12.1, 12.3, 12.4; in Textbook Appendix B.
- Wavelets: In PTVF ch. 13.10; in Textbook Appendix C.
- Karhunen-Loeve: In Textbook Appendix D.
- JPEG: Gregory K. Wallace, The JPEG Still Picture Compression
Standard, CACM, 34, 4, April 1991, pp. 31-44.
- MPEG: D. Le Gall, MPEG: a Video Compression Standard for
Multimedia Applications, CACM, 34, 4, April 1991, pp. 46-58.
- Fractal compression: M.F. Barnsley and A.D. Sloan, A
Better Way to Compress Images, BYTE, Jan. 1988, pp. 215-223. (hard
copy: on reserve)
- Textbook, chapter 9
B. Data mining
- Graph mining and social networks:
- Michalis Faloutsos, Petros Faloutsos and Christos Faloutsos, On
Power-Law Relationships of the Internet Topology, SIGCOMM 1999.
- R. Albert, H. Jeong, and A.-L. Barabási, Diameter of
the World Wide Web, Nature, 401, 130-131 (1999).
- Réka Albert and Albert-László Barabási, Statistical
mechanics of complex networks, Reviews of Modern Physics, 74,
47 (2002).
- Time series forecasting
- Statistics background:
In PTVF pp. 620-621 and ch. 14.4-14.5.
- AI background / Classification
- [HK] chapter 7.3
- Rakesh Agrawal, Sakti Ghosh, Tomasz Imielinski, Bala Iyer and
Arun Swami, An Interval Classifier for Database Mining
Applications, VLDB Conf. Proc., Vancouver, BC, Canada, Aug. 1992,
pp. 560-573.
- M. Mehta, R. Agrawal and J. Rissanen, SLIQ:
A Fast Scalable Classifier for Data Mining, Proc. of the Fifth
Int'l Conference on Extending Database Technology, Avignon, France,
March 1996.
- Data Mining in Databases:
- Data warehouses, OLAP and DataCubes: [HK] chapter 2.
- Data reduction: [HK] chapter 3.4
- Association Rules:
- Cluster analysis: [HK] chapter 8.
- Miscellaneous (approximate counting)
- Christopher Palmer, Phillip Gibbons and Christos Faloutsos, ANF:
A Fast and Scalable Tool for Data Mining in Massive Graphs, KDD
2002, Edmonton, Alberta, Canada, July 2002
- Efficient and Tunable Similar Set Retrieval, by Aristides Gionis,
Dimitrios Gunopulos and Nikos Koudas, ACM SIGMOD, Santa Barbara,
California, May 21-24, 2001.
- New sampling-based summary statistics for improving approximate
query answers, by Phillip B. Gibbons and Yossi Matias, ACM SIGMOD,
pp. 331-342, Seattle, Washington, 1998.
Last modified May 2, 2006, by Christos Faloutsos