Carnegie Mellon University
15-826 Multimedia Databases and Data Mining
Spring 2008 - C. Faloutsos
Final Exam Study Guide
Reminders:
- Exam duration: 3 hours, on Tue May 13, 8:30am-11:30am, in PH A18A (double-check at http://www.cmu.edu/hub/current-finals.pdf)
- All aids allowed, EXCEPT laptops (because of their wireless
connections)
- Please bring a calculator
- The exam will be comprehensive, with more emphasis on
the material after the midterm
- Extra office hours by instructor: Wed 5/7, 2-3pm; Fri 5/9,
12-1pm
For your information:
- Several of the links are internal to CMU.
- The reading list below is a slightly modified version of the
original reading list. Namely, we are omitting DataCubes, OLAP,
ICA and approximation algorithms.
Required text
Recommended text
- [HK] Jiawei Han and Micheline Kamber, Data Mining: Concepts
and Techniques, Morgan Kaufmann, 2000.
- [PTVF] William H. Press, Saul A. Teukolsky, William T. Vetterling
and Brian P. Flannery, Numerical Recipes in C, Cambridge University
Press, 1992, 2nd Edition.
On-line evaluation copy
- Undergraduate DB textbook, for
those who took a db class too long ago:
Foils:
In pdf, from the course schedule
page.
A. Multimedia Indexing
- Primary key access methods
- Secondary key and spatial access methods
- A. Guttman,
R-Trees: a Dynamic Index Structure for Spatial
Searching, Proc. ACM SIGMOD, June 1984, pp. 47-57, Boston,
Mass.
- J. Orenstein,
Spatial Query Processing in an Object-Oriented Database
System, Proc. ACM SIGMOD, May 1986, pp. 326-336,
Washington D.C.
- Ibrahim Kamel and Christos Faloutsos,
Hilbert R-tree: An improved R-tree using fractals, Proc.
of VLDB Conference, Santiago, Chile, Sept. 12-15, 1994, pp.
500-509.
- **NEW 2/3/2008** Roberto F. Santos Filho, Agma Traina, Caetano
Traina Jr., and Christos Faloutsos,
Similarity search without tears: the OMNI family of all-purpose
access methods, ICDE, Heidelberg, Germany, April 2-6,
2001.
- Textbook, chapters 4 and 5.
- Fractals
- Christos Faloutsos and Ibrahim Kamel,
Beyond Uniformity and Independence: Analysis of R-trees Using
the Concept of Fractal Dimension, Proc. ACM
SIGACT-SIGMOD-SIGART PODS, May 1994, pp. 4-13, Minneapolis,
MN.
- **NEW 2/5/2008** Alberto Belussi and Christos Faloutsos, Estimating
the Selectivity of Spatial Queries Using the `Correlation' Fractal
Dimension, Proc. of VLDB, pp. 299-310, 1995 (and
gzipped postscript)
- Power laws, lognormals etc.: M. E. J. Newman, Power laws, Pareto
distributions and Zipf's law, Contemporary Physics 46, 323-351
(2005) (local pdf copy)
- Text and LSI
- Textbook, chapter 6
- Peter W. Foltz and Susan T. Dumais,
Personalized Information Delivery: an Analysis of Information
Filtering Methods, Comm. of ACM (CACM), 35, 12, Dec. 1992,
pp. 51-60.
- SVD: In PTVF ch. 2.6; Textbook Appendix D
- PageRank: Sergey Brin and Lawrence Page, The Anatomy of a
Large-Scale Hypertextual Web Search Engine (1998) (local pdf)
- HITS: Jon M. Kleinberg, Authoritative Sources in a
Hyperlinked Environment, JACM, 46, 5 (1999) (local pdf)
- Tensors: Tamara G. Kolda and Brett W. Bader.
Tensor decompositions and applications. Technical Report
SAND2007-6702, Sandia National Laboratories, Albuquerque, NM and
Livermore, CA, November 2007 (local pdf copy)
- Time sequences
- DSP and image databases
- Myron Flickner, Harpreet Sawhney, Wayne Niblack, Jon Ashley,
Qian Huang, Byron Dom, Monika Gorkani, Jim Hafner, Denis Lee,
Dragutin Petkovic, David Steele and Peter Yanker,
Query by Image and Video Content: the QBIC System, IEEE
Computer 28, 9, Sep. 1995, pp. 23-32.
- Journal of Intelligent Inf. Systems, 3, 3/4, pp. 231-262, 1994: an
earlier, more technical version of the IEEE Computer '95
paper.
- FastMap: Textbook chapter 11; also in: C.
Faloutsos and K.I. Lin,
FastMap: A Fast Algorithm for Indexing, Data-Mining and
Visualization of Traditional and Multimedia Datasets, ACM
SIGMOD 95, pp. 163-174.
- DFT/DCT: In PTVF ch. 12.1, 12.3, 12.4; in
Textbook Appendix B.
- Wavelets: In PTVF ch. 13.10; in Textbook Appendix C
- Karhunen-Loeve: in Textbook Appendix D.
- JPEG: Gregory K. Wallace,
The JPEG Still Picture Compression Standard, CACM, 34,
4, April 1991, pp. 31-44.
- MPEG: D. Le Gall,
MPEG: a Video Compression Standard for Multimedia
Applications, CACM, 34, 4, April 1991, pp. 46-58.
- Fractal compression: M.F. Barnsley and A.D. Sloan,
A Better Way to Compress Images, BYTE, Jan. 1988, pp.
215-223.
- Textbook, chapter 9
B. Data mining
- Graph mining and social networks:
- Michalis Faloutsos, Petros Faloutsos and Christos Faloutsos,
On Power-Law Relationships of the Internet Topology,
SIGCOMM 1999.
- R. Albert, H. Jeong, and A.-L. Barabási,
Diameter of the World Wide Web, Nature,
401, 130-131 (1999).
- Réka Albert and Albert-László Barabási,
Statistical mechanics of complex networks, Reviews of
Modern Physics, 74, 47 (2002).
- Jure Leskovec, Jon Kleinberg and Christos Faloutsos, Graphs
over Time: Densification Laws, Shrinking Diameters and Possible
Explanations, KDD 2005, Chicago, IL, USA, 2005.
- D. Chakrabarti and C. Faloutsos, Graph Mining: Laws, Generators and
Algorithms, in ACM Computing Surveys, 38(1), 2006
(pdf draft, internal to CMU)
- Statistics background: In PTVF pp. 620-621
and ch. 14.4-14.5.
- AI background / Classification
- Rakesh Agrawal, Sakti Ghosh, Tomasz Imielinski, Bala Iyer and
Arun Swami,
An Interval Classifier for Database Mining Applications,
VLDB Conf. Proc., Vancouver, BC, Canada, Aug. 1992, pp.
560-573.
- M. Mehta, R. Agrawal and J. Rissanen,
SLIQ: A Fast Scalable Classifier for Data Mining, Proc. of
the Fifth Int'l Conference on Extending Database Technology,
Avignon, France, March 1996.
- Data Mining in Databases:
Last modified: April 30, 2008, by Christos
Faloutsos