Carnegie Mellon University
15-826 Multimedia Databases and Data Mining
Spring 2005 - C. Faloutsos
Reading List
Required text
- [Textbook] C. Faloutsos, Searching Multimedia Databases by Content,
Kluwer Academic Press, 1996. Evaluation draft (internal to CMU - gzipped
postscript).
Recommended text
- [HK] Jiawei Han and Micheline Kamber, Data Mining: Concepts
and Techniques, Morgan Kaufmann, 2000.
- [PTVF] William H. Press, Saul A. Teukolsky, William T. Vetterling and
Brian P. Flannery, Numerical Recipes in C, Cambridge University Press,
1992, 2nd Edition. On-line evaluation copy.
- Undergraduate DB textbook, for those who took a DB class too long ago:
- Raghu Ramakrishnan, Johannes Gehrke, "Database Management
Systems," McGraw-Hill 2002 (3rd ed).
Foils: in pdf files
A. Multimedia Indexing
- Primary key access methods
- Secondary key and spatial access methods
- A. Guttman, R-Trees: a Dynamic Index Structure for Spatial Searching,
Proc. ACM SIGMOD, June 1984, pp. 47-57, Boston, Mass.
- J. Orenstein, Spatial Query Processing in an Object-Oriented Database
System, Proc. ACM SIGMOD, May 1986, pp. 326-336, Washington D.C.
- Textbook, chapters 4 and 5.
- Fractals
- Ibrahim Kamel and Christos Faloutsos, Hilbert R-tree: An
improved R-tree using fractals, Proc. of VLDB Conference, Santiago,
Chile, Sept. 12-15, 1994, pp. 500-509. (postscript here)
- Christos Faloutsos and Ibrahim Kamel, Beyond Uniformity and
Independence: Analysis of R-trees Using the Concept of Fractal Dimension,
Proc. ACM SIGACT-SIGMOD-SIGART PODS, May 1994, pp. 4-13, Minneapolis,
MN. (postscript here)
- Text and LSI
- Time sequences
- DSP and image databases
- Myron Flickner, Harpreet Sawhney, Wayne Niblack, Jon Ashley,
Qian Huang, Byron Dom, Monika Gorkani, Jim Hafner, Denis Lee, Dragutin
Petkovic, David Steele and Peter Yanker, Query by Image and Video
Content: the QBIC System, IEEE Computer, 28, 9, Sep. 1995, pp. 23-32.
(hard copy - in reserve)
- Journal of Intelligent Inf. Systems, 3, 3/4, pp. 231-262, 1994. (An
earlier, more technical version of the IEEE Computer '95 paper.)
- FastMap: Textbook chapter 11; also in: C. Faloutsos and K.I. Lin,
FastMap: A Fast Algorithm for Indexing, Data-Mining and Visualization of
Traditional and Multimedia Datasets, ACM SIGMOD 95, pp. 163-174.
(postscript of paper)
- DFT/DCT: In PTVF ch. 12.1, 12.3, 12.4; in Textbook Appendix B.
- Wavelets: In PTVF ch. 13.10; in Textbook Appendix C.
- Karhunen-Loeve: in Textbook Appendix D.
- JPEG: Gregory K. Wallace, The JPEG Still Picture Compression Standard,
CACM, 34, 4, April 1991, pp. 31-44.
- MPEG: D. Le Gall, MPEG: a Video Compression Standard for Multimedia
Applications, CACM, 34, 4, April 1991, pp. 46-58.
- Fractal compression: M.F. Barnsley and A.D. Sloan, A Better
Way to Compress Images, BYTE, Jan. 1988, pp. 215-223. (hard copy - in
reserve)
- Textbook, chapter 9
B. Data mining
- Graph mining and social networks:
- Michalis Faloutsos, Petros Faloutsos and Christos Faloutsos, On
Power-Law Relationships of the Internet Topology, SIGCOMM 1999.
- R. Albert, H. Jeong, and A.-L. Barabási, Diameter of the World Wide
Web, Nature, 401, 130-131 (1999).
- Réka Albert and Albert-László Barabási, Statistical mechanics of
complex networks, Reviews of Modern Physics, 74, 47 (2002).
- Time series forecasting
- Statistics background: In PTVF pp. 620-621 and ch. 14.4-14.5.
- AI background / Classification
- [HK] chapter 7.3
- Rakesh Agrawal, Sakti Ghosh, Tomasz Imielinski, Bala Iyer and
Arun Swami, An Interval Classifier for Database Mining Applications,
VLDB Conf. Proc., Vancouver, BC, Canada, Aug. 1992, pp. 560-573.
- M. Mehta, R. Agrawal and J. Rissanen, SLIQ: A Fast Scalable
Classifier for Data Mining, Proc. of the Fifth Int'l Conference on
Extending Database Technology, Avignon, France, March 1996. (postscript)
- Data Mining in Databases:
- Data warehouses, OLAP and DataCubes: [HK], ch. 2.
- Data reduction: [HK] chapter 3.4.
- Association Rules:
- Cluster analysis: [HK] chapter 8.
- Miscellaneous (ICA, approximate counting)
- Jia-Yu Pan, Christos Faloutsos, Masafumi Hamamoto and Hiroyuki
Kitagawa, AutoSplit: Fast and Scalable Discovery of Hidden Variables in
Stream and Multimedia Databases, PAKDD, Sydney, Australia, May 2004.
- Christopher Palmer, Phillip Gibbons and Christos Faloutsos, ANF:
A Fast and Scalable Tool for Data Mining in Massive Graphs, KDD
2002, Edmonton, Alberta, Canada, July 2002.
RECOMMENDED OPTIONAL READING
Additional optional citations that may be useful for your project:
Multimedia indexing
- Spatial access methods:
- N. Beckmann, H.-P. Kriegel, R. Schneider and B. Seeger, The
R*-Tree: an Efficient and Robust Access Method for Points and Rectangles,
ACM SIGMOD, May 1990, pp. 322-331, Atlantic City, NJ. (Deferred
splitting in R-trees)
- Fractals
- B. Mandelbrot, Fractal Geometry of Nature, W.H. Freeman,
1977. (The classic book on fractals)
- Manfred Schroeder, Fractals, Chaos, Power Laws: Minutes
From an Infinite Paradise, W.H. Freeman and Company, 1991. (An
excellent introduction to fractals)
Data mining
- Time sequences
- George E.P. Box, Gwilym M. Jenkins and Gregory C. Reinsel, Time
Series Analysis: Forecasting and Control, Prentice Hall, 1994 (3rd
Edition). (Time series forecasting - the classic approach. It also has
the algorithms for linear predictive coding.)
- Andreas S. Weigend and Neil A. Gershenfeld, Time Series
Prediction: Forecasting the Future and Understanding the Past,
Addison Wesley, 1994. (Time series forecasting: non-linear/chaotic
approaches)
- Graph mining:
- Tom Mitchell, Machine Learning, McGraw Hill, 1997.
- John Ross Quinlan, C4.5: Programs for Machine Learning,
Morgan Kaufmann Publishers Inc., 1993. (Introduction to data mining,
with source code)
- Approximate counting (NEW: 4/25/2005)
- Efficient and Tunable Similar Set Retrieval, by Aristides
Gionis, Dimitrios Gunopulos and Nikos Koudas, ACM SIGMOD, Santa
Barbara, California, May 21-24, 2001.
- New sampling-based summary statistics for improving approximate query
answers, by Phillip B. Gibbons and Yossi Matias, ACM
SIGMOD, pp. 331-342, Seattle, Washington, 1998.
Updated 4/25/2005 - by christos <at> cs.cmu.edu