Carnegie Mellon University
10-603/15-826(A): Multimedia Databases and Data Mining
Spring 2002 - C. Faloutsos
Syllabus
DESCRIPTION
The course covers advanced algorithms for learning, analysis, data management
and visualization of large datasets. Topics include indexing for text and
DNA databases, searching medical and multimedia databases by content, fundamental
signal processing methods, compression, fractals in databases, data mining,
privacy and security issues, rule discovery and data visualization.
TOPICS TO BE COVERED
- Database topics:
  - Traditional databases: advanced hashing and multi-key access methods, for
    main-memory and for disk-based data.
  - Text databases: indexing text and DNA strings, clustering, information
    filtering, LSI (singular value decomposition).
  - Multimedia databases: searching by content in signals: time
    sequences, photographs and medical images, video clips, feature extraction,
    continuous media storage and delivery.
- Tools:
  - Fundamental signal processing methods: Discrete Fourier Transform, wavelets,
    JPEG and MPEG compression.
  - Singular Value Decomposition: revisited.
  - Fractals in databases: self-similarity/non-uniformity of real datasets,
    fractal dimensions, selectivity using fractals and multifractals, fractal
    image compression, self-similarity in web-traffic patterns.
- Data Mining:
  - Review of statistical methods.
  - Review of AI methods.
  - Database methods - massive datasets: association rules; frequent sets;
    single-pass learning algorithms; information compression and
    reconstruction; sampling; condensed data representations; datacubes; cube-trees;
    function finding.
- Security and Privacy Protection: Datafly, Scrub, Mu-Argus, and k-Similar.
- Visualization of large data sets.
- OVERVIEW OF RECENT TOPICS: Mobile databases; Active Disks for data mining;
  Web databases; Future directions.
PREREQUISITES: Introductory database course 15-415 (familiarity
with B-trees and Hashing), or permission of the instructor.
UNIVERSITY UNITS: 12
CORE UNITS: 1
TEXT
Copies of the instructor's transparencies and notes, as well as copies of selected
articles, will be made available. The required text is
Recommended, but not required, texts:
- William H. Press, Saul A. Teukolsky, William T. Vetterling and Brian P.
  Flannery, Numerical Recipes in C, 2nd edition, Cambridge University Press, 1992.
- Henry F. Korth and Abraham Silberschatz, Database System Concepts, 2nd edition,
  McGraw Hill Inc., 1991.
- Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques,
  Morgan Kaufmann, 2000.
METHOD OF EVALUATION
The course involves:
- A midterm (20%)
- Homeworks (10%)
- A project (40%)
- A final exam (30%)
Projects will be carried out in teams of 1-3 students. A detailed handout about
the project will be distributed at the beginning of the course, along with
a list of suggested projects. The goal of the project is to give the participants
the opportunity to tackle a large, interesting problem, which may lead
to a publication.