Carnegie Mellon University
15-826: Multimedia Databases and Data Mining
Spring 2005 - C. Faloutsos
Syllabus
DESCRIPTION
The course covers advanced algorithms for learning, analysis, data management,
and visualization of large datasets. Topics include indexing for text and DNA
databases, searching medical and multimedia databases by content, fundamental
signal processing methods, compression, fractals in databases, data mining,
privacy and security issues, rule discovery, data visualization, graph mining,
and stream mining.
TOPICS TO BE COVERED
- Database topics:
  - Traditional databases: Advanced hashing and multi-key access methods,
    for main-memory and for disk-based data.
  - Text databases: indexing text and DNA strings, clustering, information
    filtering, LSI (singular value decomposition).
  - Multimedia databases: Searching by content in signals: time sequences,
    photographs and medical images, video clips, feature extraction,
    continuous media storage and delivery.
- Tools:
  - Fundamental signal processing methods: Discrete Fourier Transform,
    wavelets, JPEG and MPEG compression.
  - Singular Value Decomposition: revisited.
  - Fractals in databases: Self-similarity/non-uniformity of real datasets,
    fractal dimensions, selectivity using fractals and multifractals,
    fractal image compression, self-similarity in web-traffic patterns.
- Data Mining:
  - Graph mining: "Laws" in large graphs (power laws; "small world"
    phenomena); graph generators; social networks.
  - Sensor and time series mining: linear and non-linear forecasting.
  - Review of statistical methods.
  - Review of AI methods.
  - Database methods - Massive datasets: Association rules; Frequent sets;
    Single-pass learning algorithms; Information compression and
    reconstruction; Sampling; Condensed data representations; Datacubes;
    Cube-trees; Function finding.
  - Security and Privacy Protection: Datafly, Scrub, Mu-Argus, and k-Similar.
  - Visualization of large data sets.
  - More tools: approximate counting algorithms; Independent Component
    Analysis.
- OVERVIEW OF RECENT TOPICS: Mobile databases; Active Disks for data mining;
  Web databases; Future directions.
PREREQUISITES: Introductory database course 15-415 (familiarity with B-trees
and hashing), or permission of the instructor.
UNIVERSITY UNITS: 12
CORE UNITS: 1
TEXT
Copies of the instructor's transparencies and notes, as well as copies of
selected articles, will be made available. The required text is
Recommended, but not required, texts:
- William H. Press, Saul A. Teukolsky, William T. Vetterling and
  Brian P. Flannery, Numerical Recipes in C, Cambridge University Press,
  1992, 2nd Edition.
- Raghu Ramakrishnan and Johannes Gehrke, Database Management Systems,
  McGraw-Hill, 2002, 3rd Edition.
- Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques,
  Morgan Kaufmann, 2000.
METHOD OF EVALUATION
The course involves
- A midterm (20%)
- Homework assignments (10%)
- A project (40%)
- A final exam (30%)
Projects will be carried out in teams of 1-3. A detailed handout about the
project will be distributed at the beginning of the course, along with a list
of suggested projects. The goal of the project is to give the participants
the opportunity to tackle a large, interesting problem, which may lead to a
publication.