Carnegie Mellon University
15-826: Multimedia Databases and Data Mining
Spring 2017 - C. Faloutsos
The course covers advanced algorithms for learning, analysis, data
management and visualization of large datasets. Topics include
indexing for text and DNA databases, searching medical and
multimedia databases by content, fundamental signal processing
methods, compression, fractals in databases, data mining, privacy
and security issues, rule discovery, data visualization, graph
mining, and stream mining.
PREREQUISITES: Introductory database course 15-415/615
(familiarity with B-trees and hashing), or permission of the
instructor.
TOPICS TO BE COVERED
- Database topics:
- Traditional databases: Advanced hashing and multi-key access
methods, for main-memory and for disk-based data.
- Text databases: indexing text and DNA strings,
clustering, information filtering, LSI (singular value decomposition).
- Multimedia databases: Searching by content in signals:
Time sequences, photographs and medical images, video clips,
feature extraction, continuous media storage and
- Fundamental signal processing methods: Discrete Fourier
Transform, wavelets, JPEG and MPEG compression.
- Singular Value Decomposition, revisited.
- Fractals in databases: Self-similarity/non-uniformity of real
datasets, fractal dimensions, selectivity using fractals and
multifractals, fractal image compression, self-similarity in
- Data Mining:
- Graph mining: "Laws" in large graphs (power laws; "small
world" phenomena); graph generators; social networks.
- Sensor and time series mining: linear and non-linear
- Review of Statistical methods,
- Review of AI-methods,
- Database methods - Massive datasets: Association rules;
Frequent sets; Single-pass learning algorithms;
Information compression and reconstruction; Sampling; Condensed
data representations; Datacubes; Cube-trees; Function finding.
- Security and Privacy Protection.
- Visualization of large data sets
- More tools: approximate counting algorithms; Independent
- OVERVIEW OF RECENT TOPICS: trust and influence propagation;
UNIVERSITY UNITS: 12
CORE UNITS: 1
Copies of the instructor's transparencies and notes, as well as copies
of selected articles, will be made available. The required text is
Recommended, but not required, texts:
- William H. Press, Saul A. Teukolsky, William T. Vetterling and
Brian P. Flannery, Numerical Recipes in C, Cambridge University
Press, 1992, 2nd edition.
- Raghu Ramakrishnan, Johannes Gehrke, "Database Management
Systems," McGraw-Hill 2002 (3rd ed).
- Jiawei Han and Micheline Kamber, Data Mining: Concepts and
Techniques.
METHOD OF EVALUATION
The course involves:
- A midterm (20%)
- Homework (10%; hw1: 1%, hw2, hw3, hw4: 3% each)
- A project (40%)
- A final exam (30%)
- Projects will be carried out in teams of 2. A detailed
handout about the project will be distributed at the beginning of
the course, along with a list of suggested projects. The goal of
the project is to give participants the opportunity to tackle a
large, interesting problem, which may lead to a publication and/or
a large software system.
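As a quick check of the weights listed under METHOD OF EVALUATION, the
following minimal Python sketch confirms that the components sum to 100%
and that the homework breakdown adds up to 10%. The helper name
course_score, the dictionary keys, and the 0-100 scale for each component
are assumptions made for illustration; the syllabus gives only the
percentages, not how scores are combined.

```python
# Weights as listed in the syllabus (percent of the final grade).
WEIGHTS = {"midterm": 20, "homework": 10, "project": 40, "final": 30}

# Homework breakdown as listed: hw1 is 1%, hw2 to hw4 are 3% each.
HOMEWORK_WEIGHTS = {"hw1": 1, "hw2": 3, "hw3": 3, "hw4": 3}


def course_score(scores):
    """Weighted average of component scores (hypothetical helper).

    Assumes every component score is on a 0-100 scale; that scale is an
    assumption, not something the syllabus states.
    """
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Sanity checks on the listed percentages.
assert sum(WEIGHTS.values()) == 100
assert sum(HOMEWORK_WEIGHTS.values()) == WEIGHTS["homework"]

# Made-up scores, for illustration only.
example = {"midterm": 80, "homework": 90, "project": 85, "final": 70}
print(course_score(example))  # prints 80.0
```

With these made-up scores the project alone contributes 34 of the 80
points, which reflects its 40% weight.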
Last updated: Jan. 15, 2017, by Christos