Carnegie Mellon University
15-826 Multimedia Databases and Data Mining
Spring 2007 - C. Faloutsos

List of suggested projects

The projects are grouped by general theme. We also list the available data and software, to help you get the most out of your effort. More links and resources may be added in the future. Reminders:

SUGGESTED TOPICS

1. SPATIO/TEMPORAL AND STREAM MINING

1.1. [*] Disk access traffic patterns, and the Self-* project

1.2. [*] Similarity search in motion-capture sequences

2. GRAPHS - LARGE GRAPH MINING

2.1. Handling Large Graphs

2.2. Parallel graph mining

2.3. Finding frequent sub-graphs

2.4. Fast implementations of RWR (for gCap)

2.5. Large Graph Visualization

2.6. [*] Relational databases as graphs, and 'fuzzy queries'.

2.7. [*] eBay fraud detection

3. GRAPHS - INFLUENCE PROPAGATION, GENERATORS, MODELS

3.1. [*] Propagation of Influence/Information in Networks and weblogs ('blogs')

3.2. Virus propagation - SIS model

3.3. Virus propagation - SIR model

3.4. Finding models for Time Evolving Graphs

3.5. Generation of Realistic Labeled Graphs

4. MULTIMEDIA - BIOLOGICAL IMAGES

4.1. Indexing and Clustering for bio images

4.2. Distance function for 3-d protein images

5. MISCELLANEOUS

5.1. Fraud detection in on-line auctions - hijacked accounts

5.2. Auction fraud - detecting networks of 1-cent auctions


DATASETS

Unless explicitly stated otherwise, the datasets are either 'public' or 'owned' by the instructor; for the rest, we need to discuss 'Non-disclosure agreements' (NDAs).

Time sequences

Spatial data

Images/video

Graph-like data

Miscellaneous

SOFTWARE

Notes for the software: Before you modify any code, please contact the instructor - ideally, we would like to use these packages as black boxes.

BIBLIOGRAPHICAL RESOURCES:


Last modified Feb. 6, 2007, by Christos Faloutsos.