Carnegie Mellon University
15-826 Multimedia Databases and Data Mining
Fall 2012 - C. Faloutsos

List of suggested projects

The projects are grouped according to their general theme. We also list the available data and software, so that you can build on existing work. More links and resources may be added in the future. Reminders:

SUGGESTED TOPICS

People who take the class for their master's degree are strongly encouraged to choose one of the two default projects, with the first one being the most recommended. Both are well defined, involve a lot of implementation, and have rather predictable outcomes.

The rest of the projects are more open-ended, and they are more suitable for people who want to do research in data mining.

1. DEFAULT PROJECTS - recommended for people in M.Sc. programs.

1.1 Default project #1: UCR insect dataset


1.2 Default project #2: Graph mining using RDBMS


2. GRAPH MINING

2.1 Anomaly detection and attribution


2.2 Belief propagation in large graphs



2.3 Parallel graph mining using Hadoop


2.4 Non-negative matrix factorization with Hadoop and SGD

 



3. SPATIO/TEMPORAL AND STREAM MINING

3.1 Guess the next flu spike: Co-evolving time series mining

 

DATASETS

Unless explicitly mentioned otherwise, the datasets are either 'public' or 'owned' by the instructor; for the rest, we need to discuss 'Non-disclosure agreements' (NDAs).

Time sequences

Spatial data

Graph data

Miscellaneous:


SOFTWARE

Notes for the software: Before you modify any code, please contact the instructor - ideally, we would like to use these packages as black boxes.


BIBLIOGRAPHICAL RESOURCES:


Last modified Sept. 17, 2012, by Christos Faloutsos.