Carnegie Mellon University
15-826 Multimedia Databases and Data Mining
Fall 2014 - C. Faloutsos

List of suggested non-default projects

The projects are grouped according to their general theme. We also list the available data and software, so that you can build on existing resources. More links and resources may be added in the future. Reminders:

SUGGESTED TOPICS

As mentioned earlier, people who take the class for their master's degree are expected to choose the default project. It is well defined, with a lot of implementation and rather predictable outcomes. See the full description of the Graph Mining project here, in pdf.
Here we give a list of suggested, non-default projects. They are more open-ended, and more suitable for people who want to do research in data mining.

1.  GRAPH / TENSOR MINING

1.1. Adversarial Spam Injection

1.2. Spam Detection for Review Data

1.3. Random walk on tensors


1.4. Tensors on Hadoop - 'sparse-3'

1.5. Tensor decomposition using RDBMS


2.  MODELING

2.1. Skewed, 2-d distributions - 'Almond-G' and extensions

2.2. 'Brain in a box'

DATASETS

Unless explicitly mentioned otherwise, the datasets are either 'public' or 'owned' by the instructor; for the rest, we need to discuss 'Non-disclosure agreements' (NDAs).

Time sequences

Spatial data

Graph data - NDA needed

Graph data - public

Miscellaneous:


SOFTWARE

Notes for the software: Before you modify any code, please contact the instructor. Ideally, we would like to use these packages as black boxes.


BIBLIOGRAPHICAL RESOURCES:


Last modified Sept. 15, 2014, by Christos Faloutsos.