Carnegie Mellon University
15-826 Multimedia Databases and Data Mining
Fall 2011 - C. Faloutsos

List of suggested projects

The projects are grouped according to their general theme. We also list the data and software available, to leverage your effort. More links and resources may be added in the future. Reminders:

SUGGESTED TOPICS

0. HADOOP AND PARALLELISM

The projects below are mainly designed for a traditional, single-machine architecture. However, 'Hadoop' allows relatively easy parallel execution, since it implements Google's map-reduce system [Dean + Ghemawat, OSDI'04]. 'Hadoop' is open source; we have a small cluster where we can give you an account, or make some other arrangement.
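To illustrate the programming model, here is a minimal single-process sketch in Python. It is only an illustration: it does not use Hadoop's actual (Java) API, and the names `map_reduce`, `degree_mapper`, and `degree_reducer` are hypothetical. The example computes node degrees from an edge list, a typical first step in graph mining. On a real Hadoop cluster the map and reduce calls run in parallel on many machines, and the framework performs the shuffle (grouping by key).

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Toy, sequential simulation of the map-reduce model (not Hadoop's API)."""
    groups = defaultdict(list)
    for record in records:
        # Map phase: each input record emits zero or more (key, value) pairs.
        for key, value in mapper(record):
            # Shuffle phase: collect all values that share the same key.
            groups[key].append(value)
    # Reduce phase: combine the value list of each key into one result.
    return {key: reducer(key, values) for key, values in groups.items()}


def degree_mapper(edge):
    # An undirected edge (u, v) adds one to the degree of each endpoint.
    u, v = edge
    yield u, 1
    yield v, 1


def degree_reducer(node, counts):
    # The degree of a node is the number of edge endpoints emitted for it.
    return sum(counts)


edges = [(1, 2), (1, 3), (2, 3), (3, 4)]
degrees = map_reduce(edges, degree_mapper, degree_reducer)
print(degrees)  # {1: 2, 2: 2, 3: 3, 4: 1}
```

Because each mapper sees only one edge and each reducer sees only one node's values, both phases can be split across machines; this independence is what makes the model easy to parallelize.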


1. HADOOP AND LARGE GRAPH MINING

1.1. [P] Large/parallel graph mining, possibly using 'hadoop'

 


1.2. [P] Anomaly detection in weighted and/or attributed graphs


1.3. [P] Weighted graphs over time


1.4. Graph similarity, summarization and approximation.



1.5. [P] Attention Routing


2. GRAPH GENERATORS

2.1. [P] Model fitting for the 'RTG'



2.2. 'PaC' model for graph generation


3. VIRUS, TWEET, AND INFLUENCE PROPAGATION

3.1. [P] Shape and timing of cascades - 'rise and fall'


3.2. [P] Competing viruses



4. SPATIO/TEMPORAL AND STREAM MINING

4.1. Co-evolving time series mining

 

DATASETS

Unless explicitly mentioned, the datasets are either 'public' or 'owned' by the instructor; for the rest, we need to discuss non-disclosure agreements (NDAs).

Time sequences

Spatial data

Graph data

Miscellaneous:


SOFTWARE

Note on the software: before you modify any code, please contact the instructor; ideally, we would like to use these packages as black boxes.


BIBLIOGRAPHICAL RESOURCES:


Last modified Sept. 20, 2011, by Christos Faloutsos.