CS Thesis Proposal

  • Computer Science Department
  • Carnegie Mellon University
Thesis Proposals

Large Graph Mining and Sense-making

Given a large graph, like Facebook or Gmail, what can we say about its structure? Which are the most important structures in the graph? Are there many cliques, stars and chains or are there more complex structures? What do the structures reveal? Are there anomalies?  How does the graph change over time?

This thesis focuses on developing fast algorithms and models that promote the understanding of large graphs at different granularities: node and graph level. For this purpose we go after two directions: similarity analysis, and pattern mining and modeling. For similarity analysis, we mainly concentrate on finding node affinities and roles, developing graph similarity approaches when the node correspondence is known or unknown, and using graph alignment to reveal similarities between nodes across different networks. For pattern mining, the goal is to provide the domain expert with a succinct summary of one or more graphs with billion nodes and edges, and spotlight the "important" and "interesting" (anomalous) graph patterns. For that reason, we model features in a variety of graphs, and also propose a graph summarization approach that unveils semantically interesting structures.

Applications come from the domain of neuroscience, where we study brain connectivity graphs, as well as static or time-evolving social and collaboration networks (e.g., Twitter, Facebook, DBLP, IMDB, Enron, YahooWeb).

Thesis Committee:
Christos Faloutsos (Chair)
William Cohen
Roni Rosenfeld
Eric Horvitz (Microsoft Research, Redmond)

Thesis Summary

For More Information, Please Contact: