Today's social and internet networks contain millions or even billions of nodes, and copious amounts of side information (context) such as text, attribute, temporal, image and video data. A thorough analysis of a social network should consider both the graph and the associated side information, yet we also expect the algorithm to execute in a reasonable amount of time on even the largest networks.
Towards the goal of rich analysis on societal-scale networks, this thesis provides (1) modeling and algorithmic techniques for incorporating network context into existing network analysis algorithms based on statistical models, and (2) strategies for network data representation, model design, algorithm design and distributed multi-machine programming that, together, ensure scalability to very large networks.
The methods presented herein combine the flexibility of statistical models with key ideas and empirical observations from the data mining and social networks communities, and are supported by software libraries for cluster computing based on original distributed systems research. These efforts culminate in a novel mixed-membership triangle motif model that easily scales to large networks with over 100 million nodes on just a few cluster machines, and can be readily extended to accommodate network context using the other techniques presented in this thesis.
Eric Xing (Chair)
Mark S.l Handcock (UCLA)
diane [atsymbol] cs.cmu.edu