Thesis Defense

Mining Large Dynamic Graphs and Tensors


Kijung Shin

February 4, 2019, 11 am EST
Gates Hillman Complex (GHC) 4405



Speaker

Kijung Shin
Ph.D. Candidate, Computer Science Department
Carnegie Mellon University


Thesis Committee



Document

The write-up [.pdf] can be found here.


Slides

The slides [.pdf] can be found here.


Abstract

  Graphs are ubiquitous, representing a variety of information, ranging from who follows whom on online social networks to who reviews what on e-commerce sites. Many of these graphs are large (e.g., online social networks with over two billion active users) and dynamic (i.e., nodes and edges can be added and removed over time). Moreover, they are with rich side information (e.g., e-commerce reviews with timestamps, ratings, and text) and thus naturally modeled as tensors (i.e., multi-dimensional arrays).
  Given large dynamic graphs and tensors, how can we analyze their structure? How can we detect interesting anomalies? Lastly, how can we model behaviors of individuals in the data? My thesis focuses on these closely related questions, all of which are fundamental to understand massive evolving data on user behavior. That is, we develop scalable algorithms for mining large dynamic graphs and tensors, with a focus on three tasks:   To achieve the highest performance and scalability, our algorithms for the above tasks employ mathematical techniques (e.g., approximation and sampling), use distributed computing frameworks (e.g., MapReduce and MPI), and/or exploit pervasive patterns in real-world data (e.g., power-law degree distribution). We successfully apply them to massive datasets, including 20.6 billion social connections on LinkedIn, 1.47 billion follow relations on Twitter, 783 million hyperlinks between web pages, 483 million edits on Wikipedia, and a synthetic tensor with 1 trillion non-zero entries.



References

My thesis is based on the following publications:


My thesis is also based on the following preprints:



Open Source Software



Datasets (Graphs)


NameDescriptionSourceDownload
KarateClub Friendship Network KONECT and UCINET Link
Hamster Friendship Network KONECT Link
CatsterFriendship Network KONECT Link
YouTubeFriendship Network SNAP and MPI-SWS Link
FlickrFriendship Network KONECT and MPI-SWS Link
Flickr(L)Friendship Network KONECT and MPI-SWS Link
OrkutFriendship Network SNAP and MPI-SWS Link
Youtube(L) Friendship Network Network Repository and MPI SWS Link
LiveJournalFriendship Network SNAP and MPI-SWS Link
FriendsterFriendship Network KONECT Link
AdvogatoTrust Network KONECT Link
EpinionTrust Network Network Repository and SNAP Link
Protein Protein Interaction Network KONECT Link
TwitterSubscription Network KAIST Link
EmailEmail Network SNAP Link
Email(L)Email Network KONECT and CMU Link
StanfordWeb Graph SNAP Link
BerkStanWeb Graph SNAP Link
GoogleWeb Graph SNAP Link
NotreDameWeb Graph SNAP Link
Web (EU)Web Graph LAW
Web (UK)Web Graph LAW
CaidaInternet Topology SNAP and CAIDA Link
SkitterInternet Topology SNAP and CAIDA Link
HepThCitation Network SNAP and Cornell Link
HepPhCitation Network SNAP and KDD Cup Link
PatentCitation Network SNAP and NBER Link
DBLPCollaboration Network SNAP Link
HollywoodCollaboration Network LAW
AmazonCo-purchasing network SNAP Link


Datasets (Tensors)


Name Modes Entries Source Download
YoutubeUser X User X Date1 (Friend) or 0 KONECT Link
SMSUser X User X Timestamp # Messages NDA NDA
StackO.User X Post X Timestamp1 (Like) or 0 KONECT Link
Wiki (Kor)User X Page X Timestamp# Revisions Wikimedia Link
Wiki (Eng)User X Page X Timestamp# Revisions Wikimedia Link
Yelp User X Business X Date X Score # Reviews Yelp Link
AppMarket User X App X Date X Score # Reviews ODDS
Android User X App X Date X Score # Reviews UCSD Link
NetflixUser X Movie X Date X Score# Reviews Netflix Link
YahooM.User X Item X Date X Score# Reviews Yahoo Labs Link
MAGAuthor X Venue X Year X Keyword # Papers KDD Cup Link
DARPASrc IP X Dst IP X Timestamp # Connections Lincoln Lab Link
AirForceProtocol X Service X Flag X Src Bytes
X Dst Bytes X Counts X Srv Counts
# Connections UCI KDD Archive Link
LinkedInUser X Features X Timestamp 1 (Use) or 0 NDA NDA