Kijung's Thesis Defense

Speaker

Kijung Shin
Ph.D. Candidate, Computer Science Department
Carnegie Mellon University

Thesis Committee

Prof. Christos Faloutsos, Carnegie Mellon University (Chair)
Prof. Tom M. Mitchell, Carnegie Mellon University
Prof. Leman Akoglu, Carnegie Mellon University
Prof. Philip S. Yu, University of Illinois at Chicago

Document

The write-up [.pdf] can be found here.

Slides

The slides [.pdf] can be found here.

Abstract

Graphs are ubiquitous, representing a variety of information, ranging from who follows whom on online social networks to who reviews what on e-commerce sites. Many of these graphs are large (e.g., online social networks with over two billion active users) and dynamic (i.e., nodes and edges can be added and removed over time). Moreover, they come with rich side information (e.g., e-commerce reviews with timestamps, ratings, and text) and are thus naturally modeled as tensors (i.e., multi-dimensional arrays).
Given large dynamic graphs and tensors, how can we analyze their structure? How can we detect interesting anomalies? Lastly, how can we model the behavior of individuals in the data? My thesis focuses on these closely related questions, all of which are fundamental to understanding massive evolving data on user behavior. Specifically, we develop scalable algorithms for mining large dynamic graphs and tensors, with a focus on three tasks:

Structure Analysis: We build one-pass, sublinear-space algorithms that incrementally estimate the triangle count, which is an important connectivity measure, in large dynamic graphs. In particular, our distributed algorithm produces estimates up to 39X more accurate than a baseline, and does so faster. We also develop distributed and out-of-core algorithms for succinctly but accurately summarizing the structure of large graphs and tensors. They summarize datasets over 25X larger than those their best competitors can handle, without loss of quality.
Anomaly Detection: We develop near-linear time approximation algorithms for detecting unusually dense subgraphs and subtensors, which signal notable anomalies such as 'edit wars' on Wikipedia and fake followers on Twitter. In particular, our tensor algorithm is up to 114X faster than the previously best heuristic, without loss of accuracy. We also extend it to distributed or dynamic data with the same approximation guarantee.
Behavior Modeling: We design game-theoretic models for purchases of individuals in social networks and a fast algorithm for finding Nash equilibria of the models. In addition, we develop a stage model for the progression of individuals and a distributed optimization algorithm for fitting the model to behavior logs with trillions of records. Using our tools, we measure social inefficiency in the purchase of sharable goods and discover progression patterns of LinkedIn users.
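
As a rough illustration of the kind of approximation algorithm that dense-subgraph detection builds on, the sketch below implements the classic greedy peeling heuristic on a plain graph: it repeatedly removes a minimum-degree node and keeps the densest intermediate subgraph, where density is the number of edges divided by the number of nodes. This is a textbook baseline and not the thesis's own algorithms (M-Zoom, D-Cube, DenseAlert); the function name greedy_densest_subgraph and the edge-list input format are assumptions made for this example.

```python
import heapq
from collections import defaultdict


def greedy_densest_subgraph(edges):
    """Greedy peeling on an undirected graph given as (u, v) pairs.

    Returns (nodes, density) for the densest subgraph seen while peeling,
    with density = |E| / |V|. Node labels must be mutually comparable
    (e.g., all ints) because they are used as heap tie-breakers.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    # degree[x] always holds the number of still-alive neighbours of x.
    degree = {node: len(nbrs) for node, nbrs in adj.items()}
    heap = [(d, node) for node, d in degree.items()]
    heapq.heapify(heap)

    alive = set(adj)
    num_edges = sum(degree.values()) // 2
    best_density = num_edges / len(alive) if alive else 0.0
    removal_order = []
    best_removed = 0  # number of removals that gave the best density

    while alive:
        d, node = heapq.heappop(heap)
        if node not in alive or d != degree[node]:
            continue  # stale heap entry, skip it
        alive.remove(node)
        removal_order.append(node)
        num_edges -= degree[node]
        for nbr in adj[node]:
            if nbr in alive:
                degree[nbr] -= 1
                heapq.heappush(heap, (degree[nbr], nbr))
        if alive:
            density = num_edges / len(alive)
            if density > best_density:
                best_density = density
                best_removed = len(removal_order)

    best_nodes = set(adj) - set(removal_order[:best_removed])
    return best_nodes, best_density


if __name__ == "__main__":
    # A 4-clique on nodes 0..3 with a short tail 3-4-5 attached.
    clique = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
    tail = [(3, 4), (4, 5)]
    print(greedy_densest_subgraph(clique + tail))
```

On this toy input the peeling removes the tail first and reports the 4-clique, whose density is 6 / 4 = 1.5. The lazy-deletion heap keeps the running time near-linear in the number of edges, up to a logarithmic factor.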

To achieve the highest performance and scalability, our algorithms for the above tasks employ mathematical techniques (e.g., approximation and sampling), use distributed computing frameworks (e.g., MapReduce and MPI), and/or exploit pervasive patterns in real-world data (e.g., power-law degree distribution). We successfully apply them to massive datasets, including 20.6 billion social connections on LinkedIn, 1.47 billion follow relations on Twitter, 783 million hyperlinks between web pages, 483 million edits on Wikipedia, and a synthetic tensor with 1 trillion non-zero entries.
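
To make the role of sampling concrete, the following is a minimal sketch of a one-pass triangle-count estimator that stores only a fixed number of edges, using standard reservoir sampling. It is a generic estimator for insertion-only streams of distinct edges, not the WRS, Tri-Fly, or ThinkD algorithms from the thesis; the name estimate_triangles and the parameters budget and seed are assumptions made for this example.

```python
import random


def estimate_triangles(edge_stream, budget, seed=0):
    """Estimate the global triangle count of an insertion-only edge stream.

    Keeps at most `budget` edges in a uniform reservoir sample. Assumes each
    undirected edge (u, v) appears once in the stream and that u != v.
    """
    if budget < 2:
        raise ValueError("budget must be at least 2")
    rng = random.Random(seed)
    sample = []   # reservoir of sampled edges
    adj = {}      # adjacency sets built from the sampled edges only
    estimate = 0.0

    def add(a, b):
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    def remove(a, b):
        for x, y in ((a, b), (b, a)):
            adj[x].discard(y)
            if not adj[x]:
                del adj[x]

    for t, (u, v) in enumerate(edge_stream, start=1):
        # Every common neighbour in the sample closes a triangle with (u, v).
        common = len(adj.get(u, set()) & adj.get(v, set()))
        # Both other edges of a triangle are in the sample with probability
        # budget * (budget - 1) / ((t - 1) * (t - 2)), so reweight by its inverse.
        weight = max(1.0, (t - 1) * (t - 2) / (budget * (budget - 1)))
        estimate += weight * common

        # Standard reservoir sampling over the edges seen so far.
        if len(sample) < budget:
            sample.append((u, v))
            add(u, v)
        elif rng.random() < budget / t:
            j = rng.randrange(budget)
            remove(*sample[j])
            sample[j] = (u, v)
            add(u, v)

    return estimate


if __name__ == "__main__":
    clique = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
    print(estimate_triangles(clique, budget=10))
```

When the budget is at least the number of edges, nothing is ever discarded and the estimate equals the exact count (4 triangles for the 4-clique above). With a smaller budget the result is a random but unbiased estimate, and memory use stays bounded by the budget.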

References

My thesis is based on the following publications:

M-Zoom: Fast Dense-Block Detection in Tensors with Quality Guarantees.
Kijung Shin, Bryan Hooi, and Christos Faloutsos.
ECML/PKDD 2016.

CoreScope: Graph Mining Using k-Core Analysis - Patterns, Anomalies and Algorithms.
Kijung Shin, Tina Eliassi-Rad, and Christos Faloutsos.
ICDM 2016.

D-Cube: Dense-Block Detection in Terabyte-Scale Tensors.
Kijung Shin, Bryan Hooi, Jisu Kim, and Christos Faloutsos.
WSDM 2017.

S-HOT: Scalable High-Order Tucker Decomposition.
Jinoh Oh, Kijung Shin, Evangelos E. Papalexakis, Christos Faloutsos, and Hwanjo Yu.
WSDM 2017.

DenseAlert: Incremental Dense-Subtensor Detection in Tensor Streams.
Kijung Shin, Bryan Hooi, Jisu Kim, and Christos Faloutsos.
KDD 2017.

Why You Should Charge Your Friends for Borrowing Your Stuff.
Kijung Shin, Euiwoong Lee, Dhivya Eswaran, and Ariel D. Procaccia.
IJCAI 2017.

WRS: Waiting Room Sampling for Accurate Triangle Counting in Real Graph Streams.
Kijung Shin.
ICDM 2017.

Fast, Accurate and Flexible Algorithms for Dense Subtensor Mining.
Kijung Shin, Bryan Hooi, and Christos Faloutsos.
TKDD Journal 2018.

Patterns and Anomalies in k-Cores of Real-World Graphs with Applications.
Kijung Shin, Tina Eliassi-Rad, and Christos Faloutsos.
KAIS Journal 2018.

Discovering Progression Stages in Trillion-Scale Behavior Logs.
Kijung Shin, Mahdi Shafiei, Myunghwan Kim, Aastha Jain, and Hema Raghavan.
WWW 2018.

Tri-Fly: Distributed Estimation of Global and Local Triangle Counts in Graph Streams.
Kijung Shin, Mohammad Hammoud, Euiwoong Lee, Jinoh Oh, and Christos Faloutsos.
PAKDD 2018.

Think before You Discard: Accurate Triangle Counting in Graph Streams with Deletions.
Kijung Shin, Jisu Kim, Bryan Hooi, and Christos Faloutsos.
ECML/PKDD 2018.

SWeG: Lossless and Lossy Summarization of Web-Scale Graphs.
Kijung Shin, Amol Ghoting, Myunghwan Kim, and Hema Raghavan.
WWW 2019.

My thesis is also based on the following preprints:

DiSLR: Distributed Sampling with Limited Redundancy For Triangle Counting in Graph Streams.
Kijung Shin, Euiwoong Lee, Jinoh Oh, Mohammad Hammoud, and Christos Faloutsos.

Out-of-Core and Distributed Algorithms for Dense Subtensor Mining.
Kijung Shin, Bryan Hooi, Jisu Kim, and Christos Faloutsos.

Open Source Software

WRS (Chapter 4)
Tri-Fly (Chapter 5)
ThinkD (Chapter 6)
S-HOT (Chapter 8)
Core-A & Core-D & Core-S (Chapter 9)
M-Zoom (Chapter 11)
D-Cube (Chapter 12)
DenseStream & DenseAlert (Chapter 13)

Datasets (Graphs)

Name	Description	Source	Download
KarateClub	Friendship Network	KONECT and UCINET	Link
Hamster	Friendship Network	KONECT	Link
Catster	Friendship Network	KONECT	Link
YouTube	Friendship Network	SNAP and MPI-SWS	Link
Flickr	Friendship Network	KONECT and MPI-SWS	Link
Flickr(L)	Friendship Network	KONECT and MPI-SWS	Link
Orkut	Friendship Network	SNAP and MPI-SWS	Link
YouTube(L)	Friendship Network	Network Repository and MPI-SWS	Link
LiveJournal	Friendship Network	SNAP and MPI-SWS	Link
Friendster	Friendship Network	KONECT	Link
Advogato	Trust Network	KONECT	Link
Epinion	Trust Network	Network Repository and SNAP	Link
Protein	Protein Interaction Network	KONECT	Link
Twitter	Subscription Network	KAIST	Link
Email	Email Network	SNAP	Link
Email(L)	Email Network	KONECT and CMU	Link
Stanford	Web Graph	SNAP	Link
BerkStan	Web Graph	SNAP	Link
Google	Web Graph	SNAP	Link
NotreDame	Web Graph	SNAP	Link
Web (EU)	Web Graph	LAW
Web (UK)	Web Graph	LAW
Caida	Internet Topology	SNAP and CAIDA	Link
Skitter	Internet Topology	SNAP and CAIDA	Link
HepTh	Citation Network	SNAP and Cornell	Link
HepPh	Citation Network	SNAP and KDD Cup	Link
Patent	Citation Network	SNAP and NBER	Link
DBLP	Collaboration Network	SNAP	Link
Hollywood	Collaboration Network	LAW
Amazon	Co-purchasing network	SNAP	Link

Datasets (Tensors)

Name	Modes	Entries	Source	Download
YouTube	User X User X Date	1 (Friend) or 0	KONECT	Link
SMS	User X User X Timestamp	# Messages	NDA	NDA
StackO.	User X Post X Timestamp	1 (Like) or 0	KONECT	Link
Wiki (Kor)	User X Page X Timestamp	# Revisions	Wikimedia	Link
Wiki (Eng)	User X Page X Timestamp	# Revisions	Wikimedia	Link
Yelp	User X Business X Date X Score	# Reviews	Yelp	Link
AppMarket	User X App X Date X Score	# Reviews	ODDS
Android	User X App X Date X Score	# Reviews	UCSD	Link
Netflix	User X Movie X Date X Score	# Reviews	Netflix	Link
YahooM.	User X Item X Date X Score	# Reviews	Yahoo Labs	Link
MAG	Author X Venue X Year X Keyword	# Papers	KDD Cup	Link
DARPA	Src IP X Dst IP X Timestamp	# Connections	Lincoln Lab	Link
AirForce	Protocol X Service X Flag X Src Bytes X Dst Bytes X Counts X Srv Counts	# Connections	UCI KDD Archive	Link
LinkedIn	User X Features X Timestamp	1 (Use) or 0	NDA	NDA

Thesis Defense

Mining Large Dynamic Graphs and Tensors

Kijung Shin

February 4, 2019, 11 am EST
Gates Hillman Complex (GHC) 4405

