Danai Koutra pronounced:
Selected publications (Full list)
Danai Koutra, U Kang, Jilles Vreeken, Christos Faloutsos. VoG:Summarizing and Understanding Large Graphs.
SDM 2014, Philadelphia, PA, April 2014.
Selected as one of the best papers of SDM'14.
Danai Koutra, Hanghang Tong, David Lubensky. BIG-ALIGN: Fast Bipartite Graph Alignment.
IEEE ICDM 2013, Dallas, TX, December 2013.
Danai Koutra, Joshua Vogelstein, Christos Faloutsos. DeltaCon: A Principled Massive-Graph Similarity Function. SDM 2013, Austin, TX, May 2013.
Danai Koutra, Tai-You Ke, U Kang, Duen Horng (Polo) Chau, Hsing-Kuo Kenneth Pao, and Christos Faloutsos. Unifying Guilt-by-Association Approaches: Theorems and Fast Algorithms. ECML PKDD, Athens, Greece, Sep. 2011.
Node and graph similarity: Theory and Applications. With Tina Eliassi-Rad and Christos Faloutsos. IEEE ICDM 2014, Shenzen, China, December 2014. (acceptance ratio: 22%)
Node similarity, graph similarity and matching: Theory and Applications. With Tina Eliassi-Rad and Christos Faloutsos. SDM 2014, Philadelphia, PA, April 2014. (over 100 researchers attended!)
How can we succinctly describe a million-node graph with a few simple sentences? How can we measure the 'importance' of a set of discovered subgraphs in a large graph? Our main ideas are to construct a 'vocabulary' of subgraph-types that often occur in real graphs (e.g., stars, cliques, chains), and from a set of subgraphs, find the most succinct description of a graph in terms of this vocabulary.
Specifically, our first insight is to best describe the structures in a graph using an enriched set of 'vocabulary' terms: cliques and near-cliques (which are typically considered by community detection methods), and also stars, chains and (near) bi-partite cores. The second insight is to formalize our goal as a lossless compression problem, and use the MDL principle. The best summary of a graph is the set of subgraphs that describes the graph most succinctly, i.e., compresses it best, and, thus, helps a human understand the main graph characteristics in a simple, non-redundant manner. The proposed algorithm, VoG, has the following input and output.
Input: a graph
Output: a set of possibly overlapping subgraphs that most succinctly describe the given graph, i.e., that explain as many of its edges in as simple possible terms.
How much did a network change since yesterday? How different is the wiring between Bob's brain (a left-handed male) and Alice's brain (a right-handed female)? Graph similarity with known node correspondence, i.e. the detection of changes in the connectivity of graphs, arises in numerous settings, such as temporal anomaly detection and protein-protein interaction. DELTACON is a principled, intuitive, and scalable algorithm that assesses the similarity between two graphs on the same nodes (e.g. employees of a company, customers of a mobile carrier). It has the following inputs and outputs:
Input: (a) two graphs with same nodes and different edge sets
(b) node correspondence
Output: similarity score s in [0,1], where 0 means completely
dissimilar and 1 means identical
Publication: DeltaCon: A Principled Massive-Graph Similarity Function.
Project: Graph Similarity with Attribution and Alignment.
What is the appropriate 2-d distribution to fit real, 2-d points in social networks? For example, how can we model the joint distribution of the number of messages vs. number of phone-calls that each customer makes?
Almond-DG is our proposed digitized 2-d distribution which matches our empirical observation in real-world social networks: super-linear relationships among tasks (variables -- e.g., number of messages and number of phone-calls), and log-logistic marginals. This distribution uses copulas, a powerful technique, which has been successfully used in survival models, financial risk management, and decision analysis.
If several friends of Smith have committed petty thefts, what would you say about Smith? Guilt-by-association methods combine weak signals to derive stronger ones, and have been extensively used for anomaly detection and classification in numerous settings (e.g., accounting fraud, cyber-security, calling-card fraud). Network effects are very powerful, resulting even in popular proverbs like "birds of a feather flock together". In social networks, obese people tend to have obese friends, happy people tend to make their friends happy too, and in general, people usually associate with like-minded friends with respect to politics, hobbies, religion etc. Thus, knowing the types of a few nodes in a network, (say, "honest" vs "dishonest"), we would have good chances to guess the types of the rest of the nodes.
Among the most successful guilt-by-association techniques is Belief Propagation, for which there are no convergence guarantees in loopy networks (= typical real-world networks). The proposed algorithm, FaBP, is a fast approximation of Belief Propagation that yields 2x speedup, equal or higher accuracy, and is guaranteed to converge. The inputs, outputs and assumptions of the algorithm are briefly described below.
Input: graph with n nodes and m edges; and few labeled nodes (e.g., possible classes red/green)
Output: class (red/green) for the rest of the nodes
Assuming: network effects (homophily / heterophily)
Input: csv file with (x,y,value) triplets
Output: heatmap for scatter data in log-log scale
Input: tab-separated file with one observation
per line (each column corresponds to a feature)
Output: the 1D distribution for each feature
all the pairwise 2D distributions
|Dec 14-17, 2014: Giving a tutorial on node and graph similarity at ICDM'14 @ Shenzen, China (with Tina Eliassi-Rad and Christos Faloutsos).|
|Nov 2-4, 2014: Invited to attend the Rising Stars in EECS Workshop @ UC Berkeley.|
|Oct 5-8, 2014: In Hawaii, to advertize Glance and explore UIST'14.|
|Sept 20-27, 2014: Selected to attend the 2nd Heidelberg Laureate Forum in Germany.|
|Sept 18-19, 2014: Invited talk at MLconf, in Atlanta.|
|Aug 23-28, 2014: at KDD, New York.|
|May 14, 2014: Events and Controversies in the news (MIT Technology Review, Technology.org)!|