CMU - IR Discussion Series

Friday, January 19, 2007 - 12:00-1:00 pm, Newell-Simon Hall (NSH) 3002
Title: Using Graphs and Random Walks to Discover Latent Similarities in Text
Speaker: Gunes Erkan

Abstract:

Graph-based approaches to NLP have been quite popular in the past few years. In this talk, I will describe a particular graph-based representation of text collections in which the nodes are textual units (e.g. documents, paragraphs, or sentences) and the edges are induced by a similarity relation between the nodes. Such a graph is a useful abstraction for many NLP and IR problems: ranking the sentence nodes corresponds to summarization, ranking the document nodes corresponds to document retrieval, and partitioning the graph corresponds to document clustering or classification. This representation therefore lets us reuse general graph-based methods for NLP problems. I will focus primarily on ranking methods (such as PageRank-style algorithms) for summarization and on nearest-neighbor methods based on random walks for document clustering and classification. I will also present the impact of using different similarity measures on the graphs, including the classical cosine measure and more IR-inspired language modeling measures. Using random walks to reiterate the similarities on a graph can give us even better approximations of the latent similarities between the nodes.
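
For readers unfamiliar with the idea, the ranking approach described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the speaker's implementation: the whitespace tokenizer, the 0.1 similarity threshold, the 0.85 damping factor, the fixed iteration count, and all function names are assumptions chosen for the example.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_graph(texts, threshold=0.1):
    """Build a weighted adjacency matrix: nodes are texts, edges are
    cosine similarities at or above the (assumed) threshold."""
    bags = [Counter(t.lower().split()) for t in texts]
    n = len(bags)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                s = cosine(bags[i], bags[j])
                if s >= threshold:
                    weights[i][j] = s
    return weights


def random_walk_rank(weights, damping=0.85, iterations=100):
    """PageRank-style power iteration over the similarity graph.
    Returns one score per node; the scores sum to 1."""
    n = len(weights)
    out = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new = [(1.0 - damping) / n] * n
        for i in range(n):
            if out[i] == 0.0:
                # Node with no edges: spread its mass uniformly.
                for j in range(n):
                    new[j] += damping * scores[i] / n
            else:
                for j in range(n):
                    new[j] += damping * scores[i] * weights[i][j] / out[i]
        scores = new
    return scores


sentences = [
    "graphs model similarity between sentences",
    "random walks on graphs rank sentences",
    "similarity graphs support ranking and clustering of sentences",
    "the weather was pleasant yesterday",
]
scores = random_walk_rank(similarity_graph(sentences))
ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
for i in ranked:
    print(f"{scores[i]:.3f}  {sentences[i]}")
```

In this toy example the last sentence shares no words with the others, so it has no edges and receives the lowest score; the highest-scoring sentences would be the candidates for an extractive summary.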

This is joint work with Dragomir Radev.

BIO: Gunes Erkan obtained both BS and MS degrees in Computer Engineering from the Middle East Technical University in Ankara, Turkey. Since fall 2003, he has been pursuing a Ph.D. in Computer Science at the University of Michigan under the supervision of Professor Dragomir Radev.