TITLE: Graph structures in data mining

INSTRUCTORS: Soumen Chakrabarti and Christos Faloutsos, CMU

INTENDED DURATION: 2.5 hours

BASIC INFORMATION

Until recently, machine learning and data mining techniques focused on
single, flat tables of feature vectors.  However, in the last few
years, there has been an explosion of data mining applications where
the underlying data has an irresistible graphical interpretation.  The
Web is a standard example by now, but there are many other domains
spanning the physical Internet, emails, USENET, blogs, keyword search
in XML and relational data, multi-relational mining, natural language
processing, and biological data.

This tutorial will give a concise overview of the important conceptual
and software tools for characterizing, modeling, and analyzing graph
structures common in some of the application domains listed above,
with pointers to active research areas, latest publications, and
available software.

AUDIENCE

Data management researchers and professionals who need to deal with
data domains naturally represented by graphs.

COVERAGE

Tools and concepts (mainly by Faloutsos):
* Graphs in combinatorics vs. graphs in practice, some examples
* Some generative models: preferential attachment and variants
* Bulk properties of real-world graphs: degree, pagerank
* SVD, Spectral analysis and implications; HITS, pagerank
* Resistive network models

Applications and extensions:
(mainly by Chakrabarti)
* Graph evolution models, Web e.g. effect of "Goolearchy"
* Biased pagerank and applications; keyword search in graphs
* Finding local frequent patterns by combinatorial search
* Applications to discovering schema and communities on the Web
* Graphical statistical models, Bayes nets and beyond
* Representing coupled distributions and belief propagation
* Applications to relational learning
* Graph models in NLP and information extraction

BIOS

Soumen Chakrabarti (http://www.cse.iitb.ac.in/~soumen/main/bio.html)
received a Ph.D. in Computer Science from U.C. Berkeley 1996.  He
worked at IBM Almaden Research Center from 1996 to 1999.  He is now
Associate Professor at IIT Bombay.  In Spring 2004 he is Visiting
Associate Professor at Carnegie-Mellon University.  He has published
in the WWW, SIGIR, SIGKDD, SIGMOD, VLDB, ICDE, SODA, STOC, SPAA and
other conferences as well as Scientific American, IEEE Computer, VLDB
and other journals. He holds eight US patents on Web-related
inventions. He has served as vice-chair or PC member for WWW, SIGIR,
SIGKDD, SIGMOD, VLDB, ICDE, SODA and other conferences, and guest
editor or editorial board member for DMKD and TKDE journals.  He is
also author of a new book on Web Mining.  His current research
interests include question answering, Web analysis, monitoring and
search, mining irregular and relational data, and textual data
integration.

Christos Faloutsos (http://www-2.cs.cmu.edu/~christos/short-bio.txt).
He is a Professor at Carnegie Mellon University.  He has received the
Presidential Young Investigator Award by the National Science
Foundation (1989), four ``best paper'' awards, and several teaching
awards.  He is a member of the executive committee of SIGKDD; he has
published over 120 refereed articles, one monograph, and holds four
patents.  His research interests include data mining for streams and
networks, fractals, indexing methods for spatial and multimedia bases,
and data base performance.