TITLE: Graph structures in data mining INSTRUCTORS: Soumen Chakrabarti and Christos Faloutsos, CMU INTENDED DURATION: 2.5 hours BASIC INFORMATION Until recently, machine learning and data mining techniques focused on single, flat tables of feature vectors. However, in the last few years, there has been an explosion of data mining applications where the underlying data has an irresistible graphical interpretation. The Web is a standard example by now, but there are many other domains spanning the physical Internet, emails, USENET, blogs, keyword search in XML and relational data, multi-relational mining, natural language processing, and biological data. This tutorial will give a concise overview of the important conceptual and software tools for characterizing, modeling, and analyzing graph structures common in some of the application domains listed above, with pointers to active research areas, latest publications, and available software. AUDIENCE Data management researchers and professionals who need to deal with data domains naturally represented by graphs. COVERAGE Tools and concepts (mainly by Faloutsos): * Graphs in combinatorics vs. graphs in practice, some examples * Some generative models: preferential attachment and variants * Bulk properties of real-world graphs: degree, pagerank * SVD, Spectral analysis and implications; HITS, pagerank * Resistive network models Applications and extensions: (mainly by Chakrabarti) * Graph evolution models, Web e.g. effect of "Goolearchy" * Biased pagerank and applications; keyword search in graphs * Finding local frequent patterns by combinatorial search * Applications to discovering schema and communities on the Web * Graphical statistical models, Bayes nets and beyond * Representing coupled distributions and belief propagation * Applications to relational learning * Graph models in NLP and information extraction BIOS Soumen Chakrabarti (http://www.cse.iitb.ac.in/~soumen/main/bio.html) received a Ph.D. in Computer Science from U.C. Berkeley 1996. He worked at IBM Almaden Research Center from 1996 to 1999. He is now Associate Professor at IIT Bombay. In Spring 2004 he is Visiting Associate Professor at Carnegie-Mellon University. He has published in the WWW, SIGIR, SIGKDD, SIGMOD, VLDB, ICDE, SODA, STOC, SPAA and other conferences as well as Scientific American, IEEE Computer, VLDB and other journals. He holds eight US patents on Web-related inventions. He has served as vice-chair or PC member for WWW, SIGIR, SIGKDD, SIGMOD, VLDB, ICDE, SODA and other conferences, and guest editor or editorial board member for DMKD and TKDE journals. He is also author of a new book on Web Mining. His current research interests include question answering, Web analysis, monitoring and search, mining irregular and relational data, and textual data integration. Christos Faloutsos (http://www-2.cs.cmu.edu/~christos/short-bio.txt). He is a Professor at Carnegie Mellon University. He has received the Presidential Young Investigator Award by the National Science Foundation (1989), four ``best paper'' awards, and several teaching awards. He is a member of the executive committee of SIGKDD; he has published over 120 refereed articles, one monograph, and holds four patents. His research interests include data mining for streams and networks, fractals, indexing methods for spatial and multimedia bases, and data base performance.