CONALD: Conference on Automated Learning and Discovery, June 11-13
Plenary Speakers

Tom Dietterich

Stuart Geman

David Heckerman

Michael Jordan

Daryl Pregibon

Herb Simon

Robert Tibshirani

Learning from Text and the Web

An increasing fraction of the world's information and data is now represented in textual form. For example, the World Wide Web, online news feeds, and other Internet sources contain a tremendous volume of information. However, users seeking information have limited attention, so methods are needed for summarizing, clustering, and categorizing this information and for discovering patterns in it.

The goal of this workshop is to explore computer methods for automatically extracting information from text and hypertext sources. Examples might include systems that extract descriptions of corporate mergers by monitoring online news feeds, or systems that extract addresses and phone numbers from home pages on the Web.

Interested participants are encouraged to submit workshop papers describing work in progress that may not yet be ready for journal publication. Relevant topics include (but are not restricted to) computer methods for information extraction from text and hypertext, automated learning of such methods, automatic text summarization, and text classification. Papers will be distributed in advance of the workshop, and the workshop itself will be organized around brief paper presentations and interactive discussions of key research themes.


Schedule: Friday, June 12

11:00 - 11:15 Introduction
11:15 - 12:15 Paper presentations (15 min each)
Chun-Nan Hsu, Ming-Tzung Dung (ASU), "Wrapping Semistructured Web Pages with Finite-State Transducers"
Jude Shavlik, Tina Eliassi-Rad, "Building Intelligent Agents for Web-Based Tasks: A Theory-Refinement Approach"
Sean Slattery, Mark Craven, "Learning to Exploit Document Relationships and Structure: The Case for Relational Learning on the Web"
Ion Muslea, Steve Minton, Craig Knoblock, "Wrapper Induction for Semistructured, Web-based Information Sources"
12:15 - 12:30 Q/A
2:00 - 2:45 Paper presentations (15 min each)
Kurt Bollacker, Steve Lawrence, C. Lee Giles, "CiteSeer: An Autonomous System for Processing and Organizing Scientific Literature on the Web"
John Lafferty, Peter Venable, "Simultaneous Word and Document Clustering"
Thomas Hofmann, "Learning and Representing Topic: A Hierarchical Mixture Model for Word Occurrences in Document Databases"
2:45 - 2:55 Q/A
2:55 - 3:40 Paper presentations (15 min each)
Yiming Yang, Tom Pierce, Jaime Carbonell, "Event Detection"
Chid Apte, Fred Damerau, Sholom Weiss, "Text Mining with Decision Rules and Decision Trees"
Dunja Mladenic, Marko Grobelnik, "Feature Selection for Classification Based on Text Hierarchy"
3:40 - 3:55 Q/A
3:55 - 4:20 Break
4:20 - 5:20 Panel: Jaime Carbonell (coordinator), Ray Mooney (UT Austin), William Cohen (AT&T), Jan Pedersen (Verity), James Allan (UMass), and Oren Etzioni (Washington)

