Conference on Automated Learning and Discovery

Learning from Text and the Web

An increasing fraction of the world's information and data is now represented in textual form. For example, the World Wide Web, online news feeds, and other Internet sources contain a tremendous volume of information. However, users seeking information do not have unlimited attention, and therefore methods of summarizing, clustering, categorizing and discovering patterns in the information space are required.

The goal of this workshop is to explore computer methods for automatically extracting information from text and hypertext sources. Examples might include systems that automatically extract descriptions of corporate mergers by monitoring online newsfeeds, or systems that automatically extract addresses and phone numbers from home pages on the web.

Interested participants are encouraged to submit workshop papers describing work in progress that may not yet have reached the point where journal publication is warranted. Relevant topics include (but are not restricted to) computer methods for information extraction from text and hypertext, automated learning of such methods, automatic text summarization, and text classification. Papers will be distributed in advance of the workshop, and the workshop itself will be organized into brief presentations of papers, along with interactive discussions of key research themes.

Organizers:

Yiming Yang (chair)

Jaime Carbonell

Steve Fienberg

Tom Mitchell

Schedule: Friday June 12

11:00 - 11:15 introduction

11:15 - 12:15 paper presentations (15 min each)

Chun-Nan Hsu, Ming-Tzung Dung (ASU) Wrapping Semistructured Web Pages with Finite-State Transducers

Jude Shavlik, Tina Eliassi-Rad Building Intelligent Agents for Web-Based Tasks: A Theory-Refinement Approach

Sean Slattery, Mark Craven Learning to Exploit Document Relationships and Structure: The Case for Relational Learning on the Web

Ion Muslea, Steve Minton, Craig Knoblock Wrapper Induction for Semistructured, Web-based Information Sources

12:15 - 12:30 Q/A

2:00 - 2:45 paper presentations (15 min each)

Kurt Bollacker, Steve Lawrence, C. Lee Giles CiteSeer: An Autonomous System for Processing and Organizing Scientific Literature on the Web

John Lafferty, Peter Venable Simultaneous Word and Document Clustering

Thomas Hofmann Learning and Representing Topic, A Hierarchical Mixture Model for Word Occurrences in Document Databases

2:45 - 2:55 Q/A

2:55 - 3:40 paper presentations (15 min each)

Yiming Yang, Tom Pierce, Jaime Carbonell Event Detection

Chid Apte, Fred Damerau, Sholom Weiss Text Mining with Decision Rules and Decision Trees

Dunja Mladenic, Marko Grobelnik Feature selection for classification based on text hierarchy

3:40 - 3:55 Q/A

3:55 - 4:20 Break

4:20 - 5:20 Panel: Jaime Carbonell (coordinator), Ray Mooney (UT Austin), William Cohen (ATT), Jan Pedersen (Verity), James Allan (UMass), and Oren Etzioni (Washington)

More Information

Contact conald@cs.cmu.edu for more information

The conference is sponsored by CMU's newly created Center for Automated Learning and Discovery.