A Flexible Learning System for Wrapping Tables and Lists

William Cohen, Matthew Hurst, Lee Jensen - WhizBang! Labs

Abstract

A program that makes an existing website look like a database is called a wrapper. Wrapper learning is the problem of learning website wrappers from examples. We present a wrapper-learning system called WL2 that can exploit several different representations of a document. Examples of such different representations include DOM-level and token-level representations, as well as two-dimensional geometric views of the rendered page (for tabular data) and representations of the visual appearance of text as it will be rendered. Additionally, the learning system is modular, and can be easily adapted to new domains and tasks. The learning system described is part of an ``industrial-strength'' wrapper management system that is in active use at WhizBang Labs. Controlled experiments show that the learner has broader coverage and a faster learning rate than earlier wrapper-learning systems.


Back to the Main Page

Charles Rosenberg
Last modified: Tue Mar 12 18:00:50 EST 2002