===============================
Learning to Summarize Web Pages
===============================

Adam Berger

This talk introduces Ocelot, a prototype system that automatically generates the "gist" of a web page by summarizing it. Although most text summarization research to date has focused on news articles, web pages are of quite a different flavor. Instead of a coherent text with a well-defined discourse structure, they are often a helter-skelter jumble of phrases, links, graphics and formatting commands. What text a web page does contain is often disjointed, providing little foothold for extractive summarization techniques, which attempt to generate a summary of a document by excerpting a contiguous span of text from it. We build on recent work in non-extractive summarization, producing the gist of a web page by "translating" it into a more concise representation rather than attempting to extract a text span verbatim. Ocelot uses statistical models to guide its selection and ordering of terms in the gists it produces. The talk will describe a technique for learning these models automatically from a collection of human-summarized web pages.
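
To make the idea of learned selection and ordering models concrete, here is a toy sketch in Python. It is not Ocelot's actual model, which the abstract does not specify. It assumes a simple count-based selection score (how often a page word also appears in the human-written gist) and a bigram count table over the human gists for ordering. The names `train` and `make_gist`, the `<s>` start marker, and the parameter `k` are all hypothetical.

```python
from collections import Counter, defaultdict


def train(pairs):
    """Estimate toy selection and ordering tables.

    pairs: list of (page_tokens, summary_tokens), both lists of strings.
    Returns (select, bigram). This is an illustrative assumption, not
    the model described in the talk.
    """
    in_page = Counter()            # pages containing each word
    in_both = Counter()            # pages where the word is also in the summary
    bigram = defaultdict(Counter)  # bigram[prev][nxt] counts over summaries
    for page, summary in pairs:
        summary_set = set(summary)
        for w in set(page):
            in_page[w] += 1
            if w in summary_set:
                in_both[w] += 1
        # "<s>" is a hypothetical start-of-gist marker.
        for prev, nxt in zip(["<s>"] + summary, summary):
            bigram[prev][nxt] += 1
    # Selection score: fraction of pages where the word made it into the gist.
    select = {w: in_both[w] / in_page[w] for w in in_page}
    return select, bigram


def make_gist(page, select, bigram, k=3):
    """Pick k terms from the page, then order them greedily."""
    # Selection: highest score first, ties broken alphabetically.
    candidates = sorted(set(page), key=lambda w: (-select.get(w, 0.0), w))[:k]
    # Ordering: repeatedly append the candidate seen most often after
    # the previous word in the training summaries.
    out, prev = [], "<s>"
    remaining = list(candidates)
    while remaining:
        nxt = max(remaining, key=lambda w: (bigram[prev][w], -remaining.index(w)))
        out.append(nxt)
        remaining.remove(nxt)
        prev = nxt
    return out
```

Trained on a handful of (page, summary) pairs, `make_gist` returns a short list of terms taken from the page and ordered by the bigram table. The split into a selection table and an ordering table mirrors the two roles the abstract assigns to Ocelot's statistical models. A real system would need smoothing, a far larger training collection, and the ability to emit words that do not appear on the page.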