Summarization in Lemur


Contents

  1. Overview
  2. Applications
  3. Summarization API

1. Overview

The Lemur summarization library provides an abstract class for general automatic summary generation. The class is also useful for building a prototype or evaluation system. Two sample methods are implemented: a basic sentence selection algorithm and one based on the Maximal Marginal Relevance (MMR) algorithm. Both are designed with generic summarization in mind, although they can also be used to produce query-based summaries. The goal of the abstract class is to provide a reasonable structure in which summarization algorithms that work off of a Lemur index can be swapped in and out easily.

2. Applications

BasicSummApp - a simple summarizer

MMRSummApp - Maximal Marginal Relevance summarization
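
For readers unfamiliar with MMR, the following is a minimal sketch of greedy MMR selection in its usual textbook form: at each step, pick the passage that maximizes lambda * relevance minus (1 - lambda) * (maximum similarity to any passage already selected). It is not the code used by MMRSummApp. The function name mmrSelect, its parameters, and the precomputed score inputs are assumptions made for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper, not part of Lemur.
// relevance[i]     : similarity of passage i to the query (assumed precomputed)
// similarity[i][j] : similarity between passages i and j (assumed non-negative)
// lambda           : 1.0 ranks purely by relevance, 0.0 purely by novelty
// k                : maximum number of passages to select
// Returns the indices of the selected passages in selection order.
std::vector<std::size_t> mmrSelect(
        const std::vector<double>& relevance,
        const std::vector<std::vector<double>>& similarity,
        double lambda,
        std::size_t k) {
    const std::size_t n = relevance.size();
    assert(similarity.size() == n);

    std::vector<bool> chosen(n, false);
    std::vector<std::size_t> order;

    while (order.size() < k && order.size() < n) {
        double bestScore = -std::numeric_limits<double>::infinity();
        std::size_t best = n;  // n means "nothing found yet"

        for (std::size_t i = 0; i < n; ++i) {
            if (chosen[i]) continue;

            // Redundancy is the largest similarity to anything already
            // selected; it is 0 while the summary is still empty.
            double redundancy = 0.0;
            for (std::size_t s : order) {
                redundancy = std::max(redundancy, similarity[i][s]);
            }

            const double score =
                lambda * relevance[i] - (1.0 - lambda) * redundancy;
            if (score > bestScore) {
                bestScore = score;
                best = i;
            }
        }

        if (best == n) break;  // no comparable score (e.g. NaN input)
        chosen[best] = true;
        order.push_back(best);
    }
    return order;
}
```

With lambda below 1, a passage that nearly duplicates an already selected one is pushed down the ranking even if it is highly relevant, which is the behavior that distinguishes MMR from plain relevance ranking.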

3. Summarization API

The basic Summarizer class provides a generic interface for various summary generation techniques. It describes two ways for an application to access a summarizer. In the first, the application specifies a pre-determined summary length (in number of passages); the summarizer then determines what those passages are and hands them back to the application all at once. The second is iterative: the application repeatedly requests the next passage from the summarizer.

Two implementations are provided to demonstrate how to use this interface: BasicSumm, which implements a simple sentence selection algorithm, and MMRSumm, which implements an MMR algorithm that includes automatic query generation for generic summaries.

Passage is a simple container for passages (often sentences or fixed-length word sequences), which are the basic unit of each summary. The classes BasicPassage and MMRPassage are tailored implementations for BasicSumm and MMRSumm respectively.
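
The two access modes described above can be sketched as follows. This is an illustrative stand-in, not the actual Lemur headers: the names SketchPassage, SketchSummarizer, TopScoreSummarizer, summarize, hasNext and next are all hypothetical, and the toy implementation simply ranks passages by a score that is assumed to be computed elsewhere.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical passage container: a unit of text plus its score.
struct SketchPassage {
    std::string text;
    double score;
};

// Hypothetical abstract interface exposing both access modes.
class SketchSummarizer {
public:
    virtual ~SketchSummarizer() = default;

    // Fixed-length mode: return up to `length` passages at once.
    virtual std::vector<SketchPassage> summarize(std::size_t length) = 0;

    // Iterative mode: hand out one passage per call.
    virtual bool hasNext() const = 0;
    virtual SketchPassage next() = 0;
};

// Toy implementation that ranks passages by descending score.
class TopScoreSummarizer : public SketchSummarizer {
public:
    explicit TopScoreSummarizer(std::vector<SketchPassage> passages)
        : ranked_(std::move(passages)), cursor_(0) {
        // stable_sort keeps the input order among equal scores.
        std::stable_sort(ranked_.begin(), ranked_.end(),
                         [](const SketchPassage& a, const SketchPassage& b) {
                             return a.score > b.score;
                         });
    }

    std::vector<SketchPassage> summarize(std::size_t length) override {
        const std::size_t n = std::min(length, ranked_.size());
        return std::vector<SketchPassage>(
            ranked_.begin(),
            ranked_.begin() + static_cast<std::ptrdiff_t>(n));
    }

    bool hasNext() const override { return cursor_ < ranked_.size(); }

    SketchPassage next() override {
        assert(hasNext());  // callers must check hasNext() first
        return ranked_[cursor_++];
    }

private:
    std::vector<SketchPassage> ranked_;  // passages, best first
    std::size_t cursor_;                 // position of the iterative mode
};
```

An application would either call summarize(n) once, or loop on hasNext() and next() until it has enough text. Keeping both modes behind one abstract class is what allows a different algorithm to be substituted without changing the calling code.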