This paper describes a framework for generating natural language captions to accompany complex graphical presentations of diverse data sets. It describes an implemented system that integrates two robust systems: sage, an intelligent graphics presentation system (Roth et al., 1994), and a natural language generator consisting of a text planner (Young and Moore, 1994; Young, 1997), a micro-planner implementing tactical decisions, and a sentence realizer (Elhadad and Robin, 1992).
Graphical presentations can be an effective method for succinctly communicating information about multiple, diverse data attributes and their interrelationships. More than 80% of all business reports these days contain graphic presentations of data (Beattie and Jones, 1994; Schmid, 1983). When a display includes only a small number of data attributes or can make use of conventionalized graphical styles (e.g., spreadsheet graphics), it is easy for a viewer to understand how to interpret it. However, one of the main goals for automatic presentation systems is to allow users to see complex relationships between different attributes and perform problem-solving tasks (e.g., summarizing, finding correlations or groupings, and analyzing trends in data) that involve many data attributes at the same time. A number of research groups have developed systems that can automatically design sophisticated presentations to support a task -- presentations that are both novel and complex (e.g., (Casner 1991; Mackinlay 1986; Roth et al., 1994)). These graphics are often difficult to understand (Shah, 1995). Clearly, such graphics can only be fully effective for supporting analysis tasks if accompanied by explanations designed to enable users to understand how the graphics express the information they contain. Studies have shown that the presentation of captions with pictures can significantly improve both recall and comprehension, compared to either pictures or captions alone (Nugent, 1983; Large et al., 1995; Hegarty and Just, 1993). This suggests that the generation of captions for statistical graphics is an important application area in which natural language generation techniques can make a significant contribution.
In our system, the graphical displays are designed by an automatic presentation component, sage (Roth et al., 1994), and are often complex for several reasons. First, they typically display many data attributes at once. The mapping of many different data attributes to multiple graphical objects in a single display can be difficult to determine from the graphics alone. Second, integrating multiple data attributes in a display requires designing graphics that are unfamiliar to users accustomed to spreadsheet graphics that create simple displays of individual data attributes. While these integrated displays can be very useful once they are explained, it is often difficult to understand them completely without accompanying explanations. Finally, the nature of the data with which we are concerned is inherently abstract and does not have an obvious or natural visual representation. Unlike depictions of real-world objects or processes (e.g., radios (Feiner and McKeown 1991), coffee makers (Wahlster et al., 1993), network diagrams (Marks, 1991)) and visualizations of scientific data (e.g., weather, medical images), visualizations of abstract information lack an obvious physical analog.
As an example of the type of data we are concerned with, consider the graphic shown in Figure 1. This is a sage-generated version of the famous graphic drawn by Minard in 1861 depicting Napoleon's march of 1812 (Roth et al., 1994). The graphic relates seven different variables: position (latitude and longitude), size, direction of movement, temperature, and dates and locations of battles. Unless one has seen this graphic (or a very similar one) before, it can be very difficult to understand. Indeed, Minard accompanied the original graphic with a paragraph of text, the first half of which is about how the graphic expresses the information it contains.
Consider how the following human-generated caption for the graphic in Figure 1 explains the picture and the underlying data:
This map shows march segments and battles from Napoleon's 1812 campaign. The map shows the relation between the geographic locations, temperature and number of troops for each segment. Each line shows the start and end locations for the march segment. Its color shows the temperature, and the thickness shows the number of troops. The temperature was about 100 degrees for the initial segments in the west (the wide, dark red lines on the left), about 60 degrees in later segments in the east (the narrower, light red lines on the right) and about -40 degrees in the last segments, also in the west (the narrowest, dark blue lines on the left). The number of troops was 400,000 in the earliest segments, 100,000 in the later segments, and 10,000 in the last segments. The city and date of each battle is shown by the labels of a yellow diamond, which shows the battle's location.
This caption can help users understand the various attributes and the underlying relations between them--conveyed so succinctly by the graphic.
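The encoding sentences in the caption above ("Its color shows the temperature, and the thickness shows the number of troops") can be read as a mapping from graphical properties to data attributes. The following is a minimal sketch of that idea, not the representation sage uses: the `Encoding` class, the `MINARD_ENCODINGS` list, and the `describe` function are hypothetical names introduced only for illustration, and the sentence template is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Encoding:
    """One graphical property of a mark and the data attribute it expresses."""
    mark: str       # graphical object, e.g. "line"
    prop: str       # graphical property of that object, e.g. "color"
    attribute: str  # data attribute expressed by the property


# Hypothetical encodings, read off the human-written caption for Figure 1.
MINARD_ENCODINGS = [
    Encoding("line", "position", "start and end locations"),
    Encoding("line", "color", "temperature"),
    Encoding("line", "thickness", "number of troops"),
]


def describe(encodings):
    """Return one fixed-template sentence per encoding (illustrative only)."""
    return [
        f"The {e.prop} of each {e.mark} shows the {e.attribute}."
        for e in encodings
    ]


if __name__ == "__main__":
    for sentence in describe(MINARD_ENCODINGS):
        print(sentence)
```

A fixed template like this produces repetitive text with one sentence per encoding; it says nothing about which encodings are worth explaining or how to combine them, which are the questions the planning and micro-planning stages address.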
Although several projects have focused on the question of how such intelligent graphical presentations can be automatically generated (e.g., (Casner 1991, Mackinlay 1986, Roth and Hefley, 1993, Kerpedjiev, 1992)), they have not addressed the problem of generating the accompanying textual explanations. Without this ability, automatic graphical presentation systems will necessarily be limited to generating conventionalized graphics that do not use novel means to express complex relationships among data attributes, or risk generating displays that users will find difficult to fully comprehend and utilize.
In designing our framework for generating natural language captions we have adapted and integrated work in natural language generation (NLG) by a number of researchers--including ourselves--in different sub-areas: text planning, aggregation, centering, computing referring expressions, example generation and linearization. Given the applied nature of our work, in selecting specific NLG techniques we followed a parsimonious approach. For each sub-task we selected the simplest technique that was capable, in conjunction with the behavior of the other sub-tasks, of producing coherent text that could express the propositions we needed to convey.
The generation process starts with content selection. For this process, we use longbow, a domain-independent discourse planner originally developed as part of a project aimed at generating tutorial (Young and Moore, 1994). Using plan operators that encode discourse strategies devised for the task of generating captions, the planner determines what information should be included in the captions (and consequently what should be left out) and how to organize the selected information. Operator constraints analyze the structure of the graphic presentation and the perceptual complexity of the graphical display to enable the planner to select and apply appropriate strategies. The output of the text planning stage is then further processed by a micro-planner, a sequence of modules implementing inter- and intra-clause ordering, aggregation, and referring expression computation. The module performing intra-clause ordering is of special interest because it uses a novel technique based on centering theory. Although we devised this technique specifically for generating captions, it is general and can be applied to any discourse structure. The other three micro-planning modules use standard NLG techniques. Ordering and aggregation are based on heuristics specific to the text genre (i.e., descriptions of information graphics) and to the domain (e.g., real estate sales or stock market data). The referring expression module uses a well-known domain-independent algorithm that, given an intended referent, builds a description uniquely identifying it. The referential problems in our application did not require more sophisticated referring algorithms; there was also no interaction between computing the referring expressions and inter- and intra-clause ordering. Once micro-planning is complete, the FUF/SURGE realization module generates the actual English. The modules of our NLG system are discussed in detail in section 5.
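The text above does not name the referring expression algorithm. As an illustration of what such a module does, the sketch below assumes a simplified variant in the spirit of Dale and Reiter's Incremental Algorithm: it walks a fixed preference order of attributes and keeps each attribute that rules out at least one remaining distractor. The function name, the dictionary representation of graphical objects, and the attribute names are all hypothetical.

```python
def incremental_description(referent, distractors, preferred_attributes):
    """Pick attribute-value pairs that distinguish `referent` from `distractors`.

    Objects are plain dicts (an assumption made for this sketch). Attributes
    are tried in `preferred_attributes` order. Returns the selected pairs as a
    dict, or None if the referent cannot be uniquely identified.
    """
    description = {}
    remaining = list(distractors)
    for attr in preferred_attributes:
        if not remaining:
            break  # the referent is already uniquely identified
        if attr not in referent:
            continue
        value = referent[attr]
        # Distractors that still match the referent on this attribute.
        still_matching = [d for d in remaining if d.get(attr) == value]
        if len(still_matching) < len(remaining):
            # The attribute rules out at least one distractor, so keep it.
            description[attr] = value
            remaining = still_matching
    return description if not remaining else None


if __name__ == "__main__":
    # Hypothetical graphical objects loosely modeled on Figure 1.
    target = {"shape": "line", "color": "dark blue", "width": "narrow"}
    others = [
        {"shape": "line", "color": "dark red", "width": "wide"},
        {"shape": "diamond", "color": "yellow"},
    ]
    print(incremental_description(target, others, ["shape", "color", "width"]))
```

In this example the shape excludes the diamond and the color excludes the red line, so the width is never needed. The result depends on the preference order, so the order would have to be chosen for the genre and domain.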
In addition to these NLG techniques, generating textual captions for information graphics requires the following knowledge sources:
We describe these knowledge sources and the discourse strategies in the following three sections.
Figure 1: A sample graphic generated by sage.