4. Graphical Complexity: The Need for Clarification

In the previous section, we discussed three strategies used to organize the information to be presented. As mentioned earlier, if a caption is to be both succinct and informative, the information about mappings that it includes must be selected on the basis of complexity or ambiguity. We have identified the following five types of graphical complexity that can make it difficult for a user to understand complex data-to-grapheme mappings.

4.1 Encoder Complexity

Users must understand the encoders used in designing a picture in order to read the data values shown in it. Encoders allow the user to map between graphical values and attribute values. Two examples of encoders are axes, which allow users to map between positions in the picture and data values along the axes, and graphical keys, which can illustrate mappings between attribute values and graphical variables such as size and shape. Complexities can arise either (i) when an encoder is itself complex, or (ii) when an encoder mapping uses a complex scale.

Consider, for instance, Figure 10. Among the encoders used in this picture are the X and Y axes, which map positional information to house prices and house addresses, respectively. In this chart, the X axis does not have a zero origin (presumably to make the differences between data items clearer by devoting more screen real estate to a smaller range of data values). Because of this translation of the origin, a bar twice as long as another bar no longer encodes a value twice as large (see, for instance, the bars representing houses WALNUT-6343 and VERMONT-637 in Figure 10). Both axis translation and axis truncation (used to compress empty regions in quantitative data) can lead to false inferences. Similar decoding problems can arise with other encoding techniques, for example when a quantitative attribute is mapped to the area of a circle or when non-linear scales are used along axes.
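The decoding error caused by a translated origin can be checked with simple arithmetic. The following minimal Python sketch assumes a linear axis whose visible left edge sits at a nonzero value; the function names and the prices are hypothetical and are not taken from Figure 10.

```python
def bar_length(value: float, origin: float, pixels_per_unit: float = 1.0) -> float:
    """Drawn length of a bar on a linear axis whose left edge is at `origin`."""
    return (value - origin) * pixels_per_unit


def apparent_vs_true_ratio(a: float, b: float, origin: float) -> tuple[float, float]:
    """Return (ratio of drawn bar lengths, ratio of encoded values) for bars a and b."""
    return bar_length(a, origin) / bar_length(b, origin), a / b


# Hypothetical house prices on an axis whose visible origin is 100,000 rather than 0.
drawn, actual = apparent_vs_true_ratio(300_000, 200_000, origin=100_000)
print(f"bar A looks {drawn:.1f}x as long but encodes {actual:.1f}x the price")
# -> bar A looks 2.0x as long but encodes 1.5x the price
```

Because each drawn length is the value minus the origin, the ratio of bar lengths equals the ratio of values only when the origin is zero.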

A more complex example of encoding technique complexity can be seen in Figure 1, where color (hue) and saturation are combined in a single encoding technique to express temperature. Dark red indicates 100 degrees and dark blue indicates -40 degrees; as the color becomes paler (less saturated), it indicates a less extreme temperature. For example, pale red (pink) indicates 65 degrees, while pale blue indicates -5 degrees. White indicates a transition point. Thus both the frame of reference (the color-saturation key) and the technique itself are potentially complex here. Figure 1 also illustrates range complexity: the user must determine what the transition point is (whether it is the center of the scale or some special value, such as 32 degrees F), and the graphic does not make explicit whether the ranges on either side of this transition point are balanced.
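The range complexity described above can be illustrated with a simplified model of such a key. The sketch below is an assumption, not SAGE's actual color computation: it reduces the key to a hue (red above the transition point, blue below it) plus a saturation that grows linearly toward the ends of the key, which span -40 to 100 degrees as in Figure 1. The transition point is a parameter, and its default of 30 degrees (the midpoint of the key) is hypothetical.

```python
from typing import NamedTuple


class HueSat(NamedTuple):
    hue: str           # "red", "blue", or "white" at the transition point
    saturation: float  # 0.0 (white) to 1.0 (fully saturated)


def diverging_encode(t: float, lo: float = -40.0, hi: float = 100.0,
                     transition: float = 30.0) -> HueSat:
    """Map a temperature onto a two-hue, variable-saturation key.

    Each side of `transition` is stretched independently to full saturation,
    so if the two sides span different ranges, equal saturations encode
    unequal distances from the transition point.
    """
    if not lo <= t <= hi:
        raise ValueError(f"{t} lies outside the key's range [{lo}, {hi}]")
    if t > transition:
        return HueSat("red", (t - transition) / (hi - transition))
    if t < transition:
        return HueSat("blue", (transition - t) / (transition - lo))
    return HueSat("white", 0.0)


# Compare the saturations of 65 and -5 degrees under two candidate transition points.
for transition in (30.0, 32.0):
    warm = diverging_encode(65, transition=transition)
    cold = diverging_encode(-5, transition=transition)
    print(transition, round(warm.saturation, 3), round(cold.saturation, 3))
# -> 30.0 0.5 0.5
# -> 32.0 0.485 0.514
```

With the transition at the midpoint, 65 and -5 degrees receive the same saturation. Moving it to 32 degrees F leaves 68 degrees on the warm side and 72 on the cold side, so equally pale reds and blues no longer encode equal distances from the transition point; a reader who does not know where the transition lies cannot tell which reading applies.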

Figure 10

Comprehension can be hindered by encoding technique complexities (e.g., a truncated X-axis).

4.2 Grapheme Complexity

Although the encoder (e.g., positional encoding on an axis) and the mapping (e.g., the scale used along the axis) may both be simple, a grapheme that uses that encoder and mapping may still be difficult for users to interpret. This may occur for a variety of reasons ranging from too many mappings to problems in identifying the mappings. Complexities of this type can arise from:

  • multiple grapheme properties: In some cases, a presentation includes graphemes that use a large number of geometric properties to map data attributes. Consider, for instance, Figure 11. While the encoders in the figure are relatively straightforward, the fact that four different mappings are used here (x position, y position, shape, and color) can hinder comprehension.
  • unclear geometric properties: Circular marks and horizontal bars are familiar to most readers, and SAGE chooses them whenever possible. However, in some cases the system may have to use less common graphemes. In such cases, the reader has to understand not only the encoder and the mapping technique, but also which property of the grapheme is used in each encoding. For instance, if a triangular mark is used in a plot chart, interpreting its position requires knowing which of its three vertices (or its center) is used in the mapping.
  • semantic properties: The third type of grapheme complexity occurs in graphemes that have sub-components. For instance, if an icon of a truck were used as a grapheme and different sub-components were used in the mappings (e.g., the truck's speed mapped to wheel size and its cargo type mapped to tank color), the reader would need to understand not only the various data-to-grapheme mappings, but also the relationships among the sub-components.
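A complexity assessment could test for these three cases with simple structural checks. The minimal sketch below assumes a hypothetical grapheme record that lists the grapheme property used for each data attribute and any mapped sub-components; the threshold of three mapped properties and the set of familiar marks are illustrative choices, not values from SAGE.

```python
from dataclasses import dataclass, field

# Illustrative thresholds and vocabularies (assumptions, not SAGE's values).
MAX_SIMPLE_PROPERTIES = 3
FAMILIAR_MARKS = {"circle", "horizontal_bar"}
POSITIONAL_PROPERTIES = {"x_position", "y_position"}


@dataclass
class Grapheme:
    mark: str                  # e.g. "circle", "triangle", "truck_icon"
    mappings: dict[str, str]   # grapheme property -> data attribute
    # sub-component name -> {property -> data attribute}
    subcomponents: dict[str, dict[str, str]] = field(default_factory=dict)


def grapheme_complexities(g: Grapheme) -> set[str]:
    """Return which of the three grapheme-complexity cases apply to g."""
    found = set()
    if len(g.mappings) > MAX_SIMPLE_PROPERTIES:
        found.add("multiple_properties")
    # An unfamiliar mark used positionally raises the "which point is mapped?" question.
    if g.mark not in FAMILIAR_MARKS and not POSITIONAL_PROPERTIES.isdisjoint(g.mappings):
        found.add("unclear_geometric_property")
    # Any sub-component that carries a mapping adds semantic complexity.
    if any(g.subcomponents.values()):
        found.add("semantic_properties")
    return found


# The hypothetical truck icon described above, placed along a date axis.
truck = Grapheme("truck_icon", {"x_position": "date"},
                 {"wheels": {"size": "speed"}, "tank": {"color": "cargo_type"}})
print(sorted(grapheme_complexities(truck)))
# -> ['semantic_properties', 'unclear_geometric_property']
```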

Figure 11

Comprehension difficulties can result from complex graphemes with multiple properties being used in the encoding.

4.3 Ambiguous Mapping Complexity

A user's ability to identify the mapping of even a simple technique can be hindered when dissimilar graphemes (or dissimilar properties of a grapheme) are used to map similar attribute types. Consider, for instance, the charts in Figures 12 and 13. The left and right edges of the bar in Figure 12 refer to the selling price and the asking price of a house. However, the X axis represents prices in general, so the figure itself offers no way to tell which edge encodes which price. Similarly, in Figure 13, the two text labels refer to two different prices, but the two attributes cannot be distinguished from one another using the figure alone.
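One way to detect this kind of ambiguity is to group mappings by the encoder they share. The sketch below assumes that each mapping is recorded as an (attribute, grapheme, property, encoder) tuple and that the set of attribute names explicitly shown anywhere in the display (axis titles, keys, or labels) is known; both representations are hypothetical. It flags an encoder when two or more of its attributes are never named.

```python
from collections import defaultdict
from typing import NamedTuple


class AttributeMapping(NamedTuple):
    attribute: str  # data attribute, e.g. "selling_price"
    grapheme: str   # e.g. "interval_bar"
    prop: str       # grapheme property carrying the value, e.g. "left_edge"
    encoder: str    # encoder used to decode it, e.g. "x_axis"


def ambiguous_mappings(mappings: list[AttributeMapping],
                       named_in_display: set[str]) -> dict[str, list[str]]:
    """Map each encoder to the attributes that share it but are never named."""
    by_encoder: dict[str, set[str]] = defaultdict(set)
    for m in mappings:
        by_encoder[m.encoder].add(m.attribute)
    flagged = {}
    for encoder, attributes in by_encoder.items():
        unnamed = attributes - named_in_display
        if len(unnamed) > 1:  # two or more indistinguishable attributes
            flagged[encoder] = sorted(unnamed)
    return flagged


# A chart in the style of Figure 12 (edge assignment chosen for illustration).
chart = [
    AttributeMapping("selling_price", "interval_bar", "left_edge", "x_axis"),
    AttributeMapping("asking_price", "interval_bar", "right_edge", "x_axis"),
    AttributeMapping("address", "interval_bar", "y_position", "y_axis"),
]
print(ambiguous_mappings(chart, named_in_display=set()))
# -> {'x_axis': ['asking_price', 'selling_price']}
```

Requiring two or more unnamed attributes treats a single unnamed attribute on a shared encoder as recoverable by elimination; that threshold is a design choice of this sketch.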

Figure 12

Complexities can arise from ambiguous mappings (a).

4.4 Composition Complexity

When multiple graphemes occupy the same space, they can be confusing until their relationships to one another are clarified. Such compositions can result in two types of clusters:

  • Cooperative Graphemes: Consider, for example, the chart shown in Figure 14. The mark and label graphemes form an aggregate that must be considered as a whole. Because the label conveying the real estate agency is slightly offset from the mark's position on the X and Y axes, the label on its own cannot be related to a particular house and date of sale. Grapheme composition thus displays multiple graphemes as a spatially grouped conceptual unit, which must be understood and interpreted as such.
  • Interfering Graphemes: Unfortunately, grapheme composition does not always result in a cluster whose graphemes are distinct and non-occluding. Consider, for instance, the chart shown in Figure 8. The mark indicating the agency's estimate of the selling price often overlaps the interval bar showing the actual asking and selling prices. When the asking and selling prices are very close, the mark can actually occlude the interval bar. Clusters such as this can hinder interpretation, and it is important that their mappings be clarified.
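Interference between graphemes can be approximated geometrically. This sketch is an assumption rather than SAGE's method: it compares the axis-aligned bounding boxes (in screen coordinates) of an interval bar and a mark, reporting partial overlap as interfering and overlap that covers the whole bar as occluding. The coordinates in the usage example are hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box in screen coordinates."""
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self) -> float:
        return (self.x1 - self.x0) * (self.y1 - self.y0)

    def intersection_area(self, other: "Box") -> float:
        width = min(self.x1, other.x1) - max(self.x0, other.x0)
        height = min(self.y1, other.y1) - max(self.y0, other.y0)
        return max(width, 0.0) * max(height, 0.0)


def composition_relation(bar: Box, mark: Box) -> str:
    """Classify how a mark relates to an interval bar drawn in the same space."""
    overlap = bar.intersection_area(mark)
    if overlap == 0:
        return "distinct"
    if overlap >= bar.area():
        return "occluding"   # the mark covers the entire bar
    return "interfering"     # partial overlap


mark = Box(199, 7, 211, 19)                               # agency-estimate mark
print(composition_relation(Box(200, 10, 210, 16), mark))  # -> occluding
print(composition_relation(Box(180, 10, 230, 16), mark))  # -> interfering
```

A narrow bar, as when the asking and selling prices are close, is the case in which the mark's box swallows the bar's box entirely.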

Figure 13

Complexities can arise from ambiguous mappings

4.5 Alignment Complexity

As illustrated in Figures 6, 7, and 9, aligning multiple charts and/or tables can be a useful technique for supporting comparisons, enabling rapid lookup of many attributes of the same object, and maintaining consistent scales. However, whenever charts are aligned, all but one of them become separated from the labels of the shared axis, so the relation between that axis and the remaining charts may not be clear.

The system's complexity assessment module can identify the graphemes in a display that are complex for any of the five reasons described in this section. It annotates the picture representation generated by SAGE to indicate these graphemes and their types of complexity. The result of the complexity assessment for the Minard graphic (Figure 1) is shown in Figure 15. As discussed earlier, for instance, the mapping between the attribute temperature and the color of the line is complex for two reasons: (i) encoding complexity, because of the combined use of color and saturation, and (ii) range complexity, because of the unequal distributions of warm and cold temperatures. Figure 16 gives the complexity assessment for the graphic shown in Figure 6. In this case, the mapping between the attribute asking price and the bar is complex for three reasons: (i) grapheme complexity, since the interval bar is a complex grapheme; (ii) ambiguous mapping, since the graphic does not show whether the attribute is mapped to the left edge or the right edge of the bar; and (iii) composition complexity, since the bar and the mark can overlap and occlude each other (indicated by the "i" for interfering). The annotated picture representation can then serve as one of the knowledge sources that the NLG system uses to select and structure information when generating captions.
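The annotated picture representation might be modeled along the following lines. This is a minimal sketch, not SAGE's actual data structure: the class names, the split of Section 4.1 into encoding and range complexity, and the filtering helper are assumptions. The two annotated mappings reproduce the assessments described above for Figures 1 and 6; the third is a hypothetical uncomplicated mapping.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Complexity(Enum):
    ENCODING = "encoding"            # 4.1: complex encoder or encoding technique
    RANGE = "range"                  # 4.1: complex scale or range in the mapping
    GRAPHEME = "grapheme"            # 4.2
    AMBIGUOUS_MAPPING = "ambiguous"  # 4.3
    COMPOSITION = "composition"      # 4.4
    ALIGNMENT = "alignment"          # 4.5


@dataclass
class MappingAnnotation:
    """One data-to-grapheme mapping in the picture, tagged with its complexities."""
    attribute: str
    grapheme: str
    complexities: set[Complexity] = field(default_factory=set)
    cluster: Optional[str] = None  # "cooperative" or "interfering" (4.4 only)

    def reasons(self) -> list[str]:
        """Complexity labels in a stable (alphabetical) order."""
        return sorted(c.value for c in self.complexities)


def needs_clarification(annotations: list[MappingAnnotation]) -> list[MappingAnnotation]:
    """Hypothetical selection step: keep only mappings with at least one complexity."""
    return [a for a in annotations if a.complexities]


minard_temperature = MappingAnnotation(
    "temperature", "line color", {Complexity.ENCODING, Complexity.RANGE})
fig6_asking_price = MappingAnnotation(
    "asking price", "interval bar",
    {Complexity.GRAPHEME, Complexity.AMBIGUOUS_MAPPING, Complexity.COMPOSITION},
    cluster="interfering")
plain = MappingAnnotation("address", "bar (y position)")

print(minard_temperature.reasons())  # -> ['encoding', 'range']
print(fig6_asking_price.reasons())   # -> ['ambiguous', 'composition', 'grapheme']
print([a.attribute for a in needs_clarification(
    [minard_temperature, fig6_asking_price, plain])])  # -> ['temperature', 'asking price']
```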

Figure 14
Presentations can have clusters of cooperative graphemes
