CS Dept. and HCI Institute
Carnegie Mellon University
Pittsburgh, PA 15213
1-412-268-7799, 1-412-268-5684
christel@cs.cmu.edu, liz@cs.cmu.edu
Scalable Vector Graphics (SVG) is a language for describing two-dimensional graphics in XML, specifically vector graphic shapes, images, and text. As of July 2001, SVG is a World Wide Web Consortium (W3C) Proposed Recommendation. This paper describes how SVG provides a well-suited framework for presenting manipulable, interactive summaries of a multimedia information repository. Specifically, we present VIBE and map interfaces, implemented in SVG, to a digital news video library for delivery through web browsers. Pan-and-zoom visualizations of video through SVG are also discussed.
H.5.1 [Information Interfaces and Presentation]: Multimedia Information Systems - video. H.3.7 [Information Storage and Retrieval]: Digital Libraries - standards, dissemination, user issues.
Design, Human Factors, Standardization.
SVG, digital video library, surrogate.
The Informedia Project at Carnegie Mellon University has created a multi-terabyte digital video library consisting of thousands of hours of video, segmented into over 50,000 stories, or documents. Since Informedia's inception in 1994, numerous interfaces have been developed and tested for accessing this library, including work on multimedia surrogates which represent a video document in an abbreviated manner [2, 3]. The utility and efficiency of these surrogates and their validation through usability methods have been reported in detail elsewhere [6, 8]. These interfaces are being re-implemented in HTML, XML, XSLT, XPath, and JavaScript, with the intent that use of W3C recommendations maximizes user flexibility in accessing the digital video library through a browser interface [4]. This paper discusses the use of Scalable Vector Graphics (SVG) for visualizing sets of digital news video. The SVG support for vector-based drawing offers a quick way to zoom into areas of interest and show those in greater detail without the need for communicating back to a web server.
HTML, XML, XSLT, and XPath are all W3C Recommendations, with published references available through the W3C [9]. As of July 2001, SVG is a "Proposed Recommendation," one step away from "Recommendation."
These W3C references are utilized in the implementation of the Informedia digital video library web interface. Extensible Markup Language (XML) is the universal format for structured documents and data on the Web. Extensible Stylesheet Language Transformations (XSLT) is a language for translating XML documents into other XML documents or into HTML. XSLT makes use of the expression language defined by the XML Path Language (XPath) for selecting elements for processing, for conditional processing and for generating text. Finally, SVG presents another way to display data besides HTML. SVG describes two-dimensional graphics in XML, specifically vector graphic shapes, images, and text. Figure 1 shows an architecture that maximizes presentation flexibility by sending XML to the client and relying on client-side script or transformations to convert the XML into HTML or SVG.
Figure 1. Client-side processing of XML, where user interaction can produce multiple HTML and SVG views without Web server involvement.
Multiple XSLT transformations, e.g., one for low bandwidth users, another for high bandwidth users, optional additional ones for specific languages, age groups, etc., allow the video data to be widely disseminated in different forms based on W3C standards. We will focus on different HTML and SVG presentations implementing Informedia surrogates and summaries. We make use of Microsoft's IE 5.5 browser and their XML Parser 3.0, an Internet Explorer add-on released in November 2000 supporting client-side XSLT. We use Adobe's SVG Viewer 2.0 plug-in for Internet Explorer released in April 2001.
For the HTML shown in Figure 2, an "ascending date thumbnail view" XSL transformation file was used to convert the XML, via the following JavaScript where xmldoc holds the XML data:
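// Load the stylesheet synchronously so it is ready before transforming.
xslcontents.async = false;
xslcontents.load("http:ascdate.xsl");
var res;
try {
  // Apply the XSLT to the XML data; the result is an HTML string.
  res = xmldoc.documentElement.transformNode(
    xslcontents.documentElement);
} catch (exception) {
  res = HandleTranslationRuntimeError(exception);
}
// Render the transformed output only if there is something to show.
if (res != "") resultHTML.innerHTML = res;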
The property "innerHTML" is used to take the output of the transformation and render it as HTML within the document object "resultHTML" on the current HTML page. If a different transformation is used, say for example to order thumbnails by descending size, then a different XSLT file such as descsize.xsl becomes the parameter in this script. The first time a transformation is used it must be downloaded from the server, but after that it is cached in the browser just like other URLs.
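The caching described above is done by the browser itself. As a rough sketch of the same idea at the script level, the following keeps each loaded stylesheet so that switching between views does not reload it. The names makeStylesheetCache, loadStylesheet, and the view names are hypothetical; only the file names ascdate.xsl and descsize.xsl come from the text, and the loader is passed in so that no particular XML parser is assumed.

```javascript
// Hypothetical view names mapped to the XSLT files mentioned in the text.
const VIEW_STYLESHEETS = new Map([
  ["ascendingDate", "ascdate.xsl"],
  ["descendingSize", "descsize.xsl"]
]);

// Returns a lookup function that loads each stylesheet at most once.
// loadStylesheet(fileName) is an assumed, caller-supplied loader,
// for example a wrapper around the XML parser's load method.
function makeStylesheetCache(loadStylesheet) {
  const cache = new Map();
  return function getStylesheet(viewName) {
    const fileName = VIEW_STYLESHEETS.get(viewName);
    if (fileName === undefined) {
      throw new Error("Unknown view: " + viewName);
    }
    if (!cache.has(fileName)) {
      // First use of this view: load once and remember the result.
      cache.set(fileName, loadStylesheet(fileName));
    }
    return cache.get(fileName);
  };
}
```

A caller would create one lookup function per page and pass the returned stylesheet to the transformation step whenever the user picks a different view.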

Figure 2. Thumbnails for region query (oldest 7 docs. shown).
Unfortunately, SVG has no equivalent of the HTML innerHTML property (no "innerSVG") that would read in plain text and convert it into an SVG document dynamically. However, the SVG Document Object Model (DOM) is compatible with the W3C DOM Level 2 Recommendation, so the SVG document can be built through script calling DOM methods like cloneNode, setAttribute, and appendChild. Hence, for SVG output the client script navigates the XML using XPath and produces the appropriate SVG through SVG DOM methods.
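To make the DOM-based approach concrete, here is a minimal sketch that appends one rectangle per video document to an SVG parent element. It is not the Informedia script: the function name appendDocumentBoxes, the record shape {id, x, y}, and the fill color are assumptions for illustration. It uses createElementNS from DOM Level 2 Core rather than cloneNode; cloning a template element would work equally well.

```javascript
const SVG_NS = "http://www.w3.org/2000/svg";

// Appends one <rect> per document to the given SVG parent element.
// svgDoc is any DOM Level 2 Document; docs is a hypothetical array of
// {id, x, y} records already extracted from the XML; boxSize is the
// side length of each square, centered on (x, y).
function appendDocumentBoxes(svgDoc, parent, docs, boxSize) {
  for (const d of docs) {
    const rect = svgDoc.createElementNS(SVG_NS, "rect");
    rect.setAttribute("id", d.id);
    rect.setAttribute("x", String(d.x - boxSize / 2));
    rect.setAttribute("y", String(d.y - boxSize / 2));
    rect.setAttribute("width", String(boxSize));
    rect.setAttribute("height", String(boxSize));
    rect.setAttribute("fill", "green");
    parent.appendChild(rect);
  }
  return parent;
}
```

Because the function only touches standard DOM methods, the same code can target the SVG document hosted by a viewer plug-in or any other DOM Level 2 implementation.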
Visualization by Example (VIBE) was developed to emphasize the relationships of result documents to query words. For users unfamiliar or uncomfortable with Boolean logic, VIBE allows a visual plot to be manipulated to discover AND/OR relationships between query entities and documents. Entities are drawn as anchors that can be picked up and dragged, with documents plotted against the entity anchor positions based on the contribution of those entities to the documents' relevance scores. Manipulating anchor points lets the user resolve any ambiguities in the two-dimensional plot [7].
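The text does not spell out the placement formula. One common reading of VIBE, assumed here, is that each document sits at the weighted average of the anchor positions, with each anchor's weight being its contribution to the document's relevance score. The sketch below follows that assumption; the function name vibePosition and the data shapes are hypothetical.

```javascript
// anchors: { anchorName: {x, y} } giving current anchor positions.
// weights: { anchorName: number } giving each anchor's contribution to
//          one document's relevance score (assumed non-negative).
// Returns the document's plot position, or null if no anchor contributes.
function vibePosition(anchors, weights) {
  let total = 0;
  let x = 0;
  let y = 0;
  for (const name of Object.keys(weights)) {
    const anchor = anchors[name];
    const w = weights[name];
    if (!anchor || !(w > 0)) continue; // skip unknown or non-contributing anchors
    total += w;
    x += w * anchor.x;
    y += w * anchor.y;
  }
  if (total === 0) return null;
  return { x: x / total, y: y / total };
}
```

Under this assumption, dragging an anchor only changes its entry in anchors; calling vibePosition again for each document yields the replotted positions, and a document that mentions a single anchor lands exactly on it.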
The VIBE plot for the full result set of Figure 2 is shown in Figure 3. Figure 3 is an SVG document, created through script from the same XML that was used via XSLT to generate Figure 2. We extended VIBE from its traditional use with text queries into the domain of map region searches. For Figure 3, the anchors are gazetteer entries such as cities and countries, from within the query map region shown in Figure 4.
The VIBE visualization conveys semantics through positioning. In Figure 3 the green boxes represent video documents. The position of the box indicates which anchors are found in that document. For example, the absence of any document box at Namibia indicates that stories dealing with Namibia always discuss another entity in this African region. By dragging the anchor for Namibia and other terms and seeing how the green boxes immediately replot to the newly positioned anchors, the user can discover that all Namibian stories also deal with South Africa or South African cities.
Figure 3. VIBE plot for Figure 2, with focus on South Africa and Kenya.
A user can interact with the SVG VIBE plot to emphasize particular relationships. Anchors can be highlighted, and the document points matching the criteria are highlighted as well. In Figure 3, for example, the South Africa and Kenya anchors are in focus, and the one document that discusses both is colored yellow. This VIBE interface can of course be enriched to overlay other information dimensions through size, shape, and color, as detailed elsewhere [2, 3].
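The highlighting rule can be sketched as a simple filter. The version below assumes that a document is highlighted only when it contains every highlighted anchor, which matches the South Africa and Kenya example; the function name documentsMatchingAll and the record shape {id, anchors} are hypothetical.

```javascript
// docs: hypothetical records {id, anchors: [names found in the document]}.
// selected: names of the anchors the user has highlighted.
// Returns the ids of documents that contain every selected anchor.
function documentsMatchingAll(docs, selected) {
  if (selected.length === 0) return []; // nothing highlighted, nothing matches
  return docs
    .filter(function (d) {
      return selected.every(function (name) {
        return d.anchors.indexOf(name) !== -1;
      });
    })
    .map(function (d) { return d.id; });
}
```

Replacing every with some would give the OR variant, in which a document is highlighted when it contains any of the selected anchors.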
The Adobe SVG Viewer provides default interface options to zoom in and out, pan, and return to the original view. When zooming in, the vector graphics are cleanly redrawn without pixelation. SVG allows VIBE plots to be generated quickly, rendered cleanly, and manipulated efficiently, while providing standard ways for zooming into and out of regions of interest.
A casual examination of the SVG examples currently available on the Web reveals a number of maps rendered as SVG. Maps are well suited to vector representations, since world scale overviews can be provided, with boundaries and coastlines sharpened as the map is zoomed down to smaller areas of countries, states, and cities. All of the different views are supported by the same SVG document, rather than individual bitmap raster files that need to be separately accessed from a Web server. We make use of gazetteer and geographic information from ESRI [5] for indexing video documents geographically and creating SVG map interfaces for query and display. Maps are used in the Informedia web interface both as a query input interface (Figure 4) and as a visualizer of results (Figure 5).
Figure 4. Map as both query input interface and for feedback on focused areas.
The same result set viewed in different ways in Figures 2 and 3 can be viewed on a map as well, with countries colored if they are discussed in one or more of the 74 documents returned by this query. Dynamic query sliders can be used to give the user control in setting a narrower focus [1]. Figure 5 shows a date slider setting the focus to the time period November 2 through November 9, 1999. The United States and predominantly the East African countries stay in focus, along with Yemen and Egypt, indicating a concentration of stories for those regions during this week of interest.
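The effect of the date slider can be sketched as a filter over the result set followed by a count per country. This is an illustration only: the function name countriesInFocus and the record shape {date, countries} are hypothetical, and dates are assumed to be ISO "YYYY-MM-DD" strings so that plain string comparison orders them chronologically.

```javascript
// docs: hypothetical records {date: "YYYY-MM-DD", countries: [names]}.
// fromDate, toDate: inclusive slider bounds in the same date format.
// Returns a Map from country name to the number of in-range documents
// that discuss it; countries absent from the Map fall out of focus.
function countriesInFocus(docs, fromDate, toDate) {
  const counts = new Map();
  for (const d of docs) {
    if (d.date < fromDate || d.date > toDate) continue; // outside the slider range
    for (const country of d.countries) {
      counts.set(country, (counts.get(country) || 0) + 1);
    }
  }
  return counts;
}
```

A map script could then color each country whose name appears in the returned Map and leave the rest uncolored, all without contacting the server again.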
Figure 5. SVG map as visualizer interface.
We are investigating overlaying added detail, such as the Informedia video surrogates of storyboards, thumbnails, and titles, onto summary interfaces as shown in Figures 3 and 5. When an SVG summary interface is zoomed to a point where only a few documents are left represented, video surrogates could be displayed, one per document, in the available screen space. A more ambitious goal is to summarize across document sets, so that a surrogate represents one or more video documents.
Future research work might even fold in temporal components like audio or video skims. SVG supports the SMIL Recommendation for synchronized presentation. The first use of animated SVG documents as Informedia web interface elements may be interactive maps: as video plays, its countries, states, and cities highlight during their period of discussion and are unhighlighted when they lose focus. A non-SVG implementation of interactive maps is described elsewhere [3], but SVG again offers the advantage of user-controlled browser display of data without the need to issue requests back to the server. Ideally, richer SVG summarization interfaces can be given temporal structure and be "played" to reveal additional information as temporal features, just as size, color, shape, and location can convey meaning in the current map and VIBE SVG interfaces.
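As a sketch of the interactive map idea, the function below returns the places that should be highlighted at a given playback time. It assumes that each place mention carries a start and end time in seconds; the function name placesInFocus and the record shape {place, start, end} are hypothetical, and the text does not prescribe how such timing data would be represented.

```javascript
// mentions: hypothetical records {place, start, end}, times in seconds
//           of video playback; a place is in focus for start <= t < end.
// t: current playback time in seconds.
// Returns the distinct places in focus at time t, in first-seen order.
function placesInFocus(mentions, t) {
  const active = [];
  for (const m of mentions) {
    if (t >= m.start && t < m.end && active.indexOf(m.place) === -1) {
      active.push(m.place);
    }
  }
  return active;
}
```

A playback script could call this on each time update and highlight or unhighlight map regions by comparing the result with the previous call. The same timing could instead be expressed declaratively through SVG animation elements.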
This material is based on work supported by the National Science Foundation (NSF) under Cooperative Agreement No. IRI 9817496. Partial support for this work also comes from NSF's National Science, Mathematics, Engineering, and Technology Education Digital Library Program under grant DUE-0085834.