Task 1: Augmented View Generation


Calibration-free View Augmentation for Semi-Autonomous VSAMs


---

The goal is to provide remote observers with graphic overlays that augment live video views of battlefield or urban scenes. The overlays depict targets or activities and must be correctly registered with their surroundings: targets and activities are rendered accurately from the point of view of the live TV camera even when they are not physically visible to it (because of obstacles, terrain, smoke, and so on). Before graphical 3D objects can be correctly registered to live video of a battlefield environment, the relationship between the internal reference frames of these objects, the camera, and the environment itself must be determined.

The key novel technical aspect of the proposed FRE is the use of relative representations for achieving this goal: rather than representing the graphical objects and the camera in an absolute world coordinate frame (which would require extensive calibration), both the camera and the objects are represented in a non-Euclidean reference frame defined by four or more landmarks that can be tracked in the video stream and whose configuration in space is unknown. The geometric properties these relative representations must have, the geometric constructions needed to achieve accurate video overlays, and the noise sensitivity of the overlays themselves are all tightly coupled to the model governing the image formation process (orthographic, weak perspective, or perspective). Because the imaging conditions possible for VSAM, RPV, or ground observer video input span this whole range, a good understanding of the issues involved in achieving correct overlays under each imaging model is crucial.
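To make the transfer concrete, the following sketch (in Python with NumPy; the function name and the numeric values are illustrative, not part of the proposal) shows the weak-perspective case. Under an affine camera, the image of a virtual point whose coordinates in the landmark-defined frame are (a, b, c) is the same affine combination of the landmark images, so the overlay position follows from the tracked landmark pixels alone, with no calibration:

    import numpy as np

    def transfer_affine(landmarks_px, affine_coords):
        """Project a virtual point into the current frame under a
        weak-perspective (affine) camera model.

        landmarks_px  : (4, 2) pixel positions of the four tracked
                        landmarks in the current frame (p0 is the origin).
        affine_coords : (a, b, c), the point's coordinates in the
                        non-Euclidean frame spanned by the landmarks.

        Under affine projection the image position is the same affine
        combination of the landmark images as the point is of the
        landmarks in 3-space:
            p = p0 + a*(p1 - p0) + b*(p2 - p0) + c*(p3 - p0)
        """
        p0, p1, p2, p3 = np.asarray(landmarks_px, dtype=float)
        a, b, c = affine_coords
        return p0 + a * (p1 - p0) + b * (p2 - p0) + c * (p3 - p0)

    # Example: the four landmarks were tracked to these pixel positions.
    landmarks = [(320, 240), (400, 250), (310, 170), (350, 230)]
    overlay_px = transfer_affine(landmarks, (0.5, 0.25, 1.0))
    print(overlay_px)   # pixel location at which to draw the overlay point

Note that the construction works whether or not the virtual point is physically visible: occlusion by obstacles, terrain, or smoke does not affect the transfer, only the tracking of the four landmarks.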

To address this problem, base research will be conducted in this FRE at both a theoretical and a practical level. At the theoretical level, the objective will be to provide the algorithms and tools needed to register video overlays accurately for an increasingly general viewing model, starting from orthographic and weak perspective, moving to projective, and finally to situations where all three models may be applicable at different times in the same video input. At the practical level, the goal will be to implement the increasingly sophisticated algorithms we develop and to evaluate them on our two indoor-outdoor mobile platforms.
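For the full projective model, one standard construction is sketched below, under the added assumption (ours, not the proposal's) that the four tracked landmarks are coplanar, so that overlays anchored in the landmark plane transfer exactly under perspective via a homography estimated with the direct linear transform:

    import numpy as np

    def homography_dlt(src, dst):
        """Estimate the 3x3 homography mapping src -> dst from four or
        more point correspondences (direct linear transform)."""
        A = []
        for (x, y), (u, v) in zip(src, dst):
            A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
            A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
        _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
        return Vt[-1].reshape(3, 3)

    def apply_h(H, pts):
        """Apply homography H to an (N, 2) array of points."""
        pts = np.hstack([pts, np.ones((len(pts), 1))])
        q = pts @ H.T
        return q[:, :2] / q[:, 2:]

    # Landmark positions in a reference frame and in the current frame;
    # the values are illustrative.
    ref = np.array([[100, 100], [300, 110], [290, 320], [110, 300]], float)
    cur = np.array([[120, 130], [310, 120], [305, 330], [130, 340]], float)
    H = homography_dlt(ref, cur)

    overlay_ref = np.array([[200, 200]], float)  # anchored in the ref frame
    print(apply_h(H, overlay_ref))               # where to draw it now

The affine sketch above and this projective one bracket the progression described in the paragraph; the mixed-model case amounts to selecting between such transfer rules as imaging conditions change within one video input.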

Practically, these techniques must be married to the other two research streams. The tracking stream provides the information from which to construct the non-Euclidean frames and the locations of targets within them. Merging this work with the activity recognition stream means developing graphic overlays that represent activities. The observer should have a useful dynamic icon encoding the type, location, extent, intensity, etc. of the activity, presenting a more compelling and information-rich display to aid decision making. Ultimately, the observer may access a local database ("what sort of vehicles are involved in this activity?") and discover other information about participants in activities.
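As one possible encoding of such an icon (a hypothetical schema for illustration, not a committed design), the attributes named above might be carried as follows, with the location expressed in the same relative landmark frame used for overlay transfer:

    from dataclasses import dataclass, field

    @dataclass
    class ActivityIcon:
        """Dynamic overlay icon for a recognized activity (hypothetical)."""
        activity_type: str      # e.g. "convoy", "loading", "dismount"
        affine_coords: tuple    # (a, b, c) location in the landmark frame
        extent: float           # spatial extent of the activity
        intensity: float        # 0..1, drives icon size or brightness
        participants: list = field(default_factory=list)  # database keys

    icon = ActivityIcon("convoy", (0.4, 0.1, 0.9), extent=0.3,
                        intensity=0.8,
                        participants=["vehicle-12", "vehicle-15"])

Anchoring the icon in the relative frame means it is rendered in the right place from any viewpoint in which the landmarks can be tracked, and the participant list supplies the keys for the database queries described above.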

To close the loop and allow a commander to reposition, re-target, reroute, reconfigure, or otherwise interact with the VSAMs, we shall develop software and hardware technology for interaction with graphic renditions of controllable objects. Trackable wands, three-dimensional mice, keyboards, and other input techniques will be available and will be exploited. We shall also develop techniques for the operator to annotate objects in (and areas of) 3-space.
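One way such an annotation could be anchored in the relative frame (a sketch assuming weak-perspective imaging, as in the first example) is for the operator to designate the same point in two views; its affine coordinates are then recovered by least squares, after which the annotation transfers to any later frame exactly like other overlay points:

    import numpy as np

    def affine_coords_from_two_views(lm_v1, lm_v2, click_v1, click_v2):
        """Recover the affine coordinates (a, b, c) of an annotated point
        from its pixel position in two views, given the four tracked
        landmarks in each view.

        Each view contributes two linear equations
            p - p0 = a*(p1 - p0) + b*(p2 - p0) + c*(p3 - p0),
        so two views give four equations in three unknowns.
        """
        rows, rhs = [], []
        for lm, click in ((lm_v1, click_v1), (lm_v2, click_v2)):
            p0, p1, p2, p3 = np.asarray(lm, dtype=float)
            rows.append(np.column_stack([p1 - p0, p2 - p0, p3 - p0]))
            rhs.append(np.asarray(click, dtype=float) - p0)
        A = np.vstack(rows)        # 4x3 system
        b = np.concatenate(rhs)
        coords, *_ = np.linalg.lstsq(A, b, rcond=None)
        return coords              # (a, b, c), reusable in every later frame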

A summary of the approach is the following:

1. Represent both the camera and the graphical objects in a relative, non-Euclidean frame defined by four or more tracked landmarks, eliminating the need for extensive calibration.
2. Develop registration algorithms and tools for increasingly general imaging models: orthographic, weak perspective, projective, and mixtures of the three within a single video input.
3. Implement and evaluate these algorithms on our two indoor-outdoor mobile platforms.
4. Use the tracking stream to construct the relative frames and locate targets in them, and the activity recognition stream to drive dynamic activity icons.
5. Close the loop with interaction and annotation techniques so that a commander can reposition, re-target, reroute, or reconfigure the VSAMs.

