ORWELL: Removal of Tracked Objects in Digital Video
Alfred Hong, Heji Kim, Lin-hsian Wang
Department of Computer Science
324 Upson Hall
Cornell University
Ithaca, NY 14853-7501 US
http://www.cs.cornell.edu/Info/Projects/zeno/rivl/rivl.html
In addition to OTR, we also extend this work to segment the tracked object from the background; the resulting segmentation supports a variety of video processing effects, such as overlaying the tracked object on top of a different sequence. The resulting application is an ideal test bed for experimenting with various OTR and segmentation algorithms. We reconstruct the background and use different techniques for segmentation, as illustrated in the diagram below.
Figure 1: Orwell OTR and Segmentation Overview
[Key words: object-tracking, Hausdorff distance, object-removal, segmentation, background reconstruction, image filtering]
Figure 2: Hausdorff tracking algorithm explained
The first two approaches to background reconstruction are the temporal mean filter (TMEF) and the temporal median filter (TMDF).
The TMEF technique computes, at each pixel location, the arithmetic mean over the whole frame sequence and assigns the result to the corresponding pixel in the background frame. This averages the tracked object out of the scene, possibly leaving a blurring effect. We implement this filter by averaging each of the RGB channels independently.
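The idea can be sketched in a few lines of NumPy (our actual implementation is in RiVL_Genc; the code below is an illustrative approximation, and the function name is ours):

    import numpy as np

    def temporal_mean(frames):
        # frames: (T, H, W, 3) uint8 RGB stack.
        # Average each RGB channel independently over time; the tracked
        # object blends into the background, possibly leaving a blur.
        return np.mean(frames.astype(np.float64), axis=0).astype(np.uint8)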
TMDF builds the background frame from the median pixel value across the video sequence. This technique relies on the assumption that any portion of the tracked object occupies any particular location in fewer than half of the frames. We implement this filter by finding, at each pixel, the frame whose gray-level value is the median, and then copying that frame's RGB values into the background.
Both temporal filters are pixel-level operations we wrote in RiVL_Genc. RiVL_Genc allows at most twenty frames as input to a function, and because a median of medians is not the true median, we could not implement an exact median over the entire video sequence. Instead, we compute the median for several different samples, each composed of twenty frames taken at equal intervals, and let the user decide on the best result.
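The sampled median can be sketched as follows (again in NumPy rather than RiVL_Genc; the gray-level weights and helper names are illustrative):

    import numpy as np

    def sampled_temporal_median(frames, sample_size=20):
        # frames: (T, H, W, 3) uint8 RGB stack. Pick sample_size frames
        # at equal intervals, mirroring the twenty-frame input limit.
        idx = np.linspace(0, frames.shape[0] - 1, sample_size).astype(int)
        sample = frames[idx]
        # Gray-level value of each sampled frame at each pixel.
        gray = sample.astype(np.float64) @ [0.299, 0.587, 0.114]
        # Index of the frame holding the median gray value per pixel...
        med = np.argsort(gray, axis=0)[sample_size // 2]
        # ...and copy that frame's RGB values into the background.
        h, w = np.indices(med.shape)
        return sample[med, h, w]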
Physical-space search finds a frame in which the bounding box of the tracked object does not overlap the box in the frame currently being processed; that is, the portion of background needed to replace the object is one that the object does not occupy in some other frame. Assuming motion continuity, we begin the search for the current frame's background near the frame where we found the previous frame's background, avoiding an exhaustive search. For the initial frame, we must search the entire sequence for all possible background replacements. Although we prefer the closest frame containing the background, we also want to find multiple frames in which the background is exposed, in case another moving object has moved into it. It is also possible to partition the bounding box into smaller blocks and search for the background piece by piece.
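A sketch of this search, assuming the tracker supplies one bounding box (x0, y0, x1, y1) per frame (the function and parameter names are ours, not the original RiVL code):

    def boxes_overlap(a, b):
        # Axis-aligned boxes (x0, y0, x1, y1) overlap unless one lies
        # entirely to the side of, or above/below, the other.
        return not (a[2] <= b[0] or b[2] <= a[0] or
                    a[3] <= b[1] or b[3] <= a[1])

    def find_background_frame(boxes, cur, hint):
        # Search outward from 'hint' (where the previous frame's
        # background was found) for the nearest frame whose box does
        # not overlap the box of the current frame 'cur'.
        for offset in range(len(boxes)):
            for cand in (hint - offset, hint + offset):
                if 0 <= cand < len(boxes) and cand != cur \
                        and not boxes_overlap(boxes[cand], boxes[cur]):
                    return cand
        return None  # background never exposed at this location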
If we assume a single moving object in the sequence, then it is possible to use one frame, with the object removed and the background reconstructed, as the background for the entire sequence. However, due to shifting lighting levels, it is desirable to reconstruct the scene for every frame or every block of frames. Figure 3 shows the result of reconstructing the background covered by the subject's head.
Figure 3: Sequence illustrating object-tracking and background reconstruction
Figure 4: Segmentation methods
We used the above video sequence of 200 frames as one of the inputs for our tests. Images were recorded as Motion JPEGs with a Sun Microsystems camera using a Parallax board.
Figure 6: Temporal filter results (mean filter and median filter)
The example images above show the results of the temporal filters. The background reconstructed by the mean filter has a slightly visible blurring effect caused by the moving object. As the number of frames in the video increases, this effect becomes negligible. We can post-process the mean-filter result with a smoothing function and then a sharpening function to further reduce the shadowing effect.
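One way to realize this post-processing, assuming SciPy is available (the kernel widths and sharpening gain below are illustrative, not tuned values from our experiments):

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def smooth_then_sharpen(img, smooth_sigma=2.0, sharp_sigma=1.0, amount=1.5):
        # Gaussian-smooth each channel, then apply an unsharp mask:
        # add back a scaled difference between the image and its blur.
        img = img.astype(np.float64)
        smoothed = gaussian_filter(img, sigma=(smooth_sigma, smooth_sigma, 0))
        blurred = gaussian_filter(smoothed, sigma=(sharp_sigma, sharp_sigma, 0))
        sharpened = smoothed + amount * (smoothed - blurred)
        return np.clip(sharpened, 0, 255).astype(np.uint8)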
Figure 7: Segmentation results for background subtraction
We chose the "eyeball test", a metric commonly used by many vision researchers, to judge the quality of the segmentation. Background subtraction produced the best segmentation, with the smoothest edges and the fewest holes within the object. Motion differencing performed the worst, since it tends to give an irregular outline of the motion and includes portions of the background that belong to the object in the previous image but not in the current one; this effect appears as an undesirable white outline around the object in the right pair of images below. The second differencing method improves on regular motion differencing, but is still not as solid as background subtraction. Second differencing does have an advantage over background subtraction: reconstructing the background is not necessary. Some form of post-filtering is needed in all cases to fill in the holes and smooth the edges.
Figure 8: Segmentation results for second differencing and motion differencing
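The three segmentation rules compared above reduce to thresholded gray-level differences; a minimal NumPy sketch follows (the threshold of 25 and the function names are illustrative):

    import numpy as np

    def gray(img):
        return img.astype(np.float64) @ [0.299, 0.587, 0.114]

    def background_subtraction(frame, background, thresh=25):
        # Pixels that differ from the reconstructed background.
        return np.abs(gray(frame) - gray(background)) > thresh

    def motion_difference(frame, other, thresh=25):
        # Pixels that differ between two frames; also flags background
        # the object has just uncovered, hence the white outline.
        return np.abs(gray(frame) - gray(other)) > thresh

    def second_difference(prev_frame, frame, next_frame, thresh=25):
        # AND of backward and forward differences keeps only pixels
        # that differ from both neighbors, trimming uncovered background.
        return (motion_difference(frame, prev_frame, thresh) &
                motion_difference(frame, next_frame, thresh))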
We set out with a two-fold goal, one of object-removal and the other of object-segmentation. The overall quality of object-removal depends on the accuracy of the Hausdorff tracker and the fidelity of the reconstructed background. We feel we have accomplished OTR, provided the background is exposed somewhere in the sequence. We have had less success with segmentation, which leaves much room for future improvement.
We can extend this project along these orthogonal directions: