Temporal dithering of illumination for fast active vision

Active vision techniques use programmable light sources, such as projectors, whose intensities can be controlled over space and time. We present a broad framework for fast active vision using Digital Light Processing (DLP) projectors. The digital micromirror device (DMD) in a DLP projector is capable of switching mirrors “on” and “off” at high speeds (10^6/s). An off-the-shelf DLP projector, however, effectively operates at much lower rates (30-60 Hz) by emitting smaller intensities that are integrated over time by a sensor (eye or camera) to produce the desired brightness value. Our key idea is to exploit this “temporal dithering” of illumination, as observed by a high-speed camera. The dithering encodes each brightness value uniquely and may be used in conjunction with virtually any active vision technique. We apply our approach to five well-known problems: (a) structured light-based range finding, (b) photometric stereo, (c) illumination de-multiplexing, (d) high frequency preserving motion-blur and (e) separation of direct and global scene components, achieving significant speedups in performance. In all our methods, the projector receives a single image as input whereas the camera acquires a sequence of frames.

Publications

"Temporal Dithering of Illumination for Fast Active Vision"
S. G. Narasimhan, S. J. Koppal and S. Yamazaki
European Conference on Computer Vision (ECCV),
October 2008.
[PDF]

"Exploiting DLP Illumination Dithering for Reconstruction and Photography of High-speed Scenes"
S. J. Koppal, S. Yamazaki and S. G. Narasimhan
International Journal of Computer Vision (IJCV),
January 2012.
[PDF]

Presentation

"Temporal Dithering of Illumination for Fast Active Vision"
Oral Presentation at ECCV 2008: [PPT]

Pictures

	Calibration of Temporal Dithering: A calibration image composed of 5 x 5 pixel blocks each with a different intensity value from 0 to 255 is input to the projector. Each intensity at a pixel in this calibration image is projected onto a flat screen using a unique temporal dithering. The high speed camera observes the projected images at 10 kHz. Notice the significant variation in the images recorded. The plot shows the patterns emitted by the projector for 4 input brightnesses (165, 187, 215, 255), as measured over 100 camera frames. The temporal ditherings corresponding to all the 256 input intensities in the calibration image are collated into a photograph for better visualization of this principle. The temporal dithering is stable and repeatable but varies for each projector-camera system.
	Experimental Setup (Structured light reconstruction): The Photron PCI-1024 high speed camera is placed vertically above the Infocus In38 8-bit DLP projector. A vertical plane is placed behind the scene (statue) for calibration. Our goal is to obtain correspondences between the projector and camera pixels at high speeds. A single image composed of a set of horizontal lines of randomly chosen colors and intensities is input to the projector via a laptop. Synchronization is done using the calibration plane.
	Reconstruction quality vs. required camera frame rate: The temporal dithering occurs at 10 kHz, and only the most dynamic scenes (such as a bursting balloon) change faster than this. In practice, there is a trade-off between reconstruction quality and camera frame rate. The right operating point depends on the scene and is a user choice: high-speed scenes need a high-speed camera, whereas medium-speed scenes can be handled by a medium-range camera. At the high end, a high-speed camera with a 10 kHz frame rate can reconstruct scenes such as cloth and paper; the figure shows the folds and creases of a piece of paper being waved. The same scene is reconstructed with intermediate quality if a medium-range camera, with a 500-1000 Hz frame rate, is used. Such cameras are better suited to slower scenes, such as the facial expressions shown in the figure. Finally, regular cameras in the 60-120 Hz range can be used with temporal dithering to reconstruct slowly moving scenes and static objects such as statues.
	Photometric stereo and illumination demultiplexing results: A flag is waved under illumination from three different DLP projectors. A mirror ball gives the direction of each light source. We use illumination demultiplexing to separate the photographs into images that appear as if illuminated by each projector individually. We then apply Lambertian photometric stereo to obtain the surface normals. Integrating the surface normals gives depth, and we render the result. In the second result, we place a color filter (red, green or blue) over each projector and apply the same demultiplexing algorithm. The projectors are placed close to each other, reducing the difference in incident illumination angle. The demultiplexed images become the RGB channels of a color image. This experiment requires removing the color wheel from the DLP projectors.
	Experimental Setup (Direct and Global separation): Direct and global separation using temporal dithering must take into account that the projected patterns are gray-scale and not black-and-white. This requires calibrating each projected pixel, so we use a special setup that co-locates the camera and projector using a beam-splitter, as shown.
	Direct and Global separation result: The scene in our experiment consists of a set of white ping-pong balls dropped from a hand. The ping-pong balls are mostly diffuse. Notice that the direct component for each ball looks like the shading on a sphere (with dark edges) and the indirect component includes the interreflections between the balls (notice the bright edges). For the hand, the direct component is only due to reflection by the oils near the skin surface and is dark. The indirect component includes the effect of subsurface scattering and dominates the intensity. The checker pattern “flips” once in approximately 1/100s and hence we achieve separation at 100Hz. Due to finite resolution of the camera and the narrow depth of field of the projector, a 1-pixel blur is seen at the edges of the checker pattern, creating grid-artifacts.
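To illustrate why gray-scale patterns change the separation arithmetic, here is a minimal sketch. It is not the calibrated procedure used in our experiments. It assumes the standard two-level model from earlier work on high-frequency illumination: half of the projector pixels are at a brighter level, the other half at a darker level, and the ratio b of dark to bright is known per pixel from calibration. The function name and the example numbers are hypothetical.

```python
import numpy as np

def separate_direct_global(l_bright, l_dark, b):
    """Solve for direct and global components under an assumed two-level model.

    Assumed model (half the projector pixels bright, half dark, dark = b * bright):
        l_bright = direct + (1 + b) * global / 2
        l_dark   = b * direct + (1 + b) * global / 2
    l_bright, l_dark: pixel values seen when the pixel is under the brighter
    and the darker pattern level. b: dark/bright ratio, 0 <= b < 1.
    """
    l_bright = np.asarray(l_bright, dtype=float)
    l_dark = np.asarray(l_dark, dtype=float)
    direct = (l_bright - l_dark) / (1.0 - b)
    global_part = 2.0 * (l_dark - b * l_bright) / (1.0 - b * b)
    return direct, global_part

# Synthetic check: build observations from known components, then recover them.
true_direct, true_global, b = 0.6, 0.4, 0.25
l_bright = true_direct + (1 + b) * true_global / 2
l_dark = b * true_direct + (1 + b) * true_global / 2
direct, global_part = separate_direct_global(l_bright, l_dark, b)
```

With b = 0 this reduces to the familiar black-and-white case, where the direct component is the difference of the two observations and the global component is twice the darker one.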

Videos

(Video Result Playlist)

Moving hand (Structured light reconstruction):
Reconstruction of a hand moving. We show four videos. The first is a few frames of the hand at 3000Hz. Note that as the light changes (temporal dithering) the hand does not seem to move at all. In the second video, we show a 30Hz version of our input. The last two videos are reconstructions, at 3000Hz and then at real-time.

Hand with chopstick (Structured light reconstruction):
Reconstruction of a hand moving a chopstick quickly. As before, we show four videos. The first is a few frames of the chopstick at 3000Hz. Note that as the light changes (temporal dithering) the hand does not seem to move at all. However, the chopstick does move, since this is a fast-moving scene. In the second video, we show a 30Hz version of our input. The last two videos are reconstructions, at 3000Hz and then at real-time.

Face with tongue (Structured light reconstruction):
Reconstruction of a person smiling and sticking out his tongue. As before, we show four videos. The first is a few frames of the person at 3000Hz. Note that as the light changes (temporal dithering) the facial expression does not seem to change. However, when the person sticks out his tongue, it moves quickly enough that the motion is visible even at 3000Hz. In the second video, we show a 30Hz version of our input. The last two videos are reconstructions, at 3000Hz and then at real-time. Noise around the neck area is due to errors in background removal.

Paper (Structured light reconstruction):
Reconstruction of a person waving a piece of paper. As before, we show four videos. The first is a few frames of a person holding the piece of paper at 3000Hz. Note that as the light changes (temporal dithering) the hands do not appear to move, but the paper edge moves slightly. In the second video, we show a 30Hz version of our input. The last two videos are reconstructions, at 3000Hz and then at real-time. When the edge of the paper is perpendicular to the image plane, reconstruction is impossible and holes appear in the result.
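The structured light results above depend on recovering, at each camera pixel, which projector input intensity produced the observed temporal dithering. The sketch below shows one plausible way to do this lookup and is not our actual implementation. It assumes a calibrated codebook with one temporal code per input intensity, and it matches observations by normalized correlation so that surface albedo and a constant ambient offset do not affect the result. The random codebook is a hypothetical stand-in for measured DLP dithering codes, and all function names are illustrative.

```python
import numpy as np

def build_codebook(num_levels=256, num_frames=100, seed=0):
    """Hypothetical stand-in for calibration: one temporal code per intensity.

    Real codes would be measured by the high-speed camera from the
    calibration image; random values are used here for illustration only.
    """
    rng = np.random.default_rng(seed)
    return rng.random((num_levels, num_frames))

def normalize(codes):
    """Make each code zero-mean and unit-norm along the time axis."""
    centered = codes - codes.mean(axis=-1, keepdims=True)
    norms = np.linalg.norm(centered, axis=-1, keepdims=True)
    return centered / np.maximum(norms, 1e-12)

def decode(observed, codebook):
    """Return the best-matching input intensity for each observed profile.

    observed: (num_pixels, num_frames), codebook: (num_levels, num_frames).
    """
    scores = normalize(observed) @ normalize(codebook).T
    return np.argmax(scores, axis=1)

# Simulate four pixels with different albedos, an ambient offset and noise.
codebook = build_codebook()
rng = np.random.default_rng(1)
true_levels = np.array([165, 187, 215, 255])
albedo = np.array([0.3, 0.9, 0.5, 0.7])[:, None]
noise = 0.01 * rng.standard_normal((4, codebook.shape[1]))
observed = albedo * codebook[true_levels] + 0.05 + noise
decoded = decode(observed, codebook)
```

Once the intensity at a camera pixel is known, it identifies the projected line that lit that pixel, which gives the projector-camera correspondence needed for triangulation.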

Flag waving (Photometric stereo reconstruction):
Reconstruction of a flag being waved. On the left we have the link to the input video, with three projectors showing the temporal dithering. We use the calibration plane to demultiplex the images, and the mirror sphere to obtain the directions of the three light sources. We apply Lambertian photometric stereo to obtain the surface normals. On the right we have a video of the reconstructed flag, obtained by integrating the surface normals.
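The Lambertian photometric stereo step can be sketched as a per-pixel least-squares solve. This is a minimal textbook version and not our exact code. It assumes three demultiplexed images, known unit light directions (as would be estimated from the mirror sphere), and no shadows or specular highlights. The function name, the light directions and the test values are hypothetical, and the integration of normals into depth is not shown.

```python
import numpy as np

def photometric_stereo(images, light_dirs):
    """Estimate surface normals and albedo under a Lambertian model.

    images: (K, H, W) stack, one image per light source.
    light_dirs: (K, 3) array with one unit light direction per row.
    Model at each pixel: intensity_k = albedo * dot(light_dirs[k], normal).
    """
    k, h, w = images.shape
    intensities = images.reshape(k, -1)                     # (K, H*W)
    g, *_ = np.linalg.lstsq(light_dirs, intensities, rcond=None)
    albedo = np.linalg.norm(g, axis=0)                      # (H*W,)
    normals = g / np.maximum(albedo, 1e-12)                 # unit vectors
    return normals.T.reshape(h, w, 3), albedo.reshape(h, w)

# Synthetic check: a 2 x 2 patch with a known tilted normal and albedo.
light_dirs = np.array([[0.0, 0.0, 1.0],
                       [0.6, 0.0, 0.8],
                       [0.0, 0.6, 0.8]])
true_normal = np.array([0.6, 0.0, 0.8])
true_albedo = 0.5
shading = true_albedo * (light_dirs @ true_normal)          # (3,)
images = np.broadcast_to(shading[:, None, None], (3, 2, 2)).copy()
normals, albedo = photometric_stereo(images, light_dirs)
```

With exactly three light sources the least-squares solve reduces to inverting a 3 x 3 matrix, so the three light directions must not be coplanar.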

Colorization video (using illumination demultiplexing):
Creating a color video using projectors with color filters. On the left we have the 30Hz input video of a scene illuminated by three DLP projectors, each with a color filter placed over it. We also remove the color wheel from these projectors. After demultiplexing, we combine the three demultiplexed images into the channels of an RGB color image, as shown on the right. Here we show the color video at 150Hz, since we use 20 frames from the 3000Hz video to do the demultiplexing.
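A minimal sketch of the demultiplexing step follows, under an assumed linear model. Each camera frame is treated as a weighted sum of the images the scene would show under each projector alone, with per-frame weights given by the projectors' dithering as read from the calibration plane. The function name, the array shapes and the random test data are hypothetical, and the actual procedure may differ.

```python
import numpy as np

def demultiplex(frames, codes):
    """Recover one image per projector from a window of camera frames.

    frames: (T, H, W) camera frames in one window.
    codes:  (T, K) brightness of each of the K projectors in each frame
            (assumed known, e.g. measured on the calibration plane).
    Assumed model: frames[t] = sum_k codes[t, k] * per_projector[k].
    """
    t, h, w = frames.shape
    solution, *_ = np.linalg.lstsq(codes, frames.reshape(t, -1), rcond=None)
    return solution.reshape(codes.shape[1], h, w)

# Synthetic check with 3 projectors and a 20-frame window.
rng = np.random.default_rng(0)
codes = rng.random((20, 3))
per_projector = rng.random((3, 4, 4))
frames = np.tensordot(codes, per_projector, axes=1)   # (20, 4, 4)
recovered = demultiplex(frames, codes)

# Stacking the three recovered images gives an RGB image of shape (H, W, 3).
rgb = np.stack(list(recovered), axis=-1)

# One output image per 20 input frames at 3000 Hz gives 150 Hz.
output_rate_hz = 3000 / 20
```

Using a longer window makes the least-squares problem better conditioned but lowers the output frame rate in proportion.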