|
Active vision techniques use programmable light sources, such as projectors, whose
intensities can be controlled over space and time. We present a broad framework
for fast active vision using Digital Light Processing (DLP) projectors. The
digital micromirror device (DMD) in a DLP projector is capable of switching
mirrors “on” and “off” at high speeds (~10⁶ times per second).
An off-the-shelf DLP
projector, however, effectively operates at much lower rates (30-60 Hz) by
emitting smaller intensities that are integrated over time by a sensor (eye
or camera) to produce the desired brightness value. Our key idea is to
exploit this “temporal dithering” of illumination, as observed
by a high-speed camera. The dithering encodes
each brightness value uniquely and may be used in conjunction with
virtually any active vision technique. We apply our approach to five
well-known problems: (a) structured light-based range finding, (b) photometric
stereo, (c) illumination de-multiplexing, (d) high frequency preserving
motion-blur and (e) separation of direct and global scene components,
achieving significant speedups in performance. In all our methods, the
projector receives a single image as input whereas the camera acquires a
sequence of frames.
|
Publications
"Temporal Dithering of Illumination for Fast Active Vision"
S. G. Narasimhan, S. J. Koppal, and S. Yamazaki
European Conference on Computer Vision (ECCV),
October 2008.
[PDF]
"Exploiting DLP Illumination Dithering for Reconstruction and
Photography of High-speed Scenes"
S. J. Koppal, S. Yamazaki, and S. G. Narasimhan
International Journal of Computer Vision (IJCV),
January 2012.
[PDF]
|
Presentation
"Temporal Dithering of Illumination for Fast Active Vision"
Oral Presentation at ECCV 2008:
[PPT]
|
Pictures
|
Calibration of Temporal Dithering:
A calibration image composed of 5 x 5 pixel blocks each with
a different intensity value from 0 to 255 is input to the projector. Each
intensity at a pixel in this calibration image is projected onto a flat
screen using a unique temporal dithering. The high speed camera observes
the projected images at 10 kHz. Notice the significant variation in the
images recorded. The plot shows the patterns emitted by the projector for
4 input brightnesses (165, 187, 215, 255), as measured over 100 camera
frames. The temporal ditherings corresponding to all the 256 input
intensities in the calibration image are collated into a photograph for
better visualization of this principle. The temporal dithering is stable
and repeatable but varies for each projector-camera system.
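As a toy illustration of this principle, the sketch below (in Python; the pulse ordering is made up and is not the projector's proprietary modulation) shows how an 8-bit brightness can map to a unique fast on/off sequence that a slow sensor integrates back to the requested value, while a high-speed camera resolves the sequence itself:

```python
import numpy as np

# Toy model of temporal dithering. This is NOT the projector's actual
# (proprietary) modulation; it only illustrates that an 8-bit brightness
# maps to a unique fast on/off sequence.
def dither(level, n_slots=256):
    """Binary pulse sequence with `level` of `n_slots` mirror-on slots,
    spread evenly in time (an ordered-dither stand-in)."""
    on = np.zeros(n_slots, dtype=int)
    idx = np.floor(np.arange(level) * n_slots / max(level, 1)).astype(int)
    on[idx] = 1
    return on

seq = dither(187)
# A slow sensor (eye, or a 30-60 Hz camera) integrates the pulses back
# into the requested brightness; a 10 kHz camera resolves the sequence
# itself, giving a unique temporal code per brightness level.
print(seq.sum())  # 187
```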
|
|
Experimental Setup
(Structured light reconstruction):
The Photron PCI-1024 high-speed camera is placed vertically
above the InFocus IN38 8-bit DLP
projector. A vertical plane is placed behind the scene (statue) for
calibration. Our goal is to obtain correspondences
between the projector and camera pixels at high speeds. A single image
composed of a set of horizontal lines of randomly chosen
colors and intensities is input to the projector via a laptop.
Synchronization is done using the calibration plane.
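The correspondence search can be sketched as follows; `row_codes` (the per-row temporal sequences measured on the calibration plane) and all shapes and values here are illustrative assumptions, not the actual system's data:

```python
import numpy as np

# Illustrative sketch of the correspondence search: each projector line
# (random color/intensity) produces a distinct temporal dithering,
# recorded on the calibration plane as row_codes[r] over T frames.
rng = np.random.default_rng(1)
R, T = 768, 50
row_codes = rng.random((R, T))   # stand-in for measured row sequences

def normalize(x):
    mu = x.mean(axis=-1, keepdims=True)
    sd = x.std(axis=-1, keepdims=True) + 1e-9
    return (x - mu) / sd

def correspond(pixel_seq, row_codes):
    """Projector row whose dithering best matches (by normalized
    cross-correlation) the intensity sequence at one camera pixel."""
    scores = normalize(row_codes) @ normalize(pixel_seq)
    return int(np.argmax(scores))

# A camera pixel viewing row 300 with albedo 0.5 plus slight noise:
seq = 0.5 * row_codes[300] + 0.01 * rng.standard_normal(T)
print(correspond(seq, row_codes))  # → 300
```

Normalized cross-correlation makes the match invariant to per-pixel albedo scaling, which is why the randomly chosen colors and intensities survive reflection off the scene.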
|
|
Reconstruction result quality
vs Required camera frame-rate:
The temporal dithering occurs at 10 kHz and only the most dynamic
scenes (such as a balloon bursting) are faster than this. Therefore there
is a trade-off between reconstruction quality and camera frame rate. This
depends on the scene and is a user choice: high-speed scenes need a
high-speed camera, while medium-speed scenes require only a medium-range
camera. At the high end, we can use a high-speed camera with a 10 kHz
frame rate to reconstruct scenes such as cloth and paper. In the figure
we show the folds and creases of a paper being waved.
This scene is reconstructed with intermediate quality if a medium-range
camera, with a 500-1000 Hz frame rate, is used. Such cameras are better
suited to reconstructing slower scenes such as the facial expressions
shown in the figure. Finally, regular cameras in the range of 60-120 Hz
can be used with temporal dithering to reconstruct slowly moving scenes
and static objects such as statues.
|
|
Photometric stereo and
Illumination demultiplexing results:
A flag is waved under illumination from three different DLP projectors. A
mirror ball gives the light source directions. We use illumination
demultiplexing to separate the photographs into images that appear as if
illuminated by each projector individually. We then apply Lambertian
photometric stereo to obtain the surface normals. Integrating the surface
normals gives depth, and we render the result.
In the second result, we place color filters (red, green and blue)
over each projector and apply the same demultiplexing algorithm. The
projectors are placed close to each other, reducing the difference in
incident illumination angle. The demultiplexed images become the RGB
channels of a color image. This experiment requires removing the
color wheel from the DLP projectors.
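One way to sketch the per-pixel pipeline, under the assumption that each high-speed frame is a known code-weighted sum of the per-projector images (all codes, light directions, and values below are made up for illustration):

```python
import numpy as np

# Hedged per-pixel sketch: demultiplex the high-speed frames into three
# per-projector intensities by least squares, then run Lambertian
# photometric stereo. Codes, light directions, and albedo are made up.
rng = np.random.default_rng(2)
T = 30
C = rng.random((T, 3))          # known dithering weight of each projector per frame

L = np.array([[0.0, 0.0, 1.0],  # light directions (from the mirror ball
              [0.6, 0.0, 0.8],  # in the real setup; made-up values here)
              [0.0, 0.6, 0.8]])

n_true = np.array([0.3, 0.1, 0.9])
n_true /= np.linalg.norm(n_true)
rho = 0.7                       # albedo

s = rho * L @ n_true            # per-projector shading at one pixel
obs = C @ s                     # camera integrates the code-weighted sum

s_hat, *_ = np.linalg.lstsq(C, obs, rcond=None)   # demultiplexed intensities
g = np.linalg.solve(L, s_hat)   # Lambertian model: g = rho * n
n_hat = g / np.linalg.norm(g)
print(np.round(n_hat, 3))       # ≈ the true surface normal
```

The norm of `g` also recovers the albedo, and integrating the per-pixel normals over the image yields the depth map that is rendered in the result.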
|
|
Experimental Setup (Direct
and Global separation):
Direct and global separation using temporal dithering must take into
account that the projected patterns are gray-scale, not black-and-white.
This requires calibrating each projected pixel, and we need a special
setup that co-locates the camera and projector using a beam-splitter as
shown.
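A minimal per-pixel sketch, assuming the simple gray-scale model obs_t = p_t·Ld + p̄·Lg (with p_t the calibrated pattern activation at this pixel and p̄ its temporal mean) rather than the full calibration; all values are illustrative:

```python
import numpy as np

# Hedged sketch of per-pixel direct/global separation with gray-scale
# patterns, assuming obs_t = p_t*Ld + pbar*Lg. Values are made up.
rng = np.random.default_rng(3)
T = 20
p = rng.uniform(0.1, 1.0, T)    # calibrated gray-scale activations
Ld, Lg = 0.5, 0.3               # ground-truth direct and global radiance

obs = p * Ld + p.mean() * Lg    # simulated high-speed observations

# Two unknowns, T equations: solve [p, pbar] @ [Ld, Lg]^T = obs
A = np.stack([p, np.full(T, p.mean())], axis=1)
(d_hat, g_hat), *_ = np.linalg.lstsq(A, obs, rcond=None)
print(round(d_hat, 3), round(g_hat, 3))  # 0.5 0.3
```

The variation of p_t over time is what makes the two-unknown system well posed; with a binary 50% checker this reduces to the familiar max/min separation.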
|
|
Direct and Global separation result:
The scene in our experiment
consists of a set of white ping-pong balls dropped from a hand. The
ping-pong balls are mostly diffuse. Notice that the direct component for
each ball looks like the shading on a sphere (with dark edges) and the
indirect component includes the interreflections between the balls
(notice the bright edges). For the hand, the direct component is only due
to reflection by the oils near the skin surface and is dark. The indirect
component includes the effect of subsurface scattering and dominates the
intensity. The checker pattern “flips” once in approximately
1/100 s and hence we achieve separation at 100 Hz.
Due to the finite resolution of the camera and the narrow depth of field
of the projector, a 1-pixel blur is seen at the edges of the checker
pattern, creating grid artifacts.
|
|
Videos
(Video Result Playlist)
|
Moving
hand (Structured light reconstruction):
Reconstruction of a moving hand. We show four videos. The first is a few
frames of the hand at 3000 Hz. Note that as the light changes (temporal
dithering) the hand does not seem to move at all. In the second video, we
show a 30 Hz version of our input. The last two videos are
reconstructions, at 3000 Hz and then in real time.
|
|
Hand
with chopstick (Structured light reconstruction):
Reconstruction of a hand moving a chopstick quickly. As before, we show
four videos. The first is a few frames of the chopstick at 3000 Hz. Note
that as the light changes (temporal dithering) the hand does not seem to
move at all. However, the chopstick does move, since this is a fast-moving
scene. In the second video, we show a 30 Hz version of our input. The
last two videos are reconstructions, at 3000 Hz and then in real time.
|
|
Face
with tongue (Structured light reconstruction):
Reconstruction of a person smiling and sticking his tongue out. As
before, we show four videos. The first is a few frames of the person at
3000 Hz. Note that as the light changes (temporal dithering) the facial
expression does not seem to change. However, when the person sticks out
his tongue, it moves quickly, even at 3000 Hz. In the second video, we
show a 30 Hz version of our input. The last two videos are
reconstructions, at 3000 Hz and then in real time. Noise around the neck
area is due to errors in background removal.
|
|
Paper
(Structured light reconstruction):
Reconstruction of a person waving a piece of paper. As before, we show
four videos. The first is a few frames of the person holding the paper at
3000 Hz. Note that as the light changes (temporal dithering) the hands do
not appear to move, but the paper edge moves slightly. In the second
video, we show a 30 Hz version of our input. The last two videos are
reconstructions, at 3000 Hz and then in real time. When the edge of the
paper is perpendicular to the image plane, reconstruction is impossible,
and holes appear there.
|
|
Flag
waving (Photometric stereo reconstruction):
Reconstruction of a flag being waved. On the left we have the link to
the input video, with three projectors showing the temporal dithering. We
use the calibration plane to demultiplex the images and the mirror sphere
to obtain the directions of the three light sources. We apply Lambertian
photometric stereo to obtain the surface normals. On the right we have a
video of the reconstructed flag, obtained by integrating the surface
normals.
|
|
Colorization
video (using illumination demultiplexing):
Creating a color video using projectors with color filters. On the left
we have the 30 Hz input video of a scene illuminated by three DLP
projectors, each with a color filter; the color wheel has been removed
from these projectors. After demultiplexing, we concatenate the recovered
channels to form an RGB color image, as shown on the right. Here we show
the color video at 150 Hz, since we use 20 frames of the 3000 Hz video
for each demultiplexed output frame.
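The block-wise demultiplexing above can be sketched as follows (tiny image and made-up codes; 20 high-speed frames per output frame gives 3000/20 = 150 Hz):

```python
import numpy as np

# Hedged sketch of the block-wise demultiplexing that turns 3000 Hz
# gray-scale frames into a 150 Hz RGB video. Codes and data are made up.
rng = np.random.default_rng(4)
F, H, W = 3000, 4, 4            # one second of high-speed frames
C = rng.random((20, 3))         # per-frame weight of the R, G, B projectors

# Simulate the camera: each high-speed frame is a code-weighted sum of
# the three (color-filtered) projector images.
rgb_truth = rng.random((F // 20, H, W, 3))
video = np.einsum('tc,bhwc->bthw', C, rgb_truth).reshape(F, H, W)

# Demultiplex each block of 20 frames into its R, G, B components.
blocks = video.reshape(F // 20, 20, H * W)
color = np.stack([np.linalg.lstsq(C, b, rcond=None)[0] for b in blocks])
color = color.reshape(F // 20, 3, H, W).transpose(0, 2, 3, 1)
print(color.shape)  # (150, 4, 4, 3): a 150 Hz color video
```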
|
|
|