Real-Time Multi-Camera Tracking

Robert T. Collins, Omead Amidi, Takeo Kanade
Robotics Institute, Carnegie Mellon University


DARPA Next Generation Internet (NGI)

Under the DARPA NGI project, we developed real-time tracking algorithms to guide a set of pan/tilt/zoom cameras to simultaneously foveate on a moving person. Each camera implements mean-shift tracking based on normalized color histograms.

Some initial results of single camera pan/tilt tracking are shown below. The active sensor used is a Sony EVI D30 teleconference camera. The camera tracks me around my office, through a variety of scales, poses, and lighting conditions. Click on the picture to get a larger version to inspect.



A video of the tracking experiment is shown below as an AVI file. The servo control loop was very simple (bang-bang control), and I had the camera speed set low to avoid oscillations. This is why the camera consistently lags behind me while I am moving. However, when I stop moving for a bit, the camera catches up and centers on me.


AVI file (8.8 MB)
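The lagging behavior described above is what a slow bang-bang loop would be expected to produce. The following is only a minimal one-axis sketch of the idea, not the code that drove the camera: the function names, the dead-band width, and the fixed speed are all hypothetical, and the real servo interface is not shown.

```python
def bang_bang_command(error, deadband=10.0, speed=1.0):
    """Return a fixed-magnitude velocity command for one axis (pan or tilt).

    error: target position minus current camera aim along this axis.
    deadband, speed: hypothetical tuning values, not from the original system.
    """
    if error > deadband:
        return speed       # target is ahead: drive at full (low) speed
    if error < -deadband:
        return -speed      # target is behind: drive the other way
    return 0.0             # close enough: stop, which avoids oscillation


def simulate(target_positions, speed=1.0, deadband=10.0):
    """Toy 1-D simulation: the camera aim chases a moving target.

    Returns the remaining error after each control step.
    """
    camera = 0.0
    errors = []
    for target in target_positions:
        camera += bang_bang_command(target - camera, deadband, speed)
        errors.append(target - camera)
    return errors


if __name__ == "__main__":
    # Target walks at 3 units/step for 20 steps, then stands still.
    track = [3.0 * (i + 1) for i in range(20)] + [60.0] * 100
    errs = simulate(track, speed=1.0, deadband=2.0)
    print("error while moving:", errs[19], "error after stopping:", errs[-1])
```

In this toy run the error grows steadily while the target outpaces the low camera speed, then shrinks back to within the dead band once the target stops, which mirrors the catch-up behavior seen in the video.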




We ported this algorithm to a three-camera system in Takeo's new "Virtualized Reality" (tm) room. The pan/tilt and camera hardware are the same as those we used for the EyeVision project: a custom Mitsubishi pan/tilt head (formed by lopping off several segments from a robot arm!) and a Sony 3-CCD color camera. You can see a movie of the camera tracking me here. The computer screen is showing some intermediate results of the image processing. The upper left corner is the input image, the upper right is a normalized color image, made by dividing through by intensity and quantizing to 4 bits per channel. The lower left corner is a likelihood image formed by comparing each pixel's color with a color histogram describing my shirt. This is the image that the mean-shift algorithm runs on.


AVI file (2.5 MB)
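The processing stages shown on the computer screen (normalized color, likelihood image, mean-shift) can be sketched roughly as below. This is an illustrative pure-Python toy, not the real-time implementation: it assumes intensity is R+G+B, uses a flat (uniform) kernel for mean-shift, and every function name, window size, and threshold is hypothetical.

```python
def normalized_color_bin(r, g, b, bits=4):
    """Divide a pixel by its intensity and quantize each channel to `bits` bits.

    Intensity is assumed here to be r + g + b; the page does not say which
    definition was used.
    """
    total = r + g + b
    if total == 0:
        return None  # pure black: normalized color is undefined
    levels = 1 << bits
    return tuple(min(levels - 1, int(levels * c / total)) for c in (r, g, b))


def build_histogram(pixels):
    """Normalized histogram over quantized normalized colors (the target model)."""
    hist = {}
    for r, g, b in pixels:
        key = normalized_color_bin(r, g, b)
        if key is not None:
            hist[key] = hist.get(key, 0) + 1
    n = sum(hist.values())
    return {k: v / n for k, v in hist.items()} if n else {}


def likelihood_image(image, hist):
    """Look up each pixel's color bin in the model histogram."""
    return [[hist.get(normalized_color_bin(*px), 0.0) for px in row]
            for row in image]


def mean_shift(weights, cx, cy, half_w, half_h, max_iter=20, eps=0.5):
    """Flat-kernel mean-shift: move the window to the weighted centroid
    of the likelihood values inside it until it stops moving."""
    rows, cols = len(weights), len(weights[0])
    for _ in range(max_iter):
        x0, x1 = max(0, int(cx - half_w)), min(cols, int(cx + half_w) + 1)
        y0, y1 = max(0, int(cy - half_h)), min(rows, int(cy + half_h) + 1)
        total = sx = sy = 0.0
        for y in range(y0, y1):
            for x in range(x0, x1):
                w = weights[y][x]
                total += w
                sx += w * x
                sy += w * y
        if total == 0.0:
            break  # no target-colored pixels in the window
        nx, ny = sx / total, sy / total
        moved = abs(nx - cx) + abs(ny - cy)
        cx, cy = nx, ny
        if moved < eps:
            break
    return cx, cy


if __name__ == "__main__":
    shirt, wall = (200, 40, 40), (200, 200, 200)
    image = [[wall for _ in range(20)] for _ in range(20)]
    for y in range(10, 14):
        for x in range(12, 16):
            image[y][x] = shirt
    like = likelihood_image(image, build_histogram([shirt]))
    print(mean_shift(like, 8.0, 8.0, 5, 5))
```

In a tracker of this kind, the converged window center would then feed the pan/tilt servo as the position error to drive toward the image center.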


Here are three movies taken simultaneously by three active cameras, each running real-time mean-shift tracking, keyed on my shirt color in normalized color space. Using normalized color makes the system insensitive to varying illumination as I walk around the room (and get closer to the lights when I go up the ladder). Unfortunately, this forces me to wear colored shirts to work each day ... black, white or grey shirts would be indistinguishable from the white walls in normalized color space! Each movie below is an AVI file of roughly 3.5 MB.
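A short worked derivation may help explain both effects. It assumes that intensity is taken as R+G+B and that a change in illumination acts as a uniform scale factor k > 0 on all three channels; neither assumption is stated on this page, so treat this as an idealized sketch.

```latex
% Normalized color (assumed definition):
(r, g, b) = \left( \frac{R}{R+G+B},\; \frac{G}{R+G+B},\; \frac{B}{R+G+B} \right)

% Brighter or dimmer lighting, (R, G, B) \to (kR, kG, kB):
r' = \frac{kR}{kR + kG + kB} = \frac{kR}{k\,(R+G+B)} = r
\quad \text{(and likewise for } g', b')

% Any grey level R = G = B = v > 0, from near-black to white:
(r, g, b) = \left( \tfrac{1}{3},\; \tfrac{1}{3},\; \tfrac{1}{3} \right)
```

Under these assumptions the factor k cancels, so a colored shirt keeps the same normalized color under brighter or dimmer light, while every achromatic surface maps to the same point as the white walls.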



If you want more information, download our most recent paper about this system:

  • R. Collins, O. Amidi and T. Kanade, "An Active Camera System for Acquiring Multi-View Video," to appear in the International Conference on Image Processing (ICIP), Rochester, NY, September 2002.