Augmenting Reality with Reality

The Idea

The title may sound nonsensical, but it is an accurate representation of this project. It is split into two parts. The first is automatically reconstructing a 3d model of a real-life object from a set of images using Voxel Coloring. The second is inserting this model into a live video stream in real time using a fiducial marker.

This may still sound like a strange use of brain and computer resources, but imagine using it in this scenario: You're at Ikea. An interesting looking lamp catches your eye, but you don't want to buy it without knowing what it would look like in your home. So, you take several pictures of it from all angles, and run them through the first stage of the system. You then place a physical marker somewhere in your house, and with a webcam, examine what the lamp would look like in its potential future surroundings. You decide it looks ugly, and save $25.

The Process

The ideal process for this is to capture images of an object from all sides, calculate the location of the camera in each image, carve voxels out of the resulting volume in a consistent world coordinate system, polygonalize the resulting voxel cloud, and reproject the virtual real object onto known markers in a video stream.

Capturing images while being able to calculate the 3d position of the camera can be a difficult problem. One way around it is to have the camera in a known fixed position, and simply place the object on a turntable. Of course, that involves unrealistic hardware overhead. A better method is to reuse the marker tracking so common in augmented reality to get the position of the camera while capturing the images. By having a known unique marker or markers at a fixed position relative to the object, the world coordinate system can remain constant. For purposes of accuracy, it is best to calculate the intrinsic parameters of the camera in addition to these extrinsic parameters.
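To make the intrinsic and extrinsic parameters a little more concrete, here is a minimal sketch of how they combine to map a world point to a pixel under a plain pinhole model. The function name and the matrices K, R and t are illustrative assumptions, not code from this project, and lens distortion is ignored.

```python
def project_point(point_world, K, R, t):
    """Project a 3d world point to pixel coordinates with a pinhole model.

    K is the 3x3 intrinsic matrix; R (3x3) and t (length 3) are the
    extrinsic parameters taking world coordinates into the camera frame.
    Returns (u, v), or None if the point lies behind the camera.
    """
    # World frame -> camera frame: Xc = R * Xw + t
    cam = [sum(R[i][j] * point_world[j] for j in range(3)) + t[i]
           for i in range(3)]
    if cam[2] <= 0:
        return None  # behind the image plane, not visible
    # Camera frame -> homogeneous pixel coordinates: x = K * Xc
    pix = [sum(K[i][j] * cam[j] for j in range(3)) for i in range(3)]
    # Divide by depth to get actual pixel coordinates
    return (pix[0] / pix[2], pix[1] / pix[2])
```

With an identity rotation and zero translation, a point straight ahead of the camera lands on the principal point stored in K, which is a quick sanity check for calibration output.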

Unfortunately, segmenting the object from the background is still a very difficult problem. To remain accurate, it requires either user input or a known background to subtract. User input can be minimized by using Interactive Segmentation techniques like Grab Cut.
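As a rough illustration of the known-background option, the sketch below marks a pixel as foreground when it differs enough from a reference shot of the empty scene. The function name, the nested-list image layout and the threshold value are all assumptions made for the example. Real images would also need to cope with shadows and noise, which is part of why interactive methods are attractive.

```python
def foreground_mask(image, background, threshold=30):
    """Return a mask of booleans, True where image differs from background.

    image and background are rows of (r, g, b) tuples with equal size.
    threshold is an arbitrary sum-of-absolute-differences cutoff.
    """
    mask = []
    for img_row, bg_row in zip(image, background):
        mask.append([
            # Sum the per-channel absolute differences for this pixel
            sum(abs(a - b) for a, b in zip(pixel, bg_pixel)) > threshold
            for pixel, bg_pixel in zip(img_row, bg_row)
        ])
    return mask
```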

Voxel carving involves turning the convex hull of camera volumes into a 3d point cloud, and then calculating which of the points in the volume belong to the object, and which are simply background. To do this, each point is projected onto each camera image. A color consistency algorithm is then used to calculate whether the point is the same color in multiple cameras. If it is, it is assumed that it is actually a point on the physical object. If it is not, we assume it is background being projected, and mark it as invisible. This is continued until all points are known. For more detail on the actual algorithms, take a look at Generalized Voxel Coloring and Space Carving.
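The color consistency test can be sketched roughly as follows. This is a deliberately simplified single pass, not Generalized Voxel Coloring: it ignores occlusion between voxels, which the real algorithms track carefully. The function names, the standard deviation test and the threshold are assumptions chosen for illustration. Each camera is modeled as a hypothetical callable that returns the color a voxel projects to, or None when the voxel falls outside the image or onto segmented background.

```python
from statistics import pstdev


def is_color_consistent(samples, threshold=20.0):
    """True if every color channel varies little across the samples."""
    if len(samples) < 2:
        return False  # assumption: one view is not enough evidence
    # zip(*samples) regroups the (r, g, b) tuples into three channels
    return all(pstdev(channel) <= threshold for channel in zip(*samples))


def mean_color(samples):
    """Average the samples channel by channel."""
    n = len(samples)
    return tuple(sum(channel) / n for channel in zip(*samples))


def carve(voxels, cameras, threshold=20.0):
    """Keep the voxels whose projected colors agree across cameras.

    Returns a dict mapping each kept voxel to its averaged color.
    """
    kept = {}
    for voxel in voxels:
        samples = [s for s in (cam(voxel) for cam in cameras)
                   if s is not None]
        if is_color_consistent(samples, threshold):
            kept[voxel] = mean_color(samples)
    return kept
```

A lower threshold carves more aggressively and tends to eat holes into the object, while a higher one leaves background-colored voxels floating around it.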

To turn this cloud of colored points into a 3d model, it is then polygonalized using something like Marching Cubes.

The 3d model is then reprojected onto a live video stream containing a known marker or markers in the typical augmented reality way.

The Implementation

Unfortunately, due to time constraints and not knowing much about 3d, my implementation required hacking together a lot of existing libraries in order to finish by the deadline.

The images were captured using the camera module in Pygame, which I wrote over the summer. I used it primarily out of familiarity; any image capturing program would have done. I captured about a dozen images of each object from various angles in a well-lit room.

I calculated the intrinsic parameters for my webcam using the Matlab Camera Calibration Toolbox.

The extrinsic parameters were calculated using a combination of the Matlab toolbox and a Matlab port of ARToolKit. ARToolKit was used to identify the known fiducial markers and get their corner points. The corner points were fed into the Matlab toolbox to get a more accurate corner estimate and then, with the intrinsic parameters, to calculate the extrinsic parameters of the camera accurately.

For the sake of convenience, I used TVSeg to segment the object from the background. It is a GPU-based program that allows for realtime interactive segmentation.

I then fed this information into Voxel Coloring Framework to do the actual voxel carving. It output a Matlab array of the point cloud and colors. I wrote a script that output the data as a VRML 2.0 scene, with each point represented as a colored cube. This is inefficient and pretty ugly, but it still looks pretty nifty. The objects were then reinserted into live video on markers using ARToolKit.
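For the curious, the cube-per-point export could look something like the sketch below. This is not the original script, which worked from the Matlab array. It is a hypothetical Python version, and the function name and the cube_size parameter are made up for the example. It relies only on standard VRML 2.0 nodes (Transform, Shape, Appearance, Material, Box).

```python
def points_to_vrml(points, colors, cube_size=1.0):
    """Build a VRML 2.0 scene with one colored Box per point.

    points is a list of (x, y, z) positions; colors is a matching list
    of (r, g, b) values in the range 0..1.
    """
    lines = ["#VRML V2.0 utf8"]  # mandatory VRML 2.0 header line
    template = (
        "Transform {{ translation {:g} {:g} {:g} children [ Shape {{ "
        "appearance Appearance {{ material Material {{ "
        "diffuseColor {:g} {:g} {:g} }} }} "
        "geometry Box {{ size {s:g} {s:g} {s:g} }} }} ] }}"
    )
    for (x, y, z), (r, g, b) in zip(points, colors):
        lines.append(template.format(x, y, z, r, g, b, s=cube_size))
    return "\n".join(lines) + "\n"
```

Writing the returned string to a .wrl file gives something a VRML viewer can open. One Transform per point is exactly the inefficiency mentioned above, since a dense cloud becomes a very large file.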