Computational Photography
Zack Fleischman

Tour Into a Picture


Demos: Room, Alley, Library, Railroad, Trees, Road, Canal, Columns

Overview

A "Tour Into a Picture" is the process of extracting 3D data from a 2D image based on our perception of depth and perspective within the image. For this project, I only chose images with one-point perspective. This means that all lines in the scene that run parallel to the viewing direction (into the scene) converge to a single vanishing point on the horizon. This is the effect you see when looking down a pair of train tracks: the two rails appear to meet in the distance. If you are unsure what I mean, check out the "Railroad" demo. The result of treating an image as a three-dimensional scene is that you can essentially "walk around" in the picture and take "new pictures" of the scene from different angles and viewpoints. The process I used to achieve this has three steps:

Marking 3D data points

Producing the 5 different scene images

Constructing the 3D scene

1.) Marking 3D data points

3D data extraction is not fully automatic, so I manually choose 8 control points in the image. Four of these points are shown in the image above as the corners of the rectangle in the center outlining the "backwall" of the image. The other four are the points along the border of the image where the "edges" of the room leave the scene. If the points are chosen correctly, the lines drawn between each edge point and its corresponding backwall corner should all converge to a single point (the vanishing point) on the horizon line. If you look carefully (disregarding the white lines in the center of the backwall), you can make out 5 quadrilaterals: the backwall, the ceiling, the right wall, the left wall and the floor. Extracting these quadrilaterals is the goal of step 2.
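The convergence check described above can be made concrete. The Python below is a minimal sketch, not the Matlab code used for this project, and the function names (`line_through`, `vanishing_point`, `max_residual`) are hypothetical. It treats each (edge point, backwall corner) pair as a line and finds the point closest, in the least-squares sense, to all four lines. A large residual would suggest the control points were not chosen consistently. The example coordinates are made up for illustration.

```python
# Sketch only (hypothetical names, not the original Matlab code).
from math import hypot

def line_through(p, q):
    """Line a*x + b*y + c = 0 through points p and q, scaled so a^2 + b^2 = 1."""
    (x1, y1), (x2, y2) = p, q
    a, b, c = y1 - y2, x2 - x1, x1 * y2 - x2 * y1
    n = hypot(a, b)
    return a / n, b / n, c / n

def vanishing_point(pairs):
    """Least-squares intersection of the lines through each (edge point, corner)
    pair: minimizes the sum of squared point-to-line distances."""
    saa = sab = sbb = sac = sbc = 0.0
    for p, q in pairs:
        a, b, c = line_through(p, q)
        saa += a * a
        sab += a * b
        sbb += b * b
        sac += a * c
        sbc += b * c
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        raise ValueError("lines are (nearly) parallel; no unique vanishing point")
    # Normal equations [saa sab; sab sbb][x; y] = -[sac; sbc], solved by Cramer's rule.
    x = (sab * sbc - sac * sbb) / det
    y = (sab * sac - saa * sbc) / det
    return x, y

def max_residual(pairs, v):
    """Largest distance (in pixels) from v to any of the lines."""
    return max(abs(a * v[0] + b * v[1] + c)
               for a, b, c in (line_through(p, q) for p, q in pairs))

# Made-up edge points on a 640x480 photo, each paired with its backwall corner.
pairs = [((20, 0), (220, 160)), ((620, 0), (420, 160)),
         ((620, 480), (420, 320)), ((20, 480), (220, 320))]
vp = vanishing_point(pairs)
print(vp, max_residual(pairs, vp))
```

Using all four lines rather than intersecting just two spreads small clicking errors across every control point.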

2.) Producing the 5 different scene images
OK, so now I have points approximating quadrilaterals in the image. One problem, though, is that if I connect the quadrilaterals as they are, I could end up cutting off a corner of the image. For example, in the image above, consider the left wall quadrilateral formed by the left corners of the backwall, the blue point on the upper edge (call it Pt. A) and the cyan point on the left edge (call it Pt. B). Connecting points A and B to complete the quadrilateral would cut off the upper left corner of the image entirely. To ensure the whole image is used, I extend the edge points along their respective lines far enough that, when the quadrilaterals are formed, the entire image is accounted for.
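One way to decide what "far enough" means is sketched below. It assumes the vanishing point lies inside the backwall and that pixel coordinates run from (0, 0) at the top left to (width, height) at the bottom right. Each backwall corner is pushed outward along the ray from the vanishing point until it is past both image borders it is heading toward, so the edge joining two extended points (such as A and B) lies on or beyond the image border. This is an illustrative Python sketch with a hypothetical function name, not the project's actual code.

```python
# Sketch only (hypothetical name). Image spans [0, width] x [0, height], y pointing down.

def extend_to_cover(vp, corner, width, height):
    """Move `corner` outward along the ray from the vanishing point vp until it is
    past both image borders the ray is heading toward (left/right and top/bottom)."""
    dx, dy = corner[0] - vp[0], corner[1] - vp[1]
    if dx == 0 or dy == 0:
        raise ValueError("corner must be diagonal from the vanishing point")
    # Ray: vp + t * (dx, dy). Find t where it reaches the vertical and horizontal borders.
    t_x = ((width if dx > 0 else 0) - vp[0]) / dx
    t_y = ((height if dy > 0 else 0) - vp[1]) / dy
    t = max(t_x, t_y, 1.0)  # the larger t clears both borders; never move inward
    return vp[0] + t * dx, vp[1] + t * dy

# Upper-left backwall corner of a 640x480 photo whose vanishing point is (320, 240).
print(extend_to_cover((320, 240), (220, 160), 640, 480))
```

Taking the larger of the two parameters is what prevents the corner cut-off: stopping at the first border the ray hits would reproduce the problem described above.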

Now that I have these 5 quads, I can solve for a homography that maps each of them to a rectangle of appropriate dimensions. I then use the homographies to warp the 5 quads into rectangles and save them as 5 separate images.
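The homography step can be illustrated with the standard direct linear transform (DLT): fixing the bottom-right entry of H to 1 leaves 8 unknowns, and each of the 4 corner correspondences contributes 2 linear equations. The Python below is a sketch using only the standard library, not the project's Matlab code. The helper names and the corner ordering [top-left, top-right, bottom-right, bottom-left] are assumptions, and `rect_w` and `rect_h` are placeholders for the "appropriate dimensions" of each rectangle.

```python
# Sketch only (hypothetical names): DLT homography with h33 fixed to 1.

def solve_linear(A, b):
    """Solve A x = b for square A using Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("degenerate points (three or more collinear?)")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def homography(src, dst):
    """3x3 H such that dst ~ H * src (homogeneous) for four point pairs."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1), and likewise for v.
        A.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y]); b.append(u)
        A.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y]); b.append(v)
    h = solve_linear(A, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]

def apply_h(H, p):
    """Map a 2D point through H, dividing out the homogeneous coordinate."""
    x, y = p
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)

def quad_to_rect(quad, rect_w, rect_h):
    """Homography taking quad corners [TL, TR, BR, BL] to a rect_w x rect_h rectangle."""
    rect = [(0.0, 0.0), (rect_w, 0.0), (rect_w, rect_h), (0.0, rect_h)]
    return homography(quad, rect)
```

For a left wall, the quad would be the extended Pt. A, the backwall's top-left and bottom-left corners, and the extended Pt. B, in that order, so that the backwall edge lands on the right side of the rectangle.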
For example, the 5 images extracted for the image above are as follows:

Ceiling:

Left Wall:

Backwall:

Right Wall:

Floor:

Notice that some of the images have large black regions. This is because the photograph does not capture the full rectangular extent of every wall, floor or ceiling in the scene. My algorithm extends the rectangles so that they all have the same "depth" and fills in the uncaptured parts with black. Now that we have our 5 scene images, we can construct the 3D scene!
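Black fill falls out naturally from inverse warping. For every pixel of the output rectangle, map it back into the photo and copy the pixel it lands on. If it lands outside the photo, that part of the wall was never captured and stays black. Below is a minimal nearest-neighbour sketch in Python. It assumes a grayscale image stored as a list of rows and a 3x3 matrix `rect_to_src` that maps rectangle coordinates back to photo coordinates (for example, the inverse of a quad-to-rectangle homography). The names are hypothetical, and this is not the project's Matlab code.

```python
# Sketch only (hypothetical names): inverse warping with black fill, grayscale image.
import math

BLACK = 0

def warp_to_rect(image, rect_to_src, out_w, out_h):
    """Build an out_w x out_h image by looking up, for each output pixel, the photo
    pixel it maps back to (nearest neighbour). Pixels that map outside the photo,
    i.e. parts of the wall the camera never captured, are left black."""
    src_h, src_w = len(image), len(image[0])
    H = rect_to_src
    out = [[BLACK] * out_w for _ in range(out_h)]
    for v in range(out_h):
        for u in range(out_w):
            x, y = u + 0.5, v + 0.5  # sample at the output pixel's center
            w = H[2][0] * x + H[2][1] * y + H[2][2]
            sx = (H[0][0] * x + H[0][1] * y + H[0][2]) / w
            sy = (H[1][0] * x + H[1][1] * y + H[1][2]) / w
            row, col = math.floor(sy), math.floor(sx)
            if 0 <= row < src_h and 0 <= col < src_w:
                out[v][u] = image[row][col]
    return out

# Example: a 3x3 photo and a matrix that samples 2 pixels to the right.
photo = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
shift_right_2 = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(warp_to_rect(photo, shift_right_2, 3, 3))  # [[3, 0, 0], [6, 0, 0], [9, 0, 0]]
```

Warping in the inverse direction, rather than pushing photo pixels forward, guarantees every output pixel is assigned exactly once, so no holes appear apart from the intentional black regions.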

3.) Constructing the 3D scene
Up to this point, I had used Matlab exclusively to perform all my calculations and image manipulations, primarily because Matlab is extremely well suited to image processing. Now, however, we move into three dimensions! So at this point in my implementation I switch gears and use the 3D-friendly DirectX graphics library in C++ (with Perl to glue the two together). This made it relatively easy to construct the 3D scene and move the camera through it in real time. It also allowed me to perform some fun tricks with scaling that you can see in the Bells & Whistles section.

In principle, the process here is simple:

Extract the dimensions of the 5 scene images

Construct 5 planes with those dimensions, arranged in the shape of a box with one side (the side facing the viewer) removed

Texture map the 5 images onto the 5 planes

Point the virtual camera "into" the box
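The box itself is just five quads in 3D. A minimal Python sketch (not the DirectX code used here) is below. It assumes a left-handed, y-up frame with the camera at the origin looking down +z, the convention commonly used with Direct3D. `back_w` and `back_h` are the backwall image's dimensions and `depth` is the shared depth of the four side images; all three names are hypothetical. Each plane's corners are listed in the same order as its texture's corners, assuming the side images were warped with their backwall edge on the side noted in the comments. Centering the box on the viewing axis is a simplification: the original camera sat directly in front of the point on the backwall where the vanishing point falls.

```python
# Sketch only (hypothetical names): 3D corners for the five textured planes.
# Frame: left-handed, y up, camera at the origin looking down +z.

TEX_UV = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]  # TL, TR, BR, BL of each image

def build_box(back_w, back_h, depth):
    """Return {plane name: [TL, TR, BR, BL] 3D corners}, ordered to match TEX_UV.
    The front face (z = 0) is left open so the camera can look into the box."""
    l, r = -back_w / 2.0, back_w / 2.0
    b, t = -back_h / 2.0, back_h / 2.0
    d = depth
    return {
        # Backwall: the far face of the box.
        "backwall": [(l, t, d), (r, t, d), (r, b, d), (l, b, d)],
        # Floor: image top edge meets the backwall, bottom edge is nearest the viewer.
        "floor": [(l, b, d), (r, b, d), (r, b, 0.0), (l, b, 0.0)],
        # Ceiling: image top edge is nearest the viewer, bottom edge meets the backwall.
        "ceiling": [(l, t, 0.0), (r, t, 0.0), (r, t, d), (l, t, d)],
        # Left wall: image left edge is nearest the viewer, right edge meets the backwall.
        "left": [(l, t, 0.0), (l, t, d), (l, b, d), (l, b, 0.0)],
        # Right wall: image left edge meets the backwall, right edge is nearest the viewer.
        "right": [(r, t, d), (r, t, 0.0), (r, b, 0.0), (r, b, d)],
    }

box = build_box(200, 160, 300)
for name, corners in box.items():
    print(name, corners)
```

Each of the five scene images would then be texture mapped onto its quad using `TEX_UV`, and adjacent planes share edges exactly, so the seams line up.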

Here's an example of the constructed 3D scene:

Results
With some simple matrix math, I set up the camera controls to work much like those of a character in a first-person video game, and was then able to navigate through the scene in real time to get various "Novel Views" of the image from different perspectives and angles.
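A first-person camera of this kind typically keeps a position plus yaw and pitch angles, walks along horizontal forward and right vectors (so looking up or down does not make it fly), and rebuilds a look-at view matrix each frame. The Python sketch below assumes the same left-handed, y-up frame as Direct3D and lays out the matrix the way the documented `D3DXMatrixLookAtLH` function does (row vectors, translation in the bottom row). The class and method names are hypothetical, and this is not the project's C++ code.

```python
# Sketch only (hypothetical names): first-person camera.
# Frame: left-handed, y up; yaw = pitch = 0 looks down +z.
import math

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

class FirstPersonCamera:
    def __init__(self, position=(0.0, 0.0, 0.0), yaw=0.0, pitch=0.0):
        self.pos = list(position)
        self.yaw = yaw      # radians about the world up axis; positive turns right
        self.pitch = pitch  # radians; positive looks up

    def look_dir(self):
        """Unit viewing direction, including pitch."""
        cp = math.cos(self.pitch)
        return (cp * math.sin(self.yaw), math.sin(self.pitch), cp * math.cos(self.yaw))

    def right_dir(self):
        """Unit horizontal right vector (normalized cross(up, look_dir))."""
        return (math.cos(self.yaw), 0.0, -math.sin(self.yaw))

    def turn(self, d_yaw, d_pitch):
        self.yaw += d_yaw
        limit = math.radians(89.0)  # clamp so the view never flips over the vertical
        self.pitch = max(-limit, min(limit, self.pitch + d_pitch))

    def move(self, forward, strafe):
        """Walk along the ground plane like a game character: pitch is ignored."""
        self.pos[0] += forward * math.sin(self.yaw) + strafe * math.cos(self.yaw)
        self.pos[2] += forward * math.cos(self.yaw) - strafe * math.sin(self.yaw)

    def view_matrix(self):
        """4x4 look-at matrix laid out like D3DXMatrixLookAtLH: camera-space
        coordinates are the row vector [x y z 1] times this matrix."""
        z = self.look_dir()
        x = self.right_dir()
        y = cross(z, x)
        e = self.pos
        return [[x[0], y[0], z[0], 0.0],
                [x[1], y[1], z[1], 0.0],
                [x[2], y[2], z[2], 0.0],
                [-dot(x, e), -dot(y, e), -dot(z, e), 1.0]]

def to_camera(V, p):
    """Transform world point p into camera space (row-vector convention)."""
    row = (p[0], p[1], p[2], 1.0)
    return tuple(sum(row[k] * V[k][j] for k in range(4)) for j in range(3))

cam = FirstPersonCamera()
cam.move(2.0, 0.0)  # walk two units toward the backwall
print(to_camera(cam.view_matrix(), (0.0, 0.0, 5.0)))
```

Clamping the pitch just short of 90 degrees keeps `cross(up, look_dir)` from collapsing to zero, which is the usual reason look-at cameras flip when pointed straight up or down.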

Example:

Original Image
Novel View

Make sure to check out the demos and the bells & whistles pages for some really cool images and videos!
fin