Final Project: Augmented Reality
15-463: Computational Photography Zuye Zheng
The Idea
My project is on augmented reality; specifically, I would like to build something like the PlayStation 3 game Eye of Judgment (image below). The premise is that a webcam points down at a surface, and when a user places a specially marked card, a computer-generated object is overlaid on the webcam video in the correct location. When the marked card is rotated or translated, the augmented CG object transforms accordingly. This is similar to the last project, where we had to determine the correct projection of a plane, but a little more complex: rotations must be accounted for, and some form of pattern matching is needed both to find each card in the frame and to identify which card was placed down. If I have time, I would also like to estimate the lighting in the scene, perhaps by analyzing the light falloff across the specially marked card.
Here is a mockup of my simplified version for my project.
[Mockup: base image (left), augmented image (right)]
The Plan

Development Environment: Microsoft Visual C#
Hardware: Logitech Orbit AF

Step 1: Get a video feed from a webcam.
Step 2: Isolate and track a "marker" in real time.
Step 3: Overlay a 3D model onto the video feed using the "marker" as a base for the perspective and orientation.

Step 1: Video Feed
This was a lot harder than expected. I initially attempted it with DirectShow, which quickly became a mess of code, so I switched to the much simpler Windows Image Acquisition (WIA) API. Although WIA is slower and allows less control over the video stream, it proved far easier to code against. Below is a screenshot of the final application with the webcam feed active. I captured video at only 320x240, which allowed near-real-time performance once the application was multi-threaded, without any diminished accuracy in identification and tracking.

Step 2: Marker Identification and Tracking
My final marker was a white square bounded by a thick black border. The thick black border made identification easier, since my initial identification step was thresholding, as seen in the screenshot below. Thresholding happened in two stages: first, anything with color was excluded so that only near-grayscale pixels remained; then the grayscale was thresholded to separate the dark blacks from the rest of the scene.
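The two-stage threshold described above can be sketched as follows. The project itself was written in C#; this is an illustrative Python version, and the function name and tolerance values are my own assumptions, not the actual parameters used.

```python
# Illustrative two-stage threshold: a pixel survives only if it is nearly
# grayscale (small spread between channels) AND dark enough to belong to
# the black border. Tolerances here are hypothetical.

def threshold_marker(pixels, color_tol=30, dark_max=60):
    """pixels: 2D list of (r, g, b) tuples -> 2D list of booleans."""
    out = []
    for row in pixels:
        out_row = []
        for (r, g, b) in row:
            is_gray = max(r, g, b) - min(r, g, b) <= color_tol  # stage 1: exclude color
            is_dark = (r + g + b) / 3 <= dark_max               # stage 2: keep dark blacks
            out_row.append(is_gray and is_dark)
        out.append(out_row)
    return out
```

A dark gray pixel passes both stages, while a saturated red pixel fails the first and a white pixel fails the second.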

The next part of identification was labeling the components (connected pixels) of the binary thresholded image from above. I implemented a sequential algorithm that requires two passes over the image to achieve correct labeling. Below you can see the results of the labeling, with components color coded and bounded by red boxes.
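A two-pass sequential labeling algorithm of this kind can be sketched as below (again in Python rather than the project's C#, with 4-connectivity and a union-find structure assumed for resolving label equivalences):

```python
# Two-pass connected-component labeling over a binary mask (4-connectivity).
# Pass 1 assigns provisional labels and records equivalences; pass 2 rewrites
# each label to its representative.

def label_components(mask):
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    parent = [0]  # parent[i] is the representative of provisional label i

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    nxt = 1
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            up = labels[y - 1][x] if y else 0
            left = labels[y][x - 1] if x else 0
            if up == 0 and left == 0:
                parent.append(nxt)       # new provisional label
                labels[y][x] = nxt
                nxt += 1
            else:
                labels[y][x] = min(l for l in (up, left) if l)
                if up and left:
                    union(up, left)      # the two labels touch: merge them
    for y in range(h):
        for x in range(w):
            if labels[y][x]:
                labels[y][x] = find(labels[y][x])
    return labels
```

A U-shaped region picks up two provisional labels on the first pass, which the second pass merges into one.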

Next, I had to identify which of these components were actually markers. This was done by taking the ratio of the pixel area of the enclosed white square to that of the black border. For a true marker this ratio should fall between .65 and .85, and it is moderately unique and consistent: since the projection alters the entire marker, both areas shrink or grow together and their ratio remains similar. Furthermore, since we measure the pixel area of the enclosed white square, the majority of other components are eliminated, as most components do not contain significant holes.
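The ratio test amounts to a simple band check. The band [0.65, 0.85] comes from the writeup above; the function and argument names here are hypothetical:

```python
# Accept a labeled component as a marker when the enclosed white "hole" area
# divided by the black border area falls in a projection-tolerant band.
# Both areas scale together under perspective, so the ratio is fairly stable.

def is_marker(border_pixels, hole_pixels, lo=0.65, hi=0.85):
    if border_pixels == 0:
        return False  # degenerate component, cannot be a marker
    ratio = hole_pixels / border_pixels
    return lo <= ratio <= hi
```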

Lastly, I had to find the corners of each marker, which was done with a boundary search, so that I could calculate the orientation of each marker and of the surface it was on. Below you can see the perspective projections of each marker.
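The writeup does not detail the boundary search, but one simple stand-in is to take the boundary pixels that are extreme along the four diagonal directions, which recovers the corners of a roughly upright quad. This Python sketch is my own illustration, not the project's actual corner finder:

```python
# Approximate a quad's four corners from its boundary pixels by taking the
# points extreme in x+y and x-y. Works for quads that are not rotated too
# far from the image axes.

def find_corners(points):
    """points: list of (x, y) boundary pixels -> [tl, tr, br, bl] (approx.)"""
    tl = min(points, key=lambda p: p[0] + p[1])  # smallest x+y: top-left
    br = max(points, key=lambda p: p[0] + p[1])  # largest  x+y: bottom-right
    tr = max(points, key=lambda p: p[0] - p[1])  # largest  x-y: top-right
    bl = min(points, key=lambda p: p[0] - p[1])  # smallest x-y: bottom-left
    return [tl, tr, br, bl]
```

With the four corners in hand, the marker's perspective projection (and thus its orientation) can be estimated as in the homography work from the earlier project.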

Step 3: 3D Overlay
I had originally intended to export everything into DirectX to achieve the 3D model overlay and projection, but I was a little overambitious with the project: the technical details of getting the webcam video feed proved much more difficult than expected, and doing everything in C# from scratch was far more taxing than using Matlab. However, as a proof of concept, I was able to create a pseudo-3D wireframe box overlay for the markers, as shown below. Although not totally accurate, it is somewhat convincing and does exhibit some traits of a perspective projection, as markers farther away produce smaller cubes.
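One way such a pseudo-3D box can be produced without a full DirectX pipeline is to extrude the marker's base quad by a height proportional to its apparent size, so that distant (smaller) markers get proportionally smaller cubes, matching the behavior described above. This Python sketch is an assumption about the approach, not the project's actual code:

```python
# Pseudo-3D cube from a marker's four corners: each top corner is the base
# corner shifted "up" on screen by an amount proportional to the marker's
# mean edge length (a proxy for apparent size / distance).

def cube_wireframe(corners, height_scale=1.0):
    """corners: list of four (x, y) base corners -> (base, top) corner lists."""
    n = len(corners)
    size = sum(
        ((corners[i][0] - corners[(i + 1) % n][0]) ** 2 +
         (corners[i][1] - corners[(i + 1) % n][1]) ** 2) ** 0.5
        for i in range(n)
    ) / n                                   # mean edge length of the base quad
    h = size * height_scale
    top = [(x, y - h) for (x, y) in corners]  # screen y grows downward
    return corners, top
```

Drawing the base quad, the top quad, and the four vertical edges between corresponding corners yields the wireframe box; because h shrinks with the marker's on-screen size, the effect loosely mimics perspective foreshortening.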

Video Demo

Finally, here is a screen-captured video demo of the application in action, running in real time and performing all of the above steps with single and multiple markers.