Virtualized Reality dynamic event modeling -- How it all works

As nearly everyone knows, television can give us a view into another part of the real world, such as a sporting event. This capability is great, but every viewer gets the same view, whether they want it or not, and no viewer has any power to control the viewpoint. In contrast, virtual reality (VR) immerses viewers in virtual worlds even though their bodies remain in the real world. Each viewer moves independently and freely through this world, so everyone can see events from their own viewpoint. VR, though, has focused on creating purely virtual worlds that do not correspond to anything in the real world. In addition, typical virtual worlds look too artificial to convince viewers that they are in another part of the real world.

Our work combines the technology behind television, VR, and computer vision (see also the Computer Vision Handbook) to create virtual models of real-world events -- what we call Virtualized Reality™ dynamic event models. These models can then be used to construct views of the real events from nearly any viewpoint, without interfering with the events! Like VR, Virtualized Reality dynamic event models allow viewers to see whatever they want to, but unlike VR, this "other world" is actually a real event, and the views of this event are photorealistic.

Behind the Scenes

So how does all this work? Virtual models contain two basic types of information about the worlds they represent: shape and color. Shape in purely virtual worlds can be designed with CAD tools, since it does not have to correspond to any real shape. To virtually model a real event, though, CAD modeling is quite difficult because the model must exactly match the real object. Similarly, color can be added to purely virtual scenes by "painting" each object (using virtual paint, of course), and then adding lights to the scene. For virtual reproductions of real events, the color must match the real object and lighting color, which again would be difficult to do with VR modeling tools. Clearly, these traditional VR modeling tools are poorly suited to building models of real events.

An alternative approach to modeling is to mimic the way humans observe the world: with pictures! Many times we are allowed to look at a scene or event, but we are not allowed to touch it. Even so, we can still figure out the basic shape and color of the scene, and of course it all looks real because it is real. For a computer to do the same thing, it must have eyes, and making sense of what those eyes see is a task called computer vision. These eyes usually come in one of two forms: those that capture still images, like a 35mm camera, and those that capture motion, like video cameras. (Surprisingly, a video camera is just a still-image camera that takes pictures so quickly that our eyes think that they see motion.) In our case we want to capture time-varying events, so we use video cameras.

Having lots of videos, by itself, is not enough to model real scenes, though. (Actually, if you had tons of videos, you could make it look like you have a model, but that's left for someone else to explain.) The images directly show the color of the world, so we can use them directly as models of scene appearance. The images also contain information about the shape of the objects in the world, but only indirectly, so we must somehow recover the shape from the images.

The shape model is constructed in two stages. First, we apply a computer vision technique called stereo to determine the shapes of the objects visible in each image. Stereo tries to find corresponding features in a set of images, and then triangulates these correspondences to determine how far away the 3D feature is. For example, suppose I take two pictures of your face from slightly different viewpoints. Because both pictures contain an image of your face, I can find your left eye in both pictures. Trying to find these matching points is called the correspondence problem. If I have found the corresponding points, and if I know where the cameras were when they took the pictures, I can compute how far away your eye was from each camera. This computation is called triangulation. (Incidentally, the process of determining the positions of the cameras is called calibration. Before we actually capture all our images, we perform camera calibration to determine precisely where the cameras are.)
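To make the two steps in the face example concrete, here is a minimal sketch in Python. It is an illustration under simplifying assumptions, not the system's actual stereo algorithm: the two cameras are assumed to be calibrated and rectified (side by side, so a feature appears on the same image row in both pictures), correspondence is found with a simple sum-of-squared-differences window search, and depth follows from the standard relation depth = focal length × baseline / disparity. The function names and parameters (`match_along_row`, `triangulate_rectified`, `focal_px`, `baseline_m`) are hypothetical.

```python
def match_along_row(left_row, right_row, x_left, half_window=2, max_disparity=16):
    """Correspondence: find the column in right_row whose neighborhood best
    matches the neighborhood of x_left in left_row (lowest SSD cost)."""
    def patch(row, x):
        return row[x - half_window : x + half_window + 1]

    target = patch(left_row, x_left)
    best_x, best_cost = None, float("inf")
    # Assumption: in a rectified pair the match sits at the same column
    # or further to the left in the right image.
    for disparity in range(max_disparity + 1):
        x_right = x_left - disparity
        if x_right - half_window < 0:
            break  # the window would fall off the image
        candidate = patch(right_row, x_right)
        cost = sum((a - b) ** 2 for a, b in zip(target, candidate))
        if cost < best_cost:
            best_x, best_cost = x_right, cost
    return best_x


def triangulate_rectified(x_left, x_right, focal_px, baseline_m):
    """Triangulation: distance to the feature, given the calibration values
    (focal length in pixels, camera separation in meters)."""
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("a visible point must have positive disparity")
    return focal_px * baseline_m / disparity


# One image row from each camera; the bright feature (the "eye") is at
# column 12 in the left image and column 8 in the right image.
left_row = [0] * 20
right_row = [0] * 20
left_row[11:14] = [10, 50, 10]
right_row[7:10] = [10, 50, 10]

x_right = match_along_row(left_row, right_row, x_left=12)
depth_m = triangulate_rectified(12, x_right, focal_px=800, baseline_m=0.1)
```

Nearby features shift more between the two pictures than distant ones, which is why the disparity appears in the denominator. With real images the matching step is the hard part, since many image patches look alike.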

Stereo only estimates the shapes of objects visible in each image. The second stage is to integrate these image-based shape models into a single, complete shape model of the entire scene. Continuing our example from before, the images of your face can be used to determine the shape of your face, but not of the rest of your head. If we have more cameras behind your head, we can use their images to tell us the shape of the back of your head, but not your face. By integrating, or merging, the shape of your face and the shape of the back of your head, we can create a complete shape model of your head. In a similar way, we integrate the shape information from many cameras to create a complete shape model of real events.
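One simple way to picture the merging step is a voting scheme over a grid of small cubes (voxels). The sketch below is an assumption made for illustration, not a description of the integration method actually used: each camera contributes a list of 3D surface points already expressed in a shared world coordinate frame, and the merged model is the set of voxels that enough cameras agree on. The name `merge_partial_shapes` and its parameters are hypothetical.

```python
import math
from collections import Counter


def merge_partial_shapes(partial_shapes, voxel_size=0.1, min_votes=1):
    """Merge per-camera point sets into one set of occupied voxels.

    partial_shapes: list of point lists, one per camera, each point (x, y, z)
    in a common world frame. Returns a set of integer voxel indices.
    """
    votes = Counter()
    for points in partial_shapes:
        # Each camera votes at most once per voxel, however many of its
        # points fall inside that voxel.
        occupied = {
            tuple(math.floor(c / voxel_size) for c in point) for point in points
        }
        votes.update(occupied)
    return {voxel for voxel, count in votes.items() if count >= min_votes}


# The front cameras see the face, the rear cameras see the back of the head,
# and both happen to see one point near the ear.
front = [(0.01, 0.02, 0.51), (0.11, 0.02, 0.52)]
back = [(0.01, 0.02, 0.91), (0.12, 0.03, 0.55)]

whole_head = merge_partial_shapes([front, back])
agreed_by_both = merge_partial_shapes([front, back], min_votes=2)
```

Raising `min_votes` keeps only surface that several cameras confirm, which trades completeness for robustness against stereo errors in any single view.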

In Front of the Camera

While all of this modeling is interesting from a technical perspective, the really exciting part is using the models to create new views of the real events -- even from places where a real camera could not survive! These views are computed much like views are synthesized in virtual reality, but because our models are derived from real images, they look much more realistic than typical VR worlds. The color information captured in the real camera images is projected onto the shape model. The shape model is used to determine what is visible and what is hidden from a particular viewpoint. Visible parts of the world are then projected (a process called rendering) into a virtual camera to create a new image of the real event.
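The visibility test and projection described above can be sketched with a depth buffer, a standard rendering technique. This is a minimal illustration under stated assumptions rather than the renderer actually used: the model is reduced to colored 3D points already expressed in the virtual camera's coordinate frame (z pointing forward), and the virtual camera is an ideal pinhole. The function `render_points` and its parameters are hypothetical.

```python
def render_points(colored_points, focal_px, width, height):
    """Project colored 3D points into a virtual pinhole camera.

    colored_points: iterable of ((x, y, z), color) in the camera's frame.
    Returns a height x width grid of colors; None marks empty pixels.
    """
    cx, cy = width / 2, height / 2  # assume the optical axis hits the image center
    image = [[None] * width for _ in range(height)]
    depth = [[float("inf")] * width for _ in range(height)]

    for (x, y, z), color in colored_points:
        if z <= 0:
            continue  # behind the virtual camera
        # Perspective projection onto the image plane.
        u = round(focal_px * x / z + cx)
        v = round(focal_px * y / z + cy)
        # Visibility: keep the point only if it is nearer than anything
        # already drawn at this pixel.
        if 0 <= u < width and 0 <= v < height and z < depth[v][u]:
            depth[v][u] = z
            image[v][u] = color
    return image


points = [
    ((0.0, 0.0, 2.0), "blue"),   # hidden: directly behind the red point
    ((0.0, 0.0, 1.0), "red"),
    ((1.0, 0.0, 2.0), "green"),
    ((0.0, 0.0, -1.0), "gray"),  # behind the camera, never drawn
]
image = render_points(points, focal_px=2, width=4, height=4)
```

Moving the virtual camera amounts to transforming the points into a new camera frame before calling the function, which is what lets each viewer choose a viewpoint.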

I Wish I Could See...

The dreamers among us frequently ask for the impossible: I wish I could see that play from the eyes of the quarterback; I wish I could see what the referee (supposedly!) saw; I even wish I could see what the ball saw! With Virtualized Reality modeling, and its ability to create any view of the event, it is now possible to grant them their wish -- and for the rest of us to start making our own requests! In fact, each viewer can have their own view of the world, and can change the view as the event continues. In essence, each viewer has a virtual camera under their control. Because these cameras are entirely virtual, they can go into places where a real camera would disrupt the event, like the middle of a basketball court. In addition, the virtual camera can go where a real camera could not survive, such as on a baseball.

Virtualized Reality dynamic event modeling has great significance in entertainment of many types, not just sporting events. In the movie industry, Virtualized Reality models can be used to create many special effects that currently are very difficult or totally impossible to simulate. Movies themselves could be replaced by an entirely different medium that allows the viewers to enter into the scene and, unlike the "3D" movies we occasionally see now, viewers could independently move around and see whatever they want to.

Another major use of Virtualized Reality modeling is in training. Rather than getting a typical instructional video, you could get a complete model of the same event, so that you could study the lesson from any viewpoint, even from the instructor's. In medical school, for example, a student might be trying to learn a surgical operation. By virtually reproducing a real operation, students could review the procedure as it was actually performed, not just the way it is described in a textbook. They could also walk around the modeled operating room and see the procedure from any viewpoint, even the real surgeon's, without interfering with the real operation.


© Peter Rander