The National Academy of Sciences report on VR [NAS] recommends an agenda to determine when VR systems are better than desktop displays, and states that without scientific grounding many millions of dollars could be wasted. Ultimately, we would like a predictive model of which tasks and applications merit the expense and difficulty of VR interfaces. In this paper, we take a step towards quantifying immersion, or the sense of "being there." We asked users, half using an HMD and half using a stationary monitor, to search for a target in heavily camouflaged scenes. In any given search, there was a 50/50 chance that the target was somewhere in the scene. The user's job was either to find the target or to claim no target was present. Our major results are:
1) VR users did not find targets in camouflaged scenes faster than traditional users.
2) VR users were substantially faster when no target was present. Traditional users needed to re-search portions of the scene to be confident there was no target.
From these two findings, we infer that the VR users built a better mental frame-of-reference for the space. Our final two conclusions are based on search tasks where the users needed to determine that no target existed in the scene:
3) Users who practiced first in VR positively transferred that experience and improved their performance when using the traditional display.
4) Users who practiced first with the traditional display negatively transferred that experience and performed worse when using VR. This negative transfer may be relevant in applications that use desktop 3D graphics to train users for real-world tasks.
In a practical sense, the only way to demonstrate that VR is worthwhile is to build real applications that have VR interfaces, and show that users do better on real application tasks. That can be expensive, and new technologies take time to mature. But the computer graphics community has not even met a lower standard: showing, even for a simple task, that VR can improve performance. We show improvement in a search task and discuss why a VR interface improved user performance.
To hold the variables constant we used the same HMD as both the head-tracked display and the stationary monitor. In all cases, we rendered the same image to both eyes (i.e., not stereo). Figure 2 shows the stationary condition, where we bolted the HMD onto a ceiling-mounted post, thus turning the HMD into a stationary monitor. This provided the same resolution, field of view, and image quality in both VR and desktop interfaces. Table 1 gives the values for this particular HMD, the Virtual Research Flight Helmet™ [HMD]. We chose a task where the display resolution was unimportant because the targets were large and easily visible. Using a mouse or joystick as the desktop input device would have introduced variables in lag and sampling rate. Therefore, we used the same 6DOF electromagnetic tracker [Tracker] from the HMD as our hand input device. All we did to create the desktop interface was to seat the user in a chair, take the 6DOF tracker off the user's head, and place it in a comfortable device held in the user's hands. By holding all other variables constant, we can claim our results are dependent on head-input versus hand-input. For the remainder of this paper, we refer to these groups as "the VR users" and "the desktop users." While we acknowledge that our desktop users are hardly using a conventional configuration, we claim their setup contains the essential components: a stationary monitor and a hand-input device.
We ran a pilot experiment [Pausch 1993] where 28 users searched for easy-to-find, uncamouflaged targets at random locations in a virtual room. VR users found the targets 42% faster than desktop users. We feared we had measured how fast users could move the camera, rather than how immersed they were. For example, finding the red `Y' in Figure 3 is a pre-attentive task, where the human visual system can find the target without having to consider the camouflage. In a room surrounding the user, the time to find a red Y might be limited only by how fast one could move the camera. But the time to find an object like the black `Y' in Figure 3 is limited by one's ability to serially examine the items. Searching for a black `K' in Figure 3 is another mentally limited task; there is no `K', and the only way to be certain of that is to systematically search the entire scene. We claim that VR users are much better at systematic searches because they can better remember where they have already looked in the scene that surrounds them.
We placed users in the center of a simple virtual room, 4 meters on each side. The room contained a door and two windows, which served as orientation cues. During each search task, the room contained 170 letters arranged on the walls, ceiling and floor. Figure 1 shows a third-person view of the scene, with one wall removed. Letters measured 0.6 meters in length and were easily visible through the display. Users needed to apply some degree of concentration and focused attention to locate the target letter among the similar-looking "camouflage" letters. In any given task, we chose target and camouflage letters from either the set "AKMNVWXYZ" (whose primary visual features are slanted lines), or "EFHILT" (whose primary features are horizontal and vertical lines). We began each search by displaying the target letter in a fixed location over the door, and waiting for the user to say the target letter in order to begin the search. On the user's cue, we rendered the 170 camouflage letters, placing the target letter at a random location. When they found the target letter, users said "there it is," which we verified by watching an external display.
48 users participated in the experiment, 24 using VR and 24 using the desktop configuration created by bolting the HMD into a fixed position. Desktop users controlled their viewpoint with the hand-held "camera" controller shown in Figure 4, which contained the same 6DOF tracker used to track the VR users' heads. We did a large number of informal experiments to design a reasonable hand-held camera controller. Based on that experience, we also removed the roll component of tracking for the hand input device. The end-to-end system latency in all cases was roughly 100 milliseconds, measured by the technique described by Liang [Liang], and we rendered a constant 60 frames per second on an SGI Onyx Reality Engine2.
Practice did not appear to be a factor. We asked users to do practice searches until they were comfortable; we required two practice searches, and some users did three. We did not count practice searches in the results. Users took roughly 15 minutes to perform the searches, and none appeared fatigued. To measure practice and fatigue, we ran separate control groups with eight users each, who ran double the number of trials on both the VR and desktop interfaces. These users showed no statistically significant differences between their earlier and later trials. Users made essentially no errors. All users were between 18 and 25 years old, mostly undergraduates with no VR experience. Both groups were evenly balanced by gender. All users said they could easily see the targets. In addition to the 48 users we report, 3 other users began but did not complete this study. All 3 felt slightly nauseous, and they all reported that they were generally prone to motion sickness.
We now consider searching for a target which is not in the scene. Of course, if the user knows the target is not there, then the task is pointless. Therefore, we had users perform a sequence of searches, each of which had a 50% likelihood of containing a target. Users were instructed to either locate the target, or claim no target existed. In this way, we measured the time users needed to confidently search the entire scene.
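The trial design described above, with camouflage letters drawn from a single feature family and a 50% chance of target presence, can be sketched roughly as follows. The function and variable names are our own illustration, not the study's actual code, and we assume the 170 letters include the target when one is present:

```python
import random

SLANTED = "AKMNVWXYZ"   # letters whose primary features are slanted lines
RECTILINEAR = "EFHILT"  # letters with horizontal/vertical-line features

def make_trial(n_letters=170):
    """Generate one illustrative search trial: pick a feature family,
    a target letter, 170 camouflage letters from the same family, and
    with probability 0.5 place the target at a random position."""
    alphabet = random.choice([SLANTED, RECTILINEAR])
    target = random.choice(alphabet)
    # Camouflage letters deliberately exclude the target letter.
    camouflage = [random.choice(alphabet.replace(target, ""))
                  for _ in range(n_letters)]
    target_present = random.random() < 0.5
    if target_present:
        camouflage[random.randrange(n_letters)] = target
    return target, camouflage, target_present
```

Because the camouflage alphabet excludes the target, the target letter appears in the scene exactly when `target_present` is true, which is what lets the experimenter score "no target" claims unambiguously.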
If the targets are dense, and the users are efficient in their searching, we can predict how long this will take. Working backwards, consider an efficient user who takes 40 seconds to completely search a scene, with no wasted effort. On average, when a target is present, that user should find it in 20 seconds. Random placement may make the letter appear earlier or later in the search process, but on average the user will find the target halfway through the search. We know how long it takes users to find targets when they are present. If the users searched perfectly, it should take twice that long to search the entire room and confidently conclude the target is not there. Any time over that would imply that the users were re-examining portions of the room that they had already searched. This prediction is shown in Graph 2.
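The efficient-search prediction is simple arithmetic: a present target is found, on average, halfway through a complete scan, so an exhaustive scan takes twice the average find time, and any observed time beyond that is re-examination overhead. A minimal sketch, where the numbers are illustrative rather than the study's measurements:

```python
def predicted_exhaustive_time(avg_find_time):
    # A present target is found, on average, halfway through an
    # efficient scan, so scanning the whole room takes twice as long.
    return 2.0 * avg_find_time

def reexamination_overhead(observed_time, predicted_time):
    # Fraction of extra time spent re-searching already-covered areas.
    return observed_time / predicted_time - 1.0

# Illustrative numbers: if targets are found in 20 s on average, an
# efficient user needs 40 s to conclude no target exists; a user who
# actually takes 56.4 s spent 41% extra effort re-searching.
print(predicted_exhaustive_time(20.0))               # 40.0
print(round(reexamination_overhead(56.4, 40.0), 2))  # 0.41
```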
Graph 1 showed the results of users who each performed five searches for targets that were in the room. In fact, these users each performed a sequence of ten searches, where on any given search, the target might or might not have existed. For each of the ten searches, the user was told to either find the target, or announce that it was not there. Users did not know beforehand whether a target would be present in any given search. Graph 3 shows the average time users required to locate a target that was in the room (Graph 1 results), the predicted time to search the entire room (Graph 2 results), and the observed time to search the entire room and conclude that no target existed.
The VR user data is only 1.4% above the prediction for efficient search. This concurs with our personal observations of VR users, who appeared to search the entire room without rescanning. However, desktop users typically examined portions of the room a second time. As shown in Graph 3, the desktop users spent 41% more time than a perfect search would take.
1) When targets were present, VR did not improve performance. We believe this is because the task was cognitively limited, and the ability to move the camera quickly and/or naturally was not the bottleneck.
2) When there was no target present, VR users concluded this substantially faster than traditional users. We believe that VR users built a better mental frame-of-reference for the space, and avoided redundant searching.
3) Users of traditional displays improved by practicing first with VR. This underscores that the practice changed something in the users' mental state that transferred to a different interface.
4) VR users who practiced first with traditional displays hurt their performance in VR. This may imply problems with using desktop 3D graphics to train users for real-world search tasks.
We believe this is the first formal demonstration that VR can improve search task performance versus a traditional interface. More importantly, the results give us insight into why VR is beneficial. This is a step towards our long term goal of being able to predict which real-world tasks will benefit from having a VR interface.