A User Study Comparing Head-Mounted and Stationary Displays

Randy Pausch & M. Anne Shackelford (Computer Science Department)

Dennis Proffitt (Psychology Department)

University of Virginia

Charlottesville, VA 22903

804/982-2211

pausch@Virginia.EDU

Abstract

Head-mounted displays, as popularized by virtual reality systems, offer the opportunity to immerse a user in a synthetically generated environment. While there is much anecdotal evidence that this is a qualitative jump in the user interface, there is little quantitative data to establish that immersion improves task performance. This paper presents the results of a user study: users performing a generic search task decrease task performance time by roughly half (42 percent reduction) when they change from a stationary display to a head-mounted display with identical properties (resolution, field-of-view, etc.). A second result is that users who practice with the head-mounted display reduce task completion time by 23 percent in later trials with the stationary display, suggesting a transfer effect.

Introduction

The rapidly emerging field of virtual environments, sometimes referred to as virtual reality, provides the possibility of a qualitative change in the way humans interact with computers. The combination of head-mounted displays and spatial input devices, such as gloves, gives a strong illusion of placing the user inside a computer-generated scene, as opposed to outside that scene, viewing the scene through the window of a traditional, stationary desktop display.

While many people have speculated that virtual environments are a more efficient way of interacting with three-dimensional computer generated scenes, few empirical studies have been performed. Most existing work has been point design, where a proof-of-concept application is constructed. These applications typically show the limitations of the current technology, and are not rigorously evaluated. Our approach is to systematically subdivide the design space for potential virtual environment applications, and attempt to discern for which types of tasks these new interaction devices are most helpful. This knowledge should be of general use to designers of virtual environment applications. In this way, our general approach is similar to early work measuring the effectiveness of the mouse [Card 87] as a generic pointing device.

As our first piece of work in this area, we have set out to measure the effectiveness of a head-mounted display in a generic searching task: locating twenty targets in a synthetic room. The central user operation in this task is controlling the orientation of a synthetic camera which controls the viewpoint within a computer-generated scene. The fundamental question is whether searching can be performed more quickly by using head motion to control the synthetic camera, or by using a traditional, fixed-location monitor and a hand-held input device to control the camera.

We attempted to design a simple task in order to avoid potential confounding factors. In this task, no input devices were used beyond those to control the camera orientation, and the only important metric was task completion time. Our first challenge was to select the hardware configuration that we would compare with the head-mounted display and attached tracker. We rejected the familiar configuration of desktop monitor and mouse on the grounds that it would introduce too many confounds (variables) to the study. In the interest of keeping as many things constant as possible, we decided to use the head-mounted display as the stationary display device, placing the orientation tracker in the user's hand, rather than attaching it to his or her head.

The study produced two major results. First, users controlling the camera via head tracking completed the search task almost twice as fast as users controlling it by hand (a 42% reduction in task time). Second, we observed a training phenomenon: practice with head tracking significantly improved the performance of users who subsequently used hand tracking to control the camera. We saw no carryover in the opposite direction: practice with hand tracking did not improve performance using head tracking.

Related Work

The concept of a synthetic environment presented via a head-mounted display dates back to the 1960s [Sutherland 68]. However, very little work has been done to compare head-mounted displays to conventional, stationary monitors. Military research has examined the effectiveness of head-tracking for targeting applications [Wells 87], and several studies have described and evaluated devices for simultaneously controlling multiple degrees of freedom in input [Driver 90, Mackinlay 90, McKenna 92, Ware 90, Ware 91].

User studies have been performed to determine the effectiveness of various real-time graphics techniques for depth cuing and the like [Liu 91, Liu 92]. The work we are aware of that is closest in spirit to this work is J. Chung's research on the task of targeting radiation treatment beams [Chung 92]. Another interesting study is the work by Tom Piantanida and his group at SRI, who are examining trade-offs in field of view and resolution in head-mounted displays [Piantanida 92].

Basic Design of the Study

Our primary objective was to design the simplest possible study to discriminate between a head-tracked and non-head-tracked camera control searching task. We wanted to control as many variables as possible. Had we merely compared existing head-mounted display technology with existing desktop technology, we would have introduced massive differences in resolution and field of view: typical head-mounted displays provide 185 by 139 color pixels across an 80 degree wide by 60 degree high field-of-view [VPL 91], whereas standard desktop monitors provide 1280 by 1024 pixels across a smaller field of view.
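To make the resolution figures above concrete, the following sketch converts a pixel count and a field of view into an average angular resolution. The function names are ours, introduced only for illustration, and the calculation assumes that pixels are spread uniformly across the field of view, which ignores lens distortion in a real head-mounted display.

```python
def pixels_per_degree(pixels: int, fov_degrees: float) -> float:
    """Average pixel density across the field of view (uniform-spread assumption)."""
    return pixels / fov_degrees


def arcmin_per_pixel(pixels: int, fov_degrees: float) -> float:
    """Average visual angle covered by one pixel, in minutes of arc."""
    return 60.0 * fov_degrees / pixels


# Head-mounted display figures quoted in the text: 185 x 139 pixels over 80 x 60 degrees.
hmd_horizontal = pixels_per_degree(185, 80.0)   # 2.3125 pixels per degree
hmd_vertical = pixels_per_degree(139, 60.0)     # slightly above 2.3 pixels per degree
hmd_arcmin = arcmin_per_pixel(185, 80.0)        # roughly 26 arcminutes per pixel

print(f"horizontal: {hmd_horizontal:.2f} px/deg, vertical: {hmd_vertical:.2f} px/deg")
print(f"about {hmd_arcmin:.0f} arcminutes per pixel")
```

The same calculation cannot be carried out for the desktop monitor from the text alone, because its field of view depends on screen size and viewing distance, neither of which is stated.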

In addition to the display, the input devices used to control the virtual camera introduced another unwanted variable. Six degree-of-freedom trackers, such as the Polhemus Isotrak [Polhemus], have significant lag times [Liang 91, Adelstein 92], low accuracy, and high noise when compared to desktop devices such as the mouse or Spaceball™ [Spaceball]. Therefore, we designed the study to mechanically eliminate all these variables by using identical hardware and software for both the head tracked and hand tracked conditions. Figure 1 shows the configuration of our head-mounted display subjects. They each wore a VPL Eyephone™ and used a Polhemus 3space™ tracker. The graphics software, running on a pair of Silicon Graphics VGX™ machines, presented an environment where the subjects were in the middle of a room, 6 meters long, 6 meters wide, and 3.4 meters tall. Inside this room, there were 20 targets, each a two digit number roughly 0.3 meters tall. The targets were large enough to be easily viewed in the display.

The hand tracked group used the same physical display (the VPL Eyephone), but it was held at a fixed location in space, much like the equipment one "sticks one's face up to" at an eye-doctor's office. In this way, we made the VPL Eyephone a stationary monitor, just as a desktop display would be relative to the user. Figure 1 shows a rigid ceiling mount for demonstrational purposes; in the study we used human helpers who held the display with their hands. The tracking device was removed from the subject's head, and manipulated by the subject with either one or both hands. Thus, the only difference between the two groups was whether the virtual camera was controlled by muscles in the head, or muscles in the hand. Each user's base position within the graphics environment was fixed. Although we used all six degrees of freedom from the physical tracker, the (x, y, z) translations were small (less than one foot); the basic task was controlling the orientation of the camera.

Each trial required the subject to identify twenty targets in the room; timing began as soon as the graphics were initially presented. When the subject located a two digit number, he or she would call out the number, the experimenter would type the number, and that target would disappear from the display. The frame rate was roughly ten frames per second. As an orientation cue, we placed a unique graphics object (filing cabinet, plant, bookcase, and chair) in each corner of the virtual room.

Pilot Trials

We ran pilot trials with eight subjects, balanced by gender. Four subjects used head tracking, and four used hand tracking. All subjects were videotaped and interviewed afterwards. The major outcome of the pilot trials was modification of the instructions given to the subjects. We explicitly made it clear that they did not need to wait for a target to disappear before pursuing and even voicing the next target. We found that keying in the numbers as they were called out was neither a bottleneck nor a source of significant errors. Also, we made it explicit to the hand tracked subjects that they could use both hands to manipulate the tracker. Based on difficulties some hand tracked subjects had in pilot trials, we attached the Polhemus tracker to a flashlight to make the tracker easier to hold and manipulate. Based on observations during the pilot trials, we also modified the set of potential two digit numbers to avoid similar sounding pairs, such as seventy and seventeen.
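The filtering of similar sounding numbers can be sketched as follows. The text gives only one example pair (seventy and seventeen); the rule below, which treats every "-teen" number as confusable with its matching "-ty" number and drops the "-teen" member, is our assumption for illustration and not necessarily the rule used in the study. All names in the code are hypothetical.

```python
import random

# Assumed confusable pairs: (13, 30), (14, 40), ..., (19, 90).
CONFUSABLE_PAIRS = [(teen, (teen - 10) * 10) for teen in range(13, 20)]


def candidate_labels() -> list[int]:
    """Two digit numbers with the '-teen' member of each assumed pair removed."""
    excluded = {teen for teen, _ in CONFUSABLE_PAIRS}
    return [n for n in range(10, 100) if n not in excluded]


def pick_labels(count: int, seed: int) -> list[int]:
    """Draw `count` distinct target labels; the seed makes a data set repeatable."""
    return random.Random(seed).sample(candidate_labels(), count)


labels = pick_labels(20, seed=1)  # twenty distinct labels for one trial
```

Dropping one member of each pair is enough to guarantee that no two labels in a trial belong to the same assumed pair.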

Experimental Details

Our subjects were volunteers, 14 males and 14 females, all in an introductory undergraduate computer science course. They were rewarded with a nominal number of points on a course assignment, and were a highly motivated subject pool, as this was their first exposure to high-end computer graphics. All subjects understood that it was important to the study that they complete the tasks as quickly as possible.

Balanced by gender, we randomly divided the subjects into the head tracked (virtual environment) group and the hand tracked group. Each subject performed ten trials, where each trial consisted of locating twenty targets. The subjects were informed for each trial when they had located all the targets. In advance of the experiment, we generated sets of random target locations within the room. We generated enough sets to avoid fears that a given random placement was somehow skewed, and then used those data sets for both groups of subjects. In the psychology community, this is referred to as yoking the input sets. To avoid complications arising from targets which were difficult to read, we oriented the targets so they were always upright and perpendicular to the subjects' line of sight; since the subject was effectively stationary, this orientation could be performed once at the time of data set generation. We also constrained targets to be at least 0.6 meters from the room's surfaces, and to not fall within a 1.0 meter diameter cylinder centered about the subject. Objects were also kept a minimum of 1.5 meters from each other, to avoid problems with visual overlap.
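The placement constraints above can be expressed as a single acceptance test for a candidate target position. The sketch below is illustrative only: the coordinate system (origin at the center of the floor, z pointing up, subject on the vertical axis through the origin), the use of straight-line distance between target centers, and all names are our assumptions, since the text does not specify them.

```python
import math

# Dimensions and margins in meters, taken from the text.
ROOM_LENGTH, ROOM_WIDTH, ROOM_HEIGHT = 6.0, 6.0, 3.4
SURFACE_MARGIN = 0.6      # minimum distance from walls, floor, and ceiling
CYLINDER_RADIUS = 0.5     # 1.0 meter diameter cylinder around the subject
MIN_SEPARATION = 1.5      # minimum distance between any two targets

Point = tuple[float, float, float]


def is_acceptable(candidate: Point, placed: list[Point]) -> bool:
    """Return True if `candidate` satisfies all three placement constraints."""
    x, y, z = candidate
    # Keep clear of the four walls.
    if abs(x) > ROOM_LENGTH / 2 - SURFACE_MARGIN:
        return False
    if abs(y) > ROOM_WIDTH / 2 - SURFACE_MARGIN:
        return False
    # Keep clear of the floor and the ceiling.
    if not (SURFACE_MARGIN <= z <= ROOM_HEIGHT - SURFACE_MARGIN):
        return False
    # Stay outside the vertical cylinder centered on the subject.
    if math.hypot(x, y) < CYLINDER_RADIUS:
        return False
    # Stay far enough from every target that has already been placed.
    return all(math.dist(candidate, other) >= MIN_SEPARATION for other in placed)
```

A generator could draw uniformly random candidates and keep only those that this test accepts, appending each accepted position to `placed` until twenty targets have been found.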

Subjects were allowed to practice performing the task until they were comfortable, and then ten measured trials began. Trials one through three were used as practice and are not included in our analysis. After completing their trials, subjects rested briefly and then switched modes and performed another ten trials: head tracked subjects tried hand tracking, and hand tracked subjects tried head tracking. This led us to discover a training effect which we discuss in the results section. At the end of the session, which typically took thirty to sixty minutes per subject, we asked the subject a series of questions about the experiment.
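The scoring rule described above (ten measured trials, with the first three discarded as practice) can be sketched as below. The function name and the trial times in the example are hypothetical; they are not data from the study.

```python
PRACTICE_TRIALS = 3       # trials one through three are discarded
TARGETS_PER_TRIAL = 20    # each trial consists of locating twenty targets


def mean_seconds_per_target(trial_times: list[float]) -> float:
    """Average time per target over the trials that remain after practice."""
    scored = trial_times[PRACTICE_TRIALS:]
    if not scored:
        raise ValueError("no scored trials remain after discarding practice trials")
    return sum(scored) / (len(scored) * TARGETS_PER_TRIAL)


# Made-up completion times, in seconds, for one subject's ten trials.
example_times = [60.0, 50.0, 40.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0, 30.0]
print(mean_seconds_per_target(example_times))  # 1.5
```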

Results

Our primary result is that the head tracked subjects performed the task almost twice as fast as the hand tracked subjects, with an average of 1.5 seconds per target (± 0.1), versus an average of 2.6 seconds per target (± 0.2) for the hand tracked subjects. Using head tracking reduced task time by 42% relative to hand tracking. As shown in Figure 2, the variance was also substantially lower for head tracking, and with the exception of one outlier in each group, the slowest head tracked subject was faster than the fastest hand tracked subject. (Averages are taken after discarding the first three trials for each subject as practice trials.)
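The 42% figure follows directly from the two reported averages, as the short check below shows. The function name is ours; the two inputs are the per-target averages quoted above.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which `improved` is smaller than `baseline`."""
    return (baseline - improved) / baseline


hand_tracked = 2.6   # average seconds per target, hand tracking
head_tracked = 1.5   # average seconds per target, head tracking

reduction = relative_reduction(hand_tracked, head_tracked)
speed_ratio = hand_tracked / head_tracked

print(f"reduction in task time: {reduction:.0%}")   # 42%
print(f"speed ratio: {speed_ratio:.2f}")            # 1.73
```

The speed ratio of about 1.7 is what the text summarizes as "almost twice as fast".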

Our second major result is a training effect: subjects performed the task 23% faster using hand tracking if they had first used head tracking. Experience with hand tracking, however, did not improve subsequent performance with head tracking.

Although we eliminated the first three of ten trials for each subject in our averages, Figure 3 shows that the "practice effect" ceased approximately after the first trial, and that for the head tracked subjects, effectively no learning is necessary, bolstering the argument that head tracking is more natural for controlling the camera in this simple task.

Some previous work has addressed differences between male and female subjects in performing spatially-oriented tasks [McGee 79]. As a group, our female subjects did slightly better in all categories than our male subjects, but the differences were not significant. Also, our population was drawn from students in engineering majors, so it presumably was skewed with respect to the general population.

We asked each subject a sequence of questions after they had completed the study. When asked "Which method do you prefer?", 25 of the 28 subjects preferred head tracking. No subject reported the head-mounted display as a major problem, although we suspect that the novelty of the experience explains why so few subjects reported a problem using the head-mounted display, which is certainly cumbersome. None of the subjects reported any problem with motion sickness or discomfort during the study, although the subjects overwhelmingly identified the lag between when they moved and when the display updated as the major problem with both modes.

Conclusions

We speculate that head tracking reduced task completion time by allowing the subjects to build a better internal representation of the environment. This internal representation allowed the subjects to more readily determine where they had and had not already searched. While we do not believe that the proper conclusion to draw from this study is that "Virtual Reality is twice as fast as desktop interfaces," we do believe that this study demonstrates that, at least for this generic search task, a head-mounted display is better than a stationary display and hand-held tracker.

Bibliography

[Adelstein 92] Adelstein, B., Johnston, E., and Ellis, S., A Testbed for Characterizing Dynamic Response of Virtual Environment Spatial Sensors, UIST '92: ACM Annual Symposium on User Interface Software and Technology, Monterey, CA, November 15-18.
[Card 87] Card, Stuart K., English, William K., and Burr, Betty I., Evaluation of Mouse, Rate-Controlled Isometric Joystick, Step Keys and Text Keys for Text Selection on a CRT. In Readings in Human-Computer Interaction: A Multidisciplinary Approach, edited by Ronald M. Baecker and William A. S. Buxton. Morgan-Kaufmann Publishers Inc., Los Altos, CA 94022, 1987, pages 386-392.
[Chung 92] Chung, J. C., A Comparison of Head-tracked and Non-head-tracked Steering Modes in the Targeting of Radiotherapy Treatment Beams, 1992 ACM Symposium on Interactive 3D Graphics.
[Driver 90] Driver, J., Read, R., Blough, E., Seah, K., An Evaluation of the Polhemus, Spaceball, and Mouse Devices for 3D Cursor Positioning, Computer Science Department, University of Texas at Austin, August, 1990. Available as Technical Report TR-90-29.
[Liang 91] Liang, J., Shaw, C., Green, M., On Temporal-Spatial Realism in the Virtual Reality Environment, Proc. ACM SIGGRAPH Symposium on User Interface Software and Technology, 1991, pp. 19-25.
[Liu 91] Liu, A., Stark, L., Hirose, M., Interaction of Visual Depth Cues and Viewing Parameters During Simulation Telemanipulation, 1991 IEEE International Conference on Robotics and Automation, April 7-12, 1991, Sacramento, CA, pages 2286-2291.
[Liu 92] Liu, A., Tharp, G., Stark, L., Depth Cue Interaction in Telepresence and Simulated Telemanipulation, 1992 SPIE Conference on Human Vision, Visual Processing, and Digital Display, San Jose, CA.
[Mackinlay 90] Mackinlay, J. D., Card, S. K., and Robertson, G. G., Rapid Controlled Movement through a virtual 3D workspace. Computer Graphics, 24, 4 (Aug. 1990), 171-176.
[McGee 79] McGee, M. G., Human Spatial Abilities, Praeger, New York, 1979.
[McKenna 92] McKenna, Michael, Interactive Viewport Control and Three-Dimensional Operations, 1992 ACM Symposium on 3D Graphics, pages 53-56.
[Piantanida 92] Piantanida, T. P., Boman, D., Larimer, J., Gille, J., Reed, C., Studies of the Field-of-View/Resolution Tradeoff in Virtual-Reality Systems (in preparation)
[Polhemus] Polhemus Incorporated, P.O. Box 560, Hercules Dr., Colchester, Vt. 05446, (802) 655-3159
[Spaceball] Spaceball Technologies, Inc., 600 Suffolk Street, Lowell, MA, 01854, Tel: (508) 970-0330, Fax: (508) 970-0199, telephone in Mountain View: (415) 966-8123
[Sutherland 68] I. E. Sutherland, A head-mounted three dimensional display, Proceedings of the AFIPS Fall Joint Computer Conference, 1968, volume 33, pages 757-764.
[VPL 91] Eyephone™ description, available from VPL, Inc.: 415/312-0200, 950 Tower Lane 14th floor, Foster City, CA 94404
[Ware 90] Ware, C., Osborne, S., Exploration and Virtual Camera Control in Virtual Three Dimensional Environments, In Proc. 1990 Symposium on Interactive 3D Graphics (Snowbird, UT, Mar. 1990). Computer Graphics, 24, 2 (Mar. 1990), 175-183.
[Ware 91] Ware, C. and Slipp, L., Exploring virtual environments using velocity control: A comparison of three devices. In Proc. Hum. Factors Soc., 35th Ann. Meeting. (San Francisco, Sep. 1991). HFS, 1991, pp. 300-304.
[Wells 87] Wells, M. J., Griffin, M. J., A Review and Investigation of Aiming and Tracking Performance with Head-Mounted Sights, IEEE Transactions on Systems, Man, and Cybernetics, Vol. SMC-17, No. 2, March/April 1987, 210-221.