Cognitive Dimensions and an Empirical Evaluation: Lessons Learned

Francesmary Modugno
University of Washington
Seattle, WA 98115
fm@cs.washington.edu
http://www.cs.washington.edu/homes/fm

ABSTRACT

We discuss usability problems uncovered by a Cognitive Dimensions (CD) analysis of a demonstrational desktop and verified by an empirical evaluation. These combined analyses provide lessons for those selecting usability evaluation techniques and those developing demonstrational systems: CD's are often overlooked for evaluation; CD's can be learned and used quickly; CD's can help designers understand and evaluate the differences between alternative designs; non-empirical evaluation techniques can guide the interpretation of empirical data and shed light on overlooked aspects of a system; and demonstrational systems should support programming strategy selection.

Keywords: cognitive dimensions, usability evaluation, programming by demonstration, end-user programming

INTRODUCTION AND MOTIVATION

System designers often make tradeoffs to satisfy design goals. Also, before embarking on a costly empirical evaluation, designers usually employ non-empirical evaluation techniques (e.g., heuristic evaluation) to uncover potential usability problems quickly and cheaply. Which techniques can help designers understand the tradeoffs within a design or between designs, and provide feedback on how to improve a design, without requiring that designers become experts in the technique? We present a case study of one such technique, Cognitive Dimensions (CD's) [2], and share the lessons learned by performing a CD analysis of a system and then comparing the results with an empirical study.

COGNITIVE DIMENSIONS

Cognitive Dimensions are a framework for a broad-brush assessment of a system's form and structure. To evaluate a system, the evaluator analyzes it along each of 12 dimensions. The dimensions, which are grounded in psychological theory, can provide insight into the cognitively important aspects of a system and can reveal potential usability problems. (For details of how the dimensions derive from psychological theory and how they are applied, see [2].)
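
To make the mechanics concrete, the following sketch shows one hypothetical way an evaluator might record per-dimension notes about a system; the dimension names and findings are illustrative examples drawn only from those discussed later in this paper, not the framework's full set or our actual analysis.

    from dataclasses import dataclass, field

    @dataclass
    class Assessment:
        # Hypothetical record of a Cognitive Dimensions assessment; the dimensions
        # used below are only the ones mentioned in this paper, for illustration.
        system: str
        notes: dict = field(default_factory=dict)

        def note(self, dimension, finding):
            self.notes[dimension] = finding

        def report(self):
            lines = ["CD assessment of " + self.system]
            lines += ["  %s: %s" % (dim, finding) for dim, finding in self.notes.items()]
            return "\n".join(lines)

    graphical = Assessment("Pursuit (mostly graphical language)")
    graphical.note("role expressiveness", "icons closely mirror desktop objects and operations")
    graphical.note("terseness", "programs occupy more of the program window")
    graphical.note("look-ahead", "user must choose a demonstration strategy up front")
    print(graphical.report())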

The Pursuit Desktop

We analyzed Pursuit [6], a demonstrational desktop (similar to the Macintosh Finder) whose goal is to enable non-programmers to construct programs containing loops, variables and conditionals without having to develop programming expertise. To create a program, users demonstrate its actions on files and folders on the desktop, and Pursuit infers a general procedure. An open problem for demonstrational systems is how to represent the inferred program. We explored two equivalent languages to represent the evolving program while the user demonstrates it: a mostly graphical language containing icons for data and operations, and a mostly textual language containing icons for data and text for operations. We developed two Pursuit prototypes that differed only in how they represented the evolving program.
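
As a purely illustrative sketch of the programming-by-demonstration idea (not Pursuit's actual inference algorithm), the code below generalizes a demonstrated trace of operations on specific files into a loop over all files matching a pattern; the function names, trace format, and pattern heuristic are assumptions for illustration.

    import re

    def infer_pattern(filenames):
        # Guess a shared glob-style pattern for the demonstrated files (e.g. '*.tex').
        suffixes = set(re.sub(r".*(\.[^.]+)$", r"*\1", name) for name in filenames)
        return suffixes.pop() if len(suffixes) == 1 else "*"

    def generalize(trace):
        # Generalize a demonstrated trace [(operation, filename), ...] into a loop:
        # if every file received the same sequence of operations, return a single
        # "for each matching file, do these operations" description.
        per_file = {}
        for op, name in trace:
            per_file.setdefault(name, []).append(op)
        bodies = set(tuple(ops) for ops in per_file.values())
        if len(bodies) != 1:
            return None  # the demonstrations disagree; no single loop explains them
        return {"foreach": infer_pattern(per_file), "do": list(bodies.pop())}

    demo = [("copy", "paper.tex"), ("compress", "paper.tex"),
            ("copy", "notes.tex"), ("compress", "notes.tex")]
    print(generalize(demo))   # {'foreach': '*.tex', 'do': ['copy', 'compress']}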

The Cognitive Dimensions of Pursuit

The goal of the CD analysis was to understand the tradeoffs between the two languages and to gain insight into the impact the languages might have on Pursuit's effectiveness. A second goal was to uncover potential usability problems prior to the empirical evaluation. We chose CD's over other analysis techniques because CD's explicitly explore design tradeoffs. Also, as novices to usability evaluation, we wanted a technique that we could learn and use quickly. Moreover, we wanted to get a deeper understanding of the user's interaction with the system, not just find interface problems.

After reading papers on CD's, we spent a day thinking about how each dimension applied to Pursuit. In a few days, we detailed the results (see [6]). The analysis (1) revealed insights about Pursuit's overall design, (2) provided a way to characterize the differences between the two representation languages, and (3) clarified the tradeoffs between these differences. For example, the mostly graphical language is more role expressive: it more closely reflects desktop objects and operations. The mostly textual language is terser: more of a program fits in the program window. The cost of greater role expressiveness is less terseness, and vice versa.

The Strategy-Choosing Problem.

A surprising result, uncovered while analyzing Pursuit along the Look-Ahead dimension, applies to Pursuit as well as to other demonstrational systems. Look-Ahead constraints impose an order on user actions. For example, to select a menu item the user must first expose the menu. These constraints require users to plan before they execute any actions -- the more planning, the greater the burden (i.e., look-ahead) on the user.

In Pursuit (and some other demonstrational systems), users must decide a priori how to demonstrate a program. That is, the user must determine the specification strategy, which involves thoroughly examining the state of the desktop and inferring the state changes that intermediate program actions may cause. We refer to this as the strategy-choosing problem. We added a feature to Pursuit to automatically handle certain classes of this problem: during the demonstration, if Pursuit recognizes an inappropriate demonstration strategy, it notifies the user, changes the strategy, updates the program to reflect the change, and enables the user to continue the demonstration. This reduces look-ahead because the user is under less pressure to exhaustively examine the system state before beginning a demonstration.
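
The sketch below illustrates, in hypothetical and much simplified form, the kind of mechanism described above: while recording a demonstration, the system checks whether each action fits the current strategy and, if not, notifies the user, switches strategies, and lets the demonstration continue. The strategy names and the fit test are assumptions for illustration, not Pursuit's actual rules.

    from dataclasses import dataclass, field

    @dataclass
    class Action:
        kind: str     # e.g. "copy", "edit", "delete"
        target: str   # file the action applies to

    @dataclass
    class Demonstration:
        strategy: str = "copy-then-edit"
        copies: set = field(default_factory=set)
        program: list = field(default_factory=list)

        def fits_strategy(self, action):
            # Hypothetical rule: under "copy-then-edit", edits are expected only
            # on files the user has already backed up with a copy.
            if self.strategy == "copy-then-edit" and action.kind == "edit":
                return action.target in self.copies
            return True

        def record(self, action):
            if not self.fits_strategy(action):
                # Notify the user, switch strategies, and keep the demonstration
                # going; a real system would also rewrite the partial program here.
                print("Note: switching strategy to edit-in-place")
                self.strategy = "edit-in-place"
            if action.kind == "copy":
                self.copies.add(action.target)
            self.program.append((action.kind, action.target))

    demo = Demonstration()
    for a in [Action("copy", "report.txt"), Action("edit", "report.txt"),
              Action("edit", "notes.txt")]:   # the last edit violates copy-then-edit
        demo.record(a)
    print(demo.strategy, demo.program)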

EMPIRICAL EVALUATION OF PURSUIT

After incorporating the changes suggested by the CD analysis into the prototypes, we performed a user study. Sixteen non-programmers were randomly assigned to use one of the prototypes and were given program construction and comprehension tasks. Both groups successfully constructed and comprehended programs containing loops, variables and conditionals. Thus, Pursuit met its goal of enabling non-programmers to access the power of programming.

We also wanted to understand the effects of the language tradeoffs on the usability of Pursuit. An interesting result was the effect on users' ability to construct programs: the mostly graphical language group was twice as accurate in constructing programs (F(1,28)=13.00, p<.002) and was also better at comprehending programs containing control constructs and variables (t(14)=1.84, p<.04). Since user actions to construct a program are identical for both prototypes, these differences could only be due to the different representation languages. These findings were consistent with the CD analysis, which suggested that since the mostly graphical language was more role expressive and closer to the representations in the interface, it might better facilitate learning and comprehension.

The Strategy-Choosing Problem Revisited.

The study also confirmed the strategy-choosing problem. By examining the log files from the program construction tasks, we discovered that users often had difficulty determining how to demonstrate a program. Of the 16 users, all but one chose an incorrect demonstration strategy at least once. In only 18% of these cases did the user eventually create a correct program -- by starting the programming task over with another strategy.

Recall that Pursuit incorporated a feature to handle the strategy-choosing problem. Although the mechanism was not documented (to reduce what users had to learn prior to the construction task), 9 of the 16 users happened upon it accidentally. Of those 9 initially incorrect programming attempts, 6 went on to construct the program correctly by adopting the new strategy and continuing the demonstration. Thus, the mechanism provided a 67% recovery rate from a strategy error without the user starting over, compared with the 18% recovery rate achieved in general by starting over.

DISCUSSION AND CONCLUSIONS

There is ongoing study of the effectiveness, applicability, learnability and usability of different usability evaluation techniques. Much of this work compares performance outcomes of the different techniques (e.g., [1,3,5]), although John [4] has used the case-study approach to understand what people do when using these techniques. Our work supplements these results by adding CD's as an evaluation technique to study and by suggesting further investigation into how each of these techniques might interact with a formal user study. Our experience has taught us several lessons. First, a computer scientist with little psychology or HCI training can learn and use CD's in a few days. For designers, we recommend CD's not only for revealing potential usability problems, but also for understanding different design tradeoffs and their potential impact on usability.

The discovery of the strategy-choosing problem in Pursuit and the confirmation by the empirical study of the severity of this problem suggest that designers of demonstrational systems need to consider ways to support the strategy-selection process for users. The ability to provide this support, at least for some types of strategy-selection errors, was demonstrated by the successful use of a feature added to Pursuit.

Finally, the CD analysis influenced how we analyzed the empirical data. Because the CD analysis revealed the strategy-choosing problem, we looked for confirmatory evidence in the data logs; we might not have looked for this problem otherwise. We thus might have missed a stumbling block for users (at least prior to the empirical study), incurred greater cost to fix it after the study in the form of additional user testing, and learned less from the user study. Moreover, the analysis of the data logs prompted by the CD analysis not only showed us how our solution to the strategy-choosing problem helped users, it also revealed particular instances where it failed, suggesting future research into mechanisms for handling different types of strategy-selection problems in demonstrational systems.

REFERENCES

  1. D. L. Cuomo and C. D. Bowen. Understanding Usability Issues Addressed by Three User-System Interface Evaluation Techniques. Interacting with Computers, 6(1):86--108, 1994.

  2. T. R. G. Green. Cognitive Dimensions of Notations. In People and Computers V, 1989.

  3. R. Jeffries et al. User Interface Evaluation in the Real World: A Comparison of Four Techniques. In Proceedings of CHI '91.

  4. B. E. John and H. Packer. Learning and Using the Cognitive Walkthrough Method: A Case Study Approach. In Proceedings of CHI '95.

  5. C. M. Karat. A Comparison of User Interface Evaluation Methods. In J. Nielsen and R. L. Mack, editors, Usability Inspection Methods.

  6. F. Modugno. Extending End-User Programming in a Visual Shell with Programming by Demonstration and Graphical Language Techniques. PhD thesis, Carnegie Mellon University, March 1995.