05-830, User Interface Software, Spring, 1997
Benchmark List
Brad Myers

Human Computer Interaction Institute
School of Computer Science
Carnegie Mellon University
5000 Forbes Avenue
Pittsburgh, PA 15213-3891 USA
FAX: 412-268-5576

Benchmarks for User Interface Toolkits


We have created a set of benchmark tasks to help people learn about and evaluate different toolkits.  The benchmark tasks are designed to span a wide range of user interface styles, so at least one benchmark should be appropriate for evaluating a toolkit with respect to almost any application type.  Another goal is that an experienced user of an appropriate toolkit should be able to implement each benchmark fully in less than 8 hours and in a few hundred lines of code.

The benchmarks can be used in two ways.  In our class, we are using them as a way to learn various toolkits.  Each student is implementing one of the benchmarks four times, in four different toolkits.  Because the benchmarks are small, they can be implemented quickly once the student learns the toolkit.  It appears that students can learn most toolkits from scratch and implement the benchmark in about 20 to 30 hours (about 3 weeks), which allows plenty of time for four implementations.

Another use of the benchmarks is to evaluate and compare different toolkits.  The students' impressions do a good job of highlighting how easy each toolkit is to learn, but they are not particularly enlightening about how effective it is for experts.  Therefore, it would be useful if experts in various toolkits tried the benchmarks to see whether their results are significantly different.  This will help to identify the strengths and weaknesses of the toolkits with respect to the various kinds of applications, and with respect to different aspects of usability (ease of learning, efficiency, error rates, etc.).

The Experiment

We would like to see implementations of these benchmarks using as many different toolkits as possible.  Multiple implementations by different people using the same toolkit will also make comparisons more valid.  Therefore, we would appreciate receiving as many implementations as possible.  We estimate that if you already know the toolkit, it should take no more than a day to implement any of these.

Therefore, please implement one or more of the following benchmarks in a toolkit you are familiar with (or one you want to learn), and then answer our questionnaire.  We plan to write a paper summarizing the results.

Initial Benchmarks

The students in the class have created a set of benchmarks that span a wide range of user interface styles and capabilities.  See the assignment about creating the benchmarks.  The following are the benchmarks they created, plus an old benchmark that I created.

The task is to implement a benchmark using a particular toolkit, and then answer a set of questions.  Please mail the answers to the questions to bam@cs.cmu.edu and I will collect them.

Questions to be Answered

There is a set of questions about each implementation that will help us evaluate how well this experiment is working.  Any other comments would also be appreciated!


So far, at least one of the benchmarks has been implemented at least once using each of the following tools (but more implementations of the benchmarks using these would be helpful!):

We would also like to see implementations using:


If you want to learn a toolkit, or you are already an expert in a toolkit, please implement one of the benchmarks using the tool, and send us the results!

Thank you in advance for your help.


An article is being prepared about this study and about how benchmarks are being used in the class.  A draft of the article is available in PostScript format.  (Note: the PostScript file prints OK and works in gs, but for some reason not in gv.)  Comments on the paper would be appreciated.