Natural Programming

The Whyline

The Whyline is a debugging tool that allows programmers to ask "Why did" and "Why didn't" questions about their program's output. Programmers choose from a set of questions generated automatically via static and dynamic analyses, and the tool provides answers in terms of the runtime events that caused or prevented the desired output. In user studies of the Whyline (prototyped in the Alice programming environment), programmers using the Whyline spent nearly a factor of 8 less time debugging the same bugs than programmers without it.

The Whyline's design was heavily motivated by several exploratory studies of programmers' debugging strategies.

Ko, A. J. and Myers, B. A. (2004). Designing the Whyline: A Debugging Interface for Asking Questions About Program Failures. CHI 2004, Vienna, Austria, April 24-29, 151-158.
Watch the Whyline video (H264, 28 MB, requires Quicktime 7)
Watch the CHI 2004 presentation (MPEG4, requires Quicktime 6)

Now Patented!

The Whyline work is now patented: U.S. Patent No. 7,735,066.

An Illustrative Scenario

To illustrate its use, consider this scenario (which comes from a real user study).

Ellen is creating a Pac-Man game, and trying to make Pac shrink when the ghost is chasing and touches Pac. She plays the world and makes Pac collide with the ghost, but to her surprise, Pac does not shrink...

Pac did not shrink because Ellen (a pseudonym) has code that prevents Pac from resizing after the big dot is eaten. Either Ellen did not notice that Pac ate the big dot, or she forgot about the dependency.

When Ellen played the world, Alice hid the code and expanded the worldview and property panel, as seen in Figure 1. This relates property values to program output. Ellen presses the why button after noticing that Pac did not shrink, and a menu appears with the items why did and why didn't, as in Figure 2. The submenus contain the objects in the world that were or could have been affected. The menu supports exploration and diagnosis by increasing the visibility of the possible questions and decreasing the viscosity of considering them.

Questions can be asked using the why button.

Figure 1. Ellen notices that Pac isn't resizing.

Because Ellen expected Pac to resize after touching the ghost, she selects why didn't and scans the property changes and animations that could have happened. When she hovers the mouse over a menu item, the code that caused the output in question is highlighted and centered in the code area (see Figure 2). This supports diagnosis by exposing hidden dependencies between the failure and the code that might be responsible for it. This also avoids premature commitment in diagnosis by showing the subject of the question without requiring that the question be asked.

Whyline answering why Pac is not resizing.

Figure 2. Ellen asks, "Why didn't Pac resize?"
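
The question menu described above can be sketched roughly as follows. This is a minimal illustration in Python, not the Alice prototype's implementation: the function `build_question_menu` and its inputs are hypothetical. It assumes a static analysis has already listed every output the code could produce, and that a runtime trace has recorded which of those outputs actually happened.

```python
from collections import defaultdict

def build_question_menu(possible_outputs, executed_outputs):
    """Group candidate questions by kind ("why did" / "why didn't") and object.

    possible_outputs: (object, output) pairs the code could produce
                      (assumed to come from a static analysis).
    executed_outputs: (object, output) pairs that actually happened
                      (assumed to come from the runtime trace).
    """
    executed = set(executed_outputs)
    menu = {"why did": defaultdict(list), "why didn't": defaultdict(list)}
    for obj, output in possible_outputs:
        kind = "why did" if (obj, output) in executed else "why didn't"
        menu[kind][obj].append(output)
    return menu

# Hypothetical data loosely modeled on the Pac-Man scenario.
possible = [
    ("Pac", "resize 0.5"),
    ("Pac", "move forward 1"),
    ("Ghost", "move toward Pac"),
]
executed = [("Pac", "move forward 1"), ("Ghost", "move toward Pac")]

menu = build_question_menu(possible, executed)
for kind, objects in menu.items():
    for obj, outputs in objects.items():
        for output in outputs:
            print(f"{kind} {obj} {output}?")
```

Grouping by object first mirrors the submenus in the scenario, where Ellen picks an object and then scans the property changes and animations that did or could have happened to it.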

Ellen asks why didn't Pac resize .5? and the camera focuses on Pac to increase his visibility. The Whyline answers the question by analyzing the runtime actions that did and did not happen, and provides the answer shown in Figure 3. The actions included are only those that prevented Pac from resizing: the predicate whose expression was false and the actions that defined the properties used by the expression. By excluding unrelated actions, we support observation and hypothesizing by increasing the visibility of the actions that likely contain the fault. To support diagnosis, the names and colors are the same as the code that caused them. This improves consistency and closeness of mapping with code.

Whyline reveals the problem using a data and control flow causality diagram

Figure 3. The Whyline reveals that Pac didn't resize because the big dot was eaten.

The arrows represent data and control flow causality. Predicate arrows are labeled true or false and dataflow arrows are labeled with the data used by the action they point to. The arrows support progressive evaluation, and thus hypothesizing, by helping Ellen follow the runtime's computation and control flow.
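
One way to picture how such an answer might be assembled is the sketch below. It is a simplified Python model under stated assumptions, not the Whyline's actual algorithm: the `Event` structure, the `why_didnt` and `describe` functions, and the example predicate `not BigDot.isEaten` are all hypothetical. It assumes the trace records each evaluation of the predicates guarding a statement, along with the earlier actions whose values the predicate used.

```python
from dataclasses import dataclass, field

@dataclass
class Event:
    """One runtime action recorded in the trace (hypothetical structure)."""
    name: str
    time: float
    value: object = None
    # Earlier events whose values this event used (dataflow causes).
    data_sources: list = field(default_factory=list)

def why_didnt(guard_events):
    """Return (predicate, defining actions) for the latest guard that was false.

    guard_events: recorded evaluations of the predicates guarding the
    statement the programmer expected to run.
    """
    failed = [e for e in guard_events if e.value is False]
    if not failed:
        return None  # no guard blocked the statement in this trace
    blocker = max(failed, key=lambda e: e.time)
    return blocker, list(blocker.data_sources)

def _label(value):
    """Arrow label: booleans as true/false, anything else as its string form."""
    return str(value).lower() if isinstance(value, bool) else str(value)

def describe(answer):
    """Render the answer as labeled arrows, one per line."""
    blocker, sources = answer
    lines = [f"{s.name} --[{_label(s.value)}]--> {blocker.name}" for s in sources]
    lines.append(f"{blocker.name} --[{_label(blocker.value)}]--> (not executed)")
    return lines

# Hypothetical trace for the scenario: the big dot was eaten before the collision.
set_eaten = Event("BigDot.isEaten", time=3.0, value=True)
guard = Event("not BigDot.isEaten", time=7.5, value=False, data_sources=[set_eaten])

for line in describe(why_didnt([guard])):
    print(line)
```

Only the false predicate and the actions that defined its inputs appear in the result, which matches the idea of excluding unrelated actions from the answer.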

Along the x-axis is event-relative time, improving the closeness of mapping to the time-based Alice runtime system. Along the y-axis are event threads: this allows co-occurring events to be shown, supporting juxtaposibility.

Ellen interacts with the timeline by dragging the time cursor (the vertical black line in Figure 3). Doing so changes all properties to their values at the time represented by the time cursor's location. This supports exploration of runtime data. When Ellen moves the cursor over an action, the action and the code that caused it become selected, supporting diagnosis and repair. These features allow Ellen to rewind, fast-forward, and scrub through the execution history, receiving immediate feedback about the state of the world. This exposes hidden dependencies between actions and data that might not be shown directly on the Whyline, and between current values and program output.
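
The time cursor behavior can be approximated with a per-property history of timestamped assignments. The Python sketch below is an assumption about how such a lookup could work, not a description of the Alice runtime: the `PropertyHistory` class and the property names are hypothetical.

```python
from bisect import bisect_right

class PropertyHistory:
    """Records timestamped property assignments so past state can be restored."""

    def __init__(self):
        self._times = {}   # property name -> assignment times, in recorded order
        self._values = {}  # property name -> values, parallel to _times

    def record(self, time, prop, value):
        # Assumes assignments are recorded in non-decreasing time order.
        self._times.setdefault(prop, []).append(time)
        self._values.setdefault(prop, []).append(value)

    def value_at(self, prop, time):
        """Most recent value assigned to prop at or before time, else None."""
        times = self._times.get(prop, [])
        i = bisect_right(times, time)
        return self._values[prop][i - 1] if i else None

    def state_at(self, time):
        """Snapshot of every recorded property as of the cursor position."""
        return {prop: self.value_at(prop, time) for prop in self._times}

# Hypothetical history for the scenario.
history = PropertyHistory()
history.record(0.0, "BigDot.isEaten", False)
history.record(0.0, "Pac.size", 1.0)
history.record(3.0, "BigDot.isEaten", True)

print(history.state_at(2.0))  # cursor placed before the big dot is eaten
print(history.state_at(7.5))  # cursor placed at the collision with the ghost
```

Because each lookup is a binary search over one property's history, dragging the cursor in either direction costs the same, which suits rewinding as well as fast-forwarding.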

To reduce the viscosity of exploration, Ellen can double-click on an action to implicitly ask what caused this to happen? and actions causing the runtime action are revealed. Ellen can also hover her mouse cursor over expressions in the code to see current values and to evaluate expressions based on the current time. This improves the visibility of runtime data and supports progressive evaluation. Finally, the Whyline supports provisionality by making previous answers available through the Questions I've Asked button. The button prevents the hard mental operation of recalling facts determined earlier in debugging activity.

"So this says Pac didn't resize because BigDot.isEaten is true. Oh! The ghost isn't chasing because Pac ate the big dot. Let's try again without getting the big dot."

Without the Whyline, the misperception could have led to an unnecessary search for non-existent errors. In fact, in numerous user tests without the Whyline, users frequently did just this.

Evaluation

In comparing equivalent debugging scenarios between user tests with and without the Whyline, we have shown that the Whyline reduced programmers' average debugging time by a factor of 7.8. Furthermore, the Whyline helped programmers complete 40% more of their task than without the Whyline. We are in the process of refining the Whyline and performing more formal investigations of the Whyline's effectiveness.