Human-Computer Interaction Thesis Defense

  • Gates Hillman Centers and Zoom
  • Reddy Conference Room 4405 and Virtual
  • Ph.D. Student
  • Human-Computer Interaction Institute
  • Carnegie Mellon University
Thesis Orals

Designing Effective History Support for Exploratory Programming Data Work

Why did you model the data that way? How do we reproduce this plot? Programming for data science or modeling is a highly valued skill today. Yet when data workers experiment with data by coding (an intensely iterative process called exploratory programming), the details of what they try on the way to a solution tend to get lost. Since experimentation underlies essential workflows in data analysis, machine learning, AI, and visualization, this is a serious flaw. Ask any data worker today and, regardless of organization or years of experience, they will have faced at least some results that cannot be readily reproduced, or mysterious data decisions with no recorded rationale. Modern best practices for managing experimentation demand considerable human effort and still leave substantial room for error. With rising demand for responsibility and accountability in analyses and models, it is vital that people have proper support for documenting and answering why things were built the way they were.

This dissertation explores history tooling to support exploratory programming data work. To do this, we first conducted interviews, surveys, and design exercises with practitioners to learn about their needs and their current workflows for experimentation. We contribute two studies: 1) a study detailing the mix of tools and ad hoc methods data workers use to manage their experiments, and 2) an investigation of how data workers use computational notebooks for iteration. Our results point to two key barriers: the manual effort needed to collect experiment history today is unsustainable, and recovering semantic process information from a pile of history logs is far too cumbersome for practitioners to fit into their workflows.

We therefore set two design goals: help practitioners record their experimentation without any manual effort, and help them quickly recover history facts to answer rationale questions about their work.

Next, this dissertation designs, builds, and tests new interactive tools to meet these design goals over a five-year iterative, human-centered design process. We contribute: 1) a series of 5 experiment history tool prototypes and 4 usability studies with practitioners, each of which illuminates a different aspect of the design space, 2) a set of novel visualization and interaction techniques for concisely summarizing history, 3) a fully implemented experiment history tool called Verdant, deployed in the wild as a computational notebook extension, and 4) an observational study where data workers use Verdant during exploratory programming and afterwards to answer rationale questions about the history of their experiments. With Verdant, participants were able to answer 97% of history questions about their work in under 2 minutes 30 seconds. All participants reported ways in which Verdant’s style of history support would help in their own real-life work practices. In the conclusion of this thesis, we discuss the broader design space of experiment support tooling that rich history data enables.

In-person and Zoom participation. See announcement.

For More Information, Please Contact: