UBC Theses and Dissertations

Computational experiment comprehension using provenance summarization

Boufford, Nichole

Abstract

Scientists often use complex, multistep workflows to analyze data computationally. These workflows might include downloading datasets, installing packages, processing data, training models, and evaluating the results. Such workflows are difficult to manage and track effectively. The fast-paced, iterative nature of research programming leads them to accumulate unused code, multiple versions of the same script, and untracked dependencies. These issues cause difficulties when researchers try to reproduce experiments using code that someone else has written, or even code that they have written themselves. Research programmers can address these problems by collecting data provenance: a record of what happened during an experiment, including files touched, execution order, and software dependencies. Although provenance captures how an experiment was executed, provenance graphs are often large and complicated and quickly become incomprehensible. We propose a new method for summarizing provenance graphs that uses recent advances in large language model prompting to generate textual summaries of the graphs. We perform a user study comparing textual summaries with traditional node-link diagrams on experiment reproduction tasks. Our results show that textual summaries are a promising approach to summarizing provenance for experiment reproduction. We use qualitative results from the user study to motivate future designs for reproducibility tools.
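
To make the notion of provenance concrete, the following is a minimal sketch of a provenance record holding the three kinds of information the abstract names: files touched, execution order, and software dependencies. It is not the method of the thesis, which prompts a large language model; the plain-text rendering here uses a fixed template purely for illustration. The names ProvenanceGraph, record, edges, and describe, as well as the script, file, and package names, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ProvenanceGraph:
    """Hypothetical minimal provenance record (illustrative only)."""
    # Each step is (process name, files read, files written), kept in execution order.
    steps: list = field(default_factory=list)
    # Software packages the experiment relied on.
    dependencies: set = field(default_factory=set)

    def record(self, process, reads=(), writes=()):
        """Append one executed process together with the files it touched."""
        self.steps.append((process, tuple(reads), tuple(writes)))

    def edges(self):
        """Yield (source, target) pairs, as a node-link diagram would draw them."""
        for process, reads, writes in self.steps:
            for name in reads:
                yield (name, process)
            for name in writes:
                yield (process, name)


def describe(graph):
    """Render the graph as plain text, one line per step, in execution order."""
    lines = []
    for index, (process, reads, writes) in enumerate(graph.steps, start=1):
        parts = [f"{index}. {process}"]
        if reads:
            parts.append("read " + ", ".join(reads))
        if writes:
            parts.append("wrote " + ", ".join(writes))
        lines.append("; ".join(parts))
    if graph.dependencies:
        lines.append("Depends on: " + ", ".join(sorted(graph.dependencies)))
    return "\n".join(lines)


# Assumed example workflow: download, clean, train.
graph = ProvenanceGraph(dependencies={"pandas", "scikit-learn"})
graph.record("download.py", writes=["raw.csv"])
graph.record("clean.py", reads=["raw.csv"], writes=["clean.csv"])
graph.record("train.py", reads=["clean.csv"], writes=["model.pkl"])
summary = describe(graph)
print(summary)
```

Even this three-step workflow yields five edges in the node-link view, which hints at how quickly real provenance graphs grow and why a condensed textual form may be easier to read.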

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International