Open Collections
UBC Theses and Dissertations
Computational experiment comprehension using provenance summarization
Boufford, Nichole
Abstract
Scientists often use complex, multistep workflows to analyze data computationally. These workflows might include downloading datasets, installing packages, processing data, training models, and evaluating the results. Such workflows are difficult to manage and track effectively. Because research programming is fast-paced and iterative, these workflows tend to accumulate unused code, multiple versions of the same script, and untracked dependencies. These issues make it difficult for researchers to reproduce code that someone else has written, or even code they have written themselves. Research programmers can address these problems by collecting data provenance: a record of what happened during an experiment, including the files touched, the execution order, and the software dependencies. Provenance records how an experiment was executed, but provenance graphs are often large and complicated and quickly become incomprehensible. We propose a new method for summarizing provenance graphs that uses recent advances in prompting large language models to generate textual summaries of those graphs. We conduct a user study comparing these textual summaries with traditional node-link diagrams on experiment reproduction tasks. Our results show that textual summaries are a promising approach to summarizing provenance for experiment reproduction. We use qualitative results from the user study to motivate future designs for reproducibility tools.
Item Metadata
Title | Computational experiment comprehension using provenance summarization
Creator | Boufford, Nichole
Publisher | University of British Columbia
Date Issued | 2024
Language | eng
Date Available | 2024-03-28
Provider | Vancouver : University of British Columbia Library
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International
DOI | 10.14288/1.0440963
Degree Grantor | University of British Columbia
Graduation Date | 2024-05
Scholarly Level | Graduate
Aggregated Source Repository | DSpace