UBC Theses and Dissertations

A study of provenance in databases and improving the usability of provenance database systems

AlOmeir, Omar

Abstract

Provenance refers to information about the origin of a piece of data and the process that led to its creation. Provenance has long been a focus of database research. In this field, most attention has gone to the sub-problem of finding the source data that contributed to the results of a query. More formally, the problem is defined as follows: given a query q and a tuple t in the results of q, determine which tuples from the relation R accessed by q caused t to appear in the results of q. Most work on this problem has concentrated on developing models and semantics that allow this provenance information to be generated and queried. The motivations for studying provenance in databases vary across domains: provenance information is relevant to curated databases, to data integration systems, and to data warehouses, where it supports updating and maintaining views. In this thesis, I survey provenance models as well as different system implementations. I compare and analyze the different approaches and point out the advantages and disadvantages of each. Based on my findings, I develop a provenance system, built on top of a relational database management system, that combines the most attractive features of the previous systems. My focus is on identifying areas that could make provenance information easier for users to understand; to that end, I use visualization techniques to extend the system with a provenance browsing component. I provide a case study using my provenance explorer on a large dataset of financial data that comes from multiple sources. Here, provenance information helps track the sources of this data and the transformations it went through, and explains them to users in a way they can trust and reason about.

There has not been much work on presenting and explaining provenance information to database users. Some current approaches offer limited facilities for visualizing and reporting provenance information, while others simply rely on the user to query and explore the results through various data manipulation languages. My approach presents novel techniques for users to interact with provenance information.
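To make the problem statement concrete, here is a minimal sketch of tuple-level lineage tracking for a select-join-project query, written in Python. The relation names, attributes, data, and helper functions (`annotate`, `select`, `join`, `project`) are hypothetical illustrations and are not taken from the thesis's system. Each input tuple is tagged with its own identifier, and each operator propagates those tags so that every answer tuple t ends up mapped to the set of input tuples that caused it to appear.

```python
from itertools import product

def annotate(name, rows):
    """Tag each input row with a singleton provenance set {(relation, row index)}."""
    return [(row, frozenset({(name, i)})) for i, row in enumerate(rows)]

def select(annotated, predicate):
    """Selection keeps a surviving row's provenance unchanged."""
    return [(row, prov) for row, prov in annotated if predicate(row)]

def join(left, right, condition):
    """A joined row depends on both input rows, so their provenance sets are unioned."""
    return [
        ({**l, **r}, lp | rp)
        for (l, lp), (r, rp) in product(left, right)
        if condition(l, r)
    ]

def project(annotated, attrs):
    """Projection with duplicate elimination: rows that collapse to the same
    output tuple pool their provenance sets."""
    result = {}
    for row, prov in annotated:
        key = tuple(row[a] for a in attrs)
        result[key] = result.get(key, frozenset()) | prov
    return result

# Hypothetical financial data from two price feeds (illustrative only).
companies = annotate("companies", [
    {"ticker": "AAA", "name": "Alpha"},
    {"ticker": "BBB", "name": "Beta"},
])
quotes = annotate("quotes", [
    {"ticker": "AAA", "price": 10, "source": "feed1"},
    {"ticker": "AAA", "price": 10, "source": "feed2"},
    {"ticker": "BBB", "price": 5, "source": "feed1"},
])

# q: SELECT name, price FROM companies JOIN quotes USING (ticker) WHERE price > 6
answer = project(
    join(companies,
         select(quotes, lambda r: r["price"] > 6),
         lambda c, q: c["ticker"] == q["ticker"]),
    ["name", "price"],
)

for t, prov in answer.items():
    print(t, sorted(prov))
# prints: ('Alpha', 10) [('companies', 0), ('quotes', 0), ('quotes', 1)]
```

The answer tuple ('Alpha', 10) has two alternative derivations, one per price feed. This simple set-union model merges them into a single lineage set; richer provenance models, such as why-provenance with its witness sets, keep alternative derivations separate.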

Rights

Attribution-NonCommercial-NoDerivs 2.5 Canada