A study of provenance in databases and improving the usability of provenance database systems

UBC Theses and Dissertations

AlOmeir, Omar

Abstract

Provenance refers to information about the origin of a piece of data and the process that led to its creation. Provenance has been a focus of database research for quite some time. In this field, most attention has gone to the sub-problem of finding the source data that contributed to the results of a query. More formally, the problem is defined as follows: given a query q and a tuple t in the result of q, which tuples of the relation(s) R accessed by q caused t to appear in the result of q? Most work on this problem has focused on developing models and semantics that allow such provenance information to be generated and queried. The motivations for studying provenance in databases vary across domains: provenance information is relevant to curated databases, to data integration systems, and to data warehouses, where it supports updating and maintaining views.

In this thesis, I survey provenance models as well as different system implementations. I compare and analyze the different approaches and point out the advantages and disadvantages of each. Based on my findings, I develop a provenance system, built on top of a relational database management system, that combines the most attractive features of the previous systems. My focus is on identifying ways to make provenance information easier for users to understand, using visualization techniques to extend the system with a provenance browsing component. I present a case study using my provenance explorer on a large dataset of financial data drawn from multiple sources. Here, provenance information helps track the sources and transformations this data went through and explains them to users in a way they can trust and reason about. There has not been much work on presenting and explaining provenance information to database users. Some current approaches offer only limited facilities for visualizing and reporting provenance information; others simply rely on the user to query and explore the results via different data manipulation languages. My approach presents novel techniques for users to interact with provenance information.
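The formal problem stated in the abstract (given a query q and a tuple t in its result, which input tuples caused t to appear) can be illustrated with a small lineage-style sketch. It assumes a simplified annotation scheme in which every tuple carries the set of source-tuple identifiers it was derived from, and that set is propagated through select, join, and project. The `Annotated` type, the operator functions, `lineage_of`, and the sample relations are all hypothetical illustrations; they are not taken from the thesis's system.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


# Hypothetical annotated tuple: the row values plus the identifiers of the
# source tuples (e.g. "R:0") that contributed to it.
@dataclass(frozen=True)
class Annotated:
    row: tuple
    lineage: frozenset


def scan(name: str, rows: list[tuple]) -> list[Annotated]:
    # A base tuple's lineage is just its own identifier.
    return [Annotated(r, frozenset({f"{name}:{i}"})) for i, r in enumerate(rows)]


def select(rel: list[Annotated], pred: Callable[[tuple], bool]) -> list[Annotated]:
    # Selection filters tuples but does not change their lineage.
    return [a for a in rel if pred(a.row)]


def join(left: list[Annotated], right: list[Annotated],
         cond: Callable[[tuple, tuple], bool]) -> list[Annotated]:
    # A joined tuple depends on both inputs, so their lineages are unioned.
    return [Annotated(a.row + b.row, a.lineage | b.lineage)
            for a in left for b in right if cond(a.row, b.row)]


def project(rel: list[Annotated], cols: list[int]) -> list[Annotated]:
    # Projection can collapse several tuples into one output tuple;
    # the output's lineage is the union of all collapsed inputs.
    merged: dict[tuple, frozenset] = {}
    for a in rel:
        key = tuple(a.row[c] for c in cols)
        merged[key] = merged.get(key, frozenset()) | a.lineage
    return [Annotated(k, v) for k, v in merged.items()]


def lineage_of(result: list[Annotated], t: tuple) -> frozenset:
    # The provenance question: which source tuples caused t to appear?
    for a in result:
        if a.row == t:
            return a.lineage
    raise KeyError(t)


# Hypothetical sample data: R(account_id, source), S(account_id, amount).
R = scan("R", [(1, "feedA"), (2, "feedB")])
S = scan("S", [(1, 500), (1, 50), (2, 200), (1, 300)])

# q: SELECT R.source FROM R JOIN S ON R.account_id = S.account_id
#    WHERE S.amount > 100
q = project(
    select(join(R, S, lambda r, s: r[0] == s[0]), lambda row: row[3] > 100),
    [1],
)

print(sorted(lineage_of(q, ("feedA",))))  # ['R:0', 'S:0', 'S:3']
```

Because selection preserves annotations, projection merges them, and joins union them, the output tuple `("feedA",)` is traced back to one `R` tuple and the two `S` tuples that passed the filter; the `S` tuple with amount 50 is excluded. Richer provenance models record more structure than this flattened set, such as which combinations of input tuples together justify an output.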

Item Metadata

Title	A study of provenance in databases and improving the usability of provenance database systems
Creator	AlOmeir, Omar
Publisher	University of British Columbia
Date Issued	2015
Genre	Thesis/Dissertation
Type	Text
Language	eng
Date Available	2015-12-16
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivs 2.5 Canada
DOI	10.14288/1.0221370
URI	http://hdl.handle.net/2429/55895
Degree	Master of Science - MSc
Program	Computer Science
Affiliation	Science, Faculty of; Computer Science, Department of
Degree Grantor	University of British Columbia
Graduation Date	2016-02
Campus	UBCV
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc-nd/2.5/ca/
Aggregated Source Repository	DSpace
