Managing Data Updates and Transformations: A Study of the What and How

by

Jessica Hei-Man Wong

B.Sc., The University of British Columbia, 2013

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Master of Science in THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES (Computer Science)

The University of British Columbia (Vancouver)

April 2016

© Jessica Hei-Man Wong, 2016

Abstract

Cleaning data (i.e., making sure data contains no errors) can take a large part of a project's lifetime and cost. As dirty data can be introduced into a system through user actions (e.g., accidental rewrite of a value or simply incorrect information), or through the process of data integration, datasets require a constant iterative process of collecting, transforming, storing, and cleaning [20]. In fact, it has been estimated that 80% of a project's development and cost is spent on data cleaning [12].

The research we are undertaking seeks to improve this process for users who are using a centralized database. While expert users may be able to write a script or use a database to help manage, verify, and correct their data, non-computer experts often lack these skills and thus, trawling through a large dataset is no easy feat for them. Non-expert users may lack the skills to effectively find what they need and often may not even be able to efficiently find the starting point of their data exploration task. They may look at a piece of data and be unsure of whether or not this piece of data is worth trusting (i.e., how reliable and accurate is it?).

This thesis focuses on a system that facilitates this data verification and update process to help minimize the amount of effort and time put in to help clean the data. Most of our effort concentrated on building this system and working on the details needed to make it work. The system has a small visualization component designed to help users determine the transformation process that a piece of data has gone through. We want to show users when a piece of data was created along with what changes users have made to it along the way. To evaluate this system, an accuracy test was run on the system to determine if it could successfully manage updates. A user study was run to evaluate the visualization portion of the system.

Preface

The system discussed in this thesis was initially based on the design created in the thesis work done by Arni Mar Thrastarson (2014). Subsequent tweaks to the system design and the resulting implementation were done by the author, J. Wong. No part of this thesis has been published. This thesis is an original intellectual product of the author.

The two datasets used in the experiment section of this thesis (Chapter 5) were obtained from two sources. The GLEI data was obtained from the work of V. L. Lemieux, P. Phillips, H. S. Bajwa, and C. Li. The bird data was obtained from J. Jankowski.

Part of the "Related Work" chapter is from the class project completed for CPSC 504 in collaboration with Laura Cang and Kailang Jiang. Specifically, the "Transformation Languages" and "View Maintenance" sections within Chapter 2 were adapted from the final paper written for CPSC 504.

The user study discussed in Chapter 6 has undergone ethics approval by the UBC Behavioural Research Ethics Board committee. The project title was "Understanding Data Changes" and the approval certificate given to this study has the identifier H16-00331.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgments
1 Introduction
2 Related Work
  2.1 Database Management Systems
    2.1.1 Collaborative Data Sharing Systems
    2.1.2 Examining Data Changes
    2.1.3 Probabilistic Databases
  2.2 Data Provenance
  2.3 Transformation Languages
  2.4 View Maintenance
  2.5 Transformation Visualizations
3 System Introduction
4 System Internals
  4.1 Transformation Script
  4.2 UUIDs
    4.2.1 Dataset UUIDs
    4.2.2 Row and Column UUIDs
  4.3 Database Architecture
  4.4 Data Provenance
  4.5 Trust Values
  4.6 User Actions
  4.7 Requirements for Exporting Data
  4.8 Restrictions on User Actions
    4.8.1 Add a Column
    4.8.2 Remove a Column
    4.8.3 Add a Row
    4.8.4 Remove a Row
    4.8.5 Edit a Value
  4.9 Pushing Data Changes to the Server
  4.10 Data Transformations on Non-locally Stored Data
  4.11 Visualization
  4.12 Generating a Transformation Script from Data Snapshots
5 Experiment
  5.1 Data
    5.1.1 GLEI
    5.1.2 Bird Data
  5.2 Technical Infrastructure
  5.3 Testing for Correctness
6 User Study
  6.1 Participants
  6.2 Methodology
  6.3 Results
    6.3.1 Two Factor Repeated Measures ANOVA
    6.3.2 Trust Differences
    6.3.3 Results Discussion
7 Future Work
  7.1 Intelligently Inferring Where to Add a Column
  7.2 Inferring the Transformation Script
  7.3 Achieving Full Synchronization
  7.4 Scalability
  7.5 Determining a Good Conflict Time Measure
  7.6 Automatically Adjusting Trust Values
  7.7 Visualization Improvements
  7.8 Automatically Pushing Changes
8 Conclusion
Bibliography
A Demographic Questionnaire
B Task Questionnaire

List of Tables

Table 4.1  The data transformations logged to the transformation script as the user makes changes to the data.
Table 4.2  Movie database example 1.
Table 4.3  The columns within a data provenance table.
Table 4.4  Movie database example 2.
Table 4.5  Result for selecting all the information about Pixar along with any movies produced by Pixar which received a user rating of 5/5.
Table 4.6  The list of transformations that would be generated when you add a new row to Table 4.4.
Table 6.1  Basic information about the participants who participated in the user study.
Table 6.2  Averages of the quantitative data collected from the user study.
Table 6.3  Mean and standard deviation for the time to task completion. Although we had 15 participants, N = 9 because not all participants managed to successfully time all their tasks.
Table 6.4  ANOVA table generated when determining if there were significant differences in time to task completion.
Table 6.5  Mean and standard deviation for the degree that participants trusted the data that was shown. Trust was ranked on a scale from 1 to 5 where 1 was very unlikely and 5 was very likely.
Table 6.6  ANOVA table generated when determining possible significant differences in how much user study participants trusted the data they were shown.
Table 6.7  Mean and standard deviation for how easy participants found the tool to use when trying to determine how much to trust the data shown on the screen.
Table 6.8  ANOVA table generated when trying to determine if significant differences existed in how easy it was for user study participants to examine the data.

List of Figures

Figure 3.1  What the user interface would look like once the user has loaded the data to be viewed.
Figure 3.2  The first screen the DBA sees as he/she prepares to get the subset of data needed by the domain expert user.
Figure 4.1  The red box shows the "Record Changes" checkbox that determines whether or not changes will get synced to the database.
Figure 4.2  If the "Record Changes" checkbox is checked, any changes that are made will be recorded in a transformation script that will be used later to push changes to the database. On the right side of this figure, we show the transformation script for changes that a user has made using the user interface on the left side of the figure.
Figure 4.3  What the user UI would look like when the user starts to add a column.
Figure 4.4  The result of performing the add column action shown in Figure 4.3.
Figure 4.5  What the user sees when he tries to add a row. There is no restriction on needing to fill all the columns shown unless a column has been specified as not null.
Figure 4.6  The result of having performed the add row action from Figure 4.5.
Figure 4.7  How the user would sync changes to the database using the user interface.
Figure 4.8  What the user sees if he/she is in the "Year" view and has clicked on a bar to view changes that have occurred on that day.
Figure 4.9  What the user sees if he/she is in the "Month" view and has clicked on a bar to view changes that have occurred on that day.
Figure 4.10  What the user sees if he/she is in the "Day" view and has clicked on a bar to view changes that have occurred on that day.
Figure 5.1  The structure of the GLEI data within the MySQL database.
Figure 5.2  The structure of the bird data from 2011.
Figure 6.1  The timer interface that user study participants use to time task completion.
Figure 7.1  A possible design iteration on the system visualization that tries to show users data characteristics that user study participants have deemed useful.
Figure 7.2  Progression of how a change would start to appear in the proposed new system visualization design.
Figure 7.3  How multiple changes would appear in the new proposed system visualization design.

Acknowledgments

I would like to start by thanking my sister, Jennifer, for her unwavering support at all hours of the day and for her initial encouragement to go into computer science. From being my midnight snack buddy to my coding rubber ducky, this thesis would have been so much harder without her. I would also like to thank my parents for being so supportive and for giving me the courage to pursue what interests me. Even though to this day they may not understand what exactly I spend my time working on, they have never ceased to be encouraging and supportive.

My immense gratitude goes to my supervisor, Rachel Pottinger, for being such an amazing mentor and supervisor to me. She has been a constant source of guidance and encouragement and through her, I've learned so many skills ranging from pumpkin carving to how to do research.
She has given me the freedom to try andparticipate in so many different things and that has made all the difference in howmuch I’ve enjoyed my Masters.I would also like to thank my second reader, Ed Knorr, for encouraging meto try research and for being so supportive throughout every step of my Masters.Without his initial encouragement, I would very likely not have even consideredgrad school.Finally, I would like to thank my friends for all their support. These past twoyears have so special to me with so much laughter and cake–there is nothing aboutthat I would change. Special thanks goes to my lab-mate Omar AlOmeir who hasalways been willing to listen to my ideas and offer suggestions for improvement.xiChapter 1IntroductionData has always been an integral part of everyday life. From whether or not to pur-chase an item to issues like improving student success [29] to business intelligenceapplications that help companies make million dollar decisions [19], data is theessential ingredient needed to drive what happens. Data driven decision making isthe common theme that reappears throughout many different situations and its im-portance cannot be underestimated. The everyday user now has more data availableat their fingertips than ever before. From data sources published by local govern-ments (e.g., DataBC) to genome data published by various research institutions, anastronomical amount of information is now being stored and made available.While the public seems to have a voracious appetite for analyzing and digestinglarge datasets (as evidenced through the rapidly increasing popularity of websitessuch as reddit.com’s data visualization subreddit or tools such as Tableau or thelarge number of page hits given to Wikileaks), a large portion of the general publiclacks the skills needed to traverse, verify, and/or clean large datasets effectively.While technologically savvy users can handle writing small scripts or programsto clean data down to what they need, there is a significant portion of users whocannot effectively digest the information they are given.On top of the data navigation problem, non-technical users may be frustrated bythe problem of dirty data (i.e., data that is incorrect). If the data traversal problem isconsidered a slight hinderance to users, the dirty data problem can cause a completestandstill in the progress of a project. Before any meaningful conclusions can1be drawn, the data must first be deemed trustworthy and accurate to the best ofthe user’s knowledge. This constraint holds true for all users regardless of theirtechnical expertise. Determining whether or not to trust certain pieces of dataoften poses a problem. Although one could manually factcheck data, it is basicallyimpossible to do so for large datasets. For a group of individuals who share acommon dataset (e.g., a research lab), it is mutually beneficial for everyone toshare the fruits of their laborious data cleaning process. It would cut down on theamount of work needed per person and everyone could benefit from the cleaneddata.Data cleaning is a problem that appears in many different situations rangingfrom a traditional single database to many different data sources. It is not a problemthat can be ignored nor is it a problem that is resolved after one intense round ofediting and cleaning. While there are algorithms that can detect data abnormalities,user judgement is often needed to determine what is the correct value given theconditions of the data and parameters of the question. 
Thus, we need to have someway to help these users manage their data updates and transformations.In this thesis, we assume that there is a central repository where data is stored,and that many users will make local copies of this data. Our goal is to help bothsets of data stay synchronized and up to date. Any missing values in a row aretreated as purposefully missing (i.e., the values are treated as if they have not beenentered on purpose). We want to help non-technical users understand a data point’sevolution from the time of its introduction to the dataset to the current moment ofexamination. In this work, we make the assumption that the domain expert is adifferent person from the database administrator (DBA) and that the domain ex-pert only works with a subset of the dataset that is stored on his/her local machine.The domain expert is responsible for making changes to the dataset given his/herknowledge and the DBA will be responsible for technical aspects of data mainte-nance.This thesis builds on the theoretical framework created by Arni Thrastarsonin his M.Sc thesis [40] and focuses on implementing a system that helps usersmanage their data updates and transformations. Although we started with a the-oretical framework, a lot of the practical implications of what it meant to trulysupport update propagation developed slowly over time. While there are compa-2rable data management systems such as Trio [1] and Orchestra [18] which alsoaddress the problem of multiple individuals collaborating, both systems do so un-der a different set of circumstances than we do. Orchestra works with a system ofdistributed databases and tackles managing data updates and transformations fromthat viewpoint, while Trio focuses on situations where the data currently stored inthe database may or may not be accurate (i.e., users are not certain that the datathey see in the database is reliable to work with). There currently is no system thatlooks at the problem of collaborative data improvement with the intent of helpingusers work with their data in a centralized data environment.A large portion of the work in this thesis focuses on how to help users un-derstand and work with their data, based upon the system design decisions thatwe made after examining similar problems and the relevant solutions in other datamanagement systems. When it came time to figure out how to best help usersunderstand their data changes (both with the visualization (Section 4.11) and thetransformation script (Section 4.1)), many logical data data transformation frame-works and systems like Wrangler [24] were examined and compared against eachother to create a transformation script that would be useful to users. Other contri-butions include a discussion about the various challenges encountered and expla-nations about the design decisions made during the implementation of the system.The resulting system was also tested for correctness (Chapter 5) and a user studywas run (Chapter 6) to examine how users reacted to using the information pro-vided by the system in their data analysis tasks.The four particular data usage scenarios that this thesis focuses on were chosenas representative use cases that would be encountered by typical users. These fourscenarios were chosen partly because of the work done in [40] but also because wefelt that these were scenarios that show up in everyday life. 
For example, a largepart of this thesis deals with bird data obtained from one of the UBC Zoology labs(Section 5.1.2) and during the process of understanding the bird data, we came tothe conclusion these four scenarios do indeed appear within the zoology lab’s datacollection workflow. The four scenarios are as follows:1. When changes are made locally (i.e., to a copy) on the user’s machine. Inthe context of the bird data, this happens quite often when teams go out into3the jungle to capture and measure birds. As the middle of the jungle does nottend to have readily available Internet, teams often collect lots of data thatis stored locally on their machines until they get the opportunity to send theinformation back to UBC.2. When changes are made on the central database and need to be propagatedto the user. With any team that is working collaboratively, changes often oc-cur at many different times of the day. In the case of software development,this may take the form of code commits to Git or another repository whereteam member A has made some changes to the code that team member Bhas to download and retrieve.3. When changes are made on the database and the data the user is viewing hasto be updated. For time critical data (e.g., stock price changes over the courseof a day or elections data as results roll in from across the country), it makesmore sense to view data directly from a server as opposed to downloadinga copy. In these cases, it makes more sense for users to view data directlyfrom a source and store any work they are basing on this data locally. Forexample, a journalist reporting on elections night may aggregate statisticsabout something not commonly reported on. Election results can changerapidly as each electoral district reports its results; so, it does not make sensefor the journalist to continually download a new copy of the results everyfive minutes or so. The notes that the journalist are making do not need tobe pushed to the database as it has nothing to do with actual election results;therefore, it is reasonable for her to simply store the changes on her localmachine. Hence, you can sometimes have situations where you are viewingdata directly from a database and making local changes on your machine.4. When generating a list of changes by comparing two different versions ofa dataset. This need may arise during times when users end up with twoversions of a document that they need to determine differences between. Anexample of this would be when a group of students are working together ona term paper but have decided to email each other copies of the term paper asthey work together on it. Two or more team members may work on the doc-4ument at the same time and thus the team may end up with multiple copiesof the paper that are all slightly different from each other. To determinewhich copy of the paper to hand in, the team will have to determine whathas changed from one copy to another. They could use Microsoft Word’s“Compare” feature with two versions of their document to come up with thechanges. Microsoft Word only works with a certain type of data. 
In thiscase, the team could use Microsoft Word to help but in the case of data in anon-Microsoft Word format, we will need to come up with a list of changesto the data in some other way.In this context, this thesis makes the following specific contributions:• Examines issues related to managing data updates in a centralized data envi-ronment• Identifies restrictions that a centralized database environment would have toadhere to in order to manage data updates• Determines the metadata necessary for providing enough information to han-dle managing data updates for multiple individuals• Creates a data transformation language to represent changes without beingspecific to any query language.• Implements a system to manage data updates using a centralized databaseenvironment• Determines the metadata that both expert and non-expert users would needto understand and trust dataThe remainder of this thesis is organized as follows:• Chapter 2 discusses related work.• Chapter 3 introduces and motivates the system that was built to manage dataupdates.• Chapter 4 examines the various design decisions that had to be made in orderto create the system.5• Chapter 5 discusses how system correctness was determined.• Chapter 6 describes the user study that was conducted to determine the use-fulness of the visualization (Section 4.11).• Chapter 7 discusses future directions of this work.• Chapter 8 concludes the thesis with a brief summary of what was done andthe impact it has.6Chapter 2Related WorkThis thesis examines the problem of how to help multiple users, whether they beexpert or non-expert computer users, to manage their data updates and transforma-tions. We first examine other data management systems that try to address similarquestions that we are examining in this thesis. After this discussion, we examinedata provenance in the literature since a large part of managing data updates andtransformations involve determining what type of provenance to store, and how tobest manage it. The next section will look at transformation languages as they arecentral to how we represent data updates and transformations in this thesis. We willalso briefly look at view maintenance as it is one of the central problems we facewhen trying to deal with users who have local copies of data and have not pullednew changes from the database. We conclude this section by examining other toolsthat have tried to tackle the problem of data transformation visualization. In par-ticular, we try to help users understand their data by showing them a visualizationabout the changes their data has gone through.2.1 Database Management SystemsA database management system (DBMS) is a set of programs designed to helpusers organize and manage their data so that operations such as querying and up-dating can be done correctly, efficiently, and concurrently with other users. In thisthesis, we develop a DBMS that helps users manage updates and transformations to7their data when one or more users work from data stored in a centralized relationaldatabase. There have been various DBMSs developed to help tackle the problemof helping people use data. However, these systems tackle the problem from adistributed databases standpoint (Section 2.1.1) or from a point of view where allchanges should be tracked and explained (Section 2.1.2). 
This thesis brings thesetwo types of DBMSs together to so that users can manage their updates and exam-ine their data using data provenance.2.1.1 Collaborative Data Sharing SystemsA collaborative data sharing system (CDSS) is a system that connects a set of dis-jointed databases, each with its own schema and local data, to allow users to sharedata [18]. The main idea behind a CDSS is to allow users to share data loosely (i.e.,a database can have data which differs from another database) [17]. Each databasein the CDSS model has its own schema and uses schema mappings, a method ofspecifying which table a column is from, to relate its own data to other databases.In a CDSS, the database stores its own schema mappings to allow for data updatepropagation to occur. In our work, schema mappings are stored on the client sidewhich presents a different set of problems as the amount of schema mapping infor-mation can be more limited. Whereas schema mappings in a database have accessto virtually all the data in the CDSS, schema mappings that are exported from thedatabase and stored on the client side do not have access.Orchestra [23] is the main CDSS that we will examine. Although Orchestra hasbeen designed for a distributed system, it also focuses on trying to lower the barri-ers standing in the way of data sharing and collaboration between multiple users.Many of the concerns raised by Orchestra are analogous to issues we face in thisthesis. However, as Orchestra focuses on data update management in a distributedsystem and we focus on data sharing in a centralized database environment, thereare differences in the main concerns that we each face. For example, Orchestrahas a focus on ensuring that data is not lost through hardware failure and there-fore replicates/stripes the data in a way to allow for data recovery in the case ofhardware failure [22]. While data loss through hardware failure is a concern thatmay afflict our system, this thesis does not examine this issue in depth and has8the underlying assumption that database backups will be sufficient to address thisconcern for the time being.The way that we approach data reconciliation in times of conflicting data up-dates is also different. As mentioned before, CDSSs can allow for databases tohold different instances of the data. While data updates that do not conflict arepropagated across databases, data updates that do conflict are held locally [17]. Or-chestra has a concept of trust which translates to setting rules to determine whichindividuals/institutions are trusted over another. These rules are used when thereare data conflicts to determine which change gets stored locally. In our system, wealso use the concept of trust values to try to resolve data conflicts.2.1.2 Examining Data ChangesSome DBMSs have taken the approach of helping data become more useful byleveraging the lineage information data (i.e., the complete history of an item).While some DBMSs have taken the approach that we have where data provenanceis stored as metadata [30], others have decided to use annotations to help users set-tle disagreements about data [10, 14]. In this thesis, we are also concerned aboutstoring lineage information as we want users to be able to reflect upon what hashappened to their data over time. 
As such, we have observed the different methodsin which other systems have approached tracking provenance and used those sys-tems to inform our own decisions about how to track provenance, and to identifythe pros/cons of our decisions.Provenance-aware storage systems (PASSs) are systems that try to store prove-nance and lineage information as metadata [30, 35]. While PASSs focus on track-ing provenance at the file system level and even though we are working in a central-ized relational environment, we can still learn from the motivation and automatedprocesses that a PASS uses to track provenance. In PASS, provenance informa-tion is generated automatically. We examine the kind of information stored abouteach provenance instance (a unique identifier for the object, the complete set of allinputs, and a description of the hardware platform the change occurred on) to ver-ify that our minimalist approach to storing transformation provenance information(Section 4.1) is sufficient.9We also briefly examine the idea of approaching provenance by having usersexplain their changes similar to the approach taken by DBnotes [10] and a beliefdatabase [14]. DBnotes takes the approach of having notes attached to each dataupdate/transformation and then continually propagates those notes as the piece ofdata in question gets further transformed. A belief database is a database that al-lows users to explain data conflicts to each other (e.g., if user A disagrees with userB on the value of something, user A can attach a note in the database explainingwhy she thinks the value is wrong). These systems provide a good starting pointfor thinking about how to best help users examine the provenance of their data.However, as our main focus was on managing data updates and not on provenance,belief annotations and explanations about data transformations were not incorpo-rated into the system.2.1.3 Probabilistic DatabasesThere is a subset of DBMSs that deals with situations where we do not know theexact value of the data inside a database. Such DMBSs are called probabilisticdatabases. The main difference between probabilistic databases and the traditionaldatabases that most people are familiar with is that probabilistic databases containdata that may be true as opposed to traditional databases where we know the ex-act value of every piece of data. Probabilistic databases can occur in situationslike when experimental values from multiple iterations of an experiment may bereplaced by an aggregated statistic. If that statistic had been stored in a traditionaldatabase, it may be unclear whether or not the statistic is exact data or derived data.Trio [1] is a probabilistic DBMS that manages data, the accuracy of data, andlineage information. Trio approaches lineage and accuracy from a viewpoint ofwhen data may be inexact or may have missing values. In our system, we workwith a traditional relational database and as such, assume that our values are exact(or in cases where there are data conflicts, we have conflict resolution policies).However, we drew upon some of the ideas in Trio when we designed our data con-flict resolution policies. Specifically, we drew upon the way Trio used confidencevalues (similar to Orchestra (Section 2.1.1) to handle data conflicts. 
This showedus that the idea of confidence values could be generalized to many different sce-10narios and probably did not need any extra modifications for use in the researchcontext for this thesis.2.2 Data ProvenanceData provenance refers to the idea of being able to track the lineage of data, namelywhere the data has come from and what has happened to it since. There are fourtypes of data provenance: lineage, why, where, and how. Lineage provenance looksat trying to identify the base items that have contributed to a query result or view[11]. Why provenance answers the question of which tuples are required in order toderive a query result; it tells us what tuples are required in order for the answers tothe query to be what they are [8, 15]. Where provenance tells us the location that aspecific piece of data has come from [8, 15]. How provenance answers the questionof how a piece of data became what it currently is; it captures the transformationsapplied to a piece of data.In this thesis, we are particularly interested in how provenance. We want helpnon-computer experts determine the how provenance of a piece of data in orderto determine if they would trust the data enough to use. To support this goal,we decided to use an eager approach of tracking provenance [5, 38] which meantthat we tracked provenance as data was added/modified rather than inferring andcalculating provenance at a later point.2.3 Transformation LanguagesA transformation language is a function that takes some data as input, and outputsthe data in a different form. In the case of this thesis, the transformation languagewe focus on takes as input, a data transformation (i.e., an action like editing a valueor adding a row), and outputs the data transformation in a short succinct way thatusers can understand.Most of the work in the data transformation language domain seems to be clas-sified into one of two categories. Either the work focuses on trying to help usersvisualize the data transformations that have occurred [24, 25, 32] (Section 2.5) orit focuses on modifying existing transformation languages to help with the datatransformation process [16, 26]. Due to the lack of literature about the specifics11of data transformation languages, we decided to examine data transformation lan-guage frameworks. There was minimal literature on using logical programmingconcepts to create a data transformation language [7]. The idea was that people finddeclarative languages like logical programming languages easier to understand asthey focus on the relationships between the data models rather than how to locatea certain piece of information to work with [7, 39]. There has been research ondeclarative data transformation languages for different data models, highlightingthe flexibility requirements in order to keep current with today’s less structureddata [7, 27, 39]. Therefore, we decided to try to model our data transformationlanguage accordingly.Aside from determining how to represent changes, we also had to identifywhich changes we should be supporting. 
From a literature search [13, 24, 25, 31,32], we determined that our data transformation language had to focus on logicaltransformations rather than physical schema changes; so, in the end, we decided tosupport map and reshape changes (Section 4.1) [24].2.4 View MaintenanceIn the first data management scenario (Chapter 1), a user has a local copy of datarelevant to them and is oblivious to any changes that may occur on the databaseside, which is very analogous to the view maintenance problem in data warehous-ing environments. We considered the Eager Compensating Algorithm (ECA) [42]which described triggering events to update the warehouse or the application’s ma-terialized view. While this approach may be helpful in correcting for inconsisten-cies arising from decoupling the data source and application, it requires quite a bitof overhead to regularly connect, and it creates the opportunity for anomalies (con-sistency issues that arise when updating views while base data is being changed)[42].We implemented an application that takes a mixed-initiative approach to up-dating the remote data source making it important for us to be aware of how oftento push any possible schema changes. This seems to be a comparable problem tohow materialized views need to push their changes to a remote database [2, 3, 41].Through looking at how views and indices are maintained in a large distributed12database [2], we noted that the model suggested in the papers validated many ofthe design decisions we made for our transformation language (Section 4.1). Forexample, the discussion on the different ways to push updates and how to handleupdate discrepancies (source versus application) [2] is reflected in our implemen-tation.2.5 Transformation VisualizationsAs mentioned in Section 2.3, a large body of work has focused on visualizing thedata cleaning and manipulation process [24, 32]. While manipulating the data, auser who wants to use a data point may not always be the one who has modifiedit. For example, if I were a molecular biologist studying a particular protein, theinformation I look up in an open source database was probably not derived orcleaned by me. It is more likely that someone else examined the data and cleanedit prior to putting it in the open source database. In this case, having tools to helpme visualize the process of data cleaning does not help me to understand whetheror not I should use this. After all, I am unfamiliar with the person who has collectedthis data and I am not sure if any other individuals have modified the data after theinitial researcher entered the information. In situations like these, I would need atool to help me understand what has happened to the data over time.Part of the motivation for this thesis was to help non-experts understand thechanges their data has undergone in order to help them to determine whether theyshould use the data. We wanted to keep the visualization and interaction as simpleas possible in order to prevent information overload. As such, we examined differ-ent data visualization tools, specifically Wrangler [24] and Potter’s Wheel [32] toobserve how data is displayed and to see if there were pieces of metadata that werecommon to all visualizations.Wrangler is a visualization tool that displays data transformations to the userthrough a combination of a spreadsheet interface and a list of changes that tookplace on the data. It also uses colour to show changes. 
The characteristics of Wran-gler that we particularly enjoyed were the use of colour to highlight changes andthe way changes were explicitly listed for the user. Those were the characteristicswe later tried to incorporate in our own visualization.13Potter’s Wheel is a data cleaning framework focused on helping users performdata transformations and detect discrepancies through a spreadsheet interface. Itfocuses on the interaction and usability of the spreadsheet interface, and not asmuch on the visualization of the data. While we did not get a lot of informationabout actual data visualization through this tool, identifying attributes that theyfocused on in their graphical interface (e.g., interface usability) was important toour design process.14Chapter 3System IntroductionData can come in many different formats and workflows. For example, Wikipediaarticles are a source of data and for the unprotected articles, its data workflow isvery free flowing. Anyone can edit the article and the changes do not go throughany approval process before they are seen by the rest of the world. In other cases,such as a curated scientific database, it may be the case that a designated individualhas to approve any changes made to the data before the changes will be seen by therest of the world.As mentioned in Chapter 1, this thesis focuses on trying to help individuals usetheir data. One of the driving goals behind the design of this system is to allowusers to collaboratively improve the quality of their data. In order to accomplishthis goal, we move towards a model where users do not need an administrator ordesignated editor to approve the changes that they make. This was due to the factthat as the number of users looking at the data increases, the number of changesthey will request will likely also increase and the bottleneck of waiting for changesto become approved and go live would be too long. In order to generalize thesystem to all users, both expert and non-expert, we make an assumption that theDBA is a separate person from the domain expert. In this system, the DBA isthe person who controls the flow of information from the database to the domainexpert, and the domain expert manages the content and accuracy of the information.Making this assumption will help us generalize to more data usage situations as wedo not require users to have any other skills other than their domain knowledge.15Users can, of course, have knowledge of tools like R, MATLAB, and Python; but,by aiming for the lowest common denominator of technical knowledge, we canhelp make the system more applicable to a range of users.In this thesis, we make the following assumptions:• There is a centralized relational repository.• The DBA and the user of the data are two separate people.• The user of the data has made a copy of the data and stored it on their localmachine. Any application that uses the data (e.g., Microsoft Excel) is alsocompatible with relational databases.• There are no constraints or views.As mentioned in the Introduction (Chapter 1), this thesis looks at managingdata updates in four different scenarios:1. When changes are made locally on the user’s machine2. When changes are made on the database and need to be propagated to theuser3. When changes are not stored locally but the data has to be updated4. 
When generating a list of changes by comparing two different versions of adatasetIn order to accommodate the first three scenarios, two different user interfaceswere created: one for the user (who is a domain expert but may lack technicalcompetence) (Figure 3.1), and one for the DBA (who is not a domain expert) (seeFigure 3.2). The system was implemented from the point of view of both thedomain expert and the DBA in order to determine the metadata requirements sothat the system has the ability to support data updates from the first three scenarios.16Figure 3.1: What the user interface would look like once the user has load-edthe data to be viewed.17Figure 3.2: The first screen the DBA sees as he/she prepares to get the subsetof data needed by the domain expert user.18Chapter 4System Internals4.1 Transformation ScriptTo represent what has happened to a dataset (and thus propagate and merge thecorrect changes to the centralized database), changes needed to be recorded ina standardized manner. The transformation script is a way to standardize thosechanges (see Table 4.1 for the template of all the data transformations that we con-sider in this thesis). The items inside the square brackets in the data transformationtemplates represent possible values that could appear in an actual transformationscript. Each user action is accompanied by a timestamp that represents when achange occurred. This timestamp is to help identify which changes should takeprecedent over another. In this system, when two or more changes happen to thesame cell and are pushed to the database at approximately the same time, the sys-tem first examines the trust value (see Section 4.5) of the individuals involved withthe changes. If one individual is more trusted than another, the changes from themore trusted individual will be chosen. However, if the individuals involved inthe changes all have the same trust value, then timestamp will be the determiningfactor of which change gets accepted into the database. The change with the latertimestamp will be the one accepted by the database.When designing the transformation script, it was important to keep it readablefor non-technologically savvy users. While we do not expect users examining thelist of changes as a common use case, we do think that occasionally users will19want to see what changes they have previously made to their data–especially ifthe user has been working on the data over a period of time. For that reason, twoof the criteria under consideration during the transformation script design werereadability and conciseness.The timestamp in each of the transformation scripts determines the chrono-logical order of data transformations by using time-aware schemas [34] to helpfacilitate data integration.Although brevity was a high priority when designing this transformation script,there were times when this design concern was shelved in order to lower the num-ber of database calls required. For example, the edit value transformation scriptstores the previous value of the cell to help reduce the number of database calls re-quired when trying to determine possible edit conflicts. If user A and user B haveboth edited the same cell within two hours of each other (Section 4.9), but user Ahas pushed her changes before user B, user A should have no problem when up-dating the cell. However, when user B pushes, there should be a conflict warning.In order to successfully apply the change, we have to determine what value we arechanging a cell from. 
There are two possible scenarios that could have occurredwhen this has happened. The first scenario is when both user A and user B aretrying to change the same value (e.g., user A and user B have both tried to changea cell that used to hold a value of 3). The second scenario is where user A hasmade a change to the cell (e.g., edited the cell to have a value of 5), and user B hasobtained a copy of the data after that point and made changes (e.g., has edited thecell from 5 to 7). In the first scenario, there should be a conflict warning as users Aand B are trying to overwrite the same value in the cell. Conversely, in the secondscenario, we should accept that change without any conflicts even though user Aand user B have made changes within two hours of each other.If we did not store the previous value of a cell in the edit value transformationscript, we would first have to go to the AppData table (Section 4.3) to obtain thelast sync time of user B’s dataset. Once we have the sync time, we would thenhave to go into the table that stores the provenance data of the cell in question todetermine all the changes that have happened to it to see if there was any changethat occurred after user B’s last sync time. Using this method, we can come to theconclusion that user B has a conflict; but, a faster heuristic to avoid all the database20calls is to store and use the previous value of the cell.User Action Data Transformation SyntaxAdd Row addRow(rowUUID:[c360b701-f91b-4920-937e-4afdfbebe194]), timestampRemove Row removeRow(rowUUID:[dca301c7-2a25-40f4-8212-b1b495c0d4dc]), timestampAdd Column addCol(colUUID:[22ce55c9-5af3-48a1-ba24-b160961fa048], columnName), timestampRemove Column removeCol(colUUID:[22479ac1-627c-44d6-a14b-ce273bba70af]), timestampEdit Value editValue(rowUUID:[fd8f3dd2-76f1-43c6-8930-c4831997bf49] , colUUID:[fa5d24c7-39a8-43f5-ad07-75a3dd8eda02], previousValue:[cat],updatedValue:[hat]), timestamp)Table 4.1: The data transformations logged to the transformation script as theuser makes changes to the data.4.2 UUIDsThe Java UUID class can generate unique 128-bit values and were used to cre-ate unique keys to identify rows and columns. To lessen the burden on non-technologically savvy users, data that is imported into the system will have theUUIDs added; the user bears no responsibility for ensuring that their data has thecorrect UUID values before database import. By using UUIDs, the maximum num-ber of arguments the transformation script would need to store would be four: therow UUID, the column UUID, the new value of an edited cell, and the previousvalue of the edited cell. All the other user actions listed in Table 4.1 would requirefewer arguments.214.2.1 Dataset UUIDsAs it is plausible that a user could request multiple subsets of data (e.g., if a grad-uate student was working on multiple projects, presumably he/she would require aseparate set of data for each project), each dataset generated for a user has its ownunique UUID. This UUID is tracked in the AppData table (see Section 4.3).4.2.2 Row and Column UUIDsWhen designing the transformation script, a key problem was that it was hard tocorrectly identify a particular row or cell in all the different user actions. In adatabase, rows and cells are identified by primary keys; primary key(s) can un-doubtedly point to only one option. 
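To make the conflict rules from Section 4.1 concrete, the following is a minimal Java sketch of how a pushed editValue entry could be checked against a cell's current state using the stored previous value, with trust values and then timestamps breaking ties. The EditValueEntry and CurrentCell classes, their field names, and the handling of edits that fall outside the two-hour window are illustrative assumptions, not the system's actual implementation.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical record of one editValue line from the transformation script. */
class EditValueEntry {
    String rowUuid, colUuid, previousValue, updatedValue;
    Instant timestamp;
    int trustValue; // trust value the DBA assigned to the pushing application
}

/** Hypothetical snapshot of the cell as currently stored in the database. */
class CurrentCell {
    String value;
    Instant lastModified;
    int lastWriterTrust;
}

class ConflictChecker {
    /** Conflict window from Section 4.9: edits to the same cell within two hours. */
    static final Duration CONFLICT_WINDOW = Duration.ofHours(2);

    /** Returns true if the pushed edit should overwrite the stored value. */
    static boolean accept(EditValueEntry edit, CurrentCell cell) {
        // Second scenario: the pusher edited the value currently in the database,
        // so this is a plain update and no conflict warning is needed.
        if (cell.value.equals(edit.previousValue)) {
            return true;
        }
        // First scenario: someone else already overwrote the same old value.
        boolean withinWindow =
            Duration.between(cell.lastModified, edit.timestamp).abs()
                    .compareTo(CONFLICT_WINDOW) <= 0;
        if (!withinWindow) {
            // Assumption: outside the two-hour window the later edit simply supersedes.
            return true;
        }
        // Conflict: prefer the more trusted application, then the later timestamp.
        if (edit.trustValue != cell.lastWriterTrust) {
            return edit.trustValue > cell.lastWriterTrust;
        }
        return edit.timestamp.isAfter(cell.lastModified);
    }
}
```

Under these rules, the second scenario is accepted immediately because the stored previous value matches the database value, while the first scenario falls through to the trust-value and timestamp comparison.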
However, in a situation where the user doesnot have direct access to a database and relies on a transformation script to expressany possible changes, actions like edit value could have a catastrophic effect onthe dataset as any change to a primary key cell would cause changes to happen tothe wrong cell when pushing changes to the database. For example, suppose wehad a Movie database where the primary key is a single column, MovieName (seeTable 4.2). If we did not use row and column UUIDs and a user working on hislocal dataset decides to edit the value in the MovieName column, say from De-spicable Me 2 to Inside Out, it would be impossible to identify the correct row tochange when the change is pushed. The transformation script would record some-thing along the lines of “editValue(colName: MovieName, value:‘Inside Out’)”.We would not be able to identify that the cell being changed is the one with “De-spicable Me 2” as a value for MovieName. An initial solution to this problem wasto include the original value of the cell in the transformation script along with thenew value but this approach does not scale when it comes to considering otheruser actions. For an action like delete row, if the row being deleted has many pri-mary key columns (i.e., because it is a composite key), the resulting transformationscript would be long and cumbersome to read. Hence, the idea of using UUIDs touniquely identify each row and column was introduced. Since the UUIDs are onlyused to keep track of cells and rows, the user application hides the UUID columnswhen data is loaded into the UI to prevent users from being confused.If the user ever requests data that comes from two or more tables, the UUIDs22of the rows in the resulting dataset are not changed. The UUIDs will be appendedto each other based on the alphanumerical order of their respective tables.MovieName MovieStudio CriticRating UserRatingDespicable Me 2 Illumination Entertainment 5/5 5/5Up Pixar 5/5 5/5Kung Fu Panda 2 Dreamworks 4/5 5/5Ratatouille Pixar 3/5 4/5Table 4.2: Movie database example 1.4.3 Database ArchitectureIn order to keep track of the various UUIDs used in the system (both row UUIDsand column UUIDs will need to be tracked), a few extra tables must be included inthe database. These tables include:• UuidColMapping: a table that tracks the UUID of each and every column inthe tables that store the dataset. There are four columns in this table: UUID,TableName, ColumnName, and PrimaryOrNot. PrimaryOrNot is a columnthat stores a boolean value that determines whether or not the column inquestion is a primary key column.• AppTableData: a table that tracks which dataset has its data.• AppData: a table that stores the name of each application that has taken datafrom it, when the application was last synced, what trust value was assignedto that application, and what dataset the application has (each dataset has itsown unique ID) (see Section 4.2.1).• LockData: a table that tells the database which applications are currentlytrying to push changes to it. If ten minutes have passed and the database stillhasn’t been updated, the entry will be deleted and other apps can try to pushchanges. The purpose of this table is to prevent multiple applications fromtrying to concurrently change the same tables. 
Ten minutes was chosen as await time because we assume that users are using somewhat current pieces23of hardware and if an update has not finished in ten minutes, we assume thateither something has gone wrong and needs to be terminated or the updateswill not finish even if we allow the application to take extra time.4.4 Data ProvenanceA big concern when designing how and when to capture data provenance was howto minimize the amount of space taken up by the provenance data and still pro-vide enough information to recapture the data provenance. We decided to create aHistory table (as seen in Figure 5.1 and Figure 5.2) for each data table within theschema. There was a definite trade off between the minimal amount of data wecould store and the number of database calls that would be required to fetch all theinformation required for the visualization (Section 4.11). We decided to err on theside of minimizing the number of database calls as database calls would quicklybecome a performance bottleneck, even with relatively few changes made to thedata.The data provenance table (Table 4.3) stores seven things: the row and columnUUID to correctly identify which piece of data was being modified, the timestampof modification to help chronologically sort data transformations, the value thedata was changed to, the previous data value (to save having to re-query the historytable to determine if a change would cause a possible data conflict), the action takenduring that data altering step (i.e., was it an “AddRow” action or a “DeleteRow”,etc.), and the trust value of the application that changed the data to help futureconflict resolutions. We did not want to use one central table to store all provenanceinformation as we felt that using this method would create one cumbersome tablethat would have poor performance as provenance queries were executed using it.4.5 Trust ValuesIn this system, trust values are assigned to the different users working on the data.The premise behind this is the idea that some of the users (e.g., researchers) aremore authoritative figures when it comes to the accuracy of data as compared toother, possibly more causal, users (e.g., first year students). Thus, the group ofusers who lean towards the domain expert side of things are given a higher trust24Columns in a Provenance TableRowUUIDColUUIDNewValueOldValueActionTakenTimestampTrustValueTable 4.3: The columns within a data provenance table.value than other users. Given a change from each of these user groups (researcherand student), the probability of the researcher pushing a correct change is higherand as such, preference is given to changes from the domain expert group. Trustvalues are assigned by the DBA as the data is exported from the system.4.6 User ActionsAny data changes that can occur fall into one of eight types of changes: map,lookups and joins, reshape, positional, sorting, aggregation, key generation, andschema transformation [24]. When deciding on which data changes to support inthis system, it was decided that only map and reshape should be supported as thesetwo actions were deemed to be the most basic out of the eight types of changes.Before changes in the other categories could occur, map and reshape changes hadto be supported. Therefore, we started with these two classes of changes.Map changes deal with “[transforming] one input data row to zero, one, ormultiple output rows” [24]. 
This would include changes such as deleting a row.Reshape changes deal with schema changes such as adding or removing a column.If we had decided to add an extra column to denote the movie release date in Table4.2, that would be considered a reshape change. The scope of this thesis does notdeal with changing the data type of a column.It is also important to note that as users are performing actions, they will havethe option of choosing whether or not that option should sync to the database. Asusers may perform changes that are only meant to help themselves manage andnavigate the data, not all the changes should be automatically synced. Changes25Figure 4.1: The red box shows the “Record Changes” checkbox that deter-mines whether or not changes will get synced to the database.that occur while the “Record Changes” checkbox located on the top of the userview is checked (Figure 4.1) will be synced to the database at a later time.4.7 Requirements for Exporting DataDespite what columns a user can request from the DBA, two requirements will ap-ply to the exported data. The first requirement is that for every table that the userrequests a column from, the primary key column(s) of that table must be included.This is to prevent issues that may result from missing primary key values whenthe user tries to push a change that involves adding a row (see Section 4.8.3). Thesecond requirement is that for every “NOT NULL” column that exists in the ta-ble(s) that the user requests column(s) from, the “NOT NULL” columns will haveto be included in the exported data. This ensures that AddRow changes will besuccessful.Each dataset that is exported has a unique UUID associated with it. This wasdone for instances where a user could have two or more datasets in use concur-rently. The UUID of a dataset associates a group of columns from that dataset withthe user’s application name to track who has received the data. When the DBA26Figure 4.2: If the “Record Changes” checkbox is checked, any changes thatare made will be recorded in a transformation script that will be usedlater to push changes to the database. On the right side of this figure, weshow the transformation script for changes that a user has made usingthe user interface on the left side of the figure.exports data to the user, the user actually gets two files. One file contains the user’sdata while the other file contains the schema mapping information of the data thatthe user has received. The schema mapping file contains the dataset UUID alongwith the UUID information of the columns in the dataset (i.e., the column UUID,the name of the table the column belongs to, the name of the column, whether ornot the column is a primary key, and the data type of the column).4.8 Restrictions on User Actions4.8.1 Add a ColumnUsers cannot add a primary key column (i.e., the user cannot specify that the newcolumn is unique and not null) since we cannot retroactively ensure that the pre-vious rows existing in the dataset being viewed are filled out. For example, if theuser only fetches a subset of data from two different tables, adding in a primary27key column would be impossible as the user would have to ensure that data rowshe/she is not viewing would also have unique and non-null values. In our movieexample, say a user wishes to see all the information about Pixar along with anymovies Pixar has produced that has a user rating of 5/5. 
By taking all the columns in Table 4.4 for the Pixar row and the MovieName column from Table 4.2 where the user rating is 5/5 and the movie studio is Pixar, we get one row as a result (see Table 4.5). If we wanted to add a primary key column to Table 4.5, two issues would arise. The first is the question of which table the new column should belong to. In the movie example, we would need to determine whether the new column should go into Table 4.2 or Table 4.4. The second issue is that, regardless of which table the column ends up being added to, there are rows in that table that need values. In the case of Table 4.2, all the other movies (i.e., Despicable Me 2, Kung Fu Panda 2, and Ratatouille) would need values for that new unique and not null column. Likewise, any movie studio in Table 4.4 that is not Pixar would need a value for the column.

For a small dataset, especially for our toy movie dataset example, it could be argued that a user could enter the rest of the values without much of a hindrance. However, for any dataset beyond a small number of rows, this would become burdensome and impractical. Thus, the restriction on adding primary key columns was created.

MovieStudio                    YearLaunched  HQLocation
Pixar                          1986          Emeryville, California
Walt Disney Animation Studios  1923          Burbank, California
Illumination Entertainment     2007          Santa Monica, California
20th Century Fox Animation     1997          Los Angeles, California
Dreamworks Animation           2004          Glendale, California

Table 4.4: Movie database example 2.

MovieStudio  Year Launched  HQ Location             MovieName  User Rating
Pixar        1986           Emeryville, California  Up         5/5

Table 4.5: Result for selecting all the information about Pixar along with any movies produced by Pixar which received a user rating of 5/5.

Figure 4.3: What the user UI would look like when the user starts to add a column.

4.8.2 Remove a Column

A primary key column cannot be removed by the user if the "Sync" checkbox is checked. This is to protect the dataset from accidental mass deletions. As deleting a primary key in a table may drastically influence the table and any dependencies (i.e., foreign key relationships), it would be safer for all parties using the data if any primary key deletions were done by the DBA to preserve the other tables in the database schema.

Figure 4.4: The result of performing the add column action shown in Figure 4.3.

4.8.3 Add a Row

When adding a new row, there is no requirement that all the columns in the application view have to be filled out. As long as the primary key is filled out, the user can successfully add a row. For example, if MovieStudio were the primary key column of Table 4.4 and a user with the dataset depicted in Table 4.5 tried to add a new row by only giving a value for Year Launched, the row would not be added. However, if the user tried to add a new row by giving values for both Year Launched and MovieStudio, the row could be successfully added to Table 4.4 (see Figure 4.5 and Figure 4.6 for what happens when a user adds a row).

The system also does a simple type check to ensure that number columns do not contain characters or strings. Since any columns that have restrictions such as "NOT NULL" are automatically included in the dataset, we can ensure that all rows can be successfully added. The row UUID is automatically generated on the user application side when the row is created; a rough sketch of this client-side check is shown below.
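As an illustration of the checks just described, the sketch below validates a new row on the client side before it is added: every primary key column must have a value, number columns receive a simple type check, and a row UUID is generated on the application side. The function name and the shape of the column metadata are hypothetical; the thesis does not prescribe a particular implementation.

```python
import uuid

def validate_and_prepare_row(new_row, columns):
    """Sketch of the client-side AddRow check described in Section 4.8.3.

    new_row -- dict mapping column name to the value the user entered
    columns -- list of dicts with (assumed) keys: name, is_primary, is_numeric
    Returns the row augmented with a generated row UUID, or raises ValueError.
    """
    for col in columns:
        value = new_row.get(col["name"])
        # Every primary key column must be filled out before the row is accepted.
        if col["is_primary"] and (value is None or str(value).strip() == ""):
            raise ValueError(f"Primary key column '{col['name']}' must have a value")
        # Simple type check: number columns may not contain characters or strings.
        if col["is_numeric"] and value not in (None, ""):
            try:
                float(value)
            except (TypeError, ValueError):
                raise ValueError(f"Column '{col['name']}' expects a number, got {value!r}")
    # The row UUID is generated on the user application side when the row is created.
    new_row["row_uuid"] = str(uuid.uuid4())
    return new_row
```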
Due to the nature of UUIDs, it is unlikely that a newly generated UUID will clash with one already in the database, but if it does, a new UUID is assigned to the row.

When adding a row, multiple transformation scripts are generated. The act of adding the new row is one transformation script, while each value given to a cell in the new row becomes an EditValue transformation script. For example, if we wanted to add a new row to Table 4.4 with the values "Studio Ghibli" for "MovieStudio", "1984" for "Year Launched", and "Tokyo, Japan" for "HQ Location", the transformation script for the AddRow action would look like the one shown in Table 4.6 (assume that the example UUIDs are standard 128-bit values; the full UUID values have been left out for brevity).

Transformations
addRow(rowUUID:1234), 2016-03-16 20:35:05.641
editValue(rowUUID:1234, colUUID:2345, ' ', 'Studio Ghibli'), 2016-03-16 20:35:05.642
editValue(rowUUID:1234, colUUID:6349, ' ', '1984'), 2016-03-16 20:35:05.642
editValue(rowUUID:1234, colUUID:8745, ' ', 'Tokyo, Japan'), 2016-03-16 20:35:05.643

Table 4.6: The list of transformations that would be generated when you add a new row to Table 4.4.

4.8.4 Remove a Row

There are no restrictions on RemoveRow. Any row can be removed by any user. For users that deal with data that is not stored locally (i.e., scenario 3 in Chapter 1), DeleteRow can still occur. We discuss constraints on views in Section 4.10.

Figure 4.5: What the user sees when he tries to add a row. There is no restriction on needing to fill all the columns shown unless a column has been specified as not null.

4.8.5 Edit a Value

If the user tries to clear the value of a primary key, a warning will appear letting the user know that this value is necessary for identifying the row and asking if they are sure they want to delete it. If the user insists on clearing the value in the primary key column, the row will be deleted. The user will never be forbidden from editing a value.

Figure 4.6: The result of having performed the add row action from Figure 4.5.

4.9 Pushing Data Changes to the Server

The user decides when he wants to push his changes to the database (see Figure 4.7). Once the user decides to sync, the transformation script is sent to the server. At this point the server processes the transformation script and converts it to the query language used by the database to input the changes. If there are changes that occur within two hours of each other, we consider that a conflict. When there is a conflict, we determine whether there is a difference in the trust values of the two conflicting data changes. If there is a difference, we trust the data change with the higher trust value. If there is no difference in trust values, we use the data change with the later timestamp.

Figure 4.7: How the user would sync changes to the database using the user interface.

Two hours was chosen as the time limit for detecting conflicting changes arbitrarily. In the future, it may be wise to investigate whether using a sliding scale to determine time limits would be more useful. It seems reasonable to assume that newer datasets would need a larger time limit for detecting conflicting changes as people are slowly starting to use the data and discover errors. A sketch of this resolution rule, with the conflict window expressed as a parameter, is given below.
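The field names in the sketch are illustrative only; each change is assumed to carry a timestamp, the trust value of the application that made it, and the proposed value.

```python
from datetime import timedelta

# Conflict resolution rule from Section 4.9, sketched with the two-hour
# window as a configurable parameter.
CONFLICT_WINDOW = timedelta(hours=2)

def resolve(incoming, existing, window=CONFLICT_WINDOW):
    """Return the change that should win for the same cell."""
    # Changes farther apart than the window are not considered a conflict;
    # the incoming change is simply applied.
    if abs(incoming["timestamp"] - existing["timestamp"]) > window:
        return incoming
    # In a conflict, the change from the application with the higher trust
    # value wins.
    if incoming["trust"] != existing["trust"]:
        return incoming if incoming["trust"] > existing["trust"] else existing
    # With equal trust values, the change with the later timestamp wins.
    return incoming if incoming["timestamp"] >= existing["timestamp"] else existing
```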
Older datasets, by contrast, could possibly use a shorter time limit, or the time limit could be determined based on other factors like a dataset's frequency of use.

Aside from updating the user's changes on the database, the server will also check what changes have occurred to the dataset since the user last synced. From the AppTableData table (see Section 4.3), we can determine which columns exist in the dataset that the user is trying to sync; we can use the AppData table (Section 4.3) to determine when this dataset was last synced. From the UuidColMapping table, we can determine which tables the columns belong to and thus which provenance tables to examine for possible data updates. Once we have the correct provenance tables, we can use the time of the last sync and the column UUIDs to find all the changes that have occurred to the values in a column since the data was last synced. These changes can then be sent back to the user.

As the system currently stands, the server portion of the system has not been implemented and is left to future work. To simulate the server API response, we have a file that stores a few transformation scripts. During testing (Chapter 5), the user application reads from this file and treats it as changes that the server has given it.

4.10 Data Transformations on Non-locally Stored Data

In the four data usage scenarios that we introduced in Chapter 1, our third scenario involves users who do not have a local copy of data. These users essentially suffer from a view maintenance problem (Section 2.4). As the overhead of constantly fetching changes from the server and reapplying the user's own local changes is too high to realistically support, we have decided to approach this scenario much like the first two usage scenarios in Chapter 1. Specifically, we still allow users to choose which changes they want to sync to the database. The difference comes from how the changes are synced. With the first two data usage scenarios, the user had the choice of when to push their changes to the server (Section 4.9). In this data usage scenario, the user does not get a choice of when to push to the server. The system will periodically choose to push the user's changes and fetch back changes that were made. The system will then merge the fetched changes with the user's local copy of the changes. In the case where a fetched change conflicts with a local change, the user will choose the resolution of that conflict (i.e., the user chooses which change is correct: the one from the database or the one she made previously). The exact time at which to periodically push the changes is currently decided arbitrarily.

4.11 Visualization

To help support non-technologically savvy users, it was proposed that there be some way for users to view what has happened to a piece of data over time. While DBAs and people familiar with querying data may be able to come up with answers to questions like "What has happened to this piece of data since it was introduced?", "What was the last change to happen to this piece of data?", or "When was this piece of data introduced?", domain expert users lack this ability. To help fill this need, we came up with a visualization that shows users what has happened to a piece of data since it was first introduced into the dataset.

The visualization breaks the information down into three levels: year, month, and date.
The basic motifs of the visualization remain the same in each level butthe information displayed to the user will be successively more detailed. The visu-alization is a timeline with coloured bars on top. The height of the bars representhow many changes have been made in that year/month/date while the colour of thebars represents the frequency of changes. In the year view (Figure 4.8), if the pieceof data has 20 or more changes within two months, the bar will be coloured redto represent frequent changes; if there are between 10 to 19 (inclusive) changeswithin two months, then the bar will be orange. In the month view (Figure 4.9),if the piece of data has 20 or more changes within a period of seven days, thenthe bar will be red; if there are between 10 to 19 (inclusive) changes within sevendays, then the bar will be orange. For the day view (Figure 4.10), if a piece ofdata has more than 8 changes, the bar representing that day will be red; if there are4 to 7 (inclusive) changes, then the bar will be orange. If the number of changesfall below the threshold for an orange bar, the bar will be green. There will be agradient legend in the visualization demonstrating how the colour scheme works.If the user clicks on one of the bars, the details of the changes represented by thatbar will appear below the bars.For more information about how the visualization was evaluated, please seeChapter 6.36Figure 4.8: What the user sees if he/she is in the “Year” view and has clickedon a bar to view changes that have occurred on that day.4.12 Generating a Transformation Script from DataSnapshotsAs mentioned in Chapter 1, this thesis looks at managing data updates in fourdifferent scenarios (see Chapter 1). For the fourth scenario (generating a list ofchanges by comparing two versions of data), we attempted a more brute force ap-proach similar to the diff operator in Model Management [6]. We very quicklyrealized that it was impossible to generate a fully accurate transformation scriptbecause intermediate data changes could not be captured. For example, if a be-fore data snapshot had a piece of data with the value “1234” and the after snapshot37Figure 4.9: What the user sees if he/she is in the “Month” view and hasclicked on a bar to view changes that have occurred on that day.had the value “2345”, we can only say that the piece of data has been edited from“1234” to “2345”. However, that piece of data may have had other intermediatevalues before becoming “2345”. That intermediary information can never be gen-erated from a before/after data snapshot.Ideally, when comparing two data snapshots, the order of the data can bechanged to sort the data by row UUID. If the actual data files cannot be modi-fied, it is strongly suggested that copies of the data files be made and the data besorted in the copies. Not sorting the data will cause a much higher level of com-plexity as each and every row in both data files have to be compared to determine ifany edit value changes were made in the rows. If the data is sorted, the complexity38Figure 4.10: What the user sees if he/she is in the “Day” view and has clickedon a bar to view changes that have occurred on that day.of examining each and every row in both files would be O(m+ n) whereas if thedata were not sorted, the complexity would be O(mn).When comparing two data snapshots to generate the transformation script, thecolumns should be checked first to determine if any columns have been added orremoved. 
By checking and accounting for any removed columns first, a myriadof edit value transformation scripts are avoided (i.e., if you do not check for re-moved columns first, you will end up with each and every row giving you an editvalue transformation where the value in a cell has disappeared). Checking for newcolumns before checking any rows helps minimize the number of times that eachrow in the data file has to be read in because when you check the row, you can39account for the extra cell(s) while checking the value in each of the original cells.The alternative would be to read each row in the after snapshot and only check forcolumns that appear in the before snapshot. Then, at a later point, you would haveto reread each and every line to look at the additional cells in the after snapshot thatyou did not examine previously.Once we determine which columns have been added or removed, each andevery row will need to be examined. The row UUID can be used to quickly check ifa row is new or not (this is where sorting the data will become handy because sorteddata minimizes the number of rows that need to be checked before determining arow UUID does not exist in a data file). If the row UUID exists, then each of thevalues within that row can be checked with each other. Unfortunately, there is nofaster process to do this. If the row UUID does not exist, then the row has eitherbeen removed (if the row UUID was in the before snapshot but not the after) oradded (if the row UUID is in the after snapshot but not the before).40Chapter 5ExperimentIn order to determine if the system could handle the first two data usage scenariosdescribed in Chapter 1, we set out to test the system with some real life data wefound (Section 5.1). We focused on testing just the first two scenarios because thethird scenario (i.e., when a user works with data that is not locally stored on hermachine) is considered a subset of the first scenario (when a user has a local copyof data that he is making changes to). If the system can handle the first scenario, itis considered trivial to add in a timer that would periodically push changes to theserver. The important part is to demonstrate that our system can handle parsingthe transformation script that is used to store transformations in the database andupdate the current value of a data point accordingly. We also evaluated our systemvisualization (Section 4.11) using a user study. In Chapter 6, we discuss the detailsabout the user study as well as analyze the results that we obtained.5.1 Data5.1.1 GLEIThe GLEI (Global Legal Entity Identifier) dataset was created in response to thefinancial crash that occurred in 2007-2008 as a means to track various financialorganizations around the world [9]. Its goal is to improve financial transparency byrequiring that all parties participating in the financial markets of the world identify41Figure 5.1: The structure of the GLEI data within the MySQL database.themselves according to a predetermined standard. The dataset that is used in thisexperiment came from a lab-mate, O. AlOmeir [4], who obtained it from the workof V. L. Lemieux, P. Phillips, H. S. Bajwa, and C. Li.Not all the entries in this dataset were usable for experimentation as it con-tained non-unicode characters that were not able to be ported into MySQL. In total,192105 entries were ported into MySQL and the data took up 74.7MB of space.The data schema in Figure 5.1 depicts how the data was structured.5.1.2 Bird DataThis data came from Dr. 
Jill Jankowski’s lab (UBC zoology) and primarily focuseson the migration pattern and changes in birds over time. Nets are set in the forestto try and catch birds in a safe manner. Teams of researchers will carefully takeeach bird and observe whether it has a band on its leg. Birds with bands on theirlegs signify that they have been caught previously and those birds will have theirweight/size measured and observations may be made about their general health.If a bird does not have a band on its leg, then the researchers will put a band on42Figure 5.2: The structure of the bird data from 2011.its leg, record the number of the band and then take the bird’s measurements. Inparticular, the data used for our experiment process is from 2011 and is 471 rowslong. The data schema in Figure 5.2 shows how the bird data is structured. Thisdataset is used as a measuring stick to see if our system will work.5.2 Technical InfrastructureThe datasets mentioned in Section 5.1 were stored on a MySQL server running the5.6.28 openSUSE package.The client used to connect to MySQL (and also the same computer used torun the user studies) is a a MacBook Air running OS X El Capitan with 8GB ofmemory and 1.7GHz i7 processor.435.3 Testing for CorrectnessTo determine whether or not the system can handle managing updates, we carefullygenerated and curated a list of 100 changes for each of our datasets. A number ofcolumns in each of the datasets were chosen as columns that could have their val-ues changed but no primary key column would have its value changed. A scriptthen generated a list of changes by randomly choosing a row in the database andgenerating one to four different changes in the row. Out of the 100 changes gener-ated, we made sure that some of the changes would modify the same rows at timesthat would cause a conflict. At the end of this change generation process, we deter-mined what the final values of the modified rows should be based on the changesgenerated and used that as our “answer key” when checking for the correctness ofrow values in the database.We ran this procedure 100 times and the database output was what we expectedeach time.44Chapter 6User StudyTo determine whether the visualization component of our system (Section 4.11)would help non-expert computer users understand the changes their data has un-dergone, we ran a user study to see how expert and non-expert users would react tousing the visualization. Particularly, we were interested to see if the visualizationwould help users determine whether they should trust a value in an easier mannerthan what is currently available to them (a spreadsheet application). We measuredtwo metrics, time to task completion and the user’s own rating for whether or notone tool helped them more than the other, as we felt these two pieces of informa-tion would be indicative of the information we were interested in. We also wantedto see whether or not the visualization helped expert users and whether or not thegap between time to task completion between expert and non-expert users wouldchange. We were also interested in learning about what qualities users examinedwhen looking at an unfamiliar dataset. A secondary question we wanted to answerwas if there were certain data qualities that users of all skill levels examined whenlooking at suspicious data.6.1 ParticipantsAside from the requirement of having to be a current UBC student, there wereno other constraints on participation in this user study. 
Participants were sourced by putting up flyers in the common areas of the UBC Computer Science building; sending emails to the Computer Science, and Electrical and Computer Engineering, graduate mailing lists; making in-class announcements; and sending a call for participation to biology graduate students in Dr. Lacey Samuel's lab. In total, there were fifteen participants (Table 6.1).

Area of Study                        Student Status  Number of Students
Biochemistry                         Undergraduate   1
Biology                              Graduate        1
Biology                              Unclassified    1
Cognitive Science                    Undergraduate   1
Computer Science                     Graduate        6
Computer Science                     Undergraduate   2
Electrical and Computer Engineering  Graduate        2
Linguistics                          Unclassified    1

Table 6.1: Basic information about the participants who participated in the user study.

On a scale from 1 to 5, with 1 being "Very Unfamiliar and Uncomfortable with Computers" and 5 being "Very Familiar and Comfortable with Computers", participants averaged 4.6, with 10 participants rating themselves as a 5, 4 participants as a 4, and 1 participant as a 3. Eleven of the fifteen participants work with data in some form, ranging from using Excel sheets for statistical data like graphs and models to using R/Python to handle thousands to tens of thousands of data points.

6.2 Methodology

Participants in the user study worked with the bird data discussed in Section 5.1.2. The GLEI data (Section 5.1.1) was removed from the user study because Microsoft Excel did not handle a dataset of that size well and tended to lag. Since the lag would affect the task performance time metric, we decided to let users work solely with the bird data so that the task performance time measurements could be as accurate as possible.

Before each participant started the user study, the computer randomly decided two things: the participant's ID number, and which software tool the user would start with, either a spreadsheet application (in this case, Microsoft Excel) or the visualization in the system. At the beginning of the study, we showed the participant the consent form and call to participation forms and allowed the participant to take as much time as they needed to read and sign them. If the participant signed the forms, the user study would start.

We first explained what the participant would be doing (i.e., what kind of dataset the participant was looking at and what they would be asked to do). A brief explanation of the timer tool (Figure 6.1) was given. User study participants had a chance to test out the timer tool before the study started. During the first training task, the timer asked users to click on the "Start Task" button, complete the demographic questionnaire (Appendix A), and then click on the "End Task" button. The goal of this task was to get participants used to using the timer interface to time their tasks. During the user study, the graduate student took notes about the participant's use of the software tools, and a screen capture was run to track the way participants interact with software tools to understand data.

After the demographic questionnaire, the user study began. As mentioned previously, the computer randomly decided which software tool the participant started with, but regardless of which tool they started with, they were asked to perform the same task three times. The participant was asked to pretend that they were writing a report for work and to determine whether or not they would use a particular data point in their report.
The participant then examined three data points: 1) a data point that had been modified in non-conflicting ways since its introduction into the data set, 2) a data point that had been modified but whose changes had not been reverted by someone, and 3) a data point that had had its value consistently modified back and forth. After examining each data point, the participant was asked to complete a section of the task questionnaire (see Appendix B). Once the participant finished examining three data points with the initial software tool, he/she was asked to complete the General Questions portion of the task questionnaire. The participant then repeated the above steps for the other software tool.

Figure 6.1: The timer interface that user study participants use to time task completion.

6.3 Results

Table 6.2 displays the average values for the quantitative data derived from the user study, specifically, the answers to the questions of whether or not the participant would use a data point in an important report (or research paper) they were writing and whether or not they trusted the data they were looking at. Likelihood to Use in Report refers to how likely the participant would be to use that data point in an important report (or research paper) that he/she is working on. For that question, 1 refers to very unlikely while 5 refers to very likely. Ease to Determine Trustworthiness refers to how easy the participant found it, using the tool, to determine whether or not the data was worth trusting. For this question, 1 represents very easy and 5 represents very hard. Table 6.2 also includes averages for how long it took a participant to complete a task. An important point to note is that some task completion times could not be tracked due to issues like user error (i.e., forgetting to click "Start Task"). We failed to obtain a task completion time for six tasks and discarded two timer results that were outliers. The outliers showed a task completion time of 0.02 minutes, which was not possible given what was observed during the user studies.

In the experiment, Task 1 was the data point that had been rarely modified (and modified only in non-conflicting ways) since its introduction to the data set, Task 2 was the data point that had been modified back and forth in conflicting ways, and Task 3 was the data point that had been modified frequently but not in ways where changes overrode each other.

Software              Task Number  Likelihood to Use in Report  Ease to Determine Trustworthiness  Time to Task Completion
System Visualization  1            3.17                         2.93                               5.88
System Visualization  2            2.27                         2.00                               3.73
System Visualization  3            3.33                         3.00                               5.39
Microsoft Excel       1            3.73                         3.47                               8.50
Microsoft Excel       2            1.60                         2.53                               6.03
Microsoft Excel       3            3.07                         3.07                               8.54

Table 6.2: Averages of the quantitative data collected from the user study.

6.3.1 Two Factor Repeated Measures ANOVA

In order to determine whether or not significant user interaction and preference differences existed between the software visualization and a spreadsheet application like Excel, three two-factor repeated measures analyses of variance (ANOVA) were run on the results. ANOVA is a statistical test designed to compare two or more group means to see if there are any significant differences between them. In the ANOVAs that we ran, we used α = 0.05, meaning that we are willing to take a 5% chance of a false positive (i.e., saying there is a significant result when there is none). One of the results an ANOVA test gives is a probability value, denoted by p. The smaller the p value, the less likely it is that the difference in group means is just due to chance.
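The thesis does not state which statistics package was used for these tests. Purely as an illustration, a comparable two-factor repeated measures ANOVA could be run in Python with statsmodels on a long-format table of completion times; the data below is simulated and merely stands in for the real measurements.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Simulated long-format data standing in for the real measurements:
# one completion time per participant, software tool, and task
# (only the nine participants with complete timings would be used).
rng = np.random.default_rng(0)
rows = []
for p in range(9):
    for software in ("visualization", "excel"):
        for task in ("task1", "task2", "task3"):
            base = 5.0 if software == "visualization" else 7.5
            rows.append({"participant": p, "software": software,
                         "task": task, "minutes": base + rng.normal(0, 2)})
times = pd.DataFrame(rows)

# Two within-subject factors (software and task); AnovaRM requires balanced
# data, which is why participants with missing timings would be excluded.
result = AnovaRM(times, depvar="minutes", subject="participant",
                 within=["software", "task"]).fit()
print(result.anova_table)  # F and p values for software, task, and interaction
```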
In this thesis, as we use α = 0.05, any p value less than 0.05 is considered significant.

The first ANOVA (Table 6.3 and Table 6.4) investigated whether there were significant differences in task completion times between the two software tools and the three tasks performed. We determined that there was a significant difference in time to task completion: users performed the three tasks much faster with the software visualization than with Microsoft Excel (p = 0.032). We further analyzed this significant result to determine the effect of each independent variable (software tool and the tasks performed) on the dependent variable (time to task completion). We found that the software tool had a significant main effect (p = 0.032) and the tasks performed did not have a significant main effect on time to task completion (p > 0.05, ns).

Software              Task Number  Mean Time to Completion  Standard Deviation  N
System Visualization  1            5.88                     2.39                9
System Visualization  2            3.73                     1.67                9
System Visualization  3            5.39                     1.98                9
Microsoft Excel       1            8.50                     4.76                9
Microsoft Excel       2            6.03                     2.24                9
Microsoft Excel       3            8.54                     6.77                9

Table 6.3: Mean and standard deviation for the time to task completion. Although we had 15 participants, N = 9 because not all participants managed to successfully time all their tasks.

Source                df  Mean Square  F      Sig.   Partial Eta Squared
Software              1   97.724       6.759  0.032  0.458
Error(Software)       8   14.458
Task                  2   29.276       2.805  0.090  0.260
Error(Task)           16  10.438
Software*Task         2   0.848        0.064  0.939  0.08
Error(Software*Task)  16  13.348

Table 6.4: ANOVA table generated when determining if there were significant differences in time to task completion.

The second ANOVA (Table 6.5 and Table 6.6) examined whether or not there was a significant difference in how much a user trusted the data shown. We did not find any significant differences in trust across the two software tools (p > 0.05), but we did find significant differences when comparing across tasks (p < 0.01). This is what we expected, as the three data points participants explored with the tool were designed to be trusted to radically different degrees. We wanted to test whether or not our tool affected how much a user would trust the data shown. The ANOVA result has shown that our tool does not significantly affect how much a user trusts the data shown when compared to a standard spreadsheet application like Microsoft Excel.

Software              Task Number  Mean Trust Value  Standard Deviation  N
System Visualization  1            3.17              1.28                15
System Visualization  2            2.27              1.53                15
System Visualization  3            3.33              1.45                15
Microsoft Excel       1            3.73              0.80                15
Microsoft Excel       2            1.6               1.12                15
Microsoft Excel       3            3.07              1.44                15

Table 6.5: Mean and standard deviation for the degree that participants trusted the data that was shown. Trust was ranked on a scale from 1 to 5 where 1 was very unlikely and 5 was very likely.

Source                df  Mean Square  F       Sig.   Partial Eta Squared
Software              1   0.336        0.152   0.703  0.011
Error(Software)       14  2.217
Task                  2   19.836       11.756  0.000  0.456
Error(Task)           28  1.687
Software*Task         2   2.969        3.160   0.058  0.184
Error(Software*Task)  28  0.940

Table 6.6: ANOVA table generated when determining possible significant differences in how much user study participants trusted the data they were shown.

The last ANOVA (Table 6.7 and Table 6.8) examined whether or not there was a significant difference in how easy it was for a user to determine whether or not the data should be trusted (i.e., ease to trust data). We did not find any significant differences in ease to trust data across the two software tools (p > 0.05), but we did find significant differences when comparing across tasks (p < 0.01).
While not exactly what we expected, this could be due to the fact that some of the data points the users examined had more changes spread across multiple columns, which made it harder for the users to determine whether to trust the data or not. Results could also have been affected by issues related to the way the question was worded (Section 6.3.3) or by the participants being skeptical of the data they were shown (Section 6.3.2).

Software              Task Number  Mean Ease to Trust  Standard Deviation  N
System Visualization  1            2.93                1.39                15
System Visualization  2            2.00                1.36                15
System Visualization  3            3.00                1.36                15
Microsoft Excel       1            3.47                0.83                15
Microsoft Excel       2            2.53                1.51                15
Microsoft Excel       3            3.07                1.10                15

Table 6.7: Mean and standard deviation for how easy participants found the tool to use when trying to determine how much to trust the data shown on the screen.

Source                df  Mean Square  F      Sig.   Partial Eta Squared
Software              1   3.211        0.944  0.348  0.063
Error(Software)       14  3.402
Task                  2   7.433        7.766  0.002  0.357
Error(Task)           28  0.957
Software*Task         2   0.544        0.544  0.592  0.037
Error(Software*Task)  28  1.021

Table 6.8: ANOVA table generated when trying to determine if significant differences existed in how easy it was for user study participants to examine the data.

6.3.2 Trust Differences

Something that the graduate students struggled with as a whole was the distinction between examining a dataset for reliability and accuracy as opposed to examining a data point for reliability and accuracy. Part of the problem had to do with the fact that they were asked to evaluate data from an unfamiliar domain. Without knowing sufficient details about the dataset, their scientific training took over, so they scrutinized the data much more carefully and were much less trusting.

6.3.3 Results Discussion

Issues with the Questionnaire

When we first designed the questionnaire, we had designed the questions with the most casual of data users in mind. Thus, the questions used very general language that was not specific to researchers, and that became a large issue during the user study whenever a graduate student was the participant. The graduate students, as a whole, found the questionnaire's general language a source of consternation on first read and spent a large amount of time at the beginning of the study clarifying what exactly each question meant. The question that caused the most difficulty for the graduate students was "... how likely would you use this piece of data for a super important report?". Clarification was sought about exactly how important the report was, what criteria they were expected to use to evaluate the data, and what exactly was meant by the phrase "use this piece of data". The answer to those questions was that the report was equivalent to a research paper that the graduate student was trying to submit to a top-tier conference, and "use" of the data was defined as whether or not the graduate student would include that data point in the results section of their research paper. Most of the graduate students seemed mildly satisfied with this answer, but a few persisted in trying to establish what criteria were required for validity. A few graduate students also took their task very seriously and started trying to calculate statistics such as the average and standard deviation to establish a distribution.
We quickly discouraged them from doing this and told them to judge the data point based on what they saw on screen, but those participants remained very skeptical of what they saw.

There were also issues relating to the placement of the first question in the Task Questionnaire, the question that asked the user the date the data point was created (Appendix B). For some reason, many participants did not see that question and did not answer it. 16 out of 90 answers (15 participants, each answering that question 6 times) to the first question in the Task Questionnaire went unanswered. In the future, this problem could potentially be mitigated by putting question numbers in front of the questions to help alert users to the fact that a question may not have been answered yet.

The Likert scales in the Task Questionnaire (Appendix B) also posed a problem. Some participants did not find the scale intuitive (e.g., if 1 = very likely and 5 = very unlikely, a participant might accidentally assume that 1 = very unlikely and 5 = very likely). The questions involving Likert scales were highlighted to draw attention to the fact that 1 may not represent what the participant is used to, and the participant was also reminded of this at the beginning of the user study. In the future, it would be wise to have participants directly circle "Very Unlikely", "Unlikely", "Possibly", "Likely", and "Very Likely" rather than a number from 1 to 5.

Dataset Confusion

It would have been, perhaps, less problematic if we had used data that required less specific domain knowledge (e.g., Netflix viewer data over the last five years, stock price fluctuations, or the Wikipedia edit history of a few select articles). Many of the user study participants found it hard to determine whether or not they would trust the data simply because they did not know enough about the data. Using data that is more general would help mitigate these issues. The only participants who did not really question the data were the undergraduates and the participants who had a biology background.

Usability of System Visualization

A subtle but important distinction to make is the difference between having all the data available and having the data presented in a usable way. Based on the task questionnaire answers, participants using the system visualization generally answered the questions correctly. A common point of confusion was the interchanging of the "number of people who changed the data" and the "number of changes that have occurred to a data point". In reality, one person can make many different changes to a data point, but many participants assumed that one person only makes one change. The explanation and the visualization should have made this distinction clearer.

A second point that almost all participants mentioned was the fact that there was a lot of scrolling involved in the visualization. This is mainly in the day view (Figure 4.10), and partly because we wanted to represent how long a data point had gone unmodified. Initially, we believed that participants would be more skeptical of data that had not been altered for a long time. After interviewing the user study participants, we found only one participant who agreed with this view. The general consensus was that the timing of the changes would be something looked at in a secondary evaluation of the dataset, and that the importance of the frequency of changes would depend on the dataset itself.
Participants found the scrolling to bea pain and it would cause them to easily miss a bar indicating a change.Some users in the user study seemed confused by the lack of rows in the vi-sualization. A spreadsheet or tool that uses a row paradigm to display data (e.g.,MySQL) has been the de-facto standard for many of our participants. During thedemographic survey, 9 out of 15 participants indicated that they use tools that dis-play data in a row paradigm (i.e., spreadsheets and/or relational databases). Asthe visualization does not use a row paradigm, some participants were not used to55seeing changes displayed in that format.Determining a General Superset of ValuesOne of the motivating tasks in this thesis was to help non-computer experts uselarge datasets; so, when designing of the visualization, we had hoped to only showthe most basic information to avoid overwhelming and confusing non-experts. Wefound spreadsheet applications, while sufficient to get the job done, often camewith many issues such as changes being easily missed due to the sea of text onscreen. Furthermore, some participants do not know how to take advantage of Ex-cel’s functions. For example, 4 out of 15 participants did not know about the freezepane functionality and had to continuously scroll up and down while 3 participantsdid not know that you could sort data until prompted.A core part of the visualization has to do with obfuscating data that has notbeen changed over time to avoid having the user become overloaded with infor-mation and text. However, this approach seems to have backfired as user studyparticipants who were considered as the “experts” (i.e., the PhD students), foundthe obfuscation of table cells that had no data changes much more troubling thanthe “non-expert” participants. While the“non-expert” participants seemed to findit normal that no changes were displayed and did not question the fact that datawas missing, the PhD students tended to be much more cautious and tended topause for longer before writing down their answers on the questionnaire. The PhDstudents would also ask more questions about the data and confirm that there wasnothing going wrong with the visualization before proceeding with the question-naire. The PhD students also tended to have comments like “I need to know whocollected this data and the data collection methods used” which affected their useof the system visualization as they were quite skeptical of the data. As the userstudy asked participants to examine a dataset from a domain that many participantswere unfamiliar with, it is understandable that they were distressed at the thoughtof trying to determine whether or not they should use a dataset blindly. However,in a normal real life scenario, users tend to know who a dataset came from andwhat the general method of data collection was; so, we felt that the visualizationshould not have to deal with trying to display this data.56What People Look for in DatasetsAt the end of the user study, we followed up with two sets of questions. Oneset of questions asked about the system visualization and whether or not usersunderstood the visualization encodings used (specifically the height and and colourof the bars). The height of the bars represented how many changes had occurredin that year/month/day (depending on which time granularity they were lookingat) while the colour of the bars represented how frequently the changes occurred(Section 4.11). 
13 out of 15 participants were asked about their understandingof the visual encodings after the conclusion of their user study. Out of those 13participants, only 5 understood the height encoding, only 1 understood the colourencoding, and 6 understood both the height and colour encoding.To follow up with the height and colour encoding question, participants werealso asked about what qualities they look for when evaluating a dataset for relia-bility and accuracy. Common answers included semantics of the data, complete-ness/sparsity, clustering/outliers, distribution of data points, and whether or not thedata points fit a distribution (participants mentioned they would be suspicious ifthe data did not fit or if the data fit too well). When asked if frequency was an im-portant quality to consider when evaluating data, most participants answered thatthe dataset itself would determine their answer to that question. It seems that fre-quency is not as important as we originally thought. Some participants mentionedthat in their specific dataset, having frequent changes is not only normal, but a goodsign for their dataset.Many of the user study participants emphasized that the first thing that theylook at is whether or not the data makes semantic sense to them. This makes alot of sense as people do not tend to evaluate random datasets; more often thannot, people want to find a specific answer or at least know enough backgroundinformation to understand the dataset. Unfortunately, it is hard to pick out generalvisual encodings that can help users understand the semantic meaning of data asthat would be very domain specific.57Trends in Qualitative Feedback4 out of 15 participants mentioned that they enjoyed the fact that the data waslaid out from the point of view of time. They liked seeing the “big picture” ofthe data before drilling down into more detail. However, the other participantstended to view data from the point of view of itself where they are solely inter-ested in the changes that occurred to the data rather than the duration the changesoccurred in. While future iterations of the visualization could continue to use the“overview first, detail later” [36] encoding when viewing data, if we incorporatethe row paradigm (as discussed in the Usability of System Visualization subsectionwithin Section 6.3.3), it would not make sense to continue to use the “overviewfirst, detail later” method of viewing data as it would be hard to generalize whata data point’s cells should look like when aggregated. It would be hard to comeup with rules to determine when a column’s values should be averaged as opposedto taking the most frequently occurring value; the most representative value of acolumn often depends on the semantics of the data it is displaying.Two users suggested that the name of the person who made the data changesbe displayed in the visualization. This could be done; but, in a general situa-tion, this may not always make the most sense. With datasets, especially opensource data like Wikipedia edit history or collaborative data like the bird dataset weworked with (Section 5.1.2), or that involve many different people (often completestrangers on the Internet), listing the individual responsible for a certain changemay not make much sense. 
We could list the individual anyways but this may startto go against the principle of minimizing the amount of data displayed to the user.58Chapter 7Future Work7.1 Intelligently Inferring Where to Add a ColumnAs the system currently stands, users choose which table a new column gets addedto by telling the system which columns (out of the ones they are currently viewing)the newly added column is most similar to. In the future, this process could beautomated based on semantic mapping from multiple sources [28]. To determinewhich column fits better with which table, we could use semantic similarity dis-tance [33] to help determine where a column may be a better fit. However, this typeof technique would only be useful for a certain type of data, namely non-numericdata. With the GLEI data (see Section 5.1.1), if we wanted to add a column, thedata within the other columns would provide a basis for trying to determine whichset of columns the new column is most similar to since the data captured withinthe GLEI dataset is mostly words. With the bird data (see Section 5.1.2), mostof the data is numeric and thus detecting similarity by semantics would not work.Presumably, we could use the column title to infer what columns are similar toeach other, but as column titles are not standardized or regulated, it would not be areliable method for detecting similarity.Another possible avenue to explore might be trying to determine where a col-umn should belong based on what columns the user has worked on in the past. Inthe case of collaborating researchers, it is likely that each researcher has a differenttopic of interest they are pursuing which would mean that each researcher looks at59a slightly different set of columns. If a user has historically viewed a certain setof columns, it could be likely that the newly added column would belong to thatset of columns as opposed to a table that the user has never looked at. Whether ornot historical data about a user’s data preferences could help guide a heuristic forautomatically adding a column to a table is a subject for future research. Such aheuristic could help users use the data and prevent tables from becoming clutteredwith irrelevant columns over time.7.2 Inferring the Transformation ScriptBeing able to look at a before and after snapshot of data to generate a transforma-tion script to describe what has happened to the data is useful. Computer errors andcrashes could cause a loss of transformation scripts. Transferring a data file fromone computer to another could cause the transformation script to be left behind.No matter what the cause, it would be useful to recapture the transformations that adataset has gone through in order to preserve data usefulness for all users involved.While we examined a more brute force approach to regenerating transformations(see Section 4.12), it is possible that a clustering algorithm or machine learningapproach could be used to regenerate a transformation script. Much like how datatransformation rules can be inferred for data integration purposes [37] or how thesystem can learn from user actions to infer what data the user wants to enter [21], itwould be interesting to see if we could automatically infer a transformation scriptvia the same mechanisms.7.3 Achieving Full SynchronizationFull synchronization in this system would require implementing a REST API forthe server in order for client applications to know when to update. 
The client wouldsend the transformation script as part of the parameter arguments to the server andthe server will determine if there are any conflicts. If there are no conflicts, theserver would send back a message to say it was successful and if there are conflicts,the server would send back a list of conflicts.607.4 ScalabilityCurrently when a user pushes his/her updates to the centralized server, the serverhas to lock down the tables being updated and prevent other users who may bepushing changes to those same tables from trying to push their changes at the sametime. Although this delay is not expected to last long and is capped at ten minutes(Section 4.3), as the number of users grows, this approach will not scale. Perfor-mance would start to suffer drastically as the queue to push updates grows fasterthan what the server can update. The bottleneck comes from from the server lock-ing down tables. How to scale as well as determining the breaking point of thesystem seem to be logical next steps if the system were to be developed further.7.5 Determining a Good Conflict Time MeasureCurrently, our system uses two hours as the time limit for detecting data conflicts.If there are changes to the same piece of data within two hours of one another,we consider that a conflict. However, future investigation could determine a betterheuristic. It seems that newer databases would be more prone to corrections andthus, could use a more generous time limit as people will be likely to catch multipleerrors as they start to use and explore the data. However, as a dataset gets moremature, it is likely that many of the errors have already been caught and a shortertime period would be required in order to detect any abnormal changes.7.6 Automatically Adjusting Trust ValuesA large part of what data gets accepted into our system depends on the trust valueof the person who is trying to push the change. In our system, the trust value is setby the DBA. The original intent was that if a user was a causal user of the data (e.g.,community partner interested in the data), his/her changes may be less trustworthythan an expert (e.g., researcher in that area). However, as people become experts inthe data over time, this trust value should also adjust. Currently in the system, thereis no method to allow for this flexible adjustment of trust values–the DBA has tomanually change it. It will be interesting to see what kind of metadata would needto be stored in order for this flexible trust value change to occur. A possibility may61Figure 7.1: A possible design iteration on the system visualization that triesto show users data characteristics that user study participants havedeemed useful.be to track how many changes a user has accepted/rejected/reverted by other usersin the system and then use this information to influence the trust value.7.7 Visualization ImprovementsAs discussed in Section 6.3.3, the system visualization could try to improve userinteraction with the visualization as well as try to visualize general data character-istics that users of all expertise levels (from users with a casual interest in data toserious researchers in the area) find useful. As most of the user study participantsintuitively understood that the height of a bar represented the number of changes toa cell, it is perhaps best to keep try to continue to use that visual encoding. 
A pos-sible design iteration prototype is annotated in Figure 7.1 and the way it operateswhen a user interacts with it is shown in Figure 7.2 and Figure 7.3.62Figure 7.2: Progression of how a change would start to appear in the pro-posed new system visualization design.7.8 Automatically Pushing ChangesFor users who are working with data that is not stored locally (i.e., scenario threefrom Chapter 1), a future avenue of research would be to find a heuristic to deter-mine the optimal time to push changes for minimizing workflow disruptions. Thiscould be a heuristic based on the user’s pattern of changes to see if we can detectany period of inactivity or determine a period where the overhead of pushing a setof changes will not disturb the user. We could also see if it is possible to calculatean optimal subset (within the set of changes that need to be pushed) to push so thatthe changes the user is currently working on will not be affected. Only part of thesystem would be locked, possibly down to the column granularity, which wouldhelp improve system performance.63Figure 7.3: How multiple changes would appear in the new proposed systemvisualization design.64Chapter 8ConclusionData is everywhere and with the increasing connectedness of the world, the demandfor collaborative data sharing is increasing rapidly. From research lab memberscollaborating with each other to Internet users working together to jot down songlyrics, data takes many forms and can be part of many different workflows. Inthis thesis, we have defined four general data usage situations that involve userscollaboratively using a centralized database to update data. To handle these foursituations, we have examined a system that helps users manage their data updatesand transformations in a centralized database environment. The system designdecisions were informed by existing literature and we have carefully thought aboutthe various constraints and tradeoffs that must exist for this system to work. Specialeffort was put into helping users understand the changes their data has undergoneby creating a human readable transformation script (Section 4.1) and by using avisualization (Section 4.11).To determine whether or not the system built can handle the first two datausage scenarios, we simulated some data changes and ran the changes through thesystem as if a user were really pushing changes to the database at random times.We then checked the database for the final values to see if the final values wereexpected (Chapter 5). We also ran a user study to gauge how well our visualizationhelped users work with their data. The results from the user study did not finda significant difference between our visualization and Microsoft Excel in terms ofhow much users trusted the data and how easy it was for users to determine whether65or not they trust a certain piece of data. However, the user study did find that usersof our visualization took significantly less time to determine the changes a pieceof data has gone through over time (Chapter 6). We then ended this thesis with adiscussion on future directions this work can take (Chapter 7).66Bibliography[1] C. C. Aggarwal. Trio: A system for data uncertainty and lineage. InManaging and Mining Uncertain Data, pages 1–35. Springer, 2009. →pages 3, 10[2] D. Agrawal, A. El Abbadi, A. Singh, and T. Yurek. Efficient viewmaintenance at data warehouses. In ACM SIGMOD Record, volume 26,pages 417–427. ACM, 1997. → pages 12, 13[3] P. Agrawal, A. Silberstein, B. F. Cooper, U. Srivastava, andR. Ramakrishnan. 
Chapter 8

Conclusion

Data is everywhere, and with the increasing connectedness of the world, the demand for collaborative data sharing is increasing rapidly. From research lab members collaborating with each other to Internet users working together to jot down song lyrics, data takes many forms and can be part of many different workflows. In this thesis, we have defined four general data usage situations that involve users collaboratively using a centralized database to update data. To handle these four situations, we have examined a system that helps users manage their data updates and transformations in a centralized database environment. The system design decisions were informed by existing literature, and we have carefully thought about the various constraints and tradeoffs that must exist for this system to work. Special effort was put into helping users understand the changes their data has undergone by creating a human-readable transformation script (Section 4.1) and by using a visualization (Section 4.11).

To determine whether or not the system built can handle the first two data usage scenarios, we simulated some data changes and ran the changes through the system as if a user were really pushing changes to the database at random times. We then checked the database to see whether the final values were as expected (Chapter 5). We also ran a user study to gauge how well our visualization helped users work with their data. The results from the user study did not find a significant difference between our visualization and Microsoft Excel in terms of how much users trusted the data and how easy it was for users to determine whether or not they trust a certain piece of data. However, the user study did find that users of our visualization took significantly less time to determine the changes a piece of data has gone through over time (Chapter 6). We then ended this thesis with a discussion of future directions this work can take (Chapter 7).

Bibliography

[1] C. C. Aggarwal. Trio: A system for data uncertainty and lineage. In Managing and Mining Uncertain Data, pages 1–35. Springer, 2009. → pages 3, 10

[2] D. Agrawal, A. El Abbadi, A. Singh, and T. Yurek. Efficient view maintenance at data warehouses. In ACM SIGMOD Record, volume 26, pages 417–427. ACM, 1997. → pages 12, 13

[3] P. Agrawal, A. Silberstein, B. F. Cooper, U. Srivastava, and R. Ramakrishnan. Asynchronous view maintenance for VLSD databases. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, pages 179–192. ACM, 2009. → pages 12

[4] O. AlOmeir. A study of provenance in databases and improving the usability of provenance database systems. Master's thesis, University of British Columbia, 2015. → pages 42

[5] P. A. Bernstein and T. Bergstraesser. Meta-data support for data transformations using Microsoft Repository. IEEE Data Eng. Bull., 22(1):9–14, 1999. → pages 11

[6] P. A. Bernstein and S. Melnik. Model management 2.0: Manipulating richer mappings. In Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, pages 1–12. ACM, 2007. → pages 37

[7] F. Bry and S. Schaffert. Towards a declarative query and transformation language for XML and semistructured data: Simulation unification. In Logic Programming, pages 255–270. Springer, 2002. → pages 12

[8] P. Buneman, S. Khanna, and T. Wang-Chiew. Why and where: A characterization of data provenance. In Database Theory—ICDT 2001, pages 316–330. Springer, 2001. → pages 11

[9] K. K. Chan and A. Milne. The global legal entity identifier system: Will it deliver? Available at SSRN 2325889, 2013. → pages 41

[10] L. Chiticariu, W.-C. Tan, and G. Vijayvargiya. DBNotes: A post-it system for relational databases based on provenance. In Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, pages 942–944. ACM, 2005. → pages 9, 10

[11] Y. Cui and J. Widom. Lineage tracing in a data warehousing system. In Proceedings of the 16th International Conference on Data Engineering, pages 683–684. IEEE, 2000. → pages 11

[12] T. Dasu and T. Johnson. Exploratory Data Mining and Data Cleaning, volume 479 of Wiley Series in Probability and Statistics. John Wiley & Sons, 2003. → pages ii

[13] H. Galhardas, D. Florescu, D. Shasha, E. Simon, and C. Saita. Declarative data cleaning: Language, model, and algorithms. Research Report RR-4149, 2001. URL https://hal.inria.fr/inria-00072476. Projet CARAVEL. → pages 12

[14] W. Gatterbauer, M. Balazinska, N. Khoussainova, and D. Suciu. Believe it or not: Adding belief annotations to databases. Proc. VLDB Endow., 2(1):1–12, Aug. 2009. ISSN 2150-8097. → pages 9, 10

[15] B. Glavic and K. R. Dittrich. Data provenance: A categorization of existing approaches. In BTW, volume 7, pages 227–241. Citeseer, 2007. → pages 11

[16] R. Goldman, J. McHugh, and J. Widom. From semistructured data to XML: Migrating the Lore data model and query language. In ACM SIGMOD Workshop on The Web and Databases (WebDB 1999), 1999. → pages 11

[17] T. J. Green, G. Karvounarakis, N. E. Taylor, O. Biton, Z. G. Ives, and V. Tannen. Orchestra: Facilitating collaborative data sharing. In Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, pages 1131–1133. ACM, 2007. → pages 8, 9

[18] T. J. Green, G. Karvounarakis, Z. G. Ives, and V. Tannen. Provenance in Orchestra. IEEE Data Engineering Bulletin, 33(3):9–16, 2010. → pages 3, 8

[19] D. Hedgebeth. Data-driven decision making for the enterprise: An overview of business intelligence applications. Vine, 37(4):414–420, 2007. → pages 1

[20] J. M. Hellerstein. Quantitative data cleaning for large databases. United Nations Economic Commission for Europe (UNECE), 2008. → pages ii

[21] Z. Ives, C. Knoblock, S. Minton, M. Jacob, P. Talukdar, R. Tuchinda, J. L. Ambite, M. Muslea, and C. Gazen. Interactive data integration through smart copy & paste. arXiv preprint arXiv:0909.1769, 2009. → pages 60

[22] Z. G. Ives, N. Khandelwal, A. Kapur, and M. Cakir. Orchestra: Rapid, collaborative sharing of dynamic data. In CIDR, pages 107–118, 2005. → pages 8
[23] Z. G. Ives, T. J. Green, G. Karvounarakis, N. E. Taylor, V. Tannen, P. P. Talukdar, M. Jacob, and F. Pereira. The Orchestra collaborative data sharing system. ACM SIGMOD Record, 37(3):26–32, 2008. → pages 8

[24] S. Kandel, A. Paepcke, J. Hellerstein, and J. Heer. Wrangler: Interactive visual specification of data transformation scripts. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 3363–3372. ACM, 2011. → pages 3, 11, 12, 13, 25

[25] S. Kandel, R. Parikh, A. Paepcke, J. M. Hellerstein, and J. Heer. Profiler: Integrated statistical analysis and visualization for data quality assessment. In Proceedings of the International Working Conference on Advanced Visual Interfaces, pages 547–554. ACM, 2012. → pages 11, 12

[26] L. V. Lakshmanan, F. Sadri, and S. N. Subramanian. SchemaSQL: An extension to SQL for multidatabase interoperability. ACM Transactions on Database Systems (TODS), 26(4):476–519, 2001. → pages 11

[27] M. Lawley and J. Steel. Practical declarative model transformation with Tefkat. In Satellite Events at the MoDELS 2005 Conference, pages 139–150. Springer, 2006. → pages 12

[28] Y. Li, Z. A. Bandar, and D. McLean. An approach for measuring semantic similarity between words using multiple information sources. IEEE Transactions on Knowledge and Data Engineering, 15(4):871–882, 2003. → pages 59

[29] J. A. Marsh, J. F. Pane, and L. S. Hamilton. Making sense of data-driven decision making in education. 2006. → pages 1

[30] K.-K. Muniswamy-Reddy, D. A. Holland, U. Braun, and M. I. Seltzer. Provenance-aware storage systems. In USENIX Annual Technical Conference, General Track, pages 43–56, 2006. → pages 9

[31] E. Rahm and H. H. Do. Data cleaning: Problems and current approaches. IEEE Data Eng. Bull., 23(4):3–13, 2000. → pages 12

[32] V. Raman and J. M. Hellerstein. Potter's Wheel: An interactive data cleaning system. In Proceedings of the 27th International Conference on Very Large Databases, volume 1, pages 381–390, 2001. → pages 11, 12, 13

[33] P. Resnik. Using information content to evaluate semantic similarity in a taxonomy. CoRR, abs/cmp-lg/9511007, 1995. → pages 59

[34] M. Roth and W.-C. Tan. Data integration and data exchange: It's really about time. In CIDR. Citeseer, 2013. → pages 20

[35] M. Seltzer, K.-K. Muniswamy-Reddy, D. A. Holland, U. Braun, and J. Ledlie. Provenance-aware storage systems. → pages 9

[36] B. Shneiderman. The eyes have it: A task by data type taxonomy for information visualizations. In Proceedings of the IEEE Symposium on Visual Languages, pages 336–343. IEEE, 1996. → pages 58

[37] B. Spencer and S. Liu. Inferring data transformation rules to integrate semantic web services. Proceedings of the International Semantic Web Conference, 3298:456–470, 2004. → pages 60

[38] W. C. Tan. Research problems in data provenance. IEEE Data Engineering Bulletin, 27(4):45–52, 2004. → pages 11

[39] P. Tarau. An embedded declarative data transformation language. In Proceedings of the 11th ACM SIGPLAN Conference on Principles and Practice of Declarative Programming, pages 171–182. ACM, 2009. → pages 12

[40] A. M. Thrastarson. Managing updates and transformations in data sharing systems. Master's thesis, University of British Columbia, 2014. → pages 2, 3

[41] J. Zhou, P.-A. Larson, and H. G. Elmongui. Lazy maintenance of materialized views. In Proceedings of the 33rd International Conference on Very Large Data Bases, pages 231–242. VLDB Endowment, 2007. → pages 12

[42] Y. Zhuge, H. Garcia-Molina, J. Hammer, and J. Widom. View maintenance in a warehousing environment. ACM SIGMOD Record, 24(2):316–327, 1995. → pages 12
Appendix A

Demographic Questionnaire

1. What age group are you in?
   • 18 and under
   • 19 to 25
   • 26 to 35
   • 36 to 45
   • 45 and above

2. Gender:
   • Male
   • Female

3. Which department do you belong to?

4. What program are you a part of?

5. Which year of your program are you currently in?

6. On a scale from 1 to 5, with 1 being very unfamiliar and uncomfortable with using computers and 5 being very familiar and comfortable with computers, what would you rank your comfort level as?

   1 2 3 4 5

7. Do you work with data in your normal day-to-day work?
   • Yes
   • No

8. If your answer to question 7 was yes, what size is the data you work with and what do you do with the data?

9. If your answer to question 7 was yes, do you use any particular tools to work with the data?

10. What do you consider as a large amount of data (e.g., file size, # of rows in a spreadsheet)?

Appendix B

Task Questionnaire

Case 1

When was this piece of data first created?

How many people have modified this piece of data and what values has it been modified to? Please list all the changes made to this piece of data.

On a scale from 1 to 5, with 1 being very unlikely and 5 being very likely, how likely would you use this piece of data for a super important report?

   1 2 3 4 5

Why did you choose this rating?

On a scale from 1 to 5, with 1 being very easy and 5 being very hard, rate how easy it is to figure out if this piece of data is trustworthy.

   1 2 3 4 5

Why did you choose this rating?

Case 2

When was this piece of data first created?

How many people have modified this piece of data and what values has it been modified to? Please list all the changes made to this piece of data.

On a scale from 1 to 5, with 1 being very unlikely and 5 being very likely, how likely would you use this piece of data for a super important report?

   1 2 3 4 5

Why did you choose this rating?

On a scale from 1 to 5, with 1 being very easy and 5 being very hard, rate how easy it is to figure out if this piece of data is trustworthy.

   1 2 3 4 5

Why did you choose this rating?

Case 3

When was this piece of data first created?

How many people have modified this piece of data and what values has it been modified to? Please list all the changes made to this piece of data.

On a scale from 1 to 5, with 1 being very unlikely and 5 being very likely, how likely would you use this piece of data for a super important report?

   1 2 3 4 5

Why did you choose this rating?

On a scale from 1 to 5, with 1 being very easy and 5 being very hard, rate how easy it is to figure out if this piece of data is trustworthy.

   1 2 3 4 5

Why did you choose this rating?
