Open Collections
UBC Theses and Dissertations
Managing data updates and transformations : a study of the what and how
Wong, Jessica Hei-Man
Abstract
Cleaning data (i.e., making sure data contains no errors) can take up a large part of a project’s lifetime and cost. Because dirty data can enter a system through user actions (e.g., accidentally overwriting a value or entering incorrect information) or through data integration, datasets require a constant, iterative process of collecting, transforming, storing, and cleaning. In fact, it has been estimated that 80% of a project’s development time and cost is spent on data cleaning. Our research seeks to improve this process for users of a centralized database. While expert users may be able to write a script or use a database to manage, verify, and correct their data, non-expert users often lack these skills, so trawling through a large dataset is no easy feat for them. They may be unable to find what they need effectively, or even to find an efficient starting point for their data exploration task. They may also look at a piece of data and be unsure whether it is worth trusting (i.e., how reliable and accurate is it?). This thesis focuses on a system that facilitates this data verification and update process in order to minimize the effort and time spent cleaning data. Most of our effort went into building this system and working out the details needed to make it work. The system has a small visualization component designed to help users determine the transformation process that a piece of data has gone through. We want to show users when a piece of data was created, along with the changes users have made to it along the way. To evaluate the system, we ran an accuracy test to determine whether it could successfully manage updates, and a user study to evaluate the visualization component.
Item Metadata
Title | Managing data updates and transformations : a study of the what and how
Creator | Wong, Jessica Hei-Man
Publisher | University of British Columbia
Date Issued | 2016
Language | eng
Date Available | 2016-04-22
Provider | Vancouver : University of British Columbia Library
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International
DOI | 10.14288/1.0300153
Degree Grantor | University of British Columbia
Graduation Date | 2016-05
Scholarly Level | Graduate
Aggregated Source Repository | DSpace