UBC Theses and Dissertations

Managing data updates and transformations : a study of the what and how

Wong, Jessica Hei-Man

Abstract

Cleaning data (i.e., making sure data contains no errors) can take up a large part of a project’s lifetime and cost. Dirty data can be introduced into a system through user actions (e.g., accidentally overwriting a value or simply entering incorrect information) or through the process of data integration, so datasets require a constant, iterative process of collecting, transforming, storing, and cleaning. In fact, it has been estimated that 80% of a project’s development time and cost is spent on data cleaning. Our research seeks to improve this process for users of a centralized database. While expert users may be able to write a script or use a database to help manage, verify, and correct their data, non-expert users often lack these skills, so trawling through a large dataset is no easy feat for them. They may be unable to find what they need effectively, or even to find a starting point for their data exploration task efficiently. They may also look at a piece of data and be unsure whether it is worth trusting (i.e., how reliable and accurate is it?). This thesis focuses on a system that facilitates this data verification and update process in order to minimize the effort and time needed to clean the data. Most of our effort was concentrated on building this system and working out the details needed to make it work. The system has a small visualization component designed to help users determine the transformation process that a piece of data has gone through. The goal is to show users when a piece of data was created and what changes users have made to it along the way. To evaluate the system, we ran an accuracy test to determine whether it could successfully manage updates, and a user study to evaluate the visualization component.
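As a rough illustration of the kind of record such a visualization could draw on, the sketch below keeps a creation entry and an ordered list of changes for a single value. It is not taken from the thesis: the names `TrackedValue`, `Change`, `update`, and `timeline`, and the sample users and values, are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


def _now() -> datetime:
    """Current time in UTC, used to stamp creations and changes."""
    return datetime.now(timezone.utc)


@dataclass
class Change:
    """One recorded update to a value (hypothetical structure)."""
    user: str
    old_value: str
    new_value: str
    timestamp: datetime = field(default_factory=_now)


@dataclass
class TrackedValue:
    """A data value together with who created it and how it has changed."""
    value: str
    created_by: str
    created_at: datetime = field(default_factory=_now)
    history: List[Change] = field(default_factory=list)

    def update(self, user: str, new_value: str) -> None:
        """Record the change first, then apply it to the current value."""
        self.history.append(Change(user, self.value, new_value))
        self.value = new_value

    def timeline(self) -> List[str]:
        """Return the creation event and every change, oldest first."""
        original = self.history[0].old_value if self.history else self.value
        lines = [f"{self.created_by} created {original!r}"]
        for change in self.history:
            lines.append(
                f"{change.user} changed {change.old_value!r} to {change.new_value!r}"
            )
        return lines


# Hypothetical usage: one user enters a misspelled value, another corrects it.
cell = TrackedValue(value="Vancuver", created_by="alice")
cell.update("bob", "Vancouver")
print("\n".join(cell.timeline()))
```

Keeping the old value in each change entry means the full transformation trail can be replayed without consulting any other table, at the cost of storing each intermediate value twice.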

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International