UBC Theses and Dissertations

Managing updates and transformations in data sharing systems

Thrastarson, Arni Mar

Abstract

Dealing with dirty data is an expensive and time-consuming task. Estimates suggest that up to 80% of the total cost of large data projects is spent on data cleaning alone. This work is often done manually by domain experts in data applications, working with data copies and limited database access. We propose a new system of update propagation to manage data cleaning transformations in such data sharing scenarios. By spreading the changes made by one user to all users working with the same data, we hope to reduce repeated manual labour and improve overall data quality. We describe a modular system design, drawing from different research areas of data management, and highlight system requirements and challenges for implementation. Our goal is not to achieve full synchronization, but to propagate updates that individual users consider valuable to their operation.

Rights

Attribution-NonCommercial-NoDerivs 2.5 Canada