UBC Theses and Dissertations

Hierarchical summaries of change in multidimensional data

Kim, Alexandra

Abstract

Multidimensional data is prevalent in data warehousing and OLAP. Changes to the data in a data warehouse are of particular interest to analysts and knowledge workers, since they may be instrumental in understanding trends in a business or enterprise. At the same time, an explicit enumeration of all changes to the fact tuples in a data warehouse is too verbose to be useful; a summary of the changes is more desirable, since it allows the user to quickly understand the trends and patterns. In this thesis, we study the problem of summarizing changes in hierarchical multidimensional data with non-overlapping but containable hyperrectangles. An advantage of such summaries is that they naturally follow a tree structure and are therefore arguably easy to interpret. Hierarchies are naturally present in data warehouses, and our summaries are designed to leverage the hierarchical structure along each dimension to describe the changes concisely. We study the problem of generating both lossless and lossy summaries of changes. While lossless summaries allow the exact changes to be recovered, lossy summaries trade accuracy of reconstruction for conciseness. In a lossy summary, the maximum amount of lossiness per tuple can be regulated with a single parameter α. We provide a detailed analysis of our algorithm and empirically evaluate its performance against existing alternative methods under various settings. A detailed set of experiments on real and synthetic data demonstrates that our algorithm outperforms the baselines in terms of conciseness of summaries or accuracy of reconstruction with respect to the size, dimensionality, and level of correlation in the data.
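As a rough illustration of the kind of summary the abstract describes, the sketch below represents each hyperrectangle as one hierarchy node per dimension and reconstructs a tuple's change from the most specific rectangle that contains it. This is not the thesis's algorithm: the reconstruction rule, the treatment of uncovered tuples as unchanged, the reading of α as a bound on absolute per-tuple error, and all names and data (`contains`, `reconstruct`, `max_error`, the location and product hierarchies) are assumptions made for illustration only.

```python
from typing import Dict, Tuple

Path = Tuple[str, ...]     # path from a hierarchy root down to a node
Region = Tuple[Path, ...]  # one hierarchy node per dimension (a hyperrectangle)
Cell = Tuple[Path, ...]    # one leaf path per dimension (a fact tuple)


def contains(region: Region, cell: Cell) -> bool:
    """True if, in every dimension, the region's node is an ancestor-or-self of the cell's leaf."""
    return all(leaf[:len(node)] == node for node, leaf in zip(region, cell))


def reconstruct(summary: Dict[Region, float], cell: Cell) -> float:
    """Change implied by the summary: the value of the most specific region containing the cell."""
    covering = [r for r in summary if contains(r, cell)]
    if not covering:
        return 0.0  # assumption: tuples outside every region are treated as unchanged
    # Regions are nested or disjoint, so the deepest covering region is unique.
    most_specific = max(covering, key=lambda r: sum(len(node) for node in r))
    return summary[most_specific]


def max_error(summary: Dict[Region, float], changes: Dict[Cell, float]) -> float:
    """Largest absolute per-tuple reconstruction error (assumed meaning of the alpha bound)."""
    return max(abs(reconstruct(summary, c) - v) for c, v in changes.items())


# Hypothetical changes over two dimensions: location and product.
changes: Dict[Cell, float] = {
    (("CA", "BC"), ("Elec", "Phone")): 10.0,
    (("CA", "BC"), ("Elec", "TV")): 10.0,
    (("CA", "ON"), ("Elec", "Phone")): 10.0,
    (("CA", "ON"), ("Elec", "TV")): 4.0,
}

# Lossless: one broad rectangle plus one nested exception (a two-node tree).
lossless: Dict[Region, float] = {
    (("CA",), ("Elec",)): 10.0,
    (("CA", "ON"), ("Elec", "TV")): 4.0,
}

# Lossy: a single rectangle whose value stays within alpha of every tuple's change.
alpha = 3.0
lossy: Dict[Region, float] = {(("CA",), ("Elec",)): 7.0}

print(max_error(lossless, changes), max_error(lossy, changes) <= alpha)
```

In this toy data, the lossless summary recovers all four changes from two rectangles, while the lossy summary uses a single rectangle and keeps every tuple's error within the chosen α.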

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International