UBC Theses and Dissertations
Parallel computation of data cubes Momen-Pour, Soroush
Since its proposal, data cube has attracted a great deal of attention in both academic and industry research communities. Many research papers have been published about different issues related to data cubes and many commercial OLAP (On-Line Analytical Processing) systems have been released to market with data cube operations as their core functions. Several algorithms have been proposed to compute data cubes more efficiently. PIPESORT and PIPEHASH algorithms proposed by Agrawal et. al., OVERLAP algorithm proposed by Naughton et. al., Partitioned-Cube algorithm proposed by Ross et. al. and the Multi-way Array algorithm proposed by Naughton et. al. are the most significant ones. All of these algorithms are designed for implementation on sequential machines, however computing a data cube can be an expensive task. For some organizations it may take a very powerful computer working around the clock for a week to compute all the data cubes they may want to use. Application of parallel processing can speed up this process. Despite the popularity and importance of data cubes, very little research has been carried out on the parallel computation of them. The only parallel algorithm for computation of data cubes, which I am aware of, is the algorithm proposed by Goil et. al.. Their algorithm works for cases where the data set fits in main memory, however, real world data sets rarely fit in main memory. The wide spread availability of inexpensive cluster machines makes it possible to use parallel processing for computation of data cubes, even in small size firms and as a result there could be a real demand for efficient parallel data cube construction algorithms. I have designed and implemented two parallel data cube computation algorithms (Parallel Partitioned-Cube algorithm and Parallel Single-pass Multi-way Array algorithm) based on sequential algorithms proposed in literature. The former algorithm is classified as a ROLAP (Relational OLAP) algorithm and the second one is considered as a MOLAP (Multi-dimensional OLAP) algorithm.
Item Citations and Data