UBC Theses and Dissertations
Semantic compression and pattern extraction with fascicles Madar, Jason Chun-Sing
Often many records in a database share similar values for several attributes. If one is able to identify and group these records together that share similar values for some — even if not all — attributes, not only does one have the possibility of a more parsimonious representation of the data, but one may also gain useful insight into the data from an analysis and mining perspective. In this thesis, we introduce the notion of fascicles. A fascicle F(k,t) is a subset of records that have k compact attributes. An attribute A of a collection F of records is compact if the width of the range of A-values (for numeric attributes) or the number of distinct A-values (for categorical attributes) of all the records in F does not exceed t. We introduce and study two problems related to fascicles. First, we consider how to find fascicles such that the total storage of the relation is minimized. Second, we study how best to extract fascicles whose sizes exceed a given minimum threshold and that represent patterns of maximal quality, where quality is measured by the pair (k,t). We develop algorithms to attack both of the above problems. We show that these two problems are very hard to solve optimally. But we demonstrate empirically that good solutions can be obtained using our algorithms.
Item Citations and Data