Semantic compression and pattern extraction with fascicles

UBC Theses and Dissertations

Featured Collection

UBC Theses and Dissertations

Semantic compression and pattern extraction with fascicles Madar, Jason Chun-Sing

Abstract

Often many records in a database share similar values for several attributes. If one is able to identify and group these records together that share similar values for some — even if not all — attributes, not only does one have the possibility of a more parsimonious representation of the data, but one may also gain useful insight into the data from an analysis and mining perspective. In this thesis, we introduce the notion of fascicles. A fascicle F(k,t) is a subset of records that have k compact attributes. An attribute A of a collection F of records is compact if the width of the range of A-values (for numeric attributes) or the number of distinct A-values (for categorical attributes) of all the records in F does not exceed t. We introduce and study two problems related to fascicles. First, we consider how to find fascicles such that the total storage of the relation is minimized. Second, we study how best to extract fascicles whose sizes exceed a given minimum threshold and that represent patterns of maximal quality, where quality is measured by the pair (k,t). We develop algorithms to attack both of the above problems. We show that these two problems are very hard to solve optimally. But we demonstrate empirically that good solutions can be obtained using our algorithms.

Item Metadata

Title	Semantic compression and pattern extraction with fascicles
Creator	Madar, Jason Chun-Sing
Publisher	University of British Columbia
Date Issued	1999
Description	Often many records in a database share similar values for several attributes. If one is able to identify and group these records together that share similar values for some — even if not all — attributes, not only does one have the possibility of a more parsimonious representation of the data, but one may also gain useful insight into the data from an analysis and mining perspective. In this thesis, we introduce the notion of fascicles. A fascicle F(k,t) is a subset of records that have k compact attributes. An attribute A of a collection F of records is compact if the width of the range of A-values (for numeric attributes) or the number of distinct A-values (for categorical attributes) of all the records in F does not exceed t. We introduce and study two problems related to fascicles. First, we consider how to find fascicles such that the total storage of the relation is minimized. Second, we study how best to extract fascicles whose sizes exceed a given minimum threshold and that represent patterns of maximal quality, where quality is measured by the pair (k,t). We develop algorithms to attack both of the above problems. We show that these two problems are very hard to solve optimally. But we demonstrate empirically that good solutions can be obtained using our algorithms.
Extent	4388717 bytes
Genre	Thesis/Dissertation
Type	Text
File Format	application/pdf
Language	eng
Date Available	2009-06-15
Provider	Vancouver : University of British Columbia Library
Rights	For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use.
DOI	10.14288/1.0051612
URI	http://hdl.handle.net/2429/9180
Degree (Theses)	Master of Science - MSc
Program (Theses)	Computer Science
Affiliation	Science, Faculty of; Computer Science, Department of
Degree Grantor	University of British Columbia
Graduation Date	1999-05
Campus	UBCV
Scholarly Level	Graduate
Aggregated Source Repository	DSpace

Item Media

ubc_1999-0277.pdf -- 4.19MB

Item Citations and Data

Rights

For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use.

Open Collections

UBC Theses and Dissertations

Semantic compression and pattern extraction with fascicles Madar, Jason Chun-Sing

Abstract

Item Metadata

Item Media

Item Citations and Data

Rights