- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Semantic compression and pattern extraction with fascicles
Open Collections
UBC Theses and Dissertations
UBC Theses and Dissertations
Semantic compression and pattern extraction with fascicles Madar, Jason Chun-Sing
Abstract
Often many records in a database share similar values for several attributes. If one is able to identify and group these records together that share similar values for some — even if not all — attributes, not only does one have the possibility of a more parsimonious representation of the data, but one may also gain useful insight into the data from an analysis and mining perspective. In this thesis, we introduce the notion of fascicles. A fascicle F(k,t) is a subset of records that have k compact attributes. An attribute A of a collection F of records is compact if the width of the range of A-values (for numeric attributes) or the number of distinct A-values (for categorical attributes) of all the records in F does not exceed t. We introduce and study two problems related to fascicles. First, we consider how to find fascicles such that the total storage of the relation is minimized. Second, we study how best to extract fascicles whose sizes exceed a given minimum threshold and that represent patterns of maximal quality, where quality is measured by the pair (k,t). We develop algorithms to attack both of the above problems. We show that these two problems are very hard to solve optimally. But we demonstrate empirically that good solutions can be obtained using our algorithms.
Item Metadata
Title |
Semantic compression and pattern extraction with fascicles
|
Creator | |
Publisher |
University of British Columbia
|
Date Issued |
1999
|
Description |
Often many records in a database share similar values for several attributes. If one is able to
identify and group these records together that share similar values for some — even if not all —
attributes, not only does one have the possibility of a more parsimonious representation of the
data, but one may also gain useful insight into the data from an analysis and mining perspective.
In this thesis, we introduce the notion of fascicles. A fascicle F(k,t) is a subset of records that
have k compact attributes. An attribute A of a collection F of records is compact if the width of
the range of A-values (for numeric attributes) or the number of distinct A-values (for categorical
attributes) of all the records in F does not exceed t. We introduce and study two problems
related to fascicles. First, we consider how to find fascicles such that the total storage of the
relation is minimized. Second, we study how best to extract fascicles whose sizes exceed a given
minimum threshold and that represent patterns of maximal quality, where quality is measured by
the pair (k,t). We develop algorithms to attack both of the above problems. We show that these
two problems are very hard to solve optimally. But we demonstrate empirically that good
solutions can be obtained using our algorithms.
|
Extent |
4388717 bytes
|
Genre | |
Type | |
File Format |
application/pdf
|
Language |
eng
|
Date Available |
2009-06-15
|
Provider |
Vancouver : University of British Columbia Library
|
Rights |
For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use.
|
DOI |
10.14288/1.0051612
|
URI | |
Degree | |
Program | |
Affiliation | |
Degree Grantor |
University of British Columbia
|
Graduation Date |
1999-05
|
Campus | |
Scholarly Level |
Graduate
|
Aggregated Source Repository |
DSpace
|
Item Media
Item Citations and Data
Rights
For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use.