Open Collections
UBC Theses and Dissertations
Generalized MDL approach for data summarization
Zhou, Xiaodong
Abstract
There are many applications that identify some data of interest. Usually, such applications just return the set of records that satisfy the criteria applied. However, such results often do not give the user enough information: a concise description is preferable to a list of individual records. Minimum Description Length (MDL) is a well-known approach to such problems. In this thesis, we extend the MDL principle to the Generalized MDL (GMDL) principle by including some "do not care" data. We apply the MDL and GMDL principles to the problem of data summarization in both the spatial case and the hierarchical case. For the spatial case, we improve an existing top-down algorithm for high-dimensional data. We also study the GMDL problem for the hierarchical case and find that there exists a unique, non-redundant, and blue-maximal MDL covering. We propose the MDL-Tree and GMDL-Tree algorithms to find the MDL covering and the GMDL covering, respectively, in the hierarchical case. The experimental results show that GMDL coverings have much shorter descriptions than MDL coverings in the hierarchical case.
Item Metadata
Title |
Generalized MDL approach for data summarization
|
Creator |
Zhou, Xiaodong
|
Publisher |
University of British Columbia
|
Date Issued |
2002
|
Extent |
2569767 bytes
|
Genre | |
Type | |
File Format |
application/pdf
|
Language |
eng
|
Date Available |
2009-10-09
|
Provider |
Vancouver : University of British Columbia Library
|
Rights |
For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use.
|
DOI |
10.14288/1.0051405
|
URI | |
Degree | |
Program | |
Affiliation | |
Degree Grantor |
University of British Columbia
|
Graduation Date |
2003-05
|
Campus | |
Scholarly Level |
Graduate
|
Aggregated Source Repository |
DSpace
|