Generalized MDL approach for data summarization

UBC Theses and Dissertations

Generalized MDL approach for data summarization Zhou, Xiaodong

Abstract

Many applications identify data of interest. Usually, such applications just return the set of records that satisfy the criteria applied. However, such results do not give the user enough information: a concise description is preferable to the individual records. Minimum Description Length (MDL) is a well-known approach to such problems. In this thesis, we extend the MDL principle to the Generalized MDL (GMDL) principle by including some "do not care" data. We apply the MDL and GMDL principles to the problem of data summarization in both the spatial case and the hierarchical case. For the spatial case, we improve an existing top-down algorithm for high-dimensional data. We also study the GMDL problem for the hierarchical case and find that there exists a unique, non-redundant, and maximal MDL covering. We propose the MDL-Tree and GMDL-Tree algorithms to find the MDL covering and the GMDL covering, respectively, in the hierarchical case. The experimental results show that GMDL coverings have much shorter descriptions than MDL coverings in the hierarchical case.

Item Metadata

Title	Generalized MDL approach for data summarization
Creator	Zhou, Xiaodong
Publisher	University of British Columbia
Date Issued	2002
Extent	2569767 bytes
Genre	Thesis/Dissertation
Type	Text
File Format	application/pdf
Language	eng
Date Available	2009-10-09
Provider	Vancouver : University of British Columbia Library
Rights	For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use.
DOI	10.14288/1.0051405
URI	http://hdl.handle.net/2429/13859
Degree (Theses)	Master of Science - MSc
Program (Theses)	Computer Science
Affiliation	Science, Faculty of; Computer Science, Department of
Degree Grantor	University of British Columbia
Graduation Date	2003-05
Campus	UBCV
Scholarly Level	Graduate
Aggregated Source Repository	DSpace

Item Media

ubc_2003-0047.pdf -- 2.45MB
