UBC Theses and Dissertations

Generalized MDL approach for data summarization

Zhou, Xiaodong

Abstract

Many applications identify data of interest, and they usually just return the set of records that satisfy the given criteria. Such a result gives the user little insight; a concise description is preferable to a list of individual records. The Minimum Description Length (MDL) principle is a well-known approach to this problem. In this thesis, we extend the MDL principle to the Generalized MDL (GMDL) principle by including some "do not care" data. We apply the MDL and GMDL principles to the problem of data summarization in both the spatial case and the hierarchical case. For the spatial case, we improve an existing top-down algorithm for high-dimensional data. We also study the GMDL problem for the hierarchical case and find that there exists a unique, non-redundant, and maximal MDL covering. We propose the MDL-Tree and GMDL-Tree algorithms to find an MDL covering and a GMDL covering, respectively, in the hierarchical case. The experimental results show that GMDL coverings have much shorter descriptions than MDL coverings in the hierarchical case.
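
To make the hierarchical case concrete, the following is a toy sketch of how "do not care" data can shorten a description. It is not the thesis's MDL-Tree or GMDL-Tree algorithm. It assumes a covering is a set of hierarchy nodes that together contain every leaf of interest and no leaf that must be excluded, and that description length is simply the number of nodes used. The `Node` class, the three labels, the `covering` function, and the place names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical leaf labels: records we want, records we must exclude,
# and records whose inclusion does not matter ("do not care").
INTEREST, OTHER, DONT_CARE = "interest", "other", "dont_care"


@dataclass
class Node:
    name: str
    label: str = OTHER  # only meaningful for leaves
    children: list["Node"] = field(default_factory=list)


def leaves(node):
    """Return all leaf nodes in the subtree rooted at node."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def covering(node, allow_dont_care):
    """Top-down covering: emit a node as soon as its subtree holds at least
    one leaf of interest and no blocked leaf; otherwise recurse into children."""
    labels = [leaf.label for leaf in leaves(node)]
    if INTEREST not in labels:
        return []  # nothing to cover in this subtree
    blocked = {OTHER} if allow_dont_care else {OTHER, DONT_CARE}
    if not blocked.intersection(labels):
        return [node.name]  # the whole subtree can be named at once
    return [n for child in node.children for n in covering(child, allow_dont_care)]


root = Node("Canada", children=[
    Node("BC", children=[
        Node("Vancouver", INTEREST),
        Node("Victoria", INTEREST),
        Node("Kelowna", DONT_CARE),
    ]),
    Node("AB", children=[
        Node("Calgary", INTEREST),
        Node("Edmonton", OTHER),
    ]),
])

mdl_cover = covering(root, allow_dont_care=False)   # three nodes
gmdl_cover = covering(root, allow_dont_care=True)   # two nodes
```

In this toy hierarchy, treating Kelowna as "do not care" lets the single node BC stand in for Vancouver and Victoria, so the description shrinks from three nodes to two. This only illustrates the direction of the effect under the assumptions stated above, not the thesis's experimental results.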

Rights

For non-commercial purposes only, such as research, private study and education. Additional conditions apply; see the Terms of Use: https://open.library.ubc.ca/terms_of_use.