- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Bayesian cluster validation
Open Collections
UBC Theses and Dissertations
UBC Theses and Dissertations
Bayesian cluster validation Koepke, Hoyt Adam
Abstract
We propose a novel framework based on Bayesian principles for validating clusterings and present efficient algorithms for use with centroid or exemplar based clustering solutions. Our framework treats the data as fixed and introduces perturbations into the clustering procedure. In our algorithms, we scale the distances between points by a random variable whose distribution is tuned against a baseline null dataset. The random variable is integrated out, yielding a soft assignment matrix that gives the behavior under perturbation of the points relative to each of the clusters. From this soft assignment matrix, we are able to visualize inter-cluster behavior, rank clusters, and give a scalar index of the the clustering stability. In a large test on synthetic data, our method matches or outperforms other leading methods at predicting the correct number of clusters. We also present a theoretical analysis of our approach, which suggests that it is useful for high dimensional data.
Item Metadata
Title |
Bayesian cluster validation
|
Creator | |
Publisher |
University of British Columbia
|
Date Issued |
2008
|
Description |
We propose a novel framework based on Bayesian principles for validating clusterings and present efficient algorithms for use with centroid or exemplar based clustering solutions. Our framework treats the data as fixed and introduces perturbations into the clustering procedure. In our algorithms, we scale the distances between points by a random variable whose distribution is tuned against a baseline null dataset. The random variable is integrated out, yielding a soft assignment matrix that gives the behavior under perturbation of the points relative to each of the clusters. From this soft assignment matrix, we are able to visualize inter-cluster behavior, rank clusters, and give a scalar index of the the clustering stability. In a large test on synthetic data, our method matches or outperforms other leading methods at predicting the correct number of clusters. We also present a theoretical analysis of our approach, which suggests that it is useful for high dimensional data.
|
Extent |
4024259 bytes
|
Genre | |
Type | |
File Format |
application/pdf
|
Language |
eng
|
Date Available |
2008-08-25
|
Provider |
Vancouver : University of British Columbia Library
|
Rights |
Attribution-NonCommercial-NoDerivatives 4.0 International
|
DOI |
10.14288/1.0051357
|
URI | |
Degree | |
Program | |
Affiliation | |
Degree Grantor |
University of British Columbia
|
Graduation Date |
2008-11
|
Campus | |
Scholarly Level |
Graduate
|
Rights URI | |
Aggregated Source Repository |
DSpace
|
Item Media
Item Citations and Data
Rights
Attribution-NonCommercial-NoDerivatives 4.0 International