Open Collections
UBC Theses and Dissertations
Robust variable selection for clustering and classification
Gardener, Jordan Arthur
Abstract
Variable selection and other dimensionality reduction methods are more important than ever. Data sets keep growing, and very large data sets can be cumbersome, or even impossible, to analyse with many methods. This thesis attempts to improve upon an established method of variable selection for clustering and classification by making it robust to outliers. This is done by initializing with a mixture model of contaminated normal distributions. Under this model, each observation is assigned to a cluster, and each cluster is split into a subgroup of good observations and a subgroup of outliers. The variable indicating membership in the good-observation subgroup can then be used as a weight in the calculations of the variable selection technique, which reduces the effect an outlier may have on the selection process. To showcase the proposed methodology, the robust version was compared against the original on both a simulated and a real data set. In these comparisons, the robust model performed better in the presence of outliers.
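The weighting idea in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes a single univariate cluster, and the parameter names and values (`alpha` for the proportion of good observations, `eta` for the variance inflation of the outlier subgroup, `mean`, `var`) are illustrative assumptions fixed by hand rather than estimated from data. The sketch computes, for each observation, the probability of belonging to the good subgroup and uses it as a weight in a simple mean.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def good_probability(x, mean, var, alpha=0.9, eta=10.0):
    """Probability that x belongs to the 'good' subgroup of one cluster.

    Assumed contaminated normal density for the cluster:
        f(x) = alpha * N(x; mean, var) + (1 - alpha) * N(x; mean, eta * var)
    where alpha is the proportion of good observations and eta > 1
    inflates the variance of the outlier subgroup.
    """
    good = alpha * normal_pdf(x, mean, var)
    bad = (1.0 - alpha) * normal_pdf(x, mean, eta * var)
    return good / (good + bad)


def weighted_mean(values, weights):
    """Mean of values in which each value counts in proportion to its weight."""
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Five well-behaved points around 0 and one gross outlier (hypothetical data).
data = [-1.0, -0.5, 0.0, 0.5, 1.0, 12.0]

# Illustrative parameter values; in practice they would be estimated.
weights = [good_probability(x, mean=0.0, var=1.0) for x in data]

plain = sum(data) / len(data)         # pulled towards the outlier
robust = weighted_mean(data, weights)  # outlier receives a weight near zero
```

Here the outlier at 12 gets a weight close to zero, so the weighted mean stays near the centre of the well-behaved points while the unweighted mean is pulled towards the outlier. The same weights could be passed to any other statistic that a variable selection procedure relies on.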
Item Metadata
Title | Robust variable selection for clustering and classification
Creator | Gardener, Jordan Arthur
Publisher | University of British Columbia
Date Issued | 2019
Genre | |
Type | |
Language | eng
Date Available | 2019-12-23
Provider | Vancouver : University of British Columbia Library
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International
DOI | 10.14288/1.0387297
URI | |
Degree | |
Program | |
Affiliation | |
Degree Grantor | University of British Columbia
Graduation Date | 2020-02
Campus | |
Scholarly Level | Graduate
Rights URI | |
Aggregated Source Repository | DSpace