UBC Theses and Dissertations

Robust variable selection for clustering and classification

Gardener, Jordan Arthur

Abstract

Variable selection and other dimensionality reduction methods are more important than ever. Data sets continue to grow in size, and these huge data sets can be cumbersome, or even impossible, to analyse with many methods. This thesis attempts to improve upon an established method of variable selection for clustering and classification by making it robust to outliers. This is done by initializing with a mixture model of contaminated normal distributions. Under this model, each observation is assigned to a cluster, and each cluster is made up of a subgroup of good observations and a subgroup of outlier observations. The variable indicating membership in the good-observation subgroup can then be used as a weight in the calculations performed within the variable selection technique. This reduces the effect that an outlier has on the variable selection process. To showcase the proposed methodology, the robust version was compared against the original on both a simulated and a real data set. In these comparisons, the robust model performed better in the presence of outliers.
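The abstract does not give the model's equations, so the following is only a minimal sketch under stated assumptions. It assumes the common contaminated normal form for one variable within one cluster, alpha * N(mu, sigma^2) + (1 - alpha) * N(mu, eta * sigma^2), where alpha is the proportion of good observations and eta > 1 inflates the variance of the outliers. The posterior probability that an observation is good then plays the role of the weight described above, and it is used here to down-weight a variance calculation. The function names and parameter values are hypothetical; the parameters are taken as known rather than estimated by EM, and the weighted variance stands in for whatever statistic the thesis's variable selection technique actually uses.

```python
import numpy as np

def normal_pdf(x, mean, var):
    """Univariate normal density N(mean, var) evaluated at x."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def good_probability(x, mean, var, alpha, eta):
    """Hypothetical helper: posterior probability that each x is a 'good'
    point under alpha*N(mean, var) + (1 - alpha)*N(mean, eta*var)."""
    good = alpha * normal_pdf(x, mean, var)
    bad = (1.0 - alpha) * normal_pdf(x, mean, eta * var)
    return good / (good + bad)

def weighted_variance(x, weights):
    """Variance of x in which each observation counts in proportion to its weight."""
    m = np.average(x, weights=weights)
    return np.average((x - m) ** 2, weights=weights)

# Illustrative data: five typical values and one obvious outlier at 8.0.
x = np.array([-1.0, -0.5, 0.0, 0.5, 1.0, 8.0])
v = good_probability(x, mean=0.0, var=1.0, alpha=0.95, eta=20.0)

print("good-membership weights:", np.round(v, 4))
print("weighted variance:", weighted_variance(x, v))
print("unweighted variance:", np.var(x))
```

Down-weighting rather than deleting keeps every observation in the calculation: the outlier receives a weight near zero, so the weighted variance stays close to that of the typical values, while the unweighted variance is inflated by the single extreme point.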

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International