Robust variable selection for clustering and classification

UBC Theses and Dissertations

Gardener, Jordan Arthur

Abstract

Variable selection and other dimensionality reduction methods are more important than ever before, because data sets continue to grow in size. Such large data sets can be cumbersome, or even impossible, to analyse with many methods. This thesis attempts to improve upon an established method of variable selection for clustering and classification by making it robust to outliers. This is done by initializing the procedure with a mixture model of contaminated normal distributions. Under this model, each observation is assigned to a cluster, and each cluster is made up of a subgroup of good observations and a subgroup of outlying observations. The variable indicating membership in the good-observation subgroup can then be used as a weight in the calculations performed within the variable selection technique. This reduces the effect that an outlier may have on the variable selection process. To showcase the proposed methodology, the new robust version was compared against the original on both a simulated and a real data set. In these comparisons, the robust model performed better in the presence of outliers.
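The abstract does not give the exact formulas, so the following is only a hypothetical sketch of the weighting idea. It assumes each cluster follows the standard contaminated normal form, alpha * N(mu, Sigma) + (1 - alpha) * N(mu, eta * Sigma) with eta > 1, so that the posterior probability of being a good point can be computed per observation. It further assumes that this probability multiplies the cluster membership weight in a per-variable within-group variance, the kind of quantity used by variance-based variable selection methods. The function names (`good_probability`, `weighted_within_variance`) and the normalization by the total weight are illustrative assumptions, not the thesis's implementation.

```python
import numpy as np


def mvn_logpdf(x, mean, cov):
    """Log-density of a multivariate normal evaluated at each row of x."""
    d = x.shape[1]
    diff = x - mean
    chol = np.linalg.cholesky(cov)
    # Solve L y = diff^T so that the squared norm of y is the Mahalanobis distance.
    y = np.linalg.solve(chol, diff.T)
    maha = np.sum(y ** 2, axis=0)
    logdet = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def good_probability(x, mean, cov, alpha, eta):
    """Posterior probability that each row of x is a 'good' point in one
    contaminated normal component (assumed form):
        alpha * N(mean, cov) + (1 - alpha) * N(mean, eta * cov), eta > 1."""
    log_good = np.log(alpha) + mvn_logpdf(x, mean, cov)
    log_bad = np.log(1.0 - alpha) + mvn_logpdf(x, mean, eta * cov)
    # Normalize in log space for numerical stability.
    m = np.maximum(log_good, log_bad)
    return np.exp(log_good - m) / (np.exp(log_good - m) + np.exp(log_bad - m))


def weighted_within_variance(x, z, v):
    """Per-variable within-group variance in which observation i in group g is
    weighted by z[i, g] * v[i, g]. Hypothetical robust weighting: points with a
    small good-point probability v contribute little to means and variances."""
    p = x.shape[1]
    w = z * v
    W = np.zeros(p)
    for g in range(z.shape[1]):
        wg = w[:, g]
        mu = (wg @ x) / wg.sum()  # weighted group mean of each variable
        W += wg @ (x - mu) ** 2   # weighted squared deviations, summed over i
    return W / w.sum()            # assumed normalization by total weight
```

With `v` set to all ones and a single group, `weighted_within_variance` reduces to the ordinary (population) variance of each variable; an observation lying far from the cluster centre receives a good-point probability close to zero and therefore barely moves the weighted variance.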

Item Metadata

Title	Robust variable selection for clustering and classification
Creator	Gardener, Jordan Arthur
Publisher	University of British Columbia
Date Issued	2019
Genre	Thesis/Dissertation
Type	Text
Language	eng
Date Available	2019-12-23
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivatives 4.0 International
DOI	10.14288/1.0387297
URI	http://hdl.handle.net/2429/72932
Degree (Theses)	Master of Science - MSc
Program (Theses)	Mathematics
Affiliation	Arts and Sciences, Irving K. Barber School of (Okanagan); Computer Science, Mathematics, Physics and Statistics, Department of (Okanagan)
Degree Grantor	University of British Columbia
Graduation Date	2020-02
Campus	UBCO
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc-nd/4.0/
Aggregated Source Repository	DSpace