Robust and Sparse Regression in the Presence of Cellwise and Casewise Contamination With Application in Data Quality Modelling

by

Glenn McGuinness

B.A.Sc., University of British Columbia, 2017

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

Master of Science

in

THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES
(Statistics)

The University of British Columbia
(Vancouver)

April 2020

© Glenn McGuinness, 2020

The following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, the thesis entitled:

Robust and Sparse Regression in the Presence of Cellwise and Casewise Contamination With Application in Data Quality Modelling

submitted by Glenn McGuinness in partial fulfillment of the requirements for the degree of Master of Science in Statistics.

Examining Committee:

Ruben Zamar, Statistics (Supervisor)
Matías Salibián-Barrera, Statistics (Additional Examiner)

Abstract

This thesis considers the problem of robust and sparse estimation of linear regression parameters in data with structural and independent contamination. Independent outliers can propagate in data with relatively large numbers of dimensions, resulting in a high fraction of observations with at least one outlying cell. Recent work has shown that traditional robust regression methods are not highly robust to such outliers.

We investigate the application of Robust Least Angle Regression (RLARS) to data with independent contamination. We also propose two modified versions of RLARS to further improve its performance. The first method applies RLARS to data which has been filtered of independent outliers. The second method performs RLARS with the Lasso modification for Least Angle Regression (LARS). Extensive simulations show that RLARS is resilient to structural and independent contamination. Compared with RLARS, simulation results show that the first modified version has significantly improved robustness to independent contamination and the second modified version has improved robustness when there are a large number of predictors.

We also consider the application of the proposed methods to data quality modelling in a case study for MineSense Technologies Ltd. (MineSense). MineSense develops sensor packages for use in the harsh conditions of an active mine. To maintain high system availability and performance, data must be monitored for a deterioration in sensor health or a change in the data generating process, such as a change in ore body, either of which can manifest as outliers. We pose the problem of contamination detection, the identification of whether a dataset contains outliers, as a distinct problem from outlier detection, the identification of which cases or cells are outliers. We propose a contamination detection method based on the comparison of robust and non-robust linear regression estimates. When outliers are present, the robust and non-robust estimates differ significantly, indicating the presence of contamination. Simulation results and analysis of real sensor data provided by MineSense suggest that our method can effectively detect the presence of contamination with a low false detection rate.

Lay Summary

In regression analysis, data quality problems are common. To perform analysis on data with quality problems, a statistical model of the issue, called a contamination model, is required. Traditionally, a structural contamination model has been used. However, for high-dimensional data the fully independent contamination model is often more appropriate.
Recent work has shown that classical robust regression methods may fail when applied to such data. Hence, in this thesis we propose the application of Robust Least Angle Regression (RLARS) to data with independent contamination. We also propose two modified versions of RLARS to further improve performance. We apply the proposed methods to a data quality modelling case study for MineSense Technologies. We further propose a method to detect the presence of structural and independent outliers in a dataset. A real data study demonstrates that this method effectively detects contamination with a low false detection rate.

Preface

This dissertation is original, unpublished work by the author, Glenn McGuinness, under the supervision of Professor Ruben Zamar. The data for the case study contained in the thesis was provided by MineSense Technologies Ltd. All simulations and analyses were carried out by the author.

Table of Contents

Abstract
Lay Summary
Preface
Table of Contents
List of Tables
List of Figures
Glossary
Acknowledgments
Dedication
1 Introduction
    1.1 Motivation
    1.2 Robust and Sparse Linear Regression in the Presence of Structural and Independent Contamination
    1.3 Case Study: Data Quality Modelling Using Robust Linear Regression
    1.4 Organization of Subsequent Chapters
2 Robust Least Angle Regression in the Presence of Structural and Independent Contamination
    2.1 Introduction
    2.2 Review: Least Angle Regression
        2.2.1 Incremental Forward Subset Selection
        2.2.2 LARS and Stagewise
        2.2.3 Derivation of LARS
        2.2.4 LARS Summary
    2.3 Review: Robust Least Angle Regression
        2.3.1 Variable Sequencing
        2.3.2 Variable Segmentation
    2.4 RLARS in the Presence of Independent Contamination
    2.5 Conclusion
3 Robust Least Angle Regression Using Pre-Filtered Data
    3.1 Introduction
    3.2 Review: Detecting Deviating Data Cells
    3.3 RLARS With the DDC Imputed Data Matrix
    3.4 Conclusion
4 Robust Least Angle Regression with Lasso Modification
    4.1 Introduction
    4.2 Review: LARS Lasso Modification
    4.3 RLARS-Lasso Modification
    4.4 Selecting the Penalty Parameter
    4.5 Conclusion
5 Simulation Study
    5.1 Introduction
    5.2 Simulation Settings
    5.3 Performance Measures
    5.4 Simulation Results
    5.5 Conclusion
6 Case Study: Data Quality Modelling Using Robust Linear Regression
    6.1 Introduction
    6.2 Data Quality Modelling Using Robust Linear Regression
    6.3 Simulation Study
        6.3.1 Simulation Settings
        6.3.2 Performance Measures
        6.3.3 Simulation Results
    6.4 MineSense Real Data Case Study
    6.5 Conclusion
7 Conclusion
Appendices
A Robust Estimators Used in Detecting Deviating Data Cells
B Data Quality Modelling Simulation Additional Figures

List of Tables

Table 5.1  Expected fraction of contaminated cases as n → ∞ for p = 50, 450, 1000 for different levels of independent contamination εi.
Table 5.2  Mean and standard error of MSPE, PR, and RC across replicates with no contamination for data generation model (a) for p = 50, 450, and 1000. The MSPE is shown plus/minus the standard error. The best method for each performance measure is highlighted.
Table 5.3  Mean and standard error of MSPE, PR, and RC across replicates with no contamination for data generation model (c) for p = 50, 450, and 1000. The MSPE is shown plus/minus the standard error. The best method for each performance measure is highlighted.
Table 5.4  Mean and standard error of MSPE, PR, and RC across replicates for simulation settings (i)-(iii) with shift normal casewise contamination with and without high-leverage values. The MSPE is shown plus/minus the standard error. The best method for each performance measure is highlighted.
Table 5.5  Mean MSPE, PR, and RC across replicates for simulation settings (a)-(c) with symmetric slash contamination with and without high-leverage values. The MSPE is shown plus/minus the standard error. The best method for each performance measure is highlighted.
Table 5.6  Mean MSPE, RC, and PC across replicates for simulation settings (a)-(c) with clustered outliers for a range of magnitude parameters η. The MSPE is shown plus/minus the standard error. The best method for each performance measure is highlighted.
Table 5.7  Mean MSPE, PR, and RC across replicates for simulation settings (a)-(c) with a fraction εi = 0.014 of independent outliers for a range of magnitude parameters k. The MSPE is shown plus/minus the standard error. The best method for each performance measure is highlighted.
Table 5.8  Mean MSPE, PR, and RC across replicates for simulation settings (a)-(c) with a fraction εi = 0.054 of independent outliers for a range of magnitude parameters k. The MSPE is shown plus/minus the standard error. The best method for each performance measure is highlighted.
Table 6.1  Definition of outcomes of contamination detection. A detection of contamination is defined as a positive and no detection as a negative.
Table 6.2  Mean and standard deviation of the specificity and recall across replicates for data with clustered outliers for a range of fractions of contamination, ε, and magnitude parameters η.
Table 6.3  Mean and standard deviation of the specificity and recall across replicates for data with independent outliers for a range of fractions of contamination, εi, and magnitude parameters k.
Table 6.4  Recall over 100 random splits for the one variable contamination model detection with a 95% bootstrap quantile cut-off. The specificity is 0.95.
Table 6.5  Recall over 100 random splits for the three variable contamination model detection with a 95% bootstrap quantile cut-off. The specificity is 0.92.
Table 6.6  Recall over 100 random splits for the combined contamination detection model with a 95% bootstrap quantile cut-off. The specificity is 0.87.
Table 6.7  Recall over 100 random splits for the one variable contamination model detection with a 90% bootstrap quantile cut-off. The specificity is 0.88.
Table 6.8  Recall over 100 random splits for the three variable contamination model detection with a 90% bootstrap quantile cut-off. The specificity is 0.86.

List of Figures

Figure 1.1  Expected fraction of cases that contain at least one outlying cell, ε, for p from 1 to 1000 when each cell has a probability εi = 0.01 of being contaminated.
Figure 2.1  Example of univariate Winsorization with c = 2. The bivariate outliers in this example will not be shrunk at all by univariate Winsorization.
Figure 2.2  Example of bivariate Winsorization with c = 5.99. Bivariate outliers are shrunk to the border of the ellipse.
Figure 2.3  Example of adjusted Winsorization with c = 2. Bivariate outliers are shrunk to the edge of the small box.
Figure 5.1  Mean MSPE for a range of independent contamination levels. The maximum standard error of the MSPE in all cases shown is 0.64.
Figure 5.2  The mean rank of the MSPE across all replicates for a range of p with ε = 0.014. The maximum standard error of the MSPE rank amongst cases shown is 1.05.
Figure 5.3  The mean rank of the MSPE across all replicates for a range of p with ε = 0.054. The maximum standard error of the MSPE rank amongst all cases shown is 1.27.
Figure 5.4  Binary decision tree describing which methods are recommended for different types of data.
Figure B.1  Sensitivity across simulation replicates for detection of data contaminated with clustered outliers for a range of fractions of contaminated cases and magnitude parameters η. These plots present the results for the same simulations as Table 6.2.
Figure B.2  Specificity across simulation replicates for detection of data contaminated with clustered outliers for a range of fractions of contaminated cases and magnitude parameters η. These plots present the results for the same simulations as Table 6.2.
Figure B.3  Sensitivity across simulation replicates for detection of data contaminated with independent outliers for a range of fractions of contaminated cells and magnitude parameters k. These plots present the results for the same simulations as Table 6.3.
Figure B.4  Specificity across simulation replicates for detection of data contaminated with independent outliers for a range of fractions of contaminated cells and magnitude parameters k. These plots present the results for the same simulations as Table 6.3.

Glossary

DDC          Detecting Deviating Data Cells
DDC-RLARS    Robust Least Angle Regression using the Detecting Deviating Data Cells imputed data matrix
LARS         Least Angle Regression
RLARS        Robust Least Angle Regression
RLARS-LASSO  Robust Least Angle Regression with Lasso Modification
STAGEWISE    Incremental Forward Subset Selection
STEPWISE     Forward Stepwise Regression

Acknowledgments

I am deeply and sincerely grateful for the guidance, patience, and support of my supervisor, Professor Ruben Zamar, throughout my research. Many times when I was feeling disheartened by a problem in my research, he provided me with the motivation and statistical insight that I needed to push through. I hope to one day emulate Ruben's qualities as a supervisor and mentor. I would also like to thank Professor Matías Salibián-Barrera for being my second reader and providing me with help in many areas throughout my graduate studies. Whenever it was needed, he was there with a clear and concise answer.
I would also like to offer my gratitude and thanks to Anthony-AlexanderChristidis. He was always available at any hour of the day with help, advice, and feedback.I would like to thank MineSense Technologies Ltd. for funding my research. My most sincerethanks go to Dr. David Turner, the research manager at MineSense and the industrial supervisorof my project. From his work organizing, formulating, and reviewing my research to his adviceon navigating graduate school, David has helped me at every stage of my studies and I am forevergrateful.xivDedicationTo Emma, my partner, who was with me every step of the way, offering support and encouragement.She always found a way to make me laugh through all of the long days and nights.To my parents, Janice and Bryan McGuinness. Their hard work and sacrifice made me who Iam.xvChapter 1Introduction1.1 MotivationConsider the linear regression dataset (X,y) where X ∈Rn×p is a matrix of predictor variables andy ∈ Rn is the vector of responses. We assume the data follow the linear regression model, y =Xβ +ε where β ∈Rp is a vector of coefficients and ε = (ε1, . . . ,εn)′ is a vector of error terms. Letεi be i.i.d. and independent of the predictors xi1, . . . ,xi,p for all i= 1, . . . ,n. Improvements in sensorand computer technology has led the number of variables, p, to exceed the number of observations,n. Such data are called high-dimensional. However, the number s of variables correlated withthe response is frequently very small relative to the number of observations. Sparse methods canimprove estimator variance when p/n is large and s/n is small by selecting a subset of variables toinclude in the model.In addition to being sparse, it is often important for a method to be robust to outliers. Classicalsparse linear regression methods can fit the majority of the data very poorly when even a smallnumbers of outliers are present. When good data quality can not be guaranteed, it is thus desirableto have a method that is both sparse and robust.In this thesis, we consider the problem of data quality and how it relates to robust and sparseestimation of linear regression models. As a first step, a statistical model for the data must beselected. The model will be used to evaluate the performance of the considered linear regressionmethods so an unrealistic or inappropriate choice may result in poor performance on real data.Classical models assume that the randomness in data follow some “well-behaved” distribution,such as a Gaussian or Poisson distribution. However, such models do not incorporate outliers andare thus not sufficient for robust methods.1In robust statistics, data are modelled using contamination models. A contamination modelassumes that some fraction of the data may contain atypical values, called outliers. Almost allclassical robust methods for linear regression have modelled data quality using the Tukey-HuberContamination Model (THCM) (Alqallaf, Van Aelst, et al., 2009). However, the THCM is oftennot appropriate for high-dimensional data.The THCM assumes that a minority of observations are randomly contaminated with outliers.As the THCM assumes that contamination occurs at the observation (case) level, such outliers arecalled structural or casewise contamination. Under the THCM, data follow the distributionFε = (1− ε)F + εG. (1.1)Observations are drawn from some uncontaminated distribution, F , with probability 1− ε andsome outlier distribution, G, with probability ε . 
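To make the casewise model concrete, the following is a minimal sketch, not taken from the thesis, of how a regression dataset contaminated under the THCM might be simulated. The clean distribution F, the outlier distribution G, and all parameter values below are illustrative choices of ours; the point is only that each row is either entirely clean or entirely replaced by a draw from G.

    import numpy as np

    rng = np.random.default_rng(0)
    n, p, eps = 200, 5, 0.1                    # sample size, dimension, casewise contamination fraction
    beta = np.array([2.0, 1.0, 0.0, 0.0, 0.0])

    X = rng.standard_normal((n, p))            # clean predictors drawn from F
    y = X @ beta + rng.standard_normal(n)      # linear model with N(0, 1) errors

    # Under the THCM, each case (row) is replaced by a draw from the outlier
    # distribution G with probability eps; here G shifts the whole row to 10,
    # producing high-leverage casewise outliers in the predictors.
    is_outlier = rng.random(n) < eps
    X[is_outlier] = 10.0 + rng.standard_normal((int(is_outlier.sum()), p))

    print("fraction of contaminated cases:", is_outlier.mean())

With probability 1 - ε a row is left untouched, so roughly 90% of the cases above still follow the clean model, which is why down-weighting whole observations is a sensible strategy under this model.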
Highly robust affine equivariant linear regressionmethods developed under the THCM, such as S-regression (Rousseeuw and Yohai, 1984), M-regression (Huber, 1973), or MM-regression (Yohai, 1987), proceed by down-weighting outlyingobservations. Using this approach, methods can be robust to at most 50% outliers (e.g. Ricardo AMaronna, Martin, et al., 2019). Down-weighting outlying observations is a reasonable approachunder the THCM because it is assumed that a majority of observations are not contaminated.However, the assumption that the majority of observations are uncontaminated has been crit-icized when applied to high-dimensional data. When p is large and individual cells have somesmall probability of being contaminated independent of other cells in the same observation thenthe fraction of outlying cases can be much greater than 0.5 (Alqallaf, Van Aelst, et al., 2009).To address the limitations of the Tukey-Huber contamination model for data with high p, Alqal-laf, Van Aelst, et al. (2009) proposed a different contamination model – the independent, or cell-wise, contamination model. Under this model, each cell in a data table has some probability ofbeing contaminated independently of other cells in the same observation.To provide an intuitive example, imagine a machine is collecting measurements of some sam-ples with several sensors. Structural contamination is analogous to all sensors having some chanceof failing at the same time. Independent contamination is analogous to each sensor having a chanceof failing individually.As each covariate is contaminated independently, a small fraction of contaminated cells canresult in a large fraction of contaminated observations. This is called outlier propagation. Outlierpropagation is a serious problem for high-dimensional data. The probability that an observation2Figure 1.1: Expected fraction of cases that contain at least one outlying cell, ε , for p from 1to 1000 when each cell has a probability εi = 0.01 of being contaminated.contains at least one outlying cell isε = 1− (1− εi)p (1.2)where εi is the probability that a cell is independently contaminated and ε is the probability that anobservation is contaminated. For low εi, even a moderately large p will result in a ε that exceeds0.5, breaking down regression estimators developed under the THCM. For example, Figure 1.1shows independent to structural error propagation for εi = 0.01 for a range of p. For only p = 100dimensions, the probability that a case is contaminated is 0.64.High-dimensional datasets often have over a thousand dimensions, so even tiny amounts of in-dependent contamination can cause robust linear regression estimators developed under the THCMto fail. Thus, a sparse linear regression method that is robust to both independent and structuralcontamination is needed.1.2 Robust and Sparse Linear Regression in the Presence ofStructural and Independent ContaminationThe fully independent contamination model was only recently proposed by Alqallaf, Van Aelst,et al. (2009), so few methods exist for robust regression for data with independent contamina-tion. O¨llerer et al. (2016) recently proposed the Shooting S-estimator, named after the shooting(coordinate descent) algorithm (Fu, 1998). In coordinate descent, gradient descent is iterativelyperformed along each variable until convergence. As gradient descent is performed coordinate-wise, the Shooting S-estimator can weight cells in the same observation differently. 
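Before continuing with the review of existing methods, it is worth checking the outlier propagation formula (1.2) numerically. The few lines below are ours and purely illustrative; they reproduce the behaviour shown in Figure 1.1 for a cellwise contamination probability of 1%.

    # Fraction of contaminated cases implied by equation (1.2) when each cell
    # is independently contaminated with probability 0.01, as in Figure 1.1.
    eps_cell = 0.01
    for p in (1, 50, 100, 450, 1000):
        eps_case = 1 - (1 - eps_cell) ** p
        print(p, round(eps_case, 3))
    # p = 100 already gives roughly 0.63, so a casewise method with a 50%
    # breakdown point is overwhelmed at quite moderate dimensions.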
However, even3though coordinate descent is commonly used to fit regularized regression estimators, the ShootingS-estimator is not regularized and is not a sparse method.Leung et al. (2016) proposed the 3S-Estimator, so called for the three steps that compose theprocedure. The 3S-Estimator first uses a univariate filter to replace independent outliers with miss-ing values. A robust estimate of scatter and location is then used to down-weight casewise out-liers. Finally, robust location and scatter estimates are used to compute regression coefficients.3S-regression is also not sparse and, moreover, can not be computed when p> n.Classical approaches to sparse linear regression include Incremental Forward Stagewise Re-gression, the Least absolute shrinkage and selection operator (Lasso) (Tibshirani, 1996), and Elas-tic Net (Zou and Hastie, 2005). Incremental Forward Stagewise (FS) regression is a boostingalgorithm that generates a coefficient profile by iteratively updating the regression coefficient mostcorrelated with the residuals by a small quantity (e.g. Freund et al., 2013). By selecting the numberand size of update, FS can produce a regularized solution. However, FS uses the sample correlationto determine the direction of the update, which is not robust to outliers. The resulting estimator istherefore sensitive to contamination.The least squares estimator with a Lasso or an Elastic Net penalty are the minimizer to a lossfunction of the form,L (β ,λ ) = Pr(y−Xβ )+λPs(β ) (1.3)where the function Pr : Rn −→ R is a residual scale penalty, λ is the sparsity parameter, and thefunction Pd(β ) : Rp −→ R is a regression coefficient penalty. Lasso and Elastic Net are sparsedue to their coefficient penalties. Lasso (Tibshirani, 1996) penalizes the L1 norm of the regressioncoefficients while Elastic Net (Zou and Hastie, 2005) penalizes a linear combination of the L1 normand the L2 norms. For linear regression, both Lasso and Elastic Net penalized linear regressionestimators use a sum of squares residual penalty. It is well known that the sum of squares residualpenalty are sensitive to outliers (e.g. Freue et al., 2017). As a result, the least squares estimatorwith either of the Lasso or Elastic Net penalty are not robust to contamination.Some methods for robust and sparse regression have been developed, but the field is still youngand existing methods were developed under the THCM. The first robust and sparse regression es-timator was Robust Least Angle Regression (RLARS) proposed by Khan et al. (2007). RLARSreplaces the non-robust sample means and sample correlations used to calculate Least Angle Re-gression (LARS) with robust alternatives. RLARS has some properties that make it attractive forapplication to data with independent contamination, which we will discuss later in this section.Later, Alfons et al. (2013) extended the Least Trimmed Squares estimator (Rousseeuw, 1984)4to include an L1 penalty of the regression coefficients (SparseLTS). This method has been demon-strated to have good robustness properties, but can only be tuned for either high robustness or highefficiency under normality, but not both. Moreover, SparseLTS down-weights atypical observationsand is thus not robust to independent contamination.Ricardo A Maronna (2011) proposed an MM-estimator for Ridge Regression (MMRidge) thatcan be both efficient and robust. This method replaces the sum of squares residual error of RidgeRegression with a robust penalty function. 
While MMRidge performs well on data with structuralcontamination, it is not sparse.Recently, Smucler and Yohai (2017) proposed MMLasso, an MM-estimator with an L1 penalty,and Freue et al. (2017) proposed PENSEM, an MM-estimator with an Elastic Net penalty. Thesemethod has been shown to perform better than existing regularized robust regression methods indata with structural contamination. However, MMLasso and PENSEM gain robustness by down-weighting atypical observations and are thus sensitive to independent outliers.As alluded to before, RLARS has properties that are desirable for regression under the inde-pendent contamination model. RLARS robustly sequences variables into a series of nested sub-models and then selects the optimal model from MM-regression estimators fit along the sequence.MM-regression estimators achieve robustness by down-weighting atypical observations. Thus, theoptimal model along the sequence will be vulnerable to outlier propagation in the active variables.However, in sparse data s/p is small, so outlier propagation is a much less severe problem whenonly considering the set A= { j : β j > 0} of the indices of variables that are related to the response,called the active set. As a result, RLARS is less sensitive to independent contamination than otherrobust and sparse linear regression methods.In summary, no existing sparse linear regression method is robust to both structural and inde-pendent contamination. Therefore, our goal is to adapt RLARS to be robust to independent andstructural contamination.In this study, we present several robust and sparse linear regression methods for applicationto data with structural and independent contamination. We consider the application of RLARSto data with structural and independent contamination. Further, we introduce two separate mod-ified versions of RLARS that improve predictive performance in different scenarios. First, wepropose pre-filtering the data matrix using the Detecting Deviating Data Cells (DDC) procedureproposed by Rousseeuw and Bossche (2018) before applying RLARS, called DDC-RLARS. TheDDC-RLARS estimator is recommended when diagnostic tools suggest the fraction of independentoutliers exceeds 5%, so εi > 0.05. Second, we propose a modification of RLARS with the LARSLasso modification, called RLARS-Lasso. RLARS-Lasso replaces the sample means and correla-5tions in the LARS Lasso modification with robust counterparts. The RLARS-Lasso estimator isrecommended for application to moderately high-dimensional data, approximately p/n> 10.1.3 Case Study: Data Quality Modelling Using Robust LinearRegressionIn this thesis, we also consider the problem of directly modelling data quality in a linear regressiondataset as a case study for MineSense Technologies Ltd. (MineSense). MineSense designs, builds,and deploys sensor-based systems that provide real-time grade telemetry to mines as ore is collectedand before it is transported to downstream processing facilities. We aim to detect the presence ofindependent and structural outliers in the high-dimensional data collected by MineSense sensors.In statistics, outliers are usually regarded as an obstacle to data analysis. However, outliers canprovide valuable insight into the status of the data generating process. For example, consider acommon thought experiment used as motivation in robust statistics. A sensor is collecting obser-vations with some chance of failure. When the sensors fails, it produces an outlier. 
In many texts,this example is only used to show how outliers may be generated and for the rest of the documentoutliers are only considered in terms of their adverse impact on data analysis.Instead, this example might be considered from the perspective of the operator of the sensor.In many applications, a properly operating sensor will not frequently fail while a damaged orotherwise improperly functioning sensor will. Hence, the presence of outliers in a dataset can beindicative of a sensor failure. In this context, information about the presence of outliers is valuableindependent of its impact on data analysis. For these reasons, MineSense aims to use informationabout data quality to improve sensor maintenance.Sensor maintenance is important to MineSense because their sensors are mounted on excavatorsoperating at the mining face. The environment in which excavators work is harsh so sensors areexpected to undergo significant wear and require regular maintenance. However, excavators are acritical step in the transport of rock so any disruptions to their operation is costly.Detecting a deterioration in data quality can provide an early warning that a disruption is oc-curring. Outliers can result from a change in the data generating process, such as a change in orebody or a deterioration in system health. Hence, a principled method to detect when a dataset iscontaminated with outliers could provide a scalable method for monitoring system performanceand health.First, we must distinguish between what we call contamination detection and the more commonproblem of outlier detection. Outlier detection aims to identify atypical observations or cells.6These observations are then removed from the data. In contrast, contamination detection aims todetect when outliers are present in a dataset, not which observations or cells are outliers. Ideally,contamination should be detected when there are a small number of minor outliers – before theproblem generating the outliers becomes more serious.Researchers have only recently begun considering the problem of detecting both structural andindependent contamination. Rousseeuw and Bossche (2018) proposed Detecting Deviating DataCells (DDC), the first and so far only method designed to detect both structural and independentoutliers. However, DDC is not intended for contamination detection. While it is possible to createa contamination detection method from an outlier detection method, it may be possible to achievebetter results using a purpose built procedure. For example, we could declare a dataset contami-nated if more than some number of independent and structural outliers were detected. However,this method would be ad-hoc and the false detection rate could not easily be tuned. Hence, aprincipled contamination detection method is desirable.Our approach to contamination detection compares robust and non-robust models. The com-parison of robust and non-robust models was first proposed in the S-plus User Manual (InsightfulCorporation, 2002) as a diagnostic tool to aid with manual examination of data, specifically in themanual for the S-plus robust library in the description of the fit.models function which waslater ported to R in the fit.models package available on the Comprehensive R Archive Net-work (CRAN). We consider a principled and automated extension of this concept for applicationto contamination detection.We propose a flexible contamination detection method for linear regression datasets with aneasily tuned false detection rate. 
This method measures the difference between regression coeffi-cients estimated by robust and non-robust methods. When this distance exceeds a cut-off deter-mined using a bootstrap estimate of an upper quantile, the data is considered contaminated. Bychanging the cut-off, the fraction of datasets without outliers that are incorrectly detected as con-taining outliers, called the false detection rate, can be easily tuned.1.4 Organization of Subsequent ChaptersThe subsequent chapters are organized as follows. In Chapter 2, we review RLARS and discussthe application of RLARS to data with independent contamination.In Chapter 3, we propose the application of RLARS using data filtered by DDC to removeindependent contamination. By using the DDC filtered data matrix, the robustness of RLARS toindependent outliers is further improved.7In Chapter 4, we propose the Lasso modification for RLARS. Here, we show that the Lassomodification for LARS can be used to estimate regression coefficients using only robust quantities.In Chapter 5, we present a simulation study comparing the performance of a variety robust andsparse linear regression methods, including existing methods and the two modifications of RLARSpresented in this study. The predictive and variable selection performance is compared on data withcellwise and casewise contamination.Chapter 6 is a case study where we present a method to detect the presence of independent andstructural contamination in high-dimensional linear regression data.In Chapter 7, we conclude by summarizing the main ideas and results of this thesis.8Chapter 2Robust Least Angle Regression in thePresence of Structural and IndependentContamination2.1 IntroductionIn this chapter, we consider the application of Robust Least Angle Regression (RLARS) (Khanet al., 2007) to data with structural and independent contamination. RLARS is a robust and sparselinear regression method based on Least Angle Regression (LARS). RLARS replaces the samplemeans, variances, and correlations used in LARS with robust counterparts.RLARS consists of a two-step model building procedure, sequencing and segmentation. First,a sequence of variables is generated, such that the best predictors are at the beginning. Then, robustlinear regression estimators are fit along the variable sequence and the optimal model is selected,which Khan et al. (2007) call segmentation.RLARS was originally developed having the Tukey-Huber Contamination Model (THCM) inmind, but also has some resistance to cellwise outliers. In this chapter, we discuss how RLARSvariable sequencing is robust to independent contamination and how outlier propagation is lesssevere in the variable subsets considered in segmentation than in the full set of variables.The rest of this chapter is organized as follows. In Section 2.2, the LARS algorithm, uponwhich RLARS is based, is reviewed. In Section 2.3, the two-step RLARS model-building proce-dure is reviewed. In Section 2.4, the application of RLARS to data with independent contamination9is discussed. Section 2.5 concludes, providing an overview of the major topics of the chapter.2.2 Review: Least Angle RegressionLeast Angle Regression (LARS) was proposed by Efron et al. (2004) as a less greedy alternativeto traditional forward selection procedures. LARS is closely related to the forward selection pro-cedure Incremental Forward Stagewise regression (Friedman et al., 2001). 
We will first reviewStagewise regression to provide motivation for LARS.2.2.1 Incremental Forward Subset SelectionIncremental Forward Stagewise regression (Stagewise) is a forward selection procedure for build-ing a linear regression model. Forward selection procedures operate by iteratively adding variables,partially or entirely, to a model. The simplest forward selection procedure is Forward Stepwise Se-lection (Stepwise) (e.g. Friedman et al., 2010, chapter 3).In Stepwise, variables are iteratively added to a model. At each step, the variable most corre-lated with the residuals, say xi, is added to the set of variables included in the model, called theactive set. The set A contains the indices of the variables in the active set. The response y is thenregressed on the set of variables (x j) j∈A with ordinary least squares (OLS). If a good predictor iscorrelated with xi, it may be excluded from the model at the next step. For this reason, Stepwise isan aggressive model building procedure that can be overly greedy.Stagewise was proposed as a more cautious version of Stepwise (Friedman et al., 2010). Ateach step, Stagewise updates the coefficient of the variable most correlated with the residuals by asmall value c> 0.We will now present the Stagewise algorithm, based on the excellent description in Freund etal. (2013). Let the responses y and the columns of the data matrix X be centred to have mean zero.Stagewise is initialized with residuals e0 = y and coefficients β 0 = 0. We then repeat the followingsteps, shown at step k.1. Find the variable most correlated with the residuals,jk = argmaxj∈{1,...,p}|(ek)′X j|.102. Update the coefficient vector and residuals in the direction of the selected variable,β k+1jk ←− β kjk + c sign(|(ek)′X jk |),ek+1←− ek− c sign(|(ek)′X jk |)X jk .where β kjk is the jkth element of the coefficient vector at step k.3. Steps 1 and 2 are then repeated until all covariates are uncorrelated with the residuals or thedesired level of sparsity is reached.Stagewise has the desirable sparsity properties that, at step k, ||β k||1 = kc and #{ j : β j 6= 0} ≤ k.By controlling the number of iterations, one can control the sparsity of the resulting estimate. Inproducing a regularized solution, Stagewise can obtain a favourable bias-variance trade-off withhigh-dimensional data.However, by performing a more cautious update Stagewise may require many more than thep steps required by Stepwise to reach the least squares solution. The computation time can bereduced by using a larger c, but this comes at the cost of making the algorithm more greedy.2.2.2 LARS and StagewiseThe original motivation for LARS was to create an accelerated version of Stagewise (Efron etal., 2004) based on the following behaviour of Stagewise. Let x1 be the first variable added tothe active set by Stagewise. If c is sufficiently small, then Stagewise will usually take at leastone step in the direction of x1 before adding a second variable, say x2. At this stage, x1 and x2have similar absolute correlations with the residuals. Now, if Stagewise steps in the direction ofx2, then the absolute correlation of x2 with the residuals will decrease relative to that of x1. Inpractice, this results in Stagewise alternating between the stepping in the direction of x1 and x2,such that both variables have approximately equal absolute correlations until a third variable isadded. 
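The Stagewise procedure summarized in steps 1 to 3 above translates almost directly into code. The following is a minimal sketch assuming centred inputs (mean-zero y and columns of X, as in the description above); the fixed iteration count and all names are our own simplifications rather than part of the original algorithm, which stops when no covariate remains correlated with the residuals or the desired sparsity is reached.

    import numpy as np

    def stagewise(X, y, step=0.01, n_steps=500):
        """Incremental Forward Stagewise regression on centred data.

        Each iteration moves the coefficient of the variable most correlated
        with the current residuals by `step` in the sign of that correlation
        (steps 1 and 2 above), and updates the residuals accordingly.
        """
        n, p = X.shape
        beta = np.zeros(p)
        resid = y.astype(float).copy()
        for _ in range(n_steps):
            corr = X.T @ resid                # proportional to the correlations e'X_j
            j = int(np.argmax(np.abs(corr)))  # step 1: most correlated variable
            delta = step * np.sign(corr[j])
            beta[j] += delta                  # step 2: small coefficient update
            resid -= delta * X[:, j]          # keep residuals in sync
        return beta

After k iterations the L1 norm of the coefficients has grown by at most k times `step` and at most k coefficients are non-zero, which is the sparsity property noted above.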
Extrapolating this behaviour, Stepwise tends to move in a direction such that all variablesin the active set have equal absolute correlation with the residuals.LARS directly calculates the direction that has equal correlation (angle) with all members of theactive set. The fit is updated in this direction until another variable has an equal absolute correlationwith the last updated residuals to variables in the active set. The process is repeated until min(n, p)variables are included in the active set. The length of the update is easily computed, so the totalnumber of steps is also min(n, p).112.2.3 Derivation of LARSUsing the same notation as before, let (y,X) be a linear regression dataset with y ∈ Rn and X =(x1, . . . ,xp) ∈ Rnxp following the linear model,y = Xβ + εwhere β is the coefficient vector and ε the vector of i.i.d regression errors. The errors εi areindependent of the predictors and follow the distribution εi ∼ N(0,σ2) for i = 1, . . . ,n. We assumethat the data are normalized so that y and the columns of X have mean zero and standard deviationone.Derivation of First LARS StepLARS is initialized with a zero coefficient vector βˆ = 0 and a residual vector equal to the responsee = y− yˆ = y. As with Stagewise, the first variable to enter the active set is that with the greatestabsolute correlation with the residuals,j = argmaxj∈{1,...,p}|e′X j|.For simplicity, we assume that only one variable enters the active set at a time. Without loss ofgenerality, let the first variable to enter the active set be x1. LARS updates the predictions yˆ in thedirection with equal absolute correlation to the members of the active set.To simplify calculations, variables are not included in the active set directly. Instead, signedvariables are incorporated in the active set. A signed variable is a variable x j multiplied by the signof the variable x j with the residuals when it enters the active set. The first signed variable in theactive set is s1x1 where s1 = sign(e′x1). The correlation of the signed variable and the residuals arethus, by construction, positive. At this stage, only one variable is in the active set so the predictionis updated in the direction of s1x1,yˆ←− yˆ+ γs1x1. (2.1)The step size γ > 0 is selected such that the next variable to enter the active set has equal absolutecorrelation to the residuals as x1. This is to ensure that variables in the active set always havegreater absolute correlation with the residuals than those in the inactive set.Efron et al. (2004) observed that the absolute correlation of all covariates with the residualschanges linearly with γ . The linearity of the absolute correlation of x j for all j = 1, . . . , p with the12residual vector is simple to show. Let r j be the sample correlation of the (unsigned) variable j andthe residual vector. At the end of a step of length γ , the correlation r j is given by,r j =1n(x′j(y− yˆ))=1n(x′j(y− γs1x1))=1n(x′jy− γs1x′jx1)With this expression, we can find the absolute correlation of x1 and the residuals. Let r = |r1| bethe correlation of the active set and the residuals.r =1n|s1x′1(y− yˆ)|=1n|x′1ys1− γ|At the first step, e = y, so s1 = sign(x′1y) and thus x′1ys1 = |x′1y|. We choose 0 < γ < |x′1ys1|resulting in the expression,r =1n|x′1y|− γ. (2.2)To find γ , we calculate the step size required for each variable to have equal absolute correlationwith the residuals as x1, denoted by γ+j . 
We first find the step size for the positive signed vector+x j,r = r j1n(x′1y− γ+j x′1x1) =1n(x′jy− γ+j x′jx1)γ+j =x′1y−x′jyx′1x1−x′jx1γ+j =r− r j1−x′jx1/nSimilarly, for the negative signed vector −x j,γ−j =r+ r j1+x′jx1/n. (2.3)Recall that the active set is constructed such that r > |r j| for all j ∈ {2, . . . , p} and all variables are13standardized such that |x′jx1/n|< 1. Hence, γ+j ,γ−j > 0 for all j ∈ {2, . . . , p}.LARS performs the largest step γ such r ≥ |r j| for all j = 2, . . . , p. This is equivalent to settingγ to the smallest γ+j or γ−j for j ∈ {2, . . . , p},γ = minj∈{2,...,p}{min(γ+j ,γ−j )}. (2.4)Derivation of the Second and Later StepsIn this section, we derive step k for some k ∈ {2,3, . . . ,min(n, p)}. A quantity at step k is denotedusing a superscript (·)k. Let the set A contain the indices of all variables in the active set. To beginthe step we update Ak = Ak∪{ j} with the j corresponding to the γ j = min(γ+j ,γ−j ) selected as thestep size γ in the previous step.We define XA = (· · · ,s jx j, · · ·) for j ∈ A as the signed data matrix where s j is the sign of thecorrelation between the residuals and variable x j when it it enters the active set. As in step 1,updates are performed based on the absolute correlation of unsigned variables with the residuals,so it is simpler to use XA than the unsigned data matrix.LARS updates yˆ in the equiangular direction. The equiangular direction is defined as a linearcombination of the columns of XA with equal correlation to all signed variables in the active set.Let uA denote the equiangular vector. In the first step, the equiangular vector is parallel to the onlyvector in the active set, x1. For later steps, uA is derived in terms of the data from the followingthree constraints.(1) Linear combination of the columns of the signed data matrixuA = XAwA (2.5)where wA is a vector of weights.(2) Equal correlation with all members of the signed data matrix,1nX′AuA = aA1A (2.6)where aA is the absolute correlation of the members of the active set with the equiangularvector and 1A is a vector of ones with an equal number of elements to uA.14(3) Standardized to have a unit variance,1nu′AuA = 1. (2.7)The vector of predicted values at step k is updated in the direction of the equiangular vectorwhen performing a step,yˆk+1 = yˆk + γuA. (2.8)To calculate an update, the quantities uA, wA, and aA in Equations (2.5), (2.6), and (2.7) must bederived in terms of the data using the constraints. We can find wA in terms of aA by substituting(2.5) into (2.6),1nX ′AXAwA = aA1AwA = aA(1nX ′AXA)−11A (2.9)Substituting (2.5) into (2.7) and using the expression for wA we can get an expression for aA1 =1nw′AX′AXAwA1 = a2A1′A(1nX ′AXA)−11AaA =(1′A(1nX ′AXA)−11A)−1(2.10)Letting DA = diag{. . . ,s j, . . .} for j ∈ A and RA be the sample correlation matrix of the (unsigned)variables in the active set.aA = (1′A(DARADA)−11A)−1 (2.11)An expression for wA can be found by substituting Equation (2.11) into (2.9),wA = aA(DARADA)−11A (2.12)Finally, uA is given by substituting Equation (2.12) into (2.5),uA = aAXA(DARADA)−11A. (2.13)15Using uA, wA, and aA, the update can now be derived. First, we derive the update for the correlationsof the signed variables in the active set s jx j with the residuals e, denoted as r. 
Equation (2.8) canbe used to find the correlation update steprk+1 =1ns jx′j(y− yˆk+1)=1ns jx′j(y− yˆk− γuA)= rk− γ 1ns jx′juABy construction, 1n s jx′juA = aA for j ∈ A so the update of r is given by,rk+1 = rk− γaA (2.14)The update of the correlation of the (unsigned) variable x j for j ∈ Ac with e, r j, can be similarlyderived,rk+1j =1nx′j(y− yˆk+1)=1nx′j(y− yˆk− γuA)rk+1j = rkj− γ1nx′juAWe define a j = 1n x′juA for j ∈ Ac giving the update,rk+1j = rkj− γa j, j ∈ Ac. (2.15)Notice that a j does not depend on γ so rk+1j is linear in γ .The coefficient update can also be derived from Equation (2.8). Let u ∈ Rp be a vector withelements equal to equiangular vector for j ∈ A and with u j = 0 for j ∈ Ac and let βA be a vector16containing the elements of the coefficient vector corresponding to the active set,yˆk+1 = Xβˆk+1= Xβˆk+ γuXAβˆk+1A = XAβˆkA+ γwAuAβˆk+1A = βˆkA+ γ(s1wA,1, . . . ,snwA,n)′βˆ k+1j = βˆkj + γs jwA, j, j ∈ A (2.16)LARS takes the largest step such that r≥ |r j| for all j ∈ Ac. Thus to find γ , we find the smalleststep size such that r = |r j| for some j ∈ Ac. Let γ+j be the step size such that r = r j for j ∈ Ac, forthe positive signed vector +x j,rk+1j = rk+1rkj− γ+j a j = rk− γ+j aAγ+j =rk− rkjaA−a j .Similarly, for the negative signed vector x j,γ−j =rk + rkjaA+a jWe choose the minimum possible γ+ or γ− over all inactive covariates,γ = minj∈Ac{min(γ+j γ−j )}. (2.17)At the end of the step, one variable will have equal correlation with the residuals to the active setand will thus be added to the active set at the beginning of the next step.2.2.4 LARS SummaryWe will now summarize the steps of the LARS algorithm in terms of quantities derived from thecorrelation matrix of the data. Khan et al. (2007) first derived this form of the algorithm based onEfron et al. (2004). We consider the dataset (y,X) following the linear regression model,y = Xβ + ε (2.18)17where y∈Rn is the vector of responses, X= (x1,x2, . . . ,xp)∈Rnxp is the data matrix, β ∈Rp is thevector of regression coefficients, and ε is the vector of independent normally distributed regressionerrors with mean zero. We assume that y and all columns of X have been standardized to havemean of zero and standard deviation one.Let RX = 1n X′X be the sample correlation matrix, r j = 1n x′j(y− yˆ) be the sample correlationbetween covariate j and the current residuals, and r be the sample correlation between the membersof the active set and the residuals. By construction, all members of the active set have equalcorrelation with the residuals. We define A as the set of indices of covariates in the active set andsA as the sign of the correlation between the variables in the active set and the residuals when theyentered the active set. We now summarize the LARS procedure,(1) Initialize the current fitted values yˆ = 0, the active set A = /0, and the coefficient vector βˆ = 0.(2) Add to the active set the variable with the greatest correlation with the residuals. First thevariable is found,m = argmaxj∈{1,...,p}|r j|, sm = sign(rm), and r = smrm (2.19)and then added to the active set.A←− A∪{m}, sA←− sA∪{sm}. (2.20)(3) Calculate the direction of the update and the related quantities. Let XA = (· · ·s jx j · · ·) forj ∈ A denote the signed data matrix of variables in the active set, RA denote the sample corre-lation matrix of the (unsigned) variables in the active set, and DA = diag{sA}. The followingquantities are calculatedaA = {1A(DARADA)−11A}−1/2, wA = aA(DARADA)−11A. (2.21)and a j = (XAx j)′wA, j ∈ Ac. 
(2.22)For the first step, this reduces to aA = 1, wa = 1, and a j = r jm.(4) Calculate the largest step size such that no covariate outside of the active set has greater abso-18lute correlation with the residuals than those in the active set,γ = minj(min(γ+j ,γ−j )) = minj(min(r− r jaa−a j ,r+ r jaa+a j))(2.23)(5) Update the model using the above quantities. The prediction is updated in the direction withequal correlation to all signed variables in the active set.yˆ←− yˆ+ γua, r←− r− γaA, r j←− r j− γa j, j ∈ Ac. (2.24)The coefficient update can be optionally performed,βˆ j←− βˆ j + γs jwA j, j ∈ A (2.25)(6) Repeat steps (2) - (5) until min(n, p) elements have been added to the active set.2.3 Review: Robust Least Angle RegressionRobust Least Angle Regression (RLARS) is a robust linear regression model-building procedureproposed by Khan et al. (2007) consisting of two steps, variable sequencing and segmentation. Invariable sequencing, RLARS creates a list of variables in which the best predictors are first. Invariable segmentation, the optimal submodel is chosen along the sequence. In this section, we willreview the two step model-building process.2.3.1 Variable SequencingVariable sequencing is the first step in the two part RLARS model-building process. Variablesequencing generates an ordered list of variables, such that the best predictors are first in the list.Sequencing is performed using the RLARS algorithm. The order of the sequence is the order inwhich variables enter the active set. We will now discuss the RLARS algorithm in detail.Khan et al. (2007) observed that if the data are normalized, then the LARS algorithm canbe written entirely in terms of quantities calculated using the sample correlation matrix. Hence,LARS can be made robust by normalizing the data using robust location and scale estimators andcalculating the LARS update using a robust estimate of the correlation matrix of the data.The data are normalized by centering and scaling each covariate of the data matrix, X =(x1, . . . ,xp), and the vector of responses y with the median the median absolute deviation. For19a variable x j, the normalized vector z j is,z j =x j−medi∈{1,...,n} xi jmadi∈{1,...,n} xi j, j = 1, . . . , p (2.26)Khan et al. (2007) considered two versions of RLARS, Data Cleaning RLARS and Plug-inRLARS. Data Cleaning RLARS cleans the data of outliers and then performs LARS on the cleaneddata. Plug-in RLARS replaces the sample correlation matrix in LARS with a robust correlationmatrix.Data Cleaning RLARS achieves robustness by replacing each normalized observation zi fori = 1, . . . ,n with the Winsorized equivalent ui = min(√c/D(zi),1)zi, where Z = (zi j)1≤i≤n,1≤ j≤nis the normalized data matrix. Here, D(x) = z′iΣˆ−1zi is the Mahalanobis distance and Σˆ is a robustinitial estimate of the correlation matrix and c is a tuning constant. Data cleaning by Winsorizationachieves robustness by down-weighting outlying observations. As previously discussed in Section2.1, methods that down-weight outliers can fail with very low levels of independent contaminationfor moderately large p. Hence, Data Cleaning RLARS is not appropriate for data with independentcontamination, the primary focus of this thesis.Plug-in RLARS is conceptually very simple – the sample correlation matrix in the LARS al-gorithm is replaced with a with robust counterpart. Classical affine equivariant robust correlationestimators are prohibitively computationally expensive for large p. 
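Whatever robust correlation estimate is eventually plugged in (the options are reviewed next), the sequencing step itself needs nothing beyond a correlation matrix R and the vector of correlations between the predictors and the response. The sketch below is our own simplified rendering of the correlation-driven algorithm of Section 2.2.4: it adds one variable at a time, ignores ties, and omits the positive-definiteness correction, so it should be read as an illustration rather than a reference implementation.

    import numpy as np

    def lars_sequence(R, r_y):
        """Sequence variables using only a p x p correlation matrix R and the
        correlations r_y between each predictor and the response (Section 2.2.4).
        Supplying robust estimates of R and r_y gives the RLARS sequencing step."""
        p = len(r_y)
        r_j = np.asarray(r_y, dtype=float).copy()
        m = int(np.argmax(np.abs(r_j)))
        active, signs = [m], [1.0 if r_j[m] >= 0 else -1.0]
        r = abs(r_j[m])                                    # common correlation of the active set
        while len(active) < p:
            A, s = np.array(active), np.array(signs)
            inactive = [j for j in range(p) if j not in active]
            G = np.linalg.inv(s[:, None] * R[np.ix_(A, A)] * s[None, :])  # (D_A R_A D_A)^{-1}
            aA = 1.0 / np.sqrt(G.sum())                    # equation (2.21)
            wA = aA * G.sum(axis=1)
            a_vals = {j: (s * R[A, j]) @ wA for j in inactive}             # equation (2.22)
            best_gamma, best_j, best_sign = np.inf, None, 1.0
            for j in inactive:                             # step size, equation (2.23)
                for gamma, sgn in (((r - r_j[j]) / (aA - a_vals[j]), 1.0),
                                   ((r + r_j[j]) / (aA + a_vals[j]), -1.0)):
                    if 1e-12 < gamma < best_gamma:
                        best_gamma, best_j, best_sign = gamma, j, sgn
            if best_j is None:
                break
            r -= best_gamma * aA                           # correlation updates, equation (2.24)
            for j in inactive:
                r_j[j] -= best_gamma * a_vals[j]
            active.append(best_j)
            signs.append(best_sign)
        return active                                      # variables in order of entry

In RLARS, R would be assembled from the pairwise robust correlations described next and corrected to be positive definite when necessary.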
For this reason Khan et al.(2007) proposed using robust bivariate correlation coefficients to estimate the elements of the cor-relation matrix. If the resulting correlation matrix is not positive-definite, the method describedin Ricardo A Maronna and Zamar (2002) can be used to produce a positive definite scatter matrixbased on the spectral decomposition of the original.Two methods for robustly estimating bivariate correlations are considered in Khan et al. (2007),bivariate M-estimators of scatter (Ricardo Antonio Maronna, 1976) and bivariate Winsorization.The bivariate M-estimator of location and scatter is computationally efficient and affine equivariantand is thus a good choice. Bivariate Winsorization was a new method proposed by Khan et al.(2007) as a way to compute robust bivariate correlations even faster. The reason Khan et al. (2007)proposed a new method is that the number of bivariate correlations that must be computed growswith p2, so improving the speed of computation is paramount.Simulation results by Khan et al. (2007) show similar performance for RLARS using bothcorrelation estimators and thus recommends the faster method, bivariate Winsorization. We willtherefore restrict our focus to Plug-in RLARS using bivariate Winsorization, hereafter referred to20as RLARS.Bivariate WinsorizationBivariate Winsorization extends the idea of univariate Winsorization proposed by Huber (1981).For a variable with n observations, x = (x1, . . . ,xn)′, the univariate Winsorized values are givenby ui = ψc((xi−med(xi))/mad(xi)), i = 1, . . . ,n where ψc(t) = min(max(−c,x),c). Here, c is atuning constant used to define the region to which the Winsorized data are restricted.Huber (1973) was the first to suggest that correlation coefficients could be calculated usingthe Winsorized data to get a robust estimate of correlation. Later, Alqallaf, Konis, et al. (2002)suggested that a fast and robust estimate of the correlation matrix for data with high p could beobtained by estimating the elements using the correlation coefficients calculated on the Winsorizeddata.Univariate Winsorization is effective at detecting marginal outliers. However, observations canbe within the typical range of data in each variable, but lie outside the normal range of the data inpairs of covariates which is very important for estimating bivariate correlations. Figure 2.1 showsan example of how univariate Winsorization can be ineffective at dealing with bivariate outliers.Bivariate Winsorization incorporates the bivariate relationships between pairs of variables and isthus more robust to bivariate outliers.Bivariate Winsorization is performed by shrinking outliers to the border of an ellipse, usingthe transformation u = min(√c/D(x),1)x with x = (x1,x2)′ where D(x) = x′Σ̂−10 x′ is a Maha-lanobis distance calculated estimated using an initial estimate of the bivariate correlation matrixΣ̂0. The tuning constant recommended by Khan et al. (2007) is c = 5.99, the 95% quantile of theχ22 distribution. The final correlation coefficient can be calculated using the Winsorized values.The elements of the correlation matrix are estimated by calculating the correlation coefficienton the bivariate Winsorized data. However, the resulting correlation matrix is not necessarily pos-itive definite. 
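As a sketch of the two Winsorization-based correlation estimates just described (with the positive-definiteness correction mentioned above deferred to the next paragraph), the functions below compute a pairwise correlation from univariately and bivariately Winsorized data. The initial bivariate correlation r0 is taken as an input here; in RLARS it would come from the adjusted Winsorization described below. Function names and the small numerical guard are ours.

    import numpy as np

    def standardize(x):
        """Centre by the median and scale by the MAD, as in (2.26)."""
        med = np.median(x)
        return (x - med) / np.median(np.abs(x - med))

    def univariate_winsorized_corr(x1, x2, c=2.0):
        """Correlation of univariately Winsorized data: each standardized
        variable is clipped to [-c, c] before the correlation is taken."""
        u1 = np.clip(standardize(x1), -c, c)
        u2 = np.clip(standardize(x2), -c, c)
        return np.corrcoef(u1, u2)[0, 1]

    def bivariate_winsorized_corr(x1, x2, r0, c=5.99):
        """Bivariate Winsorization: shrink each standardized pair z towards the
        origin until it lies inside the ellipse {z : z' R0^{-1} z <= c}, where
        R0 is built from an initial correlation estimate r0, then take the
        ordinary correlation of the shrunken data."""
        z = np.column_stack([standardize(x1), standardize(x2)])
        R0_inv = np.linalg.inv(np.array([[1.0, r0], [r0, 1.0]]))
        d = np.einsum('ij,jk,ik->i', z, R0_inv, z)         # Mahalanobis distances
        u = z * np.minimum(np.sqrt(c / np.maximum(d, 1e-12)), 1.0)[:, None]
        return np.corrcoef(u[:, 0], u[:, 1])[0, 1]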
When this is required, the correction described in Ricardo A Maronna and Zamar(2002) can be used to make the matrix positive definite.Figure 2.2 shows an example of how bivariate Winsorization can improve performance overunivariate Winsorization with bivariate data. Figure 2.1 shows the same dataset with univariateWinsorization. Bivariate Winsorization is much more effective at shrinking the outliers whichimproves the correlation estimate.21Figure 2.1: Example of univariate Winsorization with c = 2. The bivariate outliers in thisexample will not be shrunk at all by univariate Winsorization.Initial Correlation EstimateAdjusted Winsorization is a fast and robust method for getting an initial estimate of the bivari-ate correlation in bivariate Winsorization. It performs structured univariate Winsorization in eachquadrant. The quadrants are divided into major and minor quadrants, which use different tuningconstants. The major quadrants are the diagonally opposite quadrants containing the majority ofthe data and the minor quadrants are the remaining two quadrants.Bivariate Winsorization has one tuning parameter c1. This is the tuning constant of the uni-variate Winsorization for the major quadrants. Khan et al. (2007) recommends setting c1 to 2. Thetuning constant of the minor quadrants is set to c2 = c1√n2/n1 where n1 is the number of pointsin the major quadrants and n2 = n−n1 is the number of points in the minor quadrants.Figure 2.3 shows how adjusted Winsorization shrinks bivariate outliers much closer to themajority of the data than univariate Winsorization. Khan et al. (2007) shows that the adjustedWinsorized estimate is asymptotically consistent, with bounded influence, and that a smoothedversion of the estimator is asymptotically normal. Importantly, the calculation of a correlationcoefficient using adjusted Winsorization has the same computational complexity as calculating thecorrelation coefficient on the raw data and is thus very fast.22Figure 2.2: Example of bivariate Winsorization with c = 5.99. Bivariate outliers are shrunkto the border of the ellipse.2.3.2 Variable SegmentationVariable segmentation selects a submodel along the previously generated variable sequence accord-ing to a model selection criterion. The variable sequence is treated as a sequence of nested sub-models, with each subsequent submodel containing one additional variable. Robust linear modelsare fit to the submodels and the optimal model is selected using a robust model selection criterion.Khan et al. (2007) proposed that linear models be estimated using MM-regression. MM-regression (Yohai, 1987) is a highly efficient and robust linear regression method. A definitionbased on that in Khan (2006) is as follows,Definition: MM-regression estimate 2.3.1. Let χ0 :R−→R and χ1 :R−→R be two score functionsuch that χ0(u)< χ1(u), u ∈ R, and each χ satisfies the following set of regularity conditions:(R1) χ(−u) = χ(u),u ∈ R,(R2) χ non-decreasing on [0,∞),(R3) χ(0) = 0, and χ(∞) = 1,(R4) χ is continuously differentiable.23Figure 2.3: Example of adjusted Winsorization with c = 2. Bivariate outliers are shrunk tothe edge of the small box.Let β˜ be a high-breakdown point “initial” estimate for β and σˆ be the estimate of the scale of theresiduals satisfying,1nn∑i=1χ0(yi−x′iβ˜σˆ)= δ , (2.27)where δ ∈ (0,1] is the expectation of χ0(·) under the central model. 
Then, the regression MM-estimate βˆ is a local minimum of the loss function,L (β ) =1nn∑i=1χ1(yi−x′iβσˆ)(2.28)such that the following condition is satisfied,L (βˆ )≤L (β˜ ) (2.29)A good choice for the initial estimate β˜ in Definition 2.3.1 is the regression S-estimate (Rousseeuwand Yohai, 1984) due to its high breakdown point. The S-estimate is defined as follows by Khan(2006),24Definition: S-regression estimate 2.3.2. Let χ0 : R−→ R be the score function described in Defi-nition 2.3.1. The S-estimate β˜ is defined asβ˜ = argminβσˆ(β ) (2.30)where σˆ(β ) solves1nn∑i=1χ0(yi−x′iβσˆ(β ))= δ , (2.31)The corresponding S-estimate of scale, σˆ , is given byσˆ = infβσˆ(β ) = σˆ(β˜ ) (2.32)For both the χ1 and χ0 functions we use Tukey’s Bisquare loss, given byχ(t) = min{1,1− (1− (t/c)2)3}. (2.33)Theoretically, MM-regression estimators can be computed with up to n variables. However,simulation results in Chapter 5 suggest that the number of variables should not exceed n/2. Thus,only the first n/2 variables in the sequence are considered.A model selection method is used to select the best submodel. Non-robust model selectionmethods are susceptible to outliers in the data and are thus not appropriate to select a robust method.Instead, a robust model selection method is required.Khan et al. (2007) recommend using either robust Bootstrapping (RBoot) (Khan et al., 2007,based on Salibian-Barrera, 2000 and Salibian-Barrera and Zamar, 2002), or robust cross-validation(robust CV) (Ronchetti et al., 1997) to perform model selection. Both methods are robust andperform well compared to the other robust model selection procedures considered. Cross-validationis commonly used for model selection in sparse linear regression, so we choose to restrict our focusto robust CV.Robust Cross-ValidationIn classical k-fold cross-validation, the dataset (y,X) is randomly partitioned into k groups. Amethod is trained on the data excluding observations in one of the groups. The fitted model isthen used to predict the response of the group that was excluded from the training set, giving theprediction error at each observation in the test set. This is then repeated for every group. When25k = n, this is referred to as leave-one-out (LOO) cross-validation.Let (e1, . . . ,en) denote the prediction errors for each observation. The classical cross-validationmean squared prediction error (MSPE) isMSPE =1nn∑i=1e2i . (2.34)The model with the smallest MSPE is selected. However, the squared error is not robust to outliers.A single outlier can result in a very large MSPE, even if the method used to fit the model is robust.The trimmed MSPE (TMSPE) excludes the largest α fraction of the prediction errors from thecriterion (Khan et al., 2007). Let (e(1), . . . ,e(n)) denote the order statistics of the prediction error.The TMSPE is given byTMSPE =1bn(1−α)cbn(1−α)c∑i=1e2(i) (2.35)where bn(1−α)c is the largest integer smaller than n(1−α). The TMSPE is robust to up to dnαecasewise outliers.The TMSPE can still be susceptible to the propagation of independent outliers even if themethod estimating βˆ is robust. However, if βˆ is sparse, only independent outliers in the active vari-ables will affect the TMPSE. Under the assumption that s/p is small, independent error propagationis a less severe problem for the TMSPE for sparse methods.Robust Bayesian Information CriterionAs an alternative to the robust cross-validation criterion, Alfons et al. (2013) proposed performingmodel selection using a robust Bayesian Information Criterion (BIC). 
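Before turning to that alternative, note that the TMSPE of Equation 2.35 is straightforward to compute; a minimal R sketch follows (the function name is ours).

# Sketch (ours) of the trimmed mean squared prediction error, Equation 2.35.
# 'errors' are the cross-validated prediction errors; alpha is the trimming fraction.
tmspe <- function(errors, alpha = 0.25) {
  h <- floor(length(errors) * (1 - alpha))   # number of squared errors retained
  mean(sort(errors^2)[seq_len(h)])           # average of the h smallest squared errors
}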
The generic form of the BIC(Schwarz, 1978) is given by,BIC(βˆ ,s) =−2L (βˆ )+ s · logn (2.36)where L (βˆ ) is the log-likelihood and s is the number of parameters included in the model. Min-imizing the BIC along a sequence of models is equivalent to maximizing the posterior probabilityof the selected model. Unlike the AIC, the BIC is an asymptotically consistent model selectioncriterion (Friedman et al., 2001).When comparing linear regression models assuming normally distributed error, minimizing the26BIC is equivalent to minimizing the following criterion,RBIC(βˆ ,s) = log(σˆ)+s · log(n)n(2.37)where σˆ is the sample standard deviation of the residuals. Alfons et al. (2013) proposed performingmodel selection amongst linear regression models by selecting the model that minimizes Equation2.37 with a robust estimate of residual scale for σˆ . For RLARS, the robust estimate of residualscale obtained while finding the S-estimate of regression can be used.2.4 RLARS in the Presence of Independent ContaminationRLARS was originally developed under the Tukey-Huber contamination model. However, unlikemany robust regression methods, RLARS does not down-weight atypical observations based on allp variables. As a result, the two-step RLARS model building process has some desirable robustnessproperties in the presence of independent contamination.In variable sequencing, the data are only directly used to calculate the correlation matrix. Theelements of the correlation matrix are estimated using robust bivariate correlation estimates. FromEquation 1.2, we can see outlier propagation is not a significant problem when only consideringtwo variables. If the breakdown point of a robust correlation coefficient estimator is ε for structuralcontamination, then the breakdown point for independent contamination εi isεi = 1−√1− ε (2.38)For a breakdown point of ε = 0.5 the breakdown point for independent contamination is εi = 0.29.The variable sequencing step is therefore robust to independent contamination.Segmentation fits a series of MM-regression estimates to subsets of variables. As discussed inSection 2.1, MM-regression estimates are robust by down-weighting outlying values and are thussusceptible to independent outliers (Alqallaf, Van Aelst, et al., 2009). However, the MM-estimatorsare only fit to a subset of variables in each model so, similar to robust bivariate correlation estimates,the propagation of independent outliers is a less severe problem than for robust penalized regressionmethods developed under the Tukey-Huber contamination model. The breakdown point of an MM-estimator fit on the true active variables isεi = 1− (1− ε)1/s (2.39)27where ε is generally set to 0.5. Under the assumption of sparsity, s/p is small so RLARS is robustto a higher fraction of independent outliers than methods that down-weight observations based onall p predictors.2.5 ConclusionIn this chapter, we reviewed RLARS and discussed it’s application to data with independent con-tamination. RLARS is a robust and sparse linear regression method composed of two steps, variablesequencing and segmentation. Variable sequencing generates a sequence of variables with the bestvariables appearing first in the list. Variable segmentation selects a submodel along the sequence.RLARS was developed under the Tukey-Huber contamination model, but has good robust-ness against independent contamination. Variable sequencing only uses univariate and bivariaterobust estimators and is thus not vulnerable to outlier propagation. 
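Two of the quantities just introduced are easy to compute directly; the following small R sketch evaluates the robust BIC of Equation 2.37 and the breakdown bounds of Equations 2.38 and 2.39 (function names are ours).

# Sketch (ours) of the robust BIC criterion of Equation 2.37.
rbic <- function(sigma_hat, s, n) {
  log(sigma_hat) + s * log(n) / n
}

# Fraction of cellwise contamination tolerated by an estimator with casewise
# breakdown point eps that uses s variables at a time (Equations 2.38-2.39).
cellwiseBreakdown <- function(eps = 0.5, s) {
  1 - (1 - eps)^(1 / s)
}

cellwiseBreakdown(0.5, s = 2)    # 0.29: the bivariate correlation estimates
cellwiseBreakdown(0.5, s = 10)   # about 0.07: an MM-fit on a 10-variable submodel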
Variable segmentation fits MM-regression estimators to sub-models along the sequence and, under the assumption of sparsity, theoptimal submodel has much fewer variables than p. As a result, the optimal submodel is consider-ably less susceptible to outlier propagation than the full model.An open question for future research is the use of the Shooting S-estimator (O¨llerer et al.,2016) in variable segmentation as it is robust to both independent and structural contamination, un-like the MM-regression estimator currently used by RLARS. We suggest the Shooting S-estimatorbecause, unlike the 3S-regression estimator, it calculates a residual scale estimate, which can beused to calculate the robust BIC criterion for model selection. Using the shooting S-estimatorcould thus potentially improve the robustness of RLARS to independent contamination. However,variable segmentation is already computationally costly when performed using the MM-regressionestimator, which is fast to compute relative to the Shooting S-estimator. Hence, the increasedcomputational cost of using the Shooting S-estimator must be carefully considered.Overall, RLARS has good robustness to both cellwise and casewise outliers in sparse data.This is confirmed by the simulation results in Chapter 5. RLARS has an intuitively appealingrelationship to the LARS estimator. It is a regularized estimator that is applicable to data withmore dimensions than observations, filling an important gap in the literature.28Chapter 3Robust Least Angle Regression UsingPre-Filtered Data3.1 IntroductionIn Chapter 2, we discussed how RLARS is robust to independent contamination in sparse data, butsusceptible to outlier propagation in the active set. However, outlier propagation in the active setcan still be significant for high levels of independent contamination. In this section we propose amodification of RLARS to further improve the robustness of RLARS to independent outliers.As previously discussed in Chapter 2, RLARS is vulnerable to independent contamination inthe variable segmentation step. Outlier propagation in the active set can break down the MM-estimators fit to the submodels along the RLARS variable sequence. If variable segmentationcan be made robust to independent outliers, then the robustness of RLARS can be significantlyimproved.To improve the robustness of variable segmentation, we might consider the 3S-regression es-timator for inspiration. The 3S-regression estimator is highly robust to independent outliers. Itachieves this robustness by filtering independent outliers from the data matrix before fitting a linearregression model (Leung et al., 2016). Outliers are replaced with missing values. The robustnessof RLARS to independent outliers might similarly be improved by filtering the data matrix of in-dependent outliers. However, RLARS can not handle missing data, so a different filtering methodthan that used in 3S regression is required.Rousseeuw and Bossche (2018) recently proposed the Detecting Deviating Data Cells (DDC)29method for detecting independent and structural outliers in approximately Gaussian data. DDCdetects independent contamination using the bivariate relationships between variables. Outlyingcells are replaced with predicted values to produce an “imputed” data matrix. Rousseeuw andBossche (2018) suggests that the imputed data matrix can be analyzed by robust methods that cannot use incomplete data. 
Hence, the imputed matrix could be used to create a “filtered” data matrixfor RLARS.In this chapter, we propose performing RLARS using the DDC imputed data matrix, which wecall DDC-RLARS. By filtering out independent outliers, DDC-RLARS is robust to high levels ofindependent contamination in the active set.The rest of this chapter is organized as follows. In Section 3.2, the DDC algorithm for comput-ing the imputed data matrix is calculated. In Section 3.3, DDC-RLARS is presented. Section 3.4concludes, reviewing the major topics of the chapter.3.2 Review: Detecting Deviating Data CellsDetecting Deviating Data Cells (DDC) was proposed by Rousseeuw and Bossche (2018) as amethod for detecting outlying rows and cells. DDC uses the bivariate relationships between vari-ables to more effectively detect cellwise outliers. As part of the outlier detection algorithm, apredicted data matrix, is computed using robust bivariate relationships between variables.The predicted data matrix is used to detect outlying cells. Cells that significantly differ fromthe predicted value are flagged as outliers. These cells are replaced with the predicted values tocreate the imputed data matrix. This section will review the parts of the DDC algorithm used tocompute the imputed data matrix.Consider a data matrix X∈Rn×p with a multivariate Gaussian distribution N(µ,Σ). Rousseeuwand Bossche (2018) recommend that it be manually confirmed that the data are approximatelynormal and that any variables that are very non-Gaussian are transformed to be approximatelyGaussian. The imputed data matrix Ximp is calculated as follows.(1) Standardize the columns of X using robust estimates of location and scale.m j = robLoci(xi j), s j = robScalei(xi j−m j), (3.1)where robLoc and robScale are robust estimators of univariate scale and location, respectively.All robust estimators used in this section are described in Appendix A. The standardized data30matrix Z is given byzi j = (xi j−m j)/s j, i = 1, . . . ,n, j = 1, . . . , p. (3.2)(2) Perform univariate outlier detection on Z. Outlying cells are replaced with missing values,represented with NA’s, to produce a univariate filtered data matrix U,ui j =zi j, |zi j| ≤ cNA, |zi j|> c i = 1, . . . ,n, j = 1, . . . , p. (3.3)The cutoff is c =√χ21 (p) where χ21 (p) is the pth quantile of the χ21 distribution. The value pis a tuning parameter. Under the assumption of normality, approximately 1% of non-outlierswill be flagged.(3) Determine bivariate correlations between columns of U. For all pairs of columns, ` 6= j for`, j ∈ {1, . . . , p}, calculater` j = robCorri(ui`,u j`) (3.4)where robCorr is a robust bivariate correlation estimator given in Appendix A, excluding pairswhere either value is NA. If the correlation exceed some cutoff,|ρ` j| ≥ ρlim (3.5)then the variables ` and j are considered connected. Connected variables have a non-negligiblerelation that is used to determine whether a cell is outlying. By default, ρlim = 0.5. For con-nected pairs (`, j), we computeb` j = robSlopei(ui`|ui j) (3.6)where robSlope is the slope of variable ` regressed upon variable j without an intercept, givenin Appendix A.(4) Predict cells using the relationships between the connected variables. For this step, we use therobust slopes calculated in the previous step to predict the value of each cell, zˆi j. 
If the numberof dimensions exceeds some value, by default 1000, then only the k most highly correlated areused to reduce the required memory and computation time.31Let L j be the set of all indices satisfying (3.5) for variable j, including j. Then the predictedvalue is given byzˆi j = G({b` jui` : h ∈ L j}, i = 1, . . . ,n. (3.7)The function G is a combination rule that omits NA’s and is zero when all variables in the roware missing. It is recommended that G be a weighted mean with w` j = |r` j|. By construction,ui j < c, so the impact of a single outlying cell on the prediction is limited.(5) Deshrinkage of the predicted values. To reduce the effect of the shrinkage due to the combina-tion rule, we regress the observed values zi′ j onto the predicted value zˆi′ j, using the same robustregression coefficient estimation from Step (3),a j = robSlopei′(zi′ j|zˆi′ j) j = 1, . . . , p. (3.8)The predicted values are then replaced by the corrected valueszˆi j←− a j zˆi j (3.9)for i = 1, . . . ,n and j = 1, . . . , p.(6) Create the standardized imputed matrix by replacing cellwise outliers with predicted values.First, we compute the standardized residuals,ri j =zi j− zˆi jrobScalei′(zi′ j− zˆi′ j) , i = 1, . . . ,n, j = 1, . . . , p (3.10)and flag cells where |ri j| > c, with c given in Step (2), as cellwise outlier. The imputed stan-dardized data matrix Zimp is defined with entries,zimpi j =zi j, |ri j| ≤ czˆi j, |ri j|> c i = 1, . . . ,n j = 1, . . . , p. (3.11)(7) Destandardize the standardized imputed data matrix Zimp to get the imputed data matrix Ximp.The entries of Ximp areximpi j = s jzimpi j +m j, i = 1, . . . ,n; j = 1, . . . , p (3.12)32for m j and s j computed in Step (1).The imputed data matrix Ximp is the data matrix X with outlying cells replaced with values pre-dicted by DDC. The full DDC algorithm contains an extra step between (6) and (7) that flagsstructural outliers. RLARS is robust to structural outliers, so we will not review this step.3.3 RLARS With the DDC Imputed Data MatrixTo further improve the robustness of RLARS to outliers, we propose performing RLARS on theDDC imputed data matrix. The DDC imputed data matrix has outlying cells replaced with valuesimputed using the non-outlying cells in the same observation. As no independent outliers arepresent in the data matrix during variable segmentation, DDC-RLARS is not susceptible to outlierpropagation in the active set. It is thus highly robust to independent outliers.To calculate the DDC-RLARS estimate, DDC is first used to calculate the imputed data matrixXimp. Next, the two-step RLARS procedure is performed using the dataset (y,Ximp). Duringvariable segmentation, MM-estimators are fit to submodels along the variable sequence. MM-estimators achieve robustness by down-weighting outlying observations and are thus not robust toindependent contamination on unfiltered data, as discussed in Chapter 1.In RLARS, variable segmentation is performed using the raw data matrix and is thus suscep-tible to outlier propagation in the active set. Under the assumption of sparsity, the number ofvariables in the optimal submodel is much less than p, so RLARS is resistant to independent out-liers. However, for high-levels of independent contamination, outlier propagation can result in thefraction of contaminated cases in the active set exceeding 0.5, exceeding the maximum breakdownpoint of the MM-estimators used by RLARS. 
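The resulting pipeline is short to express in code. The following is a minimal sketch assuming the interfaces of cellWise::DDC (whose output we assume exposes the imputed matrix as $Ximp) and robustHD::rlars; consult the package documentation before relying on the exact argument and component names.

# Sketch (ours) of DDC-RLARS: impute cellwise outliers with DDC, then run RLARS.
library(cellWise)
library(robustHD)

ddcRlars <- function(X, y) {
  ddc_out <- DDC(as.data.frame(X))     # flag and impute deviating cells
  X_imp   <- as.matrix(ddc_out$Ximp)   # imputed predictor matrix (assumed field name)
  rlars(X_imp, y)                      # two-step RLARS on the filtered data
}

# fit <- ddcRlars(X, y); coef(fit)

Only the predictor matrix is filtered; the response vector is passed through unchanged, matching the dataset (y, Ximp) used above.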
In contrast, variable segmentation in DDC-RLARSis performed using data filtered of independent outliers so outlier propagation is not a problem.Thus, DDC-RLARS is not susceptible to independent outliers in the active set.However, the data matrix loses information by replacing flagged cells with imputed values. Asa result, DDC pre-filtering decreases the performance of RLARS when low levels of independentcontamination are present. Hence, DDC-RLARS should only be used when there there is reasonto believe that there are high levels of independent contamination.Alternatively, to improve the robustness of RLARS to independent outliers we could replacethe MM-estimator in the segmentation step with an estimator that is robust to both independentand structural contamination such as the Shooting S-estimator (O¨llerer et al., 2016), as discussed inSection 2.5. Such as estimator could be fit using the raw data and thus may have better performancethan DDC-RLARS. However, we were unable to explore this alternative due to time constraints.33Overall, DDC-RLARS is more robust to independent contamination than RLARS. Based onsimulation results in Chapter 5, we recommend that DDC-RLARS be used instead of RLARS ifthere is more than 5% independent contamination. The diagnostic tools provided by DDC in thepackage cellWise by Rousseeuw and Bossche (2018) can be used to manually determine theseverity and prevalence of independent contamination in the data matrix.3.4 ConclusionIn this chapter, we proposed DDC-RLARS, a method that further improves the robustness ofRLARS to independent contamination. For high levels of contamination, outlier propagation cancause the number of contaminated cases in the variables of the active set to exceed 0.5 – the max-imum breakdown point of RLARS. DDC-RLARS uses an initial filtering step to remove indepen-dent outliers and thus avoid this problem.DDC-RLARS applies RLARS to the DDC imputed data matrix. The DDC imputed data matrixhas outlying cells replaced with imputed values, leaving only the casewise outliers. RLARS isrobust to casewise outliers, so DDC-RLARS is very robust to both structural and independentcontamination.However, DDC-RLARS has reduced performance on data with low levels of independent con-tamination. Therefore, DDC-RLARS should only be preferred over RLARS when manual exam-ination of the data indicates that there are significant levels of independent outliers, such as thosein Rousseeuw and Bossche (2018). We recommend that DDC-RLARS be used in data with morethan 5% independent outliers, based on numerical results in Chapter 5.34Chapter 4Robust Least Angle Regression withLasso Modification4.1 IntroductionThe least squares estimator with a Least absolute shrinkage and selection operator (Lasso) penaltyis a robust and sparse regularized linear regression estimator. As previously discussed in Chapter1, Lasso solves a constrained linear regression problem that can be written as the solution of aclassical sum-of-squares residual loss function with an L1 coefficient penalty. The L1 coefficientpenalty constrains the effective degrees of freedom, so solutions can be obtained when n< p. TheL1 penalty also tends to shrink coefficients towards zero, making Lasso estimates sparse.Efron et al. (2004) observed that a simple modification to LARS implements the Lasso, calcu-lating all possible Lasso estimates for a given problem. 
RLARS improves the robustness of LARSwithout the Lasso modification by replacing non-robust scale, location, and correlation estimatorswith robust counterparts. The Lasso modification can be made robust is the same way. We proposeperforming RLARS with the Lasso modification (RLARS-Lasso). We show that the LARS Lassomodification can be made robust similar to RLARS.4.2 Review: LARS Lasso ModificationEfron et al. (2004) observed that the LARS solution path is very similar to the Lasso solution path.Exploring this further, the paper proved that a simple modification of the LARS algorithm can beused to calculate the full Lasso solution path.35The least squares estimator with the Lasso penalty is the solution of,βˆ = argminβ∈Rn∑i=1(yi−xiβ )2+λp∑j=1|β j|. (4.1)where λ > 0 is the coefficient penalty parameter. The L1 coefficient penalty of Lasso tends toshrink the regression coefficients to zero, a desirable sparsity property. As with LARS, Lasso canhave at most n non-zero coefficient estimators (e.g. Hastie et al., 2015, Chapter 2).While LARS and Lasso can have very similar solution paths in some problems, Lasso has thefollowing restriction not present in LARS. Let βˆ be a Lasso solution with non-negative coefficientsfor i ∈ A and r j be the correlation of covariate j with the residuals.sign(βˆ j) = sign(r j) (4.2)Efron et al. (2004) proved that if Constraint 4.2 is enforced in LARS through the Lasso modi-fication, then LARS produces the full Lasso solution path. To quote Theorem 1 in Efron et al.(2004),Under the Lasso modification, and assuming the “one at a time” condition discussed below,the LARS algorithm yields all Lasso solutions.The “one at a time” condition is that only one covariate enters or leaves the active set A on eachstep.Constraint 4.2 is always true when the variable first enters the active set. This can be seenfrom the coefficient update step in Step 4 of the LARS algorithm. However, this not necessarilytrue in later steps. As a result, the Lasso modification must sometimes remove variables from theactive set, unlike LARS, and thus may require more than min(n, p) steps to calculate the completesolution path.Constraint 4.2 can be enforced by ensuring that β j for j ∈ A will not change sign during Step(5) of the LARS algorithm. This can be performed by determining whether the smallest step sizesuch that a coefficient changes sign is smaller than the previously calculated LARS step size γ . Wethus calculate the step size γ˜ j such that the jth coefficient at step k+ 1 equals zero, βˆ k+1j = 0, for36all j ∈ A using the coefficient update in Equation (2.16),βˆ k+1j = 0 = βˆkj + γ˜ js jwA, jγ˜ j =−βˆ kjs jwA, j.The smallest such step size is then found,m = argminj∈A{γ˜ j}γ˜ = γ˜m.If γ˜ ≥ γ , then a modified is used to perform the LARS update,γ ←− γ˜,A←− A−{m},sA←− sA−{sm},The LARS Lasso modification is incorporated in the LARS summary in Section 2.2.4 by addingthe following step between steps 4 and 5.(4a) The Lasso modification is performed by calculating the step size for which each βˆ j for j ∈ Acrosses zero. This is easily derived from the coefficient update step,γ˜ j =βˆ js jwA j, j ∈ A. (4.3)If the step size required for the coefficient to change sign is less than the size of the next step,γ >minj∈Aγ˜ j, (4.4)then the corresponding variable is removed from the active set,m = argminj∈Aγ˜ j (4.5)γ j←− γ˜m, A←− A−{m}, and sA←− sA−{sA}. 
(4.6)37Continue onto Step 5 in the LARS algorithm, using the new step size.4.3 RLARS-Lasso ModificationWe propose adding the Lasso modification to RLARS, which we call RLARS-Lasso. All quantitiesin the Lasso modification can be calculated from the sample correlation matrix. As in RLARS, wereplace the sample correlation matrix with the robust correlation matrix estimated using bivariateWinsorization. As discussed in Section 2.4, the correlation matrix estimate in RLARS is robust tocellwise and casewise contamination. Hence, the RLARS-Lasso solution path is similarly robustto cellwise and casewise contamination.In RLARS, the variable sequencing step is followed by a variable segmentation step that selectsthe final model. The variable segmentation step selects an unpenalized MM-regression estimators,so the resulting regression estimate is not regularized. Instead of using a variable segmentationstep to get coefficient estimates, RLARS-Lasso computes a coefficient update directly. This hasthe benefit of being a one-step process. Further, the coefficient estimates are robust to independentcontamination as all quantities are calculated using univariate or bivariate robust estimators.RLARS-Lasso uses a fast, but low-efficiency correlation estimate to calculate the coefficientupdates. In contrast, the MM-estimators used by RLARS to estimate the regression coefficients arehighly efficient and robust under casewise contamination. As a result, RLARS has better predictiveperformance than RLARS-Lasso in some situations with structural contamination. However, MM-estimators perform poorly when the number of covariates included in the model s are close to n.Hence, it has been observed RLARS-Lasso outperforms RLARS when p/n is even moderatelylarge. The simulation study in Chapter 5 suggests RLARS-Lasso should be preferred over RLARSwhen p/n> 10.To get the final regression coefficient estimate, a model along the full RLARS-Lasso solutionpath must be chosen. This is equivalent to selecting the shrinkage parameter λ . As data may haveoutliers, a robust model selection method must be used.4.4 Selecting the Penalty ParameterThe penalty term λ is selected from grid of 200 logarithmically equispaced values. The modelwith the lowest Trimmed Mean Squared Prediction Error (TMSPE) is selected. The TMSPE isgiven in Equation 2.35. The TMSPE is robust to up to a fraction α outlying observations. Werecommend trimming α = 0.25 by default as a good balance between robustness and efficiency.Similar to other methods developed under the Tukey-Huber contamination model, the TMSPE38down-weights extreme observations. As a result, the TMSPE is vulnerable to outlier propagation.This is discussed in greater depth in Chapter 1. Thus, the TMPSE can break down for less than afraction of α of cellwise outliers due to outlier propagation in the active variables.However, an outlying cell will only influence a prediction, and thus the TMPSE, if the outlieris in a variable in the active set. Thus, RLARS-Lasso is only susceptible to outlier propagation inthe active set, similar to RLARS. Under the assumption of sparsity, the number of active variabless is much less than the total number of variables p so RLARS-Lasso is resistant to independentoutliers.4.5 ConclusionIn this chapter, we proposed performing RLARS with a robust LARS Lasso modification, RLARS-Lasso. The RLARS-Lasso modification is computed by replacing the sample correlation matrixwith the robust correlation matrix estimated by RLARS. 
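To make the summary concrete, the sign-restriction check of Step (4a) can be sketched as follows; the variable names are ours and this is an illustration of the rule, not the package implementation.

# Sketch (ours) of the Lasso-modification check within one (R)LARS step.
# beta_A: current coefficients of the active variables; s_A: their signs;
# w_A: equiangular weights; gamma: the step size computed in Step (4).
lassoModStep <- function(beta_A, s_A, w_A, gamma) {
  gamma_tilde <- -beta_A / (s_A * w_A)    # step at which each coefficient would hit zero
  gamma_tilde[gamma_tilde <= 0] <- Inf    # only forward crossings can occur
  m <- which.min(gamma_tilde)
  if (gamma_tilde[m] < gamma) {
    list(gamma = gamma_tilde[m], drop = m)  # truncate the step and drop variable m
  } else {
    list(gamma = gamma, drop = NULL)        # ordinary step, active set unchanged
  }
}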
The final model along the solution path isselected using the TMSPE.RLARS-Lasso computes the coefficient update directly and thus does not require a variablesegmentation step. As a result, the RLARS-Lasso coefficient estimates along the solution path arerobust to independent and structural contamination. However, the penalty parameter selection stepis not robust to independent contamination. Similar to RLARS, outlier propagation in the activeset can cause the number of contaminated cases to exceed 0.5, significantly influencing the penaltyparameter selection.An open question for future research is how the robustness of penalty parameter selection toindependent contamination could be improved. Potentially, the DDC imputed data matrix describedin Section 3.2 could be used to calculate the TMSPE. The imputed data matrix replaces cellwiseoutliers with imputed values, possibly reducing the influence of independent contamination on theTMPSE.RLARS uses the highly robust and efficient MM-regression estimators to estimate regressioncoefficients. As a result, RLARS has superior predictive performance than RLARS-Lasso for lowp/n. However, MM-estimators are less efficient when the number of parameters is close to n. Wetherefore recommend that RLARS-Lasso be preferred over RLARS when p/n is larger than 10based on simulation results in Chapter 5.39Chapter 5Simulation Study5.1 IntroductionThis chapter presents a simulation study performed in R to compare the prediction accuracy andvariable selection capabilities of RLARS, RLARS-DDC, and RLARS-Lasso with various robustand sparse estimators across a variety of settings. Where possible, estimators were tuned to have abreakdown point of 25%, except the MM-estimators used to fit submodels in RLARS and RLARS-DDC. MM-estimators already have very high efficiency and are thus instead tuned to have a break-down point of 50%, as in Alfons et al. (2013),.The following six competitors were considered in the simulations.1. Lasso penalized MM-estimator (Smucler and Yohai, 2017) computed using the pense pack-age, called MMLasso. This method was tuned to have a 25% breakdown point.2. Penalized elastic net MM-estimator with α = 3/4 computed using the pense package,called PENSEM-EN. This method was tuned to have a 25% breakdown point.3. Sparse least trimmed squares regression (Alfons et al., 2013) computed using the robustHDpackage, called SparseLTS. This method was tuned to have a 25% breakdown point.4. Robust variable sequencing using robust least angle regression with an MM-estimator fit tothe selected variables (Khan et al., 2007) computed using the robustHD package (Alfonset al., 2013), called RLARS.5. Robust filtering of data using Detecting Deviating Data Cells (DDC) (Rousseeuw and Boss-che, 2018) with RLARS applied to the filtered data, called RLARS-DDC. The DDC imputed40data matrix is computed using the package cellWise.6. RLARS with the Lasso modification computed using an R package implemented by theauthor, called RLARS-Lasso.Tuning parameters for each regression method are selected using the method recommendedby the original author. The RLARS-Lasso shrinkage parameter is selected using the TMSPE.For RLARS and DDC-RLARS, the penalty parameter is selected using the robust BIC (Alfonset al., 2013). 
The robust BIC is used because the preferred alternative, the TMSPE, has a highcomputational cost that would make the large number of simulations computationally infeasible.However, simulation results indicate that robust cross-validation performs much better in certaincases than the robust BIC. In particular, the simulations in settings (a) and (b) in the first column ofTable 5.6 were a pathological case that required that DDC-RLARS use the TMSPE instead of theRBIC.Preliminary results showed that PENSEM, and thus MMLasso, require significantly more com-putation time than the other methods considered, especially for large p. While the pense packagecontains a well-written and efficient implementation of PENSEM in C++, the computation of thePENSEM estimator requires solving a very difficult optimization problem and is thus unsurpris-ingly slow. We consider a large number of simulation settings, so including PENSEM in every onewould make the time required to complete the simulations unfeasible. While it is undesirable toexclude a good competitor, results show that RLARS outperforms PENSEM, especially when datacontains independent outliers. As PENSEM is not a close competitor with RLARS for data withindependent outliers and performs slightly worse than RLARS in data with structural outliers, it isonly included in simulations with the lowest dimensionality, p = 50.5.2 Simulation SettingsSimulated data are generated using several simulation settings. The simulation settings are asfollows.(a) The first simulation setting is a latent variable model from Khan et al. (2007), which covers thecase where p< n. This setting uses the linear latent variable model of the formy = L1+L2+ · · ·+L`+σε (5.1)with ` latent variables, where L1,L2, · · · ,L` and ε are independent standard normal random41variables. A value of σ =√`/3 is chosen to set the signal to noise ratio (SNR) to 3. A set ofp predictor variables are generated from the latent variables as follows,x j = L j + τε j, j = 1, . . . , `x`+ j = Lb j/2c+1+δε`+ j j = 1, . . . ,2`x3`+ j = ε3`+ j j = 1, . . . , p−3`(5.2)where all ε j are independent standard normal random variables. Values of δ = 5 and τ =0.3 are chosen so that corr(x j,xbk/2c+1) = 0.5 for j = 1, . . . , ` and k = 1, . . . ,2`. The first 3`predictors are active, with the first ` being low noise perturbations of the latent variables andthe next 2` predictors being high noise perturbations. The last p− 3` random variables areinactive. We set p = 50 and `= 6 to replicate the simulation settings in Khan et al. (2007).(b) The second simulation setting has p > n. For this setting, the latent variable model in setting(a) is used with p = 450 and `= 6. This dimensionality was chosen to so that p/n = 3.(c) This simulation setting is from Alfons et al. (2013). For this simulation setting we consider thelinear model of the form,y = Xβ +σε (5.3)where X ∈ Rn×p is the data matrix, y ∈ Rn is the vector of responses, β ∈ Rp is the regressioncoefficient vector, and ε ∈ Rn is a vector of independent regression errors. The regressionerrors have distribution εi ∼ N(0,σ2) for i = 1, . . . ,n. The value of σ is chosen to achieve anSNR of 3. The predictors follow a multivariate normal distribution given by N(0,Σ) whereΣ = (Σi j)1≤i, j≤p with Σi j = 0.5|i− j|. This is commonly referred to as an autoregressive (AR)correlation structure.To fully replicate the setting from Alfons et al. (2013), the coefficient vector is set to β =(βi)1≤i≤p with β1 = β7 = 1.5, β2 = 0.5, β4 = β11 = 1, and β j = 0 for j ∈ {1,2, . . . 
, p} \{1,2,4,5,11}.All simulations settings have n = 150 observations, which is chosen to match the number ofobservations in the simulations in Khan et al. (2007).Contamination Schemes: Resilience to outliers is assessed using a set of contaminationschemes. Methods are tested on data and with both structural and independent contamination.42Structural contamination is generated by replacing the first bεnc observations with outliers. Allstructural contamination settings use ε = 0.1. We first consider the five structural contaminationschemes used in Khan et al. (2007).(i) No contamination.(ii) Symmetric slash vertical contamination: ε ∼ (1− ε)N(0,1)+ ε N(0,1)/Uniform(0,1), re-ferred to as symmetric slash contamination.(iii) Shifted normal vertical contamination: ε ∼ (1− ε)N(0,1)+ ε N(20,1), referred to as shiftnormal contamination.(iv) High leverage symmetric slash contamination, as in (2), with xi j generated from independentN(50,1) distributions, referred to as symmetric slash leverage contamination.(v) High leverage shifted normal contamination, as in (3), with predictors as in (4), referred to asshift normal leverage contamination.An additional structural contamination scheme from Alfons et al. (2013) is included. This schemeproduces high-leverage, non-sparse, tightly clustered outliers.(vi) High leverage clustered contamination: High leverage predictors x˜i are generated from in-dependent N(10,0.1) distributions. Response variables are generated from y˜ = η x˜′γ whereγ = (−1/p, . . . ,−1/p). The parameter η controls the magnitude of the outliers. This isreferred to as clustered contamination.Alfons et al. (2013) considers η = 1,2, . . . ,25. However, due to the large number of contami-nation schemes in this simulation study, we only consider η = 1,13,25.An independent contamination scheme is adapted from Leung et al. (2016). Independent con-tamination is added to the data by randomly replacing bεinc cells independently in each columnwith outliers in both the predictors and responses. Simulations are performed with εi = 0.014 andεi = 0.054, corresponding to 2 and 8 outliers in each column, respectively. We refer to these valuesof εi as low and high, respectively. Table 5.1 shows the expected fraction of contaminated cases forthe values of εi and p considered.(vi) Independent Contamination: Outlying predictors x˜i are set to E(xi)+ kSD(xi) and outlyingresponses y˜ are set to E(y)+ kSD(y).43Table 5.1: Expected fraction of contaminated cases as n−→∞ for p= 50,450,1000 for differ-ent levels of independent contamination εi.p = 50 p = 450 p = 1000εi = 0.014 0.51 1.00 1.00εi = 0.054 0.94 1.00 1.00This setting is slightly different than in Leung et al. (2016) in which, the outlying responses y˜ areset to E(y)+ kSD(ε). Simulations are performed with k = 2,3,5,10.For every combination of simulation settings (a)-(c) and contamination settings (i)-(vi) 100replications are performed. In addition, to prevent the difference in correlation structure betweensettings (a)/(b) and (c) from confounding differences in p, simulations are performed with the latentvariable correlation structure of settings (a) and (b) and the AR correlation structure of setting (c)for the three dimensionalities considered in this study, p = 50,450 and 1000, with contaminationsetting (i), no contamination.Two additional simulation settings are also considered. These settings are chosen to demon-strate how the performance of the robust and sparse linear regression methods change of a rangeof p and εi. 
To keep the number of settings manageable, they are only used with contaminationsetting (vi).(d) Range of p: This setting has the same correlation structure and distribution of X as the setting(c) with βi = 1 for i ∈ {1, . . . ,bζ pc} and βi = 0 for i ∈ {bζ pc, . . . , p} where ζ is the sparsitylevel. As with the other simulation settings, the value of σ is set so that the signal-to-noiseratio is 3.Simulations are performed for a range of dimensions, p= 100,200, . . .1000. The sparsity levelis set to ζ = 0.1. We consider εi = 0.014 and 0.054, such that there are 2 and 8 outlying cellsin each column, respectively.(e) Range of εi: This setting uses the same structure as (d), but with ζ = 0.1 and p = 300. Inde-pendent contamination levels are varied from bεinc= 2,4, . . . ,16. This provides a range of lowto very high levels of independent contamination.Overall, a total of 67 settings are considered across a variety of p, correlation structures, andcontamination settings. Simulations are performed for data following a latent variable and an ARcorrelation structure. Contamination schemes include vertical and high-leverage structural outliers44and independent outliers. Finally, two additional settings are used to examine how the performanceof the different methods evolve over a grid of p and εi, respectively.5.3 Performance MeasuresFor each replicate, two datasets are generated using the same settings. Contamination is added tothe training dataset, if required by the contamination scheme. The test dataset is used to computethe performance measures. The predictive performance is measured using the MSPE divided bythe variance of noise σ2, called the normalized MSPE, such that the best result is 1. The meanand the standard error of the normalized MSPE is reported. The normalized MSPE is henceforthreferred to as the MSPE. The variable selection performance is measured using the precision (PR)and recall (RC), defined as follows,PR =TPTP+FP=#{ j : βˆ j 6= 0∧β j 6= 0}{ j : βˆ j 6= 0}, RC =TPTP+FN=#{ j : βˆ j 6= 0∧β j 6= 0}{ j : β j 6= 0}where TP is the number of true positives, TN is the number of true negatives, FN is the numberof false negatives, and FP is the number of false positives. A positive is defined as the inclusionof a variable in the active set. An identification is true if the variable was correctly included in theactive set.5.4 Simulation ResultsTables 5.2 and 5.3 show the average MSPE, PR, and RC of the methods with the latent variableand AR correlation structure of simulation settings (a) and (c), respectively, with no outliers. Ineach column, the best performing method is highlighted. If multiple methods have the same valuewithin one decimal place, they are all highlighted. The standard error of the MSPE, PR, and RC isgreater than 0.1 in almost all cases across all contamination settings, so only one decimal place inshown.Table 5.2 shows the results for data generated using the latent variable correlation structurein simulation settings (a) and (b) with no outliers. For simulations with p = 50, RLARS has thebest mean MSPE. However, RLARS-Lasso, DDC-RLARS, MMLasso, and PENSEM-EN are allclose competitors. For p = 450 and 1000, RLARS-Lasso has the best MSPE. PENSEM-EN andSparseLTS have the best recall for p = 50, while all methods have comparable precision for thelatent variable correlation structure. 
For all cases, DDC-RLARS has the best precision.45Table 5.2: Mean and standard error of MSPE, PR, and RC across replicates with no contam-ination for data generation model (a) for p = 50, 450, and 1000. The MSPE is shownplus/minus the standard error. The best method for each performance measure is high-lighted.No Contaminationp Method MSPE PR RC50 MMLasso 2.1±0.4 0.6 0.5PENSEM-EN 2.1±0.3 0.5 0.6SparseLTS 2.5±0.5 0.4 0.6RLARS 2.0±0.3 0.8 0.4RLARS-Lasso 2.2±0.4 0.6 0.5DDC-RLARS 2.1±0.4 0.9 0.4450 SparseLTS 4.5±1.4 0.1 0.4RLARS 2.9±0.7 0.3 0.4RLARS-Lasso 2.6±0.5 0.2 0.4DDC-RLARS 3.1±1.0 0.4 0.41000 SparseLTS 6.1±1.8 0.1 0.4RLARS 3.2±0.8 0.2 0.4RLARS-Lasso 2.8±0.6 0.2 0.4DDC-RLARS 3.7±1.1 0.3 0.4Table 5.3 shows the results for data with the AR correlation structure and no outliers. All meth-ods have comparable MSPE for p = 50, while RLARS-Lasso performs the best for p = 450,1000.All methods have comparable recall. DDC-RLARS has the best precision for p = 50. The preci-sion of RLARS-Lasso does not change significantly with p while the precision of other methodsdeteriorates. By p = 1000, RLARS-Lasso has the best precision. Overall, for data with no outliersRLARS narrowly has the best predictive performance for simulations with p = 50 while RLARS-Lasso has the best for p = 450,1000. DDC-RLARS has the best precision in most cases while allmethods have comparable recall.Tables 5.4 and 5.5 show the results for simulation settings (a)-(c) with shift normal and sym-metric slash casewise contamination with and without leverage points. For non-leverage points,RLARS has the best predictive performance in all cases, except with symmetric slash contami-nation with setting (c). For outliers with leverage points, DDC-RLARS has the best predictiveperformance, sometimes by a considerable margin. RLARS-Lasso is the worst performing methodwith shift normal contamination with and without leverage points. However, RLARS-Lasso’s per-46Table 5.3: Mean and standard error of MSPE, PR, and RC across replicates with no contam-ination for data generation model (c) for p = 50, 450, and 1000. The MSPE is shownplus/minus the standard error. The best method for each performance measure is high-lightedNo Contaminationp Method MSPE PR RC50 MMLasso 1.2±0.1 0.4 1.0PENSEM-EN 1.2±0.2 0.3 1.0SparseLTS 1.3±0.2 0.2 1.0RLARS 1.2±0.2 0.7 1.0RLARS-Lasso 1.2±0.2 0.4 1.0DDC-RLARS 1.2±0.3 0.8 1.0450 SparseLTS 1.8±0.4 0.1 0.9RLARS 1.5±0.4 0.3 1.0RLARS-Lasso 1.3±0.2 0.3 1.0DDC-RLARS 1.7±0.5 0.3 1.01000 SparseLTS 2.0±0.5 0.1 0.9RLARS 1.7±0.3 0.2 1.0RLARS-Lasso 1.4±0.2 0.3 1.0DDC-RLARS 1.9±0.4 0.1 1.0formance is comparable to other methods for data without leverage points for p = 1000. RLARSand DDC-RLARS have the best precision for data with low-leverage outliers, achieving similar val-ues. All methods had comparable recall, except RLARS and DDC-RLARS for setting (a) with shiftnormal contamination. For data with high leverage outliers, DDC-RLARS have the best predictiveperformance and precision while RLARS and SparseLTS have the best recall.Table 5.6 shows the results for simulations with the high-leverage, clustered outliers describedin contamination scheme (vi). DDC-RLARS has the best predictive performance in every case.RLARS has poor predictive performance for η = 1 and η = 13, but performs well for η = 25.RLARS-Lasso performs poorly for settings (a) and (b), but performs reasonably well for setting(c). This agrees with previous results indicating that RLARS-Lasso has better relative performancefor settings with high p. 
DDC-RLARS has the best precision in all cases, frequently by a largemargin, and DDC-RLARS and SparseLTS have the best recall in most cases. RLARS and DDC-RLARS have low precision, but similar recall to other methods. Overall, DDC-RLARS has the bestpredictive and variable selection performance for data with high-leverage clustered contamination.47Tables 5.8 and 5.7 show the results for simulations with independent contamination, with εi =0.014 and 0.054, respectively. First, we will consider results for simulations with εi = 0.014.RLARS has the best predictive performance for simulation settings (a) and (b) while RLARS-Lasso has the best performance for most cases with setting (c). DDC-RLARS has comparablepredictive performance to RLARS, except for settings (a) and (b) with k = 5 and 10. RLARS hadthe best precision in all cases. DDC-RLARS had a similar precision for all cases except for setting(c). RLARS-Lasso had low precision compared to the other RLARS based methods, but betterprecision than SparseLTS. SparseLTS, MMLasso, and PENSEM-EN had similarly high recall forsetting (a), outperforming the RLARS based methods. For simulation setting (b), all methods hadsimilar recall. For setting (c), the RLARS based methods had very high recall, outperformingSparseLTS.For simulations with high levels of independent contamination, εi = 0.054, DDC-RLARS hasthe best predictive performance for all cases with k greater than 2. For k = 2, RLARS has thebest predictive performance for settings (a) and (b) while RLARS-Lasso has the best predictiveperformance for setting (c). RLARS has the best precision for all cases and DDC-RLARS hasa similar precision for most cases when k is less than 10. For simulation setting (a), SparseLTSconsistently has the best recall. For setting (b) and (c), the recall of all methods are similar. Over-all, DDC-RLARS has the best predictive performance and precision for higher levels independentcontamination, εi = 0.054, and RLARS-Lasso has the best predictive performance for the higherdimensional simulation setting, (c), with low levels of independent contamination, εi = 0.014.We will now consider simulation settings (d) and (e), which test the performance of methodsover a grid of p and εi values, respectively. Recall that the sparsity level of these simulations isfixed at ζ = 0.1, so the number of active variables changes with p. Figure 5.1 shows the meanMSPE of different methods over a range of εi from 0.014 to 0.1. The predictive performance ofDDC-RLARS is stable as the fraction of independent contamination increases. In comparison, theperformance of other methods deteriorates significantly as εi increases. DDC-RLARS is the bestperforming methods for εi > 0.03, which we refer to as moderate to severe independent contami-nation, which agrees with other simulation results.Figures 5.2 and 5.3 show the mean rank of the MSPE for different methods for a range of pfrom 100 to 1000 with k = 5 and εi = 0.014 and 0.054, respectively. The mean MSPE rank isshown as the MSPE is not directly comparable for different p. With εi = 0.054, DDC-RLARS hasthe best mean MSPE rank for p less than 600. However, the average MSPE rank of DDC-RLARSdeteriorates as p increases. The relative performance of RLARS-Lasso improves with p, achievingthe best predictive performance for p≥ 750 with εi = 0.014 and 0.054.48To summarize, RLARS frequently has the best performance with data with no contaminationand low-leverage structural contamination. 
RLARS-Lasso has the best performance for the high-est dimensional setting, with p = 1000, for low leverage structural and independent contamina-tion. DDC-RLARS has the best performance for data with high-leverage structural contamination.DDC-RLARS also outperforms other methods for data with moderate to high levels of independentcontamination, εi ≥ 0.03. Figure 5.4, shows a decision tree containing the recommended methodto use in different situations based on the simulation results.5.5 ConclusionThis chapter contains the results of a simulation study, comparing RLARS, DDC-RLARS, andRLARS-Lasso with a variety of robust and sparse regression estimators on data with cellwise andcasewise contamination. RLARS is the best performing method with casewise and low-levels ofcellwise contamination. As was expected, simulation results show that the performance of RLARSdeteriorates as the level and magnitude of independent contamination increases. This is becauseRLARS is vulnerable to the propagation of independent contamination in active variables. Agree-ing with the results in Alfons et al. (2013), RLARS has low robustness to high-leverage outliers.DDC-RLARS is the best performing method for data with high-leverage casewise outliers. TheDDC imputed data matrix replaces cellwise outliers with values predicted using other correlatedvariables in the same observation. This method excludes extreme values in each variable whencomputing the imputed data matrix, returning zero if all values are extreme. As a result, the DDCimputed data matrix replaces high leverage outliers with low leverage outliers, which RLARS ishighly robust to.DDC filtering also improves the performance of RLARS for high-levels of independent con-tamination. As discussed in Chapter 3, DDC-RLARS is robust to outliers in the active set ofvariables, unlike RLARS. This is clearly demonstrated by the much greater stability of the MSPEfor DDC-RLARS over a range of cellwise contamination levels compared to other methods.RLARS-Lasso has better predictive performance than RLARS in the simulation settings withthe highest p, clearly outperforming RLARS for p = 1000. This is likely because the RLARS-Lasso coefficient estimates are regularized and thus have greater stability for high-dimensionaldata. The simulation results are summarized in Figure 5.4 in a decision tree containing the recom-mended method for different data types.49Table 5.4: Mean and standard error of MSPE, PR, and RC across replicates for simulation settings (i)-(iii) with shift normalcasewise contamination with and without high-leverage values. The MSPE is shown plus/minus the standard error.The best method for each performance measure is highlightedShift Normal Shift Normal Lev.Setting Method MSPE PR RC MSPE PR RC(a) MMLasso 2.1±0.3 0.6 0.5 2.5±0.9 0.6 0.5PENSEM-EN 2.2±0.3 0.5 0.6 2.7±1.1 0.6 0.6SparseLTS 2.3±0.4 0.4 0.6 2.3±0.4 0.5 0.6RLARS 1.9±0.2 0.9 0.3 2.7±1.4 0.7 0.7RLARS-Lasso 3.3±0.7 0.5 0.6 5.5±1.3 0.5 0.8DDC-RLARS 2.0±0.3 0.9 0.4 2.0±0.3 0.9 0.4(b) SparseLTS 3.6±0.9 0.1 0.4 3.7±0.9 0.1 0.4RLARS 2.1±0.5 0.7 0.3 4.7±1.2 0.4 0.7RLARS-Lasso 4.5±1.1 0.2 0.4 5.7±1.2 0.3 0.7DDC-RLARS 2.5±1.0 0.7 0.3 2.2±0.5 0.7 0.4(c) SparseLTS 1.8±0.4 0.1 0.9 1.8±0.4 0.1 0.9RLARS 1.3±0.3 0.6 0.9 1.7±0.3 0.2 0.9RLARS-Lasso 1.8±0.3 0.2 0.9 1.7±0.3 0.1 0.9DDC-RLARS 1.5±0.4 0.5 0.9 1.6±0.3 0.3 1.050Table 5.5: Mean MSPE, PR, and RC across replicates for simulation settings (a)-(c) with symmetric slash contaminationwith and without high leverage values. 
The MSPE is shown plus/minus the standard error. The best method for eachperformance measure is highlighted.Symm. Slash Symm. Slash Lev.Setting Method MSPE PR RC MSPE PR RC(a) MMLasso 2.1±0.3 0.6 0.5 2.5±0.5 0.8 0.4PENSEM-EN 2.1±0.3 0.5 0.5 2.6±0.6 0.7 0.5SparseLTS 2.5±0.5 0.4 0.6 2.5±0.4 0.4 0.7RLARS 2.0±0.3 0.9 0.4 2.3±0.7 0.5 0.4RLARS-Lasso 2.3±0.3 0.5 0.5 4.7±0.7 0.3 0.5DDC-RLARS 2.1±0.5 0.9 0.4 2.0±0.3 0.9 0.4(b) SparseLTS 4.4±1.2 0.1 0.4 3.7±0.9 0.1 0.5RLARS 2.5±0.7 0.5 0.4 3.8±1.1 0.2 0.3RLARS-Lasso 2.8±0.5 0.2 0.4 4.6±0.7 0.1 0.4DDC-RLARS 3.0±1.3 0.5 0.4 2.3±0.6 0.6 0.4(c) SparseLTS 2.1±0.6 0.1 0.9 1.8±0.4 0.1 1.0RLARS 1.7±0.5 0.3 1.0 2.0±0.4 0.2 0.8RLARS-Lasso 1.4±0.3 0.2 1.0 1.9±0.4 0.1 0.9DDC-RLARS 1.9±0.6 0.2 1.0 1.6±0.4 0.3 1.051Table 5.6: Mean MSPE, RC, and PC across replicates for simulation settings (a)-(c) with clustered outliers for a range ofmagnitude parameters η . The MSPE is shown plus/minus the standard error. The best method for each performancemeasure is highlighted.η = 1 η = 13 η = 25Setting Method MSPE PR RC MSPE PR RC MSPE PR RC(a) MMLasso 3.6±1.6 0.7 0.5 2.8±1.2 0.9 0.4 2.7±0.8 0.9 0.4PENSEM-EN 3.7±1.5 0.6 0.5 2.8±0.8 0.8 0.4 2.8±0.9 0.8 0.4SparseLTS 2.3±0.4 0.5 0.6 2.4±0.4 0.4 0.7 2.4±0.4 0.4 0.7RLARS 7.0±1.8 0.4 0.7 2.3±0.8 0.3 0.4 2.2±0.6 0.4 0.4RLARS-Lasso 7.3±1.2 0.4 0.7 5.8±1.1 0.3 0.5 5.6±0.9 0.3 0.5DDC-RLARS 2.1±0.4 0.9 0.8 1.9±0.2 0.9 0.4 1.9±0.3 0.9 0.4(b) SparseLTS 3.7±1.2 0.1 0.4 3.7±1.1 0.1 0.4 3.8±1.0 0.1 0.5RLARS 6.9±1.9 0.2 0.6 8.1±8.2 0.2 0.3 3.5±1.5 0.2 0.3RLARS-Lasso 7.7±2.0 0.2 0.6 5.9±1.1 0.1 0.3 5.2±1.0 0.2 0.4DDC-RLARS 2.3±1.2 0.9 0.7 2.1±0.4 0.7 0.4 2.2±0.5 0.7 0.3(c) SparseLTS 1.9±0.4 0.1 1.0 1.8±0.3 0.1 1.0 1.8±0.3 0.1 1.0RLARS 2.7±0.6 0.1 0.9 6.1±1.4 0.1 0.7 2.1±0.4 0.2 0.7RLARS-Lasso 2.4±0.5 0.1 0.7 2.3±0.4 0.1 0.7 2.1±0.3 0.1 0.8DDC-RLARS 1.6±0.4 0.3 1.0 1.6±0.3 0.3 1.0 1.6±0.3 0.3 1.052Table 5.7: Mean MSPE, PR, and RC across replicates for simulation settings (a)-(c) with a fraction εi = 0.014 independentoutliers for a range of magnitude parameters k. The MSPE is shown plus/minus the standard error. The best methodfor each performance measure is highlighted.k = 2 k = 3 k = 5 k = 10Setting Method MSPE PR RC MSPE PR RC MSPE PR RC MSPE PR RC(a) MMLasso 2.2±0.3 0.5 0.5 2.4±0.4 0.5 0.6 2.7±0.6 0.5 0.6 2.6±1.2 0.4 0.7PENSEM-EN 2.3±0.3 0.5 0.6 2.5±0.4 0.5 0.6 2.8±0.6 0.5 0.6 2.8±1.3 0.4 0.7SparseLTS 2.6±0.6 0.4 0.6 2.7±0.7 0.4 0.7 2.7±0.7 0.4 0.7 2.6±0.9 0.4 0.7RLARS 2.0±0.3 0.9 0.4 2.0±0.3 0.9 0.4 2.0±0.4 0.9 0.4 1.9±0.3 0.9 0.4RLARS-Lasso 2.5±0.5 0.6 0.5 2.5±0.5 0.5 0.5 2.5±0.5 0.6 0.5 2.4±0.5 0.6 0.5DDC-RLARS 2.1±0.6 0.9 0.4 2.1±0.4 0.9 0.4 2.0±0.3 0.9 0.4 2.1±0.3 0.9 0.4(b) SparseLTS 5.6±1.7 0.1 0.4 5.7±1.7 0.1 0.4 7.3±1.9 0.1 0.4 9.2±1.8 0.1 0.4RLARS 2.8±1.0 0.5 0.4 2.6±0.8 0.6 0.4 2.3±0.5 0.7 0.3 1.9±0.3 0.8 0.3RLARS-Lasso 3.2±0.8 0.3 0.4 3.1±0.7 0.3 0.4 3.1±0.7 0.3 0.4 3.0±0.6 0.3 0.4DDC-RLARS 3.3±1.5 0.5 0.4 2.8±0.8 0.5 0.4 3.0±1.1 0.5 0.4 3.3±1.2 0.4 0.4(c) SparseLTS 2.1±0.5 0.1 0.9 2.2±0.6 0.1 0.9 2.6±0.7 0.1 0.8 3.1±0.9 0.0 0.6RLARS 1.9±0.6 0.2 1.0 1.7±0.5 0.3 1.0 1.4±0.3 0.5 1.0 1.2±0.3 0.6 1.0RLARS-Lasso 1.5±0.3 0.2 1.0 1.5±0.3 0.2 1.0 1.4±0.2 0.3 1.0 1.4±0.2 0.3 1.0DDC-RLARS 2.0±0.6 0.2 1.0 1.9±0.6 0.2 1.0 1.9±0.5 0.2 1.0 1.9±0.4 0.2 1.053Table 5.8: Mean MSPE, PR, and RC across replicates for simulation settings (a)-(c) with a fraction εi = 0.054 independentoutliers for a range of magnitude parameters k. The MSPE is shown plus/minus the standard error. 
The best methodfor each performance measure is highlighted.k = 2 k = 3 k = 5 k = 10Setting Method MSPE PR RC MSPE PR RC MSPE PR RC MSPE PR RC(a) MMLasso 3.0±0.5 0.6 0.5 4.3±1.0 0.6 0.5 7.2±1.4 0.6 0.4 9.6±1.1 0.4 0.1PENSEM-EN 3.1±0.6 0.5 0.6 4.5±1.0 0.6 0.6 7.3±1.4 0.7 0.4 9.6±1.1 0.4 0.1SparseLTS 3.6±0.9 0.4 0.7 4.6±1.3 0.4 0.8 6.6±1.6 0.4 0.8 8.7±1.5 0.4 0.8RLARS 2.5±0.5 0.9 0.4 3.2±0.8 0.9 0.4 4.9±0.8 0.9 0.3 7.2±1.4 1.0 0.3RLARS-Lasso 3.6±0.8 0.6 0.6 3.8±0.9 0.6 0.5 4.3±1.2 0.8 0.4 7.1±2.0 0.9 0.3DDC-RLARS 2.6±0.6 0.9 0.4 2.2±0.4 0.9 0.4 2.1±0.5 0.9 0.4 2.0±0.4 0.9 0.4(b) SparseLTS 7.2±1.7 0.1 0.4 8.7±1.7 0.1 0.4 9.9±1.4 0.1 0.3 10.4±1.4 0.0 0.2RLARS 4.3±2.4 0.5 0.4 4.3±1.7 0.6 0.3 5.5±1.1 0.8 0.3 7.5±1.5 0.9 0.3RLARS-Lasso 5.2±1.5 0.2 0.4 5.3±1.3 0.3 0.4 5.3±1.2 0.5 0.4 7.9±2.0 0.7 0.2DDC-RLARS 5.1±2.7 0.4 0.4 3.0±1.9 0.7 0.4 2.5±0.8 0.7 0.4 2.6±0.9 0.6 0.3(c) SparseLTS 2.6±0.7 0.1 0.9 3.0±0.8 0.0 0.8 3.6±0.7 0.0 0.6 4.1±0.5 0.0 0.3RLARS 2.4±0.9 0.3 0.9 2.0±0.8 0.4 0.9 2.1±0.5 0.7 0.9 2.0±0.5 0.9 0.7RLARS-Lasso 1.9±0.4 0.2 0.9 1.9±0.4 0.3 0.9 2.0±0.4 0.4 0.9 2.3±0.6 0.7 0.8DDC-RLARS 2.7±1.0 0.2 0.9 1.7±0.8 0.4 0.9 1.7±0.5 0.4 0.9 1.8±0.5 0.3 1.054Figure 5.1: Mean MSPE for a range of independent contamination levels. The maximumstandard error of the MSPE in all cases shown is 0.64.Figure 5.2: The mean rank of the MSPE across all replicates for a range of p with ε = 0.014.The maximum standard error of the MSPE rank amongst cases shown is 1.05.55Figure 5.3: The mean rank of the MSPE across all replicates for a range of p with ε = 0.054.The maximum standard error of the MSPE rank amongst all cases shown is 1.27.Contamination Typeεi Leverage PointDDC-RLARS p/n DDC-RLARS RLARSRLARS-Lasso RLARSIndependentεi ≥ 0.05 εi < 0.05StructuralYes Nop/n≥ 10 p/n< 10Figure 5.4: Binary decision tree describing which methods are recommended for differenttypes of data.56Chapter 6Case Study: Data Quality ModellingUsing Robust Linear Regression6.1 IntroductionIn this chapter, we consider the problem of detecting when data contains independent and/or struc-tural outliers in sparse and high-dimensional linear regression data. In statistics, outliers are fre-quently only considered as an obstacle to data analysis. However, outliers can provide valuableinsight into the data generating mechanism.Consider the common motivating example in robust statistics in which data are collected by asensor with some chance of failure, manifesting as an outlier. In this context, outliers are usuallypresented as an obstacle to be removed. We might instead consider this from the perspective of thesensor operator. Intuitively, a properly operating sensor will not frequently fail while a damaged orotherwise improperly functioning sensor may more frequently malfunction. Hence, the presenceof many outliers in a dataset may be indicative of a hardware failure. Hence, we are interested inthe presence, or lack thereof, of outliers and how this may encode valuable information.This chapter is a case study carried out for MineSense Technologies Ltd. (MineSense). Mine-Sense designs, builds, and deploys sensor-based systems that provide real-time grade telemetry tomines as ore is collected at the mining face and before it is transported to downstream processingfacilities. The environment in which mining excavators work is harsh. 
Sensors deployed in this environment are expected to undergo significant wear and require regular maintenance. However, excavators are a critical step in the transport of rock, so any disruption to their operation can be costly to a mine. A deterioration in data quality can provide an early warning that such a disruption is occurring. These problems can result from a change in the process generating the data, such as a change in ore body, or from a deterioration in system health. Therefore, it is critical for MineSense to be able to detect a degradation in system performance and to predict failures of its sensing systems while they are in active use. A principled method for detecting when a dataset is contaminated with outliers provides a scalable way to monitor system performance and health.

Detecting whether a dataset is contaminated with outliers, which we call contamination detection, is distinct from the more common problem of outlier detection. Outlier detection aims to identify individual atypical observations or cells that lie outside the regular variation of the data. Such methods usually work by estimating, under some distributional assumptions, the variation of the data and flagging values that exceed some cut-off; the flagged observations are then generally removed from the analysis. Contamination detection, in contrast, aims to determine whether an entire dataset contains outliers, not which individual observations or cells are outlying.

We are interested in detecting both independent and structural contamination. As previously discussed in Chapter 1, researchers have only recently begun to consider linear regression for data with independent and structural contamination, and the same is true of the detection of structural and independent outliers. Rousseeuw and Bossche (2018) recently proposed Detecting Deviating Data Cells (DDC), the first outlier detection procedure designed to detect both structural and independent contamination. However, DDC is not a contamination detection algorithm. As far as the author is aware, no contamination detection algorithms have been published.

We therefore propose a flexible new contamination detection method with a principled way of tuning the false detection rate. Our method is based on the idea that, by design, robust and non-robust methods produce similar estimates on data without outliers and, in many cases, markedly different estimates on data with outliers. Hence, if the robust and non-robust estimates differ by an abnormally large amount, outliers are likely present in the data. Comparing robust and non-robust fits as a diagnostic tool was first proposed in the S-PLUS Robust Library User's Guide (Insightful Corporation, 2002) to aid manual review of the data; we propose a principled and automated extension of this idea. We apply this paradigm to XRF data collected by MineSense to provide a data quality measurement that can be used to flag ongoing data quality problems.

6.2 Data Quality Modelling Using Robust Linear Regression

In this section, we propose a data quality modelling method based on comparing robust and non-robust estimates of a linear regression model. We consider a regression dataset (X, y) following the linear model

y = Xβ + ε,  (6.1)

where X ∈ R^(n×p) is the matrix of predictor variables, y ∈ R^n is the vector of responses, and ε ∼ N(0, σ²I) is the vector of independent regression errors.
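For concreteness, data of the form (6.1) can be generated in a few lines of code. The sketch below is purely illustrative: the dimensions, sparsity, and noise level are placeholder values, not those of the MineSense data.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, n_active, sigma = 100, 100, 5, 0.5        # illustrative values only

X = rng.standard_normal((n, p))                 # predictor matrix, n x p
beta = np.zeros(p)
beta[:n_active] = 1.0                           # sparse coefficient vector
y = X @ beta + sigma * rng.standard_normal(n)   # y = X beta + eps, eps ~ N(0, sigma^2 I)
```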
We assume that the data may be high-dimensional and sparse, to match the characteristics of the MineSense XRF data.

Let (y_train, X_train) be a training dataset without outliers and let (y_test, X_test) be a dataset that we wish to test for contamination, called the test dataset. We assume the test data may be contaminated with structural or independent outliers. Structural contamination is modelled using the Tukey-Huber contamination model (e.g. Ricardo A. Maronna, Martin, et al., 2019, Chapter 2), described in detail in Chapter 1. Independent contamination is modelled using the fully independent contamination model (Alqallaf, Van Aelst, et al., 2009). We count a detection as correct if it identifies the presence of either independent or structural contamination.

The workflow for our contamination detection tool is as follows. (1) Data are collected when they are known to be in good condition, such as when a sensor is new or is operated in a controlled setting, and the tool is trained on these data. (2) New data collected in active operation are then tested for contamination using the trained tool.

Sketch of the Contamination Detection Method

We first provide a brief sketch of the contamination detection method. The goal of the procedure is to flag a dataset as contaminated or uncontaminated. Contamination is detected when a distance between the linear regression coefficients estimated by robust and non-robust methods exceeds a cut-off. The cut-off is selected in a principled manner to control the false detection rate.

The robust distance (Rousseeuw and Bossche, 2018) is used to measure the distance between the regression coefficient estimates. Let β̂_r and β̂_n be the regression coefficients estimated by robust and non-robust methods, respectively, on the same dataset, and let ∆ = β̂_r − β̂_n. The robust distance is given by

RD(∆) = √( (∆ − T)′ S⁻¹ (∆ − T) ),  (6.2)

where T and S are the Minimum Covariance Determinant (MCD) estimates (Rousseeuw, 1985) of the location and scatter of β̂_r − β̂_n, as implemented in the robustbase package available on CRAN. The non-parametric bootstrap (e.g. Efron, 1982) is used to estimate the pth upper quantile of the robust distance of ∆, denoted Q_p, together with the location and scatter T and S. If the robust distance of ∆_test for a test set exceeds Q_p, the dataset is flagged as contaminated.

The bootstrap upper quantile Q_p is chosen as the cut-off because, under the assumption that the data contain no outliers, the probability that RD(∆_test) exceeds Q_p is approximately equal to the corresponding tail probability (for example, 5% when the 95% quantile is used). When outliers are present in the data used to calculate ∆_test but not in the data used to estimate T and S, ∆_test is more likely to lie far from T relative to the scatter S of the uncontaminated data. Hence, RD(∆_test) will exceed Q_p with a probability greater than the nominal rate.

Simulation results indicate that if β̂_r and β̂_n are estimated using all variables in a high-dimensional dataset, the variability of ∆ is high and the detection rate is consequently poor. To avoid this issue, a variable selection step is first performed to select a subset of variables A ⊂ {1, . . . , p} to include in the subsequent steps of the contamination detection procedure.

A sparse and robust linear regression method is required to perform variable selection. As discussed in Section 1, regularization is a useful way to achieve sparsity and to reduce estimator variance in high-dimensional data, thus improving the stability of the model.
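To make Equation (6.2) concrete, the robust distance of a coefficient difference can be computed as in the following sketch. The thesis uses the MCD implementation from the R package robustbase; here scikit-learn's MinCovDet is used purely as an illustrative stand-in, so this is a sketch rather than the exact implementation.

```python
import numpy as np
from sklearn.covariance import MinCovDet

def robust_distance(deltas, delta_new):
    """Robust distance of Equation (6.2).

    deltas:    (n_bs x m) array of coefficient differences used to estimate
               the MCD location T and scatter S.
    delta_new: coefficient difference whose distance is required.
    """
    mcd = MinCovDet(random_state=0).fit(deltas)   # T = mcd.location_, S = mcd.covariance_
    diff = delta_new - mcd.location_
    return float(np.sqrt(diff @ np.linalg.solve(mcd.covariance_, diff)))
```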
DDC-RLARS is a robust and sparse regularized regression estimator and is therefore used to perform variable selection. As an added benefit, DDC-RLARS can easily be made to select a fixed number of variables m instead of using a model selection procedure, if desired; this feature is used later in this chapter.

Additionally, it may be desirable to include manual variable selection. In real data applications there may be a scientific reason for using specific variables. Manual variable selection can be used either to supplement or to replace automatic variable selection. To supplement automatic variable selection with manual variable selection, we recommend first performing the manual selection and then performing automatic variable selection on the residuals of a linear model fit to the manually selected variables.

Detailed Description of the Contamination Detection Method

In this section, we present a detailed description of the contamination detection procedure. The procedure has two stages: (1) training on an uncontaminated dataset and (2) testing a dataset for possible contamination. Let (y_train, X_train) be a training dataset and (y_test, X_test) be a test dataset. We assume that (y_train, X_train) is uncontaminated. The training process is as follows.

Training the Contamination Detection Procedure

(1) Perform variable selection using DDC-RLARS on the training set (y, X) to select the set of active variables A. We define the active set as

A = { j : β̂_r,j ≠ 0 },  (6.3)

where β̂_r are the regression coefficients estimated by DDC-RLARS. Let X_A be the subset of the data matrix corresponding to A.

(2) Determine the distance cut-off using the bootstrap to estimate the upper quantile of the robust distance. We first take n_bs bootstrap samples of (y, X_A) by sampling with replacement from the data. Let (y^ℓ, X_A^ℓ) for ℓ ∈ {1, . . . , n_bs} denote a bootstrap sample.

For each bootstrap sample, a robust and a non-robust linear regression method are used to estimate the regression coefficients β̂_r^ℓ and β̂_n^ℓ, respectively. The robust estimate β̂_r^ℓ is obtained by first computing the DDC imputed data matrix of the bootstrap sample, X_A,Imp^ℓ, and then fitting an MM-estimator to (y^ℓ, X_A,Imp^ℓ). The non-robust estimate β̂_n^ℓ is obtained using ordinary least squares (OLS) on the raw bootstrap sample (y^ℓ, X_A^ℓ).

For each bootstrap sample, the coefficient difference is calculated,

∆^ℓ = β̂_r^ℓ − β̂_n^ℓ,  ℓ = 1, . . . , n_bs,  (6.4)

and used to construct the coefficient difference matrix

D = ( (∆^1)ᵀ, (∆^2)ᵀ, . . . , (∆^(n_bs))ᵀ )ᵀ,  (6.5)

that is, the matrix whose ℓth row is (∆^ℓ)ᵀ. The robust distance of each coefficient difference, d_ℓ = RD(∆^ℓ) for ℓ = 1, . . . , n_bs, is then calculated as defined in Equation (6.2), where T and S are the MCD estimates of the location and scatter of D, respectively. The cutoff Q_p is the pth empirical quantile of the robust distances (d_1, . . . , d_(n_bs)).

Testing for Contamination

(3) Test for contamination on a test dataset. First, the coefficient difference ∆_test is calculated from the test data. The robust distance RD(∆_test) is then calculated using T and S from Step (2). The dataset is flagged as contaminated when

RD(∆_test) > Q_p,  (6.6)

using Q_p estimated in Step (2).

6.3 Simulation Study

This section contains a small simulation study used to assess the performance of the contamination detection procedure proposed in this chapter. All MM-estimators are tuned to have a 50% breakdown point and 95% efficiency.
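Before turning to the simulation settings, steps (1)-(3) above can be summarized in a schematic sketch. The functions fit_robust and fit_ols below are hypothetical placeholders for the DDC-imputation-plus-MM fit and the OLS fit on the selected variables; as in the previous sketch, scikit-learn's MinCovDet stands in for the MCD used in the thesis.

```python
import numpy as np
from sklearn.covariance import MinCovDet

def _rd(mcd, delta):
    """Robust distance of Equation (6.2) relative to a fitted MCD object."""
    diff = delta - mcd.location_
    return float(np.sqrt(diff @ np.linalg.solve(mcd.covariance_, diff)))

def train_detector(y, X_A, fit_robust, fit_ols, n_bs=300, p_cut=0.95, seed=0):
    """Step (2): bootstrap the robust/non-robust coefficient differences on the
    (assumed clean) training data, then return the MCD fit and the p_cut
    empirical quantile of their robust distances as the cut-off Q_p."""
    rng = np.random.default_rng(seed)
    n = len(y)
    diffs = []
    for _ in range(n_bs):
        idx = rng.integers(0, n, size=n)                 # resample with replacement
        diffs.append(fit_robust(y[idx], X_A[idx]) - fit_ols(y[idx], X_A[idx]))
    D = np.vstack(diffs)                                 # coefficient difference matrix (6.5)
    mcd = MinCovDet(random_state=seed).fit(D)            # T = location_, S = covariance_
    cutoff = float(np.quantile([_rd(mcd, row) for row in D], p_cut))
    return mcd, cutoff

def test_for_contamination(y_test, X_test_A, mcd, cutoff, fit_robust, fit_ols):
    """Step (3): flag the test set when RD(delta_test) exceeds the cut-off (6.6)."""
    delta = fit_robust(y_test, X_test_A) - fit_ols(y_test, X_test_A)
    return _rd(mcd, delta) > cutoff
```

Variable selection (step (1)) is assumed to have been carried out beforehand, so X_A already contains only the active variables.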
We use 300 bootstrap samples, each of the same size as the dataset, to estimate the 95% bootstrap upper quantile used as the distance cut-off.

6.3.1 Simulation Settings

Simulated data are generated using a linear model,

y_i = x_i′β + σε_i,  (6.7)

where x_i and β are p-dimensional vectors, the ε_i are standard normal random variables, and σ = 0.5. The covariates follow a multivariate normal distribution N(0, Σ), where Σ = (Σ_ij)_(1≤i,j≤p) with Σ_ij = 0.5^|i−j|. The coefficient values are β_i = 1 for i = 1, . . . , ζp and β_i = 0 for i = ζp + 1, . . . , p, where ζ is the sparsity level. For all replicates, we choose n = 100, p = 100, and ζ = 0.05 to approximately match the dimensionality and sparsity of the MineSense XRF spectrometry data.

For each simulation replicate, one training dataset and many test datasets are generated using these simulation settings. Outliers are then added to half of the test sets. The uncontaminated test sets are used to estimate the false detection rate of the contamination detection method and the contaminated test sets are used to estimate the detection rate.

To generate the outliers, we use one independent and one structural contamination scheme. The structural contamination scheme is based on the clustered contamination scheme in Alfons et al. (2013).

(i) High leverage clustered contamination: High leverage predictors x̃_i are independently drawn from N(5, 0.1). Response variables are generated from ỹ = η x̃′γ, where γ = (−1/p, . . . , −1/p). The parameter η controls the magnitude of the outliers.

To increase the difficulty of detecting the outliers, the leverage and severity of the outliers are reduced compared to those used in Alfons et al. (2013). The mean of the high-leverage predictors is reduced from 10 to 5, and only the low end of the range of magnitude parameters used by Alfons et al. (2013) is considered, η = 1, 2, 3, and 5.

The independent contamination scheme is very similar to that in Leung et al. (2016). Independent contamination is added to the data by randomly replacing ⌊ε_i n⌋ cells, independently in each column, with outliers, for ε_i from 0.01 to 0.05 in steps of 0.01. The independent contamination scheme is as follows.

(ii) Independent contamination: Outlying predictors x̃_i are set to E(x_i) + k SD(x_i) and outlying responses ỹ are set to E(y) + k SD(y).

We consider independent outliers with magnitudes k = 2, 3, 5, and 10.

6.3.2 Performance Measures

For each replicate, one training dataset and one hundred test datasets are generated. Half of the test sets are contaminated with outliers and half are generated without outliers. For each test set, a single result is produced: contamination is either detected or not. Table 6.1 shows the possible outcomes of a detection depending on whether the test set actually contained outliers. We measure performance using the recall (RC), also called the sensitivity, true positive rate, or detection rate, and the specificity (SPEC), also called the true negative rate. They are defined as

RC = TP / (TP + FN)  and  SPEC = TN / (TN + FP).  (6.8)

Note that 1 − SPEC is the false detection rate. The specificity should be approximately equal to the quantile chosen for the cutoff, which is set to 95% for all simulations.

6.3.3 Simulation Results

Tables 6.2 and 6.3 show the mean and standard error of the recall and specificity of the contamination detection method. Appendix B contains histograms of the recall and specificity for each simulation setting.
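Before discussing the results, the independent contamination scheme (ii) can be sketched in code for reference; the clustered scheme (i) is analogous. Sample means and standard deviations stand in for E(·) and SD(·), and the parameter values are placeholders.

```python
import numpy as np

def add_independent_outliers(X, y, eps_i=0.05, k=5, seed=0):
    """Scheme (ii): independently replace floor(eps_i * n) randomly chosen cells
    in each column of X (and in y) with mean + k * SD of that column, where
    sample moments stand in for E(.) and SD(.).  Returns contaminated copies."""
    rng = np.random.default_rng(seed)
    Xc = np.array(X, dtype=float)
    yc = np.array(y, dtype=float)
    n, p = Xc.shape
    m = int(np.floor(eps_i * n))
    for j in range(p):
        mu, sd = Xc[:, j].mean(), Xc[:, j].std()
        rows = rng.choice(n, size=m, replace=False)
        Xc[rows, j] = mu + k * sd
    rows = rng.choice(n, size=m, replace=False)
    yc[rows] = yc.mean() + k * yc.std()
    return Xc, yc
```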
The same model used to generate the data for all settings, so the specificity is directly63Contamination No ContaminationDetection (TP) True Positive (FP) False PositiveNo Detection (FN) False Negative (TN) True NegativeTable 6.1: Definition of outcomes of contamination detection. A detection of contaminationis defined as a positive and no detection as a negative.comparable. The mean specificity across all simulations is 0.94 with a standard error of 0.05. Thisis within the standard error of the bootstrap quantile p.In simulations with clustered contamination, the mean recall of all settings is at least 0.9, in-cluding the lowest level of contamination with only 2 outlying observations out of 100. The stan-dard error of the recall decreases as the magnitude and number of outliers increases, approaching100% recall. The contamination detection method is able to effectively detect very low levels ofclustered contamination, achieving the desired specificity, or false positive rate.In simulation with independent contamination, the mean recall is low for the lowest magnitudeoutliers, k = 2, but increases significantly for k = 3, reaching an average of 0.5 for εi = 0.02. Fork = 5 and 10, the mean recall exceeds 0.5 for even 1% contamination, or 1 outlier per column.Overall, the simulation results indicate that the contamination detection procedure can be ef-fectively tuned to achieve the desired specificity. Additionally, good recall is achieved with evenlow levels of either structural or independent contamination.64Table 6.2: Mean and standard deviation of the specificity and recall across replicates for data with clustered outliers for arange of fractions of contamination, ε , and magnitude parameters η .η = 1 η = 2 η = 3 η = 5ε RC SPEC RC SPEC RC SPEC RC SPEC0.02 0.9±0.3 0.9±0.1 1.0±0.2 0.9±0.0 0.9±0.2 0.9±0.1 1.0±0.2 0.9±0.00.04 0.9±0.2 0.9±0.0 0.9±0.2 0.9±0.1 1.0±0.1 0.9±0.0 1.0±0.1 0.9±0.00.06 1.0±0.1 0.9±0.1 0.9±0.2 0.9±0.0 1.0±0.1 0.9±0.1 1.0±0.2 0.9±0.10.08 1.0±0.2 0.9±0.1 1.0±0.1 0.9±0.0 1.0±0.1 0.9±0.1 1.0±0.2 0.9±0.10.1 1.0±0.0 0.9±0.0 1.0±0.2 0.9±0.0 1.0±0.1 0.9±0.0 1.0±0.1 0.9±0.1Table 6.3: Mean and standard deviation of the specificity and recall across replicates for data with independent outliers fora range of fractions of contamination, εi, and magnitude parameters k.k = 2 k = 3 k = 5 k = 10εi RC SPEC RC SPEC RC SPEC RC SPEC0.01 0.1±0.1 0.9±0.1 0.3±0.2 0.9±0.0 0.6±0.2 0.9±0.1 0.9±0.2 0.9±0.10.02 0.2±0.1 0.9±0.1 0.5±0.3 0.9±0.1 0.8±0.2 0.9±0.1 1.0±0.2 0.9±0.10.03 0.2±0.1 0.9±0.1 0.6±0.3 1.0±0.0 0.9±0.2 0.9±0.0 1.0±0.2 0.9±0.10.04 0.2±0.1 0.9±0.0 0.7±0.2 0.9±0.0 0.9±0.2 0.9±0.0 1.0±0.1 0.9±0.10.05 0.2±0.1 0.9±0.0 0.8±0.2 0.9±0.1 0.9±0.2 0.9±0.0 1.0±0.1 0.9±0.0656.4 MineSense Real Data Case StudyThis section applies our contamination detection method to X-ray Fluorescence Spectrometer(XRF) data collected by MineSense. The goal of this analysis is to detect the presence of out-liers in an XRF dataset caused by a degradation in sensor health or changes in the underlying datagenerating mechanism, such as a transition to a new ore body. Ideally, the outliers will be detectedwhen they compose a small fraction of the dataset. However the underlying changes producingthese problems can appear gradually, initially resulting in low magnitude outliers. Such outlierscan be difficult for outlier detection methods to identify. 
Thus, we believe that our contamination detection method is better suited to this application.

The dataset is composed of a set of XRF sensor measurements of copper (Cu) ore. The data were collected over 7 contiguous months from sensors operating on a cable shovel excavator. The concentration of copper, referred to as the grade, is the response variable. The dataset contains observations collected by sensors with a known hardware issue resulting from wear, and the observations collected by the faulty sensors are labelled. We aim to detect when these observations are present in a dataset.

While we aim to detect this hardware issue using contamination detection, not all observations with the labelled hardware issue are outliers and thus detectable. The hardware issue appears to destroy the relationship between the predictors and the response rather than producing extreme values. Hence, some observations lie within the normal variation of the "good quality" data. However, as the primary aim of the case study is to detect hardware failure, for this section we define a true positive as the correct identification that a dataset contains observations collected by a faulty sensor. As not all such observations are outliers, this definition will result in a lower recall than the definition used in Section 6.3. However, it will more accurately show the ability of the method to detect real hardware failures.

An observation, also referred to as a spectrum, is composed of 1024 variables. Variables are ordered by the energy of the photons that they measure, from low to high. For our analysis, we exclude energies that are known to have no relationship with the response, which leaves variables 200-500 and 600-800. After removing variables with very low variability, we are left with p = 101 variables. As in previous sections, we assume the data follow a linear model, as XRF spectrometer data are well known to have an approximately linear relationship with the elemental composition of a sample.

MineSense provided two datasets: a problem dataset collected by a sensor with a labelled hardware issue and a clean dataset collected by sensors with no known problem. The clean dataset has 105 observations and the problem dataset has 201 observations. Several outliers were removed from the clean dataset using conventional outlier detection techniques.

It is well known that elements fluoresce at well-defined energies. Thus, as suggested in Section 6.2, we incorporate this knowledge by manually including in the active set the variable corresponding to the energy with the greatest signal, which we refer to by its scientific name, CuKα.

We apply several variations of the contamination detection method, based on the variables included in the variable selection step. We refer to these variations as the one variable model, the three variable model, and the combined method. For the one variable model, we select only the CuKα predictor in the variable selection step. In the three variable model, we include variables selected by both manual and automatic variable selection: we manually include the CuKα variable and then select two additional variables with DDC-RLARS using the method described in Section 6.2. We do not consider models with more variables because preliminary tests indicate that if a large number of variables are included in the model, the coefficient difference is highly variable, which results in a low detection rate.

Additionally, initial results indicated that the one and three variable methods were complementary.
By this, we mean that the one and three variable models could each detect contamination in datasets that the other could not. So, we also consider the combined method, which flags a dataset as contaminated if either the one or the three variable method detects contamination.

The performance of the contamination detection methods is tested by randomly dividing the clean data into two datasets of equal size. One half is used to train the contamination detection method; the other half is used to test the performance. First, the method is applied to the clean test data to assess its ability to identify true negatives. Then, the first ⌊εn⌋ observations in the test set are replaced with data randomly drawn from the problem dataset, for a range of ε from 0.02 to 0.1 in steps of 0.02, resulting in between 1 and 5 observations being replaced. The contaminated datasets are then used to assess the ability of the contamination detection method to detect true positives. The results are used to calculate the recall and specificity.

As mentioned before, not all observations in the problem dataset are outliers. Thus, when only one observation is drawn from the problem dataset, it is not unlikely that no observations in the test dataset are outliers. This appears to result in the performance measures having high variability.

Tables 6.4 and 6.5 show the mean recall and specificity of the one variable and three variable models, respectively, with the cut-off set to the 95% bootstrap quantile. The one variable model has a specificity of 0.95, but low recall for low levels of contamination. The three variable model has a slightly lower specificity of 0.92, likely due to the high variability of the upper bootstrap quantile estimate. The recall of the three variable model is higher for all ε, especially low ε.

Table 6.4: Recall over 100 random splits for the one variable contamination detection model with a 95% bootstrap quantile cut-off. The specificity is 0.95.

ε   0.02  0.04  0.06  0.08  0.1
RC  0.26  0.37  0.42  0.33  0.45

Table 6.5: Recall over 100 random splits for the three variable contamination detection model with a 95% bootstrap quantile cut-off. The specificity is 0.92.

ε   0.02  0.04  0.06  0.08  0.1
RC  0.44  0.43  0.46  0.54  0.53

Table 6.6 shows the recall and specificity for the combined model with the cut-off set to the 95% bootstrap quantile. The false detection rate of the combined model is approximately equal to the sum of the false detection rates of the component models, suggesting that the component models generally do not have a significant overlap in their false detections. The recall of this model is significantly higher than that of the one variable model for all levels of contamination. However, the specificity of the combined method is 0.87, so this is not a fair comparison.

We therefore also compare the combined method to the one and three variable models with a 90% bootstrap quantile cut-off, shown in Tables 6.7 and 6.8. Notice that the recall does not always increase with ε; this is likely due to the high variability of the estimate. We can see that the combined method, with a specificity of 0.87, has better recall than both the one and three variable contamination models, which have specificities of 0.88 and 0.86, respectively.
However, the variability of the recall and specificity is high, so it is inconclusive whether the recall achieved by the combined method is significantly better than that of the three variable model at a comparable specificity.

Table 6.6: Recall over 100 random splits for the combined contamination detection model with a 95% bootstrap quantile cut-off. The specificity is 0.87.

ε   0.02  0.04  0.06  0.08  0.1
RC  0.56  0.68  0.71  0.73  0.76

Table 6.7: Recall over 100 random splits for the one variable contamination detection model with a 90% bootstrap quantile cut-off. The specificity is 0.88.

ε   0.02  0.04  0.06  0.08  0.1
RC  0.33  0.48  0.52  0.41  0.51

Table 6.8: Recall over 100 random splits for the three variable contamination detection model with a 90% bootstrap quantile cut-off. The specificity is 0.86.

ε   0.02  0.04  0.06  0.08  0.1
RC  0.58  0.61  0.57  0.70  0.64

6.5 Conclusion

This chapter is a case study in detecting the presence of outliers in linear regression data. The case study was performed for MineSense Technologies Ltd. (MineSense), a mining sensor technology company. MineSense sensors are used to measure the concentration of elements in ore. The sensors operate in the very harsh environment of an active mine, so monitoring their health is vital. For some sensor health problems, issues manifest in the data as outliers. Hence, we propose using contamination detection to aid in the detection of sensor health degradation.

We present the problem of contamination detection: the detection of whether a dataset contains outliers. Unlike the more common problem of outlier detection, where individual observations or cells are flagged as outliers, contamination detection aims to identify whether an entire dataset contains any outliers. By jointly considering the entire dataset, contamination detection can achieve a high detection rate.

We propose a new contamination detection method based on comparing robust and non-robust linear regression estimates. DDC-RLARS is used to perform variable selection. The robust distance is then used to measure the distance between the coefficients estimated by robust and non-robust methods, and a bootstrap upper quantile is used to determine a cut-off. If the distance measured on a dataset is beyond the cut-off, the dataset is flagged as containing outliers.

In a simulation study, the contamination detection method was shown to detect low levels of clustered and independent contamination very accurately. The method was easily tuned to have a low false positive rate.

The primary focus of the case study is the application of the contamination detection method to an example XRF spectrometer dataset provided by MineSense. The example dataset is composed of two subsets: one subset, the "clean" data, was collected by sensors with no known hardware issues, and the other, the "problem" data, was collected by sensors with a known hardware problem. The goal of this analysis was to determine how well the contamination detection method can identify the presence of data drawn from the problem dataset within clean data.

Three variations of the contamination detection method are considered: one which uses a single variable, called CuKα, manually selected based on scientific theory; another which uses CuKα with two additional variables selected using DDC-RLARS; and a combination of both methods. The one and three variable models individually achieve good recall and high specificity.
The combination of the two models, increases sensitivity atthe cost of a modest drop in specificity. However, when compared with the one and three variablemodels with a similar specificity it is inconclusive whether the combined method has significantlyimproved performance over the three variable model.70Chapter 7ConclusionIn this thesis, we considered the problem of robust and sparse regularized linear regression fordata with structural and independent contamination. Due to outlier propagation, a small fraction ofcellwise outliers can result in a large number of contaminated cases. As a result, high-breakdownaffine equivariant robust and sparse linear regression methods developed under the Tukey-Hubercontamination model (THCM) are not robust to independent contamination.Almost all existing robust and sparse linear regression methods achieve robustness by down-weighting outlying observations. Such estimators have a maximum breakdown point of a fractionε = 0.5 contaminated cases. However, due to outlier propagation the probability that a case has atleast one outlying variable grows with p. Sparse regression is frequently performed on data withhigh p so even a very small fraction of independent outliers can exceed the maximum breakdownpoint for traditional robust estimators.In this thesis, we considered the application of an existing linear regression method for high-dimensional data, Robust Least Angle Regression (RLARS), to data with independent contamina-tion. We further proposed two separate modified version of RLARS to further improve performancein two different scenarios. We also considered an application in a case study modelling data quality.We will now summarize the main topics studied in this thesis and the most significant results.Robust Least Angle Regression in the Presence of Structural andIndependent ContaminationWe reviewed RLARS and considered it’s robustness to data with independent contamination. It wasshown that RLARS variable sequencing is robust to independent contamination and that RLARS71variable segmentation is more resistant to independent outliers than existing robust and sparseregression estimators.For future work, we proposed using the Shooting S-estimator (O¨llerer et al., 2016) in the vari-able segmentation step to potentially improve the robustness of RLARS to independent contami-nation. The Shooting S-estimator robustly estimates the residual scale, which could potentially beused to perform robust segmentation in the presence of structural and independent contamination.However, variable segmentation is slow, so the computational cost of any new regression methodshould be carefully considered.Numerical results indicated that RLARS has better performance than other existing robust andsparse regression estimators for data with independent outliers and in many cases with casewiseoutliers.Robust Least Angle Regression Using Pre-Filtered DataWe proposed DDC-RLARS as a robust linear regression method that further improves the perfor-mance of RLARS for data with independent contamination. DDC-RLARS performs RLARS onthe imputed data matrix obtained using Detecting Deviating Data Cell (DDC) (Rousseeuw andBossche, 2018). The DDC imputed data matrix replaces independent outliers with imputed values.DDC-RLARS has improved robustness to independent contamination in variable selection com-pared to RLARS. 
However, DDC-RLARS has lower performance than RLARS on data with nooutliers or low leverage structural outliers.Numerical results indicate that DDC-RLARS has better performance than RLARS for data withmoderate to high levels of independent contamination. DDC-RLARS also has significantly betterperformance against high-leverage structural outliers than existing robust and sparse regressionestimators, including RLARS.We recommend DDC-RLARS over RLARS when analysis of the data using diagnostic toolsindicates that significant levels of independent outliers are present, with 5% independent outliersas the suggested cut-off. We recommend using the diagnostic tools described by (Rousseeuw andBossche, 2018) to estimate the level of independent contamination.Robust Least Angle Regression With Lasso ModificationWe proposed RLARS-Lasso as a new robust and sparse linear regression method. RLARS-Lassocalculates the LARS Lasso modification by replacing the sample correlation matrix with the robustcorrelation matrix used in RLARS. Unlike RLARS, RLARS-Lasso directly computes the coeffi-72cients during variable sequencing and thus does not have a variable selection step. RLARS-Lassois still susceptible to outlier propagation in the active set during penalty parameter selection likeRLARS.Numerical results indicate that RLARS-Lasso outperforms RLARS for data with independentoutlier for high p. We recommend RLARS-Lasso over RLARS for data with independent contam-ination when p≥ 1000.Case Study: Data Quality Modelling Using Robust Linear RegressionA case study was performed for MineSense Technologies (MineSense), a mining sensor technologycompany. The goal of was to detect deterioration in sensor health using data quality modelling. Forthis case study, we proposed the problem of contamination detection, the detection of whether adataset contains outliers. This is distinct from outlier detection, the detection of which observationsare outliers.We proposed a method to detect the presence of independent and structural outliers in linearregression data. DDC-RLARS is used to perform an initial variable selection step. Then, ourmethod compares estimates by robust and non-robust linear regression methods. A large differencein the robust and non-robust estimates indicates the presence of contamination. The cut-off usedto flag the presence of contamination is determined in a principled manner to control the falsedetection rate.Numerical results indicate that the contamination detection method can effectively identify thepresence of both structural and independent outliers in a dataset. Additionally, the false detectionrate achieves the desired value.Results from the application of the contamination detection algorithm to a real X-ray Fluores-cence Spectrometer (XRF) dataset provided by MineSense indicate that the presence of a smallfraction of observations collected by a faulty sensor can be detected. The best performance wasachieved by combining automatic variable selection using DDC-RLARS with manual variable se-lection based on scientific knowledge.73BibliographyAlfons, Andreas, Christophe Croux, and Sarah Gelper (2013). “Sparse Least Trimmed Squares Re-gression For Analyzing High-dimensional Large Data Sets”. In: The Annals of Applied Statis-tics 7.1, pp. 226–248.Alqallaf, Fatemah, Kjell P Konis, R Douglas Martin, and Ruben H Zamar (2002). “Scalable ro-bust covariance and correlation estimates for data mining”. 
In: Proceedings of the eighth ACMSIGKDD international conference on Knowledge discovery and data mining, pp. 14–23.Alqallaf, Fatemah, Stefan Van Aelst, Victor J Yohai, and Ruben H Zamar (2009). “Propagation ofoutliers in multivariate data”. In: The Annals of Statistics 37.1, pp. 311–331.Efron, Bradley (1982). The jackknife, the bootstrap, and other resampling plans. Vol. 38. Siam.Efron, Bradley, Trevor Hastie, Iain Johnstone, and Robert Tibshirani (2004). “Least angle regres-sion”. In: The Annals of statistics 32.2, pp. 407–499.Freue, Gabriela V Cohen, David Kepplinger, Matıas Salibia´n-Barrera, and Ezequiel Smucler (2017).“PENSE: A Penalized Elastic Net S-Estimator”. In:Freund, Robert M, P Grigas, and R Mazumder (2013). “Incremental Forward Stagewise Regres-sion: Computational Complexity and Connections to LASSO”. In: International Workshopon advances in Regularization, Optimization, Kernel Methods and Support Vector Machines(ROKS).Friedman, Jerome, Trevor Hastie, and Tibshirani (2001). The Elements of Statistical Learning.Vol. 1. 10. Springer series in statistics New York.— (2010). “Regularization paths for generalized linear models via coordinate descent”. In: Journalof statistical software 33.1, p. 1.74Fu, Wenjiang J (1998). “Penalized regressions: the bridge versus the lasso”. In: Journal of compu-tational and graphical statistics 7.3, pp. 397–416.Gnanadesikan, Ramanathan and John R Kettenring (1972). “Robust estimates, residuals, and outlierdetection with multiresponse data”. In: Biometrics, pp. 81–124.Hastie, Trevor, Robert Tibshirani, and Martin Wainwright (2015). Statistical learning with sparsity:the lasso and generalizations. Chapman and Hall/CRC.Huber, Peter J. (1973). “Robust Regression: Asymptotics, Conjectures and Monte Carlo”. In: TheAnnals of Statistics 1.5, pp. 799–821. ISSN: 00905364.— (1981). Robust Statistics. English. New York: Wiley.Insightful Corporation (2002). S-PLUS 6 Robust Library User’s Guide. English. Insightful Corpo-ration. Seattle, WA. 188 pp.Khan, Jafar A (2006). “Robust linear model selection for high-dimensional datasets”. PhD thesis.University of British Columbia. DOI: http : / /dx .doi .org /10 .14288/1 .0100423. URL: https ://open.library.ubc.ca/collections/ubctheses/831/items/1.0100423.Khan, Jafar A, Stefan Van Aelst, and Ruben H Zamar (2007). “Robust Linear Model SelectionBased on Least Angle Regression”. In: Journal of the American Statistical Association 102.480,pp. 1289–1299.Leung, Andy, Hongyang Zhang, and Ruben Zamar (2016). “Robust regression estimation and in-ference in the presence of cellwise and casewise contamination”. In: Computational Statistics& Data Analysis 99, pp. 1–11.Maronna, Ricardo A (2011). “Robust ridge regression for high-dimensional data”. In: Technomet-rics 53.1, pp. 44–53.Maronna, Ricardo A, R Douglas Martin, Victor J Yohai, and Matıas Salibia´n-Barrera (2019). Ro-bust statistics: theory and methods (with R). John Wiley & Sons.Maronna, Ricardo A and Ruben H Zamar (2002). “Robust estimates of location and dispersion forhigh-dimensional datasets”. In: Technometrics 44.4, pp. 307–317.Maronna, Ricardo Antonio (1976). “Robust M-estimators of multivariate location and scatter”. In:The annals of statistics, pp. 51–67.75O¨llerer, Viktoria, Andreas Alfons, and Christophe Croux (2016). “The shooting S-estimator forrobust regression”. In: Computational Statistics 31.3, pp. 829–844.Ronchetti, Elvezio, Christopher Field, and Wade Blanchard (1997). “Robust linear model selectionby cross-validation”. 
In: Journal of the American Statistical Association 92.439, pp. 1017–1023.Rousseeuw, Peter (1984). “Least Median of Squares Regression”. In: Journal of the AmericanStatistical Association 79.388, pp. 871–880.— (1985). “Multivariate estimation with high breakdown point”. In: Mathematical statistics andapplications 8.283-297, p. 37.Rousseeuw, Peter and Wannes Van Den Bossche (2018). “Detecting deviating data cells”. In: Tech-nometrics 60.2, pp. 135–145.Rousseeuw, Peter and Victor J Yohai (1984). “Robust regression by means of S-estimators”. In:Robust and nonlinear time series analysis. Springer, pp. 256–272.Salibian-Barrera, Matias (2000). “Contributions to the theory of robust inference”. PhD thesis.University of British Columbia.Salibian-Barrera, Matias and Ruben H Zamar (2002). “Bootrapping robust estimates of regression”.In: The Annals of Statistics 30.2, pp. 556–582.Schwarz, Gideon (1978). “Estimating the dimension of a model”. In: The annals of statistics 6.2,pp. 461–464.Smucler, Ezequiel and Victor J Yohai (2017). “Robust and sparse estimators for linear regressionmodels”. In: Computational Statistics & Data Analysis 111, pp. 116–130.Tibshirani, Robert (1996). “Regression Shrinkage and Selection via the Lasso”. In: Journal of theRoyal Statistical Society. Series B (Methodological) 58.1, pp. 267–288.Yohai, Victor J (1987). “High breakdown-point and high efficiency robust estimates for regression”.In: The Annals of Statistics 15.2, pp. 642–656.Zou, Hui and Trevor Hastie (2005). “Regularization and variable selection via the elastic net”. In:Journal of the royal statistical society: series B (statistical methodology) 67.2, pp. 301–320.76Appendix ARobust Estimators Used in DetectingDeviating Data CellsThis appendix details the robust estimators used by Detecting Deviating Data Cells (DDC) as com-ponents in the algorithm (Rousseeuw and Bossche, 2018). The robust methods were selected forthe low computational complexity and memory requirements – all methods have O(n) computa-tional complexity and memory requirements.Rousseeuw and Bossche (2018) defines a robust location estimator robLoc(x) as the first iter-ation used to calculate an M-estimate with a previously computed dispersion estimate, found onpages 40-41 of Ricardo A Maronna, Martin, et al. (2019). Initial estimates of location and scaleare calculated by m1 = medi xi and sq = madi xirobLoc(x) = ∑ni wiyi∑ni wi(A.1)with wi = ρ((yi−m1)/s1) where ρ is Tukey’s Bisquare function defined in Equation (2.33).Rousseeuw and Bossche (2018) also proposes an estimator for robust univariate scale. Thisestimator assumes data are centered. Let s2 be the median absolute deviation.robScale = s2√√√√ 1δnn∑i=1min((yis2)2, b2)(A.2)where b = 2.5 by default and δ = 0.85 to ensure consistency for normally distributed data.The relationships between variables are estimated using robust bivariate correlations. An initial77estimate of the robust bivariate correlation of columns j and h of the centred and normalized datamatrix Z based on the method proposed in Gnanadesikan and Kettenring (1972) is calculated,ρˆ jh = ((robScalei(zi j + zih))2− (robScalei(zi j− zih)2)). (A.3)The estimate ρˆ jh is used to define a tolerance ellipse, assuming joint normality, in which observa-tions lie with probability p where p the same as that used to define c in Equation (3.3). The robustbivariate correlation is calculated using only points that lie within the tolerance ellipse.The robust slopes used to produce the predicted value matrix are calculated as follows. 
Assumethe columns are centered but not necessarily normalized. An initial estimate of the slope is firstcalculatedbi j =nmedi=1(zi j/zih) (A.4)which is then used to calculate residuals,ri jh = zi j−b jhzih. (A.5)The ordinary least squares (OLS) slope, with no intercept, is then calculated with points where|ri jh|/c≤ robScale(ri jh) (A.6)where c2 = χ22 (p), where p is the same as was used to compute the robust scale. The robust slopethe OLS slope computed on the filtered data.78Appendix BData Quality Modelling SimulationAdditional FiguresFigure B.1: Sensitivity across simulation replicates for detection of data contaminated withclustered outliers for a range of fractions of contaminated cases and magnitude param-eters η . These plots present the results for the same simulations as Table 6.2.79Figure B.2: Specificity across simulation replicates for detection of data contaminated withclustered outliers for a range of fractions of contaminated cases and magnitude param-eters η . These plots present the results for the same simulations as Table 6.2.Figure B.3: Sensitivity across simulation replicates for detection of data contaminated withindependent outliers for a range of fractions of contaminated cells and magnitude pa-rameters k. These plots present the results for the same simulations as Table 6.3.80Figure B.4: Specificity across simulation replicates for detection of data contaminated withindependent outliers for a range of fractions of contaminated cells and magnitude pa-rameters k. These plots present the results for the same simulations as Table 6.3.81
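As a final illustration, the univariate scale estimator (A.2), one of the building blocks used by DDC in Appendix A, can be sketched as follows. This is a minimal sketch assuming centred data; scipy's median_abs_deviation with the normal-consistency factor is used for the initial scale s2, which the appendix simply calls the median absolute deviation, so the exact scaling of s2 is an assumption.

```python
import numpy as np
from scipy.stats import median_abs_deviation

def rob_scale(y, b=2.5, delta=0.85):
    """Sketch of Equation (A.2): a truncated second-moment scale estimate,
    standardized by an initial MAD scale s2 and the consistency constant delta.
    Assumes the data have already been centred."""
    y = np.asarray(y, dtype=float)
    s2 = median_abs_deviation(y, scale="normal")   # initial scale s2 (normal-consistent MAD)
    n = len(y)
    return s2 * np.sqrt(np.sum(np.minimum((y / s2) ** 2, b ** 2)) / (delta * n))
```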
