Boosting for regression problems with complex data

UBC Theses and Dissertations

Boosting for regression problems with complex data Ju, Xiaomeng

Abstract

Boosting is a highly flexible and powerful approach to prediction in non-parametric settings. By constructing an estimator as a combination of “base learners”, it can achieve high prediction accuracy and scale to data with many explanatory variables. Despite the popularity and practical success of boosting algorithms, little attention has been paid to their generalization to “complex data”, such as data with outliers or functional variables. For data like these, we develop new boosting algorithms that fit in the framework of gradient boosting machines (GBM). We illustrate our findings on simulated and real datasets and provide openly available R packages implementing our proposals.

For data contaminated with outliers, we propose a two-stage boosting algorithm similar to robust linear MM-regression: it first minimizes a robust residual scale estimator and then improves the resulting fit by optimizing a bounded loss function. Unlike previous robust boosting proposals, this approach does not require computing an ad hoc residual scale estimator in each boosting iteration. We address the initialization of our boosting algorithm and provide a permutation-based procedure to robustly measure the importance of each variable.

For data containing functional predictors, we propose a boosting algorithm whose base learners are trees constructed with multiple projections. Our proposal incorporates possible interactions between indices, making it capable of approximating complex regression functions. In addition, our estimator is built from relatively simple regression trees, which are notably easier to compute than the multi-dimensional kernel smoothers used in other proposals. Finally, we extend this proposal to robust functional regression in the presence of outliers, which may appear in the measurements of the response, the functional predictors, or both. We explore robust boosting algorithms derived from M-estimators and from MM-estimators, and make suggestions on which method to use based on the type of contamination and the computing budget.
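
The two-stage idea described in the abstract can be illustrated with a small sketch. This is not the thesis's algorithm or its R packages: it assumes a single predictor, uses regression stumps as base learners, and replaces the first stage (minimizing a robust residual scale) with plain absolute-error boosting as a stand-in. All names (`fit_stump`, `two_stage_boost`, the tuning constant 4.685, the toy data) are illustrative assumptions. What it does show is the structural point made above: the residual scale is computed once, between the stages, rather than in every boosting iteration.

```python
import random
import statistics


def fit_stump(x, target):
    """Least-squares regression stump (single split) of `target` on one feature."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    total, left_sum, best = sum(target), 0.0, None
    for k in range(1, n):
        left_sum += target[order[k - 1]]
        lo, hi = x[order[k - 1]], x[order[k]]
        if lo == hi:
            continue  # cannot split between tied feature values
        right_sum = total - left_sum
        # Maximizing this score is equivalent to minimizing squared error.
        score = left_sum ** 2 / k + right_sum ** 2 / (n - k)
        if best is None or score > best[0]:
            best = (score, (lo + hi) / 2, left_sum / k, right_sum / (n - k))
    if best is None:  # all feature values tied: fall back to a constant
        mean = total / n
        return lambda v: mean
    _, split, left, right = best
    return lambda v: left if v <= split else right


def sign(r):
    """Negative gradient of the absolute-error loss."""
    return (r > 0) - (r < 0)


def tukey_pseudo_residual(r, scale, c=4.685):
    """Negative gradient of Tukey's bisquare loss, up to a positive constant.

    It is exactly zero when |r| >= c * scale, so gross outliers stop
    influencing the fit.
    """
    u = r / (c * scale)
    return 0.0 if abs(u) >= 1 else r * (1 - u * u) ** 2


def two_stage_boost(x, y, n_iter=150, lr=0.1):
    """Illustrative two-stage robust boosting; returns a prediction function."""
    f0 = statistics.median(y)
    fitted = [f0] * len(y)
    stumps = []

    def boost(pseudo_residual):
        for _ in range(n_iter):
            target = [pseudo_residual(yi - fi) for yi, fi in zip(y, fitted)]
            stump = fit_stump(x, target)
            stumps.append(stump)
            for i, xi in enumerate(x):
                fitted[i] += lr * stump(xi)

    # Stage 1 (stand-in, NOT the thesis's scale minimization):
    # absolute-error boosting to obtain a robust initial fit.
    boost(sign)
    # Residual scale computed once from the stage 1 fit (normalized MAD).
    scale = 1.4826 * statistics.median([abs(yi - fi) for yi, fi in zip(y, fitted)])
    scale = scale or 1.0  # assumed guard against a zero scale
    # Stage 2: boosting with a bounded loss, holding the scale fixed.
    boost(lambda r: tukey_pseudo_residual(r, scale))
    return lambda v: f0 + lr * sum(s(v) for s in stumps)


# Toy data (assumed): a step function with small noise and 10% gross outliers.
rng = random.Random(0)
x = [i / 200 for i in range(200)]
y = [(0.0 if xi < 0.5 else 3.0) + rng.gauss(0, 0.1) for xi in x]
for i in range(5, 200, 10):
    y[i] = 50.0

predict = two_stage_boost(x, y)
print(round(predict(0.25), 2), round(predict(0.75), 2))
```

On this toy data the predictions at 0.25 and 0.75 should stay close to the clean levels 0 and 3, whereas the sample mean of the responses is pulled far upward by the outliers. A fixed learning rate is used instead of a line search to keep the sketch short.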

Item Metadata

Title	Boosting for regression problems with complex data
Creator	Ju, Xiaomeng
Supervisor	Salibián-Barrera, Matías
Publisher	University of British Columbia
Date Issued	2022
Genre	Thesis/Dissertation
Type	Text
Language	eng
Date Available	2022-08-26
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivatives 4.0 International
DOI	10.14288/1.0418448
URI	http://hdl.handle.net/2429/82606
Degree	Doctor of Philosophy - PhD
Program	Statistics
Affiliation	Science, Faculty of; Statistics, Department of
Degree Grantor	University of British Columbia
Graduation Date	2022-11
Campus	UBCV
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc-nd/4.0/
Aggregated Source Repository	DSpace
