Statistical and machine learning classification methods for credit ratings

UBC Theses and Dissertations

Hu, Xixi

Abstract

Credit rating is an ordinal categorical label that serves as an important measure of a financial institution’s creditworthiness. It is frequently used to decide whether to grant a loan and how much interest to charge: companies with higher credit ratings often enjoy lower interest rates and more flexibility in obtaining loans. Because of increased competition in the lending market, there is renewed interest in the business community in applying statistical and machine learning methods to assign credit ratings. Beyond matching ratings accurately, the challenge in adapting and generalizing these methods often lies in understanding and interpreting them. Our goal is to compare the classification performance and interpretability of four statistical learning methods on an industry credit rating dataset in which the rating variable comes from human expert opinions. We fit ordinal regression, ordinal gradient boosting, multinomial gradient boosting and random forest models, aiming to find an interpretable method that replicates the human expert ratings as closely as possible. We find that although linear ordinal regression is the most interpretable, it fails to achieve high classification accuracy in cross-validation. Furthermore, the ordinal models (ordinal regression and ordinal gradient boosting) produce a substantial number of negative fitted probabilities in practice due to the lack of numerical constraints. Ordinal gradient boosting and random forest perform best on our three measures of classification accuracy (perfect match rate, within-one-class match rate and 80% prediction intervals), but ordinal gradient boosting produces a high proportion of negative values and non-unimodal fitted probability mass functions.
We therefore choose random forest as the preferred method and focus on interpreting it through variable importance rankings, partial derivatives of the probability mass function and cumulative probability function, and local interpretable model-agnostic explanation (LIME) plots.
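The evaluation criteria named in the abstract can be illustrated with a short sketch. This is not the thesis's code: every function name below is hypothetical, ratings are assumed to be integer-coded classes 0..K-1, and the 80% prediction interval is assumed (not confirmed by the abstract) to run from the 10% to the 90% quantile of each fitted probability mass function. The first helper shows one way negative fitted probabilities can arise, assuming an ordinal model that outputs cumulative probabilities P(Y ≤ k) without a monotonicity constraint; the second flags negative values and non-unimodality in a fitted pmf.

```python
import numpy as np


def pmf_from_cumulative(cum_probs):
    """Convert fitted P(Y <= k), k = 0..K-2, into a pmf over K classes (hypothetical helper).

    If the cumulative estimates are not forced to be non-decreasing,
    some differences, and hence some class probabilities, come out negative.
    """
    cum = np.concatenate(([0.0], np.asarray(cum_probs, dtype=float), [1.0]))
    return np.diff(cum)


def pmf_diagnostics(pmf, tol=1e-12):
    """Return (has_negative, is_unimodal) for one fitted probability mass function."""
    pmf = np.asarray(pmf, dtype=float)
    has_negative = bool(np.any(pmf < -tol))
    steps = np.diff(pmf)
    signs = np.sign(steps[np.abs(steps) > tol])  # ignore flat steps
    downs = np.flatnonzero(signs < 0)
    # Unimodal: once the pmf starts decreasing it never increases again.
    is_unimodal = downs.size == 0 or not np.any(signs[downs[0]:] > 0)
    return has_negative, bool(is_unimodal)


def match_rates(y_true, y_pred):
    """Perfect-match rate and within-one-class match rate for integer-coded ratings."""
    diff = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    return float(np.mean(diff == 0)), float(np.mean(diff <= 1))


def quantile_interval(pmf, level=0.8):
    """Prediction interval (lower, upper class index) from quantiles of a valid pmf.

    Assumed definition: lower/upper are the (1-level)/2 and (1+level)/2 quantiles,
    so the interval carries at least `level` probability mass.
    """
    cdf = np.cumsum(pmf)
    alpha = (1.0 - level) / 2.0
    lower = int(np.argmax(cdf >= alpha))        # first class reaching the lower quantile
    upper = int(np.argmax(cdf >= 1.0 - alpha))  # first class reaching the upper quantile
    return lower, upper


if __name__ == "__main__":
    # Unconstrained cumulative estimates that cross: P(Y <= 1) < P(Y <= 0).
    pmf = pmf_from_cumulative([0.30, 0.25, 0.80])
    print(pmf_diagnostics(pmf))  # -> (True, False)
    print(match_rates([1, 2, 3, 4], [1, 3, 3, 1]))  # -> (0.5, 0.75)
    print(quantile_interval([0.05, 0.25, 0.4, 0.25, 0.05]))  # -> (1, 3)
```

Coverage of the interval would then be the fraction of observations whose true class falls between `lower` and `upper`; a method such as random forest, which averages class votes, yields non-negative probabilities by construction, so only the unimodality check is informative for it.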

Item Metadata

Title	Statistical and machine learning classification methods for credit ratings
Creator	Hu, Xixi
Publisher	University of British Columbia
Date Issued	2018
Genre	Thesis/Dissertation
Type	Text
Language	eng
Date Available	2018-08-21
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivatives 4.0 International
DOI	10.14288/1.0371167
URI	http://hdl.handle.net/2429/66863
Degree (Theses)	Master of Science - MSc
Program (Theses)	Statistics
Affiliation	Science, Faculty of; Statistics, Department of
Degree Grantor	University of British Columbia
Graduation Date	2018-09
Campus	UBCV
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc-nd/4.0/
Aggregated Source Repository	DSpace
