Open Collections
UBC Theses and Dissertations
Statistical and machine learning classification methods for credit ratings Hu, Xixi
Abstract
A credit rating is an ordinal categorical label that serves as an important measure of a financial institution’s creditworthiness. It is frequently used to decide whether to grant loans and how much interest to charge. Companies with higher credit ratings often enjoy lower interest rates and more flexibility in obtaining loans. Due to increased competition in the lending market, there is renewed interest in the business community in applying statistical and machine learning methods to assign credit ratings. The challenge of adapting and generalizing these methods often lies in understanding and interpreting them, in addition to matching ratings accurately. Our goal is to compare the classification performance and interpretability of four statistical learning methods on an industry credit rating dataset in which the rating variable comes from human expert opinions. We fit ordinal regression, ordinal gradient boosting, multinomial gradient boosting and random forest, with the goal of finding an interpretable method that replicates the human expert ratings as closely as possible. We find that while linear ordinal regression is the most interpretable, it fails to achieve high classification accuracy during cross-validation. Furthermore, the ordinal models (ordinal regression and ordinal gradient boosting) produce a significant number of negative fitted probabilities in practice due to the lack of numerical constraints. Ordinal gradient boosting and random forest perform best on our three measures of classification accuracy (perfect match rate, within-one-class match rate and 80% prediction intervals), but ordinal gradient boosting produces high proportions of negative values and non-unimodality in the fitted probability mass function.
We therefore choose random forest as the preferred method and focus on its interpretation using variable importance rankings, partial derivatives of the probability mass function and cumulative probability function, and local interpretable model-agnostic explanation plots.
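The accuracy measures and probability diagnostics named in the abstract can be illustrated with a small sketch. This is not the thesis's code. It assumes ratings are coded as consecutive integers, and it assumes that class probabilities in an ordinal model are obtained by differencing fitted cumulative probabilities, which is one common way negative values can arise when the cumulative fits are not constrained to be non-decreasing. All function names are hypothetical, and the thesis may define these quantities differently.

```python
import numpy as np

def perfect_match_rate(y_true, y_pred):
    """Share of cases where the predicted rating equals the expert rating."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

def within_one_match_rate(y_true, y_pred):
    """Share of cases where the prediction is at most one class away.
    Assumes ratings are coded as consecutive integers (an assumption)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(np.abs(y_true - y_pred) <= 1))

def pmf_from_cumulative(cum):
    """Difference cumulative probabilities P(Y <= k) into P(Y = k).
    If the cumulative values are not monotone, some entries are negative."""
    cum = np.asarray(cum, dtype=float)
    return np.diff(np.concatenate(([0.0], cum)))

def is_unimodal(pmf):
    """True if the sequence never rises again after it has fallen."""
    steps = np.sign(np.diff(np.asarray(pmf, dtype=float)))
    steps = steps[steps != 0]  # ignore flat segments
    return not np.any((steps[:-1] < 0) & (steps[1:] > 0))

if __name__ == "__main__":
    expert = [1, 2, 3, 4, 5]   # made-up ratings for illustration only
    model = [1, 3, 3, 2, 5]
    print(perfect_match_rate(expert, model))
    print(within_one_match_rate(expert, model))

    # Unconstrained cumulative fit that dips from 0.5 to 0.4.
    pmf = pmf_from_cumulative([0.2, 0.5, 0.4, 1.0])
    print(pmf, (pmf < 0).any(), is_unimodal(pmf))
```

A random forest's class probabilities are averages of per-tree class proportions, so they are non-negative by construction. That is consistent with the abstract's preference for random forest, although unimodality is still not guaranteed.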
Item Metadata
Title |
Statistical and machine learning classification methods for credit ratings
|
Creator |
Hu, Xixi
|
Publisher |
University of British Columbia
|
Date Issued |
2018
|
Genre | |
Type | |
Language |
eng
|
Date Available |
2018-08-21
|
Provider |
Vancouver : University of British Columbia Library
|
Rights |
Attribution-NonCommercial-NoDerivatives 4.0 International
|
DOI |
10.14288/1.0371167
|
URI | |
Degree | |
Program | |
Affiliation | |
Degree Grantor |
University of British Columbia
|
Graduation Date |
2018-09
|
Campus | |
Scholarly Level |
Graduate
|
Rights URI | |
Aggregated Source Repository |
DSpace
|