BIRS Workshop Lecture Videos
Inference and Variable Selection for Random Forests
Mentch, Lucas
Description
Despite the success of tree-based learning algorithms (bagging, boosting, random forests), these methods are often seen as prediction-only tools whereby the interpretability and intuition of traditional statistical models are sacrificed for predictive accuracy. We present an overview of recent work suggesting that this black-box perspective need not be the case. We consider a general resampling scheme in which predictions are averaged across base learners built with subsamples and demonstrate that the resulting estimator belongs to an extended class of U-statistics. As such, a corresponding central limit theorem is developed, allowing for confidence intervals to accompany predictions as well as formal hypothesis tests for variable significance and additivity. The proposed test statistics can also be extended to produce consistent measures of variable importance. In particular, we propose to extend the typical randomized node-wise feature availability to tree-wise feature availability, allowing for hold-out variable importance measures that, unlike traditional out-of-bag measures, are robust to correlation structures between
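The resampling scheme described in the abstract can be sketched in a few lines: predictions are averaged across many base learners, each fit on a random subsample without replacement. The sketch below is purely illustrative and assumes nothing from the talk beyond that scheme; the base learner (1-nearest-neighbour regression) and the naive between-learner interval are hypothetical stand-ins, since the talk derives the principled variance through a central limit theorem for an extended class of U-statistics.

```python
# Illustrative sketch of subsample-and-average prediction. The 1-NN base
# learner and the naive normal interval are stand-ins, not the estimator
# from the talk.
import numpy as np

def one_nn_predict(Xs, ys, x):
    """Deliberately simple base learner: 1-nearest-neighbour regression."""
    return ys[np.argmin(((Xs - x) ** 2).sum(axis=1))]

def subbagged_predict(X, y, x_new, n_learners=500, subsample=50, seed=0):
    """Average base-learner predictions over random subsamples (drawn
    without replacement) and attach a naive 95% interval based on the
    between-learner spread (illustration only)."""
    rng = np.random.default_rng(seed)
    preds = np.empty(n_learners)
    for b in range(n_learners):
        idx = rng.choice(len(X), size=subsample, replace=False)
        preds[b] = one_nn_predict(X[idx], y[idx], x_new)
    mean = preds.mean()
    half = 1.96 * preds.std(ddof=1) / np.sqrt(n_learners)
    return mean, (mean - half, mean + half)

# Synthetic regression: y = x1^2 + noise, predicted at x = (0.5, 0, 0).
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(500, 3))
y = X[:, 0] ** 2 + 0.1 * rng.normal(size=500)
mean, ci = subbagged_predict(X, y, np.array([0.5, 0.0, 0.0]))
```

Because each base learner sees only a small subsample, the averaged predictor has the U-statistic structure the abstract refers to, which is what makes a central limit theorem, and hence confidence intervals and hypothesis tests, available.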
Item Metadata
Title: Inference and Variable Selection for Random Forests
Creator: Mentch, Lucas
Publisher: Banff International Research Station for Mathematical Innovation and Discovery
Date Issued: 2018-01-15T11:17
Extent: 49 minutes
File Format: video/mp4
Language: eng
Notes: Author affiliation: University of Pittsburgh
Date Available: 2018-07-15
Provider: Vancouver : University of British Columbia Library
Rights: Attribution-NonCommercial-NoDerivatives 4.0 International
DOI: 10.14288/1.0368934
Peer Review Status: Unreviewed
Scholarly Level: Faculty
Aggregated Source Repository: DSpace