BIRS Workshop Lecture Videos

Banff International Research Station


Inference and Variable Selection for Random Forests

Mentch, Lucas


Despite the success of tree-based learning algorithms (bagging, boosting, random forests), these methods are often seen as prediction-only tools in which the interpretability and intuition of traditional statistical models are sacrificed for predictive accuracy. We present an overview of recent work suggesting that this black-box perspective need not be the case. We consider a general resampling scheme in which predictions are averaged across base learners built with subsamples, and demonstrate that the resulting estimator belongs to an extended class of U-statistics. We then develop a corresponding central limit theorem, which allows confidence intervals to accompany predictions and supports formal hypothesis tests for variable significance and additivity. The proposed test statistics can also be extended to produce consistent measures of variable importance. In particular, we propose extending the typical randomized node-wise feature availability to tree-wise feature availability, allowing for hold-out variable importance measures that, unlike traditional out-of-bag measures, are robust to correlation structures between
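The resampling scheme in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the talk: the base learner is a small nearest-neighbour average standing in for a tree, the function names (`base_learner`, `subsampled_prediction`, `variance_estimate`, `confidence_interval`) are hypothetical, and the variance is approximated by the form (k^2/n) * zeta_1 + zeta_k / m, with zeta_1 and zeta_k estimated by simple Monte Carlo. Here n is the sample size, k the subsample size, and m the number of subsamples.

```python
import math
import random
import statistics


def make_data(n, rng):
    """Toy regression data (assumption): y = sin(x) + Gaussian noise."""
    xs = [rng.uniform(0.0, 3.0) for _ in range(n)]
    return [(x, math.sin(x) + rng.gauss(0.0, 0.2)) for x in xs]


def base_learner(subsample, x0, n_neighbors=3):
    """Stand-in for a tree: mean response of the points nearest to x0."""
    nearest = sorted(subsample, key=lambda p: abs(p[0] - x0))[:n_neighbors]
    return statistics.fmean([y for _, y in nearest])


def subsampled_prediction(data, x0, k, m, rng):
    """Average base-learner predictions over m random subsamples of size k."""
    preds = [base_learner(rng.sample(data, k), x0) for _ in range(m)]
    return statistics.fmean(preds), preds


def variance_estimate(data, x0, k, preds, rng, n_anchor=25, n_mc=50):
    """Approximate Var(prediction) as (k^2 / n) * zeta_1 + zeta_k / m.

    Assumed form, not stated in the abstract. zeta_k is the variance of
    individual learner predictions; zeta_1 is the variance, across anchor
    points, of the mean prediction given that the anchor is in the subsample.
    """
    n, m = len(data), len(preds)
    zeta_k = statistics.variance(preds)
    anchor_means = []
    for _ in range(n_anchor):
        i = rng.randrange(n)
        rest = data[:i] + data[i + 1:]
        vals = [
            base_learner([data[i]] + rng.sample(rest, k - 1), x0)
            for _ in range(n_mc)
        ]
        anchor_means.append(statistics.fmean(vals))
    zeta_1 = statistics.variance(anchor_means)
    return (k ** 2 / n) * zeta_1 + zeta_k / m


def confidence_interval(data, x0, k, m, rng, level=0.95):
    """Normal-approximation interval around the subsampled prediction."""
    estimate, preds = subsampled_prediction(data, x0, k, m, rng)
    var = variance_estimate(data, x0, k, preds, rng)
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * math.sqrt(var)
    return estimate, estimate - half_width, estimate + half_width


if __name__ == "__main__":
    rng = random.Random(0)
    data = make_data(200, rng)
    est, lower, upper = confidence_interval(data, x0=1.0, k=20, m=200, rng=rng)
    print(f"prediction at x0=1.0: {est:.3f}  interval: [{lower:.3f}, {upper:.3f}]")
```

The Monte Carlo estimate of zeta_1 used here is deliberately crude and tends to be noisy for small `n_anchor` and `n_mc`, so the resulting interval should be read as a demonstration of the idea rather than a calibrated procedure.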



Attribution-NonCommercial-NoDerivatives 4.0 International