BIRS Workshop Lecture Videos

Selection of Variables and Functional Forms in Multivariable Analysis: Current Issues and Future Directions

Harrell, Frank

Description

This talk begins with a contrast of exploratory data analysis (à la Tukey) and formal analysis. Challenges of "too many variables and too few subjects" are briefly discussed in this context. The discussion turns to ways in which variable selection is misleading, contrasting feature selection with successful "kitchen sink" machine learning approaches. This leads to a statistical analogy of Maxwell's demon in which some of the information in the system is "stolen" by feature selection. An example will be shown in which the bootstrap is useful in quantifying the difficulty of the task; this involves getting confidence intervals for the importance ranks of predictors. Instead of feature selection, pooled tests of overlapping predictors are advocated for assisting in model interpretation.
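The idea of bootstrap confidence intervals for importance ranks can be sketched roughly as follows. This is only an illustration under stated assumptions, not the example from the talk: the data are simulated, importance is measured by the absolute t-statistic from an ordinary least squares fit, and the function names `importance_ranks` and `bootstrap_rank_intervals` are hypothetical.

```python
import numpy as np

def importance_ranks(X, y):
    """Rank predictors by |t-statistic| from an OLS fit (rank 1 = most important).

    Assumption: |t| is used as the importance measure purely for illustration.
    """
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])           # add intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sigma2 = resid @ resid / (n - p - 1)           # residual variance estimate
    cov = sigma2 * np.linalg.inv(A.T @ A)          # covariance of the coefficients
    t_abs = np.abs(beta[1:] / np.sqrt(np.diag(cov)[1:]))
    order = np.argsort(-t_abs)                     # indices from largest to smallest |t|
    ranks = np.empty(p, dtype=int)
    ranks[order] = np.arange(1, p + 1)
    return ranks

def bootstrap_rank_intervals(X, y, n_boot=500, level=0.95, seed=0):
    """Percentile bootstrap intervals for each predictor's importance rank."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    boot = np.empty((n_boot, p), dtype=int)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)           # resample subjects with replacement
        boot[b] = importance_ranks(X[idx], y[idx])
    alpha = (1 - level) / 2
    lower = np.quantile(boot, alpha, axis=0)
    upper = np.quantile(boot, 1 - alpha, axis=0)
    return importance_ranks(X, y), lower, upper

# Simulated data (assumption): six predictors with decreasing true effects.
rng = np.random.default_rng(1)
n, p = 200, 6
X = rng.normal(size=(n, p))
true_beta = np.array([1.0, 0.5, 0.3, 0.2, 0.1, 0.0])
y = X @ true_beta + rng.normal(size=n)

ranks, lower, upper = bootstrap_rank_intervals(X, y)
for j in range(p):
    print(f"x{j + 1}: observed rank {ranks[j]}, "
          f"{lower[j]:.0f} to {upper[j]:.0f} across bootstrap resamples")
```

Wide intervals for the weaker predictors would indicate that the data cannot reliably order them, which is the sense in which the bootstrap quantifies how hard the selection task is.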

Some issues relating to fitting predictor functional form will be addressed, and the statistical advantages of pre-specifying knot locations in regression splines will be outlined. Many statistical analysts are unaware that modern methods for high-dimensional data such as lasso and elastic net frequently trade one set of problems for another, especially in relation to predictor transformations. This talk attempts to bring these issues more into the open, mentioning how a Bayesian might operate. Finally, some future directions in interaction modeling will be covered.
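As a rough illustration of pre-specified knots, the sketch below builds a restricted (natural) cubic spline basis whose knots are fixed in advance at quantiles of the predictor, without consulting the response. The quantile positions (0.05, 0.35, 0.65, 0.95), the simulated data, and the helper name `rcs_basis` are assumptions made for this example; the abstract does not say which knot placement the talk uses.

```python
import numpy as np

def rcs_basis(x, knots):
    """Restricted cubic spline basis: linear beyond the outer knots.

    Returns k - 1 columns for k knots: x itself plus k - 2 nonlinear terms.
    """
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    k = len(t)
    if k < 3:
        raise ValueError("need at least 3 knots")

    def pos3(u):
        return np.maximum(u, 0.0) ** 3             # truncated cubic (u)_+^3

    scale = (t[-1] - t[0]) ** 2                    # keeps columns on the scale of x
    cols = [x]
    for j in range(k - 2):
        term = (pos3(x - t[j])
                - pos3(x - t[-2]) * (t[-1] - t[j]) / (t[-1] - t[-2])
                + pos3(x - t[-1]) * (t[-2] - t[j]) / (t[-1] - t[-2]))
        cols.append(term / scale)
    return np.column_stack(cols)

# Simulated data (assumption): a smooth nonlinear relationship plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=300)
y = np.sin(x / 2) + rng.normal(scale=0.3, size=x.size)

# Knots chosen from x alone, before looking at y.
knots = np.quantile(x, [0.05, 0.35, 0.65, 0.95])
B = rcs_basis(x, knots)

# Ordinary least squares on intercept + spline columns.
A = np.column_stack([np.ones_like(x), B])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
fitted = A @ coef
print("knots:", np.round(knots, 2))
print("coefficients:", np.round(coef, 3))
```

Because the knot locations do not depend on the response, the fit is an ordinary linear model in the basis columns, so the usual inference applies with a known number of degrees of freedom.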

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International