Beware Default Random Forest Importances

Important reading from Terence Parr et al. for anyone using linear model coefficients or tree-based feature importances.
Published: October 20, 2018

Modified: June 5, 2024

Original Blog Post: Beware Default Random Forest Importances
Authors: Terence Parr, Kerem Turgutlu, Christopher Csiszar, Jeremy Howard
Published: 2018-10-20

My summary:

If there is one article on this website that I would recommend the most, it’s this one.

I don’t know how common it is for people to use default feature importances, or coefficients from linear models, as proxies for the real impact of a feature on the target variable, but doing so raises issues that need to be considered.

This article goes through those issues and, I think, explains well how the default feature importance computation works. There are also lots of helpful links sprinkled throughout the post.
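To make the issue concrete, here is a minimal sketch of the kind of bias the article demonstrates, written with scikit-learn (the dataset and parameters are my own illustrative choices, not the article's): the default impurity-based importances, computed on the training data, can assign non-trivial weight to a pure-noise column, while permutation importance measured on held-out data drops it to roughly zero.

```python
# Sketch: default (impurity) importances vs. permutation importance.
# Assumes scikit-learn; data and hyperparameters are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
X, y = make_classification(n_samples=2000, n_features=5, n_informative=3,
                           n_redundant=0, random_state=0)
# Append a pure-noise column: it carries no signal, yet the default
# importances (computed on training data) tend to give it real weight.
X = np.hstack([X, rng.rand(X.shape[0], 1)])

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)

print("Default (impurity) importances:", rf.feature_importances_)
# Permutation importance on held-out data, the kind of approach the
# article recommends, should score the noise column near zero.
perm = permutation_importance(rf, X_test, y_test, n_repeats=10,
                              random_state=0)
print("Permutation importances:       ", perm.importances_mean)
```

The last feature (index 5) is the noise column; comparing its score under the two methods shows the gap the article is warning about.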

At the very end, they mention the following:

Extremely randomized trees, at least in theory, do not suffer from this problem. Better still, they’re generally faster to train than RFs, and more accurate.

I wonder if anyone has looked into this last point in more depth.
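As a starting point, here is a rough, unscientific sketch for probing that claim yourself, using scikit-learn's ExtraTreesClassifier against a RandomForestClassifier on synthetic data (my own setup, not from the article). A real comparison would need multiple datasets and tuned hyperparameters.

```python
# Sketch: compare extremely randomized trees with a random forest on
# cross-validated accuracy and wall-clock time. Synthetic data and
# hyperparameters are illustrative; this is not a rigorous benchmark.
import time
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

for Model in (RandomForestClassifier, ExtraTreesClassifier):
    clf = Model(n_estimators=200, random_state=0)
    start = time.perf_counter()
    scores = cross_val_score(clf, X, y, cv=5)  # fits and scores 5 folds
    elapsed = time.perf_counter() - start
    print(f"{Model.__name__}: accuracy {scores.mean():.3f}, {elapsed:.1f}s")
```

Extra-trees should be faster here because it draws split thresholds at random instead of searching for the optimal one at each node; whether it is also more accurate will vary by dataset.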

Here is the link again to their blog post for more details: Beware Default Random Forest Importances