On Feature Importance in Random Forest

As part of the Lead Scoring project at Spreedly, I used Random Forest (RF) feature importance to determine which features lead trial sign-ups to convert. However, the results were not robust and did not match the intuition from domain knowledge, so I dug deeper into the problem and learned about the extensive research on RF feature importance. This note, which I try to keep short, summarizes the problem and a couple of solutions discussed in the literature.

Please note that this post summarizes the feature importance topic as discussed in the three references below, and is meant as a quick reference for the future.

Problem

Feature Importance in Random Forest

Feature Importance in Random Forest Is Biased

The Root of the Bias

1. Gini Split Criterion

2. Bootstrapping

3. Collinearity

Solution

1. cForest

2. Permutation Importance

3. Drop-column Importance

4. Repeated Permutation

In Practice
