26  Leaf Encoding

Leaf encoding, also called decision tree encoding, is a method where a single decision tree is fit using a target, typically the outcome, and a single categorical variable as the predictor. The encoding is then done by replacing each categorical label with the tree’s prediction for that label.

This works in both classification and regression settings, but the two serve different purposes. In a classification setting, we replace a categorical predictor with another categorical predictor that has the same number of levels or fewer. In a regression setting, the categorical predictor is replaced with a numeric variable. In some ways, this feels much like the target encoding explored in Chapter 23.

Suppose we use leaf encoding on the MS_SubClass predictor of the ames data set, using the numeric target Sale_Price. A possible fitted tree on that data would yield the following encoding table.

leaf MS_SubClass
99353.69 One_Story_1945_and_Older
99353.69 One_and_Half_Story_Unfinished_All_Ages
99353.69 PUD_Multilevel_Split_Level_Foyer
144435.77 One_and_Half_Story_Finished_All_Ages
144435.77 Split_Foyer
144435.77 Two_Story_PUD_1946_and_Newer
144435.77 Split_or_Multilevel
144435.77 Duplex_All_Styles_and_Ages
144435.77 Two_Family_conversion_All_Styles_and_Ages
144435.77 Two_Story_1945_and_Older
144435.77 One_Story_with_Finished_Attic_All_Ages
144435.77 One_and_Half_Story_PUD_All_Ages
190646.00 One_Story_1946_and_Newer_All_Styles
190646.00 One_Story_PUD_1946_and_Newer
190646.00 Two_and_Half_Story_All_Ages
239364.29 Two_Story_1946_and_Newer

This table has 4 distinct values, meaning that the tree has 4 leaves. Encoding new data then amounts to looking up each level in this table.
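To make the construction of such a table concrete, here is a minimal sketch of a regression tree grown on a single categorical predictor, written with the standard library only. It is not how any particular package implements leaf encoding. The function name `fit_leaf_encoding`, the `max_leaves` argument, and the toy data are all made up for illustration, and the sketch assumes that candidate splits are cut points in the ordering of the levels by their mean target value.

```python
from statistics import mean


def fit_leaf_encoding(levels, target, max_leaves=4):
    """Grow a small regression tree on one categorical predictor.

    Returns a lookup table mapping each level to the mean of its leaf.
    Hypothetical helper for illustration only.
    """
    # Collect the target values observed for each level.
    groups = {}
    for lvl, y in zip(levels, target):
        groups.setdefault(lvl, []).append(y)

    # Assumption: order levels by mean target, so that every candidate
    # split is a cut point in this ordering.
    ordered = sorted(groups, key=lambda lvl: mean(groups[lvl]))

    def sse(lvls):
        """Sum of squared errors around the mean for a set of levels."""
        ys = [y for lvl in lvls for y in groups[lvl]]
        m = mean(ys)
        return sum((y - m) ** 2 for y in ys)

    # Start with one leaf holding every level, then repeatedly apply
    # the split that reduces the squared error the most.
    leaves = [ordered]
    while len(leaves) < max_leaves:
        best = None  # (gain, leaf index, cut point)
        for i, leaf in enumerate(leaves):
            for cut in range(1, len(leaf)):
                gain = sse(leaf) - sse(leaf[:cut]) - sse(leaf[cut:])
                if best is None or gain > best[0]:
                    best = (gain, i, cut)
        if best is None or best[0] <= 0:
            break  # nothing left to split, or no split helps
        _, i, cut = best
        leaf = leaves.pop(i)
        leaves[i:i] = [leaf[:cut], leaf[cut:]]

    # Every level in a leaf is encoded as that leaf's mean target.
    table = {}
    for leaf in leaves:
        value = mean(y for lvl in leaf for y in groups[lvl])
        for lvl in leaf:
            table[lvl] = value
    return table


# Toy data, not taken from the ames data set.
levels = ["a", "a", "b", "b", "c", "c", "d", "d"]
target = [10, 12, 11, 13, 30, 32, 50, 52]
table = fit_leaf_encoding(levels, target, max_leaves=3)
```

With `max_leaves=3`, levels `a` and `b` end up in the same leaf and share one encoded value, while `c` and `d` each get their own. Changing `max_leaves` changes how many distinct values the encoding produces.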

Now let’s see what happens if we instead choose a categorical target, using the same MS_SubClass predictor but with the categorical variable Lot_Shape as the target.

leaf MS_SubClass
leaf1 One_Story_1946_and_Newer_All_Styles
leaf1 One_and_Half_Story_Finished_All_Ages
leaf1 Split_Foyer
leaf1 Two_Story_PUD_1946_and_Newer
leaf1 One_Story_1945_and_Older
leaf1 Duplex_All_Styles_and_Ages
leaf1 Two_Family_conversion_All_Styles_and_Ages
leaf1 One_and_Half_Story_Unfinished_All_Ages
leaf1 Two_Story_1945_and_Older
leaf1 Two_and_Half_Story_All_Ages
leaf1 One_Story_with_Finished_Attic_All_Ages
leaf1 PUD_Multilevel_Split_Level_Foyer
leaf1 One_and_Half_Story_PUD_All_Ages
leaf2 Two_Story_1946_and_Newer
leaf2 One_Story_PUD_1946_and_Newer
leaf2 Split_or_Multilevel

And we now have a mapping that takes 16 levels and compresses them into 2 levels. We note two insights for the categorical target case. Firstly, the number of unique levels can’t exceed the number of levels in the target, because it is not possible to predict a level that doesn’t exist in the target. Secondly, the encoding will have the same number of levels as the predictor or fewer. We saw above that it is possible to produce fewer. To produce the same number of levels, we would need a target with at least as many levels as the predictor, and each predictor level would have to map to a different target level.
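Both bounds can be checked on a toy example. The sketch below does not fit a tree. Instead, it assumes the limiting case of a tree deep enough to isolate every predictor level, in which case each level is simply mapped to its majority target class. The helper `majority_encoding` and the data are hypothetical.

```python
from collections import Counter, defaultdict


def majority_encoding(levels, target):
    """Map each predictor level to its most frequent target class.

    Stand-in for a classification tree that isolates every level;
    hypothetical helper for illustration only.
    """
    counts = defaultdict(Counter)
    for lvl, cls in zip(levels, target):
        counts[lvl][cls] += 1
    return {lvl: c.most_common(1)[0][0] for lvl, c in counts.items()}


# Toy data: 4 predictor levels, 2 target classes.
levels = ["a", "a", "a", "b", "b", "b", "c", "c", "c", "d", "d", "d"]
target = ["x", "x", "y", "x", "y", "x", "y", "y", "x", "y", "y", "y"]

table = majority_encoding(levels, target)
n_encoded = len(set(table.values()))
n_predictor = len(set(levels))
n_target = len(set(target))
```

Here the 4 predictor levels collapse into 2 encoded levels, which is bounded by both the number of predictor levels and the number of target classes. A shallower tree could only merge levels further, never create more.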

Since we are fitting a tree, it can be hyperparameter-tuned, as the size and shape of the tree will affect the encoding. You will be fitting a different tree for each of the categorical variables you are encoding, and they likely won’t have the same optimal tree size. Here you have to make a choice. Either meticulously tune each tree in the broader scope of the model, or use decent defaults. The latter choice is likely the best one.

Lastly, this method doesn’t work with unseen levels, as discussed in Chapter 17, since decision trees generally don’t have a way to handle them.
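Viewed as a lookup table, the problem is easy to see: a level that was not present when the tree was fit has no entry. The sketch below uses a hypothetical table and a hypothetical `encode` helper. The optional `fallback` argument is our own convention for illustration, not something the method itself provides.

```python
# Hypothetical lookup table, as a fitted leaf encoder might produce.
lookup = {"a": 11.5, "b": 11.5, "c": 31.0, "d": 51.0}


def encode(values, table, fallback=None):
    """Replace each level by its leaf value.

    With fallback=None an unseen level raises KeyError. Otherwise the
    fallback value is used in its place (an assumption of this sketch).
    """
    encoded = []
    for v in values:
        if v in table:
            encoded.append(table[v])
        elif fallback is None:
            raise KeyError(f"unseen level: {v!r}")
        else:
            encoded.append(fallback)
    return encoded


# Levels seen during fitting encode without trouble.
seen = encode(["a", "d"], lookup)

# The level "e" was never seen, so the plain lookup fails.
try:
    encode(["a", "e"], lookup)
    failed = False
except KeyError:
    failed = True

# One possible workaround: substitute a fixed value, chosen here arbitrarily.
patched = encode(["a", "e"], lookup, fallback=26.25)
```

Whether a fallback value is acceptable depends on the model. The point is only that the encoding itself gives no answer for a new level.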

https://feature-engine.trainindata.com/en/1.7.x/user_guide/encoding/index.html#decision-tree-encoding

26.2 Pros and Cons

26.2.1 Pros

  • Produces a single column.

26.2.2 Cons

  • Doesn’t handle unseen levels.
  • Can be unstable, due to using a decision tree.
  • It may be overly simplistic.

26.3 R Examples

Has not yet been implemented.

See https://github.com/EmilHvitfeldt/feature-engineering-az/issues/40 for progress.

26.4 Python Examples

We are using the ames data set for examples. {feature_engine} provides the DecisionTreeEncoder() that we can use.

from feazdata import ames
from sklearn.compose import ColumnTransformer
from feature_engine.encoding import DecisionTreeEncoder

ct = ColumnTransformer(
    [('treeEncoding', DecisionTreeEncoder(), ['MS_SubClass'])], 
    remainder="passthrough")

ct.fit(ames, y=ames[["Sale_Price"]].values.flatten())
ColumnTransformer(remainder='passthrough',
                  transformers=[('treeEncoding', DecisionTreeEncoder(),
                                 ['MS_SubClass'])])
ct.transform(ames).filter(regex="treeEncoding.*")
      treeEncoding__MS_SubClass
0                    187355.694
1                    187355.694
2                    187355.694
3                    187355.694
4                    239364.285
...                         ...
2925                 168009.364
2926                 187355.694
2927                 138618.386
2928                 187355.694
2929                 239364.285

[2930 rows x 1 columns]