30  James-Stein Encoding

The James-Stein encoding method is another variation of target encoding, which is covered in Chapter 23. This page explains how James-Stein encoding differs from target encoding, so reading that chapter first is encouraged.

The main difference between James-Stein and target encoding is the way it handles shrinkage. It uses the variance of each group and the overall variance to determine how much shrinkage to apply. If the variance of a group is large relative to the overall variance, we pull the estimate close to the overall mean. If the group variance is small relative to the overall variance, we don’t pull as much.

For the following equation

\[ JS_i = (1 - B_i) \cdot \text{mean}(y_i) + B_i \cdot \text{mean}(y) \]

we have that \(JS_i\) is the James-Stein estimate for the \(i\)’th group, \(\text{mean}(y_i)\) is the mean of the target within the \(i\)’th group, and \(\text{mean}(y)\) is the overall mean of the target.

We now need to find \(B_i\), which is the amount of shrinkage for each group.

\[ B_i = \dfrac{\text{var}(y_i)}{\text{var}(y_i) + \text{var}(y)} \]

Put into words, this is:

\[ B_i = \dfrac{\text{group variance}}{\text{group variance} + \text{overall variance}} \]

Since variances are non-negative, the value of \(B_i\) is bounded between 0 and 1. It is 0 when the group variance is 0, it is 0.5 when the group variance equals the overall variance, and it tends towards 1 as the group variance grows large relative to the overall variance.
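
To make the two formulas above concrete, here is a minimal sketch that applies them by hand using only the Python standard library. The function name `james_stein_encode`, the toy data, and the use of the population variance (`pvariance`) are assumptions made for illustration. This is not the {category_encoders} implementation, and its numbers should not be expected to match that library's output.

```python
from collections import defaultdict
from statistics import fmean, pvariance


def james_stein_encode(levels, target):
    """Map each level to its James-Stein estimate, following this chapter's formulas.

    Hypothetical helper for illustration; population variance is an assumption.
    """
    overall_mean = fmean(target)
    overall_var = pvariance(target)

    # Collect the target values that belong to each level.
    groups = defaultdict(list)
    for level, y in zip(levels, target):
        groups[level].append(y)

    encoding = {}
    for level, ys in groups.items():
        group_var = pvariance(ys)
        denom = group_var + overall_var
        # B_i = group variance / (group variance + overall variance).
        # If both variances are 0 the ratio is undefined; use no shrinkage then.
        b = group_var / denom if denom > 0 else 0.0
        # JS_i = (1 - B_i) * mean(y_i) + B_i * mean(y)
        encoding[level] = (1 - b) * fmean(ys) + b * overall_mean
    return encoding, overall_mean


# Toy data: level "a" has no spread, level "b" has a lot of spread.
levels = ["a", "a", "a", "b", "b", "b"]
target = [10.0, 10.0, 10.0, 0.0, 20.0, 40.0]
encoding, overall_mean = james_stein_encode(levels, target)
```

In this toy data, level "a" has zero variance, so it keeps its group mean of 10. Level "b" has a group mean of 20 but a large variance, so its estimate is pulled part of the way towards the overall mean of 15.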

All of the other considerations we have with target encoding apply to this method. There is no clear-cut reason why you should pick James-Stein over target encoding or vice versa. Trying both and seeing how they perform is recommended.

30.2 Pros and Cons

30.2.1 Pros

  • Can deal with categorical variables with many levels
  • Can deal with unseen levels in a sensible way
  • Runs fast with sensible shrinkage
  • Less prone to overfitting than target encoding without shrinkage
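
As a small illustration of the unseen-levels point, the sketch below looks up each new level in a fitted encoding and falls back to a default value when the level was never seen during fitting. Using the overall target mean as that fallback is an assumption made here. The helper name `encode_with_fallback` and all of the numbers are hypothetical.

```python
def encode_with_fallback(levels, encoding, fallback):
    """Replace each level with its encoded value, or `fallback` if it is unseen."""
    return [encoding.get(level, fallback) for level in levels]


# Hypothetical fitted encoding and overall target mean.
encoding = {"a": 10.0, "b": 16.9}
overall_mean = 15.0

# Level "c" did not appear in the training data.
new_levels = ["a", "c", "b"]
encoded = encode_with_fallback(new_levels, encoding, overall_mean)
```

Here the unseen level "c" receives the overall mean, which is the same value a fully shrunk group would get.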

30.2.2 Cons

  • Only defined for normally distributed targets. It is unclear how much this matters in practice

30.3 R Examples

Has not yet been implemented.

See https://github.com/EmilHvitfeldt/feature-engineering-az/issues/40 for progress.

30.4 Python Examples

We are using the ames data set for examples. {category_encoders} provides the JamesSteinEncoder() method we can use.

from feazdata import ames
from sklearn.compose import ColumnTransformer
from category_encoders.james_stein import JamesSteinEncoder

ct = ColumnTransformer(
    [('jamesstein', JamesSteinEncoder(), ['MS_Zoning'])], 
    remainder="passthrough")

ct.fit(ames, y=ames[["Sale_Price"]].values.flatten())
ColumnTransformer(remainder='passthrough',
                  transformers=[('jamesstein', JamesSteinEncoder(),
                                 ['MS_Zoning'])])
ct.transform(ames)
      jamesstein__MS_Zoning  ... remainder__Latitude
0                187726.407  ...              42.054
1                141453.434  ...              42.053
2                187726.407  ...              42.053
3                187726.407  ...              42.051
4                187726.407  ...              42.061
...                     ...  ...                 ...
2925             187726.407  ...              41.989
2926             187726.407  ...              41.988
2927             187726.407  ...              41.987
2928             187726.407  ...              41.991
2929             187726.407  ...              41.989

[2930 rows x 74 columns]