The James-Stein encoding method is another variation of target encoding, as seen in Chapter 23. This page explains how James-Stein encoding differs from target encoding, so reading that chapter first is encouraged.
The main difference between James-Stein and target encoding is the way it handles shrinkage. It uses the variance of each group and the overall variance to determine how much shrinkage to apply. If the variance of the group is larger than the overall variance, we pull the estimate close to the overall mean. If the group variance is smaller than the overall variance, we don't pull as much.
For the following equation
\[ JS_i = (1 - B_i) \cdot \text{mean}(y_i) + B_i \cdot \text{mean}(y) \]
we have that \(JS_i\) is the James-Stein estimate for the \(i\)-th group, \(\text{mean}(y_i)\) is the mean of the target within the \(i\)-th group, and \(\text{mean}(y)\) is the overall mean of the target.
We now need to find \(B_i\) which is our amount of shrinkage for each group.
\[ B_i = \dfrac{\text{var}(y_i)}{\text{var}(y_i) + \text{var}(y)} \]
Put into words, this is:
\[ B_i = \dfrac{\text{group variance}}{\text{group variance} + \text{overall variance}} \]
Since variances are non-negative, the value of \(B_i\) is bounded between 0 and 1. It is 0 when the group variance is 0, it is 0.5 when the group variance equals the overall variance, and it tends towards 1 as the group variance grows large relative to the overall variance.
All of the other considerations we have with target encoding apply to this method. There is no clear-cut reason why you should pick James-Stein over target encoding or vice versa. Trying both and seeing how they perform is recommended.
Not yet implemented
We are using the ames data set for examples. {category_encoders} provides the JamesSteinEncoder() method we can use.
from feazdata import ames
from sklearn.compose import ColumnTransformer
from category_encoders.james_stein import JamesSteinEncoder

# Encode MS_Zoning with James-Stein; leave all other columns untouched
ct = ColumnTransformer(
    [('jamesstein', JamesSteinEncoder(), ['MS_Zoning'])],
    remainder="passthrough")

# The encoder is supervised, so the target must be supplied when fitting
ct.fit(ames, y=ames[["Sale_Price"]].values.flatten())
ColumnTransformer(remainder='passthrough', transformers=[('jamesstein', JamesSteinEncoder(), ['MS_Zoning'])])
ct.transform(ames)
jamesstein__MS_Zoning ... remainder__Latitude
0 187726.407 ... 42.054
1 141453.434 ... 42.053
2 187726.407 ... 42.053
3 187726.407 ... 42.051
4 187726.407 ... 42.061
... ... ... ...
2925 187726.407 ... 41.989
2926 187726.407 ... 41.988
2927 187726.407 ... 41.987
2928 187726.407 ... 41.991
2929 187726.407 ... 41.989
[2930 rows x 74 columns]