The M-estimator encoding method is another variation of target encoding, as seen in Chapter 23. This page will explain how M-estimator encoding is different from target encoding, so it is encouraged to read that chapter first.
The idea behind M-estimator encoding is the same as for the other target encoding methods, but we use a different mean: the M-estimator, a statistical estimator that is less influenced by extreme values in the target.
We use the following formula to calculate the effect of each level.
\[ M_i = \dfrac{\text{count}(category_i) \cdot \text{mean}(category_i) + M \cdot \text{mean}(target)}{\text{count}(category_i) + M} \]
Note that it contains a hyperparameter \(M\). This value has to be tuned, and the tuning will invite data leakage if it is not done correctly.
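To make the formula concrete, here is a minimal sketch in plain Python that applies it to a toy data set. The function name `m_estimate_encode` and the toy values are made up for illustration; this is not the implementation of any particular library. It uses the fact that count times mean for a level is simply the sum of the target within that level.

```python
from collections import defaultdict


def m_estimate_encode(categories, target, m=1.0):
    """Return a dict mapping each level to its M_i value.

    Hypothetical helper for illustration only; it follows the formula
    (count_i * mean_i + m * mean(target)) / (count_i + m).
    """
    if len(categories) != len(target):
        raise ValueError("categories and target must have the same length")

    overall_mean = sum(target) / len(target)

    # count_i * mean_i equals the sum of the target within level i,
    # so we only need per-level sums and counts.
    sums = defaultdict(float)
    counts = defaultdict(int)
    for level, y in zip(categories, target):
        sums[level] += y
        counts[level] += 1

    return {
        level: (sums[level] + m * overall_mean) / (counts[level] + m)
        for level in counts
    }


# Toy data: level "a" has four observations, level "b" has only one.
levels = ["a", "a", "a", "a", "b"]
y = [10, 12, 14, 16, 40]

for m in (0.0, 1.0, 10.0):
    encoded = m_estimate_encode(levels, y, m=m)
    print(m, {level: round(value, 2) for level, value in encoded.items()})
```

With \(M = 0\) each level gets its own mean (13 and 40 here). With \(M = 1\) the values become roughly 14.08 and 29.2: the level with a single observation moves much further toward the overall mean of 18.4 than the level with four observations.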
The method by itself doesn't perform shrinkage, so you run into the issues associated with a lack of shrinkage.
Not yet implemented
We are using the ames data set for examples. {category_encoders} provides the MEstimateEncoder() method we can use.
from feazdata import ames
from sklearn.compose import ColumnTransformer
from category_encoders.m_estimate import MEstimateEncoder

ct = ColumnTransformer(
    [('mestimate', MEstimateEncoder(), ['MS_Zoning'])],
    remainder="passthrough")

ct.fit(ames, y=ames[["Sale_Price"]].values.flatten())
ColumnTransformer(remainder='passthrough', transformers=[('mestimate', MEstimateEncoder(), ['MS_Zoning'])])
ct.transform(ames)
mestimate__MS_Zoning ... remainder__Latitude
0 191278.640 ... 42.054
1 138004.645 ... 42.053
2 191278.640 ... 42.053
3 191278.640 ... 42.051
4 191278.640 ... 42.061
... ... ... ...
2925 191278.640 ... 41.989
2926 191278.640 ... 41.988
2927 191278.640 ... 41.987
2928 191278.640 ... 41.991
2929 191278.640 ... 41.989
[2930 rows x 74 columns]