Quantile encoding (Mougan et al. 2021) is a reimagined version of target encoding (Chapter 23) and M-estimator encoding (Chapter 31). It uses quantiles instead of means as the aggregation, together with the M regularization from M-estimator encoding.
Whereas target encoding uses the mean as its aggregation function, quantile encoding can use any quantile. Most of what we know about target encoding also holds for quantile encoding. The differences come from how quantiles differ from means. Quantiles are generally more robust to outliers than the mean, as long as the chosen quantile is not close to either extreme (0 or 1). Quantile encoding inherits this robustness.
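To make the robustness claim concrete, here is a minimal sketch using only the Python standard library. The prices are made-up values for a single hypothetical category, not taken from any data set in this book. It compares how much the mean, the median (the 0.5 quantile), and the maximum (the 1.0 quantile) move when one outlier is added.

```python
from statistics import mean, median

# Hypothetical target values for one category (illustrative only).
prices = [100, 110, 120, 130, 140]
prices_with_outlier = prices + [1000]  # add a single extreme value

# How far each summary moves once the outlier is included.
mean_shift = mean(prices_with_outlier) - mean(prices)
median_shift = median(prices_with_outlier) - median(prices)
max_shift = max(prices_with_outlier) - max(prices)  # the 1.0 quantile

print("mean shift:  ", mean_shift)
print("median shift:", median_shift)
print("max shift:   ", max_shift)
```

The median barely moves, while the mean and especially the maximum are pulled towards the outlier. This is why quantiles near the ends do not share the robustness of quantiles near the middle.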
It is suggested to pair quantile encoding with M-estimator-style regularization to deal with categories that contain few observations.
The following formula is used to calculate the quantile encodings.
\[ QE_i = \dfrac{q(category_i) \cdot n_i + q(whole) \cdot M}{n_i + M} \]
\(QE_i\) is the encoding value for the \(i\)-th category. \(q(category_i)\) is the quantile of the values within the \(i\)-th category, and \(q(whole)\) is the quantile of the whole data set. \(n_i\) is the number of observations in the \(i\)-th category, and \(M\) is the hyperparameter that controls the regularization.
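The formula can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the implementation used by any library. The helper names `quantile` and `quantile_encode` are hypothetical, the toy data is made up, and the quantile is computed with linear interpolation between order statistics, which is one of several common definitions.

```python
def quantile(values, q):
    """Quantile with linear interpolation between order statistics (an assumed definition)."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


def quantile_encode(categories, targets, q=0.5, m=1.0):
    """Return {category: QE_i} using QE_i = (q(category_i) * n_i + q(whole) * M) / (n_i + M)."""
    q_whole = quantile(targets, q)
    groups = {}
    for cat, y in zip(categories, targets):
        groups.setdefault(cat, []).append(y)
    return {
        cat: (quantile(ys, q) * len(ys) + q_whole * m) / (len(ys) + m)
        for cat, ys in groups.items()
    }


# Hypothetical toy data: category "b" has a single observation.
categories = ["a", "a", "a", "a", "b"]
targets = [10.0, 20.0, 30.0, 40.0, 100.0]

encoding = quantile_encode(categories, targets, q=0.5, m=1.0)
unregularized = quantile_encode(categories, targets, q=0.5, m=0.0)
print(encoding)
print(unregularized)
```

With `m=0.0` each category keeps its own median. With `m=1.0` the single-observation category "b" is pulled much further towards the global median than the four-observation category "a", which is the intended effect of the regularization on small groups.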
In essence, we have two hyperparameters for this style of encoding. One is \(M\), which we very much have to tune, and the other is the choice of quantile. We could fix the quantile at a specific value, such as 0.5 for the median, but tuning it is likely to give better results. This is again a trade-off between computational time and performance.
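As a rough illustration of tuning both hyperparameters, the sketch below runs a small grid search in plain Python. Everything in it is an assumption made for illustration: the helper names, the toy data, the candidate grid, and the simplification of using the encoded value directly as the prediction and scoring it with mean absolute error on a validation set. In practice the encoder would be tuned together with the downstream model, typically with resampling.

```python
from itertools import product


def quantile(values, q):
    """Quantile with linear interpolation between order statistics (an assumed definition)."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (pos - lower) * (ordered[upper] - ordered[lower])


def fit_encoding(rows, q, m):
    """Return the per-category regularized quantiles and the global quantile."""
    q_whole = quantile([y for _, y in rows], q)
    groups = {}
    for cat, y in rows:
        groups.setdefault(cat, []).append(y)
    table = {
        cat: (quantile(ys, q) * len(ys) + q_whole * m) / (len(ys) + m)
        for cat, ys in groups.items()
    }
    return table, q_whole


def validation_mae(train, valid, q, m):
    """Score a (q, m) pair, using the encoded value itself as the prediction."""
    table, q_whole = fit_encoding(train, q, m)
    # Categories unseen in training fall back to the global quantile.
    errors = [abs(y - table.get(cat, q_whole)) for cat, y in valid]
    return sum(errors) / len(errors)


# Hypothetical (category, target) rows, already split into two sets.
train = [("a", 10.0), ("a", 12.0), ("a", 11.0), ("a", 90.0),
         ("b", 50.0), ("b", 55.0), ("c", 5.0)]
valid = [("a", 11.0), ("a", 13.0), ("b", 52.0), ("c", 30.0)]

# Candidate (quantile, M) pairs; every pair requires refitting the encoding.
grid = list(product([0.25, 0.5, 0.75], [0.0, 1.0, 10.0]))
scores = {(q, m): validation_mae(train, valid, q, m) for q, m in grid}
best_q, best_m = min(scores, key=scores.get)
print(best_q, best_m, scores[(best_q, best_m)])
```

Even this small grid needs nine fits. Adding more candidate quantiles or values of \(M\) multiplies that cost, which is the trade-off between computational time and performance described above.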
Not yet implemented
We are using the ames data set for examples. {category_encoders} provides the QuantileEncoder() method we can use.
from feazdata import ames
from sklearn.compose import ColumnTransformer
from category_encoders.quantile_encoder import QuantileEncoder

# Encode MS_Zoning with quantile encoding and pass all other columns through
ct = ColumnTransformer(
    [('quantile', QuantileEncoder(), ['MS_Zoning'])],
    remainder="passthrough")

# The encoder is supervised, so the outcome is passed as y
ct.fit(ames, y=ames[["Sale_Price"]].values.flatten())
ColumnTransformer(remainder='passthrough', transformers=[('quantile', QuantileEncoder(), ['MS_Zoning'])])
ct.transform(ames)
      quantile__MS_Zoning  ...  remainder__Latitude
0              171994.723  ...               42.054
1              140714.286  ...               42.053
2              171994.723  ...               42.053
3              171994.723  ...               42.051
4              171994.723  ...               42.061
...                   ...  ...                  ...
2925           171994.723  ...               41.989
2926           171994.723  ...               41.988
2927           171994.723  ...               41.987
2928           171994.723  ...               41.991
2929           171994.723  ...               41.989

[2930 rows x 74 columns]