1 + 1
[1] 2
You can repeat quantile encoding Chapter 33, using using different quantiles for more information extraction, e.i. with 0.25, 0.5, and 0.75 quantile. This is called summary encoding.
One of the downsides of quantile encoding is that you need to pick or tune to find a good quantile. Summary encoding curcomvents this issue by calculating a lot of quantiles at the same time.
Not yet implemented
1 + 1
[1] 2
We are using the ames
data set for examples. {category_encoders} provided the SummaryEncoder()
method we can use.
from feazdata import ames
from sklearn.compose import ColumnTransformer
from category_encoders.quantile_encoder import SummaryEncoder
= ColumnTransformer(
ct 'summary', SummaryEncoder(), ['MS_Zoning'])],
[(="passthrough")
remainder
=ames[["Sale_Price"]].values.flatten()) ct.fit(ames, y
ColumnTransformer(remainder='passthrough', transformers=[('summary', SummaryEncoder(), ['MS_Zoning'])])In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
ColumnTransformer(remainder='passthrough', transformers=[('summary', SummaryEncoder(), ['MS_Zoning'])])
['MS_Zoning']
SummaryEncoder()
['MS_SubClass', 'Lot_Frontage', 'Lot_Area', 'Street', 'Alley', 'Lot_Shape', 'Land_Contour', 'Utilities', 'Lot_Config', 'Land_Slope', 'Neighborhood', 'Condition_1', 'Condition_2', 'Bldg_Type', 'House_Style', 'Overall_Cond', 'Year_Built', 'Year_Remod_Add', 'Roof_Style', 'Roof_Matl', 'Exterior_1st', 'Exterior_2nd', 'Mas_Vnr_Type', 'Mas_Vnr_Area', 'Exter_Cond', 'Foundation', 'Bsmt_Cond', 'Bsmt_Exposure', 'BsmtFin_Type_1', 'BsmtFin_SF_1', 'BsmtFin_Type_2', 'BsmtFin_SF_2', 'Bsmt_Unf_SF', 'Total_Bsmt_SF', 'Heating', 'Heating_QC', 'Central_Air', 'Electrical', 'First_Flr_SF', 'Second_Flr_SF', 'Gr_Liv_Area', 'Bsmt_Full_Bath', 'Bsmt_Half_Bath', 'Full_Bath', 'Half_Bath', 'Bedroom_AbvGr', 'Kitchen_AbvGr', 'TotRms_AbvGrd', 'Functional', 'Fireplaces', 'Garage_Type', 'Garage_Finish', 'Garage_Cars', 'Garage_Area', 'Garage_Cond', 'Paved_Drive', 'Wood_Deck_SF', 'Open_Porch_SF', 'Enclosed_Porch', 'Three_season_porch', 'Screen_Porch', 'Pool_Area', 'Pool_QC', 'Fence', 'Misc_Feature', 'Misc_Val', 'Mo_Sold', 'Year_Sold', 'Sale_Type', 'Sale_Condition', 'Sale_Price', 'Longitude', 'Latitude']
passthrough
ct.transform(ames)
summary__MS_Zoning_25 ... remainder__Latitude
0 137496.482 ... 42.054
1 111612.500 ... 42.053
2 137496.482 ... 42.053
3 137496.482 ... 42.051
4 137496.482 ... 42.061
... ... ... ...
2925 137496.482 ... 41.989
2926 137496.482 ... 41.988
2927 137496.482 ... 41.987
2928 137496.482 ... 41.991
2929 137496.482 ... 41.989
[2930 rows x 75 columns]