"happy", "neutral", "neutral", "sad", "happy"
32 Thermometer Encoding
32.1 Thermometer Encoding
Thermometer encoding (also called Rank Hot Encoding) is a variation of Dummy Encoding. It is intended only for ordinal data.
Where one-hot encoding produces 1 for the current level and 0 for all other levels, thermometer encoding produces 1 for the current level and lesser levels and 0 for other levels.
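The difference between the two encodings can be sketched in a few lines of plain Python. This is an illustration only, assuming the levels have already been mapped to integer ranks starting at 0; the helper names `one_hot` and `thermometer` are made up for this example and do not come from any library.

```python
def one_hot(rank, n_levels):
    """1 in the position of the current level, 0 everywhere else."""
    return [int(i == rank) for i in range(n_levels)]


def thermometer(rank, n_levels):
    """1 in the position of the current level and all lower levels."""
    return [int(i <= rank) for i in range(n_levels)]


# Hypothetical variable with 3 ordered levels, ranked 0 (lowest) to 2 (highest).
for rank in range(3):
    print(rank, one_hot(rank, 3), thermometer(rank, 3))
```

The only change between the two helpers is the comparison: `==` picks out a single level, while `<=` fills in every level up to and including the current one.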
Consider a short ordinal variable of emotions with the observations "happy", "neutral", "neutral", "sad", "happy". It has 3 unique values, "sad" < "neutral" < "happy", and these values clearly have an order as listed.
There should be 3 columns, one for each of the levels.
sad | neutral | happy |
---|---|---|
1 | 1 | 1 |
1 | 1 | 0 |
1 | 1 | 0 |
1 | 0 | 0 |
1 | 1 | 1 |
Notice how the "happy" rows have 1s all the way across, while the "sad" row has only a single 1. You can think of this encoding as cumulative: each column answers the question "is this emotion at least X?".
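The table for the emotions example can be reproduced with a small sketch in plain Python. The function name `thermometer_encode` and the `levels` list are assumptions made for illustration; they are not part of any library.

```python
# Ordered levels, lowest first, and the five example observations.
levels = ["sad", "neutral", "happy"]
emotions = ["happy", "neutral", "neutral", "sad", "happy"]


def thermometer_encode(values, levels):
    """Return one row per value with a 0/1 entry per level.

    An entry is 1 when the value is at least as high as that level.
    """
    rank = {level: i for i, level in enumerate(levels)}
    return [[int(rank[v] >= i) for i in range(len(levels))] for v in values]


encoded = thermometer_encode(emotions, levels)

# Print the columns followed by one encoded row per observation.
print(levels)
for row in encoded:
    print(row)
```

Because the encoding is cumulative, the number of 1s in a row, minus one, gives back the position of the original level in `levels`.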
Although this method is often called rank hot encoding, you should use the dummy variant and drop the first column, since that column will by definition be constant.
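A minimal sketch of the dummy variant in plain Python follows. It assumes the ordered levels are supplied by hand, and the helper name `thermometer_dummy` is hypothetical. The column for the lowest level is skipped because "at least sad" is true for every observation.

```python
# Ordered levels, lowest first, and the five example observations.
levels = ["sad", "neutral", "happy"]
emotions = ["happy", "neutral", "neutral", "sad", "happy"]


def thermometer_dummy(values, levels):
    """Thermometer encoding without the constant column for the lowest level."""
    rank = {level: i for i, level in enumerate(levels)}
    columns = levels[1:]  # the lowest level would be 1 for every row
    rows = [[int(rank[v] >= i) for i in range(1, len(levels))] for v in values]
    return columns, rows


columns, rows = thermometer_dummy(emotions, levels)
print(columns)
for row in rows:
    print(row)
```

With 3 levels this leaves 2 columns, and the lowest level is represented by a row of all 0s.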
32.2 Pros and Cons
32.2.1 Pros
- explainable results
- fast calculations
32.2.2 Cons
- should only be used for ordinal data
32.3 R Examples
Has not yet been implemented.
See https://github.com/EmilHvitfeldt/feature-engineering-az/issues/40 for progress.
32.4 Python Examples
We are using the ames data set for examples. {category_encoders} provides the RankHotEncoder() method we can use.
from feazdata import ames
from sklearn.compose import ColumnTransformer
from category_encoders.rankhot import RankHotEncoder

ct = ColumnTransformer(
    [('rankhot', RankHotEncoder(), ['MS_Zoning'])],
    remainder="passthrough")

ct.fit(ames)
ColumnTransformer(remainder='passthrough', transformers=[('rankhot', RankHotEncoder(), ['MS_Zoning'])])
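# .filter() needs a DataFrame; this assumes pandas output is enabled,
# e.g. via ct.set_output(transform="pandas") before calling transform.
ct.transform(ames).filter(regex="rankhot.*")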
rankhot__MS_Zoning_1 ... rankhot__MS_Zoning_7
0 1 ... 0
1 1 ... 0
2 1 ... 0
3 1 ... 0
4 1 ... 0
... ... ... ...
2925 1 ... 0
2926 1 ... 0
2927 1 ... 0
2928 1 ... 0
2929 1 ... 0
[2930 rows x 7 columns]