32 Thermometer Encoding

32.1 Thermometer Encoding

Thermometer encoding (also called Rank Hot Encoding) is a variation of Dummy Encoding. It is intended only for ordinal data.

Where one-hot encoding produces 1 for the current level and 0 for all other levels, thermometer encoding produces 1 for the current level and all lesser levels, and 0 for the higher levels.

Consider this short ordinal variable of emotions. It has 3 unique values, ordered “sad” < “neutral” < “happy”.

"happy", "neutral", "neutral", "sad", "happy"

There should be 3 columns, one for each of the levels.

sad	neutral	happy
1	1	1
1	1	0
1	1	0
1	0	0
1	1	1

Notice how the happy instances have 1s in every column, while the sad instance has a 1 only in its own column. You can think of this encoding as cumulative: each column answers the question “is this emotion at least X?”.
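The table above can be reproduced with a few lines of plain Python. This is a minimal sketch rather than a library implementation; the helper names `one_hot` and `thermometer` are made up for illustration, and the level order is taken from the example.

```python
# Ordered levels, lowest first (taken from the emotions example).
LEVELS = ["sad", "neutral", "happy"]


def one_hot(value, levels=LEVELS):
    """Return 1 only in the column that matches the value."""
    return [int(level == value) for level in levels]


def thermometer(value, levels=LEVELS):
    """Return 1 in the column that matches the value and in every lower column."""
    rank = levels.index(value)
    return [int(i <= rank) for i in range(len(levels))]


emotions = ["happy", "neutral", "neutral", "sad", "happy"]
encoded = [thermometer(e) for e in emotions]

# Print one row per observation, columns in the order sad, neutral, happy.
for emotion, row in zip(emotions, encoded):
    print(f"{emotion:<8}", row)
```

Comparing `one_hot("neutral")` with `thermometer("neutral")` shows the difference between the two encodings: the first marks only the neutral column, the second also marks the sad column below it.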

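Although this method is often called rank hot encoding, which suggests one column per level, you should use the dummy variant that drops the first column. Every observation is at least the lowest level, so that column is by definition constant and carries no information.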
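As a sketch of the dummy variant, the column for the lowest level can simply be skipped. The function name `thermometer_dummy` is hypothetical, and the data is the emotions example from above.

```python
# Ordered levels, lowest first (taken from the emotions example).
LEVELS = ["sad", "neutral", "happy"]


def thermometer_dummy(value, levels=LEVELS):
    """Thermometer encoding without the column for the lowest level."""
    rank = levels.index(value)
    # Start at index 1: the column for the lowest level would be 1 in every row.
    return [int(i <= rank) for i in range(1, len(levels))]


emotions = ["happy", "neutral", "neutral", "sad", "happy"]
dummy_rows = [thermometer_dummy(e) for e in emotions]

# Columns are now neutral, happy; "sad" is represented by all zeros.
for emotion, row in zip(emotions, dummy_rows):
    print(f"{emotion:<8}", row)
```

With 3 levels this leaves 2 columns, and the lowest level acts as the reference level, just as in ordinary dummy encoding.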

32.2 Pros and Cons

32.2.1 Pros

explainable results
fast calculations

32.2.2 Cons

should only be used for ordinal data
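The restriction to ordinal data can be illustrated with a small sketch. The colour values below are hypothetical and only stand in for a variable without a natural order; the `thermometer` helper is the same illustrative function as before, not a library call.

```python
def thermometer(value, levels):
    """Return 1 in the column that matches the value and in every lower column."""
    rank = levels.index(value)
    return [int(i <= rank) for i in range(len(levels))]


# A nominal variable has no natural order, so any ordering is arbitrary.
order_a = ["red", "green", "blue"]
order_b = ["blue", "red", "green"]

# The same value gets a different number of 1s depending on the chosen order.
green_a = thermometer("green", order_a)
green_b = thermometer("green", order_b)

print(green_a)
print(green_b)
```

Because the encoding says "at least this level", the features are only meaningful when the order of the levels is itself meaningful.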

32.3 R Examples

Has not yet been implemented.

See https://github.com/EmilHvitfeldt/feature-engineering-az/issues/40 for progress.

32.4 Python Examples

We are using the ames data set for examples. {category_encoders} provides the RankHotEncoder() class that we can use.

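from feazdata import ames
from sklearn import set_config
from sklearn.compose import ColumnTransformer
from category_encoders.rankhot import RankHotEncoder

# Return pandas DataFrames from transform(), so that .filter() works below.
set_config(transform_output="pandas")

# Apply rank hot encoding to MS_Zoning and pass all other columns through.
ct = ColumnTransformer(
    [('rankhot', RankHotEncoder(), ['MS_Zoning'])],
    remainder="passthrough")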

ct.fit(ames)

ColumnTransformer(remainder='passthrough',
                  transformers=[('rankhot', RankHotEncoder(), ['MS_Zoning'])])

ct.transform(ames).filter(regex="rankhot.*")

      rankhot__MS_Zoning_1  ...  rankhot__MS_Zoning_7
0                        1  ...                     0
1                        1  ...                     0
2                        1  ...                     0
3                        1  ...                     0
4                        1  ...                     0
...                    ...  ...                   ...
2925                     1  ...                     0
2926                     1  ...                     0
2927                     1  ...                     0
2928                     1  ...                     0
2929                     1  ...                     0

[2930 rows x 7 columns]