32  Thermometer Encoding

Thermometer encoding (also called Rank Hot Encoding) is a variation of dummy encoding as seen in Chapter 18. It is intended only for ordinal data.

Where one-hot encoding produces a 1 for the current level and 0 for all other levels, thermometer encoding produces a 1 for the current level and every lower level, and a 0 for all higher levels.
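To make the contrast concrete, here is a minimal Python sketch (plain lists, no libraries) that encodes a single observation given its rank, meaning its zero-based position in the ordering. The function names `one_hot()` and `thermometer()` are illustrative and do not come from any library.

```python
def one_hot(rank: int, n_levels: int) -> list[int]:
    """Return a 1 only at the position of the observed level."""
    return [1 if i == rank else 0 for i in range(n_levels)]


def thermometer(rank: int, n_levels: int) -> list[int]:
    """Return a 1 at the observed level and at every lower level."""
    return [1 if i <= rank else 0 for i in range(n_levels)]


# With three ordered levels (e.g. sad < neutral < happy), the middle level has rank 1.
print(one_hot(1, 3))      # [0, 1, 0]
print(thermometer(1, 3))  # [1, 1, 0]
```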

Consider this short ordinal variable of emotions. It has 3 unique values, “sad” < “neutral” < “happy”, and these values have a clear order as listed.

[1] "happy"   "neutral" "neutral" "sad"     "happy"  

The encoding produces 3 columns, one for each level.

     sad neutral happy
[1,]   1       1     1
[2,]   1       1     0
[3,]   1       1     0
[4,]   1       0     0
[5,]   1       1     1
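The table above can be reproduced with a small NumPy sketch. It assumes the ordering is supplied by hand as the list `levels` (lowest first); column j is 1 whenever an observation's rank is at least j.

```python
import numpy as np

levels = ["sad", "neutral", "happy"]  # ordered from lowest to highest
emotions = ["happy", "neutral", "neutral", "sad", "happy"]

# Map each value to its position in the ordering: sad=0, neutral=1, happy=2.
ranks = np.array([levels.index(e) for e in emotions])

# Broadcast ranks against column indices: column j is 1 when rank >= j.
encoded = (ranks[:, None] >= np.arange(len(levels))).astype(int)
print(encoded)
# [[1 1 1]
#  [1 1 0]
#  [1 1 0]
#  [1 0 0]
#  [1 1 1]]
```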

Notice how the “happy” rows have 1s in every column, while the “sad” row has a 1 only in the first column. You can think of this encoding as cumulative: each column answers the question “is this emotion at least X?”.
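The cumulative reading can be checked directly: summing ordinary one-hot columns from the highest level downwards gives the thermometer matrix. This NumPy sketch is for illustration only, and the variable names are our own.

```python
import numpy as np

levels = ["sad", "neutral", "happy"]  # ordered from lowest to highest
emotions = ["happy", "neutral", "neutral", "sad", "happy"]
ranks = np.array([levels.index(e) for e in emotions])

# Ordinary one-hot encoding: a single 1 at the observed level.
one_hot = np.eye(len(levels), dtype=int)[ranks]

# Reverse the columns, take a running sum, and reverse back. This turns
# "is this emotion exactly X?" into "is this emotion at least X?".
thermometer = np.cumsum(one_hot[:, ::-1], axis=1)[:, ::-1]
print(thermometer)
```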

Whichever name is used, you should use the dummy variant and drop the first column, since by definition that column is constant: every observation is at least the lowest level, so it is always 1.
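A short NumPy sketch of the dummy variant, assuming the ordering “sad” < “neutral” < “happy” is supplied by hand. The first column is constant, so dropping it loses nothing: each observation's rank can still be recovered as the row sum of the remaining columns.

```python
import numpy as np

levels = ["sad", "neutral", "happy"]  # ordered from lowest to highest
emotions = ["happy", "neutral", "neutral", "sad", "happy"]
ranks = np.array([levels.index(e) for e in emotions])
full = (ranks[:, None] >= np.arange(len(levels))).astype(int)

# Every observation is at least the lowest level, so column 0 is all 1s.
assert (full[:, 0] == 1).all()

# Dummy variant: drop the constant first column.
dummy = full[:, 1:]
print(dummy)

# The row sum of the remaining columns equals the original rank.
print(dummy.sum(axis=1))  # [2 1 1 0 2]
```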

32.2 Pros and Cons

32.2.1 Pros

  • explainable results
  • fast calculations

32.2.2 Cons

  • should only be used for ordinal data

32.3 R Examples

Not implemented yet


32.4 Python Examples

We are using the ames data set for examples. {category_encoders} provides the RankHotEncoder() class, which we can use.

from feazdata import ames
from sklearn.compose import ColumnTransformer
from category_encoders.rankhot import RankHotEncoder

ct = ColumnTransformer(
    [('rankhot', RankHotEncoder(), ['MS_Zoning'])], 
    remainder="passthrough")

ct.fit(ames)
ColumnTransformer(remainder='passthrough',
                  transformers=[('rankhot', RankHotEncoder(), ['MS_Zoning'])])
ct.transform(ames).filter(regex="rankhot.*")
      rankhot__MS_Zoning_1  ...  rankhot__MS_Zoning_7
0                        1  ...                     0
1                        1  ...                     0
2                        1  ...                     0
3                        1  ...                     0
4                        1  ...                     0
...                    ...  ...                   ...
2925                     1  ...                     0
2926                     1  ...                     0
2927                     1  ...                     0
2928                     1  ...                     0
2929                     1  ...                     0

[2930 rows x 7 columns]
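Because thermometer encoding is only meaningful when the order of the levels is known, it can help to state that order explicitly. The following is a minimal pandas sketch using an ordered `pd.Categorical`; it is not part of {category_encoders}, the helper `thermometer_encode()` and its `levels`/`drop_first` arguments are hypothetical, and it uses the emotion example rather than ames.

```python
import pandas as pd


def thermometer_encode(s: pd.Series, levels: list, drop_first: bool = True) -> pd.DataFrame:
    """Hypothetical helper: thermometer-encode an ordinal Series.

    `levels` must list the categories from lowest to highest.
    """
    # Codes are 0..len(levels)-1 in the given order; -1 marks unknown or missing values.
    codes = pd.Categorical(s, categories=levels, ordered=True).codes
    if (codes < 0).any():
        raise ValueError("found values not listed in `levels`")
    # Optionally skip the constant first column (the dummy variant).
    start = 1 if drop_first else 0
    columns = {
        f"{s.name}_{lvl}": (codes >= i).astype(int)
        for i, lvl in enumerate(levels)
        if i >= start
    }
    return pd.DataFrame(columns, index=s.index)


emotions = pd.Series(["happy", "neutral", "neutral", "sad", "happy"], name="emotion")
print(thermometer_encode(emotions, ["sad", "neutral", "happy"]))
```

Raising on unknown values is a design choice for the sketch; a production encoder would need an explicit policy for unseen or missing levels.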