29  Weight of Evidence Encoding

The Weight of Evidence (WOE) encoding method specifically works with a binary target variable and a categorical predictor. It has a background in the financial sector. Its main drawback is its reliance on a binary outcome, although such outcomes are common in that sector.

It works by calculating the logarithm of the odds ratio as a quantification of the relationship between the categorical predictor and the binary target. The method designates one of the target levels as the good outcome and the other as the bad outcome, or target and no target. This choice doesn’t matter beyond notation; swapping the levels results in a sign change of the resulting numeric predictor. It uses the following formula:

\[ \text{WOE}_c = \log\left( \frac{P(X = c | Y = 1)}{P(X = c | Y = 0)} \right) \]

Where \(c\) represents a given level of the categorical predictor. Each probability is read as the chance that an observation takes the level \(c\), given that the target takes one value rather than the other.
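To make the formula concrete, here is a minimal sketch in R that computes WOE for each level of a made-up predictor directly from these conditional probabilities. The data, and the choice of 1 as the good outcome, are ours for illustration.

library(dplyr)

# Made-up data: a categorical predictor and a binary target
toy <- tibble::tibble(
  color  = c("red", "red", "red", "blue", "blue", "green", "green", "green"),
  target = c(1,     1,     0,     0,      0,      1,       0,       1)
)

toy |>
  count(color, target) |>
  group_by(target) |>
  mutate(p = n / sum(n)) |>            # P(X = c | Y = target)
  ungroup() |>
  select(-n) |>
  tidyr::pivot_wider(names_from = target, values_from = p,
                     names_prefix = "p_", values_fill = 0) |>
  mutate(woe = log(p_1 / p_0))         # log odds ratio per level

Both red and green come out at log(2) ≈ 0.69, while blue, which never co-occurs with the good outcome, comes out at -Inf. That is exactly the problem addressed next.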

We can run into problematic values with this formula if there aren’t enough counts. If \(P(X = c | Y = 1) = 0\) we take the logarithm of zero, and if \(P(X = c | Y = 0) = 0\) we get a division by zero problem. Both of these issues are typically handled with Laplace smoothing: a small value is added to the numerator and denominator to avoid them.
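Continuing the sketch above, adding a small Laplace value to both probabilities keeps every WOE value finite:

laplace <- 1e-6  # small smoothing value; its exact size is a choice

toy |>
  count(color, target) |>
  group_by(target) |>
  mutate(p = n / sum(n)) |>
  ungroup() |>
  select(-n) |>
  tidyr::pivot_wider(names_from = target, values_from = p,
                     names_prefix = "p_", values_fill = 0) |>
  mutate(woe = log((p_1 + laplace) / (p_0 + laplace)))

The blue level now gets a large negative but finite value of about log(2e-6) ≈ -13.1 instead of -Inf.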

The resulting single numeric predictor takes finite values. A value of 0 means that, according to the training data set, the level doesn’t carry information one way or the other. Positive values indicate a stronger relationship between the level and the “good” outcome, and negative values indicate a stronger relationship with the “bad” outcome. Missing values and unseen levels typically default to WOE = 0, since we have no information about them. One could extract information from missing values by treating them as another level.
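A small sketch of how this default could be implemented: the fitted encoding is just a lookup table, and anything not found in it falls back to 0. The helper apply_woe() and the values in woe_table are made up for illustration.

# Hypothetical helper: map levels to their trained WOE values,
# falling back to 0 for unseen levels and missing values
apply_woe <- function(x, woe_table) {
  out <- woe_table[as.character(x)]   # named-vector lookup
  out[is.na(out)] <- 0                # no information -> WOE = 0
  unname(out)
}

woe_table <- c(red = 0.693, green = 0.693, blue = -13.1)
apply_woe(c("red", "blue", "purple", NA), woe_table)
# 0.693 -13.100 0.000 0.000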

The weight of evidence encoding has been reported to be effective on imbalanced data sets, as it captures information about the minority class well.

Note

It is often stated that WOE can be used on numeric predictors by first discretizing the predictor and then applying this encoding. This is trivially true for all categorical methods, but it is not recommended, as discussed in Chapter 11.

29.2 Pros and Cons

29.2.1 Pros

- Produces a single numeric predictor, no matter how many levels the categorical predictor has.
- Has been reported to work well on imbalanced data sets.

29.2.2 Cons

- Only applies when the target is binary.
- Levels with zero counts for one of the outcomes require Laplace smoothing to avoid infinite values.

29.3 R Examples
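We will be using the ames data set for these examples. The embed package extends recipes with step_woe(), which performs this encoding.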

library(recipes)
library(embed)

data(ames, package = "modeldata")

# Train a recipe that WOE-encodes Neighborhood against the binary outcome Street
rec_target <- recipe(Street ~ Neighborhood, data = ames) |>
  step_woe(Neighborhood, outcome = vars(Street)) |>
  prep()

rec_target |>
  bake(new_data = NULL)
# A tibble: 2,930 × 2
   Street woe_Neighborhood
   <fct>             <dbl>
 1 Pave            -14.4  
 2 Pave            -14.4  
 3 Pave            -14.4  
 4 Pave            -14.4  
 5 Pave              0.394
 6 Pave              0.394
 7 Pave            -12.3  
 8 Pave            -12.3  
 9 Pave            -12.3  
10 Pave              0.394
# ℹ 2,920 more rows

And we see that it works as intended. We can pull out the exact values for each level using the tidy() method.

rec_target |>
  tidy(1)
# A tibble: 28 × 10
   terms        value   n_tot n_Grvl n_Pave p_Grvl  p_Pave     woe outcome id   
   <chr>        <chr>   <int>  <dbl>  <dbl>  <dbl>   <dbl>   <dbl> <chr>   <chr>
 1 Neighborhood Bloomi…    28      0     28 0      9.60e-3 -11.7   Street  woe_…
 2 Neighborhood Blueste    10      0     10 0      3.43e-3 -10.6   Street  woe_…
 3 Neighborhood Briard…    30      0     30 0      1.03e-2 -11.7   Street  woe_…
 4 Neighborhood Brooks…   108      0    108 0      3.70e-2 -13.0   Street  woe_…
 5 Neighborhood Clear_…    44      0     44 0      1.51e-2 -12.1   Street  woe_…
 6 Neighborhood Colleg…   267      0    267 0      9.15e-2 -13.9   Street  woe_…
 7 Neighborhood Crawfo…   103      0    103 0      3.53e-2 -13.0   Street  woe_…
 8 Neighborhood Edwards   194      1    193 0.0833 6.61e-2   0.231 Street  woe_…
 9 Neighborhood Gilbert   165      1    164 0.0833 5.62e-2   0.394 Street  woe_…
10 Neighborhood Green_…     2      0      2 0      6.85e-4  -9.01  Street  woe_…
# ℹ 18 more rows
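As a sanity check, we can recompute the WOE value for the Gilbert level from the probabilities in this table. Judging by the output, the first factor level of Street (Grvl) sits in the numerator here; as noted earlier, that choice only flips the sign.

# Gilbert: p_Grvl = 1/12, p_Pave = 164/2918
log((1 / 12) / (164 / 2918))
# 0.3936, matching the woe column (the Laplace correction is too small to show)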

29.4 Python Examples

We are using the ames data set for these examples. {category_encoders} provides the WOEEncoder() method we can use.

from feazdata import ames
from sklearn.compose import ColumnTransformer
from category_encoders.woe import WOEEncoder

# WOE-encode MS_Zoning, passing all other columns through unchanged
ct = ColumnTransformer(
    [('WOEEncoding', WOEEncoder(), ['MS_Zoning'])], 
    remainder="passthrough")

# WOEEncoder expects a binary target, so we pass Street as True/False
ct.fit(ames, y=ames[["Street"]].values.flatten() == "Pave")
ColumnTransformer(remainder='passthrough',
                  transformers=[('WOEEncoding', WOEEncoder(), ['MS_Zoning'])])
# Show only the encoded column
ct.transform(ames).filter(regex="WOEEncoding.*")
      WOEEncoding__MS_Zoning
0                      0.778
1                     -2.008
2                      0.778
3                      0.778
4                      0.778
...                      ...
2925                   0.778
2926                   0.778
2927                   0.778
2928                   0.778
2929                   0.778

[2930 rows x 1 columns]