67 Zero Variance Filter

67.1 Zero Variance Filter

Zero-variance predictors is a fancy way of saying that a predictor only takes 1 value. Another word for this is constant predictors. A zero variance predictor by definition contains no information as there isn’t a relationship between the outcome and the predictor. These types of predictors come in many different data sets. And are sometimes created in the course of the feature engineering process, such as when we do Dummy Encoding on categorical predictors with known possible levels.

The reason why this chapter exists is two-fold. Firstly, since these predictors have zero information in them, they are safe to remove which would lead to simpler and faster models. Secondly, many model implementations will error if zero-variance predictors are present in the data. Even some methods in this book don’t handle zero-variance predictors gracefully. Take the normalization methods in the Normalization chapter, some of these require division with the standard deviation, which is zero thus resulting in division by 0. Other methods like PCA can get in trouble as zero variance predictors can yield non-invertible matrices that they can’t normally handle.

The solution to this problem is very simple. For each variable in the data set, count the number of unique values. If the number is 1, then mark the variable for removal.

The general algorithm goes like the following:

Calculate the number of unique values of each predictor
Identify predictors with 1 unique value
Tag those predictors for removal
Remove predictors now, and for future data sets

For computational reasons you might want to combine 1. and 2. to instead identify predictors with more than 1 value. this typically runs a lot faster as you can determine that there is more than 1 unique values after a handful of values.

The zero-variance only matters on the training data set. So you could be in a situation where the testing data contained other values. This doesn’t matter as zero-variance predictors only affect the fitting of the model, which is done on the training data set.

There are a couple of variants to this problem. Some models require multiple values for predictors across groups. And we need to handle that accordingly. Another more complicated problem is working with predictors that have almost zero variance but not quite. Say a predictor has 999 instances of 10 and 1 instance of 15. According to the above definition, it doesn’t have zero variance. But it feels very close to it. These might be considered so low in information that they would be worth removing as well.

More care has to be taken as these predictors could have information in them, but they have low evidence. The way we flag near-zero variance predictors isn’t going to be as straightforward as how we did it above. We can’t just look at the number of unique values, as having 2 unique values by itself isn’t bad, as a 50/50 split of a variable is far from constant. We need to find a way to indicate that the variable takes few values. One metric could be looking at the percentage that the most common value is taken, if this is high it would be a prime candidate for near-zero variance. One could calculate the variance and pick a threshold. This would be harder to do since the calculated variance depends on scale. We could look at the ratio of the frequency of the most common value to the frequency of the second most common value. If this value is large then we have another contender for near-zero variance.

These different characteristics can be combined in different ways to suit your need for your data. you will likely need to tune the threshold values as well.

67.2 Pros and Cons

67.2.1 Pros

Removing zero variance predictors should provide no downside
Faster and smaller models
Easy to explain and execute

67.2.2 Cons

Removal of near-zero predictors requires care to avoid removing useful predictors

67.3 R Examples

We will use the step_zv() and step_nzv() steps which are used to remove zero variance and near-zero variance preditors respectively.

TODO

find a good data set

Below we are using the step_zv() function to remove

library(recipes)

data("ames", package = "modeldata")

zv_rec <- recipe(Sale_Price ~ ., data = ames) |>
  step_zv(all_predictors()) |>
  prep()

We can use the tidy() method to find out which variables were removed

zv_rec |>
  tidy(1)

# A tibble: 0 × 2
# ℹ 2 variables: terms <chr>, id <chr>

We can remove non-zero variance predictors in the same manner using step_nzv()

nzv_rec <- recipe(Sale_Price ~ ., data = ames) |>
  step_nzv(all_predictors()) |>
  prep()

nzv_rec |>
  tidy(1)

# A tibble: 21 × 2
   terms          id       
   <chr>          <chr>    
 1 Street         nzv_RUieL
 2 Alley          nzv_RUieL
 3 Land_Contour   nzv_RUieL
 4 Utilities      nzv_RUieL
 5 Land_Slope     nzv_RUieL
 6 Condition_2    nzv_RUieL
 7 Roof_Matl      nzv_RUieL
 8 Bsmt_Cond      nzv_RUieL
 9 BsmtFin_Type_2 nzv_RUieL
10 BsmtFin_SF_2   nzv_RUieL
# ℹ 11 more rows

67.4 Python Examples

We are using the ames data set for examples. {sklearn} provided the VarianceThreshold() method we can use. With this, we can set the threshold argument to specify the threshold of when to remove. The default 0 will remove zero-variance columns.

from feazdata import ames
from sklearn.compose import ColumnTransformer
from sklearn.feature_selection import VarianceThreshold
from sklearn.compose import make_column_selector
import numpy as np

ct = ColumnTransformer(
    [('onehot', VarianceThreshold(threshold=0), make_column_selector(dtype_include=np.number))], 
    remainder="passthrough")

ct.fit(ames)

ColumnTransformer(remainder='passthrough',
                  transformers=[('onehot', VarianceThreshold(threshold=0),
                                 <sklearn.compose._column_transformer.make_column_selector object at 0x1535521b0>)])

In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.

ct.transform(ames)

      onehot__Lot_Frontage  ...  remainder__Sale_Condition
0                      141  ...                     Normal
1                       80  ...                     Normal
2                       81  ...                     Normal
3                       93  ...                     Normal
4                       74  ...                     Normal
...                    ...  ...                        ...
2925                    37  ...                     Normal
2926                     0  ...                     Normal
2927                    62  ...                     Normal
2928                    77  ...                     Normal
2929                    74  ...                     Normal

[2930 rows x 74 columns]

but we can change that threshold to remove near-zero variance columns.

ct = ColumnTransformer(
    [('onehot', VarianceThreshold(threshold=0.2), make_column_selector(dtype_include=np.number))], 
    remainder="passthrough")

ct.fit(ames)

ColumnTransformer(remainder='passthrough',
                  transformers=[('onehot', VarianceThreshold(threshold=0.2),
                                 <sklearn.compose._column_transformer.make_column_selector object at 0x15287aff0>)])

ct.transform(ames)

      onehot__Lot_Frontage  ...  remainder__Sale_Condition
0                      141  ...                     Normal
1                       80  ...                     Normal
2                       81  ...                     Normal
3                       93  ...                     Normal
4                       74  ...                     Normal
...                    ...  ...                        ...
2925                    37  ...                     Normal
2926                     0  ...                     Normal
2927                    62  ...                     Normal
2928                    77  ...                     Normal
2929                    74  ...                     Normal

[2930 rows x 70 columns]