9  Max Abs Scaling

The Max-Abs scaling method rescales each variable so that the training data lies within the range \([-1, 1]\), by applying the following formula:

\[ X_{scaled} = \dfrac{X}{\text{max}(\text{abs}(X))} \tag{9.1}\]

This is similar to the scaling we saw in Chapter 7. The only difference is whether we are aiming for a statistical property (a standard deviation of 1) or a specific decision (dividing by the largest absolute value seen). This method is a learned transformation: we use the training data to derive the value of \(\text{max}(\text{abs}(X))\), and that stored value is then used to transform new data. There is no specific guidance on which of these scaling methods to prefer; you need to look at your data and see what works best.
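To make the "learned transformation" idea concrete, here is a minimal NumPy sketch of Equation 9.1. The data values are made up for illustration and the variable names are not from any package. It also shows a consequence of reusing the stored value: new data with a larger magnitude than anything seen in training can land outside \([-1, 1]\).

```python
import numpy as np

# Toy training data (made-up values): rows are observations, columns are variables.
X_train = np.array([[1.0, -20.0],
                    [2.0, 10.0],
                    [-4.0, 5.0]])

# "Learn" the scaling factor from the training data only: max(abs(X)) per column.
max_abs = np.max(np.abs(X_train), axis=0)

# Apply Equation 9.1 to the training data; every column now lies in [-1, 1].
X_train_scaled = X_train / max_abs

# New data reuses the stored factor instead of recomputing it.
X_new = np.array([[8.0, -10.0]])
X_new_scaled = X_new / max_abs

print(max_abs)         # learned factors: 4 and 20
print(X_new_scaled)    # first value is 8 / 4 = 2, which is outside [-1, 1]
```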

9.2 Pros and Cons

9.2.1 Pros

  • Fast calculations
  • Transformation can easily be reversed, making results easier to interpret on the original scale
  • Doesn’t affect sparsity
  • Can be used on a zero-variance variable, although this matters little since you should likely remove such variables anyway
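The sparsity and reversibility points above can be checked with a small sketch using scikit-learn's MaxAbsScaler on a SciPy sparse matrix. The matrix values are made up for illustration. The third column is all zeros, so it also serves as a zero-variance example; in scikit-learn such a column is left unchanged, which may differ from how other implementations behave.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import MaxAbsScaler

# Toy sparse matrix (made-up values); the last column is all zeros.
X = sparse.csr_matrix(np.array([[0.0, 5.0, 0.0],
                                [2.0, 0.0, 0.0],
                                [0.0, -10.0, 0.0]]))

scaler = MaxAbsScaler()
X_scaled = scaler.fit_transform(X)

# Zeros stay zeros because we only divide, so the number of stored values is unchanged.
same_sparsity = X_scaled.nnz == X.nnz

# Multiplying by the learned factors recovers the original values.
X_back = scaler.inverse_transform(X_scaled)
round_trip_ok = np.allclose(X_back.toarray(), X.toarray())

print(same_sparsity, round_trip_ok)
```

Because the transformation only divides by a constant and never shifts the data, no centering step turns zeros into non-zero values.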

9.2.2 Cons

  • Is highly affected by outliers
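A short NumPy sketch illustrates the outlier sensitivity. The numbers are made up: a single extreme value becomes the divisor, which squeezes all the other observations into a narrow band near zero.

```python
import numpy as np

# Made-up values without and with a single extreme observation.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x_outlier = np.append(x, 1000.0)

# Without the outlier, the values spread over (0, 1].
scaled = x / np.max(np.abs(x))

# With the outlier, the divisor is 1000, so the first five values are at most 0.005.
scaled_outlier = x_outlier / np.max(np.abs(x_outlier))

print(scaled)
print(scaled_outlier)
```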

9.3 R Examples

We will be using the ames data set for these examples.

# remotes::install_github("emilhvitfeldt/extrasteps")
library(recipes)
library(extrasteps)
library(modeldata)
data("ames")

ames |>
  select(Sale_Price, Lot_Area, Wood_Deck_SF, Mas_Vnr_Area)
# A tibble: 2,930 × 4
   Sale_Price Lot_Area Wood_Deck_SF Mas_Vnr_Area
        <int>    <int>        <int>        <dbl>
 1     215000    31770          210          112
 2     105000    11622          140            0
 3     172000    14267          393          108
 4     244000    11160            0            0
 5     189900    13830          212            0
 6     195500     9978          360           20
 7     213500     4920            0            0
 8     191500     5005            0            0
 9     236500     5389          237            0
10     189000     7500          140            0
# ℹ 2,920 more rows

We will be using the step_maxabs() step for this, and it can be found in the extrasteps extension package.

maxabs_rec <- recipe(Sale_Price ~ ., data = ames) |>
  step_maxabs(all_numeric_predictors()) |>
  prep()

maxabs_rec |>
  bake(new_data = NULL, Sale_Price, Lot_Area, Wood_Deck_SF, Mas_Vnr_Area)
# A tibble: 2,930 × 4
   Sale_Price Lot_Area Wood_Deck_SF Mas_Vnr_Area
        <int>    <dbl>        <dbl>        <dbl>
 1     215000   0.148        0.147        0.07  
 2     105000   0.0540       0.0983       0     
 3     172000   0.0663       0.276        0.0675
 4     244000   0.0518       0            0     
 5     189900   0.0643       0.149        0     
 6     195500   0.0464       0.253        0.0125
 7     213500   0.0229       0            0     
 8     191500   0.0233       0            0     
 9     236500   0.0250       0.166        0     
10     189000   0.0348       0.0983       0     
# ℹ 2,920 more rows

We can also pull out what the max values were for each variable using tidy().

maxabs_rec |>
  tidy(1)
# A tibble: 33 × 4
   terms          statistic  value id          
   <chr>          <chr>      <dbl> <chr>       
 1 Lot_Frontage   max          313 maxabs_Bp5vK
 2 Lot_Area       max       215245 maxabs_Bp5vK
 3 Year_Built     max         2010 maxabs_Bp5vK
 4 Year_Remod_Add max         2010 maxabs_Bp5vK
 5 Mas_Vnr_Area   max         1600 maxabs_Bp5vK
 6 BsmtFin_SF_1   max            7 maxabs_Bp5vK
 7 BsmtFin_SF_2   max         1526 maxabs_Bp5vK
 8 Bsmt_Unf_SF    max         2336 maxabs_Bp5vK
 9 Total_Bsmt_SF  max         6110 maxabs_Bp5vK
10 First_Flr_SF   max         5095 maxabs_Bp5vK
# ℹ 23 more rows

9.4 Python Examples

We are using the ames data set for examples. {sklearn} provides the MaxAbsScaler() transformer we can use.

from feazdata import ames
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MaxAbsScaler

ct = ColumnTransformer(
    [('maxabs', MaxAbsScaler(), ['Sale_Price', 'Lot_Area', 'Wood_Deck_SF',  'Mas_Vnr_Area'])], 
    remainder="passthrough")

ct.fit(ames)
ColumnTransformer(remainder='passthrough',
                  transformers=[('maxabs', MaxAbsScaler(),
                                 ['Sale_Price', 'Lot_Area', 'Wood_Deck_SF',
                                  'Mas_Vnr_Area'])])
ct.transform(ames)
      maxabs__Sale_Price  ...  remainder__Latitude
0                  0.285  ...               42.054
1                  0.139  ...               42.053
2                  0.228  ...               42.053
3                  0.323  ...               42.051
4                  0.252  ...               42.061
...                  ...  ...                  ...
2925               0.189  ...               41.989
2926               0.174  ...               41.988
2927               0.175  ...               41.987
2928               0.225  ...               41.991
2929               0.249  ...               41.989

[2930 rows x 74 columns]
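As a counterpart to tidy() in the R example, the learned maximum absolute values can be read from the fitted scaler's max_abs_ attribute. The sketch below uses a small made-up data frame rather than the full ames data so that it runs on its own; the column names are borrowed from ames, but the resulting values are not the real ames maxima.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MaxAbsScaler

# Small stand-in data frame (three rows only, for illustration).
toy = pd.DataFrame({
    "Lot_Area": [31770, 11622, 14267],
    "Wood_Deck_SF": [210, 140, 393],
    "Mas_Vnr_Area": [112.0, 0.0, 108.0],
})
cols = ["Lot_Area", "Wood_Deck_SF", "Mas_Vnr_Area"]

ct = ColumnTransformer(
    [("maxabs", MaxAbsScaler(), cols)],
    remainder="passthrough")
ct.fit(toy)

# The fitted scaler is stored under the name given in the transformer list.
learned = pd.Series(ct.named_transformers_["maxabs"].max_abs_, index=cols)
print(learned)
```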