library(recipes)
library(embed)
library(modeldata)
ames_num <- ames |>
  select(where(is.numeric))
69 Principal Component Analysis Variants
69.1 Principal Component Analysis Variants
This chapter goes over some of the variants of PCA. You are highly encouraged to read the PCA chapter before reading this one, as each of the following sections describes how a variant differs from the main implementation. The variants presented in this chapter are the ones I feel are the most relevant and truly are variants. Some would argue that Non-negative Matrix Factorization (NMF) is a variant of PCA; however, this book treats it as a separate method.
69.1.1 Sparse PCA
Sparse PCA (Zou, Hastie, and Tibshirani 2006) is a variant of PCA where regularization is used to force some of the loadings to be zero. One of the downsides of regular PCA is that each resulting vector is a linear combination of all the input variables. This hurts interpretability, especially for data sets with many columns. By forcing some of the loadings to be zero, you limit how many input variables are used in each output.
Remember that sparse here refers to the loading values, not the data.
The amount of zeros can be controlled either through a regularization penalty or as a proportion of zeros, and it doesn't really matter which, as long as you find the right tradeoff. With no regularization, you are left with regular PCA. With too much regularization, only one non-zero loading is allowed, meaning that each output vector will be a scaled version of a single input variable; this end technically doesn't accomplish anything we want. With no regularization we get the optimal compression of the data, and with too much regularization we don't do any compression at all. The hope is that there is a good tradeoff in between, balancing compression and interpretability.
Prediction with a sparse PCA will technically also be faster, as it requires fewer calculations.
If you use PCA as a dimensionality reduction method, then regularizing your PCA will help to eliminate noisy input features as they will show up less or not at all.
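To get a feel for this tradeoff, the following sketch counts how many loadings end up being zero for different settings. It uses the mtcars data set and an arbitrary set of proportions purely for illustration, and the helper function is made up for this example.

library(dplyr)

# hypothetical helper: count zero loadings for a given proportion of
# non-zero coefficients allowed in each component
count_zero_loadings <- function(prop) {
  recipe(~ ., data = mtcars) |>
    step_normalize(all_numeric_predictors()) |>
    step_pca_sparse(all_numeric_predictors(), predictor_prop = prop, num_comp = 4) |>
    prep() |>
    tidy(2) |>
    summarise(predictor_prop = prop, n_zero = sum(value == 0), n_total = n())
}

# smaller predictor_prop means more regularization and more zero loadings
bind_rows(lapply(c(1, 0.5, 0.1), count_zero_loadings))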
69.1.2 Robust PCA
Robust PCA (Candès et al. 2011) is used when you suspect that there are large amounts of outliers in your data set. It works by decomposing the data into two parts: one is the assumed clean data set that works well with PCA, and the other is a sparse data set containing the outliers and corruption. The hope is that this decomposition filters out the unwanted outliers.
Please see the outliers section for more methods of how to handle outliers in your data.
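To make that decomposition a little more concrete, here is a minimal base R sketch of the structure robust PCA assumes: the observed data is the sum of a low-rank part and a sparse part. The matrices below are made up for illustration; a robust PCA routine would only be given observed and would try to recover the two pieces.

set.seed(1234)

# low-rank "clean" part: 10 variables driven by 2 underlying directions
scores   <- matrix(rnorm(100 * 2), ncol = 2)
loadings <- matrix(rnorm(10 * 2), ncol = 2)
low_rank <- scores %*% t(loadings)

# sparse part: a handful of large corrupted entries
sparse <- matrix(0, nrow = 100, ncol = 10)
sparse[sample(length(sparse), 15)] <- rnorm(15, mean = 25)

# what we actually observe
observed <- low_rank + sparse

qr(low_rank)$rank  # the clean part is rank 2
sum(sparse != 0)   # the corruption touches only 15 cells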
69.1.3 Kernel PCA
Kernel PCA (Mika et al. 1998) is a variation of PCA for when the assumption of linear relationships no longer holds. We want to keep the PCA framework, and we do that by extending it to better handle non-linear relationships.
The standard implementation of PCA revolves around using inner product calculations to transform the data into a new dimension. What we can do instead is to use a kernel function in place of the inner product, which allows for distance calculations in a higher-dimensional feature space without transforming the whole data set into that space. This is the same trick used in kernel support vector machines.
These calculations allow the user to explore non-linear trends if there are any. There isn't a way to reverse the transformation like there is with traditional PCA, which is part of the reason interpretability is worse for kernel PCA. It also introduces a hyperparameter that needs to be tuned.
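As a rough sketch of the idea (this is not how the {recipes} steps are implemented internally), the code below performs kernel PCA by hand on a small data set: a radial basis kernel replaces the inner products, the kernel matrix is double-centered, and its eigendecomposition gives the new components. The sigma value is an arbitrary choice for illustration.

set.seed(1234)
x <- scale(as.matrix(mtcars))
n <- nrow(x)

# radial basis kernel used in place of the inner products x %*% t(x)
sigma <- 0.05
k <- exp(-sigma * as.matrix(dist(x))^2)

# double-center the kernel matrix (centering in the implicit feature space)
one_n <- matrix(1 / n, n, n)
k_centered <- k - one_n %*% k - k %*% one_n + one_n %*% k %*% one_n

# the eigenvectors of the centered kernel matrix give the kernel PCA scores
eig <- eigen(k_centered, symmetric = TRUE)
scores <- eig$vectors[, 1:2] %*% diag(sqrt(pmax(eig$values[1:2], 0)))
head(scores)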
69.2 R Examples
We will be using the ames data set for these examples.
{embed} provides step_pca_sparse() to perform sparse PCA, and {recipes} provides step_kpca(), step_kpca_poly(), and step_kpca_rbf() to perform kernel PCA with different kernels.
For step_pca_sparse(), the predictor_prop argument is used to determine the maximum proportion of non-zero coefficients.
pca_sparse_rec <- recipe(~ ., data = ames_num) |>
  step_normalize(all_numeric_predictors()) |>
  step_pca_sparse(all_numeric_predictors(), predictor_prop = 0.2)
pca_sparse_rec_prepped <- prep(pca_sparse_rec)
pca_sparse_rec_prepped |>
  bake(new_data = NULL) |>
  glimpse()
Rows: 2,930
Columns: 5
$ PC1 <dbl> -0.928277538, 1.121799088, 0.766999671, -2.465148494, -0.098799277…
$ PC2 <dbl> -0.42651536, -2.05092903, -0.27170407, 0.85236446, 0.99100986, 1.2…
$ PC3 <dbl> -1.2169854, -1.1145117, -0.9661340, -0.5375621, 1.1432034, 1.16141…
$ PC4 <dbl> -1.4916224, 0.3556418, -0.7108605, -0.6941959, -0.5099111, -0.3670…
$ PC5 <dbl> 0.62934269, 0.89542998, 0.07696528, 0.60312896, 0.25551665, 0.2227…
It works the same as step_pca(), but if we take a look at the coefficients there are more zeroes.
pca_sparse_rec_prepped |>
  tidy(2)
# A tibble: 170 × 4
terms value component id
<chr> <dbl> <chr> <chr>
1 Lot_Frontage 0 PC1 pca_sparse_jMybd
2 Lot_Area 0 PC1 pca_sparse_jMybd
3 Year_Built 0 PC1 pca_sparse_jMybd
4 Year_Remod_Add 0 PC1 pca_sparse_jMybd
5 Mas_Vnr_Area 0 PC1 pca_sparse_jMybd
6 BsmtFin_SF_1 0 PC1 pca_sparse_jMybd
7 BsmtFin_SF_2 0 PC1 pca_sparse_jMybd
8 Bsmt_Unf_SF 0 PC1 pca_sparse_jMybd
9 Total_Bsmt_SF -0.268 PC1 pca_sparse_jMybd
10 First_Flr_SF -0.303 PC1 pca_sparse_jMybd
# ℹ 160 more rows
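One way to see just how sparse the loadings are is to count the zeroes per component from the same tidy() output. This small sketch assumes {dplyr} is available.

library(dplyr)

pca_sparse_rec_prepped |>
  tidy(2) |>
  group_by(component) |>
  summarise(n_zero = sum(value == 0), n_total = n())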
All the kernel PCA steps work the same way.
pca_kpca_rec <- recipe(~ ., data = ames_num) |>
  step_normalize(all_numeric_predictors()) |>
  step_kpca(all_numeric_predictors())
pca_kpca_rec |>
  prep() |>
  bake(new_data = NULL) |>
  glimpse()

The resulting columns are prefixed with kPC by default instead of PC.
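Since kernel PCA introduces a hyperparameter, as mentioned earlier, here is a sketch of how you could set it explicitly with the radial basis function variant. The sigma value below is an arbitrary assumption for illustration; in practice you would tune it.

pca_kpca_rbf_rec <- recipe(~ ., data = ames_num) |>
  step_normalize(all_numeric_predictors()) |>
  step_kpca_rbf(all_numeric_predictors(), sigma = 0.1)  # sigma chosen arbitrarily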