library(recipes)
library(embed)
library(modeldata)
ames_num <- ames |>
  select(where(is.numeric))
69 Principal Component Analysis Variants
69.1 Principal Component Analysis Variants
This chapter goes over some of the variants of PCA. You are highly encouraged to read the PCA chapter before reading this one, as each of the following sections describes how a variant differs from the main implementation. The variants presented in this chapter are the ones I feel are the most relevant and truly are variants. Some would argue that Non-negative Matrix Factorization (NMF) is a variant of PCA; however, this book treats it as a separate method.
69.1.1 Sparse PCA
Sparse PCA (Zou, Hastie, and Tibshirani 2006) is a variant of PCA where regularization is used to force some of the loadings to be zero. One of the downsides of regular PCA is that each resulting vector is a linear combination of all the input variables. This hurts interpretability, especially for data sets with many columns. By forcing some of the loadings to be zero, you limit how many input variables are used in each output.
Remember that sparse here refers to the loading values, not the data.
The amount of zeros can be controlled either through a regularization penalty or as a proportion of zeros, and it doesn't really matter which, as long as you find the right tradeoff. With no regularization, you are left with regular PCA. With too much regularization, only one non-zero loading is allowed, meaning that each output vector will be a scaled version of a single input variable; this end technically doesn't accomplish anything we want. With no regularization we get the optimal compression of the data, and with too much regularization we don't do any compression at all. The hope is that there is a good tradeoff in between, balancing compression and interpretability.
Prediction with a sparse PCA will technically also be faster, as it requires fewer calculations.
If you use PCA as a dimensionality reduction method, then regularizing your PCA will help to eliminate noisy input features as they will show up less or not at all.
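To get a feel for this tradeoff, the following sketch counts how many loadings end up being zero for different settings. It uses the mtcars data set and an arbitrary set of proportions purely for illustration, and the helper function is made up for this example.

library(dplyr)

# hypothetical helper: count zero loadings for a given proportion of
# non-zero coefficients allowed in each component
count_zero_loadings <- function(prop) {
  recipe(~ ., data = mtcars) |>
    step_normalize(all_numeric_predictors()) |>
    step_pca_sparse(all_numeric_predictors(), predictor_prop = prop, num_comp = 4) |>
    prep() |>
    tidy(2) |>
    summarise(predictor_prop = prop, n_zero = sum(value == 0), n_total = n())
}

# smaller predictor_prop means more regularization and more zero loadings
bind_rows(lapply(c(1, 0.5, 0.1), count_zero_loadings))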
69.1.2 Robust PCA
Robust PCA (Candès et al. 2011) is used when you suspect that there are large amounts of outliers in your data set. It works by decomposing the data into two parts: one is the assumed clean data set that works well with PCA, and the other is a sparse data set containing the outliers and corruption. The hope is that this decomposition filters out the unwanted outliers.
Please see the outliers section for more methods of how to handle outliers in your data.
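To make that decomposition a little more concrete, here is a minimal base R sketch of the structure robust PCA assumes: the observed data is the sum of a low-rank part and a sparse part. The matrices below are made up for illustration; a robust PCA routine would only be given observed and would try to recover the two pieces.

set.seed(1234)

# low-rank "clean" part: 10 variables driven by 2 underlying directions
scores   <- matrix(rnorm(100 * 2), ncol = 2)
loadings <- matrix(rnorm(10 * 2), ncol = 2)
low_rank <- scores %*% t(loadings)

# sparse part: a handful of large corrupted entries
sparse <- matrix(0, nrow = 100, ncol = 10)
sparse[sample(length(sparse), 15)] <- rnorm(15, mean = 25)

# what we actually observe
observed <- low_rank + sparse

qr(low_rank)$rank  # the clean part is rank 2
sum(sparse != 0)   # the corruption touches only 15 cells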
69.1.3 Kernel PCA
Kernel PCA (Mika et al. 1998) is a variation of PCA for when the assumption of linear relationships no longer holds. We want to keep the PCA framework, and we do that by extending it to better handle non-linear relationships.
The standard implementation of PCA revolves around using inner product calculations to transform the data into a new dimension. What we can do instead is to use a kernel function in place of the inner product, which allows for distance calculations in a higher-dimensional feature space without transforming the whole data set into that space. This is the same trick used in kernel support vector machines.
These calculations allow the user to explore non-linear trends if there are any. There isn't a way to reverse the transformation like there is with traditional PCA, which is part of the reason interpretability is worse for kernel PCA. It also introduces a hyperparameter that needs to be tuned.
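As a rough sketch of the idea (this is not how the {recipes} steps are implemented internally), the code below performs kernel PCA by hand on a small data set: a radial basis kernel replaces the inner products, the kernel matrix is double-centered, and its eigendecomposition gives the new components. The sigma value is an arbitrary choice for illustration.

set.seed(1234)
x <- scale(as.matrix(mtcars))
n <- nrow(x)

# radial basis kernel used in place of the inner products x %*% t(x)
sigma <- 0.05
k <- exp(-sigma * as.matrix(dist(x))^2)

# double-center the kernel matrix (centering in the implicit feature space)
one_n <- matrix(1 / n, n, n)
k_centered <- k - one_n %*% k - k %*% one_n + one_n %*% k %*% one_n

# the eigenvectors of the centered kernel matrix give the kernel PCA scores
eig <- eigen(k_centered, symmetric = TRUE)
scores <- eig$vectors[, 1:2] %*% diag(sqrt(pmax(eig$values[1:2], 0)))
head(scores)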
69.2 R Examples
We will be using the ames data set for these examples.
{embed} provides step_pca_sparse() to perform sparse PCA, and {recipes} provides step_kpca(), step_kpca_poly(), and step_kpca_rbf() to perform kernel PCA with different kernels.
For step_pca_sparse(), the predictor_prop argument is used to determine the maximum proportion of non-zero coefficients.
pca_sparse_rec <- recipe(~ ., data = ames_num) |>
  step_normalize(all_numeric_predictors()) |>
  step_pca_sparse(all_numeric_predictors(), predictor_prop = 0.2)
pca_sparse_rec_prepped <- prep(pca_sparse_rec)
pca_sparse_rec_prepped |>
  bake(new_data = NULL) |>
  glimpse()
Rows: 2,930
Columns: 5
$ PC1 <dbl> -0.928277538, 1.121799088, 0.766999671, -2.465148494, -0.098799277…
$ PC2 <dbl> -0.42651536, -2.05092903, -0.27170407, 0.85236446, 0.99100986, 1.2…
$ PC3 <dbl> -1.2169854, -1.1145117, -0.9661340, -0.5375621, 1.1432034, 1.16141…
$ PC4 <dbl> -1.4916224, 0.3556418, -0.7108605, -0.6941959, -0.5099111, -0.3670…
$ PC5 <dbl> 0.62934269, 0.89542998, 0.07696528, 0.60312896, 0.25551665, 0.2227…
It works the same as step_pca(), but if we take a look at the coefficients there are more zeroes.
pca_sparse_rec_prepped |>
  tidy(2)
# A tibble: 170 × 4
terms value component id
<chr> <dbl> <chr> <chr>
1 Lot_Frontage 0 PC1 pca_sparse_jMybd
2 Lot_Area 0 PC1 pca_sparse_jMybd
3 Year_Built 0 PC1 pca_sparse_jMybd
4 Year_Remod_Add 0 PC1 pca_sparse_jMybd
5 Mas_Vnr_Area 0 PC1 pca_sparse_jMybd
6 BsmtFin_SF_1 0 PC1 pca_sparse_jMybd
7 BsmtFin_SF_2 0 PC1 pca_sparse_jMybd
8 Bsmt_Unf_SF 0 PC1 pca_sparse_jMybd
9 Total_Bsmt_SF -0.268 PC1 pca_sparse_jMybd
10 First_Flr_SF -0.303 PC1 pca_sparse_jMybd
# ℹ 160 more rows
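One way to see just how sparse the loadings are is to count the zeroes per component from the same tidy() output. This small sketch assumes {dplyr} is available.

library(dplyr)

pca_sparse_rec_prepped |>
  tidy(2) |>
  group_by(component) |>
  summarise(n_zero = sum(value == 0), n_total = n())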
All the kernel PCA steps work the same way.
pca_kpca_rec <- recipe(~ ., data = ames_num) |>
  step_normalize(all_numeric_predictors()) |>
  step_kpca(all_numeric_predictors())
pca_kpca_rec |>
  prep() |>
  bake(new_data = NULL) |>
  glimpse()

The resulting columns are prefixed with kPC by default instead of PC.
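Since kernel PCA introduces a hyperparameter, as mentioned earlier, here is a sketch of how you could set it explicitly with the radial basis function variant. The sigma value below is an arbitrary assumption for illustration; in practice you would tune it.

pca_kpca_rbf_rec <- recipe(~ ., data = ames_num) |>
  step_normalize(all_numeric_predictors()) |>
  step_kpca_rbf(all_numeric_predictors(), sigma = 0.1)  # sigma chosen arbitrarily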