70  Independent Component Analysis

Independent Component Analysis (ICA) is a method closely related to Principal Component Analysis (PCA). PCA finds a transformation that maximizes the variance of the resulting variables while making them uncorrelated. ICA, on the other hand, aims to create variables that are statistically independent, which is a stronger condition than being uncorrelated. Note that, unlike in PCA, the ICA component directions are not constrained to be orthogonal.

This allows ICA to pull out stronger signals in your data. It also doesn’t assume that the data is Gaussian; in fact, it relies on the underlying signals being non-Gaussian in order to separate them.
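The difference between the two goals can be written down compactly. The block below is a sketch of the usual linear ICA formulation, not something taken from this chapter: the symbols x, s, A, W, V, and t are notation introduced here for illustration, and A is assumed to be square and invertible to keep things simple. The sources can only be recovered up to their order, sign, and scale.

```latex
% Observed variables as an unknown linear mix of independent sources
x = A s, \qquad s = (s_1, \dots, s_k)^\top \ \text{with mutually independent, non-Gaussian } s_j

% ICA estimates an unmixing matrix W so that the recovered components
% are as statistically independent as possible
\hat{s} = W x, \qquad W \approx A^{-1}

% PCA, by contrast, uses an orthogonal V chosen so that the scores are uncorrelated
t = V^\top x, \qquad V^\top V = I, \qquad \operatorname{Cov}(t) \ \text{diagonal}
```

Nothing forces the rows of W to be orthogonal, which is why ICA components need not line up with the principal components.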

One way to think about the difference between PCA and ICA is that PCA is more effective as a data compression technique, whereas ICA helps uncover and separate the structure in the data itself.
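To make this contrast concrete, here is a minimal sketch on toy data, assuming only numpy is available. Two independent non-Gaussian signals are mixed, and the mix is then passed through PCA and through a small FastICA-style fixed-point iteration with a tanh nonlinearity. The data, the mixing matrix `A`, and the helpers `sym_decorrelate()` and `best_match()` are all hypothetical and written for this illustration; this is not the implementation used by any particular package.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Two independent, non-Gaussian sources (toy data, not from the ames set).
s = np.column_stack([
    rng.uniform(-1.0, 1.0, n),   # flat-tailed source
    rng.laplace(0.0, 1.0, n),    # heavy-tailed source
])
s = (s - s.mean(axis=0)) / s.std(axis=0)

# Observed variables: a non-orthogonal linear mix of the sources.
A = np.array([[1.0, 0.5],
              [0.5, 1.0]])
x = s @ A.T

# PCA via the eigendecomposition of the covariance matrix.
xc = x - x.mean(axis=0)
eigval, eigvec = np.linalg.eigh(np.cov(xc, rowvar=False))
pcs = xc @ eigvec                # uncorrelated PCA scores
z = pcs / np.sqrt(eigval)        # whitened scores (unit variance)


def sym_decorrelate(W):
    """Return the orthogonal matrix closest to W, i.e. (W W^T)^(-1/2) W."""
    u, _, vt = np.linalg.svd(W)
    return u @ vt


# FastICA-style symmetric fixed-point iteration with a tanh nonlinearity.
W = sym_decorrelate(rng.normal(size=(2, 2)))
for _ in range(200):
    y = z @ W.T                              # current component estimates
    g = np.tanh(y)
    g_prime = 1.0 - g ** 2
    W_new = (g.T @ z) / n - np.diag(g_prime.mean(axis=0)) @ W
    W_new = sym_decorrelate(W_new)
    # Stop once every row of W has stopped rotating (up to sign).
    converged = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1.0)) < 1e-8
    W = W_new
    if converged:
        break
ica = z @ W.T


def best_match(est, src):
    """For each true source, the largest |correlation| with any estimate."""
    k = est.shape[1]
    c = np.abs(np.corrcoef(est, src, rowvar=False)[:k, k:])
    return c.max(axis=0)


print("PCA best |corr| per source:", best_match(pcs, s).round(2))
print("ICA best |corr| per source:", best_match(ica, s).round(2))
```

With this symmetric mixing matrix, each PCA score is an even blend of the two sources, so its correlation with either one should sit around 0.7. The ICA estimates should match the sources much more closely, up to sign and ordering.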

ICA is often described as a dimensionality reduction method. This is because the commonly used fastICA implementation works incrementally.

ICA, much like PCA, requires that your data be normalized before it is applied.

70.2 Pros and Cons

70.2.1 Pros

  • Can identify stronger signals

70.2.2 Cons

  • Sensitive to noise and outliers
  • Computationally intensive

70.3 R Examples

We will be using the ames data set for these examples.

library(recipes)
library(modeldata)

ames_num <- ames |>
  select(where(is.numeric))

{recipes} provides step_ica(), which performs ICA as a recipe step. It relies on the {dimRed} and {fastICA} packages being installed.

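# Normalize the predictors, then extract independent components
# (step_ica() returns 5 components by default via num_comp)
ica_rec <- recipe(~ ., data = ames_num) |>
  step_normalize(all_numeric_predictors()) |>
  step_ica(all_numeric_predictors())

# Estimate the recipe and inspect the transformed training data
ica_rec |>
  prep() |>
  bake(new_data = NULL) |>
  glimpse()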
Rows: 2,930
Columns: 5
$ IC1 <dbl> -0.37052169, 0.51413974, -0.80280637, 0.12280549, -0.65078105, -0.…
$ IC2 <dbl> -0.104340006, 0.924875720, -0.778555091, -0.433058679, 0.391229969…
$ IC3 <dbl> -1.99091152, 0.20060881, -1.53014169, -1.70455188, 0.41543549, 0.3…
$ IC4 <dbl> 0.7762583, 0.9242792, 1.3091516, -0.1819816, -0.6325621, -0.748529…
$ IC5 <dbl> 0.81982434, 0.57624587, 0.44016790, 0.81289679, -1.01678079, -1.06…

70.4 Python Examples