73 Linear Discriminant Analysis
73.1 Linear Discriminant Analysis
This feature engineering method is heavily inspired by the supervised modeling method of the same name. Linear Discriminant Analysis (LDA) is a supervised classification model that tries to separate the classes by projecting the higher-dimensional data into a lower-dimensional space. In this process, we find linear combinations of features that best separate the classes. We turn this supervised classification model into a dimensionality reduction method by returning the lower-dimensional representation instead of the predictions.
One thing worth noting is how this method differs from most other methods in this section: it uses the outcome and is thus a supervised method. The overall goal of the method is to find features that distinguish the classes of the outcome well, so we need to provide the outcome to the method. It isn't strictly necessary to pass the modeling outcome; any categorical variable could be used. However, it is rarely ideal to use anything other than the modeling response itself, as that most clearly represents the signal we are trying to find.
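As a minimal sketch of this idea, here is how LDA can be used as a supervised dimensionality reduction step with scikit-learn, assuming synthetic example data generated for illustration. Note that the outcome `y` must be passed to `fit_transform()`, and that LDA can produce at most one fewer dimension than the number of classes.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Hypothetical example data: 10 numeric predictors, 3 outcome classes.
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=5,
    n_classes=3, random_state=0,
)

# LDA yields at most (number of classes - 1) discriminants,
# so with 3 classes we get a 2-dimensional representation.
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)  # the outcome y is required here

print(X_lda.shape)  # (200, 2)
```

The transformed `X_lda` would then replace the original predictors (or be appended to them) before modeling, exactly as with unsupervised methods such as PCA, except that the new axes were chosen to separate the classes.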
This method, like most everything else in this chapter, is constrained to only work on numeric input with no missing values. In addition to these constraints, we need to remember that LDA comes with a couple of assumptions.
- The data within each class are normally distributed
- The covariance matrices are equal across classes
- The classes are linearly separable
It is worth remembering what it would mean if these assumptions are violated. The code that implements LDA will likely still run without error; however, if the assumptions are not upheld, the method will likely not work very well, giving us subpar performance.
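This point can be illustrated with a small sketch, using made-up data that deliberately violates the within-class normality assumption: heavy-tailed Cauchy noise with randomly assigned labels. The code runs without complaint, but there is no real class structure for LDA to find, so the resulting representation would carry little useful signal.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

# Heavy-tailed (Cauchy) predictors violate the normality assumption,
# and the labels are assigned at random, so no separation exists.
X = rng.standard_cauchy((200, 5))
y = rng.integers(0, 2, 200)

# This fits and transforms without raising any error...
lda = LinearDiscriminantAnalysis(n_components=1)
X_lda = lda.fit_transform(X, y)

# ...but a silent success is not evidence that the assumptions hold
# or that the lower-dimensional features are any good.
print(X_lda.shape)  # (200, 1)
```

The lesson is that assumption checks (normality within classes, similar class covariances) have to be done separately; the fitting code will not do them for us.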
73.2 Pros and Cons
73.2.1 Pros
- Good computational speed
- Can handle multicollinearity
73.2.2 Cons
- Will perform poorly if assumptions are not met