Feature Engineering A-Z
72 🏗️ Partial Least Squares

72.1 Partial Least Squares

WIP

72.2 Pros and Cons

72.2.1 Pros

72.2.2 Cons

72.3 R Examples

72.4 Python Examples