144 🏗️ How Different Models Deal With Input
Feature Engineering A-Z
144.1 How Different Models Deal With Input
WIP