Feature Engineering A-Z
29 🏗️ Weight of Evidence Encoding
29.1 Weight of Evidence Encoding
29.2 Pros and Cons
29.2.1 Pros
29.2.2 Cons
29.3 R Examples
29.4 Python Examples
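As a placeholder illustration for this section, the following is a minimal sketch of weight of evidence encoding for a binary target, using only the Python standard library. It assumes the common convention WoE(level) = ln(P(level | event) / P(level | non-event)); some references use the inverse ratio, which only flips the sign. The function name `weight_of_evidence` and the `smoothing` parameter are hypothetical and chosen for illustration; they are not taken from any particular library.

```python
import math
from collections import Counter


def weight_of_evidence(levels, targets, smoothing=0.5):
    """Return {level: WoE} for a binary target (1 = event, 0 = non-event).

    WoE(level) = ln(P(level | event) / P(level | non-event)).
    `smoothing` is a hypothetical additive constant that keeps the ratio
    finite when a level has no events or no non-events.
    """
    events = Counter(l for l, t in zip(levels, targets) if t == 1)
    non_events = Counter(l for l, t in zip(levels, targets) if t == 0)
    unique_levels = sorted(set(levels))
    k = len(unique_levels)

    # Smoothed class totals, so the per-class proportions still sum to 1.
    total_events = sum(events.values()) + smoothing * k
    total_non_events = sum(non_events.values()) + smoothing * k

    woe = {}
    for level in unique_levels:
        p_event = (events[level] + smoothing) / total_events
        p_non_event = (non_events[level] + smoothing) / total_non_events
        woe[level] = math.log(p_event / p_non_event)
    return woe


# Toy data, made up for illustration only.
levels = ["a", "a", "a", "b", "b", "b", "c", "c"]
targets = [1, 1, 0, 0, 0, 1, 1, 0]

encoding = weight_of_evidence(levels, targets)  # learned on training data
encoded = [encoding[level] for level in levels]  # numeric replacement column
```

In this toy data, level "a" is over-represented among events and gets a positive value, "b" gets a negative value, and "c" is balanced and maps to zero. The sketch does not handle levels that were unseen when the mapping was learned; a lookup for such a level would raise a `KeyError`, so a real pipeline would need an explicit fallback value.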