Are You Getting Burned By One-Hot Encoding?

A common technique for transforming categorical variables into a form suitable for machine learning is called "one-hot encoding" or "dummy encoding". This article discusses some of the limitations and folklore around this method (such as the...

Encoding categorical variables

Non-numeric features generally have to be encoded into one or more numeric features before applying machine learning models. This article covers some of the different encoding techniques, the category_encoders package, and some of the pros and...

The James-Stein Encoder

One technique, sometimes called "target" or "impact" encoding, uses the average value of the target variable per value to encode. The James-Stein encoder is a twist the "shrinks" the target value back to the global average to stop statistical...