Encoding categorical variables

Non-numeric features generally have to be encoded into one or more numeric features before applying machine learning models. This article covers some of the different encoding techniques, the category_encoders package, and some of the pros and...

Long vs Wide Data

What does it mean for data to be in long form vs wide form, and when would you use each? In Pandas, how do you convert from one form to another?

The James-Stein Encoder

One technique, sometimes called "target" or "impact" encoding, uses the average value of the target variable per value to encode. The James-Stein encoder is a twist the "shrinks" the target value back to the global average to stop statistical...