I have a column with some NaNs in it, and I want to replace those NaNs with the column's mean, median, or mode.
Technically, the validation/test data is supposed to be unseen, so how could I include it when computing the average? That would bias the imputed values.
Do I "fit" the average on my training data only, just like scaling, and then reuse that value for the validation/test data? Or do I take the average over the entire dataset?
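
To make the first option concrete, here is a minimal sketch of what "fitting" the average on the training data only might look like. The toy values and the helper names `fit_fill_value` and `fill_missing` are made up for illustration; scikit-learn's `SimpleImputer` follows the same fit/transform pattern.

```python
import math
from statistics import mean

def fit_fill_value(column):
    """Learn the fill value (here the mean) from the non-missing entries."""
    observed = [x for x in column if not math.isnan(x)]
    return mean(observed)

def fill_missing(column, fill_value):
    """Replace every NaN in the column with the given fill value."""
    return [fill_value if math.isnan(x) else x for x in column]

# Hypothetical toy data: one column, already split into train and test.
train = [1.0, 2.0, float("nan"), 3.0]
test = [float("nan"), 10.0]

# "Fit" on the training column only ...
fill_value = fit_fill_value(train)

# ... then apply the same value to both splits.
train_filled = fill_missing(train, fill_value)
test_filled = fill_missing(test, fill_value)
```

In this sketch the test value 10.0 never influences the fill value, which is what I mean by treating imputation like scaling.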