I am working on an ML problem to predict house prices, and Zip Code is one feature that will be useful. I am also trying to use a Random Forest Regressor to predict the log of the price.
However, should I use One Hot Encoding or Label Encoder for Zip Code? I have about 2000 Zip Codes in my dataset, and performing One Hot Encoding will expand the number of columns significantly.
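
To make the trade-off concrete, here is a minimal sketch of the two options using scikit-learn. The data is synthetic (10,000 rows over 2,000 made-up zip codes), not my real dataset. I use OrdinalEncoder to stand in for the "label encoding" option, since as far as I understand LabelEncoder is intended for the target column rather than for input features.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Synthetic stand-in for the real data: 2,000 made-up zip codes (assumption).
rng = np.random.default_rng(0)
all_zips = np.array([f"{z:05d}" for z in range(10000, 12000)])

# Include every code once so all 2,000 appear, then add 8,000 random draws.
zip_col = np.concatenate([all_zips, rng.choice(all_zips, size=8000)]).reshape(-1, 1)

# One-hot: one column per distinct zip code, returned as a sparse matrix by default.
one_hot = OneHotEncoder(handle_unknown="ignore").fit_transform(zip_col)

# Ordinal: a single column of integer codes; the integer order carries no meaning.
ordinal = OrdinalEncoder().fit_transform(zip_col)

print(one_hot.shape)  # (10000, 2000)
print(ordinal.shape)  # (10000, 1)
print(one_hot.nnz)    # 10000 stored values, one per row
```

So the one-hot version does have 2,000 columns, but because it is sparse it stores only one non-zero value per row, while the ordinal version keeps a single column at the cost of imposing an arbitrary ordering on the zip codes.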
When to use One Hot Encoding vs LabelEncoder vs DictVectorizer?