When doing regression or classification with a categorical attribute that takes $n$ possible values, there are two options:
1. Feed this attribute directly into your model.
2. Partition your data into $n$ pieces based on the categorical attribute and train a separate model on each piece. At inference time, pick the model that matches the example's value of that attribute.
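
To make the two options concrete, here is a minimal sketch using plain least squares in NumPy. The helper names (`fit_single`, `fit_per_category`, and so on) are hypothetical, the model is assumed to be linear purely for illustration, and at least two category levels are assumed.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef

def one_hot(cat, levels):
    # Drop the first level so the dummies are not collinear with the intercept.
    return np.column_stack([(cat == lv).astype(float) for lv in levels[1:]])

# Approach 1: one model, with the categorical attribute fed in as dummy columns.
def fit_single(X, cat, y, levels):
    return fit_linear(np.column_stack([X, one_hot(cat, levels)]), y)

def predict_single(coef, X, cat, levels):
    return predict_linear(coef, np.column_stack([X, one_hot(cat, levels)]))

# Approach 2: one model per category value, selected at inference time.
def fit_per_category(X, cat, y, levels):
    return {lv: fit_linear(X[cat == lv], y[cat == lv]) for lv in levels}

def predict_per_category(models, X, cat):
    y_hat = np.full(len(X), np.nan)  # stays NaN for categories unseen in training
    for lv, coef in models.items():
        mask = cat == lv
        if mask.any():
            y_hat[mask] = predict_linear(coef, X[mask])
    return y_hat
```

Here `X` is a 2-D array of the remaining features and `cat` is a 1-D array of category labels. Note that in this sketch approach #1 only shifts the intercept per category, while approach #2 lets every coefficient differ.
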
One advantage of approach #2 is that it allows more specific feature engineering. E.g., if you are modeling property prices and decide to build separate models for residential and industrial properties, you can choose the features that are relevant to each.
Another advantage of approach #2 that I can think of is that it can linearize otherwise non-linear relations. E.g., for a residential property, having a railroad track nearby almost always heavily reduces the value, while for an industrial property it could be a massive value booster. A single linear model could capture this only with an explicit interaction term between property type and railroad proximity.
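
The railroad example can be illustrated with a small simulation. This is only a sketch on synthetic data: the effect sizes (-30 for residential, +40 for industrial) and all variable names are made up for illustration. It compares a pooled additive model, separate per-type models, and a single model with an interaction term.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
industrial = rng.integers(0, 2, size=n).astype(float)  # 0 = residential, 1 = industrial
railroad = rng.integers(0, 2, size=n).astype(float)    # 1 = railroad track nearby

# Synthetic prices (assumed effects): a railroad costs residential
# properties 30 units and adds 40 units to industrial ones.
railroad_effect = np.where(industrial == 1, 40.0, -30.0)
price = 100 + 50 * industrial + railroad_effect * railroad + rng.normal(size=n)

def ols(columns, y):
    """Least-squares fit with an intercept; coef[0] is the intercept."""
    A = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

# Approach 1, additive only: a single railroad coefficient has to serve both types.
pooled = ols([railroad, industrial], price)

# Approach 2: separate fits, each free to pick its own sign.
res = industrial == 0
per_res = ols([railroad[res]], price[res])
per_ind = ols([railroad[~res]], price[~res])

# Approach 1 with an explicit interaction term.
interacted = ols([railroad, industrial, railroad * industrial], price)

print("pooled railroad coefficient:   ", pooled[1])
print("residential-only coefficient:  ", per_res[1])
print("industrial-only coefficient:   ", per_ind[1])
print("interacted model, residential: ", interacted[1])
print("interacted model, industrial:  ", interacted[1] + interacted[3])
```

For least-squares linear models, the fully interacted single model reproduces the per-type coefficients exactly, so part of the choice between #1 and #2 comes down to whether the model family can learn such interactions on its own and whether each partition has enough data to be fit separately.
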
In general, what factors go into deciding between approach #1 and #2?