In my work, my client sometimes complains that a subset of predictions is not accurate. I know it is nearly impossible to change the model to fit just that subgroup while the other predictions are doing well. But is that really the case? Other than building another model specifically for that subset, is there anything I can do to improve the predictions within that subgroup? What kinds of adjustments are possible?
2 Answers
It is common for some subset or subgroup of the data to be predicted poorly by an ML model. This can happen for several reasons:
- The subcategory may not have enough data to learn from
- Bias in how the data was captured
- The features created do not capture what distinguishes the subcategory
As you mentioned, simply retraining the model may not improve it, but you could try the following:
- Row-wise weighting: Increase the weights of these rows during training so that the model gives a little more importance to them while learning. XGBoost supports this natively.
- Feature engineering: Spend a little more time on feature engineering to capture variables that may help with this specific group.
- More data: This is often almost impossible, but see if you can collect more data for the subgroup.
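To make the row-wise weighting idea concrete, here is a minimal sketch in plain Python. It is not XGBoost; it uses a closed-form weighted least-squares line so that it runs without any dependencies. The toy data, the subgroup weight of 10, and the function names are all assumptions made up for illustration.

```python
def weighted_linear_fit(xs, ys, ws):
    """Closed-form weighted least squares for the line y = a * x + b."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    cov = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    var = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    a = cov / var
    b = my - a * mx
    return a, b


def mse(xs, ys, a, b):
    """Mean squared error of the line y = a * x + b on the given rows."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical toy data: 20 majority rows follow y = 2x,
# 5 subgroup rows follow y = 2x + 5 (a pattern one line cannot fit for both).
major_x = [float(i) for i in range(20)]
major_y = [2.0 * x for x in major_x]
sub_x = [float(i) for i in range(5)]
sub_y = [2.0 * x + 5.0 for x in sub_x]

all_x = major_x + sub_x
all_y = major_y + sub_y

# Fit 1: every row has weight 1.
plain_w = [1.0] * len(all_x)
a_plain, b_plain = weighted_linear_fit(all_x, all_y, plain_w)

# Fit 2: subgroup rows are upweighted (10 is an arbitrary choice).
boosted_w = [1.0] * len(major_x) + [10.0] * len(sub_x)
a_boost, b_boost = weighted_linear_fit(all_x, all_y, boosted_w)

sub_mse_plain = mse(sub_x, sub_y, a_plain, b_plain)
sub_mse_boost = mse(sub_x, sub_y, a_boost, b_boost)
major_mse_plain = mse(major_x, major_y, a_plain, b_plain)
major_mse_boost = mse(major_x, major_y, a_boost, b_boost)

print("subgroup MSE:", sub_mse_plain, "->", sub_mse_boost)
print("majority MSE:", major_mse_plain, "->", major_mse_boost)
```

The subgroup error goes down and the majority error goes up, which is the trade-off to watch for. With XGBoost's scikit-learn wrapper, the same per-row weights would be passed through the `sample_weight` argument of `fit`.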

Regarding the 'feature engineering': do you mean I need to devise new feature engineering techniques/transformations just for this subgroup of data? Would the inconsistency between this subgroup (which uses method B) and the rest (which use method A) cause any issues? Or do you mean I need to change the feature engineering for all the data and re-apply it, just for the sake of tweaking the performance for this particular subgroup? – Student Apr 28 '22 at 03:14
No, the features would be created for all rows. The idea is to find features that work well for the subgroup, create them for the whole dataset, and then refit the model. – Ashwiniku918 Apr 28 '22 at 06:17
The answer is really to take a deeper dive into why that subgroup is not performing. Its members may have different characteristics and may need a different model. However, when first looking into this, treat it as a business problem more than a tweaking problem. I would start by pulling out this group, looking at its records one by one, and seeing whether there is a business explanation for why the group is not performing the way you think it should. Certainly, if the business is complaining, they should be able to give you more insight on this.
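Before going through records one by one, it can help to quantify how the subgroup differs from the rest. The following is a minimal sketch, assuming a regression setting with hypothetical group labels and made-up numbers; it reports the row count, mean absolute error, and mean signed error (bias) per group.

```python
from collections import defaultdict


def metrics_by_group(groups, y_true, y_pred):
    """Return row count, MAE, and mean signed error for each group label."""
    buckets = defaultdict(list)
    for g, t, p in zip(groups, y_true, y_pred):
        buckets[g].append(p - t)  # signed error: prediction minus truth

    report = {}
    for g, errs in buckets.items():
        n = len(errs)
        report[g] = {
            "n": n,
            "mae": sum(abs(e) for e in errs) / n,
            "bias": sum(errs) / n,
        }
    return report


# Hypothetical example: "sub" is the subgroup the client complains about.
groups = ["rest", "rest", "rest", "rest", "sub", "sub"]
y_true = [10.0, 12.0, 9.0, 11.0, 20.0, 22.0]
y_pred = [10.5, 11.5, 9.5, 10.5, 16.0, 17.0]

report = metrics_by_group(groups, y_true, y_pred)
for name, stats in report.items():
    print(name, stats)
```

A large negative or positive bias in one group suggests the model is systematically under- or over-predicting there, which is a useful starting point for the conversation with the business.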

What kind of statistics would you look at when comparing this subgroup against the rest? Any methodologies? Any specific metrics? – Student Apr 28 '22 at 03:13
You can usually implement a subgroup analysis through the statistical model. For example, in a logistic regression model you would construct interaction terms with the subgroup indicator and then contrast the effect of each subgroup on the outcome. However, that only identifies the possibilities. The business should be able to give you some clue as to why they think the group is not performing. That could be some other variable that you do not have in your model. Even so, there is never a guarantee that a model will perform equally well for all subgroups. – Ralph Winters Apr 28 '22 at 14:29
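As a rough illustration of the interaction-term idea, here is a dependency-free sketch. Note that it swaps in ordinary least squares for logistic regression to keep the code short, and the data is a made-up, noise-free example; the column layout `[1, x, g, x * g]` and all names are assumptions for illustration only.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def ols(rows, ys):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data: g = 0 is the rest, g = 1 is the subgroup.
# Rest follows y = 1 + 2x; subgroup follows y = 3 + 0.5x.
rows, ys = [], []
for g in (0, 1):
    for i in range(6):
        x = float(i)
        rows.append([1.0, x, float(g), x * g])  # intercept, x, g, interaction
        ys.append(1.0 + 2.0 * x if g == 0 else 3.0 + 0.5 * x)

b_const, b_x, b_group, b_inter = ols(rows, ys)
print("baseline slope:", b_x)
print("subgroup intercept shift:", b_group)
print("subgroup slope shift (interaction):", b_inter)
```

A clearly non-zero interaction coefficient indicates that the feature relates to the outcome differently in the subgroup. On real, noisy data you would also want standard errors, which a statistics package would provide.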