I am working on a binary classification problem that predicts customer churn. The data set is imbalanced, with 2000 non-churn observations and 600 churn observations.
Using a GLM, I see that when the majority class (non-churn) is the reference level, the confusion matrix shows an error rate of roughly 40% on both levels (churn and non-churn). When the minority class is set as the reference level, I get close to a 100% error rate on the minority class; in other words, almost everything is predicted as non-churn.
After balancing the data with SMOTE, the same pattern persists. How should I interpret this behaviour?
Does it mean that part of the non-churn population behaves like the churners, which explains the high error rate, while another subset of non-churn users behaves quite differently from the churners, which explains the lower error rate when the majority (non-churn) class is the reference?
Outcome on test data when majority class is set as the reference class:
Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
          0     1     Error      Rate
0         268   419   0.609898   =419/687
1         46    168   0.214953   =46/214
Totals    314   587   0.516093   =465/901
Outcome on test data when minority class is set as the reference class:
Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
          1     0     Error      Rate
1         3     211   0.985981   =211/214
0         1     686   0.001456   =1/687
Totals    4     897   0.235294   =212/901
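
To make the comparison easier to follow, here is a minimal Python sketch that recomputes the per-class error rates, the overall error, and the F1 score from the two matrices above. The helper name `summarize` is mine, not part of any library. It also rests on an assumption I have not verified: that the F1-optimal threshold is chosen with the non-reference level as the "positive" class, so churn (1) is positive in the first run and non-churn (0) is positive in the second.

```python
def summarize(tp, fn, fp, tn):
    """Per-class error rates, overall error and F1 for the class treated as positive."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "positive_error": fn / (tp + fn),   # actual positives predicted negative
        "negative_error": fp / (fp + tn),   # actual negatives predicted positive
        "overall_error": (fn + fp) / (tp + fn + fp + tn),
        "f1": 2 * precision * recall / (precision + recall),
    }

# Run 1: majority (0) is the reference level; churn (1) taken as positive.
case1 = summarize(tp=168, fn=46, fp=419, tn=268)

# Run 2: minority (1) is the reference level.
# ASSUMPTION: non-churn (0) is now the positive class for the F1 threshold.
case2 = summarize(tp=686, fn=1, fp=211, tn=3)

for name, stats in (("majority as reference", case1),
                    ("minority as reference", case2)):
    print(name, {k: round(v, 4) for k, v in stats.items()})
```

Under that assumption, F1 comes out at about 0.42 in the first run and about 0.87 in the second. If that is how the threshold is picked, predicting nearly everything as non-churn would simply be what maximises F1 for the non-churn class, and the difference between the two runs might come from the threshold choice and not from the fitted model.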