3

I am building a model that predicts user churn for a website, and I have data on all users, both past and present.

I can build a model that uses only the users who have left, but that leaves 2/3 of the total user population unused.

Is there a good way, from a conceptual standpoint, to incorporate data from the remaining users into the model?

soandos

2 Answers

5

This setting is common in reliability, health care, and mortality studies. The statistical method is called survival analysis. All users are coded according to their start date (or week or month), and users who have not yet left are kept in the data as right-censored observations: you know only that their time of defection is later than the time observed so far. You use the empirical data to estimate the survival function, which is the probability that the time of defection is later than some specified time t.
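As a rough illustration of estimating a survival function from both churned and still-active users, here is a minimal Kaplan-Meier sketch in plain Python. The function name `kaplan_meier` and the toy data are assumptions made for this example, not something from the question; in practice you would likely use a dedicated survival analysis library.

```python
from collections import Counter

def kaplan_meier(durations, churned):
    """Return [(t, S(t))] at each time where at least one churn was observed.

    durations: how long each user has been observed (e.g. months since signup)
    churned:   True if the user left at that time, False if still active
               (a right-censored observation)
    """
    events = Counter(t for t, c in zip(durations, churned) if c)
    exits = Counter(durations)  # churned or censored, both leave the risk set
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Multiply by the fraction of at-risk users who survive time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve

# Hypothetical data: six users, three of whom are still active (censored).
durations = [2, 3, 3, 5, 8, 8]
churned = [True, True, False, True, False, False]
curve = kaplan_meier(durations, churned)
for t, s in curve:
    print(f"S({t}) = {s:.3f}")
```

The active users never trigger a drop in the curve, but they do count in the at-risk denominator for as long as they have been observed, which is how their data gets used.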

Your baseline model will estimate the survival function for all users. Then you can do more sophisticated modeling to estimate which factors or behaviors predict defection (churn), given your baseline survival function. Basically, a model is predictive if the users it flags end up with a survival probability that is significantly lower than the baseline.
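To make the comparison against a baseline concrete, here is a small sketch that fits a constant-hazard (exponential) survival model to all users and to a flagged segment, then compares the two survival probabilities. The exponential model is a simplification chosen for brevity, and the "opened a support ticket" flag and all numbers are hypothetical.

```python
import math

def exponential_survival(durations, churned, t):
    """S(t) under a constant-hazard model fitted to right-censored data."""
    exposure = sum(durations)              # total observed user-time
    events = sum(1 for c in churned if c)  # number of observed churns
    hazard = events / exposure             # maximum-likelihood estimate
    return math.exp(-hazard * t)

# Hypothetical users: (months observed, churned?, opened a support ticket?)
users = [
    (2, True, True), (3, True, True), (6, False, True),
    (5, True, False), (8, False, False), (8, False, False), (9, False, False),
]

def survival_at(rows, t):
    return exponential_survival([r[0] for r in rows], [r[1] for r in rows], t)

baseline = survival_at(users, 6)                       # all users
flagged = survival_at([u for u in users if u[2]], 6)   # flagged segment only
print(f"baseline S(6) = {baseline:.3f}, flagged S(6) = {flagged:.3f}")
```

With real data you would also want a significance test or a regression model such as Cox proportional hazards, rather than eyeballing two numbers.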


Another approach is to identify precursor event patterns or user behavior patterns that foreshadow defection. Any given event or behavior pattern might occur for users who defect or for users who stay. For this analysis, you may need to restrict your data to users who have been members for some minimum period of time. The minimum period can be estimated from your survival function, or even from a simple histogram of the membership period of users who have defected.
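One possible way to pick the minimum membership period from the tenure distribution of defected users is sketched below. Using the lower quartile as the cutoff is an assumption made for illustration, as are the field names and the data.

```python
def minimum_tenure(churned_tenures, fraction=0.25):
    """Tenure below which roughly `fraction` of churned users had already left."""
    ordered = sorted(churned_tenures)
    index = min(int(fraction * len(ordered)), len(ordered) - 1)
    return ordered[index]

# Hypothetical membership periods (months) of users who have defected.
churned_tenures = [1, 2, 4, 6, 7, 9, 12, 15]
cutoff = minimum_tenure(churned_tenures)

# Hypothetical user table; keep only users with enough history to analyze.
users = [
    {"id": "a", "tenure_months": 2},
    {"id": "b", "tenure_months": 4},
    {"id": "c", "tenure_months": 10},
]
eligible = [u for u in users if u["tenure_months"] >= cutoff]
print(cutoff, [u["id"] for u in eligible])
```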

MrMeritology
0

Check this video on YouTube (https://www.youtube.com/watch?v=RHsO10q7e2Y). It is about a basic model (you can still optimize it) for churn prediction. It uses all available records (churned and not churned).

It only introduces sub-sampling when evaluating performance. You can also introduce sub-sampling in the training set, depending on the machine learning algorithm you are using.
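For reference, sub-sampling the training set could look roughly like the sketch below, which randomly down-samples the larger class until both classes have the same size. This is a generic illustration and not necessarily what the video does; the function name, the `churned` key, and the data are hypothetical.

```python
import random

def downsample_majority(rows, label_key="churned", seed=0):
    """Return a shuffled copy of rows with both classes equally represented."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    positives = [r for r in rows if r[label_key]]
    negatives = [r for r in rows if not r[label_key]]
    minority, majority = sorted([positives, negatives], key=len)
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced

# Hypothetical training set: 3 churned users and 9 active users.
rows = [{"user": i, "churned": i < 3} for i in range(12)]
balanced = downsample_majority(rows)
print(len(balanced), sum(r["churned"] for r in balanced))
```

Apply this to the training split only, so that the evaluation data keeps the real class proportions unless you deliberately sub-sample there as well.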

Hope this helps

Rosaria