
In January, there were 2,700 new sign-ups and 3,500 opt-outs. As at the end of January, there were 60,000 customers in our database.

In February, there were 3,400 new sign-ups and 4,300 opt-outs. As at the end of February, there were 59,100 customers in our database.

Looking at the new sign-up and opt-out components, we see that: (a) new sign-ups increased by 700 from Jan to Feb; (b) opt-outs increased by 800 from Jan to Feb.

From Jan to Feb, the number of customers in our database decreased by 900. However, my superior queried that the figures look odd: since new sign-ups increased by 700 and opt-outs increased by 800, the overall number of customers should only have decreased by 100.
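A minimal Python sketch, assuming sign-ups and opt-outs are the only changes to the database (no other adjustments), puts the two calculations side by side. The balances are stocks; sign-ups and opt-outs are monthly flows. The helper name `net_flow` is illustrative only.

```python
# Figures from the question: end-of-month balances (stocks) and monthly flows.
sign_ups = {"Jan": 2_700, "Feb": 3_400}
opt_outs = {"Jan": 3_500, "Feb": 4_300}
end_balance = {"Jan": 60_000, "Feb": 59_100}

def net_flow(month: str) -> int:
    """Net change in the database during one month: joiners minus leavers."""
    return sign_ups[month] - opt_outs[month]

# Stock-flow identity: closing balance = opening balance + that month's net flow.
change_jan_to_feb = end_balance["Feb"] - end_balance["Jan"]
assert change_jan_to_feb == net_flow("Feb")  # -900 == 3,400 - 4,300

# The superior's figure compares the flows month over month instead.
superior_estimate = (sign_ups["Feb"] - sign_ups["Jan"]) - (opt_outs["Feb"] - opt_outs["Jan"])
print(change_jan_to_feb, superior_estimate)  # -900 -100

# That -100 is the change in the net flow, not the change in the balance.
assert superior_estimate == net_flow("Feb") - net_flow("Jan")  # -900 - (-800)
```

The two numbers answer different questions: -900 is how much the database shrank in February, while -100 is how much worse February's net flow was than January's.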

I tried explaining that the figures cannot be reconciled by taking the month-over-month differences of each component. Needless to say, my explanation was not accepted. Can anyone advise on what paradox this is? It seems similar to Simpson's paradox.

I'd appreciate your advice!

  • It may help you to imagine this scenario: suppose that in February, the sign-up and opt-out numbers were the same as in January. Then new sign-ups would increase from January to February by $0$, as would new opt-outs. But the number of customers would not change by $0$ (it would decrease by $800$). So the change in customers in the database from month 1 to month 2 depends only on the number of sign-ups and opt-outs in month 2. The previous month's sign-up and opt-out numbers are irrelevant to this. And no, this isn't Simpson's Paradox. – Minus One-Twelfth Aug 27 '19 at 11:15
  • Presumably "opt out" refers to someone who opts to be removed from the database, not someone who simply declines to be added to it, right? – Barry Cipra Aug 27 '19 at 11:17
  • Ask your superior: what if in February there had also been 2,700 new sign-ups and 3,500 opt-outs? Would the overall number of customers remain unchanged? – AgentS Aug 27 '19 at 11:27
  • Simpson's Paradox involves data that are all of the same type, just separable into identifiable subsets. Here the error seems to be making an inference using data of one type (the change in the numbers of people leaving or joining) as if they were data of a different type (the actual numbers of people who left or joined in one month). It's a blatant error in reasoning; Simpson's Paradox is much more subtle. – David K Aug 28 '19 at 00:35
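The thought experiment in the comments (a February whose flows are identical to January's) can be checked with a short Python sketch. The function names are hypothetical, and it assumes sign-ups and opt-outs are the only changes to the database:

```python
def predicted_change_superior(prev: dict, curr: dict) -> int:
    """The superior's reasoning: change in sign-ups minus change in opt-outs."""
    return (curr["sign_ups"] - prev["sign_ups"]) - (curr["opt_outs"] - prev["opt_outs"])

def actual_change(curr: dict) -> int:
    """The change in the database over a month depends only on that month's flows."""
    return curr["sign_ups"] - curr["opt_outs"]

jan = {"sign_ups": 2_700, "opt_outs": 3_500}
# Hypothetical February with exactly January's flows (the scenario in the comments).
feb_same = {"sign_ups": 2_700, "opt_outs": 3_500}

print(predicted_change_superior(jan, feb_same))  # 0
print(actual_change(feb_same))                   # -800
```

The superior's method predicts no change, yet the database still loses 800 customers, which is exactly the counterexample the comments describe.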

3 Answers


No, it's not a paradox. The problem is with your [theoretical] supervisor, who is observing only the rate of change (first derivative) of sign-ups vs opt-outs. It's as if the total database content is distance, your sign-ups and opt-outs are speed, and your supervisor is looking at acceleration.
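The distance/speed/acceleration analogy can be made concrete with discrete differences, the finite analogue of a derivative. This is a sketch using only the question's figures; the helper `diff` is illustrative, not from any library.

```python
# End-of-month balances from the question ("distance").
balances = [60_000, 59_100]                  # end Jan, end Feb
# Net monthly flows, sign-ups minus opt-outs ("speed").
net_flows = [2_700 - 3_500, 3_400 - 4_300]   # Jan, Feb -> [-800, -900]

def diff(xs: list[int]) -> list[int]:
    """Forward differences: each element minus the one before it."""
    return [b - a for a, b in zip(xs, xs[1:])]

# The first difference of the balance equals February's net flow.
print(diff(balances))    # [-900]
# The first difference of the net flow ("acceleration") is the superior's -100.
print(diff(net_flows))   # [-100]
```

Reading -100 as the change in the balance confuses the second difference with the first.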

poetasis

No. The discrepancy arises because you are trying to use the January figures for sign-ups and opt-outs even though you aren't showing the end-of-December customer numbers. The difference between sign-ups and opt-outs in February is 3,400 - 4,300 = -900, precisely the change in customers between the end of January and the end of February.
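To use January's flows at all, you need an opening balance for January. Under the assumption (not stated in the question) that sign-ups and opt-outs are the only changes to the database, a small Python sketch can back out the end-of-December figure those flows imply:

```python
sign_ups = {"Jan": 2_700, "Feb": 3_400}
opt_outs = {"Jan": 3_500, "Feb": 4_300}
end_jan, end_feb = 60_000, 59_100

# Implied opening balance for January: closing balance minus January's net flow.
implied_end_dec = end_jan - (sign_ups["Jan"] - opt_outs["Jan"])
print(implied_end_dec)  # 60800

# January's check holds by construction; it becomes a real test only when an
# actual end-of-December count is available to compare against 60,800.
assert end_jan - implied_end_dec == sign_ups["Jan"] - opt_outs["Jan"]  # -800
# February's flows reconcile with the reported January and February balances.
assert end_feb - end_jan == sign_ups["Feb"] - opt_outs["Feb"]          # -900
```

If the real end-of-December count differs from 60,800, the database has changes other than sign-ups and opt-outs, which is a separate question from the superior's objection.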


There's no Simpson's paradox here; your superior is merely confused in interpreting the numbers. It's more reminiscent of the Missing Dollar Riddle.

Barry Cipra