Suppose we have created an army of $n$ clones that are completely identical (except that they may have different birthdays). The cloning happened at different times, such that all 365 birthdays (disregarding the 366th day) are equally likely.

What is the probability that at least 2 of them share a birthday in this indistinguishable setting?

In the original birthday problem the solution is $$1-\frac{{}^{365}P_n}{365^n},$$ but that solution assumes the people are distinguishable.

Also, note that for the indistinguishable case the solution $$1-\frac{\binom{365}{n}}{\binom{365+n-1}{n}}$$ is incorrect, because it ignores the probability weight of each outcome: the outcomes are not equally likely. For example, with two people, the outcome in which both birthdays are Sep 1 is less probable than the outcome consisting of Sep 1 and Sep 2, since the latter can occur in two ways.
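To make the unequal weights concrete, here is a minimal sketch that enumerates a small hypothetical year by brute force. It assumes independent, uniformly distributed birthdays; the function names (`multiset_weights`, `shared_probability`, `naive_multiset_formula`) are illustrative and not from any library.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import comb


def multiset_weights(days, n):
    """Probability of each unordered birthday outcome.

    Assumes independent, uniform birthdays: every ordered sequence has
    probability 1/days**n, and an unordered outcome collects the weight
    of all ordered sequences that sort to it.
    """
    weights = Counter()
    for seq in product(range(1, days + 1), repeat=n):
        weights[tuple(sorted(seq))] += Fraction(1, days ** n)
    return weights


def shared_probability(days, n):
    """Probability that at least two of the n birthdays coincide."""
    weights = multiset_weights(days, n)
    return sum(p for outcome, p in weights.items() if len(set(outcome)) < n)


def naive_multiset_formula(days, n):
    """The (incorrect) formula that treats all unordered outcomes as equally likely."""
    return 1 - Fraction(comb(days, n), comb(days + n - 1, n))


if __name__ == "__main__":
    # A two-day year with two clones: (1, 2) carries twice the weight of (1, 1).
    for outcome, p in sorted(multiset_weights(2, 2).items()):
        print(outcome, p)
    print("weighted:", shared_probability(2, 2))
    print("naive:   ", naive_multiset_formula(2, 2))
```

With a two-day year and two clones, the weighted enumeration and the naive multiset count give different values, which is exactly the discrepancy described above.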

RobPratt
  • 2
    I would say that "distinguishability" is a moderately useful and quite often abused didactical device. If it helps you choose the correct way of counting, good; if not, don't use it. The solution has to be $1-\frac{365\cdot 364\cdots(365-n+1)}{365^n}$ as you said. If you arrive at it by distinguishing the people, so be it. If they are indistinguishable, print out labels saying "Clone $1$" ... "Clone $n$", stick them to the clones, and make them distinguishable. (TBC) –  Sep 17 '22 at 06:04
  • (Cont'd) The second ("indistinguishable case") solution is not just incorrect but unjustified as well - you cannot just say "let me use the same formula as in the distinguishable case but replace any permutations with combinations, and it will automatically work." It won't necessarily. It may sometimes - but not here. –  Sep 17 '22 at 06:05
  • 3
    This question doesn't make sense. The answer can't possibly depend on whether you distinguish the clones or not; you ask for a probability, not a count, and your description of choosing the birthdays uniformly at random already determines the answer (because it's the same as the answer to the ordinary birthday paradox problem). – Qiaochu Yuan Sep 17 '22 at 06:05
  • @Stinking Bishop can you please elaborate on why "distinguishability" does not make a difference? Doesn't it affect the sample and event spaces? – John Man. Sep 17 '22 at 06:17
  • Sample and event space are of course affected, exactly as you said in the question. By choosing one formula vs. another you have picked one event space vs. the other. In one of them, as you said, all $365^n$ possibilities of the birthdays are equally probable; in the other one they are not. As the condition of the problem is that they must be equally probable, you have to use the formula which comes out using the "distinguishing" step. Perhaps it will help if you try it out with a hypothetical alien year that has only two days, and set $n=2$ and see what exactly happens. –  Sep 17 '22 at 06:22
  • (I mean, you've laid out those arguments in your question already. My point is: just saying that you "cannot distinguish the clones" in the problem statement does not change the event space by itself. It is your choice of the formula that changes it, and so you need to pick the right formula that corresponds to the given probability space, whatever it is.) –  Sep 17 '22 at 06:23
  • @StinkingBishop It all started to make sense when I approached it from a probabilistic standpoint instead of a combinatorial one, as you said. One question still remains for me (excuse me if it is not formulated well enough). Are there problem settings where distinguishability makes a difference from a probabilistic standpoint? I want to understand whether it is just a trick for counting or an important probabilistic aspect on its own. – John Man. Sep 17 '22 at 06:37
  • @JohnMan. Sorry, I don't really know the answer to that. In probability theory one starts with the probability space (or assumes it is known), so "distinguishability" doesn't play any role. However, there are all sorts of "paradoxes" stemming from incorrectly or ambiguously defining the probability space. (E.g. Monty Hall in a finite case, Bertrand's paradox in the infinite case.) I think that distinguishability is another aspect of that: it either helps you define the probability space correctly or leads you astray. But this is a personal view, people may disagree. –  Sep 17 '22 at 06:49

1 Answer

Since we are not told whether the clones' birthdays are independent, there is not enough information to determine the probability that no two share a birthday.

Suppose you were given that the clones' birthdays are independent. Let us number the clones in an arbitrary order. The definition of independence implies that for every sequence $(b_1,b_2,\dots,b_n)$, where $b_i$ is a day of the year for each $i\in \{1,\dots,n\}$, the probability of that sequence occurring (meaning the $i^\text{th}$ clone has birthday $b_i$ for all $i$) is $(1/365)^n$. In particular, all of these sequences are equally likely. Therefore, we can find the probability of all birthdays being different by counting the number of sequences where all entries are different, and dividing by $365^n$. The result is $$ \frac{365\cdot 364\cdots (365-n+1)}{365^n}, $$ which is the same result as in the distinguishable person case.
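The counting argument can be checked numerically. The following is a minimal sketch, assuming independent uniform birthdays as in the paragraph above; the helper names `p_all_distinct` and `p_all_distinct_bruteforce` are hypothetical, and `math.perm` requires Python 3.8 or later.

```python
from fractions import Fraction
from itertools import product
from math import perm


def p_all_distinct(n, days=365):
    """Closed form: days * (days - 1) * ... * (days - n + 1) / days**n."""
    # math.perm returns 0 when n > days, so the probability is 0 there.
    return Fraction(perm(days, n), days ** n)


def p_all_distinct_bruteforce(n, days):
    """Count equally likely ordered sequences with all entries different.

    Only feasible for small values, since it enumerates days**n sequences.
    """
    sequences = list(product(range(days), repeat=n))
    distinct = sum(1 for seq in sequences if len(set(seq)) == n)
    return Fraction(distinct, len(sequences))


if __name__ == "__main__":
    # The closed form agrees with direct enumeration on a small "year".
    print(p_all_distinct(3, days=5), p_all_distinct_bruteforce(3, days=5))
    # Probability that at least two of 23 clones share a birthday.
    print(float(1 - p_all_distinct(23)))
```

Labelling the clones is what makes the enumeration in `p_all_distinct_bruteforce` a list of equally likely cases; the labels do not change the probability being computed.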

The takeaway message here is that independence implies that the underlying sample space is most conveniently thought of as distinguishable, at least for the purpose of counting cases.

Mike Earnest