
When I first heard about the two-daughter problem, I thought it was quite clear (after, of course, initially falling into the trap like many of us), but at second glance I encountered some serious problems with my understanding.

The original problem seems to be quite easy: Assume that the only thing you know about a man with two kids is that at least one of the kids is a daughter. What is the probability that the other kid is a daughter as well? (Boys and girls are assumed to be born equally often.)

After the first impulse ("1/2, of course!"), it becomes clear that the answer is only 1/3. The problem can be mapped to a situation where, from the multitude of families with two children, only those with M/M are ruled out, while the equally frequent cases F/F, F/M and M/F remain, making F/F only one third of all remaining cases.
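A minimal sketch of this counting argument, assuming the four birth orders (elder, younger) are equally likely; the function name is just for illustration:

```python
from fractions import Fraction
from itertools import product

# Each family is (elder, younger); "F" = daughter, "M" = son.
# Assumption: all four orderings are equally likely (boys and girls equally often).
FAMILIES = list(product("FM", repeat=2))

def p_two_daughters_given_at_least_one() -> Fraction:
    """Condition on 'at least one daughter' by discarding the M/M family."""
    remaining = [fam for fam in FAMILIES if "F" in fam]
    favourable = [fam for fam in remaining if fam == ("F", "F")]
    return Fraction(len(favourable), len(remaining))

print(p_two_daughters_given_at_least_one())  # 1/3
```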

But now, meet Mr. Smith. I don't know much about him (except that he has two children), but when he approached me, he told me: "I am so happy! Victoria just got the scholarship she wanted!"

Now what is the probability that Victoria has a sister?

Since I only know that Mr. Smith has two children, and one is obviously a girl, I am tempted to map this onto the two-daughter-problem, leading to the answer "1/3".

But wait! What if I first ask Mr. Smith whether Victoria is his elder daughter? Assume his answer is yes (and ignore any problems with twins - even then one is typically a few seconds "older" than the other). So now I know that from the cases (F/F, F/M, M/F), M/F also drops out. And now, the probability for F/F just rose to 1/2.

Okay, but what if his answer is no? Then Victoria is the younger one, and F/M drops out. Again, the probability rises to 1/2.

So I'm going to just ask him: "Well, Mr. Smith, is Victoria your elder daughter? Wait - don't answer, because whatever you may answer, it doesn't matter. The probability just rose from 1/3 to 1/2."

Or, even better, I do not even have to ask him: just thinking about the question will shift the probability to 1/2, which means that the original probability for Victoria having a sister must already have been 1/2. But then the mapping to the two-daughter problem is obviously false.

Where is my error?

Making things worse, I could also create a setup where Mr. Smith just tells me: "I have two kids, and at least one of them is a girl." I then ask him: "Oh, can you give me a name of a daughter of yours?" and he answers: "Sure. Victoria."

(Side note: I have a gut feeling that this has something to do with which probability distributions to assume behind such situations, similar to the Two envelopes problem, but I can't figure it out completely yet.)

-------- UPDATE --------

It seems that my error was to assume that the question "Is Victoria the older child?" changes the probabilities; it does not. If I know for sure that Mr. Smith was picked from a uniformly distributed (M/F, F/M, F/F) sample, then the knowledge that Victoria is the older child does not change anything, as was pointed out in the answers, and the probability for her having a sister is 1/3.
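A small exact check of this claim, under assumptions added only for illustration: Mr. Smith is drawn uniformly from the F/F, F/M and M/F families (written elder/younger), and a father of two daughters names either one with equal probability. Whichever answer he gives to "Is Victoria the older child?", the probability of a sister stays at 1/3:

```python
from fractions import Fraction

# Illustrative assumptions (not stated in the question itself):
#  * Mr. Smith comes from F/F, F/M, M/F (elder/younger), each with probability 1/3;
#  * with two daughters, the one he names is chosen uniformly at random.
PRIOR = {("F", "F"): Fraction(1, 3), ("F", "M"): Fraction(1, 3), ("M", "F"): Fraction(1, 3)}

def joint_weights():
    """Yield (family, position of the named daughter, probability)."""
    for family, p_family in PRIOR.items():
        daughters = [pos for pos, sex in enumerate(family) if sex == "F"]
        for pos in daughters:
            yield family, pos, p_family / len(daughters)

def p_sister_given_position(position: int) -> Fraction:
    """P(F/F | the named daughter is at `position`; 0 = elder, 1 = younger)."""
    matching = [(fam, w) for fam, pos, w in joint_weights() if pos == position]
    total = sum(w for _, w in matching)
    both = sum(w for fam, w in matching if fam == ("F", "F"))
    return both / total

print(p_sister_given_position(0), p_sister_given_position(1))  # 1/3 1/3
```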

But it is very interesting that solely from the sentence "Victoria just got the scholarship she wanted!" I can NOT infer that Mr. Smith is indeed chosen from this uniform distribution.

Imagine that all kids have the same chance to get a scholarship, and the happy father will tell us if it is the case. Then it is actually twice as probable that Mr. Smith will tell us about his daughter's success if he has two girls, so the weighting of the four possibilities (M/M, F/M, M/F, F/F) is (0, 1, 1, 2). And in this case, the probability of Victoria having a sister is 1/2.
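The same kind of exact count for the (0, 1, 1, 2) weighting, assuming every child is equally likely to be the one whose success gets mentioned, so that a family is heard from in proportion to its number of daughters:

```python
from fractions import Fraction
from itertools import product

# Assumption from the paragraph above: each child is equally likely to win,
# and any daughter's success gets reported, so a family's weight is
# proportional to its number of daughters: (M/M, F/M, M/F, F/F) -> (0, 1, 1, 2).
FAMILIES = list(product("FM", repeat=2))  # (elder, younger), each with prob 1/4

def p_sister_weighted_by_daughters() -> Fraction:
    weights = {fam: Fraction(1, 4) * fam.count("F") for fam in FAMILIES}
    total = sum(weights.values())
    return weights[("F", "F")] / total

print(p_sister_weighted_by_daughters())  # 1/2
```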

So another problem in my reasoning is the mapping of Mr. Smith's statement to the two-daughter problem. Put simply, without knowing more about the circumstances that led to Mr. Smith telling me about Victoria, I can't say whether the probability is 1/3 or 1/2.

Now I've got a headache...

polfosol
Thern
  • Forget the headache, have a pat on the back! You have reached the correct answer to the original question. Unless you are told how the Smith family was chosen in the first place, it is unanswerable. (Although as you say in one comment, most plausible real-life scenarios that could lead to such a question give a probability of 1/2.) – David Hartley Feb 18 '17 at 14:47
  • I didn't even read past the first 2 paragraphs. For me, probability theory is not applicable here in its "pure form": you cannot ask about probabilities of actual facts of life. What is the probability that Napoleon had blue eyes according to what you know? There is no "probability" here: he either had blue eyes or he didn't, regardless of what you know. What is the probability I am typing this from my office on Saturday evening? – Alexey Feb 18 '17 at 15:27
  • @Alexey I just flipped a fair coin. Is it meaningful to talk about the probability of it being heads? – mathematician Feb 18 '17 at 15:54
  • @mathematician, if you have already flipped it, then you cannot talk about probability: either it is heads or tails, and you know it. What is the probability that you are lying? – Alexey Feb 18 '17 at 15:58
  • I have posted a more detailed answer below. – Alexey Feb 18 '17 at 15:59
  • klajok's answer is nice. But why the rush in accepting it? This way you might lose some other interesting views on your interesting question. – polfosol Feb 18 '17 at 18:04
  • @polfosol I am quite new to this page, but I will consider your point for my next question. – Thern Feb 18 '17 at 18:07
  • I think you may have failed, in a sense, to ask the question you intended: if you ask him whether his older child is a daughter, and he says "yes", and then, given this fact, you ask about his younger child, the probability will now be a half, since this is a second unbiased sample. See my answer below. – it's a hire car baby Feb 18 '17 at 19:04
  • @RobertFrost I did not ask him whether his older child is a daughter, but whether the child he mentioned is the older one. However, I made the mistake of accidentally shifting the base probabilities, as I argued that now the father must come from the {F/M, F/F} multitude - but now F/M and F/F are not equally probable anymore, as klajok pointed out in his/her answer. – Thern Feb 18 '17 at 19:17
  • @BlueRaja-DannyPflughoeft The question is similar, but I don't think it is a duplicate because I had a "proof" where I was looking for an error. The linked question contains valuable answers, but I think it would not have helped me to see the pitfalls in my own reasoning. – Thern Feb 19 '17 at 09:27
  • For more fun, suppose Mr. Smith tells you he has two children. You ask him to name a gender he has a child of. He answers female. What is the probability he has two girls? Assuming Mr. Smith makes his choices uniformly randomly, the answer is back to one half. –  Feb 19 '17 at 12:03
  • But wait! What if I ask Mr. Smith first, if Victoria is his elder daughter? Assume his answer is yes (and ignore any problems with twins - even then one is typically a few seconds "older" than the other). So now I know that from the cases (F/F, F/M, M/F), M/F also drops out. And now, the probability for F/F just rose to 1/2. I would have said in that case it rose to 1. He can't have an elder daughter if he has only one daughter. – Martin Rattigan Feb 19 '17 at 18:40
  • @Nebr I'm sorry, you state "...while the equally often cases F/F, F/M and M/F remain...", I don't understand how the cases F/M and M/F are considered separately when asking whether or not the man has two daughters (with no additional details provided), could you please explain? As it seems to me, the man can have two sons, two daughters or a son and a daughter. If we know that he has at least one daughter, the two sons case drops out, leaving us with two equally probable cases, one of which is favorable, thus the chance is 50%. – user3209815 Feb 20 '17 at 14:24
  • @user3209815 If you look at the multitude of all families with two children, one son and one daughter will occur twice as often as two daughters. The probability for two daughters is 1/2 * 1/2 = 1/4, the probability for two sons is 1/4 as well, so this leaves 1 - 1/4 - 1/4 = 1/2 for the remaining case of one son and one daughter. – Thern Feb 20 '17 at 15:30

18 Answers


I think the confusion arises because the classical boy-girl problem is ambiguous:

'You know that Mr. Smith has two kids, one of which is a girl. What is the chance she has a sister?'

The ambiguity here is that, from this description, it is not clear how we came to know that 'Mr. Smith has two kids, one of which is a daughter.'

Consider the following two scenarios:

Scenario 1:

You have never met Mr. Smith before, but one day you run into him in the store. He has a little girl with him, who, he tells you, is one of his two children.

Scenario 2:

You are a TV producer, and you decide to do a show on 'What is it like to raise a daughter?', and you put out a call for such parents to come on the show. Mr. Smith agrees to come on the show, and as you get talking he tells you that he has two children.

Now notice: the original description applies to both cases. That is, in both cases it is true that you know that 'Mr. Smith has two children, one of which is a daughter'.

However, in scenario 1, the chance of Mr. Smith having two daughters is $\frac{1}{2}$, but in scenario 2 it is $\frac{1}{3}$. The difference is that in the first scenario one specific child has been identified as female (and thus the chance of having two daughters amounts to her sibling being female, which is $\frac{1}{2}$), while in the second scenario no specific child is identified, so we can't talk about 'her sibling' anymore, and instead have to consider a conditional probability which turns out to be $\frac{1}{3}$.
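A rough Monte Carlo sketch of the two scenarios. It assumes that in scenario 1 the child Mr. Smith brings to the store is picked at random from his two children, and models scenario 2 as conditioning only on "at least one daughter"; the function name and parameters are illustrative:

```python
import random

def simulate(n_families: int = 200_000, seed: int = 1) -> tuple[float, float]:
    """Estimate P(two daughters) under the two scenarios.

    Scenario 1 (store): one of the two children, picked at random, is with
    Mr. Smith, and that child turns out to be a girl.
    Scenario 2 (TV show): Mr. Smith qualifies because he has at least one daughter.
    """
    rng = random.Random(seed)
    s1_hits = s1_total = s2_hits = s2_total = 0
    for _ in range(n_families):
        kids = [rng.choice("FM"), rng.choice("FM")]
        both_girls = kids == ["F", "F"]
        # Scenario 1: condition on the randomly chosen companion being a girl.
        if rng.choice(kids) == "F":
            s1_total += 1
            s1_hits += both_girls
        # Scenario 2: condition only on "at least one daughter".
        if "F" in kids:
            s2_total += 1
            s2_hits += both_girls
    return s1_hits / s1_total, s2_hits / s2_total

p1, p2 = simulate()
print(f"scenario 1: {p1:.3f}, scenario 2: {p2:.3f}")  # values close to 0.5 and 0.333
```

The only difference between the two branches is which event we condition on, which is exactly the point about one specific child being identified.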

Now, your original scenario, where you don't know anything about Mr. Smith other than that he has two children, and then Mr. Smith says 'I am so happy Victoria got a scholarship!', is like scenario 1, not scenario 2. That is, unless Mr. Smith has two daughters called Victoria (which is possible, but extremely unlikely, and if he did, one would expect him to say something like 'my older Victoria'), with his statement Mr. Smith has singled out one of his two children, making it equivalent to scenario 1.

Indeed, I would bet that most real-life cases where at some point it is true that 'you know some parent to have two children, one of which is a girl' are logically isomorphic to scenario 1, not scenario 2. That is, the classic two-girl problem is fun and all, but most of the time the description of the problem is ambiguous from the start, and if you are careful to phrase it in a way so that the answer is $\frac{1}{3}$, you will realize how uncommon it is for that kind of scenario to occur in real life. (Indeed, notice how hard I had to work to come up with a real-life scenario that is at least somewhat plausible.)

Finally, none of the variations (whether Victoria is the older or the younger child, or whether you even know her name, as in 'Mr. Smith tells you one of his children got a scholarship to the All Girls Academy') change any of the probabilities, as you argued correctly: in most real-life scenarios, the way you come to know that 'Mr. Smith has two children, one of which is a girl' (and I would say that includes your original scenario) means that the chance of the other child being a girl is $\frac{1}{2}$, not $\frac{1}{3}$.

So, when at the end of your original post you ask "where is my error?", I would reply: your 'error' is that you assumed that the correct answer should be $\frac{1}{3}$, and since your argument implied that it would be $\frac{1}{2}$, you concluded that there must have been an error in your reasoning. But, as it turns out, there wasn't! For your scenario, the answer is indeed $\frac{1}{2}$, and not $\frac{1}{3}$. So your 'error' was to think that you had made an error!

Put a different way: you were temporarily blinded by the pure math (and I say 'temporarily' because you ended up asking all the right critical questions, and later realized that the classic two-girl problem is ambiguous: good job!). What I mean is: we have seen this two-girl problem so often, and we have been told that the solution is $\frac{1}{3}$ so many times, that you immediately assumed it is also the correct answer in your described scenario... when in fact that is not the case, because the initial assumptions are different: the classic problem assumes a Type 2 scenario, but the scenario described in your post is a Type 1 scenario.

It's just like the Monty Hall problem ... We have seen it so often, that as soon as it 'smells' like the Monty Hall problem, we say 'switch!' ... when in fact there are all kinds of subtle variants in which switching is not any better, and sometimes even worse!

Also take a look at the Monkey Business Illusion: we have seen that video of the gorilla appearing in the middle of people passing a basketball so many times that we can now surprise people on the basis of that!

Bram28
  • +1 I came to the same conclusion just now. It strongly depends on the circumstances of the setting. Unfortunately, I can't mark more than one answer as the correct one, but your answer and those of klajok and zoli explain the problems I have encountered. (And, interestingly, the first gut feeling of probability 1/2 seems to be mostly correct in typical real world scenarios.) Thanks everybody for your answers and ideas! – Thern Feb 18 '17 at 13:48
  • @Nebr You're welcome! And yes, for your scenario, I would say 1/2 is the correct answer. – Bram28 Feb 18 '17 at 14:04
  • @Nebr I think what this example really demonstrates is that mathematical idealizations (or at least the assumptions going into them) don't always apply to real life. This particularly holds true for problems about incomplete information, e.g. In the Monty Hall problem, how can we possibly know that Monty behaves a certain way rather than some other way? And this is even more so when dealing with word problems about those situations, since it can be hard to separate between what is not stated but should nevertheless be assumed, and what is not stated because we really don't know. – Bram28 Feb 18 '17 at 14:13
  • @Nebr Just looking at your addendum, where at the end you say that you can't tell whether it should be $\frac{1}{3}$ or $\frac{1}{2}$ ... I would say that your *original description*, where you assume that you don't know anything about Mr. Smith other than that he has two children (right before he makes the announcement that tells you that he has a daughter called Victoria), is definitely a Type 1 scenario, and thus the answer really is $\frac{1}{2}$ ... So what I am saying is that you had it right all along: it is $\frac{1}{2}$, not $\frac{1}{3}$. Your original reasoning was all correct! – Bram28 Feb 18 '17 at 18:47
  • Only if I interpret "I know nothing about Mr. Smith except that he has two children" as "Mr. Smith has been chosen arbitrarily from a multitude of parents with two children of equally distributed sexes". But this interpretation is not necessarily correct, although it is plausible in a real-world scenario. If I met Mr. Smith by chance in reality and he made this statement, my confidence that his other child is a girl would indeed be 1/2, and I would also bet money this way. But in a strictly mathematical sense, the problem is ambiguously formulated. – Thern Feb 18 '17 at 18:57
  • With the first scenario, with the little daughter at the store, isn't it possible that he has a younger child, that might be off with a babysitter, or home with his/her mom, etc.? – Drew Christensen Feb 20 '17 at 04:50
  • @Nebr - Bram28 - More on "specific": // [a] Ask the father with two children if he has AT LEAST one daughter. If the answer is yes, the chance of another daughter is 1/3. // [b] Ask the father with two children to put his hand on ONE of his children, then ask whether the child under his hand is a daughter. Whatever he answers, the chance that the remaining child is a daughter is 1/2. // Only in case [a] is the father's answer any help in increasing information (reducing entropy) about the sex of the remaining sibling, which is not surprising, because the father's answer in [b] was ignored. – Craig Hicks Feb 20 '17 at 12:16

Let us take a pragmatic approach to this. For the first problem:

Step 1: Round up a million men, each of whom has two children.
Step 2: Tell all of the men who have no daughters to go home.
Step 3: Ask all of the remaining men who have two daughters to raise their hands.

Obviously, about one third of the remaining men will raise their hands: about 750,000 men remain, and about 250,000 of them have two daughters.

For the second problem:

Step 1: Round up a million men, each of whom has two children.
Step 2: Tell all of the men who don't have a daughter named Victoria to go home. (We can ignore the scholarship.)
Step 3: Ask all of the remaining men who have two daughters to raise their hands.

Now, suppose 1 in 100 girls are named Victoria. (The exact figure doesn't matter.) Then of the 500,000 fathers with a daughter and a son, 5,000 of them will have daughters named Victoria; and of the 250,000 fathers with two daughters, 5,000 of them will also have a daughter named Victoria (because they have 500,000 daughters in total). Therefore, of the 10,000 men remaining, 5,000 will raise their hands.

So the probability that Mr Smith has two daughters is $1/2$.
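A simulation sketch of the second procedure, using the same illustrative figure of 1 in 100 girls named Victoria and treating each girl's name as an independent draw (an assumption; the function name and seed are hypothetical):

```python
import random

def fraction_two_daughters_given_victoria(n_men: int = 1_000_000,
                                          p_victoria: float = 0.01,
                                          seed: int = 7) -> float:
    """Estimate P(two daughters | the father has a daughter named Victoria).

    Assumption: each child is a girl with probability 1/2, and each girl is
    independently named Victoria with probability p_victoria (boys never are).
    Two sisters could both be named Victoria here; for small p_victoria
    that case is negligible.
    """
    rng = random.Random(seed)
    kept = two_daughters = 0
    for _ in range(n_men):                   # Step 1: a million fathers of two
        kids = (rng.choice("FM"), rng.choice("FM"))
        has_victoria = any(kid == "F" and rng.random() < p_victoria for kid in kids)
        if not has_victoria:                 # Step 2: everyone else goes home
            continue
        kept += 1
        two_daughters += kids == ("F", "F")  # Step 3: count the raised hands
    return two_daughters / kept

estimate = fraction_two_daughters_given_victoria()
print(f"{estimate:.3f}")  # close to 0.5, not 1/3
```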

TonyK
  • Thank you for the answer. One main point I could deduce from the answers was that the mapping between Mr. Smith's statement and a probability distribution is the crucial problem. Simply put, it is not clear under which circumstances Mr. Smith's statement has happened. As soon as the probability distribution is known, the answer becomes straightforward. Since you provide a backstory for both examples that define a probability, your conclusion is right, but without that backstory, it is not defined. – Thern Feb 18 '17 at 19:49
  • @TonyK Exactly! I used two different TV show scenarios to make it a little more realistic, but I made the same point: the OP was absolutely correct initially to say it is $\frac{1}{2}$, rather than $\frac{1}{3}$ – Bram28 Feb 18 '17 at 19:50
  • @Nebr But you told us the backstory ... which is that you don't know anything about Mr. Smith other than him having two kids. That makes it $\frac{1}{2}$. If the backstory was that you already knew that Mr. Smith has a daughter, and then he says 'I am so glad Victoria got a scholarship!', then it would be $\frac{1}{3}$. – Bram28 Feb 18 '17 at 19:53
  • @Bram28 I am currently not sure if "not knowing anything" can be identified with a uniform distribution. Note that the "Two envelopes problem" (link in my original question) shows that it can be very wrong to assume a uniform distribution when nothing is known. – Thern Feb 18 '17 at 20:04
  • @Nebr I think the issue with the two envelopes problem has nothing to do with uniform or non-uniform distributions. And in this case: if I genuinely don't know anything about Mr. Smith other than him having two kids, I would say it is perfectly reasonable to assume a uniform distribution over MM, MF, FM, and FF before he makes his comment about Victoria getting a scholarship. – Bram28 Feb 18 '17 at 20:13
  • @Bram28 As I understand the two envelopes problem, it has everything to do with assuming a uniform distribution: only in the case of a uniform distribution does the paradox unfold, since only then can I argue that the expectation value for switching is (0.5x + 2x)/2 > x, regardless of the value of x. But I agree that assuming a uniform distribution over MM, MF, FM, and FF is reasonable; it is just not compelling. – Thern Feb 18 '17 at 20:26
  • @Nebr Sorry, I wasn't clear. Yes, in the two envelope problem we are assuming a uniform distribution ... but we still get a paradox. In this case, once you assume a uniform distribution, any kind of confusion goes away. And yes, with the Mr. Smith case you did not spell out exactly what the conditions were, even when you said that all you knew of Mr. Smith is that he has two children. But to me that is like saying that when we throw some dice, we can't apply any probabilities because I didn't say that the dice were not biased. At some point, you have to make some reasonable assumptions. – Bram28 Feb 18 '17 at 20:36
  • I read the details of this answer after posting my own, but I have a "backstory" that corresponds exactly to the situation that someone walks up to me and makes the same statement Mr. Smith did, where the only other relevant information I have is that this person does not have more than two children or fewer than two children. And the conclusion is the same. (The scholarship doesn't change the odds, but the name does.) – David K Feb 18 '17 at 21:54
  • I come up with a very slight difference in the answer, however, by assuming a certain dependency in the joint distribution of girls named Victoria in two-daughter families. Under this assumption, it turns out the frequency of the name Victoria does matter. – David K Feb 18 '17 at 21:58
  • @DavidK: Yes, a girl is slightly less likely to be named Victoria if she has an older sister, because her older sister might already be named Victoria. And you are correct that the size of the appropriate adjustment does depend on the frequency of Victorias. I ignored this subtlety, because the important point is that the answer is (approximately) $1/2$, not (approximately) $1/3$. BTW, I like your answer. – TonyK Feb 18 '17 at 22:51
  • Right, when I said "it matters" I meant if you make some absurd assumption such as $P(\mathrm{Victoria})\approx1$ for a first daughter, you get a significantly different answer. Once you get down to $1/100$ or less (where the actual probability is), worrying about the dependence between sisters' names is like worrying about the fact that the male-female ratio in the population isn't actually $1:1.$ – David K Feb 19 '17 at 02:11

Let $XY$ denote that the sex of the younger sibling is $X$ and that of the older sibling is $Y$. $X$ and $Y$ may be $M$ or $F$, male and female. We have the following four elementary events

$$\{FF, FM, MF, MM\}.$$

These are equally likely, so $P(\{XY\})=\frac14$ for all possible $X,Y$.

The event that at least one of the siblings is a girl is

$$\{FF, FM, MF\}.$$

The event that both siblings are females is

$$\{FF\}.$$

We want to calculate the following conditional probability

$$P(\{FF\}\mid \{FF, FM, MF\})=\frac{P(\{FF\})}{P(\{FF, FM, MF\})}=\frac{\frac14}{\frac34}=\frac13.$$

The question remains: do we agree that the following two questions are the same question?

  • What is the probability that in a family both children are girls assuming that at least one of the children is a girl?

  • Assume that in a family of two children one of the children is a girl. What is the probability then that the other child is also a girl?

EDIT

Assume that a father says that he has a daughter and that she is older than his other child. Then our question changes:

Assume that the older child is a girl; what is the probability that the younger child is also a girl? Our conditional probability is then:

$$P(\{FF\}\mid \{FF, MF\})=\frac{P(\{FF\})}{P(\{FF,MF\})}=\frac{\frac14}{\frac12}=\frac12.$$

So, there is no contradiction. The second question is simply another question.

EDIT 2

I am only thinking... I realize that whatever the most honorable father's answer is, the probability changes to $\frac12$. Wrong! Let's see what happens if I don't get an answer. Then the answer is either yes or no. That is, we have the following conditional probability:

$$P(\{FF\}\mid \{FF, MF\}\cup\{FF, FM\})=\frac{P(\{FF\})}{P(\{FF, FM, MF\})}=\frac{\frac14}{\frac34}=\frac13.$$

zoli
  • Thank you, but I know the explanation of the 1/3 probability. The question is where the error is in my logic. Since it is a step-by-step derivation, at least one step must be wrong. – Thern Feb 18 '17 at 11:49
  • @Nebr: I did an edit, please take a look. – zoli Feb 18 '17 at 12:00
  • Okay, but by simply not asking him, I change the question and the probabilities - and that is the paradox. Please note that in my example, I did not ask Mr. Smith (it was just a thought experiment) and thus did not modify the question. – Thern Feb 18 '17 at 12:06
  • OK. Make me think! I am getting too lazy. – zoli Feb 18 '17 at 12:07
  • @Nebr: I've edited... – zoli Feb 18 '17 at 12:15
  • Yes, I see now, my thinking that the question for the elder daughter changes probabilities is indeed wrong, thanks. – Thern Feb 18 '17 at 13:26

One way to interpret an interview such as this with a particular person (while avoiding concerns such as whether we can define things like the probability that Napoleon had blue eyes) is to view the conversation as the result of some kind of sampling process. Then the question becomes one of the ratio of second-daughters to no-second-daughters in the population from which Mr. Smith was "sampled." As pointed out in other answers, however, we have information that restricts the sub-population to which Mr. Smith might belong, and the construction and composition of that subpopulation are what matters.

In the original two-daughter question, we have to be very careful to obtain our information in such a way that we get an unbiased sample of one family from the space of all two-child families in which at least one child is a girl. Essentially, we want something equivalent to asking a yes-no question whose answer is "yes" in the cases FF, FM, and MF and "no" in the case MM or for anyone who does not have exactly two children. Then we accept the first person who answers "yes" to this question as our sample of one.

Equivalently, we could list all the people in the world that have exactly two children in birth order FF, FM, or MF--that is, we remove from the list anyone with children MM or with more or fewer than two children--and randomly sample one person from that list.

When our sampling method consists of Mr. Smith volunteering the information about Victoria's scholarship, this is equivalent not to sampling from a list of people with children in birth order FF, FM, or MF, but to a much more limited list of people:

  1. Starting with a list of all people in the world, first we remove everyone with more or fewer than two children or with two boys.

  2. Now we remove anyone who doesn't have a child that recently received a scholarship.

  3. Now we have to start making some reasonable assumptions, such as that if both of Mr. Smith's children had recently received scholarships, he would have mentioned both of them. So we remove anyone from the list whose children both received scholarships recently.

  4. Now we remove anyone from the list who has not just now bragged to a comparative stranger about their child's recent scholarship.

  5. Now we remove anyone from the list whose child who recently received a scholarship is not named Victoria.

The remaining list after these five steps is the list of people from whom Mr. Smith has been selected in an unbiased fashion.

Up through the end of Step 4, it seems reasonable to assume that the list of people has equal numbers of people from the population with children born in the order FF, FM, or MF. But Step 5 changes that. Assuming no boys are named Victoria, but that every girl has an equal (but small) chance to be named Victoria, we retain about twice as many people from the FF part of the list as from the FM part of the list. The precise proportions are as follows: if each girl has a small probability $p$ to be named Victoria unless she already has a sister named Victoria, and if we have $N$ people in each of the sublists FF, FM, and MF, we keep $Np$ people from each of the sublists FM and MF and $N(1 - (1-p)^2) = N(2p - p^2) \approx 2Np$ people from the FF list.

We therefore have $2Np$ people remaining in the list with children in birth order FM or MF, and approximately $2Np$ people remaining in the list with two daughters. As Mr. Smith is sampled unbiased from this list, he has a second daughter with probability approximately $\frac12.$

If we know the probability of a random girl to be named Victoria, we could calculate a more exact probability, which would be slightly less than $\frac12.$ But the only way the probability would be $\frac13$ is if practically every girl is named Victoria except those who have older sisters named Victoria. (Blended families with half-sisters could also complicate this calculation a bit if we tried to account for them, but let's assume there are few enough of these that they have only a small effect on the probabilities.)
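Carrying the counts from the previous paragraphs one step further, under the same assumptions ($N$ people in each of the sublists FF, FM and MF, and naming probability $p$), the exact value is

$$P(\text{second daughter}) = \frac{N(2p - p^2)}{N(2p - p^2) + 2Np} = \frac{2 - p}{4 - p}.$$

This tends to $\frac12$ as $p \to 0$ and reaches $\frac13$ only at $p = 1$; for an illustrative $p = \frac{1}{100}$ it gives $\frac{199}{399} \approx 0.499$.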

The reason that you get the probability $\frac12$ after asking Mr. Smith apparently useless questions, such as whether Victoria is his older daughter, is that the probability was already $\frac12$ before you asked the useless questions.

David K
  • Nice! I was about to add something about the probability of girls being named Victoria myself but you just did. Excellent analysis! – Bram28 Feb 18 '17 at 22:53
  • I like this approach. Although you can of course argue that this is only correct if Mr. Smith would have only told me about the scholarship of his daughter if she was named Victoria, and else would have remained silent - which is a quite unlikely assumption in a real-world scenario. (Else, the probability would be exactly 1/2.) – Thern Feb 19 '17 at 09:02
  • It's kind of weird to talk about sampling when there doesn't need to be any sampling involved. Do you really need that? – user541686 Feb 19 '17 at 09:02
  • @Mehrdad Without knowing the sampling method, there is no way to calculate a probability. Of course, this is also a valid approach: stating that the sampling method is ultimately unknown, although reasonable (but not strictly provable) assumptions exist, and thus rejecting the question because of incomplete information. – Thern Feb 19 '17 at 09:06
  • @Nebr: I'm not sure if you read my comment. I'm saying sampling is a red herring (i.e. it is irrelevant) and doesn't even need to occur here at all. Here's why: instead of picking out 1 man and asking about his children, you can just ask "what fraction of men ..." and talk about the composition of the entire population as a whole. You could even let the population be infinitely sized if you want. Now your entire answer breaks down because it relies on sampling, and yet nobody is sampling anything. So you might want to rewrite your answer. – user541686 Feb 19 '17 at 09:15
  • @Mehrdad I did not write this answer. But isn't asking "what fraction of men from the whole population..." completely equivalent to assume a sampling over all people with a uniform distribution? Note that this uniform distribution is not mandatory, it is already an assumption about the sampling from which Mr. Smith is picked. – Thern Feb 19 '17 at 09:20
  • @Nebr: (Oh sorry, I didn't realize it wasn't you.) But you're saying the same thing I'm saying. As it is right now, the answer is pointing to the sampling process as the culprit, yet my entire point is that an equivalent question with the exact same problem doesn't need to involve sampling at all... so if you think the problem is sampling, then you haven't identified the problem correctly, because the problem is still there without sampling. See what I mean? – user541686 Feb 19 '17 at 09:29
  • @Mehrdad The question "What fraction of men..." does not involve any problem. "What fraction of men with two children of which at least one is a daughter has two daughters" is unequivocally 1/3. "What fraction of men with two children of which at least one is named Victoria has two daughters" is also clearly defined, if you know the frequency of the name Victoria (as David K pointed out in his answer). There is no mathematical problem here, all problems arise from having to guess which sample to take (see also the answer of Bram28). – Thern Feb 19 '17 at 09:59
  • @Nebr: Yes, you caught me... because this is English and we're talking about the real world. That's fine for answering the question; I'm not objecting to that. What I am objecting to is that that fact does not justify pointing to sampling as the source of the mathematical problem, because the problem can arise without sampling. It just wouldn't involve fathers and daughters. For example: (cont'd) – user541686 Feb 19 '17 at 10:29
  • @Nebr: (cont'd) or actually, maybe it's not even unrealistic... maybe it can still involve fathers & daughters. Consider what would happen if the distribution of children in 2-child families was P(M,M) = 1/3, P(M,F) = 1/6, P(F,M) = 1/6, P(F,F) = 1/3, rather than the usual P(·,·) = 1/4. In both cases, the male/female ratio is 50/50 across the population. Now if you ask "What fraction of men with two children of which at least one is a daughter have two daughters?" the answer is unequivocally 1/2, right? (cont'd) – user541686 Feb 19 '17 at 11:20
  • @Nebr: So now, what would happen if I didn't tell you what the families were like, and I asked you the same question—about the entire population? The question would now have the same ambiguity, even without any sampling being involved. So my point is, the actual problem isn't the sampling. It's the ambiguity in the full joint distribution of the population under consideration, and it's there regardless of whether we are talking about sampling or about the whole population. Does this make sense? – user541686 Feb 19 '17 at 11:20
  • @Nebr: By the way, the reason I'm pointing out the distinction between a sampling process and the joint distribution of the population is just to prevent a poor soul studying probability from reading this answer and naively thinking that it doesn't apply to a similar problem that concerns joint distributions of random variables just because there might not be any sampling involved in question. I think this is likely to occur for a student starting to study probability (especially Bayesian) but not yet thinking about statistics/sampling. I do realize you see what is going on! :) – user541686 Feb 19 '17 at 12:02
  • @Mehrdad Actually the "sampling" is not the main concern here, it's the subpopulation from which you sample. Steps 1 through 5 are all about "what fraction of men". The only role of sampling here is to translate between the underlying "what fraction of men" problem and the anecdote that involved an interview with one particular man. By the way, I originally had steps 4 and 5 in the other order but swapped them in order to avoid suggesting that people only brag about scholarships won by children named Victoria. Step 5 might come about because people brag about Mary instead when that's her name. – David K Feb 19 '17 at 14:10
  • @DavidK: "Actually" is a weird word choice considering you're saying exactly the same thing I've been saying in my comments. We're both saying the concern is the population that you're studying, not really the sampling itself. We don't disagree. Great! Change the beginning of your answer to reflect what you just said so it doesn't potentially confuse people who are learning, is all I'm saying. The first sentence should say something about the population under study instead of focusing on sampling. – user541686 Feb 19 '17 at 14:19
  • @Mehrdad By "actually" I meant "actually I did say in my answer". I did not mean to contradict your comment! But you have a good point about the first paragraph; to mention only "sampling" there is a distraction from the point of the answer. I have reworded it. – David K Feb 19 '17 at 14:35
  • Ah. +1 okay great! – user541686 Feb 19 '17 at 20:57
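Mehrdad's point (that the ambiguity lives in the joint distribution, not in any sampling step) can be checked with exact fractions. The sketch below compares the usual uniform distribution with the hypothetical correlated one from his comment: both give a 50/50 sex ratio, yet they answer "what fraction of men with at least one daughter have two daughters?" differently. Function names are illustrative.

```python
from fractions import Fraction

# Families keyed by (elder, younger); "M" = son, "F" = daughter.
uniform = {pair: Fraction(1, 4) for pair in [("M", "M"), ("M", "F"), ("F", "M"), ("F", "F")]}

# Hypothetical correlated distribution taken from the comment above.
correlated = {
    ("M", "M"): Fraction(1, 3),
    ("M", "F"): Fraction(1, 6),
    ("F", "M"): Fraction(1, 6),
    ("F", "F"): Fraction(1, 3),
}


def daughter_share(dist):
    """Expected fraction of all children who are daughters."""
    return sum(p * pair.count("F") for pair, p in dist.items()) / 2


def two_daughters_given_one(dist):
    """Fraction of at-least-one-daughter families that have two daughters."""
    at_least_one = sum(p for pair, p in dist.items() if "F" in pair)
    return dist[("F", "F")] / at_least_one


for name, dist in [("uniform", uniform), ("correlated", correlated)]:
    print(name, daughter_share(dist), two_daughters_given_one(dist))
```

Both distributions have a daughter share of 1/2, but the conditional fraction is 1/3 for the uniform one and 1/2 for the correlated one, with no sampling involved anywhere.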
2

Your error: of the cases $FF, FM, MF$ in the two-daughter problem, Victoria rules out one of $MF$ and $FM$.

  • Thank you for your answer. I didn't understand it though. Why should Victoria rule out MF or FM? It is still perfectly possible that she has an older or younger brother. – Thern Feb 18 '17 at 11:48
  • She isn't of male gender. –  Feb 18 '17 at 11:48
  • Of course not. She is the F in MF or FM. I still didn't get you, sorry. – Thern Feb 18 '17 at 12:07
  • @Nebr: so obviously the other combination is impossible. –  Feb 18 '17 at 12:49
2

Summary

Mr. Smith has Victoria (i.e. has at least one daughter):

  1. Victoria has a sister with probability $\frac{1}{3}$.

  2. Knowing the above, we now ask whether Victoria is the older child:

    • Victoria is the older child: she has a younger sister with (still the same) probability $\frac{1}{3}$.

    • Victoria is the younger child: she has an older sister with (still the same) probability $\frac{1}{3}$.

Details

As was noted, if Mr. Smith has at least one daughter, then each of the (ordered) child pairs FM, FF, MF has probability $\frac{1}{3}$, so Victoria has a sister with probability $\frac{1}{3}$.

If Mr. Smith is talking about Victoria, then he is talking about:

  • younger daughter from the pair FM with probability $\frac{1}{3}$
  • younger daughter from the pair FF with probability $\frac{1}{6}$
  • older daughter from the pair FF with probability $\frac{1}{6}$
  • older daughter from the pair MF with probability $\frac{1}{3}$

Now we ask if Victoria has a sister:

If Mr. Smith says that Victoria is his younger daughter, then with probability $\frac{\frac{1}{3}}{\frac{1}{3} + \frac{1}{6} } = \frac{2}{3}$ she comes from the pair FM, and with probability $\frac{\frac{1}{6}}{\frac{1}{3} + \frac{1}{6} } = \frac{1}{3}$ she comes from the pair FF. In other words, Victoria has an older sister with probability $\frac{1}{3}$.

A similar analysis applies to the opposite situation: if Victoria is the older child, then she has a younger sister with probability $\frac{1}{3}$.

Rationale

Victoria has a sister with probability $\frac{1}{3}$. In the initial analysis we can think of FM, MF, FF as ordered pairs of the form "(child with the smaller favorite bear, child with the bigger favorite bear)". So it doesn't matter if we next ask who is older, since age is independent of that ordering (at least we can assume that :) ). So the answer to the question about age should not change the probabilities.

I hope I was clear enough.

  • +1 I have the feeling that this points into the right direction. If we take the situation that I absolutely know that Mr. Smith is chosen from a set of equally distributed families where simply the M/M cases have been removed, then indeed probabilities remain at 1/3 even if I ask if Victoria is the younger / older daughter. But I have to know that; solely from the information "Two kids, Victoria has got a scholarship" I can't directly infer this distribution. So it seems to boil down to the question which distribution is assumed. – Thern Feb 18 '17 at 11:59
  • @Nebr Assumptions are 1) all pairs MM, MF, FM, FF have exactly the same probability $\frac{1}{4}$ 2) we know nothing more than Mr. Smith has "Two kids, one is Victoria". So we don't know that Mr. Smith is my neighbor and I know his family ;) – user402556 Feb 18 '17 at 12:15
  • @Nebr and the last one 3) there is no correlation between gender and relative age of children. – user402556 Feb 18 '17 at 12:23
  • And point 1) is exactly the critical one. We have to assume that a father with MF, FM and FF is equally likely to tell us about the scholarship of his daughter. But a father with FF is twice as likely to tell us about this! So the assumption that all cases are equally likely is at least questionable from the original talk with Mr. Smith. It depends on the circumstances. – Thern Feb 18 '17 at 13:21
  • Still, it is absolutely correct that asking about the elder/younger daughter does not change the probabilities, as you pointed out. Either the starting probability weights are (0,1,1,2) or (0,1,1,1), but whatever they are, asking (or not asking) can't change them (as your calculation proved). Thanks for pointing that out. – Thern Feb 18 '17 at 13:24
  • @Nebr: This answer is wrong, and should be unaccepted. See my answer for details. – TonyK Feb 18 '17 at 19:39
  • @TonyK The answer is correct in a way that it answers my question "Where is my error?": I assumed that asking for Victoria being the older sibling changes probabilities, but that is wrong, as is shown here. But there are many good answers here, and if I could accept more than one, I would do so. – Thern Feb 18 '17 at 19:52
  • @Nebr: klajok says "Victoria has a sister with probability $\frac13$." This is wrong. – TonyK Feb 18 '17 at 21:23
  • It is such a terrible shame that just because a girl named Victoria has a younger sister instead of a younger brother, her chance of getting a scholarship is cut in half. How do the colleges make such decisions? – David K Feb 19 '17 at 07:30
  • It's true that the question doesn't change the probability, but the only reason this answer seems to support that is because the answer is wrong twice. It's like if you subtracted $9-2$ and got $6$ one time, and then you subtracted $9-2$ in a different way and got $7,$ and someone resolves the contradiction by telling you why the second way also should have come out to $6.$ – David K Feb 19 '17 at 07:34
  • @DavidK I changed the accepted answer because it obviously raises more questions for people than it answers, but I still think it is correct in the sense that it pointed out my main logical error in the "proof". Note that if I already knew that Mr. Smith had been selected from a uniformly distributed {M/F, F/M, F/F} setting, it would be correct to say that asking whether Victoria is the elder child does not change the probabilities. So it's more like if I subtracted 8-2 and got 7, and someone pointed out it is 6, even though the correct equation would be 9-2=7. – Thern Feb 19 '17 at 08:53
  • @Nebr OK, your analogy may be more apt. It's your choice which answer to accept, but for what my opinion is worth, I think you made a good choice. – David K Feb 19 '17 at 14:19
  • @Nebr: I think it's misleading to say that the four cases are not equally likely; a better description is that the sample space needs more information than just the sexes of the two kids, and that the unequal probabilities you describe are the conditional probabilities resulting from the given information. –  Feb 19 '17 at 16:36
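The claim in this thread, that asking whether Victoria is the elder daughter cannot shift the probability whatever the starting weights are, can be checked with a small sketch. It takes relative weights for hearing about a daughter from an FM, MF or FF family (pairs written as (older, younger), which may be the reverse of the labels in klajok's answer) and, as an assumption, splits an FF family's weight evenly between its two daughters. Weights (1, 1, 1) correspond to klajok's numbers; (1, 1, 2) is the "a father with FF is twice as likely to tell us" weighting from Nebr's comments.

```python
from fractions import Fraction


def sister_probabilities(w_fm, w_mf, w_ff):
    """Return (P(sister), P(sister | Victoria older), P(sister | Victoria younger)).

    w_fm, w_mf, w_ff are relative weights for hearing about a daughter from a
    family of that type, with pairs written (older, younger). An FF family's
    weight is split evenly between its two daughters (an assumption).
    """
    cases = {  # (family, Victoria's position) -> weight
        ("FM", "older"): Fraction(w_fm),
        ("MF", "younger"): Fraction(w_mf),
        ("FF", "older"): Fraction(w_ff, 2),
        ("FF", "younger"): Fraction(w_ff, 2),
    }

    def conditional(position=None):
        # Keep every case when position is None, otherwise only matching ones.
        kept = {k: w for k, w in cases.items() if position in (None, k[1])}
        return sum(w for k, w in kept.items() if k[0] == "FF") / sum(kept.values())

    return conditional(), conditional("older"), conditional("younger")


print(sister_probabilities(1, 1, 1))  # klajok's weighting
print(sister_probabilities(1, 1, 2))  # FF twice as likely to be mentioned
```

The first weighting gives 1/3 in all three positions and the second gives 1/2 in all three: the starting weights matter, the elder/younger question does not.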
2

The only question here is whether the fact that Victoria is a girl made you more likely to hear about her.

Case 1: You live in a country where parents, if they have a daughter, will never say anything about their sons, but will instead talk about one of their daughters.

Case 2: Parents are equally likely to talk about kids of either gender.

Case 3: Parents are more likely to mention their daughters, but do sometimes talk about sons.

In case 1, the probability that Victoria's sibling is a sister is 1/3. In case 2, it's 1/2. In case 3, the probability is intermediate.
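The three cases can be folded into one hypothetical parameter p: the probability that a family with one son and one daughter talks about the daughter, while a two-daughter family always talks about a daughter and a two-son family never does. Assuming the four ordered families are equally likely, the answer works out to 1/(1 + 2p); the sketch below computes it exactly.

```python
from fractions import Fraction


def p_sister_given_mention(p):
    """P(the daughter we hear about has a sister).

    Hypothetical model: the four (older, younger) families MM, MF, FM, FF are
    equally likely; a two-daughter family always talks about a daughter, a
    mixed family talks about its daughter with probability p, and a two-son
    family never talks about a daughter.
    """
    p = Fraction(p)
    mention_ff = Fraction(1, 4)              # FF: always mentions a daughter
    mention_mixed = 2 * Fraction(1, 4) * p   # MF and FM together
    return mention_ff / (mention_ff + mention_mixed)


for label, p in [("case 1", 1), ("case 2", Fraction(1, 2)), ("case 3", Fraction(3, 4))]:
    print(label, p_sister_given_mention(p))
```

p = 1 is case 1 (1/3), p = 1/2 is case 2 (1/2), and any p strictly between them is case 3. Setting p = 0 models the son-preferring country from Hurkyl's comment and gives probability 1.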

user49640
  • A more striking case to add is a country where people always prefer to talk about sons if possible, in which case the fact you heard about Victoria means she has a sister with 100% probability! –  Feb 19 '17 at 16:38
  • @Hurkyl, yes, absolutely right. – user49640 Feb 19 '17 at 18:05
2

The error is that you consider the order of events for something where each event is independent. In other words - when you end up with three possibilities, i.e.

FF, FM and MF

you are wrong. The order doesn't matter.

You can look at it like this.

There is a 50% chance that the known F is the oldest child. This gives you two valid combinations:

FF and FM (with equal probability)

There is a 50% chance that the known F is the youngest child. This gives you two valid combinations:

FF and MF (with equal probability)

So the probability for FF is:

P = 0.5*0.5 + 0.5*0.5 = 0.5

Now if you are told that the known F is the oldest, you'll instead get:

P = 1.0*0.5 = 0.5

As you can see, that does not change the probability at all.
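The 0.5 figures above can be reproduced by enumerating the four equally likely (older, younger) families and assuming, as this answer implicitly does, that the "known F" is a child picked uniformly at random. For contrast, the second function conditions only on "at least one girl". Function names are illustrative.

```python
from fractions import Fraction
from itertools import product

# The four equally likely (older, younger) families.
FAMILIES = list(product("MF", repeat=2))


def p_ff_given_seen_girl(position=None):
    """P(FF | a uniformly random child turns out to be a girl).

    If position is "older" or "younger", additionally condition on the seen
    child being in that position.
    """
    num = den = Fraction(0)
    for family in FAMILIES:
        for pos, sex in zip(("older", "younger"), family):
            weight = Fraction(1, 4) * Fraction(1, 2)  # pick the family, then the child
            if sex != "F" or position not in (None, pos):
                continue
            den += weight
            if family == ("F", "F"):
                num += weight
    return num / den


def p_ff_given_at_least_one_girl():
    at_least_one = sum(Fraction(1, 4) for family in FAMILIES if "F" in family)
    return Fraction(1, 4) / at_least_one


print(p_ff_given_seen_girl(), p_ff_given_seen_girl("older"), p_ff_given_at_least_one_girl())
```

Under the random-child assumption the answer is 1/2 with or without the "oldest" information, whereas conditioning only on "at least one girl" gives 1/3; which model fits depends on how the girl came to be "known".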

0

This problem actually bears a strong connection to the classic Monty Hall problem.

In this problem, the man tells you that at least one of his children is a girl - as the only other possibility is a boy, this gives three possibilities: FM, FF, or MF.

In the Monty Hall problem, Monty tells you that there are three doors, with one having a car and two having goats. Without loss of generality, let's say that the middle door actually has the car (but you don't know this) - so the order becomes GCG.

But in the Monty Hall problem, you choose one door, and then Monty reveals a goat behind one of the other two doors.

So if you look at the two remaining doors (other than the one you chose), you could have GC, GG, or CG - this maps perfectly onto the possibilities in this problem.

And then Monty revealing the goat behind one of these doors is equivalent to the man answering the question "is the girl you refer to the elder child?" - and the probabilities actually work exactly the same.

One of the interesting things about the Monty Hall problem is that the probability of winning depends on how likely the events are to play out in certain ways: if Monty reveals a door at random, then even though the actual result (Monty reveals a goat) is the same, the chance that you chose the car becomes 50% rather than the ~33% of the original formulation of the problem.

And the same applies to this problem - if you ask the man "Is the girl you refer to (Victoria) the elder child?" and he answers "yes", then the probability stays the same (~33% chance that the younger child is a girl). But if you ask "Is your elder child a girl?" and he answers "yes", then the probability that the younger child is a girl becomes 50%.

This isn't because of the information gained, but because of the underlying likelihood of the response given the question asked. In both cases, you learn that the elder child is a girl.

This is because you're not being asked to work out the chances of the man having two girls. You're asking for how confident you can be in the claim that the man has two girls given the information you have and the assumptions you take. And you're assuming a randomly chosen father of two children in a world where gender of each child is independent and equally likely to be male or female.

Then you're given further information. If you're simply informed, by another source, that his eldest child is a girl, without being told how that information was gathered, you're going to have to make an assumption about how the information was gathered.

If you assume that he was asked to identify one of the children (elder or younger) as a girl, then the chance that the other is a girl is 1/3. If you assume that he was asked whether his elder child was a girl, the chance that the other is a girl is 1/2.

And if you assume that he was asked whether the younger child was a girl, then the chance that the younger is a girl is 0: clearly, he said "no, the younger child is not a girl", and thus, in conjunction with "at least one child is a girl", your source concluded that the older child must be a girl.


As I understand it, this is the principle behind Bayesian statistics - the "prior" is the assumption about how the information was obtained, in this case, and the prior is updated as appropriate.

Suppose that you were told that there were three possible questions, and one was selected from among those three - "Is your elder child a girl?", "Is your younger child a girl?", and "Please identify one of your children, elder or younger, as a girl" (assuming that the father would choose at random if both are girls).

Now, you're told that this man's elder child is a girl. You can immediately rule out the second question (no answer to the second question would allow your source to identify the elder child as a girl), and the first question is more likely than the third. This allows you to update your prior: your assumption about the information and its underlying distribution.
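This three-question scenario can be checked by brute force. The sketch assumes all four (elder, younger) families are equally likely, each question is chosen with probability 1/3, a father with two daughters picks one at random for the third question, and, following the paragraph above, the question about the younger child never leads the source to report "the elder child is a girl". The question labels are made-up strings.

```python
from fractions import Fraction
from itertools import product

FAMILIES = list(product("MF", repeat=2))  # (elder, younger), each with prob 1/4
QUESTIONS = ["is elder a girl?", "is younger a girl?", "identify a girl"]


def report_prob(question, family):
    """P(source reports 'the elder child is a girl' | question, family)."""
    if question == "is elder a girl?":
        return Fraction(1) if family[0] == "F" else Fraction(0)
    if question == "is younger a girl?":
        return Fraction(0)  # assumed: this question never yields that report
    # "identify a girl": the father names a daughter, at random if he has two
    girls = [i for i, sex in enumerate(family) if sex == "F"]
    return Fraction(girls.count(0), len(girls)) if girls else Fraction(0)


def posterior(questions=QUESTIONS):
    """Return (P(question | report), P(two girls | report))."""
    joint = {
        (q, fam): Fraction(1, len(questions)) * Fraction(1, 4) * report_prob(q, fam)
        for q in questions
        for fam in FAMILIES
    }
    total = sum(joint.values())
    p_question = {q: sum(w for (qq, _), w in joint.items() if qq == q) / total for q in questions}
    p_two_girls = sum(w for (_, fam), w in joint.items() if fam == ("F", "F")) / total
    return p_question, p_two_girls


print(posterior())
```

Under these assumptions the first question ends up at 4/7 against 3/7 for the third, and the chance of two girls is 3/7. Restricting to a single question gives 1/2 for "is elder a girl?" and 1/3 for "identify a girl", matching the two cases described earlier.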

Glen O
0

The crux of this problem is whether or not your sampling method for the second sibling is biased in favour of sisters.

The reason your example is biased is that you are asking: "given that I have first selected and eliminated a daughter from your two children, what is the probability that the second is a daughter?" This question is biased.

If you learn he has at least one daughter, then the probability that the other sibling is a sister is always $\frac{1}{3}$, since the definition of "the other" is biased against sisters.

However, if you first ask what his eldest child is, and he happens to answer "a daughter" or "a son", and you then ask the sex of the other, it will always be $50:50$, irrespective of the first answer, since you are simply taking a second unbiased sample from the population of children.

0

The way I see it, there are two ways to arrive at the (correct) conclusion that the probability of Victoria having a sister is 1 in 2.

Method 1 (The Genetics Method)

The probability of any given child being a daughter is 1 in 2. Therefore, the probability of Child 2 being a daughter is 1 in 2. This is unaffected by Child 1 (Victoria) happening to be a daughter.

Method 2 (The Four-Options Method)

For a family with two children, there are four possible configurations regarding the sex of the children, all of which are (for our purposes) equally likely.

Configuration A - Child 1 is a son; Child 2 is a son.

Configuration B - Child 1 is a son; Child 2 is a daughter.

Configuration C - Child 1 is a daughter; Child 2 is a son.

Configuration D - Child 1 is a daughter; Child 2 is a daughter.

When we are told that Child 1 (Victoria) is a daughter, this immediately eliminates configurations A and B, as each of these configurations involves Child 1 being a son, which we now know is not the case. Therefore, we are left with just two possible configurations:

Configuration C - Child 1 is a daughter; Child 2 is a son.

Configuration D - Child 1 is a daughter; Child 2 is a daughter.

These two configurations are equally likely, and, therefore, Child 2 (Victoria's sibling) is equally likely to be a son or a daughter.

Vikki
  • How do you determine that Victoria is child 1 rather than child 2? –  Feb 19 '17 at 16:44
  • It also works equally well if Victoria is child 2. – Vikki Jun 26 '17 at 20:56
  • If Victoria is child 2, then the probability of child 2 being a daughter is 1 in 1, not 1 in 2 like the analysis assumes. –  Jun 27 '17 at 04:13
  • Poor phrasing; if Victoria is child 2, then we're talking about the probability of child 1 being a daughter. Hope that clears things up. – Vikki Jun 27 '17 at 22:07
  • Which goes back to my question: how do you determine whether Victoria is child 1 or child 2? The phrasing is fine: there's simply an essential gap in the argument. –  Jun 28 '17 at 05:11
  • Random choice (e.g., flip a coin). – Vikki Jun 28 '17 at 14:48
  • What do you do with the result of the coin flip? It's not clear how you can salvage your analysis with a coin flip. E.g. if you just randomly assign Victoria to one of the two children, you have two problems: (1) you've given one child a 50% chance of being male and an independent 50% chance of being Victoria, and thus a 25% chance of outright violating the givens, and (2) there's a 50% chance you've picked the child that is not Victoria, and are mistakenly treating the problem as if that child were Victoria instead. –  Jun 28 '17 at 15:29
  • ... and if you instead mean to fix Victoria, and assign 1 vs 2 to her on a coin flip, then the underlying premise of the argument is inapplicable, since "child 1" doesn't refer to a specific birth, but instead a choice made in a way that depends, for example, on the genders of the two births. –  Jun 28 '17 at 15:33
  • OK, let's back up here. Victoria is Child 1 because she's the one we heard about first. Child 2 (Victoria's sibling) is either a boy (Configuration C) or a girl (Configuration D). These two outcomes are equally likely (because, if Victoria is the elder sibling, her being a girl has no effect on the sex of the younger child, and, if she's the younger sibling, then her being a girl is unaffected by what the sex of her older sibling was); ergo, Victoria has a 1-in-2 chance of having a brother, and a 1-in-2 chance of having a sister. – Vikki Jun 28 '17 at 15:37
  • I agree that the probability the youngest is female is 1/2 when conditioned on the eldest being female. I do not agree that the probability the youngest is female is 1/2 when conditioned on Victoria being eldest. Make a table including the four possibilities of (gender of elder), (gender of younger), and (whether Victoria is elder or younger). –  Jun 28 '17 at 16:17
  • If we assume that, even when additionally conditioned with being in the given problem, that the youngest child is female with probability 1/2, and vice versa, and we also assume Victoria is eldest with even odds, this is enough to determine the probabilities of the four rows of the table -- and the conclusion that Victoria has a younger sister with 33% odds. –  Jun 28 '17 at 16:18
  • (in the previous comment, I mean "the youngest child is female with probability 1/2 when conditioned on the eldest being female") –  Jun 28 '17 at 16:46
  • Maybe I should post the table to make it clear: $$ \begin{matrix} \text{Eldest gender} & \text{Youngest gender} & \text{Vic age} & \text{Probability} \\ F & F & \text{Eldest} & 1/6 \\ F & F & \text{Youngest} & 1/6 \\ F & M & \text{Eldest} & 1/3 \\ M & F & \text{Youngest} & 1/3 \end{matrix} $$ –  Jun 28 '17 at 16:51
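The table in the last comment can be checked mechanically. The sketch below encodes its four rows, verifies the constraints stated in the comments (the youngest is a girl with probability 1/2 given the eldest is a girl and vice versa, and Victoria is the eldest with probability 1/2), and then computes the probability that Victoria has a sister, overall and given that she is the eldest. The helper names are illustrative.

```python
from fractions import Fraction

# Rows of the table: (eldest, youngest, Victoria's position) -> probability
TABLE = {
    ("F", "F", "eldest"): Fraction(1, 6),
    ("F", "F", "youngest"): Fraction(1, 6),
    ("F", "M", "eldest"): Fraction(1, 3),
    ("M", "F", "youngest"): Fraction(1, 3),
}


def prob(event, given=lambda row: True):
    """Exact P(event | given) over the rows of TABLE."""
    rows = {r: p for r, p in TABLE.items() if given(r)}
    return sum(p for r, p in rows.items() if event(r)) / sum(rows.values())


def eldest_girl(row):
    return row[0] == "F"


def youngest_girl(row):
    return row[1] == "F"


def has_sister(row):
    return row[0] == row[1] == "F"


def vic_eldest(row):
    return row[2] == "eldest"


# The constraints stated in the comments.
assert sum(TABLE.values()) == 1
assert prob(vic_eldest) == Fraction(1, 2)
assert prob(youngest_girl, given=eldest_girl) == Fraction(1, 2)
assert prob(eldest_girl, given=youngest_girl) == Fraction(1, 2)

print(prob(has_sister), prob(has_sister, given=vic_eldest))
```

Both printed values are 1/3: the table satisfies the comment's assumptions, and learning that Victoria is the eldest leaves her chance of a (younger) sister at 1/3 rather than 1/2.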
0

We make the simplifying assumptions:

  • Equal chance of boy or girl.
  • Victoria is a girl's name and is never given to boys.

Let's ask the generic question:

  • Mr Smith has two children, whom I've arbitrarily labeled Child A and Child B. Child A is a girl with property Q or Child B is a girl with property Q. What is the chance that Child A is a girl and Child B is a girl?

Let's rephrase the question to be more amenable to Bayesian Analysis.

  • A priori, we know that Mr Smith has two children, whom I've arbitrarily labeled A and B. Given that (Child A is a girl and Child A has property Q) or (Child B is a girl and Child B has property Q), what is the chance that Child A is a girl and Child B is a girl?

Let

  • $A_G$ be the statement Child A is a girl
  • $B_G$ be the statement Child B is a girl
  • $A_Q$ be the statement Child A has property Q
  • $B_Q$ be the statement Child B has property Q

We ask for

\begin{align} P([A_G \text{ and } B_G] \text{ given } [(A_G \text{ and } A_Q) \text{ or } (B_G \text{ and } B_Q )]) \end{align}

Now, $P(A \text{ given } B) = \frac{P(A \text{ and } B)}{P(B)}$, so our expression above is equal to

\begin{align} &\quad\frac{P([A_G \text{ and } B_G] \text{ and } [(A_G \text{ and } A_Q) \text{ or } (B_G \text{ and } B_Q )])}{P((A_G \text{ and } A_Q) \text{ or } (B_G \text{ and } B_Q ))}\\ & = \frac{P([A_G \text{ and } B_G \text { and } A_Q] \text{ or } [A_G \text{ and } B_G \text { and } B_Q])}{P((A_G \text{ and } A_Q) \text{ or } (B_G \text{ and } B_Q ))} \end{align}

For now, let us assume that Q implies you are a girl. So $A_Q \implies A_G$, and $B_Q \implies B_G$. This means $P(A_Q \text{ and } A_G) = P(A_Q)$.

We can then simplify the above statement to:

\begin{align} \frac{P([B_G \text { and } A_Q] \text{ or } [A_G \text { and } B_Q])}{P(A_Q \text{ or } B_Q)} \end{align}

Now, $P(A \text{ or } B) = P(A \text{ and } \neg B) + P(\neg A \text{ and } B) + P(B \text{ and } A)$. ($\neg$ means "not".)

So, expanding the top and the bottom, our formula becomes,

\begin{align} \frac{P([B_G \text { and } A_Q] \text{ and } \neg [A_G \text { and } B_Q]) + P(\neg [B_G \text { and } A_Q] \text{ and } [A_G \text { and } B_Q]) + P([B_G \text { and } A_Q] \text{ and } [A_G \text { and } B_Q]) }{P(A_Q \text{ and } \neg B_Q) + P(\neg A_Q \text{ and } B_Q) + P(A_Q \text{ and } B_Q) } \end{align}

Now our equation is a mess. But we did this so we can apply some symmetry arguments, and because it will be easier to understand. Let's simplify the top a bit first though.

We note that

\begin{align} &\quad[B_G \text { and } A_Q] \text{ and } \neg [A_G \text { and } B_Q]\\ & = [B_G \text { and } A_Q] \text{ and } [\neg A_G \text { or } \neg B_Q]\\ & = B_G \text { and } A_Q \text { and } \neg B_Q &\text{since $A_Q \implies A_G$} \end{align}

Our formula then simplifies to

\begin{align} \frac{P(B_G \text { and } A_Q \text{ and } \neg B_Q) + P( A_G \text { and } B_Q \text { and }\neg A_Q) + P([B_G \text { and } A_Q] \text{ and } [A_G \text { and } B_Q]) }{P(A_Q \text{ and } \neg B_Q) + P(\neg A_Q \text{ and } B_Q) + P(A_Q \text{ and } B_Q) } \end{align}

We now apply the symmetry argument, noting that Child A and Child B are arbitrary labels, and as such

\begin{align} P(B_G \text { and } A_Q \text{ and } \neg B_Q) &= P( A_G \text { and } B_Q \text { and }\neg A_Q)\\ P(A_Q \text{ and } \neg B_Q) &= P(\neg A_Q \text{ and } B_Q) \end{align}

Our formula simplifies to

\begin{align} &\quad\frac{2 P(B_G \text { and } A_Q \text{ and } \neg B_Q) + P([B_G \text { and } A_Q] \text{ and } [A_G \text { and } B_Q]) }{2 P(A_Q \text{ and } \neg B_Q) + P(A_Q \text{ and } B_Q) }\\ &=\frac{2 P( [A_Q \text{ and } B_G] \text { and } \neg B_Q ) + P(A_Q \text { and } B_Q ) }{2 P(A_Q \text{ and } \neg B_Q) + P(A_Q \text{ and } B_Q) }\\ \end{align}

Now, we at last note that $P (A \text{ and } \neg B) = P(A) - P (A \text{ and } B)$ to rewrite the formula as:

\begin{align} &\quad\frac{2 P(A_Q \text{ and } B_G) - 2 P(A_Q \text{ and } B_G \text{ and } B_Q) + P(A_Q \text { and } B_Q ) }{2 P(A_Q \text{ and } \neg B_Q) + P(A_Q \text{ and } B_Q) }\\ &=\frac{2 P( A_Q \text{ and } B_G) - 2 P(A_Q \text{ and } B_Q) + P(A_Q \text { and } B_Q ) }{2 P(A_Q \text{ and } \neg B_Q) + P(A_Q \text{ and } B_Q) } &\text{since $B_Q \implies B_G$}\\ & = \frac{2 P( A_Q \text{ and } B_G ) - P(A_Q \text{ and } B_Q) }{2 P(A_Q \text{ and } \neg B_Q) + P(A_Q \text{ and } B_Q) } \end{align}

In almost all cases, $A_Q$ and $B_G$ are independent. That is, a priori, $A$ having the property $Q$ does not affect $B$'s sex. One of the few cases where $A_Q$ affects $B_G$ is if $A_Q$ states that A is the only girl. However, we cannot assume that $A_Q$ is independent of $B_Q$. For example, $A$ being the older child means that $B$ cannot be the older child.

We then write our formula as

\begin{align} &\quad\frac{2 P (A_Q)P( B_G ) - P(A_Q \text{ and } B_Q) }{2 P(A_Q \text{ and } \neg B_Q) + P(A_Q \text{ and } B_Q) }\\ &=\frac{2 P (A_Q)P( B_G ) - P(A_Q \text{ and } B_Q) }{2 P(A_Q) - 2P (A_Q \text{ and } B_Q) + P(A_Q \text{ and } B_Q) }\\ & = \frac{2P (A_Q)P( B_G ) - P(A_Q \text{ and } B_Q) }{2 P(A_Q ) - P (A_Q \text{ and } B_Q) } \end{align}

Next, we note that $P(B_Q \text{ and } A_Q) = P(B_Q \text{ given } A_Q)P(A_Q)$.

This gives

\begin{align} &\quad \frac{2P (A_Q)P( B_G ) - P(B_Q \text{ given } A_Q) P(A_Q) }{2 P(A_Q ) - P (B_Q \text{ given } A_Q) P(A_Q)}\\ & = \frac{2P( B_G ) - P(B_Q \text{ given } A_Q) }{2 - P (B_Q \text{ given } A_Q) } \end{align}

Finally, we have, under the following assumptions:

  • Having the property Q implies being a girl. (The case where Q does not imply being a girl can be handled as well, as shown below.)
  • Child A having property Q does not affect whether Child B is a boy or a girl.

\begin{align} \frac{P( B_G ) - \frac{1}{2} P(B_Q \text{ given } A_Q) }{1 - \frac{1}{2} P (B_Q \text{ given } A_Q) } \end{align}

or if we set $P(B_G) = 0.5$

\begin{align} \frac{1 - P(B_Q \text{ given } A_Q) }{2 - P (B_Q \text{ given } A_Q) } \end{align}

We have done this to show that the probability ultimately depends on how likely it is that Child B has property Q given that Child A has property Q, a priori.

Let us look at two extreme cases. We'll also use these cases to demonstrate how to handle the case when Q does not imply being a girl.

A) $B_Q$ cannot occur if $A_Q$ is true.

Mr Smith has two children, whom I've arbitrarily labeled Child A and Child B. Child A is a girl and is the oldest, or Child B is a girl and is the oldest. What is the chance that Child A is a girl and Child B is a girl?

Let $Q$ be the property of being the oldest child. Q does not imply being a girl. However, we can simply change $Q$ to be the property of "being a girl, and being oldest". This does not change our question at all.

Since only one child can be the oldest girl, $P(B_Q \text{ given } A_Q) = 0$. And our formula simplifies to $P(B_G) = 0.5$.

B) $B_Q$ and $A_Q$ are empty statements, i.e. they tell us nothing about A and B. For example

  • Mr Smith has two children, whom I've arbitrarily labeled Child A and Child B. Child A is a girl (and is human) or Child B is a girl (and is human). What is the chance that Child A is a girl and Child B is a girl?

Once again, an empty statement (or being human) does not imply being a girl. So Q must therefore be the statement "is a girl (and is human)". That means Q=G.

\begin{align} &\quad\frac{1 - P(B_Q \text{ given } A_Q) }{2 - P (B_Q \text{ given } A_Q) }\\ &=\frac{1 - P(B_G \text{ given } A_G) }{2 - P (B_G \text{ given } A_G)}\\ & = \frac{1 - P(B_G ) }{2 - P (B_G )} &\text{since $B_G$ and $A_G$ are independent}\\ & = \frac{1}{3} \end{align}
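Both extreme cases can be sanity-checked by brute force against the formula $\frac{1 - P(B_Q \text{ given } A_Q)}{2 - P(B_Q \text{ given } A_Q)}$. The sketch assumes each child is a girl with probability 1/2, independently, and that child A is the older one with probability 1/2; it compares direct enumeration of P(A_G and B_G given A_Q or B_Q) with the formula for Q = "is a girl" and Q = "is a girl and the older child". Helper names are illustrative.

```python
from fractions import Fraction
from itertools import product


def outcomes():
    """Yield (sex_a, sex_b, a_is_older, probability); all three are fair and independent."""
    for sex_a, sex_b, a_older in product("MF", "MF", (True, False)):
        yield sex_a, sex_b, a_older, Fraction(1, 8)


def brute_force(q):
    """P(A_G and B_G | A_Q or B_Q), where q(sex, is_older) decides property Q."""
    num = den = Fraction(0)
    for sex_a, sex_b, a_older, p in outcomes():
        if q(sex_a, a_older) or q(sex_b, not a_older):
            den += p
            if sex_a == sex_b == "F":
                num += p
    return num / den


def p_bq_given_aq(q):
    """P(B_Q given A_Q) by enumeration."""
    a_q = sum(p for sa, sb, ao, p in outcomes() if q(sa, ao))
    both = sum(p for sa, sb, ao, p in outcomes() if q(sa, ao) and q(sb, not ao))
    return both / a_q


def formula(p):
    """(1 - p) / (2 - p): valid when Q implies being a girl and P(B_G) = 1/2."""
    return (1 - p) / (2 - p)


def is_girl(sex, is_older):
    return sex == "F"


def is_older_girl(sex, is_older):
    return sex == "F" and is_older


for q in (is_girl, is_older_girl):
    print(q.__name__, brute_force(q), formula(p_bq_given_aq(q)))
```

Enumeration and formula agree: 1/3 for "is a girl" (where $P(B_Q \text{ given } A_Q) = \frac{1}{2}$) and 1/2 for "is a girl and the older child" (where it is 0).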

In summary, the smaller $P(B_Q \text{ given } A_Q)$ is, the more the probability tends toward $\frac{1}{2}$. We should think of $P(B_Q \text{ given } A_Q)$ as a measure of how unique the property $Q$ is. If Q is highly specific, then we can use it to identify the child, and the probability tends toward $\frac{1}{2}$. If Q is vague, and it is quite possible that both children have property Q, then we cannot use Q to identify the child, and the probability tends toward $\frac{1}{3}$.

To answer the question

Mr Smith has two children, whom I've arbitrarily labeled Child A and Child B. Child A is a girl named Victoria or Child B is a girl named Victoria. What is the chance that Child A is a girl and child B is a girl?

It depends on how likely it is that the parents named both girls Victoria.
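To put a number on this, one hypothetical model is that every girl is independently named Victoria with probability v (and boys never are). Then $P(B_Q \text{ given } A_Q) = v/2$, and the formula derived above gives $(2 - v)/(4 - v)$, which the enumeration below reproduces. If instead parents never give two daughters the same name, $P(B_Q \text{ given } A_Q) = 0$ and the formula gives 1/2.

```python
from fractions import Fraction
from itertools import product


def brute_force(v):
    """P(two girls | at least one girl named Victoria).

    Hypothetical model: sexes are fair and independent; each girl is
    independently named Victoria with probability v; boys never are.
    """
    num = den = Fraction(0)
    for sex_a, sex_b in product("MF", repeat=2):
        for vic_a, vic_b in product((True, False), repeat=2):
            p = Fraction(1, 4)
            for sex, is_vic in ((sex_a, vic_a), (sex_b, vic_b)):
                if sex == "F":
                    p *= v if is_vic else 1 - v
                elif is_vic:
                    p = Fraction(0)  # boys are never named Victoria
            if p and ((sex_a == "F" and vic_a) or (sex_b == "F" and vic_b)):
                den += p
                if sex_a == sex_b == "F":
                    num += p
    return num / den


def formula(p):
    """(1 - p) / (2 - p) with p = P(B_Q given A_Q)."""
    return (1 - p) / (2 - p)


v = Fraction(1, 10)
print(brute_force(v), formula(v / 2))
```

With v = 1 (every girl is a Victoria) the property carries no information and the answer drops to 1/3; as v shrinks toward 0 it rises toward 1/2.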

0

The difference is down to the fact that the probability of a randomly chosen child of a man with two children, at least one of which is a girl, being a girl is not equal to the probability of a man with two children, a randomly chosen one of which is a girl, having at least one daughter.

You probably read that last bit and thought "well, that's just a tautology: if he has a girl, then he must have at least one girl". Indeed, and I'm sure we can agree that him having a girl does not mean that if he picks a child at random it will be a girl.

This answer tries to address why you seem to have a paradox: it's simply because the information we have is different. I've ignored anything to do with children's names; we just assume that if a child is called Victoria, then she is female. I use straight-up probabilities, so there is no risk of introducing unintended assumptions (at least I hope not).

Let's say he has two children, A and B. We express the probability of each being female as

P(A) = 1/2
P(B) = 1/2

This encodes our assumption that each child independently has an equal probability of being female. Now we define P(C) = 1/2, the probability that, upon seeing this man with one child, that child is child A (so the probability of seeing child B is P(¬C) = 1-P(C) = 1/2); that is to say, when a man with two children has only one of them with him, he is equally likely to have his elder or his younger child.

Next, we define the event D: when we see this gent with a child, that child is female. To compute its probability, we sum over both possible outcomes of C:

P(D) = P(A|C) * P(C) + P(B|¬C) * P(¬C)
     = 1/2 * 1/2 + 1/2 * 1/2
     = 1/2

We can also compute the probability of a man with two children having at least one daughter, which we all agree has a prior probability of 3/4.

P(A or B) = P(A) + P(B) - P(A and B) = 1/2 + 1/2 - 1/4 = 3/4

Intuitively, we state that the probability of A or B given D must be 1 (we can't have seen a girl if there was none to randomly select):

P(A or B|D) = 1

Now we return to my first paragraph and find the probability of D given A or B. Behold Bayes' theorem (it's a rearrangement of P(A and B) = P(A|B)*P(B) = P(B|A)*P(A)):

P(D|A or B) = (P(A or B|D) * P(D)) / P(A or B)
            = (1 * 1/2) / (3/4)
            = 2/3
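As a sanity check on the numbers so far, here is a minimal Python sketch (not part of the original answer) that enumerates the eight equally likely combinations of (sex of A, sex of B, which child is seen) with exact fractions. The helper names `outcomes`, `prob`, `D` and `A_or_B` are made up for illustration; it assumes, as above, independent and equally likely sexes, and a choice of child seen (event C) that is independent of them.

```python
from fractions import Fraction
from itertools import product

def outcomes():
    """Yield (a_female, b_female, saw_a, probability) for all 8 outcomes.

    Assumes each child is female with probability 1/2, independently, and that
    the child seen (event C: it is child A) is chosen with probability 1/2.
    """
    for a_female, b_female, saw_a in product([True, False], repeat=3):
        yield a_female, b_female, saw_a, Fraction(1, 8)

def prob(event, given=lambda a, b, c: True):
    """P(event | given), summed exactly over the enumerated outcomes."""
    num = sum(p for a, b, c, p in outcomes() if event(a, b, c) and given(a, b, c))
    den = sum(p for a, b, c, p in outcomes() if given(a, b, c))
    return num / den

def D(a, b, c):       # the child seen is female
    return a if c else b

def A_or_B(a, b, c):  # at least one daughter
    return a or b

print(prob(D))                # 1/2
print(prob(A_or_B))           # 3/4
print(prob(D, given=A_or_B))  # 2/3
```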

In effect, if he has at least one daughter, you are more likely to see him with a girl than with a boy (2/3 versus 1/3), but it isn't certain.

For completeness:

P(A and B) = 1/4

From the working above, we can quickly compute the 'answer' to the original problem by invoking Bayes' theorem again. The traditional riddle:

P(A and B|A or B) = (P(A or B|A and B) * P(A and B)) / P(A or B)
                  = (1 * 1/4) / (3/4)
                  = 1/3

Your adaptation:

P(A and B|D) = (P(D|A and B) * P(A and B)) / P(D)
             = (1 * 1/4) / (1/2)
             = 1/2
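The two conditional probabilities can also be estimated by simulation. The sketch below is illustrative Python (the function name `simulate`, the trial count and the seed are arbitrary choices), under the same assumptions as above: independent, equally likely sexes, and the child seen picked uniformly at random.

```python
import random

def simulate(trials=200_000, seed=1):
    """Estimate P(A and B | A or B) and P(A and B | D) by simulation.

    Assumes each child is a girl with probability 1/2, independently, and that
    the child seen with the father is picked uniformly at random.
    """
    rng = random.Random(seed)
    at_least_one = both_given_one = 0
    saw_girl = both_given_saw = 0
    for _ in range(trials):
        a = rng.choice((True, False))                 # child A is female
        b = rng.choice((True, False))                 # child B is female
        seen = a if rng.choice((True, False)) else b  # event C decides which child is seen
        if a or b:                                    # condition on: A or B
            at_least_one += 1
            both_given_one += a and b
        if seen:                                      # condition on: D
            saw_girl += 1
            both_given_saw += a and b
    return both_given_one / at_least_one, both_given_saw / saw_girl

riddle, adaptation = simulate()
print(f"P(A and B | A or B) ~ {riddle:.3f}")      # close to 1/3
print(f"P(A and B | D)      ~ {adaptation:.3f}")  # close to 1/2
```

With a fixed seed the estimates are reproducible; with this many trials they should sit within about a percentage point of 1/3 and 1/2.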

Note that we can rewrite these to make P(A and B) the subject of each.

P(A and B) = (P(A and B|A or B) * P(A or B)) / P(A or B|A and B)
P(A and B) = (P(A and B|D) * P(D)) / P(D|A and B)
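Plugging in the values computed above, with $P(A \text{ or } B | A \text{ and } B) = 1$ and $P(D | A \text{ and } B) = 1$ (if both children are girls, the child seen is certainly a girl), both rewritten forms give back the same prior:

$$P(A \text{ and } B) = \frac{\tfrac13 \cdot \tfrac34}{1} = \frac14, \qquad P(A \text{ and } B) = \frac{\tfrac12 \cdot \tfrac12}{1} = \frac14$$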

As stated in the currently accepted answer, your mistake was to assume that P(A and B|D) = 1/3. This is likely the result of not realising that if a man has two daughters, you are more likely to see him with a daughter than if he has only one, and that this prior probability cannot be neglected (all humans are very good at neglecting prior probabilities, i.e. at falling for sampling biases).

A classic error that people starting out with probabilities make is to say "I know T to be true, therefore P(T) = 1". Do not do this! Instead, you should consider situations given T. P(T) is called a 'prior' probability, and P(X|T) is a posterior probability. Your observations can never change these probabilities; you just have to work out which of them is the answer you are looking for. Again, this same neglect of prior probabilities is a massive trap that I personally have to climb out of every time I try to do stats. I'm writing this answer as much for my benefit as for anyone else's, because hopefully I'll not be lured into making this mistake again!

Sorry this isn't as pretty as the other answers; I'm not clear on how to use MathJax, so I'll try to format it in an edit. Edit: Having reformatted it as MathJax, I thought it was considerably uglier and harder to read, so I won't be committing the edit.

0

The key difference between the two problems is the hypothesis that the child being talked about is selected in a gender-neutral fashion.

We can compute the conditional probabilities

$$P(\text{Victoria has a sister} | \text{ the announcement}) \\= \frac{P(FF \text{ and the announcement})}{P(\text{the announcement})} \\= \frac{P(\text{the announcement} | FF) P(FF)}{P(\text{the announcement})}$$ $$P(\text{Victoria has a brother} | \text{ the announcement}) \\= \frac{P(\text{the announcement} | MF) P(MF)}{P(\text{the announcement})}$$

where "the announcement" is the proposition that you heard Mr. Smith make that announcement.

Now, we know that $P(FF) = 1/4$ and $P(MF) = 1/2$, where $MF$ denotes one child of each sex in either order (note that we still have $P(MM) = 1/4$; it's just that $P(MM | \text{the announcement}) = 0$). Thus, the relative probabilities of the two cases are:

$$R = \frac{P(\text{Victoria has a sister} | \text{ the announcement})} {P(\text{Victoria has a brother} | \text{ the announcement})} = \frac{1}{2} \cdot \frac{ P(\text{the announcement} | FF)} { P(\text{the announcement} | MF)} $$

So we have to judge how the sexes of the children influence the probability that Mr. Smith would make that announcement. One way is to separate the announcement into two parts:

  • Scholarship - referring to Mr. Smith announcing a child got a scholarship
  • Victoria - that the child referred to is named Victoria

We can rearrange the conditional probabilities into

$$ P(\text{the announcement} | FF) = P(\text{Victoria} | \text{FF and Scholarship}) P(\text{Scholarship} | FF)$$

and similarly for MF. Consequently,

$$ R = \frac{1}{2} \cdot \frac{P(\text{Victoria} | \text{FF and Scholarship}) P(\text{Scholarship} | FF)}{P(\text{Victoria} | \text{MF and Scholarship}) P(\text{Scholarship} | MF)}$$

At this point, the following hypotheses seem reasonable enough:

Hypotheses:

  • $P(\text{Scholarship} | FF) = P(\text{Scholarship} | MF)$
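  • $P(\text{Victoria} | \text{FF and Scholarship}) = 2 \cdot P(\text{Victoria} | \text{MF and Scholarship})$, since the child mentioned is chosen without regard to sex: under FF that child is certainly a girl, under MF only half the time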
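Substituting both hypotheses into the expression for $R$, the scholarship and Victoria factors cancel:

$$R = \frac{1}{2} \cdot \frac{2 \cdot P(\text{Victoria} | \text{MF and Scholarship}) \, P(\text{Scholarship} | MF)}{P(\text{Victoria} | \text{MF and Scholarship}) \, P(\text{Scholarship} | MF)} = \frac{1}{2} \cdot 2 = 1$$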

Therefore $$\frac{P(\text{Victoria has a sister} | \text{ the announcement})} {P(\text{Victoria has a brother} | \text{ the announcement})} = 1 $$

thus confirming that both probabilities are one-half.
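To see how much hinges on the gender-neutral hypothesis, here is a small Python sketch (an illustration, not part of the answer) that enumerates the four ordered families exactly. The name probability `q` and the switch `gender_neutral` are hypothetical parameters; the scholarship factor is left out because, by the first hypothesis, it is the same for FF and MF and cancels.

```python
from fractions import Fraction
from itertools import product

def p_sister(q=Fraction(1, 100), gender_neutral=True):
    """P(Victoria has a sister | the announcement), by exact enumeration.

    q: hypothetical probability that a girl is named Victoria.
    gender_neutral: if True, Mr. Smith talks about a uniformly random child;
        if False, he always talks about a daughter whenever he has one.
    The scholarship factor is omitted: it is assumed equal for FF and MF.
    """
    ff = total = Fraction(0)
    for family in product("FM", repeat=2):       # 4 equally likely ordered families
        daughters = family.count("F")
        if gender_neutral:
            p_girl_mentioned = Fraction(daughters, 2)
        else:
            p_girl_mentioned = Fraction(1 if daughters else 0)
        # Joint probability of this family and "a girl named Victoria is mentioned".
        weight = Fraction(1, 4) * p_girl_mentioned * q
        total += weight
        if daughters == 2:
            ff += weight
    return ff / total

print(p_sister())                      # 1/2
print(p_sister(gender_neutral=False))  # 1/3
```

Changing `q` leaves both results unchanged, matching the cancellation of the name factor in $R$; switching to a father who always talks about a daughter recovers the riddle's 1/3.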

0

I'm not sure, as there are so many answers, whether this has been answered or not.

Neither 1/2 nor 1/3 is correct. The chance of him having a 2nd daughter is 1/4.

This is about unordered sampling, and you are confusing several different sample sets and dismissing certain things.

Disregarding twins, etc., and assuming equal frequencies for boys and girls, Mr. Daddy has the following 4 outcomes for his two children: bb, bg, gb, gg.

So there is a 1/4 chance of two boys, 1/2 of a boy and a girl (your questions about older/younger are about ordered sets), and 1/4 of two girls.
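To check the arithmetic above, here is a small Python sketch (illustrative only) that lists the four ordered outcomes and groups them by number of girls, assuming independent, equally likely sexes.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Ordered outcomes (elder, younger); each child is b or g with probability 1/2.
ordered = ["".join(pair) for pair in product("bg", repeat=2)]  # ['bb', 'bg', 'gb', 'gg']
# Collapse to unordered outcomes by counting girls; each ordered outcome weighs 1/4.
unordered = Counter(s.count("g") for s in ordered)
probs = {girls: Fraction(n, len(ordered)) for girls, n in sorted(unordered.items())}
print(probs)  # {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
```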

You come along, ask him, and meet him and his daughter. This has no effect upon his existing children, and thus the probability of him having 2 girls, which you can easily surmise to have been 1/4, is still 25%.

Mal
  • I don't think you can completely neglect the information that you just met one of his daughters. Imagine that afterwards you also met his other daughter - is the probability of two girls still 25% then (since meeting both daughters still has no effect upon the existing children)? As David K pointed out in his answer, the term "probability" might not be completely correct here (since the sex of the children is already determined before any information is given), but the statistics change as you learn bits of information. – Thern Feb 20 '17 at 08:15
  • No, Nebr. You are thinking of something else. The difference between the two problems is easier to understand like this: two persons each mentally choose a number from 1 to 100 and write it down in an envelope. What is the original chance that they chose the same number? Now, one of them gets to open the other's envelope, and thus knows whether they both chose the same number or not. He then proceeds to remove all but two balls. Both chosen numbers remain. The 2nd lad doesn't know whether his remaining number is the same as the other's or not, but is NOW asked if he wishes to CHANGE his ball to the other ball to – Mal Feb 21 '17 at 05:50
  • ...match the first one's ball, which was the aim. By not switching, his chance is 1%; by accepting the swap, his chance is close to 1. This is what you are mixing up between the two situations. One AFFECTS the chances (which is what you have been doing incorrectly), whereas the situation you posed gives the answer of 25% that his other child is a girl. – Mal Feb 21 '17 at 05:57
  • Your comment about seeing the full sample set and still wondering whether [I think] it would be 25% is entirely wrong. I am sure you can see that? That is no longer statistics. What you are asking, then, is like: as #samples -> inf, P(2 girls) -> 1/4. Yes. The more samples we take, the more the frequency of heads in coin flips will converge to 1/2, just as the proportion of girl/girl pairs will converge to 1/4. The fact that people write endlessly complex and long answers just goes to show how misunderstood simple things can be. – Mal Feb 21 '17 at 05:58
  • So you are telling me that if I meet a man with 1000 children and I get to know 999 of his daughters, the probability that the last child is a girl is (1/2)^1000? – Thern Feb 21 '17 at 08:07
  • NO matter how many of his children you meet or not, this has nothing to do with the outcome of his children, which we assumed was 50/50 for g/b, right? What is it you don't understand about that? And yes, the chance of the last child also being a girl is (1/2)^1000, although you would be quite a champ (or rather he would) if you even got to 998 with only girls :) But it sounds like you are finally grasping it... or did you still think that there is a 50% chance that the last child is also a girl? – Mal Feb 21 '17 at 08:12
  • You are confusing the probability of winning the lottery with the notion: either I win the lottery or I don't, therefore it is 50% :/ – Mal Feb 21 '17 at 08:12
  • I think you overlook that knowing about his daughter indeed affects chances. What is the probability that he has two sons when I know about Victoria? 25%? Either you say yes now - which is obviously absurd - or you have to admit that the original distribution of {25%, 25%, 25%, 25%}, from which you derived your conclusion that it must be 25%, can't hold any longer. You seem to think that knowing about one of his daughters is irrelevant. But it obviously isn't. – Thern Feb 21 '17 at 09:19
  • Oh, you are wondering about YOUR chances of guessing what the 2nd child is, rather than what THE CHANCES are that it is a girl? That was not what you stated in your question. This has to do with the example I was using earlier. In that case, you have a 1/3 chance of guessing correctly. And as all the blah-blahing indicates, sure, more information, like older or younger, could have changed the guessing chances; hence, yes, the guess is a function of information. – Mal Feb 22 '17 at 00:48
  • Funnily, guessing in real-world situations should typically yield a 1/2 chance, not 1/3 (see the answer of Bram28). That the a priori chance of both children being girls is 25% is trivial. – Thern Feb 22 '17 at 08:51
0

"But wait! What if I ask Mr. Smith first, if Victoria is his elder daughter? Assume his answer is yes (and ignore any problems with twins - even then one is typically a few seconds "older" than the other). So now I know that from the cases (F/F, F/M, M/F), M/F also drops out. And now, the probability for F/F just rose to 1/2."

If Smith says Victoria is his elder daughter, the other child could be a boy of any age, or a girl who is younger than Victoria. If Smith says Victoria is NOT his elder daughter, the other child must be a girl who is older than Victoria. So a "yes" answer allows GG, GB, or BG; a "no" answer only allows GG.

How many bits of information are transmitted with the answer to this question?

  • Why must the other child be a girl? If Victoria is NOT his elder daughter, she must be his younger daughter, which still leaves room for an elder brother. – Thern Feb 23 '17 at 07:41
-1

In my view, probability theory in its "pure form" is not applicable here: you cannot ask about probabilities of actual facts of life.

What is the probability that Napoleon had blue eyes, according to what you know? There is no "probability" here: he either had blue eyes or he didn't, regardless of what you know. What is the probability that I am typing this from my office on a Saturday evening? What is the probability that the Higgs boson exists? What is the probability that your wife/husband is cheating on you?

The problem would make more sense in the following form (for example): suppose you know that someone has had a daughter, what is the probability that their next child, if there will be one, will be a daughter as well?

You might argue that we can view the problem practically: someone tells you that someone else has two children, one of whom is a daughter, and asks you to place a bet on the sex of the other child. But in this setting you should surely take into account the motives of the person asking, because they probably wouldn't be asking if there were not a good chance that you would lose your bet.

More formally: the probability space in such questions about past events or actual facts of life is not well defined, and most definitions would be practically irrelevant. For past events and actual facts of life one uses statistics, not probability theory.


Update.

Since my point of view and interpretation of the problem have received much criticism, let me mention that I am also aware of the following simple probabilistic interpretation.

Consider the totality $X$ of all people with 2 children of which at least one is a daughter. This is about 3/4 of the totality of all people with 2 children. Now choose a person in $X$ at random. Then the probability that both their children are girls is approximately 1/3.
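A quick Monte Carlo sketch of this interpretation in Python (illustrative; the function name, trial count and seed are arbitrary), assuming each child is independently a girl with probability 1/2:

```python
import random

def sample_x(trials=200_000, seed=0):
    """Return (share of two-child families in X, P(both girls | family in X)).

    X = families with at least one daughter; each child is assumed to be a girl
    with probability 1/2, independently.
    """
    rng = random.Random(seed)
    in_x = both_girls = 0
    for _ in range(trials):
        first_girl = rng.choice((True, False))
        second_girl = rng.choice((True, False))
        if first_girl or second_girl:          # the family belongs to X
            in_x += 1
            both_girls += first_girl and second_girl
    return in_x / trials, both_girls / in_x

share_x, p_two_girls = sample_x()
print(f"share of families in X ~ {share_x:.3f}")      # close to 3/4
print(f"P(two girls | X)       ~ {p_two_girls:.3f}")  # close to 1/3
```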

Alexey
  • "what is the probability that their next child, if there will be one, will be a daughter as well" is NOT the same question. – Bohemian Feb 18 '17 at 15:39
  • Of course not; I explained why the original question makes no sense, while this one does. – Alexey Feb 18 '17 at 15:41
  • I can't believe I'm having to spell this out, but you introduced that statement with "The problem would make more sense in the following form", but what followed is a different problem. So it doesn't "make more sense", because it changes the problem. You could say "A problem that would make more sense would be...", but again it's all irrelevant because it does nothing to answer the question that was asked. – Bohemian Feb 18 '17 at 15:52
  • Well, IMO you cannot save the original problem, so if you want to make it meaningful, you'd have to restate it, for example in this form (if this vaguely resembles the practical question you are interested in). As the problem is meaningless, its different meaningful "forms" can be mutually inequivalent. – Alexey Feb 18 '17 at 15:56
  • Comments are not for extended discussion; this conversation has been moved to chat. – user642796 Feb 19 '17 at 09:55
-1

If the father is chosen randomly (among those with 2 children, at least one of them a daughter), the probability that he's a parent of 2 daughters is obviously 1/3 (the combinations F/M, M/F and F/F are equally likely).

"But it is very interesting that solely from the sentence "Victoria just got the scholarship she wanted!" I can NOT infer that Mr. Smith is indeed chosen from this uniform distribution." - this doesn't change the 1/3 result (nor does any other event, like "my daughter has broken her leg"). You know upfront, before even starting to talk with Mr. Smith, that you are going to hear this information with probability Event_prob * 1/3 * 2 == Event_prob * 2/3 * 1 (the factors 2 : 1 count the Fs in FF versus MF/FM), that is, regardless of which group Mr. Smith represents.

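The two products in the last sentence can be checked with exact fractions. In this Python sketch, `event_prob` is a hypothetical per-daughter chance of hearing news like the scholarship and, following the formula above, the chance of hearing it is taken to be proportional to the number of daughters; 1/3 and 2/3 are the shares of F/F and mixed fathers among those with at least one daughter.

```python
from fractions import Fraction

def news_weights(event_prob=Fraction(1, 10)):
    """Weights with which news about a daughter comes from each group.

    Among fathers of two with at least one daughter, 1/3 are F/F and 2/3 are
    F/M or M/F. event_prob is a hypothetical per-daughter chance of such news;
    as in the formula above, it is simply multiplied by the number of daughters.
    """
    w_ff = event_prob * Fraction(1, 3) * 2      # F/F fathers, 2 daughters each
    w_mixed = event_prob * Fraction(2, 3) * 1   # mixed fathers, 1 daughter each
    return w_ff, w_mixed, w_ff / (w_ff + w_mixed)

w_ff, w_mixed, share_ff = news_weights()
print(w_ff == w_mixed, share_ff)  # True 1/2
```

The third value is the share of such news that comes from F/F fathers; since `event_prob` appears in every term, it cancels.
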
janek
  • "nor any other event" - this is not correct. It can be shown - with analogous reasoning - that the sentence "Victoria was born on a Friday" changes the probability of a second daughter to 13/27, at least if you accept the reasoning for the 1/3 probability (which is fine in a specifically set-up mathematical situation, but not in real life). – Thern Feb 20 '17 at 16:11
  • This is really basic math: the main point I'm trying to make is that the crucial thing is to distinguish between 2 events: meeting a father, and getting some information originating from an evenly distributed event space (the scholarship in this case). The lack of this distinction leads to the 1/3 vs 1/2 confusion and some strange explanations. Simply put: unordered pairs (M,F) are twice as numerous as FF (let's call this the amount), but each unordered pair (M,F) has half as many Fs as FF (the frequency). After multiplying amount * frequency, you get the same value. – janek Feb 20 '17 at 17:23
  • To make the parallel even clearer: imagine you have 2 boxes of something, each containing 1 kg, and 1 box with 2 kg. Together they amount to 3 boxes, but each group weighs 2 kg. Each time you randomly choose one box, there is a 1/3 probability of getting the 2 kg box and 2/3 of getting a 1 kg box, but both sizes (when grouped together) still weigh the same. – janek Feb 20 '17 at 17:24
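The 13/27 figure in Thern's comment can be reproduced by exact enumeration. This Python sketch assumes that each child's sex and weekday of birth are independent and uniform, and conditions on "at least one child is a girl born on the chosen day"; the day index standing for Friday is an arbitrary label.

```python
from fractions import Fraction
from itertools import product

def p_two_girls_given_day_girl(day=4):
    """P(both girls | at least one child is a girl born on `day`).

    Assumes each child's sex (F/M) and weekday of birth (0-6) are independent
    and uniform; day 4 standing for Friday is an arbitrary labelling.
    """
    kids = list(product("FM", range(7)))        # 14 equally likely (sex, day) pairs
    matching = both_girls = 0
    for c1, c2 in product(kids, repeat=2):      # 196 equally likely families
        if ("F", day) in (c1, c2):
            matching += 1
            both_girls += c1[0] == "F" and c2[0] == "F"
    return Fraction(both_girls, matching)

print(p_two_girls_given_day_girl())  # 13/27
```

Of the 196 families, 27 contain a girl born on that day, and 13 of those are girl/girl, which gives 13/27.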