
Okay, so I was asked this question in an interview for a machine learning expert position. To be honest, the question itself (and the interviewer's hint) seemed quite ill-phrased, which is probably why I ended up failing the interview, and he thought I must be super dumb. Here is the original question.

You know your colleague has two kids, and also know one of them is a boy. What is the probability that the other one is a boy too?

I was a bit puzzled; then he gave me a hint by asking me to use Bayes' theorem, which I knew from high school as $$\mathbb{P}(A\cap B)=\mathbb{P}(A\mid B)\,\mathbb{P}(B).$$ I could see that "one kid is a boy" corresponds to the event $B$, but I could not figure out the other quantities.

To confuse matters further, he gave me hints like "when you see people with two kids, most of the time it is a boy and a girl, right?" I could not argue with him, obviously, but I could not reach any such conclusion from my personal observation either.

I tried to say things like

  • to calculate it we need empirical data, such as a survey of all couples with two kids in the city/country, etc.
  • absent other information, the probability that the second child is a boy equals the proportion of males in the country, assuming each kid's gender is independent of the other's

But it seems he had some assumption about the scenario (one under which the problem can be solved purely mathematically) that I failed to clarify. Upon further thought, there may be some biological concepts about how chromosomes interact to decide the gender of the second kid (and whether it is biased one way or the other), but that is hardly fair to expect from an ML engineer. Is that where the answer lies?

The reason for this post is not to complain; I am giving the context only to ask what exactly I am missing in the question, assuming it is meant to be a probability (and not biology) question.

Della
  • The question should have been, "At least one is a boy. What is the probability both are boys?" The prior is that any kid has a $50\%$ likelihood of being a boy or a girl. – Robert Shore Feb 16 '24 at 07:50
  • @RobertShore how do you get the prior? – Della Feb 16 '24 at 07:52
  • Was the reason you were initially puzzled the fact that you could not confidently assert that the interviewer was looking for the probability assuming that boys and girls are equally likely? That's why you asked about empirical observations and so on, and in fact I'd actually have appreciated that. After all, machines are fed tons of empirical information from which they learn. For example, given data, a machine would accurately predict the probability of a girl/boy being born and then find the answer to this question. I'm baffled by the interviewer's response. – Sarvesh Ravichandran Iyer Feb 16 '24 at 08:07
  • I mean, you could still have answered the question by saying "assuming that a girl and a boy are equally likely to be born, I would proceed as follows..." but you aren't missing anything, in fact I'd argue the interviewer is missing the broader picture and purpose of ML. – Sarvesh Ravichandran Iyer Feb 16 '24 at 08:09
  • The issue is whether you come up with an answer at or near $\frac12$ or at or near $\frac13$, and that is about understanding the wording of the question rather than worrying about biology. This has been asked here many times before and also has a Wikipedia article. – Henry Feb 16 '24 at 09:08

2 Answers


I find nothing ambiguous about the question. You are told that:

  • Your colleague has two children
  • One of the children is a boy

You also can reasonably assume that

  • Boys and girls are equally likely and comprise the only options for the sex of children
  • The probability of a given child being a boy does not affect the probability of any other child being a boy

The answer follows readily from the above: we enumerate all possible equiprobable outcomes of the sex of two children:

$$(B,B), (B,G), (G,B), (G,G)$$

where the two children are identifiable. You are given that one child is a boy; thus your colleague's children would be one of the three equiprobable outcomes $$(B,B), (B,G), (G,B).$$

Of these, only $(B,B)$ has the other child also being a boy, so the desired probability is $1/3$.
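The enumeration above can be checked mechanically. Below is a minimal Python sketch, assuming (as this answer does) that the four ordered outcomes are equally likely; the function name `prob_both_boys_given_at_least_one` is hypothetical, chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

def prob_both_boys_given_at_least_one():
    # All ordered outcomes (first child, second child); assumed equiprobable.
    outcomes = list(product("BG", repeat=2))
    # Condition on the event "at least one child is a boy".
    conditioned = [o for o in outcomes if "B" in o]
    # Keep the outcomes in which both children are boys.
    both_boys = [o for o in conditioned if o == ("B", "B")]
    # With equiprobable outcomes, the conditional probability is a ratio of counts.
    return Fraction(len(both_boys), len(conditioned))

print(prob_both_boys_given_at_least_one())  # 1/3
```

Using `Fraction` keeps the result exact, so it prints `1/3` rather than a rounded float.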


I do not think the question is ambiguous as to whether it should have specified that at least one child is a boy. If I say I have two fair coins, say a quarter and a nickel, and I flip them, observe the result, and then tell you that one of them is heads, that should not automatically imply to you that the other must be tails. Had I said "I obtained one head," that would be more ambiguous, because it implies that the total number of heads obtained is one. But saying "one is heads" implies nothing about the other coin.
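The coin version can also be checked empirically. Here is a small Monte Carlo sketch in Python: flip a quarter and a nickel many times, keep only the trials in which at least one coin shows heads, and record how often both do. The function name, trial count, and seed are arbitrary choices for illustration.

```python
import random

def simulate_coins(trials=100_000, seed=0):
    rng = random.Random(seed)  # seeded so the run is reproducible
    at_least_one_heads = 0
    both_heads = 0
    for _ in range(trials):
        quarter = rng.random() < 0.5  # True means heads
        nickel = rng.random() < 0.5
        if quarter or nickel:  # "one of them is heads"
            at_least_one_heads += 1
            if quarter and nickel:
                both_heads += 1
    # Fraction of those trials in which the other coin is heads too.
    return both_heads / at_least_one_heads

print(simulate_coins())  # close to 1/3, not 1/2
```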


It is perhaps counterintuitive that, had the colleague instead said to you "my eldest child is a boy," the probability that the other child is also a boy would be $1/2$, not $1/3$, since in that case the only permissible outcomes are $$(B,B), (B,G),$$ if we take the first element of each ordered pair to be the sex of the older child and the second element to be the sex of the younger child. Note that $(G,B)$ is no longer permitted because we were told that the eldest child is a boy. This is a well-known paradox (the boy or girl paradox), and others disagree about how to interpret the original question as stated.
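Changing only the conditioning event shows where the $1/3$ versus $1/2$ difference comes from. A minimal Python sketch, assuming equiprobable outcomes as above and taking the first element of each pair to be the older child; the helper names (`conditional`, `at_least_one_boy`, `older_is_boy`) are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Ordered outcomes (older child, younger child), each assumed to have probability 1/4.
OUTCOMES = list(product("BG", repeat=2))

def conditional(event, given):
    # P(event | given) for equiprobable outcomes: a ratio of counts.
    kept = [o for o in OUTCOMES if given(o)]
    return Fraction(sum(1 for o in kept if event(o)), len(kept))

def both_boys(o):
    return o == ("B", "B")

def at_least_one_boy(o):
    return "B" in o

def older_is_boy(o):
    return o[0] == "B"

print(conditional(both_boys, at_least_one_boy))  # 1/3
print(conditional(both_boys, older_is_boy))      # 1/2
```

The event being computed is the same in both calls; only the information we condition on changes.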

heropup
  • It does not seem like the WP article revolves mostly around "my eldest child is a boy" (which they call First Question). The boy girl paradox is about the interpretation of "at least one of which is a boy" in the question from OP. The interpretation leading to $1/3$ is the one I understand. As an interviewer I would not ask for more. The WP article seems to have a hard time getting the other interpretation across, at least to me. – Kurt G. Feb 16 '24 at 09:30

Given that your colleague has two kids, and implicitly assuming binary biological sex (each kid is either a boy (B) or a girl (G)), the sample space is

$$\Omega=\{(G,G), (G,B), (B,G), (B,B) \}$$

with the following probability measure:

$$\mathbb P ((G,G))=p^2,\quad \mathbb P ((G,B))=\mathbb P ((B,G))=p(1-p),\quad \mathbb P ((B,B))=(1-p)^2$$

where $\color{blue}{p}$ denotes the probability that a kid is a girl (implicitly assuming that the sexes of the two kids are independent of each other).

After being informed that one of the two kids is a boy, i.e., the event $B_1=\{(G,B), (B,G), (B,B) \}$, the conditional probability that the other is also a boy, i.e., the event $B_2=\{(B,B) \}$, is given by

$$\mathbb P (B_2 \mid B_1)=\frac{\mathbb P (B_2\cap B_1)}{\mathbb P (B_1)}=\frac{\mathbb P (B_2)}{1-\mathbb P ((G,G))}=\color{blue}{\frac{(1-p)^2}{1-p^2}}.$$
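The closed form can be cross-checked by summing the probability measure over the events $B_1$ and $B_2$ directly. A minimal Python sketch with exact arithmetic; the function names are hypothetical, and `p` is the probability that a kid is a girl, as above.

```python
from fractions import Fraction

def measure(p):
    # Probability of each ordered outcome, assuming independent sexes
    # with P(girl) = p for each kid.
    q = 1 - p
    return {("G", "G"): p * p, ("G", "B"): p * q, ("B", "G"): q * p, ("B", "B"): q * q}

def prob_other_boy(p):
    P = measure(p)
    b1 = sum(v for o, v in P.items() if "B" in o)  # P(B_1): at least one boy
    b2 = P[("B", "B")]                             # P(B_2); B_2 is a subset of B_1
    return b2 / b1

def closed_form(p):
    return (1 - p) ** 2 / (1 - p ** 2)

# Both routes agree exactly, and p = 1/2 recovers 1/3.
for p in (Fraction(1, 2), Fraction(2, 5), Fraction(3, 5)):
    assert prob_other_boy(p) == closed_form(p)
print(prob_other_boy(Fraction(1, 2)))  # 1/3
```

Since $1-p^2=(1-p)(1+p)$, the closed form also simplifies to $\frac{1-p}{1+p}$ for $p<1$.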

The parameter $\color{blue}{p}$ needs to be estimated from some data set. Since your location is currently Turkey, from this official Turkish website we have

According to birth statistics; the number of babies born alive in 2020 was 1 million 112 thousand 859. 570 thousand 892 of them were boys, and 541 thousand 967 of them were girls. 97.1% of the babies born alive were single births, 2.9% were twins, and 0.1% were triplets or more.

Since the sample size is 1 million 112 thousand 859, a highly accurate estimate of $\color{blue}{p}$ from the 2020 data is $\color{blue}{541967/1112859\approx 0.4870041937}$ (note that it is very close to 48.7%, the proportion of girls in Turkey's child population). Plugging this into the formula above, the probability is

$$\mathbb P (B_2 \mid B_1)\approx 0.3449861194.$$
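The arithmetic can be reproduced from the quoted counts. A minimal Python sketch, taking the 2020 live-birth counts quoted above as given and, like the estimate above, ignoring the twin/triplet breakdown (which slightly bends the independence assumption).

```python
from fractions import Fraction

boys, girls = 570_892, 541_967  # 2020 live births, from the quoted statistics
total = boys + girls            # 1,112,859

p = Fraction(girls, total)          # estimated probability that a kid is a girl
prob = (1 - p) ** 2 / (1 - p ** 2)  # P(B_2 | B_1) from the closed form

print(f"p            ~ {float(p):.10f}")
print(f"P(B_2 | B_1) ~ {float(prob):.10f}")
print(f"gap from 1/3 ~ {float(prob - Fraction(1, 3)):.4f}")
```

Working with `Fraction` first and converting to `float` only for display avoids any rounding in the intermediate steps.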

Although this is fairly close to $\frac{1}{3}=0.3\bar{3}$, the value obtained by assuming $p=\frac{1}{2}$, it exceeds $\frac{1}{3}$ by more than $0.01$.

Amir