
You have a database of $25,000$ potential criminals. The probability that this database includes the art thief is $0.1$. In a stroke of luck, you found a DNA sample of this thief from the crime scene. You compare this sample with a database of $25,000$ men. And lo and behold, a match is found! You are well aware that DNA matches are not always perfect: if you pick two different persons at random, the chance that their DNA samples would match using the current testing techniques is $1$ in $10,000$. What's the probability that the database includes the art thief, given that a DNA match has been found?

I'm sure this question uses Bayes' theorem, where $A=$ the database includes the art thief and $B=$ a DNA match has been found. I need to find $P(A|B)= P(B|A)P(A)/P(B)$. To calculate $P(B)$, there are two cases: 1) the database does include the thief, and a person matches the DNA from the scene; 2) the database doesn't include the thief, but a person matches the DNA.

I'm not really sure how to calculate 1) and 2). Please help me.

I also want to know: if the probability of two people matching is $1/10000$, is the probability of two people not matching $9999/10000$?

RobPratt
  • Related, perhaps helpful: https://math.stackexchange.com/questions/2279851/applied-probability-bayes-theorem/2279888#2279888 – Ethan Bolker Aug 08 '21 at 14:33

2 Answers


Given the wording of the question, I think your formulation of the problem is reasonable. However, you're not really given enough information to evaluate $\ P(B)\ $ properly. Was the sample from the crime scene merely tested against random entries in the database until a match was found, for instance, or was it tested against every entry in the database? In the latter case, the exact number of matches found is vital information which should be used to evaluate the posterior probability that the thief's DNA is in the database. Since you're not given that information, I think you can reasonably assume that something like the former procedure was used. In that case, $\ B=$ at least one entry in the database matches the thief's DNA.

If you also assume that when the thief's DNA is in the database it is certain to match the DNA from the crime scene, then you have $$ P\big(B\,\big|A\big)=1\ . $$ On the other hand, when the thief's DNA is not in the database, each entry in the database presumably has an independent probability of $\ \frac{1}{10000}\ $ of matching the DNA from the crime scene, and a probability of $\ \frac{9999}{10000}\ $ of not matching it. Therefore, \begin{align} P\big(B\,\big|A^c\big)&=1-P\big(B^c\,\big|A^c\big)\\ &=1-\Big(\frac{9999}{10000}\Big)^{25000}\\ &\approx0.918\ ,\\ P(B)&=P(B\,|A)P(A)+P\big(B\,|A^c\big)P\big(A^c\big)\\ &\approx1\times0.1+0.918\times0.9\\ &=0.9262\ ,\\ P(A|B)&=\frac{P(B\,|A)P(A)}{P(B)}\\ &\approx\frac{0.1}{0.9262}\\ &\approx{0.108}\ . \end{align}
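The arithmetic above can be reproduced with a short Python sketch. It assumes, as this answer does, that the thief's own entry always matches and that, when the thief is absent, each of the $25{,}000$ entries matches independently with probability $1/10000$. The function name `posterior_at_least_one_match` is purely illustrative.

```python
# Sketch of the "at least one match" model (assumptions as in the answer):
# the thief's own entry always matches, and when the thief is absent each of the
# DB_SIZE entries matches the crime-scene sample independently with prob. P_RANDOM_MATCH.
DB_SIZE = 25_000
P_RANDOM_MATCH = 1 / 10_000
PRIOR_IN_DB = 0.1  # P(A): the thief is in the database


def posterior_at_least_one_match(db_size=DB_SIZE, p_match=P_RANDOM_MATCH, prior=PRIOR_IN_DB):
    """Return (P(A|B), P(B|A^c), P(B)) where B = 'at least one entry matches'."""
    p_b_given_a = 1.0  # the thief's own entry is certain to match
    # With the thief absent, B fails only if none of the db_size entries match.
    p_b_given_not_a = 1 - (1 - p_match) ** db_size
    # Law of total probability, then Bayes' theorem.
    p_b = p_b_given_a * prior + p_b_given_not_a * (1 - prior)
    return p_b_given_a * prior / p_b, p_b_given_not_a, p_b


post, p_b_not_a, p_b = posterior_at_least_one_match()
print(f"P(B|A^c) = {p_b_not_a:.3f}")  # 0.918
print(f"P(B)     = {p_b:.4f}")        # 0.9261
print(f"P(A|B)   = {post:.3f}")       # 0.108
```

Without intermediate rounding, $P(B)\approx0.9261$ rather than $0.9262$ (the answer rounds $P\big(B\,|A^c\big)$ to $0.918$ first); the posterior still rounds to $0.108$.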

For completeness, here is the calculation for the case when the thief's DNA was tested against the whole database, and exactly $\ n\ $ matches were found. Call this event $\ B_n\ $.

If the thief's DNA profile is in the database, then the probability that exactly $\ n\ge1\ $ matches will be found is just the probability that exactly $\ n-1\ $ matches will be found with the $\ 24999\ $ other potential criminals in the database. Thus, $$ P\big(B_n\big|A\big)={24999\choose n-1}\frac{1}{10000^{n-1}}\Big(\frac{9999}{10000}\Big)^{25000-n}\ . $$ If the thief's DNA profile is not in the database, then the probability that exactly $\ n\ge1\ $ matches will be found is just the probability that exactly $\ n\ $ matches will be found with the $\ 25000\ $ potential criminals in the database (none of whom is the thief). Thus $$ P\big(B_n\big|A^c\big)={25000\choose n}\frac{1}{10000^n}\Big(\frac{9999}{10000}\Big)^{25000-n}\ . $$ With a little bit of elementary arithmetic, it follows that \begin{align} \frac{P\big(B_n\big|A\big)}{P\big(B_n\big|A^c\big)}&=\frac{n}{2.5}\ ,\\ \frac{P\big(B_n\big|A\big)P(A)}{P\big(B_n\big|A^c\big)P\big(A^c\big)}&=\frac{n}{9\times2.5}\\ &=\frac{n}{22.5}\ , \end{align} and hence \begin{align} P\big(A\,\big|B_n\big)&=\frac{P\big(B_n\,\big|A\big)P(A)}{P\big(B_n\big)}\\ &=\frac{P\big(B_n\,\big|A\big)P(A)}{P\big(B_n\,\big|A\big)P(A)+P\big(B_n\,\big|A^c\big)P(A^c)}\\ &=\frac{P\big(B_n\big|A\big)P(A)}{P\big(B_n\big|A^c\big)P\big(A^c\big)}\Bigg(\frac{P\big(B_n\big|A\big)P(A)}{P\big(B_n\big|A^c\big)P\big(A^c\big)}+1\Bigg)^{-1}\\ &=\frac{n}{22.5+n}\ . \end{align} For $\ n=0\ $, $\ P\big(B_0\,\big|\,A\big)=0\ $, and $\ P\big(B_0\big)\ne0\ $, so $\ P\big(A\,\big|\,B_0\big)=0\ $ also.
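The "elementary arithmetic" rests on the identity $\binom{24999}{n-1}\Big/\binom{25000}{n}=\frac{n}{25000}$: the powers of $\frac{9999}{10000}$ cancel in the ratio, and the leftover factor $10000$ turns $\frac{n}{25000}$ into $\frac{n}{2.5}$. The Python sketch below checks this numerically under the same assumptions (the thief's own entry always matches, other entries match independently with probability $1/10000$). The helper names `likelihoods` and `posterior` are hypothetical.

```python
from math import comb

DB_SIZE = 25_000
P_RANDOM_MATCH = 1 / 10_000
PRIOR_IN_DB = 0.1  # P(A)


def likelihoods(n, db_size=DB_SIZE, p=P_RANDOM_MATCH):
    """Return (P(B_n | A), P(B_n | A^c)) for exactly n >= 1 matches."""
    q = 1 - p
    # Thief present: the thief matches, and exactly n-1 of the other db_size-1 entries match.
    given_a = comb(db_size - 1, n - 1) * p ** (n - 1) * q ** (db_size - n)
    # Thief absent: exactly n of the db_size entries match by chance.
    given_not_a = comb(db_size, n) * p ** n * q ** (db_size - n)
    return given_a, given_not_a


def posterior(n, prior=PRIOR_IN_DB):
    """Return P(A | B_n) via Bayes' theorem."""
    l_a, l_not_a = likelihoods(n)
    return l_a * prior / (l_a * prior + l_not_a * (1 - prior))


# Columns: n, likelihood ratio, Bayes posterior, closed form n / (22.5 + n).
for n in range(1, 5):
    l_a, l_not_a = likelihoods(n)
    print(n, round(l_a / l_not_a, 6), round(posterior(n), 4), round(n / (22.5 + n), 4))
```

The ratio column equals $n/2.5$ and the last two columns agree; for $n=1$ both give about $0.0426$.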

lonza leggiera

**Recast answer (12 Aug)**

Answer recast to include the possibility that a wrong match may be found even if the thief is in the database, and to present it more clearly and simply through a contingency table.

If the probability of a match being wrong is $P=1/10\,000$, then that of it being correct must be $P=9999/10\,000$.

I'll also assume that

  • all $25,000$ are lined up and tested
  • exactly one match is found

Also, even if the thief is present $(T)$, the match might not be with the thief but with someone else, say $X$.

Drawing up a contingency table,

$$\begin{array}{l|cc} & T & T^c\\ \hline \text{Matches } T & A & C\\ \text{Matches } X & B & D \end{array}$$

$A = 0.1\times0.9999\times0.9999^{24999}\approx 0.82075\%$

$B = 0.1\times0.0001\times24999\times0.0001\times0.9999^{24998}\approx 0.00021\%$

$C = 0.9\times0 = 0\%$

$D = 0.9\times0.0001\times25000\times0.9999^{24999}\approx 18.46866\%$

Finally, $P(\text{thief present}\mid\text{match found})$

$=\dfrac{A+B}{A+B+C+D} = \dfrac{0.82075+0.00021}{0.82075+0.00021+0+18.46866}$

$\boxed{\approx 4.26\%}$
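The four cells can be recomputed with a small Python sketch. It takes its assumptions straight from this answer: prior $0.1$ that $T$ is in the database, $T$'s own sample "matches correctly" with probability $0.9999$, two different people match with probability $0.0001$, and exactly one of the $25{,}000$ entries matches. The constant names are illustrative only.

```python
# Sketch reproducing the contingency-table cells of the recast answer.
# Assumptions (from the answer): P(T in database) = 0.1, T's own sample matches
# with prob. 0.9999, two different people match with prob. 0.0001 (fail to match
# with prob. 0.9999), and exactly one of the N entries matches.
N = 25_000
P_IN = 0.1
P_FALSE_MATCH = 0.0001              # two different people match
P_NO_FALSE_MATCH = 1 - P_FALSE_MATCH
P_SELF_MATCH = 0.9999               # the answer's value for T matching T

# T present, the single match is T: T matches, none of the other N-1 do.
cell_a = P_IN * P_SELF_MATCH * P_NO_FALSE_MATCH ** (N - 1)
# T present, the single match is some X: T misses, one of N-1 others matches, N-2 don't.
cell_b = P_IN * (1 - P_SELF_MATCH) * (N - 1) * P_FALSE_MATCH * P_NO_FALSE_MATCH ** (N - 2)
# T absent, the match is T: impossible.
cell_c = (1 - P_IN) * 0.0
# T absent, the single match is some X: one of N entries matches, N-1 don't.
cell_d = (1 - P_IN) * N * P_FALSE_MATCH * P_NO_FALSE_MATCH ** (N - 1)

posterior = (cell_a + cell_b) / (cell_a + cell_b + cell_c + cell_d)
for name, cell in [("A", cell_a), ("B", cell_b), ("C", cell_c), ("D", cell_d)]:
    print(f"{name} = {100 * cell:.5f}%")
print(f"P(thief present | match found) = {100 * posterior:.2f}%")
```

The printed cells agree with $A\approx0.82075\%$, $B\approx0.00021\%$, $D\approx18.46866\%$ and the boxed $\approx4.26\%$. This is close to $n/(22.5+n)$ at $n=1$ from the other answer, $1/23.5\approx4.26\%$; the two models differ only in whether the thief's own sample matches with probability $1$ or $0.9999$.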

  • By "event DNA matches", do you mean "two different persons match" or "two identical DNA samples match"? – serendipity0217 Aug 07 '21 at 07:30
  • If the thief is not in the database, the probability of matching will follow that of a random population – true blue anil Aug 07 '21 at 07:37
  • is (|) (9999/10000)^24999? is (|) 1/10000*(9999/10000)^24999? – serendipity0217 Aug 07 '21 at 08:57
  • See "added" portion of answer – true blue anil Aug 07 '21 at 09:35
  • So the probability of matching DNA when the thief is in the database is 1? – serendipity0217 Aug 07 '21 at 10:53
  • Yeah, that's what I'm assuming; it obviously can't be $>1$, and they have not given any Pr for two readings from the same person matching. – true blue anil Aug 07 '21 at 11:02
  • I agree. But I feel like I need to multiply by (9999/10000)^24999 and 25000, since the rest don't match with the DNA, and there are C(25000, 1) possibilities? – serendipity0217 Aug 07 '21 at 12:50
  • Sorry, I think it should be multiplied by (9999/10000)^24999 only, because there should be only one specific person that matches the thief's DNA, so there aren't 25000 possibilities as in the second case. – serendipity0217 Aug 07 '21 at 13:48
  • If you do that you'd get a probability of ~1/25000, which is obviously too low. Personally, I am of the opinion that the question is vague and ill-formulated. – true blue anil Aug 07 '21 at 14:35
  • Please note that if $\ M\ $ is the event that exactly one match has been found, then \begin{align} P(M|D)&=1\times\Big(1-\frac{1}{10000}\Big)^{24999}\\ &\approx0.08\\ &\ll1 \end{align} More generally, if $\ M_n\ $ is the event that the thief's DNA matches exactly $\ n\ (\,\ge1)\ $ of the entries in the database, then $$ \frac{P\big(M_n\,\big|\,D\big)}{P\big(M_n\,\big|\,D^c\big)}=\frac{n}{2.5}\ , $$ and $$ P\big(D\,\big|\,M_n\big)=\frac{n}{22.5+n}\ . $$ – lonza leggiera Aug 08 '21 at 04:24
  • Hi lonza, I want to know whether I have to multiply by 25000 or not in the case when the thief is not in the database. If I multiply by 25000, the final answer for this question would be around 0.04. If not, the answer would be 0.991. – serendipity0217 Aug 08 '21 at 05:51
  • Please see the recast answer; typos had ruined the earlier one! (An after-effect of watching the Olympics!) – true blue anil Aug 14 '21 at 15:34