0

The question is: A test for heart disease results in a false positive 5% of the time. 25% of the population has heart disease and 20% test positive. Given a negative test, what is the probability the patient does not have heart disease?

Let $N$ = the event that a person doesn't have heart disease, and $-$ = the event that a person tests negative for heart disease.

$$P[N\mid -]=\frac{P[N \cap -]}{P[-]}=\frac{P[-\mid N]\,P[N]}{P[-]}=\frac{0.95\cdot 0.75}{0.8}\approx 0.89$$

However, that is wrong, and I am confused as to why.

The correct solution is as follows:

Let event H be heart disease, N be no heart disease, and event + be a positive test.
$$P(N)=P(N\mid +)\cdot P(+)+P(N\mid -)\cdot\bigl(1-P(+)\bigr)$$
$$0.75=0.05\cdot 0.20+P(N\mid -)\cdot(1-0.20)$$
$$0.75=0.01+0.80\cdot P(N\mid -)$$
$$P(N\mid -)=\frac{0.74}{0.80}=0.925=92.5\%$$

To be clear, this solution makes sense to me. My issue is that I don't understand why my original solution was wrong.

I suppose the key to understanding why I was wrong is this: is the probability of a true negative equal to 1 minus the probability of a false positive?

elbecker
  • 138
  • 9
  • Given $10000$ people, we know that $2500$ actually have the disease and $7500$ don't. We also know that $\frac {7500}{20}=375$ people test positive but do not have the disease, and that $2000$ people test positive for the disease, and we can use all of that to compute the probability that a person with the disease tests negative for it. – lulu Dec 24 '23 at 18:33
  • @Lulu thank you! But do you mind explaining to me why my solution is wrong? I am just wanting to see my error, thank you – elbecker Dec 24 '23 at 18:34
  • I don't think you are wrong. I get the same result you get. Why do you think it is wrong? – lulu Dec 24 '23 at 18:39
  • @Lulu according to my source, the correct answer is 92.5% https://analystprep.com/actuarial-exams/soa/exam-p-probability/soa-exam-p-probability-practice-questions/ – elbecker Dec 24 '23 at 18:40
  • There are four types of people (true positive, false positive, true negative, false negative). Using my description, with $10000$ people in total, I get these as $(1625,375,7125,875)$ which makes the computation: $\frac {7125}{7125+875}=.890625$ – lulu Dec 24 '23 at 18:42
  • to be clear: there's no significance to the $10000$....I just find that it sometimes clarifies things to work with an explicit population. – lulu Dec 24 '23 at 18:44
  • @lulu See https://math.stackexchange.com/questions/2279851/applied-probability-bayes-theorem/2279888#2279888 – Ethan Bolker Dec 24 '23 at 18:59
  • What does “results in a false positive 5% of the time” mean? Does it mean that 5% of all tests administered result in false positives? Or that 5% of the tests administered to people without heart disease result in false positives? These are quite different. – mjqxxxx Dec 24 '23 at 19:00
  • Or yet a third interpretation: that 5% of all positive tests are incorrect. – mjqxxxx Dec 24 '23 at 19:12
  • @lulu You are assuming that "A test for heart disease results in a false positive $5$% of the time" means the false positive rate is $5$%. It does not. It means either that $5$% of positive tests are false positives (as the answer assumes) or that $5$% of all tests are false positives. Either way, neither of these is a false positive rate. – Robert Murray Dec 24 '23 at 22:54

2 Answers

4

To begin with, the problem statement is a little bit ambiguous. "A test for heart disease results in a false positive 5% of the time." Is that $5\%$ of all tests or $5\%$ of the tests performed on people without heart disease? A literal reading suggests the first interpretation, but the second interpretation is what is usually measured. So let's assume the second interpretation.

Translating this interpretation into a formula gives $$ P(+ \mid N) = 0.05. $$

The "correct" solution has instead assumed that $P(N \mid +) \stackrel?= 0.05$, which I cannot see any way to justify.

I think the resolution to the discrepancy here is that the "correct" solution is incorrect and that your "incorrect" solution is correct.
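Here is a minimal sketch of your calculation under this reading, in Python with exact fractions so that no rounding creeps in. The function name and the 10,000-person population (borrowed from the comments) are illustrative assumptions, not part of the problem.

```python
from fractions import Fraction

def p_healthy_given_negative(fpr, prevalence, p_positive):
    """P(N | -) when the 5% figure is read as the false positive rate P(+ | N)."""
    p_n = 1 - prevalence        # P(N): no heart disease
    p_neg = 1 - p_positive      # P(-): negative test
    p_neg_given_n = 1 - fpr     # P(- | N) = 1 - P(+ | N)
    return p_neg_given_n * p_n / p_neg   # Bayes' theorem

fpr = Fraction(5, 100)
prevalence = Fraction(25, 100)
p_positive = Fraction(20, 100)

answer = p_healthy_given_negative(fpr, prevalence, p_positive)
print(answer, float(answer))    # 57/64 0.890625

# Same check with an explicit (hypothetical) population of 10,000 people.
population = 10_000
healthy = population * (1 - prevalence)          # 7500
false_pos = healthy * fpr                        # 375
true_neg = healthy - false_pos                   # 7125
true_pos = population * p_positive - false_pos   # 1625
false_neg = population * prevalence - true_pos   # 875
print(true_neg / (true_neg + false_neg))         # 57/64
```

Both routes give $57/64 \approx 0.89$, which matches the figure in the question.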


OK, I think I see the interpretation of the "correct" solution. They apparently thought that "A test for heart disease results in a false positive 5% of the time" means that $5\%$ of all positive results are false. I still think this is an absurd interpretation.

David K
  • 98,388
2

The other answer does not explain how to read the solution (because it's a bad problem), so here is how you read it:

A test for heart disease results in a false positive 5% of the time.

The other answer claims we should interpret this as $\mathbb{P}(+|N) = 0.05$. We definitely could, and that would make sense if it said the false positive rate was $5$%, but it doesn't. That is, however, the natural reading of this, and if I didn't have the answer in front of me I would agree.

Instead, it wants us to read this as: a positive test for heart disease is false $5$% of the time. Why? I believe that by "test for heart disease" it means a test for heart disease that returns positive. This is within logical limits, since it never uses "test for heart disease" again, but it still should not be seen as a normal interpretation. It gives us $\mathbb{P}(N|+) = 0.05$, or equivalently $\mathbb{P}(H|+) = 0.95$, where $H$ is the event of having heart disease. The quantity $\mathbb{P}(H|+)$ is known as the precision of a test, which is fundamentally different from the false positive rate.

As a note, you read this as the false positive rate, but even if we didn't read it in the wacky way the problem wants us to, it doesn't tell us the false positive rate. The other way to read this is that out of all tests, $5$% are false positives, not out of all tests on patients without heart disease, which is what a false positive rate tells us. So, while your interpretation is much, much more understandable, it is still technically incorrect.
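For comparison, here is a rough sketch of what the "out of all tests" reading would give, taking it to mean $\mathbb{P}(N \cap +) = 0.05$. The variable names are my own, and this is only one way to set up the arithmetic.

```python
from fractions import Fraction

p_n = Fraction(75, 100)            # P(N): no heart disease
p_pos = Fraction(20, 100)          # P(+): positive test
p_n_and_pos = Fraction(5, 100)     # assumed reading: P(N and +) = 0.05

# Healthy people test either positive or negative, so
# P(N and -) = P(N) - P(N and +).
p_n_and_neg = p_n - p_n_and_pos
p_n_given_neg = p_n_and_neg / (1 - p_pos)

print(p_n_given_neg, float(p_n_given_neg))   # 7/8 0.875
```

So this reading leads to $87.5$%, which is neither your $89$% nor the source's $92.5$%.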

25% of the population has heart disease and 20% test positive.

This part makes slightly more sense at first glance, but it is still tricky. It says $20$% of tests are positive, meaning $80$% of tests are negative. Also, $25$% of the population has heart disease and $75$% does not.

We are finally ready to read this question: $$\begin{aligned} \mathbb{P}(N) &= \mathbb{P}(N|+) \cdot \mathbb{P}(+) + \mathbb{P}(N|-) \cdot \mathbb{P}(-) \\ 0.75 &= 0.05 \cdot 0.2 + \mathbb{P}(N|-) \cdot 0.8 \\ \mathbb{P}(N|-) &= \frac{0.75 - 0.01}{0.8} = 0.925 \end{aligned}$$
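Solving the total probability equation for $\mathbb{P}(N|-)$ can be checked with a few lines of Python. This is just a sketch; the variable names are mine, and the $0.05$ is used as the problem's intended $\mathbb{P}(N|+)$, not as a false positive rate.

```python
from fractions import Fraction

p_n = Fraction(75, 100)           # P(N): no heart disease
p_pos = Fraction(20, 100)         # P(+): positive test
p_n_given_pos = Fraction(5, 100)  # intended reading: P(N | +) = 0.05

# Law of total probability: P(N) = P(N|+) P(+) + P(N|-) P(-),
# rearranged for the unknown P(N|-).
p_neg = 1 - p_pos
p_n_given_neg = (p_n - p_n_given_pos * p_pos) / p_neg

print(p_n_given_neg, float(p_n_given_neg))   # 37/40 0.925
```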

is the probability of a true negative equal to 1 minus the probability of a false positive?

No, but this also doesn't appear in the solution. The solution uses the law of total probability, which states that if $B$ and $B^c$ are complementary events (so $\mathbb{P}(B) + \mathbb{P}(B^c) = 1$), then $$\begin{aligned} \mathbb{P}(A) &= \mathbb{P}(A|B) \cdot \mathbb{P}(B) + \mathbb{P}(A|B^c) \cdot \mathbb{P}(B^c), \\ \mathbb{P}(B^c) &= 1 - \mathbb{P}(B). \end{aligned}$$
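As a hedged aside on which complements are valid (this is standard probability rather than anything stated in the problem): conditional probabilities sum to $1$ over the event to the left of the bar, for a fixed condition, but not over the condition itself. With $H$ denoting heart disease, a symbol borrowed from the quoted solution:

$$\begin{aligned} \mathbb{P}(-|N) &= 1 - \mathbb{P}(+|N), \\ \mathbb{P}(H|+) &= 1 - \mathbb{P}(N|+), \\ \mathbb{P}(N|-) &\neq 1 - \mathbb{P}(N|+) \quad \text{in general}. \end{aligned}$$

So a "1 minus" step is only safe when both quantities are conditioned on the same event. Under the solution's reading, for instance, $\mathbb{P}(N|+) = 0.05$ and $\mathbb{P}(N|-) = 0.925$, which do not sum to $1$.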

In sum, this is a poorly written problem. You should not worry about it. The answer is unattainable via any realistic interpretation of the problem. However, you should also be careful about what a false positive rate actually is.

Edit: It seems this is a free question from AnalystPrep, meaning it is likely intentionally misleading. Looking at the other questions readily available on their website, a lot of these problems are designed to be misread. For example, from:

60% of an insurer’s policyholders are male and 40% are female. The chance of a male having a claim is twice the chance of a female having a claim.

You apparently should find that $\mathbb{P}(\text{male and a claim}) = 0.6 \cdot 2x$, which does not read well, since the chance of a male having a claim being twice the chance of a female having a claim would be $\mathbb{P}( \text{claim} | \text{male}) = 2 \cdot \mathbb{P}(\text{claim} | \text{female})$. These problems are written so that they can be read in many ways, with only one being correct, and not necessarily the most natural one.

Overall, if you want to study for an actuarial exam, I recommend books and not websites whose validity you cannot know before purchasing. To my knowledge, The International Series on Actuarial Science generally has good books on each exam, but I am not an actuary, nor have I been or will I be, so I do not pretend to actually know.

As a final final note: Do not trust free problems on websites. Why? If you get $100$% on the website's free practice questions, how likely are you to purchase the rest of their offerings? Very unlikely, since it seems too easy for you. If you get $75$% due to misleading questions, you would be more likely to, since you feel you could improve using these questions!

  • As you say, if you didn't have the answer in front of you you would read the "$5\%$ of the time" as a false positive rate. What that says to me is that the answer is as wrong as any other answer that contradicts the natural reading of the problem statement. Also, thanks for informing me about "precision rate". Is that term used anywhere outside the realm of machine learning, data science, and associated fields? – David K Dec 24 '23 at 23:15
  • @DavidK Precision rate is a term in statistics in general, but outside of statistics nobody ever pays attention to these things so it's not really used. On your first statement about the answer being wrong, I initially disagreed until I saw the source of the problem, in which case I agree since the question is intentionally misleading. But, this also does not mean the question should be read as a false positive rate. We (and I suspect everyone who reads this the first time) are incorrect on our first read. I think the best interpretation is precision rate. – Robert Murray Dec 24 '23 at 23:44
  • I still have not found a use of "precision rate" in this sense outside the ML/AI context. In general statistical usage, there is the term positive predictive value, which is a lot easier to look up and means the same thing. That's the term used in medical diagnosis (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4608333/). – David K Dec 24 '23 at 23:52
  • Nice work finding the source of the problem! You make a good case for believing that the problem writer was likely being intentionally misleading, which puts the whole problem in context. +1 – David K Dec 25 '23 at 00:16