I have a dataset that gives information about a population. For instance, I know the fraction of people who are male (M) and within a certain age range (A), P(M & A), and I know the fraction of people who are male and live in a certain area (L), P(M & L).

What I'm interested in computing is P(M & A & L), which is the fraction of people that are males, are within a certain age range and live in a certain area.

Using Bayes' formula I can say that

P(M & A & L) = P(M & A | L) P(L)

But my dataset only gives P(L) and not P(M & A | L). However, if I assume that M & A and L are independent, I have

P(M & A | L) = P(M & A), so that P(M & A & L) = P(M & A) P(L)

How large is the error on P(M & A | L) if I make this assumption? Do you know of any other method I could use to estimate P(M & A | L) without assuming independence?

Brian

1 Answer

Bayes' theorem applies to conditional probabilities: $ P(A|B) = \frac{P(A)\,P(B|A)}{P(B)} $

Your question is about several events occurring together. If the events can be treated as mutually independent, then

P(A & B & C) = P(A).P(B).P(C)

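In your question above, if M & A is independent of L,

P(M & A & L) = P(M & A).P(L)

and this factors further into P(M).P(A).P(L) only if M and A are also independent of each other.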
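
For reference, here is a sketch of the factorization that holds with no independence assumption at all, followed by a weaker assumption that uses only the two quantities from the question plus P(M). That A and L are independent among males is an assumption made here for illustration; nothing in the question confirms it.

$$
\begin{aligned}
P(M \,\&\, A \,\&\, L) &= P(M)\, P(A \mid M)\, P(L \mid M \,\&\, A) && \text{(always true)} \\
&\approx P(M)\, P(A \mid M)\, P(L \mid M) && \text{(assuming A and L are independent given M)} \\
&= \frac{P(M \,\&\, A)\, P(M \,\&\, L)}{P(M)}
\end{aligned}
$$

The approximation error is exactly the gap between P(L | M & A) and P(L | M), which cannot be read off the marginal tables alone.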

From your data, you should either have these probabilities already or be able to derive them by summing over the appropriate categories.

Let's say that 0.1 of males are 1 to 10 years of age, and that 0.2 of males live in location L. Then the fraction of males who are 1 to 10 years of age and live in location L is 0.1 × 0.2 = 0.02. This assumes that every location has the same age distribution.
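
The arithmetic above can be sketched in a few lines of Python, together with a rough answer to "how bad can the assumption be". The function names and the value P(M) = 0.5 are hypothetical, chosen only for illustration. The bounds are the standard Fréchet bounds on an intersection: without any independence assumption, P(M & A & L) can be anywhere between max(0, P(M & A) + P(M & L) - P(M)) and min(P(M & A), P(M & L)).

```python
def estimate_joint(p_a_given_m, p_l_given_m, p_m=1.0):
    """Estimate P(M & A & L), ASSUMING A and L are independent given M.

    With p_m=1.0 the result is the fraction among males only,
    as in the 0.1 x 0.2 example above.
    """
    return p_m * p_a_given_m * p_l_given_m


def frechet_bounds(p_ma, p_ml, p_m):
    """Bounds on P(M & A & L) that hold with no independence assumption.

    p_ma = P(M & A), p_ml = P(M & L), p_m = P(M).
    """
    lower = max(0.0, p_ma + p_ml - p_m)  # the two subsets of M must overlap at least this much
    upper = min(p_ma, p_ml)              # the overlap cannot exceed the smaller subset
    return lower, upper


if __name__ == "__main__":
    p_m = 0.5          # hypothetical fraction of males in the population
    p_a_given_m = 0.1  # fraction of males aged 1 to 10 (from the example)
    p_l_given_m = 0.2  # fraction of males living in L (from the example)

    estimate = estimate_joint(p_a_given_m, p_l_given_m, p_m)
    lower, upper = frechet_bounds(p_m * p_a_given_m, p_m * p_l_given_m, p_m)
    print("estimate under independence:", estimate)
    print("possible range without it:  ", lower, "to", upper)
```

If the range returned by `frechet_bounds` is wide compared with the estimate, the marginal tables alone cannot say how far off the independence assumption is, and some joint data on age and location would be needed.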

Ash
  • Thanks, but this is not what I'm asking. I'm interested in knowing P(M & A | L). I know that M and A are dependent (I also know P(M & A)), but I don't know whether L depends on M & A, so the only thing I could do was to assume P(M & A | L) = P(M & A). Now I need to understand how bad this assumption is. – Brian Nov 18 '15 at 09:02