I am working through the derivation of the hypergeometric distribution:

The probability that we select a sample of size $n$ containing $r$ defective items from a population of $N$ items known to contain $M$ defective items is

$$P(X = r) = \dfrac{C(M,r)\,C(N-M,n-r)}{C(N,n)}$$

where $C(P,Q)$ is the number of combinations of $P$ items taken $Q$ at a time.
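To make the formula concrete, it can be evaluated directly. This is only a minimal sketch: the function name `hypergeom_pmf` and the numbers $N=5$, $M=3$, $n=3$ are illustrative assumptions, not part of the quoted material, and `math.comb` requires Python 3.8 or later.

```python
from math import comb

def hypergeom_pmf(r, N, M, n):
    """P(X = r): r defectives in a sample of n drawn from N items, M of them defective."""
    # comb(a, b) plays the role of C(a, b); it returns 0 when b > a.
    return comb(M, r) * comb(N - M, n - r) / comb(N, n)

# Illustrative (assumed) values: N = 5 items, M = 3 defective, sample size n = 3.
probs = [hypergeom_pmf(r, N=5, M=3, n=3) for r in range(0, 4)]
for r, p in enumerate(probs):
    print(f"P(X = {r}) = {p}")
```

The probabilities over all possible values of $r$ should add up to 1, which is a quick sanity check on the formula.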

Explanation of the above equation:

(a) we may select $n$ items from a population of $N$ items in $C(N,n)$ ways - understood

(b) we may select $r$ defective items from $M$ defective items in $C(M,r)$ ways - understood

(c) we may select $n-r$ non-defective items from $N-M$ non-defective items in $C(N-M,n-r)$ ways - did not understand

(d) hence we may select $n$ items containing $r$ defectives in $C(M,r) * C(N-M,n-r)$ ways - did not understand

Why must both (b) and (c) be considered, and why are those factors multiplied in (d)?

Can anybody explain the derivation of the hypergeometric distribution in simple terms?

The above material is taken from here: The Hypergeometric distribution

Vinod
  • (b) and (c) are practically the same. Can you describe essential differences that would explain your non-understanding of (c)? – drhab Nov 19 '14 at 09:37
  • @drhab I have updated my question: 'Why both (b) and (c) must be considered and those factors got multiplied in (d)' – Vinod Nov 20 '14 at 01:53

1 Answer

Let's do it with an example: $N=5$ objects, of which $M=3$ are defective and $N-M=2$ are not defective. $n=3$ items are selected. What is the probability that $r=2$ of them are defective?

Set of objects: $\{D_1,D_2,D_3,N_1,N_2\}$. There are $C(5,3)=10$ ways to take $3$ out of $5$ ((a) understood).

Looking only at the defectives, there are $C(3,2)=3$ ways to take out $2$ ((b) understood). Actually we have the possibilities: $D_1D_2$, $D_1D_3$ and $D_2D_3$.

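Looking at the non-defectives, there are $C(2,1)=2$ ways to take out $1$ ((c) not understood). Actually we have the possibilities: $N_1$ and $N_2$. This step is needed because the sample has $3$ items, so the third item must come from the non-defectives.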

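Each of the $3$ choices of defectives can be combined with each of the $2$ choices of a non-defective, which is why the counts are multiplied. That means that we have $3\times2=6$ possibilities for taking out $2$ defectives (and automatically one non-defective), which are: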

  • $D_1D_2N_1$
  • $D_1D_3N_1$
  • $D_2D_3N_1$
  • $D_1D_2N_2$
  • $D_1D_3N_2$
  • $D_2D_3N_2$

The probability that this happens is: $\dfrac{C(3,2)\,C(2,1)}{C(5,3)}=\dfrac{3\times2}{10}=0.6$.
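The counting above can also be checked by brute force. The following is a minimal sketch that lists every sample of $3$ from the set $\{D_1,D_2,D_3,N_1,N_2\}$ and counts those with exactly $2$ defectives; the labels and variable names are only illustrative.

```python
from itertools import combinations
from math import comb

# The five objects from the example: three defective, two non-defective.
items = ["D1", "D2", "D3", "N1", "N2"]

# (a) every way to take 3 out of 5, ignoring order
samples = list(combinations(items, 3))

# keep only the samples that contain exactly 2 defectives
favourable = [s for s in samples if sum(x.startswith("D") for x in s) == 2]

# (b) times (c): pair each choice of 2 defectives with each choice of 1 non-defective
by_product_rule = comb(3, 2) * comb(2, 1)

print(len(samples), len(favourable), by_product_rule)
print(len(favourable) / len(samples))
```

The list `favourable` contains the same six samples as the bullet list above, so direct enumeration and the product $C(3,2)\times C(2,1)$ agree.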

drhab
  • That's a great explanation of the hypergeometric distribution. I really understood it. Could you tell me what the geometric distribution is? – justin Mar 25 '15 at 10:18
  • @justin Thank you for your compliment. However, it is not the right route to answer (seemingly) analogous questions by means of comments. Though, you could post a question on that subject and hope for answers. – drhab Mar 25 '15 at 10:18
  • Do you mean that I should post a question with the title "Simple explanation of Geometric distribution"? – justin Mar 25 '15 at 10:20
  • @justin If the geometric distribution is somehow mysterious for you then you can do that if you like. Of course in your question you must also describe what really makes it mysterious for you. Btw, do not misunderstand me here: I am not making any promises of answering your question (my freedom in that is very valuable to me :)). Nevertheless there are quite a few people here who can help you of course. Good luck. – drhab Mar 25 '15 at 10:28
  • Could you tell me how the name "hypergeometric" came to be used for this type of distribution? – justin Mar 26 '15 at 12:51
  • @justin I only know that the Greek word "hyper" corresponds to the Latin word "super", which in many contexts can be translated as "above". So "hypergeometric" must be something like "measuring above the earth". No idea why that is used for this distribution. – drhab Mar 26 '15 at 13:17
  • Great answer, helped me! – Eduard Valentin Jan 28 '18 at 08:12