57

I'm reading a tutorial on expectation maximization which gives an example of a coin-flipping experiment (the description is at http://www.nature.com/nbt/journal/v26/n8/full/nbt1406.html?pagewanted=all). Could you please help me understand where the probabilities in step 2 of the process (i.e. in the middle of part b of the illustration below) come from? Thank you.

[Figure: Expectation maximization (illustration from the linked tutorial)]

  1. EM starts with an initial guess of the parameters.
  2. In the E-step, a probability distribution over possible completions is computed using the current parameters. The counts shown in the table are the expected numbers of heads and tails according to this distribution.
  3. In the M-step, new parameters are determined using the current completions.
  4. After several repetitions of the E-step and M-step, the algorithm converges.
Martin08
  • Kindly pardon me, but I am still not clear where "0.45 x A and 0.55 x B" etc. is coming from. Could you please guide me? – Anuj Gupta Aug 01 '17 at 09:40

4 Answers

55

These are the likelihoods of the corresponding set of $10$ coin tosses having been produced by the two coins (using the current estimate for their biases) normalized to add up to $1$. The estimated probability of $k$ out of $10$ tosses of coin $i$ ($i\in\{A,B\}$) yielding heads is

$$p_i(k)=\binom{10}{k} \theta_i^k (1-\theta_i)^{10-k}\;.$$

The binomial coefficient is the same for both coins, so it drops out in the normalization, and only the ratio of the remaining factors determines the result.

For instance, in the second row, we have $9$ heads and $1$ tail. Given the current bias estimates $\theta_A=0.6$ and $\theta_B=0.5$, the factors are

$$\theta_A^9 (1-\theta_A)^{10-9}\simeq0.004$$

and

$$\theta_B^9 (1-\theta_B)^{10-9}\simeq0.001\;,$$

resulting in the numbers

$$\frac{0.004}{0.004+0.001}=0.8$$

and

$$\frac{0.001}{0.004+0.001}=0.2$$

in the second row.
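For anyone who wants to verify these numbers, here is a minimal Python sketch (the variable names are mine, not from the tutorial) that reproduces the normalized 0.8/0.2 split for the second row:

```python
from math import comb

theta_A, theta_B = 0.6, 0.5  # current bias estimates
h, n = 9, 10                 # second row: 9 heads in 10 tosses

# Binomial likelihoods; comb(n, h) is the same for both coins,
# so it cancels in the normalization below.
like_A = comb(n, h) * theta_A**h * (1 - theta_A)**(n - h)
like_B = comb(n, h) * theta_B**h * (1 - theta_B)**(n - h)

p_A = like_A / (like_A + like_B)
p_B = like_B / (like_A + like_B)
print(round(p_A, 2), round(p_B, 2))  # 0.8 0.2
```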

joriki
  • Thank you. So for each of the 5 series of observations, we calculate the probability that the series was produced by each coin using the current bias estimates. Could you please help me understand: (a) how do we go from that probability distribution (0.8, 0.2) to the expectation (7.2H, 0.8T)? I thought the expectation calculation cannot be (0.8×9 H, 0.8×1 T), because that way we toss 10 times but only get 7.2+0.8 = 8 results? And (b) why do we need to do this? Many thanks again. – Martin08 Mar 18 '11 at 05:08
  • In case you haven't seen it: there's a follow-up question to your answer here. – t.b. Nov 11 '11 at 03:11
  • @Martin: Hi Martin, sorry I appear to have forgotten to answer your additional question -- I hope I've answered it now in my answer to the question t.b. links to above (in case it's still of any relevance to you...) – joriki Nov 11 '11 at 12:17
  • @joriki, have you implemented this algorithm? I implemented it but got the result A: 0.66, B: 0.66, which is different from the paper – zjffdu Nov 09 '13 at 11:21
  • @joriki can you see my answer. Is my understanding correct? – user13107 Jul 09 '14 at 06:57
  • Why did we choose to use H as k in the second toss? Is it because H > T? Which one did we choose in the beginning? – minerals Feb 12 '16 at 15:19
  • Maybe a naive question, but can someone tell me what 0.8 represents? Is it Pr(H9T1|A) [the probability of seeing 9 heads and 1 tail, given the coin is A]? – Anuj Gupta Aug 01 '17 at 08:33
  • @joriki I know it has been a while, but maybe you, good sir, are still out here. I have a very similar problem to the one presented here. I understand all of it, but I don't quite see why in step 1 they have probabilities of 0.6 and 0.5. Are we allowed to choose random starting probabilities, or are 0.6 and 0.5 perhaps not random at all? – Scavenger23 Apr 26 '18 at 21:57
  • @KuderaSebastian My understanding is there are several options for selecting initial values: either choose randomly or else make a good guess based on prior information. Since there may be multiple local maxima, it's a good idea to select several initial parameter values to see if they converge to the same solution. – RobertF May 17 '18 at 18:33
  • @AnujGupta The 0.8 probability is P(A|H9T1, theta_A=0.6, theta_B=0.5) which is found by computing the ratio: P(H9T1|theta=0.6)/[P(H9T1|theta=0.6) + P(H9T1|theta=0.5)]. – RobertF May 17 '18 at 18:46
8

Consider one of the coin-toss realizations in the figure.

Let $P(H_9T_1|A)$ be the probability of observing 9 heads and 1 tail when the coin is A.

Let $P(H_9T_1|B)$ be the probability of observing 9 heads and 1 tail when the coin is B.

Let $P(A|H_9T_1)$ be the probability of the coin being A when you observe 9 heads and 1 tail.

Let $P(B|H_9T_1)$ be the probability of the coin being B when you observe 9 heads and 1 tail.

Apply the definition of conditional probability (Bayes' rule):

$P(A|H_9T_1) = \frac{P(A) \cdot P(H_9T_1|A)}{P(H_9T_1)}$

$P(B|H_9T_1) = \frac{P(B) \cdot P(H_9T_1|B)}{P(H_9T_1)}$

Now,

$P(A) = 0.5 = P(B)$

Estimates of $P(H_9T_1|A)$ and $P(H_9T_1|B)$ are computed using the method described by @joriki.

Since the coin can either be A or B, $P(A|H_9T_1) + P(B|H_9T_1) = 1$

Hence you can calculate the numbers in step 2. They are $P(A|H_9T_1)$ and $P(B|H_9T_1)$ respectively.
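As a sanity check, here is a short Python sketch of this Bayes computation (the function and its layout are mine), applied with the uniform prior $P(A)=P(B)=0.5$ to each of the five 10-toss series in the figure:

```python
from math import comb

def posterior_A(h, n, theta_A, theta_B, prior_A=0.5):
    """P(A | h heads in n tosses), via Bayes' rule with a uniform prior."""
    like_A = comb(n, h) * theta_A**h * (1 - theta_A)**(n - h)  # P(obs|A)
    like_B = comb(n, h) * theta_B**h * (1 - theta_B)**(n - h)  # P(obs|B)
    evidence = prior_A * like_A + (1 - prior_A) * like_B       # P(obs)
    return prior_A * like_A / evidence

# Heads observed in each of the five 10-toss series from the figure.
for h in [5, 9, 8, 4, 7]:
    p_A = posterior_A(h, 10, 0.6, 0.5)
    print(f"{h}H {10 - h}T: P(A|obs) = {p_A:.2f}, P(B|obs) = {1 - p_A:.2f}")
```

The second series reproduces the 0.80/0.20 from the accepted answer, and the first gives the 0.45/0.55 asked about in the comments under the question.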

user13107
  • What is P(H9T1)? Is it the probability of getting 9 heads and 1 tail with an unbiased coin? – Siddhesh Mar 15 '16 at 04:08
  • @Siddhesh P(H9T1) is the probability of getting 9 heads and 1 tail, which is equal to P(A) x P(H9T1|A) + P(B) x P(H9T1|B). – wwliao Sep 01 '16 at 18:59
  • Thanks for using good notation. Could you add to your answer how they generate the values in the red and the blue columns? And then how to generate theta A and B? I see how the numbers work, but I'm missing some of the formal notation you used. – user3731622 Sep 07 '16 at 21:23
  • Perfect. That's exactly what I'm thinking. Actually, "0.004/(0.004+0.001)" in the most-voted answer comes from Bayes' rule. So Bayes' rule is indeed the essential thing. – zodiac Jun 22 '17 at 08:23
0

[Image: the E-step table from the tutorial figure, showing the per-series probabilities and expected head/tail counts]

In the picture above, the decimals indicate how likely it is that each set of tosses came from coin A or from coin B, given the latest estimates $\hat\theta_A$ and $\hat\theta_B$. The crux is that $\hat\theta_A$ and $\hat\theta_B$ are initialized randomly; if either one is a poor guess, the corresponding likelihood will be small, and the other will be large after the normalization step. The expected numbers of heads and tails are then calculated using the probability of the 10 flips coming from A or from B: any observation can come from either A or B, and the decimals are the probabilities of these two cases. Given these guessed probabilities, we can calculate new values of $\hat\theta_A$ and $\hat\theta_B$. Once the two parameters are estimated well enough, the likelihoods will both be high (close to the true likelihood) and the new parameters will be almost the same as the previous ones.
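To make the iteration concrete, here is a rough Python sketch of the full EM loop for this two-coin example (my own implementation, not the tutorial's code):

```python
heads = [5, 9, 8, 4, 7]      # heads per 10-toss series, from the figure
n = 10
theta_A, theta_B = 0.6, 0.5  # step 1: initial guesses

for step in range(10):
    # E-step: posterior probability that each series came from coin A,
    # accumulated into expected head/tail counts for both coins.
    hA = tA = hB = tB = 0.0
    for h in heads:
        like_A = theta_A**h * (1 - theta_A)**(n - h)
        like_B = theta_B**h * (1 - theta_B)**(n - h)
        w = like_A / (like_A + like_B)  # P(A | series)
        hA += w * h
        tA += w * (n - h)
        hB += (1 - w) * h
        tB += (1 - w) * (n - h)
    # M-step: re-estimate each bias from its expected counts.
    theta_A = hA / (hA + tA)
    theta_B = hB / (hB + tB)
    print(f"step {step + 1}: theta_A = {theta_A:.2f}, theta_B = {theta_B:.2f}")
```

The first iteration gives roughly $\hat\theta_A \approx 0.71$ and $\hat\theta_B \approx 0.58$, and after several more the estimates settle near $0.80$ and $0.52$, as in the tutorial. Note also that if the two parameters are initialized to the same value, the symmetry never breaks: every $w$ stays at $0.5$ and both estimates converge to the pooled average $33/50 = 0.66$, which may explain the $0.66/0.66$ result reported in the comments above.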

-4
  • Let o1 = <5H,5T>, o2 = <9H,1T>, o3 = <8H,2T>, o4 = <4H,6T>, o5 = <7H,3T> be the five observations, respectively.
  • Each of these observations can come either from coin A with probability 0.5 or from coin B with probability 0.5.
  • If we select coin A, let pA_head denote the probability of a head from coin A, initialized as 0.60.
  • If we select coin B, let pB_head denote the probability of a head from coin B, initialized as 0.50.

Now we have to find out whether these five observations came from coin A or coin B, i.e. we have to find P(A|o2) and P(B|o2) for each. By the definition of conditional probability, P(A|o2) = P(A,o2) / P(o2).

  • o2 can come from either coin A or coin B, so P(o2) = P(o2,A) + P(o2,B) [sum rule of probability]
  • P(A,o2) = P(A) * P(o2|A)
  • P(B,o2) = P(B) * P(o2|B)
  • P(o2|A) = 10C9 * pA_head^9 * (1-pA_head)^1
  • P(o2|B) = 10C9 * pB_head^9 * (1-pB_head)^1
  • P(A|o2) = [P(A) * 10C9 * pA_head^9 * (1-pA_head)^1] / [P(A) * 10C9 * pA_head^9 * (1-pA_head)^1 + P(B) * 10C9 * pB_head^9 * (1-pB_head)^1]
  • Since P(A) = P(B) = 0.5 and the binomial coefficients cancel, P(A|o2) = pA_head^9 * (1-pA_head) / [pA_head^9 * (1-pA_head) + pB_head^9 * (1-pB_head)] = (0.6^9 * 0.4) / [(0.6^9 * 0.4) + (0.5^9 * 0.5)] ≈ 0.80
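A quick numeric check of the last line (plain Python):

```python
pA_head, pB_head = 0.6, 0.5
num = pA_head**9 * (1 - pA_head)
den = num + pB_head**9 * (1 - pB_head)
print(round(num / den, 2))  # 0.8
```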