4

For a problem such as finding the probability of getting exactly $500,000$ heads in $1,000,000$ (1 million) fair coin flips, the intermediate results are one huge number and one tiny number, and neither can be computed by many online tools such as combination calculators.

I think the correct answer is ${1,000,000 \choose 500,000} \cdot 0.5^{1,000,000}$.

So my question is: if someone wanted this approximate probability in decimal form, how would they compute it? Is there any "shortcut"? For example, we know that ${1,000,000 \choose 500,000}$ is $(1,000,000 \cdot 999,999 \cdots 500,001) / 500,000!$, so by pairing each numerator factor with a denominator factor we can keep the accumulated result from becoming super large or super small and thus "blowing up". There are $500,000$ factors in the numerator and likewise in the denominator, but there are $1,000,000$ powers of $0.5$ to multiply by, so we can further "simplify" (or manipulate) that into $500,000$ powers of $0.5^2$, which is $0.25^{500,000}$. To me it would make sense for a combination calculator to know these "tricks" and use them to its advantage so the result can actually be computed. I see so many online combination calculators that cannot compute this expression. Instead they tell me infinity or NaN (not a number). What that really means is that their utility blew up and they are putting the "blame" on me, as if I did something wrong.

So for example, if I made a combination calculator for this problem, the first subterm (out of $500,000$ of them) would be $(1,000,000 / 500,000) \cdot 0.25 = 0.5$. The 2nd subterm would be $(999,999 / 499,999) \cdot 0.25 = 0.500000500001000002000004000008\ldots$, and so on. The last ($500,000$th) subterm would be $(500,001 / 1) \cdot 0.25 = 125,000.25$. Multiplying these subterms together as I go, I would have the final answer at that point.

I also get a similar problem when trying to compute $0.5 ^ {1,000,000}$ so it seems like someone needs to write a better combination calculator to handle problems like this.

David
  • 1,702
  • 6
    http://en.wikipedia.org/wiki/Stirling's_approximation – vadim123 Oct 03 '14 at 19:12
  • 4
    http://en.wikipedia.org/wiki/Central_limit_theorem – Jack D'Aurizio Oct 03 '14 at 19:12
  • 1
    Wow, those methods are very mathematical. Why can't they just use a method like mine, where they prevent the subterms from getting excessively large or small by combining terms like I did and keeping them reasonable? I wonder why so many online calculators "blow up" for something like $1$ million choose $500$K. Perhaps they are trying to compute the numerator first without ever dividing by terms from the denominator, such as $1$M / $500$K. I guess they "cut some corners" thinking people won't generally ask for terms like that. – David Oct 03 '14 at 19:20
  • 1
    By the way, Wolfram Alpha "complains" when I ask it to compute $100,000,000 \choose 50,000,000$. – David Oct 03 '14 at 20:36
  • 2
    They don't do what you did because of truncation error. The makers of W|A certainly didn't cut corners. There are many reasons why not to compute large binomial coefficients directly, the least of which is that such numbers are almost always irrelevant in practice. I find your commentary to be presumptuous. – Emily Oct 03 '14 at 21:00
  • I'm not sure if this is a question about math so much as it is about why free online tools don't expend extra effort to handle numbers that users almost never care about. – Erick Wong Oct 03 '14 at 21:03
  • 1
    You think that nobody would ever want to model $100$ million coin flips looking for $50$ million heads? There are way more than $100$ million people on the Earth so $100$M is not a very big number. – David Oct 03 '14 at 21:12
  • 1
    @David If your interest is in exact values, there exist tools to handle such calculations in many cases, but don't expect them to be free, easy, or fast. That is unrealistic. As I have shown you in my answer, modeling does not require exact calculations to get a functional and meaningful answer. – heropup Oct 03 '14 at 21:20
  • 1
    I agree approximations are all that is needed many times rather than an exact value when dealing with very large (and/or very small) numbers as long as the error is within acceptable limits. – David Oct 03 '14 at 21:35
  • @David So is your question specifically about exact values or isn't it? I meant that there is almost no context where you would need the exact value of $C(100000000,50000000)$, a number with millions of digits. – Erick Wong Oct 04 '14 at 04:15
  • 1
    @ErickWong: I only needed about 10 decimal places for the final answer which is about $0.00079788$ or about $1$ in $1253$ so I changed my original question to state I was looking for an approximate answer, not an exact one. Thanks for pointing out that an exact answer is not necessary. – David Oct 04 '14 at 13:46

4 Answers

7

With such a large number of trials and with $p = 0.5$, a normal approximation to the binomial distribution would also work. If $X \sim \mathrm{Binomial}(n = 10^6, p = 0.5)$, then $$\Pr[X = n/2] = \Pr\left[\frac{X - np}{\sqrt{np(1-p)}} = \frac{n/2 - np}{\sqrt{np(1-p)}}\right] \approx \Pr[-1/\sqrt{n} \le Z \le 1/\sqrt{n}]$$ using continuity correction (the event $X = n/2$ is widened to $|X - n/2| \le 1/2$, and the standard deviation is $\sqrt{n}/2$), where $Z \sim \mathrm{Normal}(0,1)$. Thus we have $$\Pr[X = n/2] \approx 2\Phi(1/\sqrt{n}) - 1,$$ where $\Phi$ is the standard normal CDF, and for $n = 10^6$ this is about $0.000797884$. In fact, this approximation is good to about $10^{-10}$.
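As a quick numerical check of the formula above, here is a small Python sketch. It relies on the identity $2\Phi(x) - 1 = \operatorname{erf}(x/\sqrt{2})$, so only the standard library's `math.erf` is needed; the function name is just illustrative.

```python
import math


def normal_approx_central(n):
    """Approximate P(X = n/2) for X ~ Binomial(n, 0.5) via 2*Phi(1/sqrt(n)) - 1."""
    x = 1.0 / math.sqrt(n)
    # 2*Phi(x) - 1 equals erf(x / sqrt(2)) for the standard normal CDF Phi.
    return math.erf(x / math.sqrt(2.0))


print(normal_approx_central(10**6))
```

Going through `erf` directly avoids subtracting two nearly equal numbers, which computing $2\Phi(x) - 1$ literally would do for such a small $x$.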

heropup
  • 135,869
4

Even the final results in the problems you quote are very small. In that case, it is often more useful to report the log of the answer. For that purpose, Stirling's approximation is your friend: it says $n! \approx \frac{n^n}{e^n}\sqrt{2 \pi n}$, or in logs, $\log n! \approx n \log n - n +\frac 12\log(2 \pi n)$. It is very accurate. Actually, Wolfram Alpha has no trouble with ${1000000 \choose 500000}$, reporting about $7.9 \times 10^{301026}$ (it gives many more places). Multiplying by $2^{-1000000}$ gives a very reasonable $0.00079788\ldots$
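To illustrate working in logs, here is a rough Python sketch that applies the Stirling formula above to each factorial and only exponentiates at the very end. The helper names are hypothetical, and natural logs are used internally.

```python
import math


def log_factorial_stirling(n):
    """Stirling's approximation: log n! ~ n log n - n + 0.5 log(2 pi n)."""
    return n * math.log(n) - n + 0.5 * math.log(2.0 * math.pi * n)


def log10_prob_exact_half(n_flips):
    """log10 of C(n_flips, n_flips/2) * 2**(-n_flips), with Stirling for each factorial."""
    half = n_flips // 2
    log_p = (log_factorial_stirling(n_flips)
             - 2.0 * log_factorial_stirling(half)
             - n_flips * math.log(2.0))
    return log_p / math.log(10.0)


log10_p = log10_prob_exact_half(1_000_000)
print(log10_p, 10.0 ** log10_p)
```

No number larger than about $1.4 \times 10^7$ ever appears, so nothing overflows. If Stirling's error is a concern, `math.lgamma(n + 1)` could be substituted for `log_factorial_stirling(n)`.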

Ross Millikan
  • 374,822
  • It is interesting to me that even though $500,000$ heads is what is expected in $1,000,000$ fair coin tosses, the actual probability of that happening seems to be only about $8$% of $1$% which is very low. Also, I checked many online calculators but I must have missed the one that actually works. What tool did you use to compute $0.5 ^ {1,000,000}$? – David Oct 03 '14 at 19:27
  • Wow that Wolfram Alpha is much better than some of those other online combination calculators and other calculators. Even my windows calculator cannot handle $0.5 ^ {1,000,000}$ – David Oct 03 '14 at 19:41
  • @David The reason for your observation is analogous to the following situation: if I give you a fair coin and you toss it 10 times, the probability of getting exactly 5 heads and 5 tails is not actually that high--natural variation implies you could get 6 heads and 4 tails, or 4 heads and 6 tails. With a large number of trials--$10^6$ in this case--you could get 499,999 heads and 500,001 tails. Also, what if you flip the coin an odd number of times? The expected value isn't even an integer in such a case. – heropup Oct 03 '14 at 19:43
  • Yes but with only $10$ coin flips, looking for exactly $5$ heads, the probability is close to $25$% which is MUCH higher than $8$% of $1$% like in the $1$M flip looking for $500$K situation. Obviously the more trials, the more likely it is to not get the exact amount you are looking for since it is MUCH easier to be $1$ (or more) "off" with a large number of trials vs. a small number. I noticed the same pattern with digits of pi, when the powers are low such as $10 ^ 2$ digits, it is easier for a digit to appear exactly 10 times (the digit 4 actually does). Also $9999$ zeros out of $100,000$. – David Oct 03 '14 at 19:54
  • FYI, Wolfram Alpha blows up on $100,000,000 \choose 50,000,000$ so it too has limits. – David Oct 03 '14 at 20:33
  • Actually I got Wolfram Alpha to work for ${100,000,000 \choose 50,000,000}$ once but then I couldn't get it to evaluate it again so that was very strange. Why would it work one time only and then fail on all the other attempts? I got it to work once by "creeping up" on those large numbers slowly like ${20,000,000 \choose 10,000,000}$ then ${40,000,000 \choose 20,000,000}$ .... Here is the message I get: Wolfram|Alpha doesn't know how to interpret your input. – David Oct 04 '14 at 13:33
3

In addition to the other answers: a binomial coefficient of the form ${2n \choose n}$ can be approximated asymptotically (using Stirling's approximation) by $\frac{4^n}{\sqrt{\pi n}}$, so

$$ {2n \choose n} 2^{-2n}\approx \frac{1}{\sqrt{\pi n}} $$ which for $n=500000$ gives $0.0007978845\cdots$
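For reference, a sketch of where the $\sqrt{\pi n}$ comes from, using only the basic form $m! \approx \sqrt{2\pi m}\,(m/e)^m$ of Stirling's approximation:

$$ {2n \choose n} = \frac{(2n)!}{(n!)^2} \approx \frac{\sqrt{4\pi n}\,(2n/e)^{2n}}{2\pi n\,(n/e)^{2n}} = \frac{2\sqrt{\pi n}}{2\pi n}\, 2^{2n} = \frac{4^n}{\sqrt{\pi n}} $$

The powers of $n/e$ cancel completely. What survives is the ratio of the square-root factors: one from the factorial in the numerator and two from the factorials in the denominator.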

leonbloy
  • 63,430
  • Wow that is pretty darn accurate considering it is an approximation. – David Oct 03 '14 at 21:38
  • If you can explain why the square root and $\pi$ are in there, I am tempted to give you the checkmark for the most useful answer since it is the most compact formula to get the right answer (or very close to it). – David Oct 03 '14 at 23:01
  • @David: They come from Stirling's approximation. Note there are two factorials in the denominator and only one in the numerator. – Ross Millikan Oct 04 '14 at 14:24
  • @David The great thing about Stirling's approximation is that, even in its most basic form, the relative error in $\log n!$ is $O(1/n)$, so for really large $n$ the approximation is superb. – Erick Wong Oct 04 '14 at 15:37
  • I agree the approximation is "spot on" for my example but what if I didn't use ${2n \choose n}$ would I then not be able to use the approximation? For example, if I had ${5n \choose n}$ instead. – David Oct 04 '14 at 15:44
1

In this answer an elementary proof is given that $$ \frac{4^n}{\sqrt{\pi(n+\frac13)}}\le\binom{2n}{n}\le\frac{4^n}{\sqrt{\pi(n+\frac14)}} $$ so, with $n=500000$, this becomes $$ \frac1{\sqrt{\pi(500000+\frac13)}}\le\binom{1000000}{500000}2^{-1000000}\le\frac1{\sqrt{\pi(500000+\frac14)}} $$ That is, $$ 0.0007978842948\le\binom{1000000}{500000}2^{-1000000}\le0.0007978843613 $$ This shows why the approximation in leonbloy's answer is so good.
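The two bounds are easy to evaluate directly. The following Python sketch (with an illustrative function name) reproduces the numbers above and compares them with the plain $1/\sqrt{\pi n}$ estimate:

```python
import math


def central_binomial_bounds(n):
    """Lower and upper bounds on C(2n, n) / 4**n from the inequality above."""
    lower = 1.0 / math.sqrt(math.pi * (n + 1.0 / 3.0))
    upper = 1.0 / math.sqrt(math.pi * (n + 0.25))
    return lower, upper


n = 500_000
lower, upper = central_binomial_bounds(n)
plain = 1.0 / math.sqrt(math.pi * n)  # the simpler 1/sqrt(pi n) estimate
print(lower, upper, plain)
```

The plain estimate sits just above the upper bound, by a relative amount of roughly $1/(8n)$.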

robjohn
  • 345,667