2

I would like to find the probability that the results of 100 dice rolls sum to 400. I think to do this exactly would take a lot of work unless there's some trick I am not privy to?

I believe the sum is distributed with a multinomial distribution with 100 trials. I need to figure out the number of ways 100 dice rolls can sum to 400, which seems to be a tall task. Any hints?

The other thought I have is typically for these kind of problems, I can use CLT and approximate the sum using a normal distribution. But typically in those kind of problems, I am trying to find the probability that a variable falls within a range of values. Here I am trying to find that the sum is exactly 400, and it's not clear to me how I can apply CLT and the normal distribution. One thought I did have is I could use a normal distribution with mean 350 and variance 291, and then integrate the area around the curve right around 400, e.g., the bounds / limits of integration would be $400 \pm \epsilon$, but it's not clear to me what $\epsilon$ should be.

5 Answers5

2

I would definitely go here for generating function and then for complex derivative. Now the number of possible outcomes is listed in the coefficients of

$$ G(x)=(x+x^2+x^3+x^4+x^5+x^6)^ {100} $$

You need the coefficient at $x^{400}$. If you find $400$-th derivative you just need to multiply it by $400!$.

But this is all what Cauchy integral formula is about

$$G^{(n)}(0) = \frac{n!}{2\pi i} \oint_\gamma \frac{G(z)}{z^{n+1}}\, dz$$

which comes to calculating

$$N(100,400)=\frac{1}{2\pi} \int_0^{2\pi} \frac{(e^{ix}+e^{2ix}+e^{3ix}+e^{4ix}+e^{5ix}+e^{6ix})^{100}}{e^{400ix}}\, dx$$

which can be shortened to

$$N(100,400)=\frac{1}{2\pi} \int_0^{2\pi} \frac{(e^{6ix}-1)^{100}}{(e^{ix}-1)^{100} e^{400ix}}\, dx$$

Now this integral can be solved in many ways, starting from using different approximations to going for solving this integral directly.

Dividing the result by $6^{100}$ you have got close to $0.000317214$

1

Your approach using the CLT seems the way to go. We have

$$Y = \sum_{i=1}^{100} X_i$$ where $\mu_X = 7/2$ and $\sigma_X^2=35/12$. Hence $\mu_Y = 350$ and $\sigma_Y^2=3500/12=291.66$, $\sigma_Y=17.08$

Then, assuming the distribution $Y$ approaches a Gaussian with that mean and variance $g(x)$ we can approximate

$$P(Y=400) \approx \int_{400-1/2}^{400+1/2} g(x) dx \approx g(400)$$

This should give an very good approximation.

Search for "continuity correction", for example here or here.

You can even refine it (with quite more work) using Edgeworth expansion. To see how that works (and how the above integral can be justified) you can see this answer.

Let's compute the exact value numerically with Octave/Matlab, and compare with the CLT approximation:

>> p0 = [0 1 1 1 1 1 1]/6;
>> p = [1 1 1 1 1 1]/6;
>> for n = 2:100
>>    p = conv(p0,p);
>> endfor
>> p(400)
ans =  0.00031721
>> s = 3500/12
s =  291.67
>> (1/sqrt(2*pi()*s))*exp(-(400-350)^2/(2*s))
ans =  0.00032152

The approximation $3.2152 \cdot 10^{-4}$ differs from the exact value $3.1721 \cdot 10^{-4}$ in less than $1.5 \%$

leonbloy
  • 63,430
  • The $\pm 0.5$ is what I was unsure of. How do you know it should be $\pm 0.5$? Depending on what you choose, your answer could change substantially. – student010101 Oct 07 '20 at 00:23
  • @student010101 I gave you several links. "your answer could change substantially" not much while the interval has length $1$. And that what's matter, because only then you cover all the probabilities, i.e. you know that $\sum_x P(X=x) = 1$, then if you want to approximate each $P(X)$ by an area below the continuous curve, you must take each base of length $1$, so that $\int_{\infty}^{\infty} g(x) = 1$ – leonbloy Oct 07 '20 at 01:23
  • I'm still kind of doubting the 0.5. I feel like $\epsilon$ should be dependent on the scale of the problem, which in this case is the variance. Like if we have a problem with a very large variance vs. a problem with a very large variance, it would seem to make sense that $\epsilon$ scales with the variance. But on second thought, maybe it shouldn't scale with the variance because the magnitude of the pdf also changes with the variance, so the effects could end up being canceled out? – student010101 Oct 08 '20 at 00:05
1

You are looking for $$ \eqalign{ & N_b (300,5,100) = \cr & = {\rm No}{\rm .}\,{\rm of}\,{\rm solutions}\,{\rm to}\;\left\{ \matrix{ {\rm 1} \le {\rm integer}\;x_{\,j} \le 6 \hfill \cr x_{\,1} + x_{\,2} + \; \cdots \; + x_{\,100} = 400 \hfill \cr} \right. = \cr & = {\rm No}{\rm .}\,{\rm of}\,{\rm solutions}\,{\rm to}\;\left\{ \matrix{ {\rm 0} \le {\rm integer}\;y_{\,j} \le 5 \hfill \cr y_{\,1} + y_{\,2} + \; \cdots \; + y_{\,100} = 300 \hfill \cr} \right. \cr} $$ whose exact solution reads as $$ N_b (s,r,m)\quad \left| {\;0 \leqslant \text{integers }s,m,r} \right.\quad = \sum\limits_{\left( {0\, \leqslant } \right)\,\,k\,\,\left( { \leqslant \,\frac{s}{r+1}\, \leqslant \,m} \right)} {\left( { - 1} \right)^k \binom{m}{k} \binom { s + m - 1 - k\left( {r + 1} \right) } { s - k\left( {r + 1} \right)}\ } $$ as explained in this related post and various others.

In terms of probability that becomes $$ p_{\,b} (s;r,m) = {{N_{\,b} (s,r,m)} \over {\left( {r + 1} \right)^{\,m} }} $$

Although heavy, the two formulas above can be computed by a good CAS, giving $$ p_{\,b} (300;5,100) = 0.0003172 \ldots $$

If you need instead an asymptotic formula, you can resort on CLT.

To apply that properly, we shall first convert the sum of $m$ discrete uniform variables over $[0,r]$ into the approximating sum of $m$ continuous uniform variables on the support $[-1/2, \, r+1/2]$.
The relevant distribution is the Irwin-Hall distribution.

From this you can pass to $$ \eqalign{ & p_{\,b} (s;r,m) = {{N_{\,b} (s,r,m)} \over {\left( {r + 1} \right)^{\,m} }} \approx {1 \over {\sqrt {2\pi m\sigma ^{\,2} } }} e^{\, - \,{{\left( {s - m\mu } \right)^{\,2} } \over {2m\sigma ^{\,2} }}} \cr & = {{\sqrt {6/\pi } } \over {\sqrt {m\left( {\left( {r + 1} \right)^{\,2} } \right)} }} e^{\, - \,6{{\left( {s - mr/2} \right)^{\,2} } \over {m\left( {\left( {r + 1} \right)^{\,2} } \right)}}} \cr} $$ but you cannot pretend it to be very precise.

G Cab
  • 35,272
1

For your approach you should take $\epsilon=0.5$. You are wanting to integrate the error function between $399.5$ and $400.5$ because that is the continuous approximation to your distribution. Another way to use this approach is to use a z-score table, computing the number of standard deviations $399.5$ and $400.5$ equate to.

To get an exact count would not be so hard if spreadsheets could handle $97$ digit numbers. Make columns from $1$ to $100$ for the number of dice and rows from $1$ to $400$ for the sum. Each cell is the sum of the four cells in the prior column from $1$ to $4$ rows above corresponding to rolling $1$ to $4$ on the last die. Write that in the second column and copy right and down and you are done except for overflow. Python gives arbitrary precision integers as soon as you need them, so will do this easily.

Ross Millikan
  • 374,822
0

The exact answer to your question is:

  1. Yes you can use Normal distribution
  2. You need to adjust to so called Continuity Correction

All together:

$$\operatorname{Mean}(m) = \frac{7}{2}m$$ $$\operatorname{Variance}(m) = \frac{35}{12}m$$

You use for $P(X=n)$

$$P(n – 0.5 < X < n + 0.5)$$

In your case:

$$m=100, n=400$$