
Suppose I draw several times from a uniform distribution, $X\sim\mathcal{U}(0, 1]$. (I'll use $\mathrm{R}()$ to denote an independent draw.) What, then, is the PDF of several draws added to and/or subtracted from each other? How do I calculate it?

For example:

$r = \sum_{i=1}^{5} \mathrm{R}() - \sum_{i=1}^5 \mathrm{R}()$

If I draw 15,000,000 samples, round them, and note their total number of occurrences, the plot I get looks like a normal distribution with $\mu = -2.5$. But “looks like” is not really satisfactory, because I am not even sure whether it really is a normal distribution I'm getting this way.

Can somebody help me derive the type of distribution, and its parameters, for the formula above (or for any other formula where several independent draws are added/subtracted)?

Technaton
  • Do you want a "true" normal distribution, or do you want to fix your addition/subtraction method as best as possible? – MaxW Nov 04 '15 at 16:33
  • I want to fix it: These calls come from a computer program that I'm profiling. I figure that instead of using several drawings, I can draw from the proper distribution - which would save me a lot of calls to this pseudo random number generator. I'm, however, stuck at describing the actual result of those calls mathematically. I'd like to avoid replacing something that works with something that is faster, but broken wrt the original design. – Technaton Nov 04 '15 at 16:44

2 Answers


When you add two independent random variables, the pdf of the sum is the convolution of the two pdfs. If Z = X + Y,

$f_Z(z) = \int_{-\infty}^{\infty} f_X(z-x)f_Y(x) dx$

This question explains the density of the sum of two uniform random variables: density of sum of two uniform random variables $[0,1]$

For subtraction, convolve with the pdf of $-Y$ instead, which is $f_{-Y}(x) = f_Y(-x)$.
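
As a worked step (a sketch that only reuses the convolution formula above, with $W = -Y$ as a helper name introduced here), substituting $f_W(x) = f_Y(-x)$ and then changing variables to $y = -x$ gives:

$f_{X-Y}(z) = \int_{-\infty}^{\infty} f_X(z-x)f_Y(-x) dx = \int_{-\infty}^{\infty} f_X(z+y)f_Y(y) dy$

For two independent Uniform[0,1] variables this evaluates to the triangular density

$f_{U_1-U_2}(z) = 1 - |z|$ for $|z| \le 1$, and $0$ otherwise.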

If you repeat this convolution process, you can calculate the distribution for the sum of any number of iid Uniform random variables (e.g. for $U_1 + U_2 + U_3$ you find $Z = U_1 + U_2$ and then $Z + U_3$), but the formula will get more and more complicated.
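
The repeated convolution can also be carried out numerically instead of symbolically. The following is a minimal sketch, not a definitive implementation: it assumes a fixed grid step `h`, samples each uniform pdf at cell midpoints, and replaces the integral by a Riemann sum. The names `convolve` and `signed_sum_density` are made up for this illustration.

```python
def convolve(f, g, h):
    """Riemann-sum approximation of (f * g) for densities sampled with step h."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj * h
    return out


def signed_sum_density(n_plus, n_minus, h=0.01):
    """Grid density of (sum of n_plus uniforms) - (sum of n_minus uniforms).

    Assumes n_plus + n_minus >= 1.
    Returns (start, values): values[k] approximates the pdf at start + k * h.
    """
    cells = round(1 / h)
    uniform = [1.0] * cells  # pdf of U sampled at cell midpoints h/2, 3h/2, ...
    # The pdf of -U is the mirror image f_U(-x). For a uniform the sampled
    # values are unchanged; only the grid origin moves from h/2 to -1 + h/2.
    origins = [h / 2] * n_plus + [-1 + h / 2] * n_minus
    values, start = uniform[:], origins[0]
    for origin in origins[1:]:
        values = convolve(values, uniform, h)
        start += origin
    return start, values


start, values = signed_sum_density(5, 5)
middle = len(values) // 2  # grid point at r = 0
print(f"pdf(0) is roughly {values[middle]:.4f}")
```

A smaller `h` gives a finer approximation at the cost of more work, since each convolution step is quadratic in the grid size.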

This is what Mathematica calculated for $\sum_{i=1}^5 U_i - \sum_{i=6}^{10} U_i$, where the $U_i$ are iid Uniform[0,1]:

[Figure: pdf of $\sum_{i=1}^5 U_i - \sum_{i=6}^{10} U_i$]

Calculating the coefficients of the polynomials for these distributions would probably be an interesting combinatorics problem. So in theory you could calculate an exact formula for your distribution and draw from this, but it may be unwieldy if you are adding and subtracting too many times.
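
For uniform variables a closed form is in fact known. Because $1 - U$ has the same distribution as $U$, a sum of $n$ uniforms minus a sum of $m$ uniforms is distributed like a sum of $n + m$ uniforms shifted by $-m$, and the sum of $k$ iid Uniform[0,1] variables follows the Irwin-Hall distribution. The sketch below evaluates that piecewise polynomial directly; the function names are chosen here for illustration and are not from any library.

```python
from math import comb, factorial, floor


def irwin_hall_pdf(x, n):
    """pdf of the sum of n iid Uniform[0,1] variables, evaluated at x."""
    if x <= 0 or x >= n:
        return 0.0
    total = sum(
        (-1) ** k * comb(n, k) * (x - k) ** (n - 1)
        for k in range(floor(x) + 1)
    )
    return total / factorial(n - 1)


def signed_sum_pdf(r, n_plus, n_minus):
    """pdf of (sum of n_plus uniforms) - (sum of n_minus uniforms) at r.

    Uses the symmetry 1 - U ~ U, so the difference is an Irwin-Hall
    variable with n_plus + n_minus terms, shifted left by n_minus.
    """
    return irwin_hall_pdf(r + n_minus, n_plus + n_minus)


print(signed_sum_pdf(0.0, 5, 5))
```

The alternating sum loses precision for large `n`, so this direct evaluation is only meant for a moderate number of terms such as the ten used here.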

On the other hand, this distribution looks sort of normal because you are adding iid random variables: by the Central Limit Theorem, the more iid random variables you add together, the better the result is approximated by a normal distribution. For the distribution you described, the mean is 0 and the variance is 10/12, since each of the 10 independent variables has a variance of 1/12. So a normal distribution with mean 0 and variance 10/12 may be a decent approximation.
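
The stated mean and variance can be checked empirically. This is a rough sketch under a few assumptions: Python's `random.random()` stands in for $\mathrm{R}()$ (it returns values in [0, 1) rather than (0, 1], which does not change the density), and the seed and sample size are arbitrary.

```python
import random
from math import sqrt


def draw_r(rng):
    """One value of r: five uniform draws added, five subtracted."""
    plus = sum(rng.random() for _ in range(5))
    minus = sum(rng.random() for _ in range(5))
    return plus - minus


def mean_and_variance(xs):
    """Sample mean and (population) variance of a list of floats."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / len(xs)
    return m, v


rng = random.Random(2015)  # fixed seed, chosen arbitrarily
n = 100_000
exact = [draw_r(rng) for _ in range(n)]
# gauss() takes the standard deviation, so pass sqrt(variance).
approx = [rng.gauss(0.0, sqrt(10 / 12)) for _ in range(n)]

print("ten uniform draws:   ", mean_and_variance(exact))
print("normal approximation:", mean_and_variance(approx))
```

Matching the first two moments says nothing about the tails: the exact distribution is confined to the interval from -5 to 5, while the normal approximation is not.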

  • Thank you very much! That gives me enough understanding to work with, and more to read up about the basics! :) – Technaton Nov 04 '15 at 21:16

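You are not getting a normal distribution. First, the data are bounded: each draw from $X\sim\mathcal{U}(0, 1]$ lies in a finite interval, so any sum or difference of a fixed number of draws is bounded too. A true normal distribution extends to infinity in both directions.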

The fact that you average 5 draws on the uniform distribution gives you a quasi-binomially distributed random number. Thus what you have is a quasi-binomial distribution with enough data points to approximate a normal distribution.

To get a true normal distribution, use the CDF of the normal distribution in reverse: find the z value whose CDF value equals a draw from $X\sim\mathcal{U}(0, 1]$.

so:

  X       z = F^{-1}(X)
  0.05    -1.645
  0.50     0
  0.95     1.645

Such a reverse lookup is generally a built-in function. I'm not sure whether it would be faster, but there are closed-form approximation formulas that calculate the reverse lookup accurate to 6 decimal places. That should be good enough for any practical purpose.
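
As one illustration of such a built-in, Python's standard library exposes the inverse normal CDF as `statistics.NormalDist.inv_cdf` (Python 3.8+). The sketch below assumes a standard normal target and uses `random.random()` as the uniform source; the helper names are made up for this example.

```python
import random
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)


def normal_from_uniform(u):
    """Map a uniform value u in (0, 1) to a standard normal value."""
    return std_normal.inv_cdf(u)


def draw_normal(rng):
    """Draw one standard normal value by inverse transform sampling."""
    u = rng.random()
    while u == 0.0:  # inv_cdf is undefined at 0, and random() can return it
        u = rng.random()
    return normal_from_uniform(u)


for u in (0.05, 0.50, 0.95):
    print(u, round(normal_from_uniform(u), 3))

rng = random.Random(7)  # arbitrary seed
print(draw_normal(rng))
```

To target a normal distribution with a different mean and standard deviation, construct `NormalDist` with those parameters, or scale and shift the standard normal value.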

MaxW
  • So what I have is not a normal distribution, but I can find one that approximates it? I'm not fixed on the normal distribution, but why is it a pseudo-binomial? I always thought a binomial distribution models yes/no drawings, i.e., discrete ones, not those from a continuous interval? Is there a way to formally describe the distribution I have --- or is the approximation with a normal distribution the best description? – Technaton Nov 04 '15 at 18:04
  • I'm no doubt using the terminology wrong. The point was that you did 10 draws to get the two numbers. Think of using a true binomial distribution with p=0.5 and averaging over a number of trials. As the number of trials increases, the distribution of the average becomes smoother. – MaxW Nov 04 '15 at 18:21
  • The point is that the CDF for the normal distribution has values by definition between 0 and 1. Whatever language you're using probably has a function to do the reverse lookup. – MaxW Nov 04 '15 at 18:30
  • What I was suggesting is evidently more formally called Inverse transform sampling // https://en.wikipedia.org/wiki/Inverse_transform_sampling // Evidently the Box-Muller technique is more computationally efficient. https://en.wikipedia.org/wiki/Box%E2%80%93Muller_transform – MaxW Nov 05 '15 at 05:13
  • For functions that calculate the inverse of the CDF of the normal distribution, see Alfred L. Brophy, Approximation of the inverse normal distribution function, Behavior Research Methods, Instruments, & Computers May 1985, Volume 17, Issue 3, pp 415-417 // http://link.springer.com/article/10.3758%2FBF03200956 – MaxW Nov 05 '15 at 06:34
  • How much accuracy you need is determined by what you are doing with the tails of the distributions. So if you are using a tail of +/- 10%, you need less accuracy than if you want to use a tail of +/- 1%. – MaxW Nov 05 '15 at 16:40