
Suppose that $X_1 ,\ldots,X_n,Y_1,\ldots,Y_n$ are all independent normal random variables with different means and variances. What is the PDF of the following random variable?

$$X_1Y_1+\cdots+X_nY_n$$

Or is there any way that I can find an approximation for its PDF?

May
  • I see no reason to expect a nice formula. Do you have one? – Did Sep 05 '13 at 14:39
  • unfortunately, no. – May Sep 07 '13 at 02:22
  • Then this is the prototypical question without answer, and opening a bounty will not change this. // @upvoters Why the upvote? – Did Sep 07 '13 at 09:21
  • I am interested in finding even an approximation. – May Sep 07 '13 at 15:02
  • The best approximation is probably to invoke the CLT (if $n$ is large enough) to say that the whole thing will be normally distributed. Then all you have to know is the mean and variance. The mean you can certainly get easily. I'd have to think a bit about whether you could get a formula for the variance in terms of the means and variances of the $X$ and $Y$ variables. – rajb245 Sep 07 '13 at 23:52
  • Since these random variables are not identically distributed (different means and variances), we cannot use the CLT. – May Sep 08 '13 at 03:39
  • There are generalizations of the CLT that apply even without identical distributions. Intuitively (forgetting the proofs), the sum of lots of random things happening tends towards normality, regardless of the underlying distributions, for many (most) practical problems in the real world. There are of course cases when this isn't true, but I suspect it is for your scenario. You can conduct some numerical experiments to convince yourself of this. I can provide some MATLAB code to motivate this; it's about ~15 lines. In my numerical investigations, even for $n=2$ it's approximately normal. – rajb245 Sep 09 '13 at 03:46
  • For example, see the Lyapunov CLT. It just requires that, as $n$ gets larger, the moments of the distributions don't grow beyond a certain rate. If your means and variances are all finite, this condition holds, I believe. – rajb245 Sep 09 '13 at 03:50
  • @rajb245 : Would you please give me your MATLAB code? – May Sep 09 '13 at 15:20
  • I also checked the Lyapunov CLT, but the random variables should satisfy the Lyapunov condition. http://mathworld.wolfram.com/LyapunovCondition.html – May Sep 09 '13 at 15:44
  • http://pastebin.com/Pqi5fEu4 – rajb245 Sep 10 '13 at 16:17 (a sketch in the same spirit appears after these comments)
  • @rajb245 Thanks for your code. I need an approximation or an exact PDF for all cases, i.e. a small or large number of trials. For a large number of trials, as your code shows, the PDF is normal. – May Sep 13 '13 at 17:09
  • There is the $n$ variable, the number of terms, and then there is the number of trials, which is simply how many times we draw numbers from the above summation. If you set $n$ to 2, the simulation is for $X_1Y_1+X_2Y_2$; even in this case I observe normality. The large number of trials (not $n$) is simply how numerical simulations of continuous random variables must be done to approximate convergence to the continuous result. – rajb245 Sep 14 '13 at 12:54
  • @rajb245 You are right; it is really interesting that even with small $n$ it is approximately normal. Do you know why? I need some analytical proof. – May Sep 14 '13 at 19:35
  • I also found this post http://math.stackexchange.com/questions/133938/what-is-the-density-of-the-product-of-k-i-i-d-normal-random-variables which shows the histogram of a product of normal random variables, which is not normal. I am curious why the histogram of their sum is normal! – May Sep 14 '13 at 19:44
  • @rajb245 Try your code for standard normal random variables; especially when we choose a small $N$, the PDF does not look like a normal PDF. Thank you for your code, it was a great help. – May Sep 17 '13 at 19:18
  • For a small number of terms in the sum, the idea is that the product of normals is approximately normal anyway, while the sum of normals is exactly normal. So a small number of terms above gives you approximate normality. For more terms, the CLT kicks in. You can just get closed-form formulas for the moments and use these to estimate the distribution. You'll see that the higher-order moments like skew and kurtosis will go to zero with additional terms as $\sqrt{n}$, if I recall correctly. – rajb245 Sep 18 '13 at 12:59
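For readers who cannot access the pastebin, here is a minimal Monte Carlo sketch in the same spirit (it is not the pastebinned code itself; the means, standard deviations, $n$, and trial count below are arbitrary illustrative choices):

    % Monte Carlo check: histogram of Z = sum_i X_i*Y_i versus a fitted normal curve.
    % All means/standard deviations below are arbitrary illustrative values.
    n       = 2;                          % number of terms in the sum (try 2, 5, 20)
    Ntrials = 1e5;                        % number of independent realizations of Z
    muX  = [1.0 -0.5];   sigX = [1.0 2.0];    % means/stds of X_1..X_n
    muY  = [0.5  2.0];   sigY = [0.5 1.5];    % means/stds of Y_1..Y_n

    X = repmat(muX, Ntrials, 1) + repmat(sigX, Ntrials, 1).*randn(Ntrials, n);
    Y = repmat(muY, Ntrials, 1) + repmat(sigY, Ntrials, 1).*randn(Ntrials, n);
    Z = sum(X.*Y, 2);                     % one realization of Z per row

    % Empirical density versus a normal with the same sample mean and variance.
    [counts, centers] = hist(Z, 100);
    binw = centers(2) - centers(1);
    bar(centers, counts/(Ntrials*binw), 1); hold on
    zz = linspace(min(Z), max(Z), 400);
    m = mean(Z); s = std(Z);
    plot(zz, exp(-(zz-m).^2/(2*s^2))/(s*sqrt(2*pi)), 'r', 'LineWidth', 2); hold off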

2 Answers


You want to know the distribution of

$$ Z = \sum_{i=1}^n X_iY_i, $$

where $X_i$ and $Y_i$ are all normal and independent but not identically distributed. Let's calculate the mean and variance of $Z$:

$$ \mathrm E[Z] = \sum_{i=1}^n \mathrm E[X_iY_i] = \sum_{i=1}^n \mathrm E[X_i]\mathrm E[Y_i]. $$

You know the individual means of the normals, so the above sum can be computed. Now for the variance: because all the $X_i$ and $Y_i$ are independent, the products $X_iY_i$ are independent of each other, so the covariance terms drop out and the variances simply add:

$$ \mathrm{Var}[Z] = \sum_{i=1}^n \mathrm{Var}[X_iY_i] = \sum_{i=1}^n \left( \mathrm E[Y_i]^2\,\mathrm{Var}[X_i]+\mathrm E[X_i]^2\,\mathrm{Var}[Y_i]+\mathrm{Var}[X_i]\,\mathrm{Var}[Y_i] \right). $$

Again, this is all in terms of quantities you know, the means and variances of the underlying distributions.

So, IF the CLT applies, your answer is that $Z\sim\mathcal{N}(\mathrm{E}[Z],\mathrm{Var}[Z])$. Rigorously proving that this is true for your particular case and your particular distributions is up to you, but my claim is that it is true. I pastebinned some numerical calculations to support my claim.
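A minimal sketch of such a check, assuming hypothetical mean and variance vectors (this is not the pastebinned script): it compares the closed-form $\mathrm E[Z]$ and $\mathrm{Var}[Z]$ above with Monte Carlo estimates.

    % Closed-form mean/variance of Z = sum_i X_i*Y_i versus Monte Carlo estimates.
    % muX, vX, muY, vY are hypothetical means and variances of X_i and Y_i.
    muX = [1 -2 0.5];   vX = [1 4 0.25];
    muY = [0  1 3];     vY = [2 1 9];

    EZ   = sum(muX.*muY);                               % E[Z]
    VarZ = sum(muY.^2.*vX + muX.^2.*vY + vX.*vY);       % Var[Z]

    Ntrials = 1e6;  n = numel(muX);
    X = repmat(muX, Ntrials, 1) + repmat(sqrt(vX), Ntrials, 1).*randn(Ntrials, n);
    Y = repmat(muY, Ntrials, 1) + repmat(sqrt(vY), Ntrials, 1).*randn(Ntrials, n);
    Z = sum(X.*Y, 2);

    fprintf('E[Z]:   formula %.4f, sample %.4f\n', EZ,   mean(Z));
    fprintf('Var[Z]: formula %.4f, sample %.4f\n', VarZ, var(Z));
    % If the normal approximation is adequate, N(EZ, VarZ) should track a histogram of Z.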

rajb245
  • Updated MATLAB to demonstrate that the variance formula works too: http://pastebin.com/d6g5rs1A – rajb245 Sep 10 '13 at 16:46
  • I have a question about your MATLAB code. Let's say N=2 and Ntrials=2, so we have X=[a b; c d] and Y=[e f; g h]. To make samples of X1Y1+X2Y2 you just multiply a by e; I think we need to consider all the cases, i.e. we should also multiply a by f, and the same for the summation. So with 2 trials and N=2 you create 2 samples, but the total number of possible cases is 16. – May Sep 22 '13 at 20:32
  • $ae$ represents a single realization of the random variable $X_1Y_1$. $bf$ is another realization. Every time you want a realization of $X_1Y_1$, you generate two fresh random numbers and multiply them together. In fact, reusing them for the different possible "cases" as you suggest would lead to non-independent answers. Do you see how $ae$ and $af$ depend on each other? On the other hand, $ae$ and $bf$ do not. NEVER reuse random variables if you want independence. If you want independence, draw fresh numbers from the random number generators. – rajb245 Sep 25 '13 at 14:27 (a small numerical check of this point appears after these comments)
  • Thanks a lot for your explanation, you are right. Is there any theory/book regarding this issue (i.e. creating dependent/independent random numbers)? – May Sep 25 '13 at 21:22
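A small numerical illustration of the reuse point (the nonzero means below are arbitrary; they are chosen because with zero means the dependence would not show up in a plain correlation):

    % Reusing a draw ('a') makes a*e and a*f dependent; fresh draws do not.
    % Nonzero means are used so the dependence is visible as a nonzero correlation.
    Ntrials = 1e6;
    a = 1 + randn(Ntrials, 1);   b = 1 + randn(Ntrials, 1);
    e = 2 + randn(Ntrials, 1);   f = -1 + randn(Ntrials, 1);

    C_reused = corrcoef(a.*e, a.*f);   % shared factor 'a' -> clearly nonzero correlation
    C_fresh  = corrcoef(a.*e, b.*f);   % all factors fresh -> approximately zero
    fprintf('corr(ae, af) = %.3f,  corr(ae, bf) = %.3f\n', C_reused(1,2), C_fresh(1,2));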

The product $Z_i=X_iY_i$ obeys the product normal distribution, which has a characteristic function given in: Characteristic function of product of normal random variables. Since the terms are independent, the characteristic function of your random variable is the product of the corresponding individual characteristic functions (after matching the various means and standard deviations). You can then approximate the PDF by numerically inverting the resulting characteristic function (an inverse Fourier transform).
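As a concrete illustration, here is a minimal sketch for the zero-mean case only (nonzero means require the more general characteristic function discussed in the comments below; the $\sigma_i,\tau_i$ values are arbitrary). If $X_i\sim N(0,\sigma_i^2)$ and $Y_i\sim N(0,\tau_i^2)$, then $X_iY_i$ is distributed as $\sigma_i\tau_i$ times a product of two standard normals, so its characteristic function is $1/\sqrt{1+\sigma_i^2\tau_i^2\omega^2}$, and the characteristic function of the sum is the product of these terms:

    % PDF of Z = sum_i X_i*Y_i with X_i ~ N(0,sigma_i^2), Y_i ~ N(0,tau_i^2) (zero means).
    % CF of each term: 1/sqrt(1 + (sigma_i*tau_i)^2*w^2); the CF of Z is their product.
    sigma = [1 2 0.5];   tau = [1.5 1 2];     % arbitrary illustrative standard deviations

    w   = linspace(-80, 80, 8001);            % frequency grid; the CF decays like |w|^(-n)
    phi = ones(size(w));
    for i = 1:numel(sigma)
        phi = phi ./ sqrt(1 + (sigma(i)*tau(i))^2 * w.^2);
    end

    x   = linspace(-25, 25, 501);
    pdf = zeros(size(x));
    for k = 1:numel(x)
        % Inverse transform; the sine part cancels because phi is real and even.
        pdf(k) = trapz(w, phi .* cos(w*x(k))) / (2*pi);
    end
    plot(x, pdf), xlabel('z'), ylabel('approximate PDF')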

SBM
  • how can I use FT to approximate it? – May Sep 13 '13 at 17:12
  • Sample the Fourier transform at the resolution you wish, while neglecting the tail of the characteristic function. If, for example, all the expectations are 0 and all standard deviations are 1, sample the Fourier transform over the range $\omega \in [-10,10]$ every 0.01 Hz and evaluate the PDF by performing a numeric Fourier transform over the samples. If you are using MATLAB I can help you write an example. – SBM Sep 14 '13 at 19:16
  • Yes I am using MATLAB. that would be perfect thanks. – May Sep 14 '13 at 19:36
  • The code assumes normalized random variables ($\mathrm E[X_i]=0$, $\sigma_X=\sigma_Y=1$) and gives you two options for the computation, numeric and analytic using the MATLAB symbolic functions: – SBM Sep 14 '13 at 20:46

        N = 5;
        syms W X;
        CF = (1/sqrt(1+W^2))^N;           % CF of a sum of N standard-normal products
        minFreq = -10;  maxFreq = 10;  resolutionFreq = 0.01;
        w  = minFreq:resolutionFreq:maxFreq;
        cf = double(subs(CF, W, w));      % sampled characteristic function
        minX = -10;  maxX = 100;  resolutionX = 0.1;
        x  = minX:resolutionX:maxX;
        % First option: numeric inverse Fourier transform of the sampled CF
        E   = exp(1j*x'*w);
        pdf = real(E*cf.')*resolutionFreq/(2*pi);   % Riemann-sum inversion
        plot(x, pdf)
        % Second option: analytic inversion with the Symbolic Math Toolbox
        pdf = (ifourier(CF, W, X))';
        pdf                               % show the pdf as an expression
        ezplot(pdf)
  • Thank you. But the characteristic function in (http://math.stackexchange.com/questions/74013/characteristic-function-of-product-of-normal-random-variables) is for standard normal random variables. I think first I need to find the characteristic function for the general case. Any help would be appreciated, thank you. – May Sep 16 '13 at 23:14
  • Let $X\sim N(a,b)$ and $Y\sim N(c,d)$, so – SBM Sep 17 '13 at 07:34
  • @May I tried to perform the calculation but I couldn't. I think you should post another thread about this kind of calculation. Good Luck. – SBM Sep 17 '13 at 08:25
  • Thanks for your help. I posted it again http://math.stackexchange.com/questions/496465/characteristic-function-of-random-variable-z-xy-where-x-and-y-are-independent/496481?noredirect=1#comment1067575_496481 – May Sep 17 '13 at 19:21