
In applying mathematics to evolutionary biology, I often need to describe the probability density function (PDF) of a product of function values, where the argument of each function is drawn from a known PDF. For example, the random variable whose PDF I would like to describe is $Y$ such that

$$Y = \prod_{i=1}^{n} f(x_i)$$

where each $x_i$ is drawn from a known PDF. Do you have any general hints or advice for solving this kind of problem? If general advice is not possible, here are two simple (or at least I hope they are simple) example problems:

  • Find the PDF of $Y$ such that $$Y = \prod_{i=1}^n x_i$$ where each $x_i$ is a value drawn from an exponential distribution with parameter $\lambda$. The density of the exponential distribution is:

$$f_X(x) = \lambda e^{-\lambda x}, \quad x \geq 0$$

  • Find the PDF of $Y$ such that $$Y = \prod_{i=1}^n \log_e(x_i)^2$$ where each $x_i$ is a value drawn from a Gaussian distribution with mean $\mu$ and variance $\sigma ^2$. The density of the Gaussian distribution is:

$$f_X(x) = \frac{1}{\sigma \sqrt {2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma ^2}}$$

Remi.b
  • I think what you are looking for is called "functions of random variables", where one wishes to analyze $Y=g(X)$ - for instance $Y=g(X)=X^n$ (as in your first example). You could look here http://en.wikipedia.org/wiki/Random_variable#Functions_of_random_variables – mathse Mar 25 '14 at 13:49
  • In the second example, you wrote $\log(x_i)^2$, do you actually mean $\log(x_i^2)$? – Did Mar 25 '14 at 14:30
  • @mathse In the first example, $Y\ne X^n$. – Did Mar 25 '14 at 14:31
  • I was actually typing too fast. You seem to be looking for the product of random variables and functions thereof. For instance, for discrete random variables, you would have $P[X\cdot Y=z]=\sum_{x\cdot y=z}P[X=x]P[Y=y]$ (under independence). – mathse Mar 25 '14 at 14:56
  • If random variables are independent, the PDF of their product is just the product of their PDF's. – Frank Mar 25 '14 at 15:36
  • @Frank, yes, this is true - the joint is a product of the marginals. But I think that the question is about the distribution (pdf) of the RV $Z=XY$. And this distribution, in the continuous case, is more difficult to evaluate - one could look at $\log(Z)=\log(X)+\log(Y)$ and then apply the convolution formula to determine the pdf of $Z$. – mathse Mar 25 '14 at 15:46
  • @Frank So does it mean that the result of my first example is $(\lambda e^{-\lambda x})^n$? – Remi.b Mar 25 '14 at 16:09
  • @Did I meant $log(x_i)^2$ as I wrote but I chose this just as a dummy example to understand the general calculations. If choosing another function would make the algebra easier, then please feel free to use another example. – Remi.b Mar 25 '14 at 16:10
  • Yes, that's right. – Frank Mar 25 '14 at 16:13
  • @Remi.b, I think this is completely wrong - you can look here how to compute the pdf of independent RVs: "To obtain the probability density function (PDF) of the product of two continuous random variables (r.v.) one can take the convolution of their logarithms. This is explained for example by Rohatgi (1976)." http://www.maths.bris.ac.uk/~macpd/georgiou/ProductRVs%20revised.pdf. Frank has just given you the formula for the joint pdf - but even then your formula would be wrong because you need to use different variables, $x_1, x_2,\ldots$. – mathse Mar 25 '14 at 16:20
  • To clarify the distinction between the joint pdf of $X$ and $Y$ and the pdf of $X\cdot Y$, in the discrete case, the joint distribution of $X$ and $Y$ is $p(x,y)=P[X=x,Y=y]=P[X=x]P[Y=y]$ - this is something very different from the pdf of $X\cdot Y$, which represents $P[X\cdot Y=z]$ ... – mathse Mar 25 '14 at 16:33
  • The ambiguity is still not solved. Once again, $(\log x)^2$ or $\log(x^2)$? – Did Mar 25 '14 at 16:33
  • @Did Sorry for being unclear. I meant $(\log(x))^2 = (\log x)^2$ – Remi.b Mar 25 '14 at 21:44
  • Then a gaussian distribution might be a problem since $P(X\leqslant0)\gt0$ for every nondegenerate normal random variable $X$ and $\log x$ does not exist when $x\leqslant0$. – Did Mar 25 '14 at 21:52

2 Answers


As general advice, you are looking for products of random variables (or, more generally, products of functions of random variables). Different techniques may apply when determining the distribution (pdf, cdf) of such a product, depending on the distributions involved.

You could look at this question: "What is the distribution of a random variable that is the product of two normal random variables?" (look at all of the answers)

Or here: http://www.maths.bris.ac.uk/~macpd/georgiou/ProductRVs%20revised.pdf
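
As a concrete illustration of one such technique: for two independent positive random variables with densities $f_{X_1}$ and $f_{X_2}$, the density of $Z = X_1 X_2$ is $f_Z(z) = \int_0^\infty f_{X_1}(x)\, f_{X_2}(z/x)\, \frac{1}{x}\, dx$. The sketch below evaluates this integral numerically for two exponential factors with the same $\lambda$. The function name, the integration range, and the step count are illustrative choices, not part of the question. For this particular case the integral also has a known closed form, $2\lambda^2 K_0(2\lambda\sqrt{z})$, with $K_0$ the modified Bessel function of the second kind, which can serve as a cross-check.

```python
import math


def product_pdf_two_exponentials(z, lam=1.0, steps=4000):
    """Numerically evaluate f_Z(z) for Z = X1 * X2, with X1, X2 ~ Exp(lam) independent.

    Uses f_Z(z) = integral over x > 0 of f(x) * f(z / x) / x dx.
    Substituting x = exp(t) turns dx / x into dt.
    """
    if z <= 0:
        return 0.0
    # Assumption: for moderate z and lam the integrand is negligible
    # outside t in [-20, 20], so the infinite range is truncated there.
    lo, hi = -20.0, 20.0
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        t = lo + (k + 0.5) * h  # midpoint rule
        x = math.exp(t)
        # f(x) * f(z / x) = lam^2 * exp(-lam * (x + z / x))
        total += lam * lam * math.exp(-lam * (x + z / x))
    return total * h


if __name__ == "__main__":
    for z in (0.1, 1.0, 3.0):
        print(z, product_pdf_two_exponentials(z))
```

For $n$ factors the same formula can in principle be applied repeatedly, multiplying in one factor at a time, although the nested integrals quickly become expensive.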

mathse

What about taking the logarithm of the product? Then you have a sum of random variables. Depending on the specifics, the central limit theorem (CLT) could apply to $\log Y$, that is, $\log Y$ could be approximately normal, which would imply that $Y$ has an approximately log-normal distribution.

Since here all your $X_i$'s have an exponential distribution with the same parameter $\lambda$ and are independent, the CLT does actually apply, after normalizing the variables. Let $Z_n = \log Y_n$ with $Y_n$ being the product over the first $n$ factors. Also, let

$$Z_n^* = \frac{Z_n - \mbox{E}(Z_n)}{\sqrt{\mbox{Var}(Z_n)}}.$$

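Then $Z_n^*$ converges in distribution to a standard normal, and $\exp Z_n^*$ to a log-normal, as $n\rightarrow \infty$. For large $n$, $Y_n$ itself is therefore approximately log-normal.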
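
A minimal simulation sketch of this normalization, using only the Python standard library. It relies on two standard facts about $X \sim \text{Exp}(\lambda)$: $\mbox{E}(\log X) = -\gamma - \log\lambda$, with $\gamma$ the Euler-Mascheroni constant, and $\mbox{Var}(\log X) = \pi^2/6$. The function names and the values of `n`, `lam`, `draws`, and `seed` are illustrative assumptions, not taken from the question.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def standardized_log_product(n, lam, rng):
    """Draw one Z_n^* for Y_n = product of n independent Exp(lam) variables."""
    log_y = 0.0
    for _ in range(n):
        x = rng.expovariate(lam)
        while x <= 0.0:  # guard against an exact zero draw before taking the log
            x = rng.expovariate(lam)
        log_y += math.log(x)
    mean = n * (-EULER_GAMMA - math.log(lam))  # E(Z_n)
    var = n * math.pi ** 2 / 6.0               # Var(Z_n)
    return (log_y - mean) / math.sqrt(var)


def simulate(n=50, lam=2.0, draws=4000, seed=0):
    """Return a list of independent draws of Z_n^*."""
    rng = random.Random(seed)
    return [standardized_log_product(n, lam, rng) for _ in range(draws)]


if __name__ == "__main__":
    zs = simulate()
    m = sum(zs) / len(zs)
    v = sum((z - m) ** 2 for z in zs) / (len(zs) - 1)
    inside = sum(abs(z) <= 1.96 for z in zs) / len(zs)
    print("sample mean:", m)
    print("sample variance:", v)
    print("fraction with |Z| <= 1.96:", inside)
```

If the approximation is good, the sample mean should be near 0, the sample variance near 1, and roughly 95% of the draws should fall within $\pm 1.96$. For small $n$ the distribution of $Z_n^*$ is visibly skewed, so the log-normal approximation for $Y_n$ should only be trusted for reasonably large $n$.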

  • If the $x_i$'s are correlated or not identically distributed, would your approach still work? – user1611107 Jun 17 '20 at 14:03
  • Yes if the $X_i$'s are weakly correlated (but no if there are strong long-term correlations), and even if the $\lambda_i$'s are different, under certain circumstances (e.g. if they are bounded and not too different from each other). – Vincent Granville Jun 17 '20 at 14:32
  • Thanks. So here is a follow-up question: are you saying that for a Markov process it is true that $\log\langle e^{x_1+x_2+x_3+\dots+x_n}\rangle = \log\langle e^{x_1}\rangle+\log\langle e^{x_2}\rangle+\dots+\log\langle e^{x_n}\rangle$? I believe that is what your comment implies, because $e^{x_1+x_2+\dots}=e^{x_1}e^{x_2}\cdots$. Am I correct? – user1611107 Jun 17 '20 at 14:44
  • Continuing from my previous comment: there, I used your answer to justify $\langle\log(\prod\ldots)\rangle=\log(\langle\prod\ldots\rangle)$, if I'm correct – user1611107 Jun 17 '20 at 14:53
  • I wrote a few articles on the topic, they could be useful in this context: (1) https://www.datasciencecentral.com/profiles/blogs/the-fundamental-statistics-theorem-revisited (2) https://www.datasciencecentral.com/profiles/blogs/new-perspective-on-central-limit-theorem-and-related-stats-topics (3) https://www.datasciencecentral.com/profiles/blogs/long-range-correlation-in-time-series-tutorial-and-case-study – Vincent Granville Jun 17 '20 at 16:43
  • Great, I will see, thanks! – user1611107 Jun 18 '20 at 09:18