
I have $Y=X_1u(X_1-x_{th})+X_2u(X_2-x_{th})+\cdots+X_Nu(X_N-x_{th})$, where the $X_i$ are i.i.d. exponential with rate $\lambda$ (pdf $\lambda e^{-\lambda x}$), $u(t)$ is the unit step function, and $x_{th}$ is the threshold, meaning that $X_i$ contributes to the sum only if it is greater than $x_{th}$. I need to find the CDF of $Y$.
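
For reference, here is a quick simulation sketch of how $Y$ is generated (the specific values of $\lambda$, $x_{th}$ and $N$ below are just illustrative choices, not part of the problem):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (not part of the problem statement)
lam, x_th, N = 1.0, 1.0, 5
n_samples = 100_000

X = rng.exponential(scale=1.0 / lam, size=(n_samples, N))  # X_i ~ Exp(lambda)
Y = np.where(X > x_th, X, 0.0).sum(axis=1)                 # X_i contributes only if X_i > x_th

# Empirical CDF of Y at a few points, plus the atom at zero
print("P(Y = 0)  ≈", np.mean(Y == 0.0))
print("P(Y <= 3) ≈", np.mean(Y <= 3.0))
```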

Any help in this regard will be appreciated.

BR Frank

Frank Moses

2 Answers


Consider $\lambda=1$ and $x_{th}=1$ for ease of getting started. Then the moment generating function of one of your summands (say $Z_i$) is

$$M(t)=e^0 \int_0^1 e^{-x} dx + \int_{1}^\infty e^{tx} e^{-x} dx = 1-e^{-1} + \frac{e^{t-1}}{1-t}$$

for $t<1$. So the mgf of your $Y$ with these parameters is $\left ( 1-e^{-1}+\frac{e^{t-1}}{1-t} \right )^N$. Replacing $t$ by $-s$ switches to the usual Laplace transform notation, so you want the inverse Laplace transform of $\left ( 1-e^{-1} + \frac{e^{-s-1}}{s+1} \right )^N$.
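
As a sanity check (a quick numerical sketch, assuming $\lambda=1$ and $x_{th}=1$ as above), this closed form for $M(t)$ can be compared against a Monte Carlo estimate of $\mathsf E[e^{tZ_i}]$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Check M(t) = 1 - e^{-1} + e^{t-1}/(1 - t) for lambda = 1, x_th = 1, t < 1
t = 0.3
X = rng.exponential(size=1_000_000)   # X_i ~ Exp(1)
Z = np.where(X > 1.0, X, 0.0)         # one summand: Z_i = X_i * u(X_i - 1)

mc_estimate = np.mean(np.exp(t * Z))  # Monte Carlo estimate of E[e^{t Z_i}]
closed_form = 1 - np.exp(-1) + np.exp(t - 1) / (1 - t)
print(mc_estimate, closed_form)       # should agree to a few decimal places
```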

This can be expanded as a binomial; letting $q=1-e^{-1}$, you have

$$\sum_{k=0}^N {N \choose k} q^{N-k} \frac{e^{-k(s+1)}}{(s+1)^k} = \sum_{k=0}^N {N \choose k} q^{N-k} e^{-k} \frac{e^{-ks}}{(s+1)^k}.$$

Now take the inverse Laplace transforms (which can be found in standard Laplace transform tables) and sum them. Note that when $k=0$ you need the inverse Laplace transform of a constant, which is that constant times the Dirac delta at zero. This is no surprise, because $P(Y=0)=q^N>0$, so $Y$ should not have an ordinary pdf.
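
Here is a numerical sketch of the resulting CDF, summing the terms above: each $k\geq 1$ term is $\binom Nk q^{N-k}e^{-k}$ times the CDF of an Erlang$(k,1)$ variable shifted right by $k$, and the $k=0$ term is the atom $P(Y=0)=q^N$. (The helper name and the parameter values in the check are my own; the formula is written for general $\lambda$ and $x_{th}$, which reduces to the case above when both are $1$.)

```python
import numpy as np
from math import comb, exp
from scipy.special import gammainc  # regularized lower incomplete gamma P(a, x)

def cdf_from_laplace_terms(y, N, lam=1.0, x_th=1.0):
    """CDF of Y obtained by summing the inverse Laplace transforms term by term."""
    q = 1 - exp(-lam * x_th)
    total = q ** N if y >= 0 else 0.0           # k = 0: Dirac mass at zero
    for k in range(1, N + 1):
        shifted = y - k * x_th                  # the k-th term lives on [k*x_th, infinity)
        if shifted > 0:
            total += comb(N, k) * q ** (N - k) * exp(-lam * k * x_th) \
                     * gammainc(k, lam * shifted)   # Erlang(k, lam) CDF at `shifted`
    return total

# Quick Monte Carlo comparison (illustrative parameters: lambda = 1, x_th = 1, N = 4)
rng = np.random.default_rng(2)
X = rng.exponential(size=(200_000, 4))
Y = np.where(X > 1.0, X, 0.0).sum(axis=1)
print(cdf_from_laplace_terms(3.0, N=4), np.mean(Y <= 3.0))
```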

Ian
  • I have figured out the term for $k=1$ in the CDF and it is $N(1-e^{-1})^{N-1}(1-e^{-x})$. Can you tell me if this is the right expression for the $k=1$ case? – Frank Moses Jun 01 '16 at 04:07
  • The term for $k=1$ in the PDF is $Nq^{N-1} e^{-x} \theta(x-1)$, where $\theta$ is the Heaviside step function. Therefore the corresponding term in the CDF is $\int_{-\infty}^x N q^{N-1} e^{-y} \theta(y-1) dy$, which is $0$ for $x<1$ and $\int_1^x N q^{N-1} e^{-y} dy = N q^{N-1}(e^{-1}-e^{-x})$ otherwise. So you are not quite correct. – Ian Jun 01 '16 at 11:36

Because $\{X_k\}_{k\in\{1,..,N\}}\mathop{\sim}\limits^{\rm iid}\mathcal {Exp}(\lambda)$, the probability that any particular one of these random variables exceeds the threshold $\theta$ is $\mathsf P(X_\star\geqslant \theta) = \mathsf e^{-\lambda\theta}$.

Let $M$ be the count of the $X_k$ that exceed the threshold. Then $M$ is binomially distributed: $$M\sim\mathcal{Bin}(N, \mathsf e^{-\lambda\theta})$$

Let $Z_m$ be the sum of $m$ of these exponentially distributed random variables (since they are all iid, it does not matter which $m$). The sum of $m$ iid $\mathcal{Exp}(\lambda)$ random variables has the Erlang (Gamma) distribution $\Gamma(m,\lambda)$, which allows us to evaluate $F_{Z_m}(z):=\mathsf P(Z_m\leq z)$.
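
Explicitly, the Erlang CDF is $$F_{Z_m}(z)=1-e^{-\lambda z}\sum_{j=0}^{m-1}\frac{(\lambda z)^{j}}{j!},\qquad z\geqslant 0,$$ and for $m=0$ the empty sum gives $F_{Z_0}(z)=1$ for all $z\geqslant 0$, which is the right convention for the zero-count case used below.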

Let $\mathscr M$ be a set of $m$ indices for these exponentially distributed random variables, $\{X_k\}_{k\in\mathscr M}$; in particular, we are interested in the case where all of them exceed the threshold. By the memoryless property (and the above): $$\begin{align}\mathsf P(\sum_{k\in \mathscr M} X_k\leqslant y\mid \min_{k\in\mathscr M} X_k\geqslant\theta) = & ~ \mathsf P(\sum_{k\in \mathscr M}X_k\leqslant y-m\theta) \\ = & ~ F_{Z_m}(y-m\theta)\end{align}$$

Then we have the distribution of the count of samples which exceed the threshold, and the distribution of their sum given that count.

So the probability we are interested in is $$\mathsf P(Y_N\leq y) = \sum_{m=0}^{\lfloor y/\theta\rfloor} \mathsf P(M{=}m)~F_{Z_m}(y-m\theta)$$
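
A minimal numerical sketch of this mixture formula (the function name and the parameter values are my own choices for illustration; SciPy supplies the binomial pmf and the Erlang/Gamma CDF):

```python
import numpy as np
from scipy.stats import binom, gamma

def cdf_from_binomial_mixture(y, N, lam=1.0, theta=1.0):
    """P(Y <= y) = sum_m P(M = m) * F_{Z_m}(y - m*theta),
    with M ~ Bin(N, e^{-lam*theta}) and Z_m ~ Erlang(m, lam)."""
    if y < 0:
        return 0.0
    p = np.exp(-lam * theta)        # P(a given X_k exceeds the threshold)
    total = binom.pmf(0, N, p)      # m = 0 term: F_{Z_0}(y) = 1 for y >= 0
    for m in range(1, min(N, int(y // theta)) + 1):
        total += binom.pmf(m, N, p) * gamma.cdf(y - m * theta, a=m, scale=1.0 / lam)
    return total

# Monte Carlo check with illustrative parameters
rng = np.random.default_rng(3)
lam, theta, N = 0.5, 2.0, 6
X = rng.exponential(scale=1.0 / lam, size=(200_000, N))
Y = np.where(X > theta, X, 0.0).sum(axis=1)
for y in (0.0, 5.0, 10.0):
    print(y, cdf_from_binomial_mixture(y, N, lam, theta), np.mean(Y <= y))
```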

Graham Kemp
  • Your answer is very logical, however I have a little confusion regarding this answer for the $M=1$ case. In this case we have just one variable above the threshold and all the other variables smaller than the threshold. The joint pdf of the maximum and second maximum of the variables can be written as $f_{\text{highest }x,\ \text{second highest }x}(x,y)=\frac{N!}{(N-2)!}e^{-x-y}(1-e^{-y})^{N-2}$. – Frank Moses Jun 01 '16 at 03:25
  • Now if we integrate this pdf with the limits of the highest $x$ being $x_{th} \to y+x_{th}$ and the limits of the second highest $x$ being $0 \to x_{th}$ then, in my understanding, we should get the same answer as from your last expression with $M=1$. However this is not the case, because with the above reasoning we get $N(1-e^{-x_{th}})^{N-1}(e^{-x_{th}}-e^{-y-x_{th}})$, while with your expression we get $N(1-e^{-x_{th}})^{N-1}(e^{-x_{th}}-e^{-y-2x_{th}})$. P.S. I have used $\lambda=1$ in this case. – Frank Moses Jun 01 '16 at 03:25
  • Please guide me if I am wrong in my above thinking. – Frank Moses Jun 01 '16 at 03:26