
Let $X,Y$ be independent exponential random variables with respective pdfs $f(x) = \lambda e^{-\lambda x}$ and $g(y) = \mu e^{-\mu y}$. We want to find the pdf of $Z=X-Y$.

I originally tried the convolution method (but with the wrong limits): $$ P[X-Y \le z]= \int_0^{\infty} \Bigg( \int_0^{z+y} \lambda e^{-\lambda x}\, dx \Bigg) \mu e^{-\mu y}\, dy$$

which is wrong.

I was told that I'm supposed to split the inner integral into two, depending on $X-Y$ being positive or negative. I don't understand why.

I'm thinking that since we've got $X \in (0,\infty)$ and $Y \in (0, \infty)$, we should have $Z \in (-\infty, \infty)$.

Any hints?

I just thought of another way: the moment generating function. But there's a catch: we only know that the product of MGFs is the MGF of a SUM (not a difference). I'm wondering if a ratio of MGFs might be the MGF of a difference?

StubbornAtom
shimee

3 Answers


This is a perfect example to show that indicator functions should not be omitted from the expressions of densities, or rather that, if they are, absurd conclusions may follow.

Here the densities are NOT what you wrote but the functions $f$ and $g$ defined on $\mathbb R$ by $$ f(x)=\lambda\mathrm e^{-\lambda x}\mathbf 1_{x\geqslant0},\qquad g(y)=\mu\mathrm e^{-\mu y}\mathbf 1_{y\geqslant0}. $$ Thus, for every $z$, $$ P[X-Y\leqslant z]=\iint f(x)g(y)\mathbf 1_{x-y\leqslant z}\mathrm dx\mathrm dy=\int_0^{+\infty}\int_0^{+\infty}\lambda\mathrm e^{-\lambda x}\mu\mathrm e^{-\mu y}\mathbf 1_{x-y\leqslant z}\mathrm dx\mathrm dy, $$ that is, $$ P[X-Y\leqslant z]=\int_0^{+\infty}\mu\mathrm e^{-\mu y}\int_0^{(y+z)^+}\lambda\mathrm e^{-\lambda x}\mathrm dx\mathrm dy=\int_0^{+\infty}\mu\mathrm e^{-\mu y}(1-\mathrm e^{-\lambda (y+z)^+})\mathrm dy, $$ and from this point, one should treat separately the cases $z\leqslant0$ and $z\geqslant0$.
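For reference, carrying out the remaining integral in each case yields the CDF explicitly (a routine computation, consistent with the density derived below): $$ P[X-Y\leqslant z]=\frac{\lambda}{\lambda+\mu}\mathrm e^{\mu z}\quad(z\leqslant0),\qquad P[X-Y\leqslant z]=1-\frac{\mu}{\lambda+\mu}\mathrm e^{-\lambda z}\quad(z\geqslant0). $$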

A more direct approach is to note that the density $h$ of $X-Y$ is defined by $$ h(z)=\int f(y+z)g(y)\mathrm dy=\int \lambda\mathrm e^{-\lambda (y+z)}\mu\mathrm e^{-\mu y}\mathbf 1_{y+z\geqslant0}\mathbf 1_{y\geqslant0}\mathrm dy, $$ that is, $$ h(z)=\int_{z^-}^{+\infty}\lambda\mathrm e^{-\lambda (y+z)}\mu\mathrm e^{-\mu y}\mathrm dy=\frac{\lambda\mu\mathrm e^{-\lambda z}}{\lambda+\mu}\mathrm e^{-(\lambda+\mu)z^-}, $$ which can be rewritten as the more symmetrical $$ h(z)=\frac{\lambda\mu}{\lambda+\mu}(\mathrm e^{-\lambda z}\mathbf 1_{z\geqslant0}+\mathrm e^{\mu z}\mathbf 1_{z\leqslant0}). $$
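For readers who want a numerical sanity check of $h$, here is a minimal Monte Carlo sketch with NumPy (the rates $\lambda=2$, $\mu=3$ are arbitrary illustrative choices):

```python
import numpy as np

# Minimal Monte Carlo check of
# h(z) = lam*mu/(lam+mu) * (exp(-lam*z) for z >= 0, exp(mu*z) for z <= 0)
rng = np.random.default_rng(0)
lam, mu = 2.0, 3.0                      # arbitrary rates for illustration
n = 1_000_000
z = rng.exponential(1/lam, n) - rng.exponential(1/mu, n)   # samples of X - Y

def h(t):
    return lam*mu/(lam+mu) * np.where(t >= 0, np.exp(-lam*t), np.exp(mu*t))

hist, edges = np.histogram(z, bins=200, range=(-3, 3), density=True)
mids = (edges[:-1] + edges[1:]) / 2
print(np.max(np.abs(hist - h(mids))))   # should be small, on the order of 1e-2
```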

Did

The following hopefully explains the "split integral into two" statement you mentioned.

If you take two exponentials $X$ and $Y$ with support $\mathbb{R}^+_0$ then the variable $Z=X-Y$ has support $\mathbb{R}$.

To arrive at the "convolution" of the two, just find the values of $X$ and $Y$ so that their difference is $Z$. A graph is useful for explaining this:

[Figure: the $(x,y)$ quadrant with the lines $x-y=z$ drawn in green for several values of $z$]

Basically, the "convolution" needs to integrate along the green lines.

With this you can see why it is good to split the integral into two parts: the lower endpoint of the green line (the smallest attainable $x$ or $y$) is not a smooth function of $Z$ at $Z=0$.

So, if $z\geq0$ we need to consider values of $x \geq z$ and $y \geq 0$, which means we do the integral:

$$p_+(z) = \int_0^\infty p_X(s+z)p_Y(s) ds$$ $$ = \int_0^\infty f(s+z)g(s) ds$$ $$ = \lambda\mu\int_0^\infty e^{-\lambda (s + z) - \mu s } ds$$ $$ = \lambda\mu e^{-\lambda z}\int_0^\infty e^{-(\lambda + \mu)s} ds$$ $$ = \frac{\lambda\mu}{\lambda+\mu}e^{-\lambda z}$$

and for $z \leq 0$ we consider the values of $x \geq 0$ and $y \geq -z$, which means we do the integral:

$$p_-(z) = \int_0^\infty p_X(s)p_Y(s-z) ds$$ $$=\frac{\lambda\mu}{\lambda+\mu}e^{\mu z}$$

Combining the two gives:

$$p(z) = \frac{\lambda\mu}{\lambda+\mu} \left\{\begin{array}{ll} e^{-\lambda z} & z \geq 0 \\ e^{\mu z} & z \leq 0 \end{array}\right.$$
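If you want to double-check the two split integrals symbolically, here is a short SymPy sketch (the symbol names are illustrative):

```python
import sympy as sp

s, z, lam, mu = sp.symbols('s z lam mu', positive=True)

# z >= 0 case: integrate p_X(s+z) p_Y(s) over s in (0, oo)
p_plus = sp.integrate(lam*sp.exp(-lam*(s + z)) * mu*sp.exp(-mu*s), (s, 0, sp.oo))
print(sp.simplify(p_plus))   # lam*mu*exp(-lam*z)/(lam + mu)

# z <= 0 case: substitute w = -z >= 0 and integrate p_X(s) p_Y(s+w) over s in (0, oo)
w = sp.symbols('w', positive=True)
p_minus = sp.integrate(lam*sp.exp(-lam*s) * mu*sp.exp(-mu*(s + w)), (s, 0, sp.oo))
print(sp.simplify(p_minus))  # lam*mu*exp(-mu*w)/(lam + mu), i.e. the exp(mu*z) branch
```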

I agree with Did that indicator functions are a way forward, but they have the consequence of changing the support and introducing measure-zero regions. They are not necessary if you keep the support of the random variables in mind.

Regarding Did's question:

First, the general problem with using indicators...

There are two definitions of an exponential distribution we can work with, one can be written $A = \left(\mathbb{R}^+, \mathcal{B}(\mathbb{R^+}), \mu\right)$ and the other $B = \left(\mathbb{R}, \mathcal{B}(\mathbb{R}), \nu \right)$ where $\nu(E) = \int_E \mathbb{1}_{x\geq0}d\mu(x)$. Using the notation for a probability distribution where $P_i = (S_i,\Sigma_i,p_i)$, I will call $P_b$ a weakening of $P_a$, iff $S_a \subset S_b$, $\Sigma_a$ is the $\sigma$-algebra induced on $S_a$ by $\Sigma_b$ and $\forall E \in \Sigma_a : p_a(E)=p_b(E)$.

This corresponds to the notion in logic that $X\vee Y$ is a weakening of $X$ (and also of $Y$). This matches up in the sense that the logical proposition corresponding to inclusion of $e$ in $S_b$, namely $[[e \in S_b]]$, can be written as $[[e \in S_a]] \vee [[e \in S_b\setminus S_a]]$.

It is clear that $B$ is a weakening of $A$. Trying to work out which one was the OP's intent is not really worth the bother. Here, I wish to explain why solving problems like this by weakening can be problematic, even if in this particular case it works out just fine. The problem does not stem from the mathematical details of measure-theoretic probability theory, but lies in the consistency of its application. The question is: when is it valid to weaken the support? As I said, this is not about the Kolmogorov formalisation or what have you, but about how one should go about using probabilities if your intention is to model real things.

With the weakening there is no constraint on what the added space represents, as it comes from outside of the thing you are modelling. For example, if I were modelling radioactive decay time with an exponential distribution, what does a negative decay time mean? The answer is nothing: it's just nonsense. Probabilistic tachyons. Worse than that, it could instead be interpreted to mean something inherently contradictory, actively destroying any further inferences.

The conservative approach is to avoid weakening altogether, and to say that it is bad practice.

Similar arguments exist against assigning measure zero to sets of positive Lebesgue measure, and would usually dismiss the "there are complications" argument on the basis that the complications are "how it should be", or more explicitly, that a good model should break exactly where it doesn't apply (hence mentioning the need to define $0\log 0$).

These arguments usually depend on some further assumptions; for example, a Bayesian might argue that one is never completely certain of anything, or, if they are being more formal, that you can't affect or obtain measure-zero regions by updating, i.e. they are qualitatively different. Other arguments stem from the behaviour of information measures. I've heard quite a few. Personally though, I just like a clear (not just formal) distinction between the ontological claims described by the support and the epistemological claims described by the measure; I see this as very important for ensuring information theory does not overstep its bounds, and for understanding why the popular claim that "everything is information" is so very wrong.

Lucas
  • You might want to expand on "but they have the consequence of changing the support and introducing a measure zero regions". What is "changing the support" and what is "introducing [a] measure zero regions"? – Did Jun 12 '13 at 12:01
  • @Did There's no need to be [sarcastic], using square brackets to indicate grammatical errors is somewhat redundant when it is in your power to fix them in the original. – Lucas Jun 12 '13 at 17:38
  • @Did Basically, using the densities over $\mathbb{R}$ is defining a new variable with a different support, which one uses in its stead. It's a bit of a hack. Then, there are many who think that there is something very wrong with probabilities of zero, even before you end up having to define $0 \log 0$. One reason is, as soon as you start extending the support to things that don't make sense by setting $Pr=0$ (negative time spans in this case), where do you stop? $Pr(\text{The color of 3 o'clock is apples})=0$? (note that this would imply $Pr(\text{The color of 3 o'clock is not apples})=1$) – Lucas Jun 12 '13 at 17:42
  • Not sure I understand your mathematical stance but I suspect it is deeply wrong. The main point is that a real random variable $X$ is a measurable function between a measure space $(\Omega,\mathcal F)$ and the measure space $(\mathbb R,\mathcal B(\mathbb R))$. When the measure space $(\Omega,\mathcal F)$ is endowed with a probability measure $P$, the distribution of $X$ is the measure $\mu$ on $(\mathbb R,\mathcal B(\mathbb R))$ image of $P$ by $X$. Thus, by definition, we are dealing with a measure $\mu$ on $\mathcal B(\mathbb R)$ whose density (when it is densitable) is a (class of) .../... – Did Jun 12 '13 at 18:44
  • .../... function(s) defined on $\mathbb R$. It may happen that $\mu(B)=0$ for Borel subsets $B$ of $\mathbb R$ whose Lebesgue measure is positive, say $B=(-\infty,0)$ when $X\geqslant0$ almost surely, but there is "nothing very wrong with [these] probabilities of zero". The alternative (defining $\mu$ on $(S,\mathcal B(S))$ only, where $S$ is the support of the distribution of $X$) leads quickly to complications. The invocations of apples, negative times and $0\log0$ are opaque to me in this context, but I am quite curious to know where you picked the idea that probabilities .../... – Did Jun 12 '13 at 18:45
  • .../... zero are a problem per se. Any source? (And, to prevent new misunderstandings, I am not being sarcastic here, just trying to understand.) – Did Jun 12 '13 at 18:45
  • @Did See my updated answer. TL;DR Even if something has a probability of zero, there still must be something to have a probability of zero. I see that as problematic. – Lucas Jun 12 '13 at 23:43
  • Wow. This seems to be a highly idiosyncratic approach, to say the least... Just one remark: you mentioned no source for these claims, maybe you could add some. – Did Jun 13 '13 at 00:26
  • @Did well, 75% is me trying to explain to you why I, personally, think there is something fishy about what I called "weakening", it would surprise me if anyone else has even wanted to define it. As for the other arguments, hard to remember where. I'm sure you will be in luck finding the updating thing somewhere in most textbooks - look at improper priors perhaps. The "you're never certain of anything" is a common but informal maxim, I found: "We should think about things in terms of how probable they are. You almost never have anything close to perfect certainty." - Spencer Greenberg .../... – Lucas Jun 13 '13 at 00:48
  • .../... There is a very brief discussion in Kullback and Leibler 1951 about absolute continuity, elsewhere too, but I forget. For the tendency for information zealotry, see "The Bandwagon" by Shannon, or just hang out with the right kind of physicists. For weakening in logic, see wikipedia on disjunctive addition, and related is the problem of weakening from or with a contradiction, which is well known (Smullyan 1st order logic, perhaps, or just wikipedia) - I'm not saying that this is formally what is happening though, I was being careful not to in fact. .../... – Lucas Jun 13 '13 at 01:02
  • .../... as for generally not wanting to talk about things that don't exist, see pragmatism in philosophy, or even positivism for that matter. For what to do about statements that are just nonsense, see positivism too. – Lucas Jun 13 '13 at 01:07
  • @Did I think this is the most relevant wikipedia overall: http://en.wikipedia.org/wiki/Pragmatic_maxim – Lucas Jun 13 '13 at 01:11
  • I fail to see how these are even related to the question under discussion (which is whether the distributions of real valued random variables should be considered each as a measure on its own support, or all as measures on the whole real line), but maybe I am being dense. Anyway, thanks for the effort. – Did Jun 13 '13 at 01:24
  • @Did Simply put: defining a support is specifying the possible states of a system. In doing so you should be parsimonious (in terms of what is described, not how easy the maths is). – Lucas Jun 13 '13 at 01:42
  • Note that parsimony could suggest to fix once and for all the support, that is, the opposite of what you suggest. // Anyway, to summarize the discussion, I think we reached the following conclusions: your suggestion is highly personal, it has obvious drawbacks, it brings no advantage. As a consequence, I can only suggest to read some canonical literature on the subject, at least to know what people are doing (and possibly, why they are doing it). – Did Jul 05 '13 at 07:53
  • Not really, that would most likely be choosing your description so that the maths is easy, not for the type of simplicity described by the pragmatic maxim. // "we" is the incorrect personal pronoun here (and typically rude) - the advantage is realism; whilst I understand that doesn't matter for many mathematicians, it does to others. Just because I think using $1_x$ is stupid does not mean that I do not know why others choose to use it; in fact I use it myself sometimes, but I still consider it to be a hack. Likewise, writing a computer algebra system that way would end in disaster. – Lucas Jul 05 '13 at 13:40
  • "We" is typically rude? Wow. "Realism doesn't matter for many mathematicians"? Wow-wow. (Not sure I succeeded in drawing your attention to the insularity of your analysis but, hey, at least I tried...) – Did Jul 05 '13 at 13:52
  • Yes - we did not reach the said conclusions, you did; passing your own judgement off as though we'd actually discussed it, agreed, and come to some mutual agreement is rude. You were pretty adamant that you didn't follow what I was saying, so I do not see how that could be the case, unless you were only feigning ignorance, which would be both rude and annoying. Yes, there are plenty of mathematicians who like the notion that mathematical truth is detached from reality; I'm not saying you're one of them, but they most definitely exist. Now, if you do indeed wish to "[draw] my attention to the ... – Lucas Jul 05 '13 at 14:51
  • insularity of [my] analysis", perhaps just say that, instead of pretending that you actually care about what I have to say. It would save us both a lot of time. – Lucas Jul 05 '13 at 14:53
  • I was curious, which is the reason why I spent some time first trying to make you explicitly state your stance, then to understand it (and why I chose to ignore as long as possible your innuendos about me, mathematicians, whatnot). In the end, after these explanations on your part, yes, I feel that your position on this subject is insular and that your inability to cite any source is revealing. Whether such an opinion suits you or not, no drama is needed and I entirely agree to leave things as they are. – Did Jul 05 '13 at 15:35

You can use moment generating functions like so: if we know that $X$ and $Y$ are independent with moment generating functions $M_{X}(t) = \dfrac{\lambda}{\lambda - t}$ and $M_{Y}(t) = \dfrac{\mu}{\mu - t}$, then by definition,

$$M_{Z}(t)=M_{X-Y}(t) = E\left(e^{(X-Y)t}\right) = E \left( e^{Xt}e^{-Yt} \right) = E\left(e^{Xt}\right)E\left(e^{-Yt}\right)=M_{X}(t)\,M_{Y}(-t)$$ $$=\dfrac{\mu \lambda}{(\lambda - t)(\mu - (-t))} = \dfrac{\mu \lambda}{(\lambda - t)(\mu + t)} = \dfrac{\mu \lambda}{\lambda \mu + \lambda t-\mu t- t^2}.$$

This moment generating function doesn't look familiar to me, unfortunately.
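As a quick sanity check on the algebra, one can compare this MGF with a simulated estimate of $E\left(e^{tZ}\right)$; a minimal NumPy sketch (the rates and $t$ below are arbitrary choices, subject to $-\mu < t < \lambda$ so the MGF exists):

```python
import numpy as np

# Monte Carlo check of M_Z(t) = lam*mu / ((lam - t)*(mu + t))
rng = np.random.default_rng(1)
lam, mu, t = 2.0, 3.0, 0.5               # illustrative values; need -mu < t < lam
n = 1_000_000
z = rng.exponential(1/lam, n) - rng.exponential(1/mu, n)   # samples of Z = X - Y
print(np.mean(np.exp(t*z)))              # Monte Carlo estimate of E[e^{tZ}]
print(lam*mu / ((lam - t)*(mu + t)))     # closed form; the two should agree closely
```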

Clarinetist
  • The very last expression for $M_Z(t)$ is incorrect. – Did Jun 12 '13 at 11:58
  • Ah, thanks for catching that! :) – Clarinetist Jun 12 '13 at 15:09
  • The (present) last expression of $M_Z(t)$ is not useful. Rather one could note that $M_Z(t)=p\lambda/(\lambda-t)+(1-p)\mu/(\mu+t)$ for some $p$ in $(0,1)$ and identify each $t\mapsto \nu/(\nu\pm t)$ as a well-known MGF. – Did Jun 12 '13 at 15:28