Let $X$ and $Y$ be independent random variables with the same distribution and finite expectation. How can one prove the inequality $$ E(|X-Y|) \le E(|X+Y|)?$$
Sorry, I missed the moduli. It has been fixed. – user64494 May 22 '13 at 16:38
What do you mean by "same distribution"? Are they IID? – Calvin Lin May 24 '13 at 20:04
@CalvinLin: It means that their distribution functions are identical. Could you explain what you mean by IID? – user64494 Jun 03 '13 at 04:19
@user64494 IID is shorthand for “independent, identically distributed” – Ewan Delanoy Jun 06 '13 at 07:55
@user64494: could you tell us where you have found this inequality? – Siméon Oct 02 '13 at 11:01
5 Answers
After a little inspection, we see that $$ E(|X+Y|-|X-Y|) = 2E[Z(1_{XY\geq 0}-1_{XY<0})] $$ where $Z = \min(|X|,|Y|)$.
Remember that for any non-negative random variable $T$, $$ E(T) = \int_0^\infty P(T>t)\,dt. $$
We apply this with $T=Z\,1_{X \geq 0, Y\geq 0}$, $T=Z\,1_{X < 0, Y< 0}$ and $T=Z\,1_{X \geq 0, Y< 0}$. Since $\{Z > t\} = \{|X| > t\}\cap\{|Y| > t\}$, we obtain
$$ E(Z \,1_{X \geq 0,Y\geq 0}) = \int_0^\infty P(X > t)P(Y > t)\,dt = \int_0^\infty P(X > t)^2\,dt $$
$$ E(Z\, 1_{X < 0, Y < 0}) = \int_0^\infty P(X < -t)P(Y < - t)\,dt = \int_0^\infty P(X < -t)^2\,dt $$
$$ E(Z\,1_{X \geq 0, Y< 0}) = E(Z\,1_{X < 0, Y \geq 0}) = \int_0^\infty P(X > t)P(X < -t)\,dt $$
So finally, $$ E(|X+Y|-|X-Y|) = 2\int_0^\infty (P(X>t)-P(X<-t))^2\,dt \geq 0 $$
Remark 1. The inequality is an equality if and only if the distribution of $X$ is symmetric, that is, $P(X > t) = P(X < -t)$ for every $t \geq 0$.
Remark 2. When $|X|=1$ a.s., the inequality is nothing but the semi-trivial fact that if $X$ and $Y$ are independent with the same distribution, then $P(XY \geq 0) \geq \dfrac{1}{2}$.
Remark 3. It is worthwhile to mention a nice corollary: $E(|X+Y|) \geq E(|X|)$. The function $x \mapsto |x|$ is convex, hence $|X| \leq \frac{1}{2}(|X+Y|+|X-Y|)$. Taking expectations, we find $$ E(|X+Y|-|X|) \geq \frac{1}{2}E(|X+Y|-|X-Y|) \geq 0. $$ Furthermore, equality holds if and only if $X=0$ a.s.
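As a sanity check (not part of the proof), here is a quick Monte Carlo sketch, assuming NumPy is available, comparing $E(|X+Y|-|X-Y|)$ with $2\int_0^\infty (P(X>t)-P(X<-t))^2\,dt$ for the $\mathrm{Exp}(1)$ distribution, where both sides equal $1$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10**6

# X, Y i.i.d. Exp(1): an asymmetric distribution with finite mean.
x = rng.exponential(1.0, n)
y = rng.exponential(1.0, n)

lhs = np.mean(np.abs(x + y) - np.abs(x - y))  # E(|X+Y| - |X-Y|)

# For Exp(1): P(X > t) = exp(-t) and P(X < -t) = 0, so the right-hand
# side is 2 * int_0^inf exp(-2t) dt = 1 (here approximated on a grid).
t = np.linspace(0.0, 40.0, 400_001)
rhs = 2.0 * np.sum(np.exp(-2.0 * t)) * (t[1] - t[0])

print(lhs, rhs)  # both close to 1
```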

Nice! I'm perfectly satisfied with this answer. I'll wait before awarding the bounty to it, just in case someone comes up with an even better proof. – Ewan Delanoy Jun 08 '13 at 15:01
Edit: Question has changed. Will give answer when time permits.
By the linearity of expectation, the inequality $E(X-Y)\le E(X+Y)$ is equivalent to $-E(Y)\le E(Y)$, which is false in general: it holds precisely when $E(Y)\ge 0$. For instance, if $X=Y=-1$ a.s., then $E(X-Y)=0 > -2 = E(X+Y)$.
Independence is not needed for the argument. Neither is the hypothesis that the random variables have the same distribution.
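A minimal numerical illustration of this point (a sketch assuming NumPy; the mean $-1$ is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(1)

# X, Y i.i.d. normal with mean -1, so E(Y) < 0.
x, y = rng.normal(-1.0, 1.0, size=(2, 10**6))

print(np.mean(x - y), np.mean(x + y))  # ~0 vs ~-2, so E(X-Y) > E(X+Y)
```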

Let's consider the question of when $E[f(X,Y)] \geq 0$ in the generality of real-valued functions of arbitrary i.i.d. random variables on probability spaces. With no loss of generality take $f$ to be symmetric in $X$ and $Y$, because $E[f]$ is the same as $E$ of the symmetrization of $f$.
There is a simple, and greatly clarifying, reduction to the case of random variables with at most two values. The general case is a mixture of such distributions: represent the selection of $(X,Y)$ as first choosing an unordered pair according to the induced distribution on unordered pairs, and then choosing the ordered pair conditionally on the unordered one. The conditional distribution is a $1$- or $2$-valued distribution, and the weights in the mixture are the probabilities of the unordered pairs. One then sees, after some more or less mechanical analysis of the $2$-valued case, that the key property is:
$f(x,y)=|x+y| - |x-y|$, the symmetric function for which we want to prove $E[f(X,Y)] \geq 0$, is diagonally dominant. That is, $f(x,x)$ and $f(y,y)$ are both greater than or equal to $|f(x,y)|$. By symmetry we really only need to check one of these conditions: $\forall x,y \hskip4pt f(x,x) \geq |f(x,y)|$.
A function satisfying these conditions, now on a general probability space, has non-negative expectation in the 2-valued case, because for $p+q=1$ (the probability distribution), $$E[f] = p^2 f(a,a) + q^2 f(b,b) + 2pq f(a,b) \geq (p-q)^2|f(a,b)| \geq 0$$
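As a quick numerical check of this two-point bound for the kernel $f(x,y)=|x+y|-|x-y|$ (a sketch, not part of the argument; assumes NumPy, and the sampling choices are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)

def f(x, y):
    # the symmetric kernel from the original problem
    return abs(x + y) - abs(x - y)

# E[f] = p^2 f(a,a) + q^2 f(b,b) + 2pq f(a,b) >= (p-q)^2 |f(a,b)| >= 0
# on randomly sampled two-point distributions.
for _ in range(10_000):
    a, b = rng.normal(size=2)
    p = rng.uniform()
    q = 1.0 - p
    ef = p * p * f(a, a) + q * q * f(b, b) + 2 * p * q * f(a, b)
    assert ef >= (p - q) ** 2 * abs(f(a, b)) - 1e-12
print("two-point bound held in all sampled cases")
```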
In the $2$-valued case, the expectation is zero exactly when $p=q$ and $f(a,b) = -f(a,a) = -f(b,b)$. For $1$-valued random variables concentrated at a point $c$, equality holds exactly when $f(c,c)=0$; by diagonal dominance such points are null points, with $f(c,x)=0$ for all $x$.
This allows a generalization and proof of Ewan Delanoy's observation in the general situation: if the support of the random variable has an involution $\sigma$ such that $\sigma(c)=c$ for null points $c$, and for non-null points $b=\sigma(a)$ is the unique solution of $f(a,a)=f(b,b)=-f(a,b)$, then the expectation (when finite) is zero if and only if the distribution is $\sigma$-invariant. That is because the expectation-zero case must be a mixture of the $1$- and $2$-atom distributions with zero expectation, and all of those assign probability in a $\sigma$-invariant way to the atoms.
Returning to the original problem, for $f(x,y)=|x+y| - |x-y|$ with the absolute value interpreted as any norm on any vector space, diagonal dominance follows from the triangle inequality, $0$ is a unique null point, and the involution pairing every non-null $x$ with the unique solution of $f(x,y)=-f(x,x)=-f(y,y)$ is $x \to -x$. This recovers the characterization that the distribution is symmetric in the critical case, for any $f$ derived from a norm.
Note (*). In passing between ordered and unordered pairs there might be some issue of "measurable choice" on general measure spaces; it is an interesting question what exactly is true about that and whether any condition is needed on the measure space. In the original problem one has a selection function $(\min(X,Y),\max(X,Y))$, if needed to avoid any complications, and the same would be true in any concrete case by using order statistics on coordinates.

Note that mixtures of diagonally dominant $2 \times 2$ matrices are diagonally dominant in the linear algebra sense, so the terminology is consistent, and one can quote the theorem on the positive semidefiniteness of such matrices as another argument. – zyx Jun 09 '13 at 16:49
Beautiful and unexpected. All that you say here, I had already guessed more or less intuitively, but I could not find a formal proof. So this proof is like my dream come true. – Ewan Delanoy Jun 09 '13 at 16:53
I am not sure I understand fully your reduction to the two-valued case. How do you choose the unordered pair and how do you manage the case $X=Y$? – Siméon Oct 02 '13 at 06:56
I apologize for my poor English, but I cannot understand what you said in the second paragraph. Did you mean that you had proved the following statement? If $X$ and $Y$ are i.i.d. and if $f(x,y)$ satisfies $f(x,y)=f(y,x)$ and $f(x,x)\ge |f(x,y)|$ for all $(x,y)$, then $E[f(X,Y)]\ge 0$. However, the statement is clearly false in general. @Ju'x: What is your opinion about my comment? – 23rd Oct 04 '13 at 15:25
@Landscape, it is a few months since I wrote the above, so I do not remember exactly what I meant, but I think that is for the $2$-element case and the statement of diagonal dominance needed in the general case is for mixtures (weighted averages) of $\leq 2$-element cases, $f(x,x) \geq \int_y |f(x,y)|$. If you write it in this "global" form one can write the proof without any reduction to the $2$-element case. However, if the reduction is correct then we do not need to think about globalizing the condition and can reason about $2$-element situations. – zyx Oct 04 '13 at 15:58
@Ju'x, I thought it was enough that a probability measure exists for the unordered pairs, but let me consider your question and edit the answer in a few days. – zyx Oct 04 '13 at 16:03
(@Landscape : the integral is of the non diagonal $f(x,y)$, with $y \neq x$, or write the inequality as $2f(x,x) \geq \int_y$ if integrating on all $y$. The idea is that a diagonal term dominates its row and column. The inequality is equivalent to the statement that $f(x,y)$ is a positive linear combination of functions $F$ on $1$- and $2$-element sets with $F(x,x) \geq |F(x,y)|$ and $F(x,y)=F(y,x)$. ) – zyx Oct 04 '13 at 16:21
@zyx: Thank you for your reply. Sorry, I am still confused. By writing $\int_y |f(x,y)|$("integrating on all $y$"), do you mean the conditional expectation $E[|f(X,Y)|\big|X=x]$? – 23rd Oct 04 '13 at 16:38
Yes, but the integration condition should imply that if $g(X)$ is any function with a finite number of values, the matrix expressing the probability density of $(g(X),g(Y))$ is diagonally dominant. If this does not follow from the conditional expectation inequality alone, it can be added as part of the condition on $f$. In other words, the condition on $f$ is "whatever is equivalent to a mixture of $2$-element distributions for purposes of the argument". There is still Ju'x's question of whether the reduction that avoids these globalized conditions is valid. @Landscape – zyx Oct 04 '13 at 17:19
(@Ju'x may be interested in the conversation here. As I mentioned, I will edit the answer in a few days when I have more time.) – zyx Oct 04 '13 at 17:24
@zyx: just like Landscape, I am still a bit confused by your arguments. I will be happy to read the details in your edited answer and to pursue the discussion from this basis. – Siméon Oct 04 '13 at 18:02
@zyx: However, the integration condition is weaker than $f(x,x)\ge|f(x,y)|$, so it's insufficient to deduce $E[f(X,Y)]\ge 0$. Maybe I didn't make my point clear at the beginning. What I want to ask is the same as Ju'x, i.e. how you could reduce the problem to the $2$-valued case. Since I cannot understand what you said in the second paragraph of your answer, I guessed your reduction to the $2$-valued case could imply the statement mentioned in my first comment. Now it seems that I misunderstood your meaning, so let me wait for you to update your answer. Sorry for bothering you so much. – 23rd Oct 04 '13 at 18:04
Thank you for the questions. It is better to have things clear than vague (or wrong). @Landscape – zyx Oct 04 '13 at 18:16
@Landscape: anything new about this approach? I've been able to extend my own proof to independent random variables $X,Y$ (with possibly distinct distributions) in a Euclidean space. The inequality becomes $E(|X-Y|) \geq \frac{1}{2}E(|X-X'|)+\frac{1}{2}E(|Y-Y'|)$ where $X',Y'$ are independent copies. – Siméon Nov 04 '13 at 10:33
@Siméon: I am the same user as "Landscape". I changed my user name one month ago, so I didn't notice your message until I came back to this post just now. Unfortunately, I have nothing new about either zyx's approach or any other intuitive way of solving the original problem. I wasn't so confident that I could solve it by myself within a short time, so I didn't even think about it deeply. However, I am still interested, so please let me know if you succeed in finding a more intuitive way to solve the original problem, and then I can post a question and/or set a bounty for you to answer. – 23rd Nov 25 '13 at 09:28
Below is a set of remarks that’s too long to be put in a comment.
Conjecture. The inequality becomes an equality iff $-X$ has the same distribution as $X$.
Remark 1. The "if" part of the conjecture is easy: if $X$ and $-X$ have the same distribution, then by the independence hypothesis $(X,Y)$ and $(X,-Y)$ have the same joint distribution, therefore $|X+Y|$ and $|X-Y|$ share the same distribution, so they certainly share the same expectation.
Remark 2. Let $\phi_n(t)=t$ if $|t| \leq n$ and $0$ otherwise. If the inequality holds for $(\phi_n(X),\phi_n(Y))$ for every $n$, then it holds for $(X,Y)$ as well, by a dominated convergence argument. So we may assume without loss of generality that the support of $X$ is bounded.
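A quick Monte Carlo illustration of both the "if" part and the strict inequality in an asymmetric case (a sketch assuming NumPy, not a proof):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10**6

# Symmetric case (standard normal): the two expectations should agree.
x, y = rng.normal(size=(2, n))
print(np.mean(np.abs(x + y)), np.mean(np.abs(x - y)))  # both ~ 2/sqrt(pi)

# Asymmetric case (Exp(1)): strict inequality, roughly 2 vs 1.
x, y = rng.exponential(1.0, size=(2, n))
print(np.mean(np.abs(x + y)), np.mean(np.abs(x - y)))
```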


Thank you. It is helpful. It would be interesting to answer the question in the special case of absolutely continuous distributions. – user64494 May 31 '13 at 06:52
Let $F(x) = P(X < x)$. I assume that $F$ is differentiable, so there are no atoms and $F'$ is the density of $X$ (and $Y$).
$$\begin{aligned}
E(|X+Y|) - E(|X-Y|) &= E(|X+Y|-|X-Y|) \\
&= 2E\left(X\,1_{Y \ge |X|} + Y\,1_{X \ge |Y|} - X\,1_{-Y\ge |X|} - Y\,1_{-X \ge |Y|}\right) && \text{(pointwise identity, a.s.)} \\
&= 4E\left(X\,(1_{Y \ge |X|} - 1_{-Y \ge |X|})\right) && \text{($X$ and $Y$ are i.i.d.)} \\
&= 4E\left(X\,(1-F(-X)-F(X))\right) && \text{(integrating out $Y$)} \\
&= 4\int_\Bbb R x\,(1-F(-x)-F(x))\,F'(x)\,dx \\
&= 4\int_\Bbb R (-x)\,(1-F(x)-F(-x))\,F'(-x)\,dx && \text{(substituting $x \mapsto -x$)} \\
&= 2\int_\Bbb R x\,(1-F(x)-F(-x))\,(F'(x)-F'(-x))\,dx && \text{(averaging the two previous lines)} \\
&= \int_\Bbb R (1-F(x)-F(-x))^2\,dx - \left[x\,(1-F(x)-F(-x))^2\right]_\Bbb R && \text{(integration by parts)} \\
&= \int_\Bbb R (1-F(x)-F(-x))^2\,dx \;\ge\; 0
\end{aligned}$$
I am not entirely sure about the last step. $G(x) = 1-F(x)-F(-x)$ does converge to $0$ at both ends, and $G$ has finite variation, but I am still not convinced that one cannot carefully pick $F$ so that the bracket term fails to vanish.
However, this is valid if $X$ has compact support, or if $G(x)$ vanishes quickly enough (as for the normal distribution, for example). In that case it also proves Ewan's conjecture: the difference is $0$ if and only if the distribution is symmetric about $0$.
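For what it's worth, here is a numerical check of the closed form $\int_\Bbb R G(x)^2\,dx$ against a Monte Carlo estimate for a shifted normal distribution (a sketch assuming NumPy and SciPy; the shift $\mu = 0.7$ is an arbitrary choice):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
mu, n = 0.7, 10**6  # arbitrary shift; any non-symmetric choice works

x = rng.normal(mu, 1.0, n)
y = rng.normal(mu, 1.0, n)
lhs = np.mean(np.abs(x + y) - np.abs(x - y))

# G(x) = 1 - F(x) - F(-x) with F the N(mu, 1) cdf; integrate G^2 on a grid.
grid = np.linspace(-30.0, 30.0, 200_001)
G = 1.0 - norm.cdf(grid, loc=mu) - norm.cdf(-grid, loc=mu)
rhs = np.sum(G**2) * (grid[1] - grid[0])

print(lhs, rhs)  # should agree to Monte Carlo accuracy
```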
