59

Suppose $X$ is a real-valued random variable and let $P_X$ denote the distribution of $X$. Then $$ E(|X-c|) = \int_\mathbb{R} |x-c| \,dP_X(x). $$ A median of $X$ is any number $m \in \mathbb{R}$ such that $P(X \leq m) \geq \frac{1}{2}$ and $P(X \geq m) \geq \frac{1}{2}$.

Why do the medians solve $$ \min_{c \in \mathbb{R}} E(|X-c|) \, ? $$

Tim
  • 47,382

6 Answers

63

For every real valued random variable $X$, $$ \mathrm E(|X-c|)=\int_{-\infty}^c\mathrm P(X\leqslant t)\,\mathrm dt+\int_c^{+\infty}\mathrm P(X\geqslant t)\,\mathrm dt $$ hence the function $u:c\mapsto \mathrm E(|X-c|)$ is differentiable almost everywhere and, where $u'(c)$ exists, $u'(c)=\mathrm P(X\leqslant c)-\mathrm P(X\geqslant c)$. Hence $u'(c)\leqslant0$ if $c$ is smaller than every median, $u'(c)=0$ if $c$ is a median, and $u'(c)\geqslant0$ if $c$ is greater than every median.

The formula for $\mathrm E(|X-c|)$ is the integrated version of the relations $$(x-y)^+=\int_y^{+\infty}[t\leqslant x]\,\mathrm dt$$ and $|x-c|=((-x)-(-c))^++(x-c)^+$, which yield, for every $x$ and $c$, $$ |x-c|=\int_{-\infty}^c[x\leqslant t]\,\mathrm dt+\int_c^{+\infty}[x\geqslant t]\,\mathrm dt $$
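As a quick sanity check of the displayed identity (a sketch, not part of the argument; it assumes NumPy, and the $\mathrm{Exp}(1)$ test case, whose unique median is $\ln 2$, is arbitrary):

```python
import numpy as np

# Check E|X - c| = int_{-oo}^{c} P(X <= t) dt + int_{c}^{oo} P(X >= t) dt
# by Monte Carlo for X ~ Exp(1); both tails are truncated on the grid.
rng = np.random.default_rng(0)
x = np.sort(rng.exponential(size=100_000))

t = np.linspace(-5.0, 30.0, 70_001)
dt = t[1] - t[0]
cdf = np.searchsorted(x, t, side="right") / x.size       # P(X <= t)
sf = 1.0 - np.searchsorted(x, t, side="left") / x.size   # P(X >= t)

def u_direct(c):
    return np.abs(x - c).mean()

def u_integral(c):
    return cdf[t <= c].sum() * dt + sf[t > c].sum() * dt

for c in [0.2, np.log(2), 2.0]:
    print(c, u_direct(c), u_integral(c))
# The two columns agree to grid/sampling error, and among the tested
# values both are smallest at c = ln 2, the median.
```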

Did
  • 279,727
  • Thanks! (1) By "integrated version", do we first integrate your last formula wrt $P$ over $\mathbb{R}$, then apply the Fubini theorem to exchange the order of the two integrals, and so get the first formula? (2) If we temporarily change the notation $P$ to represent the cdf of $X$, is Sivaram's Edit correct? Specifically, do those Riemann-Stieltjes integrals exist? – Tim Nov 25 '11 at 08:16
  • @Rasmus, you are right, thanks, misprint corrected. – Did Nov 25 '11 at 08:21
  • Tim: Much too complicated... The last equation of my post states that $u(x)=v(x)$ for every $x$, for some functions $u$ and $v$. Then $E(u(X))=E(v(X))$, end of story. (And if you have questions on @Sivaram's post, ask Sivaram.) – Did Nov 25 '11 at 08:24
  • Dear Did, I know the definition $$E(X)=\int X dP = \int \mathrm{id} d(X_* P) = \int x\cdot X_* P(dx)$$ for an integrable real random variable $X$. I don't understand how you can derive your first equation from this. Could you help me, please? Thank you. – user8463524 May 10 '16 at 09:16
  • @user8463524 This follows from the (deterministic) identity, valid for every real number $x$, $$|x-c|=\int_{-c}^{+\infty}\mathbf 1_{x\leqslant -t}\,\mathrm dt+\int_c^{+\infty}\mathbf 1_{x\geqslant t}\,\mathrm dt.$$ – Did May 10 '16 at 10:00
  • 7
    This is a very nice proof. A clarification for future readers, who, like me, would be perplexed by the notation $[x\leq-t]$ and $[x\geq t]$: if $A$ is an event, $[A]$ denotes the indicator function $\mathbb{1}_A$. In particular, $[x\leq-t] = \mathbb{1}_{\{x \leq -t\}}$, and likewise $[x\geq t] = \mathbb{1}_{\{x\geq t\}}$. – Evan Aad Oct 18 '16 at 09:31
  • There's something I don't understand about the proof. You claim that the function $c\mapsto E(|X-c|)$ is everywhere differentiable and that its derivative at the point $c$ equals $P(X\leq c) - P(X\geq c)$. But isn't this the case only for those $c$ where both $c\mapsto P(X\leq c)$ and $c\mapsto P(X\geq c)$ are continuous? But then the proof would not hold for every random variable $X$, in contradiction to your opening statement. – Evan Aad Oct 18 '16 at 10:49
  • @EvanAad Almost everywhere, sorry about that. Yes the proof works for every random variable. – Did Oct 18 '16 at 14:27
  • Thanks for the correction. The good news is that now I see why the deduction is sound. The bad news is that now I fail to see how it solves OP's question. I may be wrong, but it seems to me that in order to tie your answer to OP's question you implicitly invoke the first derivative test for optimality. – Evan Aad Oct 18 '16 at 15:55
  • The problem with this test, though, is that it assumes (at least in the Wikipedia version linked to above) that the derivative exists everywhere in the interval where optimality is to be determined (this interval being $\mathbb{R}$ in the case of OP's question) except possibly at a single point, namely the point of optimality. – Evan Aad Oct 18 '16 at 15:55
  • 2
    @EvanAad Adding convexity to the pot will allow you to conclude. – Did Oct 18 '16 at 16:11
  • @EvanAad Please stop modifying my answer. – Did Oct 20 '16 at 10:03
  • Since differentiability is only present almost everywhere, how did you conclude by the signs of $u'(c)$ where they exist that the minimiser must lie where $u'(c)=0$? Is there a generalisation of Fermat's lemma? (I doubt it, though, since Cantor's function has zero derivatives almost everywhere yet doesn't attain minimum or maximum anywhere in between). – Vim Nov 24 '16 at 02:55
  • 8
    @Vim The convexity of the function $u$ makes these objections moot, but here is a more direct route: for every $x$ and $c$, $$ |x-c|=\int_{-\infty}^c[x\leqslant t]\,\mathrm dt+\int_c^{+\infty}[x>t]\,\mathrm dt $$ hence, for every median $m$, $$E(|X-c|)=E(|X-m|)+\int_m^cv(t)dt$$ with $$v(t)=P(X\leqslant t)-P(X>t)=2P(X\leqslant t)-1$$ Then $v$ is nondecreasing and $v(m)\geqslant0$ hence, for every $c>m$, $v\geqslant0$ on $(m,c)$, which implies $E(|X-c|)\geqslant E(|X-m|)$. Likewise for $c<m$. – Did Nov 24 '16 at 06:03
  • @Did I later saw in Evan's answer about the convexity of $u$, although it wasn't mentioned explicitly in yours. Thanks anyway for your more direct approach. – Vim Nov 24 '16 at 06:07
  • @Vim See comment on Oct 18 at 16:11. – Did Nov 24 '16 at 06:09
  • 1
    Why is the first equation true? I've tried to find an explanation on the net, but it seems to be intuitive for everyone but me. – Lillys Nov 16 '20 at 16:34
  • This is not quite right for every random variable. Imagine a random variable that is uniform on $[0,1]$ with probability $1/2$ and equal to $1$ with probability $1/2$. Then $u'(c)=P(X\le c) - P(X\ge c) = \tfrac{c}2 - (\tfrac12 + \tfrac{1-c}{2}) = c -1$ for all $c<1$, while $u'(1) = 0.5$ and $u'(c)=1$ for all $c>1$; therefore there is no $m$ such that $u'(m)=0$. And at the median $1$, $u'$ is greater than zero. – Felix B. Feb 14 '22 at 15:42
21

Let $f$ be the pdf and let $J(c) = E(|X-c|)$. We want to minimize $J(c)$. Note that $E(|X-c|) = \int_{\mathbb{R}} |x-c| f(x) dx = \int_{-\infty}^{c} (c-x) f(x) dx + \int_c^{\infty} (x-c) f(x) dx.$

To find the minimum, set $\frac{dJ}{dc} = 0$. By the Leibniz integral rule (both boundary terms vanish, being evaluated at $x=c$), we get $$\begin{align} \frac{dJ}{dc} & = (c-x)f(x) \big|_{x=c} + \int_{-\infty}^{c} f(x) dx - (x-c)f(x) \big|_{x=c} - \int_c^{\infty} f(x) dx\\ & = \int_{-\infty}^{c} f(x) dx - \int_c^{\infty} f(x) dx = 0 \end{align} $$

Hence, we get that $c$ is such that $$\int_{-\infty}^{c} f(x) dx = \int_c^{\infty} f(x) dx$$ i.e. $$P(X \leq c) = P(X > c).$$

However, we also know that $P(X \leq c) + P(X > c) = 1$. Hence, we get that $$P(X \leq c) = P(X > c) = \frac12.$$ Since $J$ is convex (see Evan Aad's answer below), this stationary point is indeed a global minimizer.
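A small numerical illustration of this first-order condition (a sketch assuming SciPy is available; the Gamma$(2,1)$ example is arbitrary): minimizing $J$ directly recovers the distribution's median.

```python
from scipy import integrate, optimize, stats

# Minimize J(c) = E|X - c| numerically for X ~ Gamma(2, 1) and compare
# the minimizer with the distribution's median (about 1.678).
X = stats.gamma(a=2.0)

def J(c):
    # E|X - c| = int |x - c| f(x) dx; the tail beyond 50 is negligible here.
    val, _ = integrate.quad(lambda s: abs(s - c) * X.pdf(s), 0.0, 50.0,
                            points=[c])
    return val

res = optimize.minimize_scalar(J, bounds=(0.0, 10.0), method="bounded")
print(res.x, X.median())   # the two agree to solver tolerance
```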

EDIT

When $X$ doesn't have a density, all you need to do is to make use of integration by parts. Writing $P$ for the cdf of $X$, we get $$\displaystyle \int_{-\infty}^{c} (c-x) \,dP(x) = \int_{-\infty}^{c} P(x) \,dx - \lim_{y \rightarrow -\infty} (c-y) P(y) = \int_{-\infty}^{c} P(x) \,dx.$$ Similarly, integrating by parts against $1-P$, we also get $$\displaystyle \int_{c}^{\infty} (x-c) \,dP(x) = \int_{c}^{\infty} \left(1-P(x)\right) dx - \lim_{y \rightarrow \infty} (y-c) \left(1-P(y)\right) = \int_{c}^{\infty} \left(1-P(x)\right) dx.$$ Both limits vanish because $X$ is integrable.
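Adding the two pieces gives a density-free form of the objective, which recovers the formula in Did's answer: $$ E(|X-c|) = \int_{-\infty}^{c} P(x)\,dx + \int_{c}^{\infty} \bigl(1-P(x)\bigr)\,dx, $$ and differentiating (wherever $P$ is continuous) again yields $P(c) - (1-P(c)) = 0$, i.e. $P(c) = \frac12$.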

  • Thanks! But does $X$ always have a density? – Tim Nov 25 '11 at 07:09
  • @Tim: I don't think it is hard to adapt the same idea for the case when $X$ doesn't have a density. –  Nov 25 '11 at 07:15
  • So you are thinking $P$ as cdf of $X$? – Tim Nov 25 '11 at 07:30
  • Don't you want to say minimize instead of maximize? – Valentin Feb 18 '21 at 13:11
  • It doesn't change the structure of the proof, but could there be a typo in $\frac{dJ}{dc} = (c-x)f(x) \big|_{x=c} + \int_{-\infty}^{c} f(x) dx + (x-c)f(x) \big|_{x=c} - \int_c^{\infty} f(x) dx$? Applying https://en.wikipedia.org/wiki/Leibniz_integral_rule, it seems that the third term should be $- (x-c)f(x) \big|_{x=c}$ instead of $+ (x-c)f(x) \big|_{x=c}$. – FZS Nov 28 '22 at 02:49
18

Let $m$ be any median of $X$. Without loss of generality, we can take $m=0$ (consider $X':=X-m$). The aim is to show $E|X-c|\ge E|X|$.

Consider the case $c\ge 0$. It is straightforward to check that $|X-c|-|X|=c$ when $X\le0$, and $|X-c|-|X|\ge -c$ when $X>0$. It follows that $$ (|X-c|-|X|)\,I(X\le0)=c\,I(X\le0)\tag1 $$ and $$(|X-c|-|X|)\,I(X>0)\ge-c\,I(X>0).\tag2 $$ Adding (1) and (2) and taking expectation yields $$ E(|X-c|-|X|)\ge c\left[P(X\le0)-P(X>0)\right].\tag3 $$ The RHS of (3) equals $c\,[2P(X\le0)-1]$, which is non-negative since $c\ge0$ and zero is a median of $X$. The case $c\le0$ is reduced to the previous one by considering $X':=-X$ and $c':=-c$.
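Since (1) and (2) hold pointwise, they also hold sample-by-sample, so inequality (3) can be checked empirically; here is a minimal sketch (assuming NumPy; the lognormal test case is arbitrary):

```python
import numpy as np

# Check (3): after recentering so that 0 is a median,
#   E|X - c| - E|X| >= c * (2 P(X <= 0) - 1)   for c >= 0.
rng = np.random.default_rng(1)
x = rng.lognormal(size=100_000)
x -= np.median(x)                 # make 0 a median of the empirical law

for c in [0.0, 0.5, 1.0, 3.0]:
    lhs = np.abs(x - c).mean() - np.abs(x).mean()
    rhs = c * (2 * np.mean(x <= 0) - 1)
    print(c, lhs, rhs, lhs >= rhs - 1e-9)   # last column is always True
```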

grand_chat
  • 38,951
  • Sorry, I know this should be easy but would you be able to elaborate on how the case $c\leq 0$ reduces to the case you proved? – EE18 Oct 18 '20 at 21:31
  • 2
    @1729_SR In the case $c\le0$, define $c':=-c$ and $X':=-X$. Then $0$ is still a median of $X'$, while $c'\ge0$, so by the just-proved case we deduce $E|X'-c'|\ge E|X'|$. Now observe that $|X'-c'|=|X-c|$ and $|X'|=|X|$. Substituting, we conclude $E|X-c|\ge E|X|$. – grand_chat Oct 19 '20 at 02:42
  • Thanks very much for the clarification. I suspected the proof would go something like that (and I do promise that I wrestled with it before asking!). When I did the problem, I didn't make the slick "take $m=0$" argument that you did, so I actually kept $m$ all the way through. Thus I was missing the key bit that $m(X) = -m(-X)$, where here I am interpreting $m$ as a function producing the (a?) median of a given random variable. Thanks again! – EE18 Oct 19 '20 at 12:38
  • How is this without loss of generality? If we replace $X:=X-m$, we can conclude that $\mathbb{E}|X-m-c|-\mathbb{E}|X-m|\geq c(\mathbb{P}(X\leq m)-1)\geq 0$, and hence $\mathbb{E}|X-m-c|\geq \mathbb{E}|X-m|$. But then we can only conclude that $\mathbb{E}|X-m-c|$ is minimized at $c=0$? – JacobsonRadical Oct 23 '23 at 12:04
  • @JacobsonRadical The statement $E|X-m-c|\ge E|X-m|$ for all $c$ is equivalent to the statement $E|X-c|\ge E|X-m|$ for all $c$, which is what OP wants to prove. – grand_chat Nov 06 '23 at 00:36
5

The following is intended to complement Did's answer.

Claim

Denote by $M$ the set of $X$'s medians. Then

  1. $M = [m_1, m_2]$ for some $m_1, m_2 \in \mathbb{R}$ with $m_1 \leq m_2$.

  2. For every $m \in M$ and for every $x \in \mathbb{R}$ we have $$ E\left(|X-m|\right) \leq E\left(|X-x|\right). $$ (In particular, $m\mapsto E\left(|X-m|\right)$ is constant on $M$.)

Part 2's proof builds on Did's answer.

Proof

  1. It is known that $M \neq \emptyset$. Define $$ \begin{align} M_1 &:= \left\{t\in\mathbb{R} : F_X(t) \geq \frac{1}{2}\right\}, \\ M_2 &:= \left\{t\in\mathbb{R} : P(X<t) \leq \frac{1}{2}\right\}. \end{align} $$ Then $M = M_1 \cap M_2$. It therefore suffices to show that $M_1 = [m_1, \infty)$ and that $M_2 = (-\infty, m_2]$, for some $m_1, m_2 \in \mathbb{R}$.

    Since $\lim_{t\rightarrow-\infty}F_X(t) = 0$, $M_1$ is bounded from below. Since $F_X$ is nondecreasing, $M_1$ is an interval that extends to infinity, and it is nonempty because $\lim_{t\rightarrow\infty}F_X(t) = 1$. Hence $M_1 = (m_1,\infty)$ or $M_1 = [m_1,\infty)$, for some $m_1 \in \mathbb{R}$. It follows from $F_X$'s right-continuity that $m_1 \in M_1$. An analogous argument shows that $M_2 = (-\infty,m_2]$ (just verify that $t\mapsto P(X<t)$ is left-continuous).

  2. Define a function $f:\mathbb{R}\rightarrow\mathbb{R}$ as follows. For every $c \in \mathbb{R}$, set $$ f(c) := E\left(|X-c|\right). $$

    We will begin by showing that $f$ is convex. Let $a, b \in \mathbb{R}$, and let $t \in (0,1)$. Then $$ \begin{align} f\left(ta+(1-t)b\right) &= E\left(\left|X-\left(ta+(1-t)b\right)\right|\right) \\ &= E\left(\left|\left(tX-ta\right)+\left((1-t)X-(1-t)b\right)\right|\right) \\ &\leq E\left(\left|\left(tX-ta\right)\right|+\left|\left((1-t)X-(1-t)b\right)\right|\right) \\ &=E\left(\left|\left(tX-ta\right)\right|\right)+E\left(\left|\left((1-t)X-(1-t)b\right)\right|\right) \\ &= t\ E\left(|X-a|\right) + (1-t)\ E\left(|X-b|\right) \\ &= t\ f(a) + (1-t)\ f(b). \end{align} $$

    Since $f$ is convex, then, by Theorem 7.40 of [1] (p. 157), there exists a set $A \subseteq \mathbb{R}$ such that $\mathbb{R}\setminus A$ is countable, and such that $f$ is finitely differentiable on $A$. Moreover, letting $m \in M$, and letting $x \in (-\infty, m_1)$, Theorem 7.43 of [1] (p. 158) yields that $f'$ is Lebesgue-integrable on $[x,m] \cap A$, and that $$ f(m) - f(x) = \int_{[x,m]\cap A} f'\ d\lambda. $$

    Applying Did's answer, we find that $f'\leq 0$ on $[x,m]\cap A$. Hence $f(m) \leq f(x)$. Similar considerations show that, for every $x \in (m_2,\infty)$, $f(m) \leq f(x)$, and also that $f(m) = f(m_1)$ (implying that $f$ is constant on $M$, since $m$ was chosen arbitrarily in $M$).

    (The argument of the last paragraph was suggested to me by copper.hat in their answer to a related question of mine.)

Q.E.D.
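To see both parts of the claim in a concrete case, take $X$ with $P(X=0)=P(X=1)=\frac12$: the median set is $M=[0,1]$, and $c\mapsto E(|X-c|)$ is constant (equal to $\frac12$) on $M$. A minimal numerical sketch (assuming NumPy):

```python
import numpy as np

# For X with P(X=0) = P(X=1) = 1/2, the median set is M = [0, 1] and
# u(c) = E|X - c| = (|c| + |c - 1|) / 2 exactly.
c = np.linspace(-1.0, 2.0, 3001)
u = 0.5 * (np.abs(c) + np.abs(c - 1.0))
inside = (c >= 0.0) & (c <= 1.0)
print(u[inside].min(), u[inside].max())    # both 0.5: u is flat on M
print(u[~inside].min() > 0.5)              # True: u is larger off M
```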


References

[1] Richard L. Wheeden and Antoni Zygmund. Measure and Integral: An Introduction to Real Analysis. 2nd ed. CRC Press, 2015. ISBN 978-1-4987-0290-4.

Evan Aad
  • 11,422
  • Thanks. Now I understand why if $M$ is an interval then any point where $f$ assumes zero derivative is a global minimiser of $f$. However, what if $M$ is a singleton and $f$ is not differentiable there? (Also, could you give the name of the Lebesgue integrability theorem you invoked?) – Vim Nov 24 '16 at 03:17
  • 1
    @Vim: 1. A singleton $\{s\}$ is an interval of the form $[m_1, m_2]$, $m_1\leq m_2$, with $m_1:=m_2:=s$. 2. Here's a link to the theorem. – Evan Aad Nov 24 '16 at 09:54
  • I was not asking whether a singleton is an interval or not; rather, I was wondering how to apply the convexity in this case. Anyway, it seems solved to me now: even though $f$ can fail to be differentiable at this point, its left and right derivatives surely exist by convexity, and the left one is $\le 0$ and the right one $\ge 0$ by the definition of the median. – Vim Nov 24 '16 at 10:06
  • @Vim: My proof covers the case that $M$ is a singleton. – Evan Aad Nov 24 '16 at 12:33
  • Indeed. I had actually mainly been reading the link in your answer, which seemed a bit simpler, so I could grasp it with less background. The singleton concern arose from there, not from this answer. – Vim Nov 24 '16 at 12:40
1

Let $Y=\left|X-c\right|$,

Then, since $Y \geq 0$, $$E(Y) = \int_0^\infty \left(1-F_Y(y)\right) dy.$$

Note that $F_Y(y) = F_X(c+y) - F_X(c-y)$ (assuming $X$ has a density $f_X$, so there is no mass at $c-y$).

Thus $$ \begin{align} E(Y) &= \int_0^\infty \big( 1-F_X(c+y) + F_X(c-y) \big) dy \\ \frac{d E(Y)} {dc} &= \int_0^\infty \big(-f_X(c+y) + f_X(c-y) \big) dy \\ &=\int_0^\infty f_X(c-y) dy - \int_0^\infty f_X(c+y) dy \\ &= \int_{-\infty}^c f_X(x) dx - \int^{\infty}_c f_X(x) dx \\ & = F_X(c) - (1 - F_X(c)) \end{align} $$

Equating it to zero, we have,

$$F_X(c) = \frac{1}{2}$$

Hence the median is the minimiser of $E(|X-c|)$.
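A quick Monte Carlo spot check of the key step $F_Y(y) = F_X(c+y) - F_X(c-y)$ (a sketch assuming SciPy; the standard normal test case is arbitrary):

```python
import numpy as np
from scipy import stats

# Compare the empirical cdf of Y = |X - c| with F_X(c+y) - F_X(c-y)
# for X ~ N(0, 1).
rng = np.random.default_rng(2)
xs = rng.standard_normal(200_000)
c = 0.7
for y in [0.1, 0.5, 1.0, 2.0]:
    empirical = np.mean(np.abs(xs - c) <= y)
    exact = stats.norm.cdf(c + y) - stats.norm.cdf(c - y)
    print(y, empirical, exact)   # agree to Monte Carlo error
```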

pavan
  • 361
0

Due to the fact that

$$\forall x,c \in \mathbb{R}:\quad |x-c| = (x-c)\unicode{x1D7D9}_{\{x>c\}} + (c-x)\unicode{x1D7D9}_{\{x\leq c\}} = \int_c^x \unicode{x1D7D9}_{\{x>c\}} \,dt + \int_x^c \unicode{x1D7D9}_{\{x \leq c\}} \,dt = \int_c^{\infty} \unicode{x1D7D9}_{\{t<x\}} \,dt + \int_{-\infty}^{c} \unicode{x1D7D9}_{\{t \geq x\}} \,dt,$$

for every continuous real-valued random variable $X$ and every $c \in \mathbb{R}$, by linearity of expectation,

$\mathbb{E}[|X-c|] = \mathbb{E}[(X-c)\unicode{x1D7D9}_{\{X>c\}}] + \mathbb{E}[(c-X)\unicode{x1D7D9}_{\{X\leq c\}}] $

$=\int_{-\infty}^{\infty} \int_c^{\infty} \unicode{x1D7D9}_{\{x>t\}} \,dt \,dP_X(x) + \int_{-\infty}^{\infty} \int_{-\infty}^c \unicode{x1D7D9}_{\{x \leq t\}} \,dt \,dP_X(x).$

Since the indicator functions are measurable and non-negative, by Fubini's theorem (Tonelli's version),

$= \int_c^{\infty} \int_{-\infty}^{\infty} \unicode{x1D7D9}_{\{t<x\}} \,dP_X(x) \,dt + \int_{-\infty}^c \int_{-\infty}^{\infty} \unicode{x1D7D9}_{\{t \geq x\}} \,dP_X(x) \,dt$

$=\int_c^{\infty} \mathbb{E}[\unicode{x1D7D9}_{\{t<X\}}] \,dt + \int_{-\infty}^c \mathbb{E}[\unicode{x1D7D9}_{\{t \geq X\}}] \,dt\\= \int_c^{\infty} \mathbb{P}(X>t) \,dt + \int_{-\infty}^c \mathbb{P}(X \leq t) \,dt.$

By Leibniz's integral rule, the first-order condition of $\mathbb{E}[|X-c|]$ is

$0=\partial_c \mathbb{E}[|X-c|] = (0-\mathbb{P}(X > c)) + (\mathbb{P}(X \leq c) - 0)$

$\implies 2\mathbb{P}(X \leq c) -1 = 0 \implies \mathbb{P}(X \leq c) = \frac{1}{2} = \mathbb{P}(X > c) = \mathbb{P}(X \geq c)$.

The last equality holds since $X$ is a continuous real-valued random variable: a singleton has probability $0$, so $\mathbb{P}(X > c) = \mathbb{P}(X \geq c)$.

By definition, such a $c$ is a median of $X$; hence $\mathbb{E}[|X-c|]$ is minimized at the median.
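As a sanity check of this first-order condition (a sketch assuming NumPy; the standard normal test case is arbitrary), a centered finite difference of $c\mapsto\mathbb{E}[|X-c|]$ reproduces $\mathbb{P}(X\leq c)-\mathbb{P}(X>c)$ and vanishes at the median:

```python
import numpy as np

# Finite-difference check that d/dc E|X - c| = P(X <= c) - P(X > c)
# for a continuous X, here X ~ N(0, 1), whose median is 0.
rng = np.random.default_rng(3)
xs = rng.standard_normal(500_000)

def u(c):
    return np.abs(xs - c).mean()

h = 1e-3
for c in [-1.0, 0.0, 0.5, 2.0]:
    fd = (u(c + h) - u(c - h)) / (2 * h)
    exact = np.mean(xs <= c) - np.mean(xs > c)
    print(c, fd, exact)   # columns agree; both are ~0 at c = 0
```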

jcm22
  • 11