0

I was wondering whether there is an easy way to show the following:

We have a data set $x_1,\ldots,x_n$, and $m$ is a median if at least half of the $n$ data points satisfy $x_i \le m$ and at least half satisfy $x_i \ge m$.

Now I want to show that $m$ is a median iff $m$ minimizes the following 1-norm, that is: $\|(x_1-m,\ldots,x_n-m)\|_1 = \inf_x \|(x_1-x,\ldots,x_n-x)\|_1$.

Unfortunately, it seems that calculus does not work here, so is there an easy way to show this?

1 Answer

6

I don't understand the fetish for calculus. There should be a way without it (in fact, personally I prefer avoiding it if possible):

Order the data so that $x_1 \le x_2 \le \ldots \le x_{n}$, and suppose that $x \in [x_i,x_{i+1}]$. Then we have $$\|(x_1, \ldots, x_n) - (x, x, \ldots, x) \|_1 = \sum_{j \leq i}(x-x_j) + \sum_{j>i} (x_j-x)$$

which is a linear function of $x$ (when restricted to the interval $[x_i,x_{i+1}]$) with gradient equal to the number of $j$ such that $j \leq i$ minus the number of $j$ such that $j > i$, that is, $i - (n-i) = 2i - n$. Thus the gradient is negative if $x$ is less than a median, positive if $x$ is greater than a median, and zero precisely when there are as many data points to the left as to the right. In other words, the function decreases as we approach a median from below, stays constant on the set of medians, and then increases afterwards. So minima occur exactly at medians.
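As a quick sanity check, here is a small worked example (the data set $1, 2, 4, 7$ is my own choice for illustration, not from the question). With $n = 4$, the gradient on $[x_i, x_{i+1}]$ is $2i - n$, so

$$2i - 4 = -2,\ 0,\ 2 \quad \text{for } i = 1, 2, 3,$$

meaning the function falls on $[1,2]$, is flat on $[2,4]$, and rises on $[4,7]$. On the flat piece its value is $(x-1) + (x-2) + (4-x) + (7-x) = 8$ for every $x \in [2,4]$, and $[2,4]$ is exactly the set of medians of this data.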

Note that this is only a piecewise linear function (its formula depends on which interval $x$ is in), so it is not differentiable at the data points and calculus is no good.
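If you would like to see the argument numerically, the following is a minimal Python sketch. The data set, the grid, and the helper names `l1_cost` and `slope_on_interval` are all made up for illustration. It evaluates the 1-norm on a grid and lists the gradient $2i - n$ on each interval, here for an odd number of points, so the median is unique.

```python
def l1_cost(data, x):
    """Sum of absolute deviations |x_j - x| over the data set."""
    return sum(abs(v - x) for v in data)


def slope_on_interval(n, i):
    """Gradient of the cost on [x_i, x_{i+1}] for sorted data:
    i points lie to the left and n - i to the right, giving i - (n - i).
    i = 0 means x is below all data, i = n means x is above all data."""
    return 2 * i - n


# Hypothetical data set with an odd number of points; its median is 4.
data = sorted([3, 1, 9, 5, 4])
n = len(data)

# Grid from 0 to 10 in steps of 0.25 (exactly representable as floats).
grid = [k / 4 for k in range(41)]
costs = [l1_cost(data, x) for x in grid]

best = min(costs)
minimizers = [x for x, c in zip(grid, costs) if c == best]
slopes = [slope_on_interval(n, i) for i in range(n + 1)]

print(best, minimizers)  # 10.0 [4.0]
print(slopes)            # [-5, -3, -1, 1, 3, 5]
```

The gradient changes sign between $i = 2$ and $i = 3$, that is, at $x_3 = 4$, which is where the grid search finds the minimum. A grid search is of course only an illustration, not a proof.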

  • 3
    Your answer is correct, but gradients smell of the c-word. ;-) – JPi Mar 05 '14 at 02:18
  • 1
    Yeah, gradients -> calculus (just in case, like me, you didn't get what the c-word was about the first time) – PALEN Nov 04 '16 at 18:37