
Consider an even-sized set of numbers $X = \{x_k\}$, such as $X = \{1, 2, 7, 10\}$.
The median $m$ is defined as:

$$m = \operatorname*{arg\,min}_x \sum_k \lvert x_k - x\rvert^1$$

Any $m \in [2, 7]$ is a minimizer of this function, and is therefore "a" median of this list.
Now, it is common practice to take the average of 2 and 7 and call it "the" median.

But that's lame, and I think I have invented (?) a more logical way to find a unique median $m^*$:

$$m^* = \lim_{\epsilon \to 0^+} \operatorname*{arg\,min}_x \sum_k \left\lvert x_k - x\right\rvert^{1+\epsilon}$$

Differentiation to find the minimum only gets us so far:

$$\sum_k \mathrm{sgn}{\left(x_k - m^*\right)}\left\lvert x_k - m^*\right\rvert^{\epsilon} = 0$$

This expression can be solved numerically for smaller and smaller $\epsilon$ to give $m^* \approx 4.85$ in this example, and I suspect the "correct" median is in fact $m^* = 34/7$, but I don't know how to prove it.
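For concreteness, here is a minimal sketch of that numerical solve, using plain bisection on the stationarity condition. The function names are made up for illustration, and the code assumes the root lies between the two middle values, which holds for this example and should hold in general once $\epsilon$ is small.

```python
def stationarity(m, xs, eps):
    """Left-hand side of the condition: sum of sgn(x_k - m) * |x_k - m|**eps."""
    total = 0.0
    for x in xs:
        d = x - m
        if d > 0:
            total += d ** eps
        elif d < 0:
            total -= (-d) ** eps
        # d == 0 contributes nothing
    return total


def median_eps(xs, eps, iters=200):
    """Bisect for the root of the stationarity condition between the two middle values."""
    s = sorted(xs)
    lo, hi = float(s[len(s) // 2 - 1]), float(s[len(s) // 2])
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        # The sum is decreasing in m, so a positive value means the root is to the right.
        if stationarity(mid, s, eps) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for eps in (1e-1, 1e-2, 1e-4, 1e-6):
    print(eps, median_eps([1, 2, 7, 10], eps))
print("34/7 =", 34 / 7)
```

The printed values should drift toward $34/7$ as $\epsilon$ shrinks, which is where the conjecture comes from.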

I have 3 questions:

  1. First of all, is this a well-known and/or useful approach? Does it have a name?
    I came up with the new formulation myself, but I've never seen it used anywhere.

  2. Is there some way to directly find the exact value of $m^*$, without numerical optimization?
    If not, is there a better/faster approach than brute-force numerical optimization techniques?

  3. Is this a (convex?) optimization problem, and if not, can it be reformulated as one?
    The trouble here is that I can't find any objective function that has a unique minimum at $m^*$.
    The best I can do is to find a generalized function (i.e., the limit of another function), but when I do that, I don't think the problem is a convex optimization problem anymore.
    Is there another way to pose the problem that conforms better to existing optimization frameworks?

user541686
  • I am confused. If $m^*$ is the minimizer of a function over the elements $x\in X$, then how could $m^*=4.85\notin X$? –  Jun 26 '14 at 20:36
  • @NotNotLogical: Where did you get the idea that we must have $x \in X$? – user541686 Jun 26 '14 at 20:39
  • Ah, I was misreading. I think I now understand. –  Jun 26 '14 at 20:39
  • @PeterSheldrick: Huh? The problem is the very fact that it's not unique, even though there's a perfectly sensible definition that I demonstrated gives a unique answer. It doesn't make sense to say 2 is a median of that set, nor does it make sense to say 7 is, because neither is in the "center" of the data, even though they all minimize the typical objective. Hence it makes sense to look for a better number and thus a better objective. – user541686 Jun 26 '14 at 21:52
  • @PeterSheldrick: Because, logically, I think it's common sense that "the center" of something should be singular, not some sort of hypersurface, irrespective of how you want to measure central tendency. If you don't find this to be sufficient motivation to make the problem interesting then I'm sorry, I don't have a better motivation for you. – user541686 Jun 26 '14 at 22:12
  • Peter, I would disagree that it makes sense to refer to 2 or 7 as potential medians. In either case there are not an equal number of points on either side. Mehrdad has taken the median-finding property of $\ell_1$ regression and elevated it to a new definition of median. It is an abuse of notation at the very least. And calling it "lame" to choose the midpoint is, frankly, silly. – Michael Grant Jun 27 '14 at 05:01
  • @MichaelGrant: The reason I called it "lame" is that it mixes -- quite arbitrarily -- an L2 norm into a purely-L1 regression problem. There are no theoretical grounds for doing this; it's just something that has been done out of the blue with no justification, so I think it's lame. Sorry if you disagree. If you don't like the fact that I called it a "median", pretend I called it something else. The rest of the question (aside from the first bullet point) still stands. – user541686 Jun 27 '14 at 05:32
  • In what way is it mixing the $\ell_2$ norm into it? You certainly have not shown that here. Now it's possible that you can arrive at the midpoint using the $\ell_2$ norm somehow, but it by no means requires it; it would be just as much a convenience as using the $\ell_1$ norm to compute the median in the first place. And if theoretical justification is the standard, what could you possibly offer for your limiting case? It makes for a nice math problem, but it's no less arbitrary. What statistically interpretive value does 34/7 have? – Michael Grant Jun 27 '14 at 14:39
  • @MichaelGrant: I mean (no pun intended) that the mean minimizes the total L2 norm from the data, whereas the median minimizes the total L1 norm. Expectation (aka mean) is linear, whereas median isn't. But when people take the mean of the 2 median endpoints, they're assuming the rest of the data doesn't affect the answer -- whereas it does, because we're talking about medians, not means, and median isn't linear. My best interpretation for 34/7 is that it's the "most typical value" from that kind of set, but I'm not sure. It's certainly not arbitrary, as you can see from how I derived it. – user541686 Jun 27 '14 at 19:28
  • Well I know how the midpoint convention is derived as well. Simply knowing the derivation doesn't make something less arbitrary. Indeed, the $|\cdot|^{1+\epsilon}$ approach also has the rather unattractive (in my view) property that all of the data points affect the result. It seems to me that one of the nice features of the median is that it is insensitive to the behavior of outliers; e.g., changing $1$ to $1.25$ or $10$ to $10.5$ would have no effect, whereas with the $|\cdot|^{1+\epsilon}$ method, it would. Clearly value is in the eye of the beholder here. – Michael Grant Jun 27 '14 at 22:15
  • @MichaelGrant: How is the midpoint convention "derived" though? It seems to me that you could just as well choose a value twice as close to the smaller endpoint, and it would still preserve the properties you wanted. That's why I say the midpoint is arbitrary. My method on the other hand doesn't leave any room for choice, it gives a singular answer, and it still prevents outliers from moving the median beyond the endpoints. – user541686 Jun 27 '14 at 22:51
  • Yes, a $2/3$rd split would also ensure the property I described above. But that's just one potential property. For instance, the midpoint is maximally robust to small i.i.d. perturbations in the elements. In contrast I'm not seeing any practical/statistical advantage to your approach here beyond the aesthetic. Yes, it's unique, it's correct, but so are a variety of others. – Michael Grant Jun 27 '14 at 23:05

2 Answers


If the values are sorted $x_1 \le x_2 \le \ldots \le x_n$, then the value $m^*$ is the unique solution on the interval $[x_{n/2},x_{n/2+1}]$ to the following equation:

$$ (m^* - x_1)(m^*-x_2)\ldots(m^*-x_{n/2})=(x_{n/2+1}-m^*)(x_{n/2+2}-m^*)\ldots(x_{n}-m^*). $$

When $n=2$, it's just the mean, and when $n=4$, $m^* = (x_3x_4-x_1x_2)/(x_3+x_4-x_1-x_2)$, which in your example is $(7\cdot 10-1\cdot2)/(7+10-1-2)=34/7$. I don't see any simple way to solve the equation in closed form for higher $n$, other than standard techniques for finding roots of polynomials.
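For larger $n$, one option is to skip the polynomial and bisect on the logarithm of the product condition, which changes sign exactly once strictly between the two middle values. The following is only a sketch under that assumption; `limit_median` and `closed_form_n4` are illustrative names, not anything standard.

```python
import math


def limit_median(xs, iters=200):
    """Solve prod(m - x_k, lower half) = prod(x_k - m, upper half) by bisection."""
    s = sorted(xs)
    n = len(s)
    if n == 0 or n % 2:
        raise ValueError("expected a non-empty, even-sized list")
    half = n // 2
    lo, hi = float(s[half - 1]), float(s[half])
    if lo == hi:
        return lo  # the two middle values coincide, nothing to solve

    def balance(m):
        # log of the right-hand product minus log of the left-hand product
        below = sum(math.log(m - x) for x in s[:half])
        above = sum(math.log(x - m) for x in s[half:])
        return above - below

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid <= lo or mid >= hi:
            break  # interval has collapsed to adjacent floats
        # balance is decreasing in m: positive means the root is to the right
        if balance(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def closed_form_n4(x1, x2, x3, x4):
    """The n = 4 formula quoted above, for sorted inputs."""
    return (x3 * x4 - x1 * x2) / (x3 + x4 - x1 - x2)


print(limit_median([1, 2, 7, 10]), closed_form_n4(1, 2, 7, 10), 34 / 7)
print(limit_median([1, 2, 4, 7, 10, 20]))  # a six-point example with no simple closed form
```

Working with logarithms rather than the raw products keeps the evaluation well scaled when $n$ is large, at the cost of having to stay strictly inside the interval.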

To prove that the above equation defines $m^*$, you just need to go a few steps further in the manipulation of the derivative. That is, $ m^* = \lim_{\epsilon\rightarrow 0^+} m_\epsilon $, where $m_\epsilon$ is the solution to:

$$\sum_{k=1}^n \mathrm{sgn}{\left(x_k - m_\epsilon\right)}\left\lvert x_k - m_\epsilon\right\rvert^{\epsilon} = 0$$

For small $\epsilon$, we should have $x_{n/2}\le m_\epsilon \le x_{n/2+1}$, so this becomes:

$$-\sum_{k=1}^{n/2} (m_\epsilon-x_k)^{\epsilon} + \sum_{k=n/2+1}^{n} (x_k-m_\epsilon)^{\epsilon}=0$$

Expanding to first order in $\epsilon$ gives: $$-\sum_{k=1}^{n/2} (1+\epsilon \log(m_\epsilon-x_k)) + \sum_{k=n/2+1}^{n} (1+\epsilon\log(x_k-m_\epsilon))=O(\epsilon^2)$$

The constant terms cancel, and dividing by $\epsilon$ gives: $$-\sum_{k=1}^{n/2} \log(m_\epsilon-x_k) + \sum_{k=n/2+1}^{n} \log(x_k-m_\epsilon)=O(\epsilon)$$

Then by taking the limit as $\epsilon \rightarrow 0^+$, we get: $$-\sum_{k=1}^{n/2} \log(m^*-x_k) + \sum_{k=n/2+1}^{n} \log(x_k-m^*)=0$$ Moving the first sum to the right-hand side and exponentiating both sides gives the stated product condition.

There are different ways to approximate the 1-norm with a differentiable function, and each approximation will give a different unique "median". I don't know of any reason to prefer any one approximation over another other than convenience.
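As one illustration of that dependence, here is a sketch that replaces $|t|$ with $\sqrt{t^2+\delta^2}$ instead of $|t|^{1+\epsilon}$. That particular smoothing is my own choice for the example, and `smooth_abs_median` is a made-up name. On the same data, the limiting minimizer lands near $4.65$ rather than at $34/7 \approx 4.857$.

```python
import math


def smooth_abs_median(xs, delta, iters=200):
    """Minimize sum(sqrt((x_k - m)**2 + delta**2)) by bisecting on its derivative."""
    s = sorted(xs)
    lo, hi = float(s[len(s) // 2 - 1]), float(s[len(s) // 2])

    def slope(m):
        # derivative of the smoothed objective with respect to m (increasing in m)
        return sum((m - x) / math.sqrt((m - x) ** 2 + delta ** 2) for x in s)

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if slope(mid) < 0:
            lo = mid  # still descending: the minimizer is to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(smooth_abs_median([1, 2, 7, 10], 1e-3))
print("34/7 =", 34 / 7)
```

Like the first sketch, this assumes the minimizer sits between the two middle values, which is the case here.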

p.s.
  • +1 Holy cow, this looks exactly like the kind of answer I was hoping for! So you approximated $(m^* - x_k)^\epsilon$ to first order by its linearization $1 + \epsilon \log(m^* - x_k)$, because the two agree in the infinitesimal case? It seems so obvious in hindsight but it's very clever, I wouldn't have thought of it for quite a long time! Thanks so much, I learned something new today from your answer. :) – user541686 Jun 28 '14 at 00:30
  • Glad to help. I rewrote the argument to be more rigorous. Hopefully it's clearer now. – p.s. Jun 28 '14 at 01:18
  • Indeed, very nice! Well done. – Michael Grant Jun 28 '14 at 03:36
  • I'll bet that this is very amenable to a simple numerical search (Newton and/or bisection) with the midpoint as an initial condition. – Michael Grant Jun 28 '14 at 03:39
  • It also makes me wonder why first-order approximations are so special. A second-order approximation would seem correct but unhelpful, whereas a zeroth-order approximation would tell us nothing. So what's so special about first-order that gives us the answer we want exactly in the limiting case? Maybe I should ask that as a question... – user541686 Jun 28 '14 at 11:54
  • It seems to me they are special precisely because they are the simplest approximations that are also useful! – Michael Grant Jun 29 '14 at 16:13
  • @MichaelGrant: Haha yes but I meant what inherently makes them useful? On another note, it seems like the median can be defined as the number that equalizes the product of its distance to the numbers below it with that of the numbers above it, which I think is pretty cool! – user541686 Jul 08 '14 at 22:58

Unfortunately, this approach is not compatible with convex optimization in practice.

The reason is that in an optimization context, a convex function and its epigraph are assumed interchangeable. That is to say: consider the following two problems: $$\begin{array}{ll} \text{minimize} & f(x) \end{array}$$ $$\begin{array}{ll} \text{minimize} & y \\ \text{subject to} & f(x) \leq y \end{array}$$ These problems are equivalent if $f$ is convex: that is, given the solution to one, the solution to the other is evident, and vice-versa. Of course, the second one has an associated dual variable while the first one does not, but that doesn't change the equivalence.

Now let's consider your function for $f$, set in the second form above: $$\begin{array}{ll} \text{minimize} & y \\ \text{subject to} & \lim_{\epsilon\rightarrow 0^+} \sum_k | x_k - x |^{1+\epsilon} \leq y \end{array}$$ This is, in all practical respects, equivalent to $$\begin{array}{ll} \text{minimize} & y \\ \text{subject to} & \sum_k | x_k - x | \leq y \end{array}$$ which is of course what you'd get with the standard median function. Any practical system for optimization is really not going to be able to differentiate between the two forms. You could, of course, fix $\epsilon$ to be small and nonzero, but then you've destroyed equivalence, and of course made the leap from a linear problem to a nonlinear one.
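To see why a solver cannot tell the two forms apart, it may help to evaluate the objective directly. This is just a small sketch on the example data (the names `objective` and `data` are mine): at $\epsilon = 0$ every point of $[2, 7]$ gives the same value, and only a fixed nonzero $\epsilon$ separates them.

```python
def objective(x, xs, eps):
    """Sum of |x_k - x|**(1 + eps) over the data."""
    return sum(abs(xk - x) ** (1 + eps) for xk in xs)


data = [1, 2, 7, 10]
candidates = (2, 4.5, 34 / 7, 7)

for eps in (0.0, 0.1):
    values = [round(objective(x, data, eps), 4) for x in candidates]
    print("eps =", eps, "->", values)
```

With `eps = 0.0` the four values coincide, so nothing in the problem data distinguishes $34/7$ from $2$, $4.5$ or $7$; with `eps = 0.1` they differ, but the problem is no longer linear.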

Conceptually, what is happening here is that you are preferring a particular element of the arg min set over the rest. But establishing preferences among feasible points is precisely what the purpose of an objective function is. You need to find a way to integrate your preferences more directly into your objective or constraints. For instance, if you are determined to preserve the numerical results induced by the $|\cdot|^{1+\epsilon}$ approach---in particular, if it is important that $34/7$ be the correct answer in this example---then you will not be able to use this median function in convex optimization.

Michael Grant
  • Thanks for the response, but I feel it's begging my own question when you say "You need to find a way to integrate your preferences more directly into your objective or constraints."... the entire point of my question was to figure out how I was supposed to do that, because as I had already mentioned, I already realized this was unlikely to work with standard (convex) optimization. Your answer just basically summarized the problem I was facing but didn't help me get anywhere. – user541686 Jun 26 '14 at 21:06
  • I am certainly not claiming to have answered every bullet point. But you did ask if this could be formulated as a (convex?) optimization problem, and I answered it. – Michael Grant Jun 27 '14 at 04:46
  • Besides: your definition of median is non-standard. You well know that the standard median is unique (4.5). Even if we set aside the midpoint portion of the standard statistical definition, your optimization-based definition includes two numbers (2, 7) that do not satisfy even the fundamental criteria of a median. Your definition is already a convenience, then. Certainly you are not the only one to employ this convenience in practice, but the fact remains. – Michael Grant Jun 27 '14 at 05:08
  • I did not ask if this could be formulated as a convex optimization problem, I asked if it was a (perhaps convex) optimization problem, and if not, whether it could be re-formulated as such a problem. All you did was tell me, "You need to find a way to integrate your preferences more directly into your objective or constraints". In other words, you just repeated back at me the obvious fact that I need to reformulate it as a convex optimization problem, without helping me actually get anywhere. – user541686 Jun 27 '14 at 05:37
  • No. I am, in fact, telling you that you cannot reformulate this as a convex optimization problem, and I explained why. The epigraph equivalence is geometrically fundamental to convex optimization, and your attempt to use a limiting argument just can't work around that. – Michael Grant Jun 27 '14 at 14:41
  • I'm confused, I thought you said "you need to find a way to integrate your preferences more directly into your objective or constraints"... are you now saying that cannot be done here? If so then I misunderstood your answer. – user541686 Jun 27 '14 at 19:28
  • I did say that, yes. But what I'm talking about here is changing your function choice. I know you think your $|\cdot|^{1+\epsilon}$ invention is cool, but if you need some sort of convex median-finder, it's simply not going to work. It seems to me that you need to go back to your fundamental goal: uniqueness. If you're determined to think that $34/7$ is the best possible function value here, well, you're out of luck. But if you are willing to take any function that returns a unique median, then there might very well be a convex formulation. – Michael Grant Jun 27 '14 at 22:11
  • I made an edit to your answer that I think clarifies it, but I'm not sure; is it correct? – user541686 Jun 27 '14 at 22:39
  • Sure, that sounds fine. – Michael Grant Jun 27 '14 at 22:44
  • My edit got rejected... would you mind trying to add it (or something to the same effect) again? I'll accept it after that's clarified, but until then it's too unclear for me to accept it. – user541686 Jun 27 '14 at 22:53
  • I see no reason to accept my answer. It only addresses a part of your question. Nevertheless I will incorporate your edit... – Michael Grant Jun 27 '14 at 22:59