3

Suppose I have a list of values $S$ containing $$x_1, x_2,...,x_n$$ where each $x_i \in \Bbb R$. Suppose I order them from smallest to largest $$x_{(1)}, x_{(2)},...,x_{(n)}$$

Is there a function or an optimization problem that achieves this? Or is the median found only through a procedure of reordering?

  • 4
    the question is not clear, do you search for algorithm to calculate the median? or an algorithm to change a set to an ordered set? – ℋolo Jan 14 '18 at 07:10
  • @holo I think he searches for a way to estimate the median without having to order the values. – zwim Jan 14 '18 at 07:15
  • 1
    Also keep in mind that sorting is a problem that can be solved in $O(n\log n)$ steps. Finding the median requires at least $n$ steps, since one has to check each value at least once. Hence, sorting is not that bad. – chi Jan 14 '18 at 10:06
  • 1
    @chi: Not only that, there are algorithms specifically designed to find medians (or any percentile) in $O(n)$ time. –  Jan 14 '18 at 11:19
  • Your question is phrased very confusingly. You have "Suppose I order them from smallest to largest" in the middle of your question, but the next sentence seems to have nothing to do with the thought that you started there. You have "Is there a function or an optimization problem that achieves this?" and semantically "this" would refer to ordering a list of values from smallest to largest, but it seems you meant for "this" to refer to finding the median in a list of values. – JLRishe Jan 14 '18 at 17:33

2 Answers

11

A median is precisely a minimizer of the function $$t \mapsto \sum_{i=1}^{n}|t-x_i|.$$ (It may not be unique, which is why I write "a minimizer".) An optimization algorithm that does not reorder is described in Problem 1.4.11 of Kenneth Lange's MM Optimization Algorithms, SIAM.

Added: The algorithm proceeds as follows. Let $t_0$ be different from all the $x_i$. Given $k\in\{0,1,2,\ldots\}$ and $t_k$, set $w_{k,i} := 1/|t_k-x_{i}|$ and update $$t_{k+1} := \frac{\sum_{i=1}^{n}w_{k,i}x_i}{\sum_{i=1}^{n}w_{k,i}}.$$ I am not sure if this is faster than ordering, but it is interesting. There is even a variant not just for the median, but for quantiles.
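As a rough illustration of the update above, here is a minimal Python sketch. The names `mm_median` and `abs_deviation`, the stopping tolerance `tol`, the iteration cap `max_iter`, and the choice to stop if $t_k$ lands exactly on some $x_i$ are all my own assumptions, not taken from Lange's book.

```python
def mm_median(xs, t0, tol=1e-12, max_iter=1000):
    """Approximate a median of xs with the reweighted-average update.

    t0 must differ from every entry of xs. tol and max_iter are
    illustrative stopping parameters (assumptions of this sketch).
    """
    t = float(t0)
    for _ in range(max_iter):
        gaps = [abs(t - x) for x in xs]
        if min(gaps) == 0.0:
            # Assumption: the weights 1/|t_k - x_i| are undefined when t_k
            # equals some x_i, so this sketch simply stops there.
            return t
        weights = [1.0 / g for g in gaps]
        # t_{k+1} is the weighted average of the x_i with weights w_{k,i}.
        t_next = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        if abs(t_next - t) < tol:
            return t_next
        t = t_next
    return t


def abs_deviation(xs, t):
    """The objective t -> sum_i |t - x_i| that a median minimizes."""
    return sum(abs(t - x) for x in xs)
```

For instance, `mm_median([1.0, 2.0, 3.0, 4.0, 100.0], t0=22.0)` should settle very close to the median $3$ without ever sorting the list. For an even number of values the minimizer is a whole interval, so the iterate only needs to end up somewhere between the two middle values.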

max_zorn
  • 4,875
  • I am not sure whether finding the minimizer of that function has a better runtime than just ordering the set. Do you know the $\mathcal O$ complexity of finding the minimizer of that function? – ℋolo Jan 14 '18 at 07:27
  • For differentiable functions with appropriate first and second derivatives, we can just use calc to find the min or max. But this function doesn't let us do that. So in practice, the only way to find the median is like reordering, yes? – Stan Shunpike Jan 14 '18 at 07:39
  • @StanShunpike: You don't need to completely reorder; it suffices that you find the element/the two elements in the middle. Note that this can be done more efficiently than fully sorting; see e.g. here. – celtschk Jan 14 '18 at 08:38
  • This is slower than the good sorting algorithms. The main problem is that it can produce cases where the minimizer cannot be pinned down. For example, for the set $a,b$ the function equals $|a-b|$ on the whole interval $[a,b]$. In fact, for every set with an even number of elements, the function has slope $0$ between the two middle elements (by value). You can approximate the value using that interval, but pinpointing it is not easy (without using $\min$ and $\max$ functions). – ℋolo Jan 14 '18 at 09:42
  • @StanShunpike we can calculate the slope at any given point: $f'(t)=l(t)-g(t)$, where $g(t)$ is the number of elements in the set whose value is greater than $t$ and $l(t)$ is the number whose value is less than $t$. But this won't help with finding the median, sadly – ℋolo Jan 14 '18 at 09:48
  • @Holo: The OP hasn't said anything related to a notion of speed! –  Jan 14 '18 at 11:31
  • @Hurkyl you are right, I think this is a great answer, I just gave my thought and what I know about this method – ℋolo Jan 14 '18 at 11:33
3

You don't say any way, so I assume you are looking for an algorithm that a computer might implement to do the calculation.

There is a very simple algorithm that doesn't involve reordering the values. For simplicity, I'll describe the $n$ odd case.

  • For each index $i$:
    • Count how many of the entries are less than $x_i$
    • Count how many of the entries are greater than $x_i$
    • Count how many of the entries are equal to $x_i$
    • If these counts are consistent with $x_i$ being the median, halt and output $x_i$
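The steps above can be sketched in Python as follows. This is only an illustration for the $n$ odd case; the function name `median_by_counting` and the exact form of the "consistent with being the median" test are my own choices.

```python
def median_by_counting(xs):
    """Return the median of an odd-length list without reordering it."""
    n = len(xs)
    if n % 2 == 0:
        raise ValueError("this sketch only handles an odd number of values")
    half = n // 2  # 0-based position the median would have if xs were sorted
    for x in xs:
        less = sum(1 for y in xs if y < x)
        greater = sum(1 for y in xs if y > x)
        equal = sum(1 for y in xs if y == x)
        # In sorted order, the copies of x would occupy positions
        # less, ..., less + equal - 1. x is the median when that range
        # covers the middle position.
        if less <= half < less + equal:
            assert less + equal + greater == n
            return x
    raise AssertionError("unreachable for real-valued input")
```

Each candidate costs one pass over the list, so the whole procedure takes on the order of $n^2$ comparisons, but it never moves or copies the data.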

That said, I'm virtually certain this is an XY problem. You are only likely to get an answer that is useful for whatever goal you are trying to achieve if you ask a question about that goal (which will likely also require you to specify the context you're working in).

  • Isn't this roughly equivalent to using counting sort in $O(n)$ and then finding the median? If so, it works fine, under the assumptions of counting sort. – chi Jan 14 '18 at 11:38
  • 3
    @chi this has a run time of $\mathcal O(n^2)$: for each $i$ it checks all $x_i$. (Finding the median of a sorted set is $\mathcal O(1)$, so it is not the same.) – ℋolo Jan 14 '18 at 11:39
  • @Holo That's correct, but that's so bad that I was assuming the above worked on the same hypotheses as counting sort, so to count "how many are equal" in $O(n)$ time. Otherwise the whole approach looks worse than the OP's approach (sort first). – chi Jan 14 '18 at 15:31
  • 1
    @chi as Hurkyl said on the other answer, the question has nothing to do with runtime. It is true that this is a very bad algorithm in terms of runtime, but it answers the OP's question – ℋolo Jan 14 '18 at 15:37
  • @chi: And it's not necessarily even bad for runtime; e.g. it parallelizes well which is likely to make it fairly effective in a many-thread small-data setting (although I don't doubt there are more complicated specialized algorithms for such a setting). There are so many things the OP might be trying to achieve; maybe my answer really is exactly what he needs! But really, the most important thing I intended to convey with my answer is the need to better elaborate upon the actual problem. –  Jan 14 '18 at 15:41
  • A parallel $O(n^2)$ looks worse than the non-parallel $O(n\log n)$ to me. Still, I agree that this answers the question -- it is a non-sorting algorithm that performs the task, even if it has worse complexity. – chi Jan 14 '18 at 15:59