7

Let's say we have a set of numbers $\{ 5, 7, 1, 2, 5, 100 \}$. I want to find a number $x$ such that the sum of the distances from every number in the set to $x$ is minimal.

My first thought was that $x$ is the average of all elements of the set, $\frac{5+7+1+2+5+100}{6}$, but that is not true: it fails for the example above.

Any help or hint will be appreciated, thanks.

  • The answer is wrong. For example, $100$ is closer to each element than $104$ is. – Harnak Jan 29 '19 at 11:01
  • 3
    https://math.stackexchange.com/questions/113270/the-median-minimizes-the-sum-of-absolute-deviations-the-l-1-norm – Matti P. Jan 29 '19 at 11:04
  • @Harnak Sorry but I don't quite understand your statement. $104$ is the total distance $\sum_{x \in S} |x - n|$ where $n$ is the desired number and $S = \{1, 2, 5, 5, 7, 100\}$. How do you achieve $100$? See my proof. – L. F. Jan 29 '19 at 11:12
  • @L.F., I was just answering to the OP before editing. He stated that the solution was 104, but I thought he referred to the minimizer and not to the distance. So, that's why I provided an example of why 104 couldn't be a minimizer. I think I just misinterpreted what he meant. – Harnak Jan 29 '19 at 11:16
  • @Harnak Oh, never mind. Everybody makes mistakes :) – L. F. Jan 29 '19 at 11:17
  • Setting $x=$ the "average" (="the mean") minimizes the sqrt of sum-of-squares of the distances of each value from $x$. (This should also be in wikipedia at keyword "variance" and/or "arithmetic mean") – Gottfried Helms Jan 29 '19 at 13:22
  • Anyone familiar with calculus can view this answer: https://math.stackexchange.com/a/1024462/1069875 – Aryan Jun 21 '22 at 15:19

2 Answers

10

You are looking to minimize $$\sum_{y \in A} |y - x|$$ with respect to $x$ where $A$ is your set.

It can be proved that any median of $A$ minimizes this sum. In your case, the only median is $5$, so that's the result.
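As a quick numerical sanity check (not a proof), the sketch below compares the sum of distances at the mean and at the median for the set from the question, and brute-forces the best integer candidate. The helper name `total_distance` and the candidate range `0..100` are assumptions made for illustration only.

```python
from statistics import median


def total_distance(x, values):
    """Sum of absolute distances from x to every element of values."""
    return sum(abs(y - x) for y in values)


values = [5, 7, 1, 2, 5, 100]

mean = sum(values) / len(values)  # arithmetic mean of the set
med = median(values)              # average of the two middle values

print("mean:", mean, "total distance:", total_distance(mean, values))
print("median:", med, "total distance:", total_distance(med, values))

# Brute force over integer candidates (an assumed range covering the data).
best = min(range(0, 101), key=lambda x: total_distance(x, values))
print("best integer candidate:", best)
```

The mean is pulled toward the outlier $100$, while the median is not, which is why the median gives the smaller total here.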

Harnak
  • 9
    I upvoted your answer because it's correct, but I'd like to make a suggestion about the phrase "this is a well known result": it appears to serve the purpose of making you look well-read, and the OP look ignorant, which might both be true, but does saying it help? Probably not. If you'd included a reference, that'd be great -- OP could learn something. Most likely, this exercise is leading up to the very result you cited, and explaining the computation in this case would be more educational than citing the very thing that can be proved in general from similar computations. – John Hughes Jan 29 '19 at 13:26
  • You're right. However, that was not my intent. I'll edit this. Thanks for the suggestion :) – Harnak Jan 29 '19 at 23:15
5

First sort your [multi]set: $\{1, 2, 5, 5, 7, 100\}$. The number you want is $5$. The sum is $4 + 3 + 0 + 0 + 2 + 95 = 104.$

Proof: suppose you have another number $n \neq 5$.

Note that for any number $x$, $|x - a| + |x - b| \ge |a - b|$ by the triangle inequality, with equality if and only if $x$ lies between $a$ and $b$ (inclusive). Pair the smallest element with the largest, the second smallest with the second largest, and so on. Since $5$ lies between the two elements of each pair, the bound $|a - b|$ equals that pair's contribution at $x = 5$. For the two $5$s, $$|n - 5| + |n - 5| \ge |5 - 5| + |5 - 5| = 0.$$ Similarly, $$|n - 2| + |n - 7| \ge |2 - 7| = |5 - 2| + |5 - 7| = 5,$$ $$|n - 1| + |n - 100| \ge |1 - 100| = |5 - 1| + |5 - 100| = 99.$$

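Adding these three inequalities shows that the total distance from $n$ is at least $0 + 5 + 99 = 104$, which is exactly the total for $5$, so you can't have the total distance any lower. Q.E.D.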
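The pairing used in the proof can be sketched in code: sort the values, pair the smallest with the largest and so on inward, and add up the gaps within each pair. This is only an illustration of the bound; the function name `pairing_lower_bound` is made up here.

```python
def pairing_lower_bound(values):
    """Lower bound on the sum of distances to any x, from pairing extremes.

    Each pair (a, b) contributes at least |a - b| by the triangle inequality.
    With an odd count, the unpaired middle element contributes at least 0.
    """
    s = sorted(values)
    return sum(s[-1 - i] - s[i] for i in range(len(s) // 2))


values = [5, 7, 1, 2, 5, 100]
bound = pairing_lower_bound(values)
print(bound)  # (100 - 1) + (7 - 2) + (5 - 5)
```

For the set above, the bound equals the total distance at $5$, which is what the proof needs.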

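In general, sort your set first. If it has an even number of elements, any number between the middle two (inclusive) will do; if it has an odd number, the middle element is the unique minimizer. For example, for the set $\{1, 2, 3, 4, 5, 6\}$, any $x$ such that $3 \le x \le 4$ works.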
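A minimal sketch of the general rule, assuming the input is a non-empty list of numbers: it returns the two middle elements of the sorted list, and every $x$ between them (inclusive) minimizes the sum. The name `minimizer_interval` is hypothetical.

```python
def minimizer_interval(values):
    """Return (low, high) such that every x in [low, high] minimizes
    the sum of absolute distances to the elements of values."""
    if not values:
        raise ValueError("values must be non-empty")
    s = sorted(values)
    n = len(s)
    # For odd n both indices point at the single middle element.
    return s[(n - 1) // 2], s[n // 2]


print(minimizer_interval([1, 2, 3, 4, 5, 6]))
print(minimizer_interval([5, 7, 1, 2, 5, 100]))
```

When both middle elements coincide, as in the set from the question, the interval collapses to a single point.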

L. F.