
Given a family of real numbers $a_k$, the median minimizes $\sum |x-a_k|$ and the mean minimizes $\sum |x-a_k|^2$.

Is there a known simple form of the value that minimizes $\sum |x-a_k|^3$ or any higher odd powers?

Please see some context below. The focus is the cubic.

I emphasize that I am asking this as a serious mathematical question: I am looking for a simple formula, a proof that none exists, or at least pointers to other mathematical studies of this problem, including work on its computational complexity.

I am able to write naive brute-force numerical code to determine the appropriate value in individual cases, so code is not the interest here, except perhaps if it represents a considerable reduction in complexity over the naive brute-force method.

The interest is in the mathematics, hence the posting on Mathematics Stack Exchange.


Where I got to ...

The derivative of $|x-a_k|^{2n}$ is $2n(x-a_k)^{2n-1}$ because the absolute value is redundant in this case. The problem is then solved by finding the roots of a polynomial. It is as hard or easy as that task.
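
For instance, in the familiar case $2n = 2$ this just recovers the mean of the $N$ points:

$$\frac{d}{dx}\sum_{k=1}^{N} (x-a_k)^2 = 2\sum_{k=1}^{N}(x-a_k) = 2\Big(Nx - \sum_{k=1}^{N} a_k\Big) = 0 \quad\Longrightarrow\quad x = \frac{1}{N}\sum_{k=1}^{N} a_k.$$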

But for odd powers, $|x-a_k|^{2n+1}$, the derivative is $(2n+1)\operatorname{sgn}(x-a_k)(x-a_k)^{2n}$, where $\operatorname{sgn}$ is the signum (or sign) function.

Hence, the derivative of the sum is $(2n+1)\sum_k \operatorname{sgn}(x-a_k)(x-a_k)^{2n}$, which is the negative sum of some of the terms and the positive sum of the rest. It equals zero when a balance index $m$ is found so that $\sum_{k=1}^m (x-a_k)^{2n} = \sum_{k=m+1}^N (x-a_k)^{2n}$, where $N$ is the number of points and the $a_k$ are ordered so that $a_k \le a_{k+1}$. One could, of course, solve this numerically.

For the cubic (and more generally) it would be a kind of median in the sense that it would be the point that balances $\sum (x-a_k)^2$, so that there is equal weight of this on either side. This seems to be the more general concept, as the mean balances $\sum (x-a_k)^1$ and the median $\sum (x-a_k)^0$.
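
For a small concrete example, take the three points $\{0, 1, 5\}$: the median is $1$ and the mean is $2$, while the cubic minimizer lies in $[1,5]$ and must satisfy

$$x^2 + (x-1)^2 = (5-x)^2 \iff x^2 + 8x - 24 = 0 \implies x = -4 + 2\sqrt{10} \approx 2.325,$$

so it sits at a genuinely different location from both the mean and the median.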

Notice that each $(x-a_k)^{2n}$ is nonnegative with a monotonic derivative, and so is any sum of them. If you partition the points with nothing on the right, the left sum is the greater; with nothing on the left, the right sum is the greater. So at some point the balance must flip over: there is a particular index $k=m$ such that moving that term across changes which side dominates. Now, if you put $x=a_m$, the two sides will generally be imbalanced. If the left is bigger, reduce $x$ so that the left-hand terms shrink and the right-hand terms grow; do the opposite if the left is smaller. From this it is apparent that a unique such $x$ exists, and it lies between $a_{m-1}$ and $a_{m+1}$, except in the case of fewer than 3 points, which can be handled separately.

Can anyone give me references to other work on this problem?

Bruce
  • You can make a loss function $e(x) = \sum_k |x - a_k|^n$ and solve for $e'(x) = 0$ to find the minimum. (with the caveat that we can exclude points with $x = a_k$ without changing the value of the sum to ensure that the function is always differentiable) From there, when $n$ is even we should be able to expand using the binomial theorem to get an analytic solution. (I'll post an example using the fourth power) – Stephen Donovan Nov 09 '21 at 00:54
  • @StephenDonovan of course. That was where I got the information in the post from. But, the question is whether there is a simple way to solve for $e'(x)=0$. – Bruce Nov 09 '21 at 03:41
  • I believe I've found a method for solving it, and given a little bit of testing it appears to be working. I'll type up my solution for how I made it work but until then here's a link to my coded solution: https://replit.com/join/hehrifswfv-stephendonovan1 – Stephen Donovan Nov 09 '21 at 12:54
  • @StephenDonovan I don't have a replit account, but I am certainly keen to see what you come up with. I have been tinkering with some algorithms myself. – Bruce Nov 10 '21 at 04:08
  • Oh sorry, I didn't think about people needing replit to look at it. I posted a new link to a text sharing site, the whitespace didn't translate but all of the code is there – Stephen Donovan Nov 10 '21 at 13:45

1 Answer


I don't believe there's a simple answer for the cubed errors due to some complications from the absolute value, but here's what I would do for the fourth power:

Consider the loss function $e(x) = \sum (x - a_k)^4.$ We can optimize $e(x)$ by looking for when $e'(x) = 0.$ Because the sum is finite, we can just differentiate termwise to get $$e'(x) = \sum 4(x - a_k)^3 = 0 \Rightarrow \sum (x - a_k)^3 = 0$$

Now we can expand using the binomial theorem to obtain

$$\sum (x^3 - 3a_kx^2 + 3a_k^2x - a_k^3) = nx^3 - 3x^2\sum a_k + 3x\sum a_k^2 - \sum a_k^3 = 0$$

Plugging in the necessary sums, we have a cubic which can be solved for $x$ using the cubic formula. (which I'm not going to write out in full)
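
As a minimal sketch of this even-power case (my own illustration, not part of the original answer; it finds the real root of the cubic above numerically with numpy.roots rather than via the explicit cubic formula):

```python
import numpy as np

def minimize_fourth_power(a):
    """Minimize e(x) = sum_k (x - a_k)^4 by finding the real root of
    e'(x)/4 = n*x^3 - 3*S1*x^2 + 3*S2*x - S3 (the expansion above)."""
    a = np.asarray(a, dtype=float)
    n = len(a)
    S1, S2, S3 = a.sum(), (a ** 2).sum(), (a ** 3).sum()
    roots = np.roots([n, -3.0 * S1, 3.0 * S2, -S3])
    # e'' >= 0, so e' is increasing and has exactly one real root;
    # pick the root with the smallest imaginary part to be robust numerically.
    return roots[np.argmin(np.abs(roots.imag))].real

print(minimize_fourth_power([0.0, 1.0, 5.0]))
```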

In order to show that this point will minimize $e(x),$ we can take a second derivative, $e''(x) = \sum 12(x - a_k)^2,$ which must be positive if there are at least two distinct points, because all the terms are nonnegative and at least one would have to be positive. (I believe this also proves that there must be exactly one real solution to the above cubic equation: if I'm wrong or if somebody has an algebraic reason for why, I'd love to know)


Edit: since you've said that your focus was primarily on extending this logic to the cubes, here are my thoughts for how to extend this logic to the cubed errors.

As you said, when we take the derivative of our error function $e(x) = \sum |x - a_k|^3, $ because of the inner derivative of the absolute value we get $e'(x) = 3 \sum \text{sgn}(x - a_k)\cdot(x - a_k)^2 = 0,$ and the sign function makes it difficult to solve analytically.

Here my idea was that because the sign function is what makes this problem difficult, we can eliminate it by assuming that $x$ is between two of the points, let's say $a_{i-1}$ and $a_i.$ This allows us to calculate the coefficients on $x$ in our $e'(x) = 0$ equation and obtain some solution. If the solution is between $a_{i-1}$ and $a_i,$ then it is a solution to our equation and we're done. Otherwise, it is not a solution because it's outside the bounds where the equation is valid, and we have to keep looking.

Given some choice of $i,$ here's the algebra I did to get my equation for $x$:

$$\sum_{k < i} (x - a_k)^2 = \sum_{k \geq i} (x - a_k)^2$$ $$\sum_{k < i} (x^2 - 2a_kx + a_k^2) = \sum_{k \geq i} (x^2 - 2a_kx + a_k^2)$$ $$i x^2 - (2\sum_{k < i} a_k)x + \sum_{k < i}a_k^2 = (n-i) x^2 - (2\sum_{k \geq i} a_k)x + \sum_{k \geq i}a_k^2$$ $$(2i - n)x^2 + (-2\sum \text{sgn}(i-k)\cdot a_k)x + \sum \text{sgn}(i-k)a_k^2 = 0$$

(where, for this to match the previous line, $\text{sgn}(0)$ has to be taken as $-1$, so that the $k = i$ term stays on the $k \geq i$ side of the balance)

Now this is just a quadratic equation which we can solve using the quadratic formula. (for higher orders, numerical methods such as Newton's method or regula falsi can be used)

So, we solve the equation, and if one of the solutions is within the proper bounds then it's the solution, otherwise we check the next $i$ and iterate through the list until we find the solution.
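
As a rough sketch of that scan (my own illustration; I can't reproduce the linked Replit code here, and the function name and the handling of the degenerate case $2i = n$, where the quadratic collapses to a linear equation, are my own choices):

```python
import math

def minimize_cubed_error(a):
    """Scan the intervals [a[i-1], a[i]] and solve the per-interval quadratic
    (2i - n) x^2 - 2 (sum_{k<i} a_k - sum_{k>=i} a_k) x
                 + (sum_{k<i} a_k^2 - sum_{k>=i} a_k^2) = 0,
    accepting a root only if it falls inside the interval that produced it."""
    a = sorted(a)
    n = len(a)
    S1, S2 = sum(a), sum(x * x for x in a)   # full sums
    L1 = L2 = 0.0                            # partial sums over k < i
    for i in range(1, n):
        L1 += a[i - 1]
        L2 += a[i - 1] ** 2
        A = 2 * i - n
        B = -2.0 * (L1 - (S1 - L1))
        C = L2 - (S2 - L2)
        if A == 0:                           # quadratic degenerates to linear
            roots = [-C / B] if B != 0 else []
        else:
            disc = B * B - 4.0 * A * C
            if disc < 0:
                continue
            roots = [(-B + s * math.sqrt(disc)) / (2.0 * A) for s in (-1.0, 1.0)]
        for x in roots:
            if a[i - 1] <= x <= a[i]:
                return x
    return a[0] if n else None               # 0 or 1 points handled separately

print(minimize_cubed_error([0.0, 1.0, 5.0]))  # -4 + 2*sqrt(10), about 2.3246
```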


Edit 2: For large sets of numbers, applying Newton's method to the entire $e'(x) = 0$ equation should provide a major improvement because we would no longer have to check each interval separately. In order to do this, first we must show that $e'$ is differentiable.

To do this, consider that $e'$ is the sum of finitely many functions of the form $f_k(x) = 3(x - a_k)^2\text{sgn}(x - a_k).$ If we can prove that each of these functions is differentiable, then the entire $e'$ function must be differentiable.

Now consider the rewriting

$$f_k(x) = \begin{cases} -3(x - a_k)^2, & x \leq a_k \\ 3(x - a_k)^2, & x > a_k \end{cases}$$

For $x < a_k$ we clearly have $f_k'(x) = -6(x - a_k),$ and for $x > a_k$ we have $f_k'(x) = 6(x - a_k).$ The only point where differentiability is not immediately clear is $x = a_k$; however, setting up the limit definition shows that the derivative does exist:

$$\lim_{x \to a_k^+} \frac{f(x) - f(a_k)}{x - a_k} = \lim_{x \to a_k^+}\frac{3(x - a_k)^2 - 0}{x - a_k} = \lim_{x \to a_k^+} 3(x - a_k) = 0$$

$$\lim_{x \to a_k^-} \frac{f(x) - f(a_k)}{x - a_k} = \lim_{x \to a_k^-}\frac{-3(x - a_k)^2 - 0}{x - a_k} = \lim_{x \to a_k^-} -3(x - a_k) = 0$$

So, $f_k$ is differentiable and its derivative is $6|x - a_k|,$ which means that $e'$ is differentiable and we have $e''(x) = \sum_k 6|x - a_k|.$

Now implementing Newton's method gives us:

$$x_{n+1} = x_n - \frac{e'(x_n)}{e''(x_n)} = x_n - \frac{\sum_k \text{sgn}(x_n-a_k)(x_n - a_k)^2}{\sum_k 2|x_n - a_k|}$$

and simply starting at the average of the two extreme values of the set has been sufficient for my light testing, but other initial estimates can be chosen depending on the expected properties of the dataset. In general I don't think this function is particularly choosy in this regard, because $e'',$ Newton's method's relevant derivative, is always positive and generally very well-behaved.
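
A minimal sketch of that Newton iteration (again my own illustration rather than the linked code; the tolerance and the iteration cap are arbitrary choices):

```python
def minimize_cubed_error_newton(a, tol=1e-12, max_iter=100):
    """Newton's method on e'(x) = 3 * sum sgn(x - a_k) * (x - a_k)^2,
    using e''(x) = 6 * sum |x - a_k| as its derivative."""
    x = (min(a) + max(a)) / 2.0              # start midway between the extremes
    for _ in range(max_iter):
        d1 = sum((1 if x >= ak else -1) * (x - ak) ** 2 for ak in a)
        d2 = sum(2.0 * abs(x - ak) for ak in a)
        if d2 == 0:                          # every a_k equals x: already optimal
            break
        step = d1 / d2
        x -= step
        if abs(step) < tol:
            break
    return x

print(minimize_cubed_error_newton([0.0, 1.0, 5.0]))  # about 2.3246
```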


I've implemented this solution in Python on Replit here if you would like to look.

Plaintext code for anyone who doesn't have Replit here.

  • A valid discussion of some of the context and the case for even powers. Yes, the even powers are easy. The problem is to do it for the cubic. I will modify my post to emphasize what the hard core of the question was. – Bruce Nov 09 '21 at 03:46
  • I appreciate your putting the effort in, and discussing it on Stack Exchange with you has helped to clarify my thoughts. But what your solution is, is a paraphrase of what I was alluding to in the original post: that it is the balance point you are looking for. As I understand it, your solution is just the obvious naive brute-force scan evaluating each option in turn. Did I misread this in some way? So, I take your answer to be that you are not aware of any simple solution or any other work on the topic? – Bruce Nov 11 '21 at 00:45
  • I’m not aware of other work on the topic, and yes this is just searching for the balance point. I thought you just needed a way to calculate it, so I provided what I knew would work since we didn’t have anything for it yet. My apologies if the brute-forceness doesn’t meet your criteria. If it’s just the time efficiency that worries you then I did just realize that we can just do Newton’s method since $e’$ itself is differentiable, and that would make it work much better for large numbers of points. (I’d implement this and edit the answer now but I’m away from my computer) – Stephen Donovan Nov 11 '21 at 01:21
  • The comment about Newton's method is an interesting one. But, so far my machinations on those types of approaches leads to the fact that the information that derivatives give about the function is trapped into the interval between two data points. The derivatives change suddenly at the boundaries. My agenda here, though, is that I am trying to understand an instability in this that arises in the context of doing it with 40 million numbers that are all the result of floating point crunching. That's why I need a mathematical theory - so that I can get some intuition about why it is unstable. – Bruce Nov 12 '21 at 10:55
  • Are the derivatives not continuous at the boundaries? Based on my work I believe they would be, and my plots seem to suggest the same. In any case I think I got an implementation of Newton's method to match my naive approach. (new edit on the way) – Stephen Donovan Nov 12 '21 at 11:13
  • A typical smooth function is non-analytic. Being smooth does not mean that the derivatives at a point give global information about the function. Although you do open a definite question about the limit of the scope of Newton's method. https://math.stackexchange.com/questions/2407659/why-does-the-newton-raphson-method-not-converge-for-some-functions#:~:text=Newton%27s%20method%20does%20not%20always%20converge.%20Its%20convergence,the%20root%20you%20can%20have%20highly%20nontrivial%20dynamics. – Bruce Nov 12 '21 at 11:24
  • I will think about this a bit more. – Bruce Nov 12 '21 at 11:27
  • The derivative is not positive; it is monotonic increasing (which is more to the point), and an important point that is. While I still feel that this might run into trouble with exponentially distributed data, I take your point. It seems that naive Newton's method works okay here (I was so far from expecting this that I did not even try), and its behaviour is not too far off quickselect, and maybe it can be reworked into a form of median-of-medians. Whatever the case, I now feel that I have a mathematical entry into the problem. – Bruce Nov 12 '21 at 12:45
  • Yes pardon me, to be more precise I meant $e''$ was positive, which is the relevant derivative for Newton's method since the original function it considers is $e'.$ (I'll correct this in another brief edit) I agree that Newton's method working was surprising, the only reason I realized it was plausible was because of plots I had made to verify my naive solution. In any case, I'm glad I could be of assistance, and I hope you have a nice day. – Stephen Donovan Nov 12 '21 at 21:37