When computing loss functions, people typically use $(\mathrm{target} - \mathrm{actual})^2$. The difference is squared to prevent any negative loss. But we could just as well use $|\mathrm{target} - \mathrm{actual}|$ to prevent negative loss. So why do people prefer the first option over the second?
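For concreteness, here is a minimal sketch (using NumPy; the numbers are made up for illustration) comparing the two losses on the same errors:

```python
import numpy as np

# Hypothetical targets and predictions, for illustration only.
target = np.array([1.0, 2.0, 3.0])
actual = np.array([1.5, 1.0, 4.0])

mse = np.mean((target - actual) ** 2)   # squared loss: penalizes large errors more
mae = np.mean(np.abs(target - actual))  # absolute loss: penalizes errors linearly

print(mse)  # 0.75
print(mae)  # 0.8333...
```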
- This question has been answered many times here. Only one additional remark: without the square, you have $(x - y) \neq (y - x)$ if $x \neq y$, which is not what you want (the loss would not be symmetric and could also take negative values). So one question would be: why not use $\|x-y\|_{2} = \sqrt{\sum_{i}(x_{i}-y_{i})^{2}}$, and why is optimizing $\|x-y\|_{2}^{2}$ better than $|x-y|$? You can find all the answers on DS:SE. (A sketch comparing these quantities follows these comments.) – Graph4Me Consultant Sep 23 '20 at 11:51
- https://datascience.stackexchange.com/questions/63186/what-is-the-difference-between-euclidean-distance-and-rmse?rq=1 – Graph4Me Consultant Sep 23 '20 at 11:57
- https://datascience.stackexchange.com/questions/12728/minimize-absolute-values-of-errors-instead-of-squares/12739#12739 – Graph4Me Consultant Sep 23 '20 at 11:58
- Thank you, everyone. I found this post, which answers my question: https://stats.stackexchange.com/a/48268/295839 – Dhruv Agarwal Sep 23 '20 at 12:27
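Following up on the first comment, a minimal sketch (assuming NumPy; the vectors are made up for illustration) showing that $\|x-y\|_{2}$ and $\|x-y\|_{2}^{2}$ are symmetric and non-negative, unlike the raw difference:

```python
import numpy as np

# Illustrative vectors only.
x = np.array([1.0, 2.0])
y = np.array([3.0, 5.0])

diff = x - y                      # raw difference: sign depends on the order
l2 = np.linalg.norm(x - y)        # Euclidean norm: sqrt(sum of squared errors)
l2_squared = l2 ** 2              # squared Euclidean norm: sum of squared errors

print(diff, y - x)                # [-2. -3.] vs [2. 3.]: not symmetric
print(l2, np.linalg.norm(y - x))  # 3.6055... for both: symmetric, non-negative
print(l2_squared)                 # 13.0
```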
1 Answer
Apart from the correct answers, which you can find in the comment section: you mention that the square is there to "prevent any negative loss".
In principle you can also have a negative loss. The point is that without the square, you have $(x - y) \neq (y - x)$ for $x \neq y$. In particular, the loss would not be symmetric, and for $x = 0$ you have $(x - y) = -y$, so by increasing $y$ you decrease the loss without bound. The loss would thus not be bounded below, and there would be no global minimum.
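To make the unboundedness concrete, a minimal sketch (plain Python, illustrative numbers) evaluating the raw difference at $x = 0$ against the squared version:

```python
x = 0.0

# Without the square: loss(x, y) = x - y. Increasing y drives the value
# toward -infinity, so the "loss" has no global minimum.
for y in [1.0, 10.0, 100.0, 1000.0]:
    print(y, x - y)          # -1.0, -10.0, -100.0, -1000.0

# With the square: loss(x, y) = (x - y)^2 is bounded below by 0
# and minimized exactly at y = x.
for y in [1.0, 10.0, 100.0, 1000.0]:
    print(y, (x - y) ** 2)   # 1.0, 100.0, 10000.0, 1000000.0
```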

Graph4Me Consultant
- You said "In principle you can also have a negative loss". I don't think we can have negative losses, because then the negative losses could cancel out the positive losses. – Dhruv Agarwal Sep 23 '20 at 13:22
- If you minimize $\sum_{i} \left( \|\mathrm{target}_{i}-\mathrm{actual}_{i}\|_{2}^{2}-\theta_{i} \right)$, where each $\theta_{i} > 0$ is fixed, you will get the same optimal result, and the loss can take negative values. (A sketch of this follows these comments.) – Graph4Me Consultant Sep 23 '20 at 13:24
- OK, maybe replace "loss" with "objective function". But my point still stands: without the square, computing $x - y$ is completely wrong, not because of negative values, but because it does not measure anything useful. – Graph4Me Consultant Sep 23 '20 at 13:41
- For example, you could have a neural network that minimizes or maximizes the cosine similarity. In that case, the optimal objective value is either $1$ or $-1$. (A sketch follows these comments.) – Graph4Me Consultant Sep 23 '20 at 13:44
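On the shifted objective from the comment above, a minimal sketch (assuming NumPy; a single constant shift $\theta$ is used instead of per-term shifts, for simplicity) showing that subtracting a constant leaves the minimizer unchanged while making the optimal value negative:

```python
import numpy as np

# Illustrative targets; we fit a single constant prediction c.
target = np.array([1.0, 2.0, 3.0])
theta = 5.0  # any fixed positive constant

candidates = np.linspace(0.0, 4.0, 401)  # candidate values of c
plain = np.array([np.sum((target - c) ** 2) for c in candidates])
shifted = plain - theta  # subtracting a constant shifts every value equally

print(candidates[np.argmin(plain)])    # 2.0 (the mean of target)
print(candidates[np.argmin(shifted)])  # 2.0 -- same minimizer
print(shifted.min())                   # -3.0 -- a negative optimal value
```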
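And on the last comment, a minimal sketch (assuming NumPy) of cosine similarity as an objective that is naturally bounded in $[-1, 1]$:

```python
import numpy as np

def cosine_similarity(a, b):
    # Always in [-1, 1]: an objective function need not be non-negative.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity(np.array([1.0, 0.0]), np.array([1.0, 0.0])))   # 1.0 (aligned)
print(cosine_similarity(np.array([1.0, 0.0]), np.array([-1.0, 0.0])))  # -1.0 (opposite)
```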