When computing loss functions, people typically use $(\mathrm{target} - \mathrm{actual})^2$. The difference is squared to prevent any negative loss. But we could just as well use $|\mathrm{target} - \mathrm{actual}|$ to prevent negative loss. So why do people prefer the first option over the second?
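For concreteness, here is a minimal sketch (using NumPy; the numbers are made up for illustration) comparing the two losses on the same errors:

```python
import numpy as np

# Hypothetical targets and predictions, for illustration only.
target = np.array([1.0, 2.0, 3.0])
actual = np.array([1.5, 1.0, 4.0])

mse = np.mean((target - actual) ** 2)   # squared loss: penalizes large errors more
mae = np.mean(np.abs(target - actual))  # absolute loss: penalizes errors linearly

print(mse)  # 0.75
print(mae)  # 0.8333...
```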
- This question has been answered many times here. Only one additional remark: without the square, you have $(x - y) \neq (y - x)$ if $x \neq y$, which is not what you want (the loss would not be symmetric and could also take negative values). So one question would be: why not use $\|x-y\|_{2} = \sqrt{\sum_{i}(x_{i}-y_{i})^{2}}$, and why is optimizing $\|x-y\|_{2}^{2}$ better than $|x-y|$? You can find all the answers on DS:SE. (A sketch comparing these quantities follows these comments.) – Graph4Me Consultant Sep 23 '20 at 11:51
- https://datascience.stackexchange.com/questions/63186/what-is-the-difference-between-euclidean-distance-and-rmse?rq=1 – Graph4Me Consultant Sep 23 '20 at 11:57
- https://datascience.stackexchange.com/questions/12728/minimize-absolute-values-of-errors-instead-of-squares/12739#12739 – Graph4Me Consultant Sep 23 '20 at 11:58
- Thank you, everyone. I found this post, which answers my question: https://stats.stackexchange.com/a/48268/295839 – Dhruv Agarwal Sep 23 '20 at 12:27
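Following up on the first comment, a minimal sketch (assuming NumPy; the vectors are made up for illustration) showing that $\|x-y\|_{2}$ and $\|x-y\|_{2}^{2}$ are symmetric and non-negative, unlike the raw difference:

```python
import numpy as np

# Illustrative vectors only.
x = np.array([1.0, 2.0])
y = np.array([3.0, 5.0])

diff = x - y                      # raw difference: sign depends on the order
l2 = np.linalg.norm(x - y)        # Euclidean norm: sqrt(sum of squared errors)
l2_squared = l2 ** 2              # squared Euclidean norm: sum of squared errors

print(diff, y - x)                # [-2. -3.] vs [2. 3.]: not symmetric
print(l2, np.linalg.norm(y - x))  # 3.6055... for both: symmetric, non-negative
print(l2_squared)                 # 13.0
```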
1 Answer
Apart from the correct answers, which you can find in the comment section: you mention that the square is there to "prevent any negative loss".
In principle you can also have a negative loss. The point is that without the square, you have $(x - y) \neq (y - x)$ for $x \neq y$. In particular, the loss would not be symmetric, and for $x = 0$ you have $(x - y) = -y$, so by increasing $y$ you decrease the loss without bound. The loss would thus not be bounded below, and there would be no global minimum.
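To make the unboundedness concrete, a minimal sketch (plain Python, illustrative numbers) evaluating the raw difference at $x = 0$ against the squared version:

```python
x = 0.0

# Without the square: loss(x, y) = x - y. Increasing y drives the value
# toward -infinity, so the "loss" has no global minimum.
for y in [1.0, 10.0, 100.0, 1000.0]:
    print(y, x - y)          # -1.0, -10.0, -100.0, -1000.0

# With the square: loss(x, y) = (x - y)^2 is bounded below by 0
# and minimized exactly at y = x.
for y in [1.0, 10.0, 100.0, 1000.0]:
    print(y, (x - y) ** 2)   # 1.0, 100.0, 10000.0, 1000000.0
```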

Graph4Me Consultant
- You said "In principle you can also have a negative loss". I don't think we can have negative losses, because then the negative losses could cancel out the positive losses. – Dhruv Agarwal Sep 23 '20 at 13:22
- If you minimize $\sum_{i} \left( \|\mathrm{target}_{i}-\mathrm{actual}_{i}\|_{2}^{2}-\theta_{i} \right)$, where each $\theta_{i} > 0$ is fixed, you will get the same optimal result, and the loss can take negative values. (A sketch of this follows these comments.) – Graph4Me Consultant Sep 23 '20 at 13:24
- OK, maybe replace "loss" with "objective function". But my point still stands: without the square, computing $x - y$ is completely wrong, not because of negative values, but because it does not measure anything useful. – Graph4Me Consultant Sep 23 '20 at 13:41
- For example, you could have a neural network that minimizes or maximizes the cosine similarity. In that case, the optimal objective value is either $1$ or $-1$. (A sketch follows these comments.) – Graph4Me Consultant Sep 23 '20 at 13:44
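On the shifted objective from the comment above, a minimal sketch (assuming NumPy; a single constant shift $\theta$ is used instead of per-term shifts, for simplicity) showing that subtracting a constant leaves the minimizer unchanged while making the optimal value negative:

```python
import numpy as np

# Illustrative targets; we fit a single constant prediction c.
target = np.array([1.0, 2.0, 3.0])
theta = 5.0  # any fixed positive constant

candidates = np.linspace(0.0, 4.0, 401)  # candidate values of c
plain = np.array([np.sum((target - c) ** 2) for c in candidates])
shifted = plain - theta  # subtracting a constant shifts every value equally

print(candidates[np.argmin(plain)])    # 2.0 (the mean of target)
print(candidates[np.argmin(shifted)])  # 2.0 -- same minimizer
print(shifted.min())                   # -3.0 -- a negative optimal value
```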
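And on the last comment, a minimal sketch (assuming NumPy) of cosine similarity as an objective that is naturally bounded in $[-1, 1]$:

```python
import numpy as np

def cosine_similarity(a, b):
    # Always in [-1, 1]: an objective function need not be non-negative.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity(np.array([1.0, 0.0]), np.array([1.0, 0.0])))   # 1.0 (aligned)
print(cosine_similarity(np.array([1.0, 0.0]), np.array([-1.0, 0.0])))  # -1.0 (opposite)
```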