
When computing loss functions, people use $(\mathrm{target}-\mathrm{actual})^2$. They square it to prevent any negative loss. But we could also use $|\mathrm{target}-\mathrm{actual}|$ to prevent any negative loss. So why do people prefer the first option over the second?
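To make the comparison concrete, here is a minimal sketch with made-up target/prediction values: both losses are non-negative, but the square penalizes large errors much more heavily than the absolute value does.

```python
# Hypothetical example values: both losses are non-negative,
# but squaring weights the large error (2.0) far more strongly.
targets = [1.0, 2.0, 3.0]
actuals = [1.5, 1.0, 5.0]

squared = [(t - a) ** 2 for t, a in zip(targets, actuals)]   # [0.25, 1.0, 4.0]
absolute = [abs(t - a) for t, a in zip(targets, actuals)]    # [0.5, 1.0, 2.0]

mse = sum(squared) / len(squared)    # 1.75
mae = sum(absolute) / len(absolute)  # ~1.167
```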

  • This question has been answered many times here. Only one additional remark: without the square, you have $(x-y) \neq (y-x)$ if $x \neq y$, which is not what you want (it would not be symmetric and could also produce negative values). So one question would be: why not use $\|x-y\|_{2} = \sqrt{\sum_{i}(x_{i}-y_{i})^{2}}$, and why is optimizing $\|x-y\|_{2}^{2}$ better than $|x-y|$? You find all answers at DS:SE. – Graph4Me Consultant Sep 23 '20 at 11:51
  • https://datascience.stackexchange.com/questions/63186/what-is-the-difference-between-euclidean-distance-and-rmse?rq=1 – Graph4Me Consultant Sep 23 '20 at 11:57
  • https://datascience.stackexchange.com/questions/12728/minimize-absolute-values-of-errors-instead-of-squares/12739#12739 – Graph4Me Consultant Sep 23 '20 at 11:58
  • Thank you everyone, I have found this post that answers my question: https://stats.stackexchange.com/a/48268/295839 – Dhruv Agarwal Sep 23 '20 at 12:27

1 Answer

Apart from the correct answers, which you can find in the comment section, you mention that "the square [...] prevents any negative loss".

In principle you can also have a negative loss. The point is that without the square, you have $(x-y) \neq (y-x)$ for $x \neq y$. In particular, the loss would not be symmetric, and for $x = 0$ you have $(x-y) = -y$, so by increasing $y$ you decrease the loss. The loss would thus not be bounded below, and there would be no global minimum.
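The argument above can be sketched numerically (with arbitrary example values): the raw difference can be driven arbitrarily negative, while the squared difference is symmetric and bounded below by zero.

```python
# With x = 0, the raw difference (x - y) equals -y: increasing y makes
# the "loss" arbitrarily negative, so it has no global minimum.
x = 0.0
raw = [x - y for y in (1.0, 10.0, 100.0)]   # [-1.0, -10.0, -100.0]

# The squared loss, by contrast, is symmetric and never negative.
assert (x - 5.0) ** 2 == (5.0 - x) ** 2
assert all((x - y) ** 2 >= 0 for y in (1.0, 10.0, 100.0))
```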

Graph4Me Consultant
  • You said 'In principle you can also have a negative loss', I don't think we can have negative losses because then the negative losses could cancel out with the positive losses. – Dhruv Agarwal Sep 23 '20 at 13:22
  • If you minimize $\sum_{i} \|\mathrm{target}_{i}-\mathrm{actual}_{i}\|_{2}^{2}-\theta_{i}$, where each $\theta_{i} > 0$ is fixed, you will get the same optimal result and the loss can have negative values. – Graph4Me Consultant Sep 23 '20 at 13:24
  • OK, maybe replace "loss" with "objective function". But still, my point is that without the square, computing $x-y$ is completely wrong, not because of negative values, but because it does not measure anything useful. – Graph4Me Consultant Sep 23 '20 at 13:41
  • For example, you could have a neural network that minimizes or maximizes the cosine similarity. In this case, the optimal objective value is either 1 or -1. – Graph4Me Consultant Sep 23 '20 at 13:44
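The shifted-objective point from the comments can be sketched as follows (the offset $\theta = 10$ is a made-up value): subtracting a fixed positive constant makes the objective negative at the optimum, yet the minimizer is unchanged.

```python
# Subtracting a fixed positive theta from each squared term shifts the
# objective into negative territory without moving its minimizer
# (theta = 10.0 is a hypothetical value chosen for illustration).
targets = [1.0, 2.0]
theta = 10.0

def objective(actuals):
    return sum((t - a) ** 2 - theta for t, a in zip(targets, actuals))

best = objective(targets)       # -20.0: negative, but still the minimum
worse = objective([2.0, 3.0])   # -18.0: any other guess scores higher
```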