The derivative of the ReLU function is 1 when the input is greater than 0 and 0 when the input is less than or equal to 0. In the backpropagation process it doesn't change the value of d(error)/d(weight) at all: the gradient is either multiplied by 1 or by 0. Which means all it does is discard the negative inputs. That feels like it works the same way as dropout. If we used dropout instead of ReLU, shouldn't it be almost the same?

We use a non-linear activation function to bring in non-linearity. But isn't ReLU also a linear transformation? Suppose a training dataset where all the inputs are positive and, in the initial model, all the weights are positive. Then ReLU(wx + b) is ultimately just wx + b. How is that bringing any non-linearity? I am hella confused about the whole thing.
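For example, here is a small numpy sketch of the case I mean (the layer sizes and the positive uniform initialization are just made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda z: np.maximum(z, 0.0)

x = rng.uniform(0.1, 1.0, size=(5, 3))   # all-positive inputs
W = rng.uniform(0.1, 1.0, size=(3, 4))   # all-positive weights
b = rng.uniform(0.1, 1.0, size=(4,))     # all-positive bias

pre = x @ W + b                           # every pre-activation is positive
print(np.allclose(relu(pre), pre))        # True: ReLU clips nothing, the layer is just x @ W + b
```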
1 Answer
When all the weights and all the inputs are positive, you don't have a non-linearity: every pre-activation wx + b is positive, so ReLU passes it through unchanged and the layer is just a linear map.
To make it work, the weights have to be randomly initialized, with some negative and some positive, so that some pre-activations fall below zero and get clipped.
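A minimal numpy sketch of that (the layer sizes and the random seed are arbitrary, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda z: np.maximum(z, 0.0)

x = rng.uniform(0.1, 1.0, size=(5, 3))        # still all-positive inputs

# Mixed-sign (standard random) weight initialization
W1 = rng.normal(size=(3, 4)); b1 = rng.normal(size=4)
W2 = rng.normal(size=(4, 2)); b2 = rng.normal(size=2)

two_layer = relu(x @ W1 + b1) @ W2 + b2       # two-layer ReLU network
collapsed = (x @ W1 + b1) @ W2 + b2           # what you'd get if ReLU did nothing

print(np.allclose(two_layer, collapsed))      # False: some pre-activations are negative,
                                              # ReLU zeroes them, so the network is no longer
                                              # equivalent to a single linear map
```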

Iya Lee
This might help. – Arpit Sisodia Mar 29 '23 at 02:09