
Convolutional Neural Networks (CNNs) almost always use the rectified linear activation function (ReLU):

$$f(x) = \max(0, x)$$
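For concreteness, a minimal NumPy sketch of this activation (the function name and the sample input are just illustrative, not part of the question):

```python
import numpy as np

def relu(x):
    # Elementwise rectified linear unit: max(0, x)
    return np.maximum(0.0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 0.5, 2.0])))  # [0.  0.  0.  0.5 2. ]
```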

However, the derivative of this function is

$$f'(x) = \begin{cases} 0 &\text{if } x \leq 0\\ 1&\text{otherwise}\end{cases}$$

(ignoring that it is not differentiable at $0$, as I think is done in practice). For inputs $> 0$ this is fine, but why doesn't it matter that the gradient is $0$ at every point $< 0$? Or does it matter? (Are there publications about this problem?)
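To make the gradient behaviour concrete, here is a small NumPy sketch of the derivative as written above (the helper name is made up for the example); the convention $f'(0) = 0$ is baked in via the strict inequality:

```python
import numpy as np

def relu_grad(x):
    # Derivative used in backprop: 0 for x <= 0, 1 for x > 0
    return (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu_grad(x))  # [0. 0. 0. 1. 1.] -- no gradient flows back for x <= 0
```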

If a neuron outputs 0 for every sample of the training data, it is basically lost, correct? Its weights will never be adjusted again?
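As an illustration of that worry, here is a hypothetical NumPy sketch of a single ReLU neuron whose pre-activation is negative for every training sample (the data, weights, and bias are made up for the example); the chain rule then yields exactly zero gradient for its weights and bias:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))     # hypothetical training inputs
w = rng.normal(size=3)            # the neuron's weights
b = -100.0                        # bias so low that w @ x + b < 0 for every sample

z = X @ w + b                     # pre-activations, all negative here
a = np.maximum(0.0, z)            # the neuron outputs 0 for every sample

upstream = rng.normal(size=100)   # whatever gradient arrives from later layers
dz = upstream * (z > 0)           # multiplied by f'(z), which is 0 everywhere
dw = X.T @ dz                     # zero vector: gradient descent never changes w
db = dz.sum()                     # 0.0: nor b
print(a.max(), dw, db)            # 0.0 [0. 0. 0.] 0.0
```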

Ethan
Martin Thoma

1 Answer


> ignoring that it is not differentiable at $0$, as I think is done in practice

Yes, see ReLUs are not differentiable at zero.
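If it helps, the convention can be checked directly in an autodiff framework. A quick sketch assuming PyTorch is available; to my understanding it uses the subgradient $0$ at $x = 0$, which this snippet verifies:

```python
import torch

x = torch.tensor(0.0, requires_grad=True)
torch.relu(x).backward()
print(x.grad)  # tensor(0.) -- the value 0 is used for the derivative at x = 0
```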

> If a neuron outputs 0 for every sample of the training data, it is basically lost, correct? Its weights will never be adjusted again?

Yes, see What is the "dying ReLU" problem in neural networks?

Franck Dernoncourt