Convolutional Neural Networks (CNNs) almost always use the rectified linear unit (ReLU) as their activation function:
$$f(x) = \max(0, x)$$
However, the derivative of this function is
$$f'(x) = \begin{cases} 0 & \text{if } x \leq 0\\ 1 & \text{otherwise}\end{cases}$$
(ignoring that $f$ is not differentiable at $0$, which I think is simply glossed over in practice). For inputs $> 0$ this is fine, but why doesn't it matter that the gradient is $0$ at every point $x < 0$? Or does it matter? (Are there publications about this problem?)
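To make sure we are talking about the same thing, here is the forward pass and the gradient I mean, as a minimal numpy sketch (the function names are just mine, and I take $f'(0) = 0$):

```python
import numpy as np

def relu(x):
    # f(x) = max(0, x)
    return np.maximum(0.0, x)

def relu_grad(x):
    # f'(x) = 0 if x <= 0, 1 otherwise (choosing f'(0) = 0)
    return (x > 0).astype(float)
```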
If a neuron outputs 0 for every sample of the training data, it is basically lost, correct? Its weights will never be adjusted again?
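For concreteness, here is a tiny numpy sketch of the situation I mean, with a single ReLU unit whose bias has been pushed so far negative that its pre-activation is $\leq 0$ on every training sample (all names and numbers here are made up for illustration, not taken from any real network):

```python
import numpy as np

np.random.seed(0)
X = np.random.randn(100, 3)            # 100 training samples, 3 features
w = np.random.randn(3)                 # the neuron's weights
b = -100.0                             # bias pushed very negative, e.g. by one bad update

z = X @ w + b                          # pre-activations: all <= 0 here
a = np.maximum(0.0, z)                 # ReLU output: all zeros

# Backprop: suppose dL/da is some nonzero upstream gradient.
upstream = np.ones_like(a)
dz = upstream * (z > 0).astype(float)  # ReLU derivative: 0 wherever z <= 0
dw = X.T @ dz                          # gradient w.r.t. the weights
db = dz.sum()                          # gradient w.r.t. the bias

print(a.max())                         # 0.0 -> neuron outputs 0 on every sample
print(dw, db)                          # all zeros -> gradient descent leaves w and b unchanged
```

If I understand backpropagation correctly, `dw` and `db` stay exactly zero no matter how many epochs we train, which is what I mean by the neuron being "lost".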