13

Background: While fitting neural networks with ReLU activations, I found that the predictions sometimes become nearly constant. I believe this is due to the ReLU neurons dying during training, as described here. (What is the "dying ReLU" problem in neural networks?)

Question: What I'm hoping to do is implement a check in the code itself to detect whether neurons are dead. After that, the code could refit the network if needed.

As such, what is a good criterion to check for dead neurons? Currently I'm thinking of checking for low variance in the predictions as a criterion.

If it helps, I'm using Keras.

Aveiur
  • Add a summary for the biases in tensorboard: https://www.tensorflow.org/get_started/summaries_and_tensorboard – Emre May 07 '17 at 17:17
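A rough sketch of that suggestion, assuming TF 2.x Keras: the `TensorBoard` callback with `histogram_freq=1` logs per-layer weight and bias histograms, which lets you watch biases (and kernels) drift during training. The model and data below are toy placeholders.

```python
# Minimal sketch: log weight/bias histograms to TensorBoard so dying units
# can be spotted visually. Model, data, and log directory are made up here.
import numpy as np
from tensorflow import keras

x = np.random.randn(256, 10).astype("float32")
y = np.random.randn(256, 1).astype("float32")

model = keras.Sequential([
    keras.layers.Dense(32, activation="relu", input_shape=(10,)),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

tb = keras.callbacks.TensorBoard(log_dir="./logs", histogram_freq=1)
model.fit(x, y, epochs=5, validation_split=0.2, callbacks=[tb])
# Then run: tensorboard --logdir ./logs  and inspect the bias/kernel histograms.
```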

2 Answers

7

A dead ReLU pretty much just means that its argument value is negative, so that the gradient stays at 0 no matter how you train it from that point on. You can simply look at the gradient during training to see whether a ReLU is dead or not.
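A minimal sketch of that check, assuming TF 2.x Keras (the toy model and data are made up for illustration): compute the gradient of the loss with respect to the first Dense layer's kernel on one batch, and flag units whose incoming-weight gradients are all exactly zero.

```python
# Minimal sketch (toy model and data): inspect gradients of the loss w.r.t.
# the first Dense layer's kernel and flag units whose gradient is exactly
# zero for this batch -- candidates for dead ReLUs.
import numpy as np
import tensorflow as tf
from tensorflow import keras

x = tf.constant(np.random.randn(128, 10), dtype=tf.float32)
y = tf.constant(np.random.randn(128, 1), dtype=tf.float32)

model = keras.Sequential([
    keras.layers.Dense(32, activation="relu", input_shape=(10,)),
    keras.layers.Dense(1),
])
loss_fn = keras.losses.MeanSquaredError()

with tf.GradientTape() as tape:
    loss = loss_fn(y, model(x, training=True))
grads = tape.gradient(loss, model.trainable_variables)

kernel_grad = grads[0]  # gradient w.r.t. the first layer's kernel, shape (10, 32)
dead = tf.reduce_all(tf.equal(kernel_grad, 0.0), axis=0)  # per-unit, all-zero columns
print("possibly dead units this batch:", tf.where(dead).numpy().ravel())
```

Note that this only tells you a unit received no gradient on this particular batch; as the other answer points out, a unit is only truly dead if this holds for every row of the training data.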

In practice you may simply want to use leaky ReLUs, i.e. instead of f(x) = max(0, x) you set f(x) = x if x > 0 and f(x) = 0.01x if x <= 0. This way you always allow a small non-zero gradient and the unit should no longer get fully stuck during training.
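In Keras this amounts to swapping the activation, e.g. something like the following sketch (layer sizes are arbitrary):

```python
# Minimal sketch: use LeakyReLU instead of ReLU so the gradient is never
# exactly zero for negative inputs.
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(32, input_shape=(10,)),   # no built-in activation
    keras.layers.LeakyReLU(alpha=0.01),           # f(x) = x if x > 0 else 0.01*x
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
```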

3

A dead neuron is a neuron that does not update during training, i.e. its gradient is 0.

Keras allows you to extract the gradient directly for a given row of data. (Another nice example)

Or you can extract the neuron weights and calculate the gradient yourself (e.g. for ReLU, a negative argument gives a 0 gradient).

Unfortunately, the gradient is data-point specific. Only if the gradient is 0 for every row of training data can you be sure that the neuron will not update during any minibatch of a training epoch.
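A minimal sketch of that whole-dataset check in TF 2.x Keras (toy model and data; it uses a zero ReLU activation over every training row as the proxy for a zero gradient, per the point above):

```python
# Minimal sketch (toy model/data): a ReLU unit whose activation is 0 for
# every training row gets a zero gradient on every minibatch, so it is dead.
import numpy as np
from tensorflow import keras

x_train = np.random.randn(1000, 10).astype("float32")

model = keras.Sequential([
    keras.layers.Dense(32, activation="relu", input_shape=(10,), name="hidden"),
    keras.layers.Dense(1),
])

# Intermediate model exposing the hidden ReLU activations.
probe = keras.Model(inputs=model.inputs,
                    outputs=model.get_layer("hidden").output)
acts = probe.predict(x_train, verbose=0)        # shape (1000, 32)

dead_units = np.where((acts == 0).all(axis=0))[0]
print("dead units over the whole training set:", dead_units)
# If this is non-empty after training, you could re-initialize those units
# (or the whole network) and refit, as the question suggests.
```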

Leaky ReLU can be a helpful strategy, since there is no input value for which the leaky ReLU's gradient equals 0.

D Bolta