
While implementing AlexNet (model-code), one of the things I needed to do was initialize the biases of the convolutional layers and fully connected layers.

Normally we initialize biases to 0, but the paper says:

We initialized the neuron biases in the second, fourth, and fifth convolutional layers, as well as in the fully-connected hidden layers, with the constant 1.
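
For context, the initialization I'm describing looks roughly like this (a PyTorch-style sketch; `model.features` and `model.classifier` are placeholder names for the conv and FC stacks, not necessarily how the model is actually organized):

```python
import torch.nn as nn

def init_biases_paper(model):
    """Initialize biases as the paper describes: constant 1 for the
    2nd, 4th, and 5th conv layers and the fully-connected hidden
    layers, 0 elsewhere. `model.features` / `model.classifier` are
    placeholder nn.Sequential containers for the conv and FC stacks."""
    conv_layers = [m for m in model.features if isinstance(m, nn.Conv2d)]
    for i, conv in enumerate(conv_layers):
        # indices 1, 3, 4 correspond to the 2nd, 4th, and 5th conv layers
        nn.init.constant_(conv.bias, 1.0 if i in (1, 3, 4) else 0.0)
    fc_layers = [m for m in model.classifier if isinstance(m, nn.Linear)]
    for fc in fc_layers[:-1]:                    # hidden FC layers get bias 1
        nn.init.constant_(fc.bias, 1.0)
    nn.init.constant_(fc_layers[-1].bias, 0.0)   # output layer left at 0
```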

So I went ahead and initialized the biases to 1 as the paper says, but then the network didn't learn at all. The last fully connected layer was producing mostly 0s, which is otherwise known as the dying-ReLU problem. Out of 4096 neurons, only 40 or 50 were producing non-zero outputs.

After a lot of debugging, I realized that if I set the fully connected layers' biases to 0, the network learns nicely and the loss decreases as expected.
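
This is roughly the kind of check I used to see how many units were still alive (again a PyTorch-style sketch; `model`, `layer`, and `x` are placeholders for the network, the last hidden FC layer, and one input batch):

```python
import torch

def fraction_alive(model, layer, x):
    """Run one batch `x` through `model`, capture the output of `layer`
    with a forward hook, and return the fraction of positive activations.
    A value near 0 indicates the dying-ReLU behaviour described above.
    `layer` can be the FC layer itself (counting positive pre-activations)
    or the ReLU module that follows it."""
    captured = {}
    handle = layer.register_forward_hook(lambda m, inp, out: captured.update(a=out))
    with torch.no_grad():
        model(x)
    handle.remove()
    return (captured["a"] > 0).float().mean().item()
```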

Now I'm wondering:

  • How does the bias play a role in the dying-ReLU problem here?
  • Can every dying-ReLU problem be fixed by searching over bias initializations?
