Let's assume a vanilla MLP for classification with a given activation function for hidden layers.
I know it is considered best practice to normalise the network input to [0, 1] when the hidden activation is sigmoid, and to [-0.5, 0.5] when it is tanh.
What about ReLU?
Should I normalise the network input to [0, 1], [-0.5, 0.5], or [-1, 1]?
Any known best practices there?
To be clear, I am not asking about normalising the input of the ReLU itself, e.g. applying Batch Normalization just before or just after the ReLU (https://arxiv.org/pdf/1508.00330).
I am asking about normalising the input of the whole network.
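To make the question concrete, here is a minimal sketch of the three scalings I am asking about, using NumPy. The data and variable names are just for illustration; the statistics are computed on the training set only.

```python
import numpy as np

# Placeholder data for illustration (rows = samples, columns = features)
X_train = np.random.rand(100, 20) * 10.0
X_test = np.random.rand(20, 20) * 10.0

# Per-feature min/max computed on the training set only
x_min = X_train.min(axis=0)
x_max = X_train.max(axis=0)

def scale(X, lo, hi):
    """Min-max scale X into [lo, hi] using training-set statistics."""
    X01 = (X - x_min) / (x_max - x_min)  # map to [0, 1]
    return X01 * (hi - lo) + lo          # shift/stretch to [lo, hi]

X_01   = scale(X_train, 0.0, 1.0)    # option 1: [0, 1]
X_half = scale(X_train, -0.5, 0.5)   # option 2: [-0.5, 0.5]
X_11   = scale(X_train, -1.0, 1.0)   # option 3: [-1, 1]

# The same transform would be applied to X_test with the training-set min/max.
X_test_01 = scale(X_test, 0.0, 1.0)
```

Which of these (if any) is the recommended choice when the hidden layers use ReLU?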