
Can someone please point me to a good article explaining why we use a sigmoid activation in the final layer of a neural network and ReLU activations in the input and hidden layers when building a convolutional neural network? I don't understand how that combination produces the right output.

1 Answer


ReLU mitigates the vanishing gradient problem (its gradient is 1 for positive inputs, unlike the sigmoid, whose gradient shrinks toward 0 for large inputs) and zeroes out neurons with negative pre-activations, which keeps the hidden representations sparse. The sigmoid is used in the output layer because its range is (0, 1), so the output can be read as the probability of the positive class in binary classification. For multi-class classification, the softmax function is more appropriate.
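
To make this concrete, here is a minimal PyTorch sketch (my own illustration, not from the original answer): ReLU after each convolution, and a sigmoid on a single output unit so the result can be read as a probability. The layer sizes and the 28x28 single-channel input shape are assumptions chosen only for the example.

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),                       # ReLU in the hidden conv layers
            nn.MaxPool2d(2),                 # 28x28 -> 14x14
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                 # 14x14 -> 7x7
        )
        self.classifier = nn.Linear(32 * 7 * 7, 1)  # single logit for binary output

    def forward(self, x):
        x = self.features(x)
        x = x.flatten(1)
        logit = self.classifier(x)
        return torch.sigmoid(logit)          # squash the logit to (0, 1)

# Usage: a batch of 4 fake grayscale images
model = TinyCNN()
probs = model(torch.randn(4, 1, 28, 28))
print(probs.shape)  # torch.Size([4, 1]); each value is P(class = 1)
```

For multi-class output you would instead end with `nn.Linear(..., num_classes)` and apply softmax (in practice via `nn.CrossEntropyLoss`, which applies it internally to the raw logits).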

For more background, you can refer to the Stanford CS231n course notes. Thanks.