Can someone please refer me to a good article explaining why we use a Sigmoid activation in the final layer of a neural network and ReLU activations in the input and middle layers when building a Convolutional Neural Network? I don't understand how that combination produces the right output.
1 Answer
ReLU helps mitigate the vanishing-gradient problem and produces sparse activations: it outputs zero for negative inputs, so some neurons stay inactive. Sigmoid is used in the output layer because its range is (0, 1), so its output can be interpreted as the probability of the positive class in binary classification. When you are doing multi-class classification, the Softmax function is more appropriate.
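Here is a minimal sketch of this layout, assuming TensorFlow/Keras and 28x28 grayscale inputs (neither is specified in the question): ReLU in the convolutional and dense hidden layers, Sigmoid on the output for binary classification, and a Softmax variant for multi-class.

```python
# Minimal sketch, assuming TensorFlow/Keras (framework not given in the question).
import tensorflow as tf
from tensorflow.keras import layers, models

# Binary classification: ReLU in the hidden layers, Sigmoid on the single output unit.
binary_model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),             # assumed input size: 28x28 grayscale
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),       # output in (0, 1): probability of the positive class
])
binary_model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Multi-class classification: replace the final layer with Softmax over the classes.
multiclass_model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),      # 10 classes assumed; outputs sum to 1
])
multiclass_model.compile(optimizer="adam",
                         loss="sparse_categorical_crossentropy",
                         metrics=["accuracy"])
```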
If you want to read more, the Stanford CS231n course notes cover activation functions in detail. Thanks.