
So I was working on a classification task using a neural network. The data set was normalised, the weights were initialised randomly between 0 and 1, and all activations were the sigmoid function.

Now, when I used a model with 2 hidden layers the accuracy was 50%, whereas with 1 hidden layer it was 99%. Isn't this contrary to the intuitive understanding of NNs? I thought more layers meant better fitting, even over-fitting, but apparently something different is happening here (maybe the values output by the second hidden layer are too small for the output layer to discern). So what exactly am I missing?
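Roughly, the setup is equivalent to the following sketch (written in Keras purely for illustration; the hidden-layer width, input size, optimiser, and binary output are assumptions, not my actual code):

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_model(num_hidden_layers, units=100, input_dim=10):
    """Sketch of the models being compared: all-sigmoid, weights drawn from U(0, 1)."""
    model = keras.Sequential()
    model.add(keras.Input(shape=(input_dim,)))
    for _ in range(num_hidden_layers):
        model.add(layers.Dense(
            units,
            activation="sigmoid",
            kernel_initializer=keras.initializers.RandomUniform(minval=0.0, maxval=1.0),
        ))
    # Assuming a binary classification task with a single sigmoid output unit.
    model.add(layers.Dense(1, activation="sigmoid"))
    model.compile(optimizer="sgd", loss="binary_crossentropy", metrics=["accuracy"])
    return model

one_hidden = build_model(num_hidden_layers=1)   # this one reaches ~99% accuracy
two_hidden = build_model(num_hidden_layers=2)   # this one gets stuck at ~50%
```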

asked by DuttaA

3 Answers


Maybe you are making a mistake somewhere; please post your code here. But without seeing your code, these are the likely suspects:

  • Vanishing gradient problem. I don't think you have this problem, since your network is very shallow, but you can switch your activation function to ReLU to avoid it.
  • Covariate shift. Just as the input features have to be normalised, the inputs to the deeper layers have to be normalised too: the inputs to each layer should keep a distribution that does not change during training. You can use batch normalisation to avoid this problem (a sketch of this and the previous point follows this list).
  • A bug in your code. You may have fed the activations of each layer to the next layer in the wrong way, or you may not have updated the weights simultaneously if you have not used vectorisation. There are also numerous other reasons that can lead to bugs in your code.

  • Too few neurons per layer; try increasing them. To understand the effect of adding layers versus adding neurons per layer, you can take a look here.
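For the first two points, a minimal sketch of what the change could look like in Keras (the layer sizes, optimiser, and binary output layer are assumptions, not taken from the question):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Two hidden layers with ReLU activations plus batch normalisation between them.
model = keras.Sequential([
    keras.Input(shape=(10,)),
    layers.Dense(100, activation="relu"),
    layers.BatchNormalization(),
    layers.Dense(100, activation="relu"),
    layers.BatchNormalization(),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```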

answered by Green Falcon
  • Only the second one is feasible, since the code works perfectly fine for a single layer, and also the gradient for the last layer is almost identical for all connections. Or maybe I wasn't inputting enough significant features, although that shouldn't have been a problem since I tried with even 100 nodes for just 10 inputs. – DuttaA Jun 08 '18 at 13:17
  • @DuttaA how much training data do you have? – Green Falcon Jun 08 '18 at 14:30
  • I used about 500 samples, though I have 900. – DuttaA Jun 08 '18 at 14:54
  • "I wasn't inputting enough significant features" – I don't agree with that, given that you reached 99% accuracy. – Green Falcon Jun 08 '18 at 15:20
  • @Media in my answer, DuttaA claims to have constant accuracy and a decreasing loss function. What's your opinion on this? Because, to me, it does not make sense at all. – David Masip Jun 12 '18 at 07:47
  • @DavidMasip this is basically just my opinion. The loss function is a sum over the distances between the desired outputs and the network's actual outputs for the different inputs. Suppose that in a classification task one class has numerous examples and can be learnt very well by the network, while another class, for various reasons, cannot. For the class that is learnt well, the softmax may output 0.6 or 0.9 for that class, the latter being a more confident prediction. Going from 0.6 to 0.9 decreases the loss, while the accuracy does not change (a quick numerical illustration follows). – Green Falcon Jun 12 '18 at 20:13
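A quick numerical illustration of that last point, assuming binary cross-entropy on a correctly classified example:

```python
import math

# Cross-entropy loss for a correct prediction made with confidence p is -log(p).
for p in (0.6, 0.9):
    loss = -math.log(p)
    predicted_class = int(p > 0.5)  # the thresholded / argmax prediction is unchanged
    print(f"confidence={p}: loss={loss:.3f}, predicted class={predicted_class}")

# confidence=0.6: loss=0.511, predicted class=1
# confidence=0.9: loss=0.105, predicted class=1
# The prediction (and therefore the accuracy) is identical, but the loss drops.
```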