
So I was working on a classification task using a neural network. The data set was normalised, the weights were initialised randomly between 0 and 1, and all activations were the sigmoid function.

Now, when I used a model with 2 hidden layers the accuracy was 50%, whereas with 1 hidden layer it was 99%. Isn't this contrary to the intuitive understanding of NNs? I thought more layers meant better fitting, even over-fitting, but apparently something different is happening here (maybe the values output by the second hidden layer are too small for the output layer to discern). So what exactly am I missing?
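Roughly, the setup is equivalent to the following sketch (written in Keras purely for illustration; the hidden-layer width, input size, optimiser, and binary output are assumptions, not my actual code):

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_model(num_hidden_layers, units=100, input_dim=10):
    """Sketch of the models being compared: all-sigmoid, weights drawn from U(0, 1)."""
    model = keras.Sequential()
    model.add(keras.Input(shape=(input_dim,)))
    for _ in range(num_hidden_layers):
        model.add(layers.Dense(
            units,
            activation="sigmoid",
            kernel_initializer=keras.initializers.RandomUniform(minval=0.0, maxval=1.0),
        ))
    # Assuming a binary classification task with a single sigmoid output unit.
    model.add(layers.Dense(1, activation="sigmoid"))
    model.compile(optimizer="sgd", loss="binary_crossentropy", metrics=["accuracy"])
    return model

one_hidden = build_model(num_hidden_layers=1)   # this one reaches ~99% accuracy
two_hidden = build_model(num_hidden_layers=2)   # this one gets stuck at ~50%
```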

asked by DuttaA

3 Answers


Maybe you are making a mistake somewhere; please post your code here. But without seeing your code, these are the likely suspects:

  • Vanishing gradient problem. I don't think you have this problem, since your network is very shallow, but you can switch your activation function to ReLU to avoid it.
  • Covariate shift. Just as the input features have to be normalised, the inputs to the deeper layers have to be normalised too: the inputs to each layer should keep a distribution that does not change during training. You can use batch normalisation to avoid this problem (a sketch of this and the previous point follows this list).
  • A bug in your code. You may have fed the activations of each layer to the next layer in the wrong way, or you may not have updated the weights simultaneously if you have not used vectorisation. There are also numerous other reasons that can lead to bugs in your code.

  • Too few neurons per layer; try increasing them. To understand the effect of adding layers versus adding neurons per layer, you can take a look here.
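For the first two points, a minimal sketch of what the change could look like in Keras (the layer sizes, optimiser, and binary output layer are assumptions, not taken from the question):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Two hidden layers with ReLU activations plus batch normalisation between them.
model = keras.Sequential([
    keras.Input(shape=(10,)),
    layers.Dense(100, activation="relu"),
    layers.BatchNormalization(),
    layers.Dense(100, activation="relu"),
    layers.BatchNormalization(),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```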

answered by Green Falcon
  • Only the second one is feasible, since the code works perfectly fine for a single layer, and also the gradient for the last layer is almost identical for all connections. Or maybe I wasn't inputting enough significant features, although that shouldn't have been a problem since I tried with even 100 nodes for just 10 inputs. – DuttaA Jun 08 '18 at 13:17
  • @DuttaA how much training data do you have? – Green Falcon Jun 08 '18 at 14:30
  • I used about 500 samples, though I have 900. – DuttaA Jun 08 '18 at 14:54
  • "I wasn't inputting enough significant features" – I don't agree with that, given that you reached 99% accuracy. – Green Falcon Jun 08 '18 at 15:20
  • @Media in my answer, DuttaA claims to have constant accuracy and a decreasing loss function. What's your opinion on this? Because, to me, it does not make sense at all. – David Masip Jun 12 '18 at 07:47
  • @DavidMasip this is basically just my opinion. The loss function is a sum over the distances between the desired outputs and the network's actual outputs for the different inputs. Suppose that in a classification task one class has numerous examples and can be learnt very well by the network, while another class, for various reasons, cannot. For the class that is learnt well, the softmax may output 0.6 or 0.9 for that class, the latter being a more confident prediction. Going from 0.6 to 0.9 decreases the loss, while the accuracy does not change (a quick numerical illustration follows). – Green Falcon Jun 12 '18 at 20:13
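A quick numerical illustration of that last point, assuming binary cross-entropy on a correctly classified example:

```python
import math

# Cross-entropy loss for a correct prediction made with confidence p is -log(p).
for p in (0.6, 0.9):
    loss = -math.log(p)
    predicted_class = int(p > 0.5)  # the thresholded / argmax prediction is unchanged
    print(f"confidence={p}: loss={loss:.3f}, predicted class={predicted_class}")

# confidence=0.6: loss=0.511, predicted class=1
# confidence=0.9: loss=0.105, predicted class=1
# The prediction (and therefore the accuracy) is identical, but the loss drops.
```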