I have a Convolutional Neural Network that is structured as a binary classifier. It has two relatively standard convolutional/ReLU/pooling layers followed by a 2-layer fully connected network, which outputs to a softmax-with-loss layer for the binary classification. However, I observed something unusual:
In Version 1 of the network, both convolutional layers derived 10 features; upon initialisation I had a cross entropy error of around 28.
In Version 2 I increased the number of features in the convolutional layers to 64. Despite keeping the same fully connected layers and the same softmax, my cross entropy error at initialisation jumped to around 340.
My question is: why would this happen? Surely the random initialisation is comparable in both cases, and the softmax-with-loss function should normalise the two outputs so that they add up to 1. So why would the cross entropy suddenly jump so high?
My understanding of how cross entropy behaves when the network outputs large numbers was helped by this answer.
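In case it helps, here is the kind of behaviour I mean as a minimal NumPy sketch (the logit values below are made up for illustration, not taken from my actual network): the softmax outputs sum to 1 in both cases, but when the raw values entering the softmax are large in magnitude, the probability assigned to the true class can be vanishingly small and the cross entropy blows up.

```python
import numpy as np

def softmax(logits):
    # Shift by the max for numerical stability; the outputs still sum to 1.
    exps = np.exp(logits - np.max(logits))
    return exps / exps.sum()

def cross_entropy(probs, true_class):
    # Negative log of the probability assigned to the correct class.
    return -np.log(probs[true_class])

# Hypothetical pre-softmax outputs when few features feed the FC layer.
small_logits = np.array([1.5, -0.5])
p = softmax(small_logits)
print(p, p.sum(), cross_entropy(p, true_class=1))  # probs sum to 1, loss ~ 2.1

# Hypothetical pre-softmax outputs when many more features are summed in,
# so the raw values entering the softmax are much larger in magnitude.
big_logits = np.array([200.0, -150.0])
p = softmax(big_logits)
print(p, p.sum(), cross_entropy(p, true_class=1))  # probs still sum to 1, loss ~ 350
```

So the normalisation itself is not the issue; what I don't understand is why simply adding more convolutional features would produce such large values at initialisation in the first place.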