I started working again with my own implementation of the backpropagation algorithm that I wrote five years ago. For each training sample (input-output pair), I do a forward pass (to compute the output of each neuron), a backward pass (to compute the "delta" for each neuron), and then I update the weights.
With 3 layers of neurons (input layer, hidden layer, output layer), there are two "layers of weights". In the forward pass I go through all the weights (both layers). But in the backward pass I skip the first layer of weights (the one between the input layer and the hidden layer).
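To make the structure concrete, here is a minimal per-sample sketch of what I mean. It assumes sigmoid activations and squared-error loss (my assumptions for illustration only, not necessarily what my original code uses); `W1` and `W2` are the two weight layers:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_sample(x, t, W1, W2, lr=0.1):
    # Forward pass: uses BOTH weight layers
    # (W1: input -> hidden, W2: hidden -> output).
    h = sigmoid(W1 @ x)          # hidden activations
    y = sigmoid(W2 @ h)          # output activations

    # Backward pass: deltas for the output and hidden layers.
    delta_out = (y - t) * y * (1 - y)             # needs no weights at all
    delta_hid = (W2.T @ delta_out) * h * (1 - h)  # needs only W2
    # W1 is never read here, because the input layer has no delta
    # to propagate to.

    # Weight update: touches BOTH weight layers again.
    W2 -= lr * np.outer(delta_out, h)
    W1 -= lr * np.outer(delta_hid, x)
    return W1, W2
```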
This seems wrong, as I thought a backward pass had to go through all the weights. But it appears to work quite well. So is it true that not all weights are used in the backward pass?
When I use it on 20x20px images (layers 400 : 60 : 10), the backward pass is about 40x faster than the forward pass, because it processes only the 60x10 weights and skips the 60x400 weights.
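For reference, a rough multiply-add count for the 400 : 60 : 10 case, assuming the cost is dominated by iterating over weights:

```python
# Forward pass visits both weight layers; backward pass (delta
# computation) visits only the hidden->output weights.
forward  = 400 * 60 + 60 * 10   # 24,600 multiply-adds
backward = 60 * 10              # 600 multiply-adds
print(forward / backward)       # = 41.0, i.e. roughly the 40x I observed
```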