I have a question about backpropagation. I'm a beginner studying the formulas for calculating the delta of each neuron, and the various sources on the internet teach it in different ways, so I'm confused because the formulas in their explanations are slightly different. Could someone please help me understand whether the explanations in these two articles are equivalent?
The first article, https://home.agh.edu.pl/~vlsi/AI/backp_t_en/backprop.html, says that the error of a neuron in the current hidden layer is the sum of (error * connection weight) over all the neurons in the next layer. The error calculation in this article does not use the derivative of the neuron's activation. Instead, the derivative is only used when updating the weights, with the formula "wNew = wOld + (learningRate * error * derivative * input)"; that is the only place the derivative appears.
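To make sure I'm describing it correctly, here is a rough Python sketch of how I understand the first article's procedure. The function and variable names are mine, not from the article, and I'm assuming a sigmoid activation and a single hidden layer:

```python
def sigmoid_derivative(output):
    # derivative of the sigmoid, written in terms of the neuron's output
    return output * (1.0 - output)

def backprop_first_article(inputs, hidden_outputs, output_errors,
                           weights_hidden_to_output, weights_input_to_hidden,
                           learning_rate):
    # Error of a hidden neuron = sum of (error * connection weight)
    # over the neurons in the next layer -- no derivative here.
    hidden_errors = []
    for h in range(len(hidden_outputs)):
        err = sum(output_errors[o] * weights_hidden_to_output[h][o]
                  for o in range(len(output_errors)))
        hidden_errors.append(err)

    # The derivative only shows up in the weight update:
    # wNew = wOld + (learningRate * error * derivative * input)
    for h in range(len(hidden_outputs)):
        deriv = sigmoid_derivative(hidden_outputs[h])
        for i in range(len(inputs)):
            weights_input_to_hidden[i][h] += (learning_rate * hidden_errors[h]
                                              * deriv * inputs[i])
    return hidden_errors
```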
However, the second article, https://machinelearningmastery.com/implement-backpropagation-algorithm-scratch-python/, which teaches how to program a multi-layer perceptron from scratch in Python, does it a little differently. In the weight-update formula it uses a variable called "delta", and it explains that the "delta" of a neuron is the (neuron's error * derivative of the neuron's activation), so to calculate this delta we first need the neuron's error. In the code presented in that article, the error of a hidden-layer neuron is also a sum, but it is the sum of (neuron delta * connection weight) over the neurons in the next layer. So it first calculates the errors of all neurons, then calculates the "deltas" of all neurons (to calculate the deltas of the neurons in one layer it uses the deltas of the next layer), and only then updates the weights. And since that article has already calculated the delta as (error * derivative), the formula for updating the weights is simply "wNew = wOld + (learningRate * delta * input)".
So I'm confused, specifically about the SUM. I would really like to understand why the first article (https://home.agh.edu.pl/~vlsi/AI/backp_t_en/backprop.html) calculates the neuron's error without using the derivative and only applies the derivative when updating the weights, while the second article (https://machinelearningmastery.com/implement-backpropagation-algorithm-scratch-python/) first calculates the errors, then calculates the deltas using the derivative, and then updates the weights without needing the derivative in the weight-update formula.
Why are there these differences between the formulas in the two articles? This is a question I've had for a long time and I would really like to understand it. Are the explanations in these two articles equivalent? Please help me clear up this doubt.