I have a question about backpropagation. I'm a beginner studying the formulas for calculating the delta of each neuron, and the various sources on the internet teach it in different ways, so I'm confused because the formulas in their explanations are slightly different. Could someone please help me understand whether the explanations in these two articles are equivalent?
The first article, https://home.agh.edu.pl/~vlsi/AI/backp_t_en/backprop.html, says that the error of a neuron in the current hidden layer is the sum of (error * connection weight) over all the neurons in the next layer. The error calculation in this article does not use the derivative of the neuron's activation. Instead, the derivative is only used when updating the weights, with the formula "wNew = wOld + (learningRate * error * derivative * input)"; that is the only place the derivative appears.
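To make sure I'm describing it correctly, here is a rough Python sketch of how I understand the first article's procedure. The function and variable names are mine, not from the article, and I'm assuming a sigmoid activation and a single hidden layer:

```python
def sigmoid_derivative(output):
    # derivative of the sigmoid, written in terms of the neuron's output
    return output * (1.0 - output)

def backprop_first_article(inputs, hidden_outputs, output_errors,
                           weights_hidden_to_output, weights_input_to_hidden,
                           learning_rate):
    # Error of a hidden neuron = sum of (error * connection weight)
    # over the neurons in the next layer -- no derivative here.
    hidden_errors = []
    for h in range(len(hidden_outputs)):
        err = sum(output_errors[o] * weights_hidden_to_output[h][o]
                  for o in range(len(output_errors)))
        hidden_errors.append(err)

    # The derivative only shows up in the weight update:
    # wNew = wOld + (learningRate * error * derivative * input)
    for h in range(len(hidden_outputs)):
        deriv = sigmoid_derivative(hidden_outputs[h])
        for i in range(len(inputs)):
            weights_input_to_hidden[i][h] += (learning_rate * hidden_errors[h]
                                              * deriv * inputs[i])
    return hidden_errors
```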
However, the second article, https://machinelearningmastery.com/implement-backpropagation-algorithm-scratch-python/, which teaches how to program a multi-layer perceptron from scratch in Python, does it a little differently. In the weight-update formula it uses a variable called "delta", and it explains that the "delta" of a neuron is the (neuron's error * derivative of the neuron's activation), so to calculate this delta we first need the neuron's error. In the code presented in that article, the error of a hidden-layer neuron is also a sum, but it is the sum of (neuron delta * connection weight) over the neurons in the next layer. So it first calculates the errors of all neurons, then calculates the "deltas" of all neurons (to calculate the deltas of the neurons in one layer it uses the deltas of the next layer), and only then updates the weights. And since that article has already calculated the delta as (error * derivative), the formula for updating the weights is simply "wNew = wOld + (learningRate * delta * input)".
So I'm confused, specifically about the SUM. I would really like to understand why the first article (https://home.agh.edu.pl/~vlsi/AI/backp_t_en/backprop.html) calculates the neuron's error without using the derivative and only applies the derivative when updating the weights, while the second article (https://machinelearningmastery.com/implement-backpropagation-algorithm-scratch-python/) first calculates the errors, then calculates the deltas using the derivative, and then updates the weights without needing the derivative in the weight-update formula.
Why are there these differences between the formulas in the two articles? This is a question I've had for a long time and I would really like to understand it. Are the explanations in these two articles equivalent? Please help me clear up this doubt.