No, I wouldn't consider backprop a training algorithm. Backpropagation is just a way to compute the derivative of the loss function with respect to the network's parameters by using the chain rule. Computing a derivative doesn't train anything.
What you do with this derivative in order to minimize the loss function is the training part.
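To make the distinction concrete, here's a minimal sketch using PyTorch autograd (the single weight and the learning rate are arbitrary, just for illustration): the backward call only fills in the derivative; the parameter update that actually reduces the loss is a separate step.

import torch

# Toy setup: one weight w, loss L(w) = (w * x - y)^2
w = torch.tensor(2.0, requires_grad=True)
x, y = torch.tensor(3.0), torch.tensor(1.0)

loss = (w * x - y) ** 2
loss.backward()            # backprop: only computes dL/dw and stores it in w.grad

with torch.no_grad():      # the actual "training" part: use the derivative
    w -= 0.01 * w.grad     # one gradient-descent update
    w.grad.zero_()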
EDIT:
I think it will depend on who you ask. Take, for example, this PyTorch tutorial. They say that "Backward propagation: In backprop, the NN adjusts its parameters proportionate to the error in its guess. It does this by traversing backwards from the output, collecting the derivatives of the error with respect to the parameters of the functions (gradients), and optimizing the parameters using gradient descent."
I.e. the two steps loss.backward() and optim.step() together are what they call backpropagation. This is what I'd call the more engineering viewpoint, and I believe it is a semantic shift away from what I'd argue (see comments!) is actually backprop, which is just the loss.backward() step.
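Spelled out in a typical loop, the two calls look something like this (the model, optimizer, and data here are placeholders, not taken from the tutorial); the comments mark which call is backprop in the narrow sense and which is the optimization:

import torch
import torch.nn as nn

model = nn.Linear(4, 1)
optim = torch.optim.SGD(model.parameters(), lr=0.1)
inputs, targets = torch.randn(8, 4), torch.randn(8, 1)

loss = nn.functional.mse_loss(model(inputs), targets)

optim.zero_grad()
loss.backward()   # backprop in the narrow sense: compute gradients via the chain rule
optim.step()      # optimization: adjust the parameters using those gradients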
The semantic drift of backprop meaning calculating the derivatives together with optimization makes sense in this context. Why would you call loss.backward() and then not call optim.step()? But, originally (and technically, the best kind of correct), backprop refers to just the computation of the derivatives, and I think you'll find that terminology more in math/theory contexts than in programming/engineering contexts.